I frequently get questions about what's really happening under the hood with complex pipelines and AI/data systems. I'm planning a series of posts to address this. 

Let's talk about infrastructure

Before we dive into code and architecture choices, I think we need to talk about infrastructure. I've written about this in the past as well: start-up full-stack developers these days are also expected to dabble a bit in the hosting part of the application.

Cloudy: the public cloud

The public cloud is like IKEA for developers. Everything looks simple, but I wouldn't recommend it for couples therapy. The public cloud is a mystical beast for some, but it's the home for what they call cloud-native applications. For start-ups, there's a sweet spot to be found here for several reasons.

1  Money. Every start-up needs its cash. Luckily, the cloud companies are fighting for your wallet like Tinder dates offering free drinks. It's not uncommon for a start-up to host for free for the first 2 years using credits and free tiers. Although that sometimes means burning $10k of credits on the most expensive servers, just to unlock the next tier of credits (looking at you, Azure).

2  Managed. Every start-up is also short on people. Remember when you needed to pay an Ops person to configure SSL, backups, and database clusters? Now it’s a dropdown menu. The cloud will do it for you. The only thing it won’t manage is your expectations.

3  Money & managed. Wait, I just covered those. Yes, but combined they make even more sense, because the managed services can be significantly cheaper than running the vanilla alternatives yourself.

The cloud-native mindset

Cloud-native requires a certain method or mindset. I've seen companies try to lift and shift their on-prem application to the cloud and expect a spiritual awakening. Afterwards, they complain that they pay twice as much for the same pain in a shinier console. It’s like upgrading from a Renault Twingo to a Ferrari, but insisting on keeping the Twingo’s engine. Congratulations, you’ve built the world’s loudest lawnmower. That’s not a cloud migration, that’s a midlife crisis on wheels.

With: containerization 

With Docker.

It’s the seatbelt of deployment. Optional until the crash.

If you’re not using Docker yet, I want you to go outside, take a long walk, and rethink your life choices. And that means something coming from a Python enthusiast. Python developers love their virtual environments. However, I'm here to say that Docker will help any start-up playing the cloud game. Create a new service? Dockerise it. Always.

Docker gives you portability and runtime sanity. It’s your golden ticket out of vendor hell. Azure gives you a headache? Move your container. Kubernetes gives you an aneurysm? Move your container. Your local dev machine is a Mac, your server is Linux, your intern runs Windows on a microwave? Docker doesn’t care about your life choices. It just runs.
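To show how low the bar is: a minimal Dockerfile for a Python service looks something like this. The module path and port are placeholders, adapt them to your own project:

```dockerfile
# Small base image keeps builds and pulls fast
FROM python:3.12-slim

WORKDIR /app

# Install dependencies first so Docker can cache this layer
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the code
COPY . .

# "app.main:app" is a placeholder for your actual entrypoint
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

The same image runs on the Mac, the Linux server, and the intern's microwave.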

A chance: on a hands-off cloud

Not a gift.

The cloud gives you a chance to focus on the things that matter, like the features in your code, while it takes care of the specialised stuff like scalability, high availability, security, data integrity, and some other -y's.

With great power comes great invoice detail. The cloud is notorious as a gateway to crazy bills. There are budget alerts, but there are not always hard stops. A wrong configuration can lead to immense costs. But that's not something that happens overnight; you have to really ignore all the warning signs before reaching that point.

OF

OF. Oh f*ck. The database is down again.

Before the age of the cloud, I remember my OF moments. Whenever the application went down, there was a 90% chance that it was due to the database. The database wasn’t a service; it was a personality test.

We tried read replicas: great in theory, until replication lag turned every query into Russian roulette. We tried multi-primary clusters: any node could take writes, which just meant all of them were on fire at the same time. In the end, I had to admit we were duct-taping hope onto scaling issues.

Now, I press a few buttons in the managed service menu, and I get the database I want, with all the scalability mentioned before, taken care of by the best industry experts in the world.

  • For SQL databases you can look at Amazon RDS, Azure SQL Database or Google Cloud SQL.
  • For caches like Redis there's Amazon ElastiCache (Redis/Memcached), Azure Cache for Redis and Google Memorystore (Redis/Memcached).
  • For files there's Amazon S3, Azure Blob Storage and Google Cloud Storage. 
  • For documents and search there's Amazon OpenSearch (Elasticsearch), Amazon DocumentDB (MongoDB-compatible), Amazon Keyspaces (Cassandra), Azure Managed OpenSearch (preview), MongoDB Atlas on Azure, Azure Managed Instance for Apache Cassandra, and on Google Cloud: Elastic Cloud, MongoDB Atlas and Astra DB (Cassandra).
  • For message queues there's Amazon MSK (Kafka), Amazon MQ (RabbitMQ), managed Kafka and RabbitMQ offerings on Azure, and Confluent Cloud (Kafka) and managed RabbitMQ offerings on Google Cloud.
  • For data pipelines there's Amazon Managed Workflows for Apache Airflow, AWS Glue (Spark), Azure Data Factory Managed Airflow, Azure Databricks (Spark), Google Cloud Composer (Airflow) and Google Dataproc (Spark, Hadoop, Flink).

Those are all generic but managed services, giving you the same portability advantage that Docker yields: you can use them and port them somewhere else.

Now it's time to make things more complex. Each of the big players offers even cheaper, better, stronger, specialised managed services, but with vendor lock-in: you lose the portability advantage, but gain time to market. For example, Amazon DynamoDB, Step Functions, Redshift, SQS, SageMaker and Lambda. Azure Cosmos DB, Data Factory, Event Grid, Service Bus, AI Search and Functions. Google BigQuery, Dataflow, Vertex AI and Cloud Functions.

How do you make that puzzle?

First of all, it's a business decision, mainly related to compliance. Does the cloud provider offer the regions you need? Or do your customers have a veto on Microsoft? Enterprise love for Azure is a form of Stockholm syndrome, but now with Teams integration for the kidnapper's daily stand-up. Hell just became cloud-ready. In many cases, the decision will be to stick to one cloud provider (or two, for specific services).

Just add Kubernetes for extra pain. Eternal suffering scales horizontally.


Some vendor-specific services offer Docker support (I told you to use it, right?). For example, AWS Lambda and SageMaker are very vendor-specific, but both can run your codebase through a Docker image, lowering the portability disadvantage by A LOT.
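For instance, AWS publishes Lambda base images, so packaging the same codebase as a Lambda-compatible container is a handful of lines. A sketch (the handler name and file layout are placeholders):

```dockerfile
# AWS-provided Lambda base image for Python
FROM public.ecr.aws/lambda/python:3.12

# Install dependencies into the Lambda task root
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt -t "${LAMBDA_TASK_ROOT}"

# Copy the function code
COPY app.py ${LAMBDA_TASK_ROOT}

# "app.handler" = module app.py, function handler
CMD ["app.handler"]
```

Swap the base image and CMD, and the same code runs elsewhere as a plain container.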

Time to market. For start-ups, speed is most important. Set up AWS Step Functions and get your product out there. If you need to migrate away from it later, that's still a fair price to pay for the speed it offered you before.

Function calls: serverless FaaS

We've talked about data storage solutions, but what compute options are there?

Serverless. For start-ups, proofs of concept, MVPs and expeditions, start looking at serverless compute using Docker images (yes, again). You don't have to think about auto-scaling, container pods or other forms of digital self-harm. Upload the image, spin it up (AWS Lambda, Azure Functions, Google Cloud Functions) and the cloud provider will scale it for you. Cheap.
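The unit of deployment is just a function taking an event. A minimal sketch of a Lambda-style handler in Python; the event shape here is invented for illustration, the real one depends on what triggers the function:

```python
import json


def handler(event, context):
    """Minimal Lambda-style handler: take an event dict, return a response dict.

    `event` and `context` are supplied by the platform; here we only
    assume the event carries a JSON body with an optional "name" field.
    """
    body = json.loads(event.get("body") or "{}")
    name = body.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```

No server, no pod, no YAML incantation: the platform calls `handler` once per event and scales the invocations for you.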

To put cost into perspective: I recently created a multi-step pipeline (5 steps per bird) to ingest and process all 11000 birds of the world using AWS Lambda & Step Functions. Lambda used over a million gigabyte-seconds. After the monthly free-tier discounts, I paid $11 for Lambda and $2 for Step Functions.
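Back-of-the-envelope, assuming roughly $0.0000166667 per GB-second list price and a 400,000 GB-second monthly free tier (check current pricing, these numbers move, and this ignores per-request charges):

```python
# Rough Lambda compute-cost estimate; the prices below are assumptions
PRICE_PER_GB_SECOND = 0.0000166667   # assumed x86 list price, USD
FREE_TIER_GB_SECONDS = 400_000       # assumed monthly free tier


def lambda_compute_cost(gb_seconds: float) -> float:
    """Estimate the monthly Lambda compute bill after the free tier."""
    billable = max(gb_seconds - FREE_TIER_GB_SECONDS, 0)
    return billable * PRICE_PER_GB_SECOND


# A million GB-seconds, as in the bird pipeline above
print(round(lambda_compute_cost(1_000_000), 2))  # → 10.0
```

Which lands in the same ballpark as that $11 bill: pocket change for a pipeline over 11000 birds.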

Yes, the internet is full of people complaining about serverless.

  • "Serverless has limitations"
  • "Memory capped"
  • "It's 1 function per instance, it's micro-service hell"
  • "Cold starts"
  • "No state", hurr durr.

But those are just the kind of people who still name their servers after Pokémon.

Yes, serverless is often capped, for example at 10GB of memory per invocation. But I have scaled big data operations over AWS Lambda. I can do a lot with 10GB of memory.

Yes, serverless is stateless. Stateless is not a limitation; it’s a lifestyle choice. State is where bugs go to retire.

Yes, there are cold starts. But if you can develop a bit, you can make your Docker image lightweight, and only a fraction of invocations will ever experience a cold start.

No, it doesn't force you into micro-service hell. You can fit an entire FastAPI app into a single function (hello Mangum). Spin it up on AWS Lambda, Azure Functions, or Google Cloud Functions. Your Docker image (yes, Docker again) becomes a portable, stateless little miracle.

Eventually, you’ll grow. You might outgrow serverless, just like you outgrew your monolith, your ORM, and your optimism. That’s fine. Take your same Docker image and throw it at a container service like ECS or whatever managed Kubernetes flavour (EKS/AKS) if you're a masochist. Just remember: the more configuration options a cloud service gives you, the more chances you have to ruin it.

The cloud is both the best and worst thing to happen to startups. It gives you infinite power and infinite ways to shoot yourself in the foot.

Use it wisely. Pick managed services where you can. Dockerise everything. And when someone tells you their on-prem, custom-built orchestration AI pipeline is production-ready, smile kindly and prepare your “OF” face.