Jun 14, 2026

Shipping a Public AI Service Without Going Broke

A case study on deploying a public, no-sign-up AI service and keeping it secure, well-behaved, and running at a fraction of a cent per story.

engineering • case-study • AI • architecture


Most of my posts here explain a concept. This one is different. It is a case study of something I built: Magic AI Story, a web app that turns a free-text description into an illustrated story. The hard part was not getting the AI to write a story. The hard part was the constraint nobody warns you about: how do you put a generative-AI service on the public internet, with no sign-up wall, and not wake up to a drained API budget and a pile of content you would not want your name on?

So let's talk about the real work: keeping a public AI service secure, well-behaved, and cheap to run as it grows.

Three Goals That Fight Each Other

Here is the tension that shaped every decision. A public AI service pulls in three directions at once, and you cannot fully satisfy all three:

Open access. No sign-up, no friction. Land on the page, type an idea, get a story. That is the whole point. The moment you add an account wall, half your visitors leave before they see what it does.
Cost control. Every story still costs something: three image generations. Multiply that by 'anyone on the internet, as many times as they like' and even cheap calls add up fast.
Abuse prevention. An open box that takes free text and returns AI-generated images is an invitation. Some people will try to make it produce things you do not want your name attached to. Others will just hammer it for fun.

Pick any two and the third one bites you. Go open and cheap with no abuse controls and you are hosting a content-moderation incident. Keep it open and safe but skip the cost ceiling and you are funding strangers' fan fiction until your card declines. Make it safe and cheap behind a login and nobody uses it. The whole build was really about refusing to fully give up any of the three.

Keeping It Affordable

I'll lead with the number I'm proud of. A fully illustrated story (parsed input, a written narrative, and three generated images) costs me about half a cent to produce. Fractions of a penny per story, and the whole thing is built so that number stays low as traffic grows. The secret is which models you call and how you call them.

All the language work, meaning parsing the user's input, writing the story, and building the image prompts, runs on Nemotron via OpenRouter, where the model happens to be free to use. For 'write a charming tale about a dragon who's afraid of heights' you do not need a frontier model, and a capable open model that costs nothing does the job well. That single choice removes what would normally be the bulk of the per-story bill.

The only thing I actually pay for is the pictures. Image generation runs on Flux Dev through Runware, and each story produces three images. The results are not always perfect, but for the price it is a solid little model. That is the entire variable cost, and it is tiny. Everything else (the database, the rate-limiting, the hosting) adds very little.

I think about it as cost per story rather than a monthly bill because per-story economics is the number that scales with you. A flat monthly figure is meaningless the moment traffic moves. Cost per unit of value tells you whether the thing stays sustainable at ten stories a day or ten thousand. Get that number low and growth becomes a good problem instead of a budget emergency.
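To make the per-story framing concrete, here is a rough sketch of the arithmetic. The per-image price is an assumption, back-derived from the roughly half-a-cent figure above rather than taken from any published rate, and the text model is treated as free, as described.

```python
# Illustrative numbers only; the per-image price is an assumption.
IMAGES_PER_STORY = 3
ASSUMED_PRICE_PER_IMAGE_USD = 0.0017  # assumption: ~half a cent spread over 3 images
LLM_COST_PER_STORY_USD = 0.0          # the text model costs nothing, per the post


def cost_per_story(price_per_image: float = ASSUMED_PRICE_PER_IMAGE_USD) -> float:
    """Variable cost of one story: three images plus the (free) text work."""
    return IMAGES_PER_STORY * price_per_image + LLM_COST_PER_STORY_USD


def daily_cost(stories_per_day: int) -> float:
    """Variable cost grows linearly with the number of stories."""
    return stories_per_day * cost_per_story()


if __name__ == "__main__":
    for volume in (10, 10_000):
        print(f"{volume:>6} stories/day -> ${daily_cost(volume):.2f}/day")
```

Under these assumed numbers the per-story figure stays constant while the daily total moves with traffic, which is why it is the more useful number to track.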

But cheap models are only half of it. The other half is not wasting the calls you do make:

One LLM does triple duty (input parsing, narrative generation, and image-prompt crafting) instead of a separate specialised service for each.
Prompts and generation parameters are tuned to get a usable result on the first try, because every retry is wasted budget.
The pipeline is built to fail fast. If the input is junk, it gets caught before it ever reaches the image-generation step.
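The fail-fast ordering can be sketched as below. This is a minimal illustration, not the production pipeline: `validate`, `build_story`, the `MAX_INPUT_CHARS` limit, and the two injected callables are all hypothetical names. The point is only the order of operations, with free checks first and the paid image step last.

```python
from dataclasses import dataclass

MAX_INPUT_CHARS = 500  # assumed limit; the post does not state one


class RejectedInput(Exception):
    """Raised before any paid call is made."""


@dataclass
class Story:
    text: str
    image_urls: list


def validate(raw: str) -> str:
    """Cheap local checks that cost nothing to run."""
    idea = raw.strip()
    if not idea:
        raise RejectedInput("empty input")
    if len(idea) > MAX_INPUT_CHARS:
        raise RejectedInput("input too long")
    if not any(ch.isalpha() for ch in idea):
        raise RejectedInput("no readable words")
    return idea


def build_story(raw: str, write_story, generate_image) -> Story:
    """write_story and generate_image are injected stand-ins for the real services."""
    idea = validate(raw)                            # free: runs first
    text, prompts = write_story(idea)               # free model: one call does the language work
    images = [generate_image(p) for p in prompts]   # paid: only reached with clean input
    return Story(text=text, image_urls=images)
```

Because the callables are injected, the ordering can be checked with stubs that count how often the paid step is reached.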

Keeping the Budget From Falling Off a Cliff

Cheap-per-story is fine right up until someone decides to generate ten thousand of them. With no sign-up, I had no user accounts to attach limits to, so per-account quotas were off the table.

The answer was a rate-limiting layer keyed on the anonymous visitor, backed by Redis. Redis fits here because rate-limiting is all short-lived counters with expiry, exactly what an in-memory store with TTLs is built for, and far cheaper and faster than hitting a disk-backed relational database on every request. A visitor gets a sensible allowance, and once they hit it they are asked to come back later. No account required, but also no way for one person to empty the budget. It protects the wallet and keeps the front door open.
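A minimal sketch of the counter-with-expiry idea, assuming a fixed-window scheme. The post does not say which algorithm or limits are used, so the class names, the `visitor_id` key, and the limit and window values here are all hypothetical. To keep the example runnable on its own, a small in-memory store stands in for Redis.

```python
import time


class InMemoryCounterStore:
    """Stand-in for Redis: a counter per key that resets once its TTL has passed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (count, expires_at)

    def incr_with_ttl(self, key: str, ttl_seconds: float) -> int:
        now = self._clock()
        count, expires_at = self._data.get(key, (0, 0.0))
        if count == 0 or now >= expires_at:
            # First hit in a new window: restart the count and set the expiry.
            count, expires_at = 0, now + ttl_seconds
        count += 1
        self._data[key] = (count, expires_at)
        return count


class RateLimiter:
    """Fixed-window limiter: at most `limit` requests per `window_seconds`."""

    def __init__(self, store, limit: int, window_seconds: float):
        self.store = store
        self.limit = limit
        self.window_seconds = window_seconds

    def allow(self, visitor_id: str) -> bool:
        hits = self.store.incr_with_ttl(f"ratelimit:{visitor_id}", self.window_seconds)
        return hits <= self.limit


if __name__ == "__main__":
    limiter = RateLimiter(InMemoryCounterStore(), limit=3, window_seconds=3600)
    print([limiter.allow("visitor-abc") for _ in range(5)])
```

Against a real Redis instance, the same shape is usually an `INCR` on the key followed by an `EXPIRE` when the counter has just been created, so the store forgets the visitor on its own once the window passes.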

Keeping It Secure

'Secure' for a public AI service has very little to do with firewalls. What matters are the specific ways an open generative endpoint can be turned against you. I treated it as a list of concrete attacks, each with its own answer:

Garbage or malicious input. Free text is the wild west. Everything the user types gets parsed and validated before it can drive a generation, so nonsense gets rejected at the door rather than turned into images.
Inappropriate content. This was the one that kept me honest. The same LLM that powers the experience also does a moderation pass, and the prompt design steers generation toward the wholesome and away from the unpleasant. The goal was a real balance: enough creative freedom that the stories are fun and surprising, enough guardrail that the service will not produce something it should not.
Repeat abusers. Rate-limiting slows people down, but the determined ones need a harder stop. So there is a blacklist, a persistent ban backed by Supabase (Postgres), for cutting off bad actors entirely. Postgres is the right call here precisely because blacklist entries are the opposite of rate-limit counters: they need to be durable and stick around, not expire in an hour.
Flying blind. You cannot protect what you cannot see, so analytics are wired in to watch usage patterns, both to spot abuse and to learn what people actually do with the thing.

Notice the split: Redis for the short-lived stuff (rate limits), Postgres for the durable stuff (bans). Same goal of protecting the service, but matching the storage to the lifetime of the data keeps each part simple and cheap.
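For the durable half of that split, a ban list can be as small as one table and two queries. This sketch uses Python's built-in `sqlite3` purely as a runnable stand-in for Postgres. The table name, columns, and helper functions are hypothetical, and the actual Supabase schema is not described in this post.

```python
import sqlite3


def open_ban_store() -> sqlite3.Connection:
    """In-memory SQLite here; the real service would point at Postgres."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS banned_visitors ("
        " visitor_id TEXT PRIMARY KEY,"
        " reason TEXT NOT NULL,"
        " banned_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def ban(conn: sqlite3.Connection, visitor_id: str, reason: str) -> None:
    # INSERT OR IGNORE is SQLite syntax; Postgres spells this ON CONFLICT DO NOTHING.
    conn.execute(
        "INSERT OR IGNORE INTO banned_visitors (visitor_id, reason) VALUES (?, ?)",
        (visitor_id, reason),
    )
    conn.commit()


def is_banned(conn: sqlite3.Connection, visitor_id: str) -> bool:
    row = conn.execute(
        "SELECT 1 FROM banned_visitors WHERE visitor_id = ?", (visitor_id,)
    ).fetchone()
    return row is not None
```

Unlike a rate-limit counter, nothing here expires: a row stays until someone deliberately deletes it.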

The Hard Part: Unpredictable Output

If I had to point at the single trickiest problem, it had nothing to do with cost or abuse. It was this: AI output is unpredictable by nature, and the rest of my app is code that expects structure. I'm asking a creative, free-wheeling model to produce something I then have to parse, validate, slot into a UI, and hand to an image generator. Creativity and reliable structure are not natural friends.

Squaring that came down to three things working together: careful prompt engineering to push the model toward a consistent shape, tuned generation parameters to keep it from wandering too far, and solid error handling for the times it wanders anyway. You treat the model as a component that will occasionally hand you something weird, and you build the surrounding system so that 'weird' degrades cleanly instead of crashing the experience. You are not eliminating the chaos, you are containing it.
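Here is one way the 'degrades cleanly' part might look, assuming the model is asked to reply with a JSON object. The field names (`title`, `story`, `image_prompts`) and the fallback message are hypothetical and do not come from the app's actual schema. The only number taken from the post is three images per story.

```python
import json

REQUIRED_IMAGE_PROMPTS = 3  # three images per story, per the post

FALLBACK = {"ok": False, "message": "The story got tangled. Please try again."}


def _is_text(value) -> bool:
    return isinstance(value, str) and bool(value.strip())


def parse_story_response(raw: str) -> dict:
    """Return a validated story dict, or a safe fallback if the model wandered."""
    # Models often wrap JSON in chatty prose, so keep only the outermost object.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return dict(FALLBACK)
    try:
        data = json.loads(raw[start : end + 1])
    except json.JSONDecodeError:
        return dict(FALLBACK)

    title = data.get("title")
    story = data.get("story")
    prompts = data.get("image_prompts")
    prompts_ok = (
        isinstance(prompts, list)
        and len(prompts) == REQUIRED_IMAGE_PROMPTS
        and all(_is_text(p) for p in prompts)
    )
    if not (_is_text(title) and _is_text(story) and prompts_ok):
        return dict(FALLBACK)

    return {
        "ok": True,
        "title": title.strip(),
        "story": story.strip(),
        "image_prompts": [p.strip() for p in prompts],
    }
```

The caller only ever sees two shapes, a complete story or a friendly failure, so nothing malformed reaches the UI or the paid image step.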

What I'd Tell Past Me

The thing I underestimated going in was how much of 'building an AI product' is not AI work. The model call is maybe a tenth of the effort. The other nine-tenths is everything around it: a rate limiter so you stay solvent, moderation so the output stays respectable, validation so malformed responses do not break the UI, and a ban list for the inevitable bad actor. The model call gets all the attention, but it is the surrounding engineering that decides whether the thing survives contact with real users.

If you are building something similar, my one piece of advice is to design for the adversarial, broke, worst-case version of your service from day one. Assume someone will abuse it, assume the calls that cost money will get hammered, assume the model will misbehave, and let those assumptions shape the architecture. It is a lot cheaper than retrofitting the defences after the first nasty surprise.

Want to see the result? Give it a go. And if you would rather not deal with all this yourself, get in touch and I might be available to build it for you.
