Writing Python clients in the age of LLMs
Writing a Python REST API client used to feel like one of those chores that started small and then quietly took over your life. I'm talking about the Before Times, when a transformer was still a gray box that stepped voltages, and when you could still impress people with code you wrote with your own meat fingers. Back in those antediluvian and somewhat idyllic days, it went like this: at first you need just one function for one route. Fine. You write it by hand. Maybe you grab a curl command from the docs and run it through something like curlconverter. You get a bit of Python out, clean it up, and move on.
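That one-route phase might look something like this (a minimal sketch; the base URL, route, and token handling are all hypothetical):

```python
import requests

BASE_URL = "https://api.example.com"  # hypothetical API


def get_user(user_id: str, token: str) -> dict:
    """Fetch a single user. One route, one function -- the honeymoon phase."""
    resp = requests.get(
        f"{BASE_URL}/v1/users/{user_id}",
        headers={"Authorization": f"Bearer {token}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()
```

A dozen lines, one dependency, no framework in sight, which is exactly why it feels so harmless at first.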
Then the second route arrives. Then the third. Then you need authentication, retries, pagination, error handling, timeouts, and some way to organize all this stuff so it does not become a junk drawer full of ad hoc helper functions. Soon enough you are building modules, wrappers, shared request helpers, little abstractions around verbs, maybe some thin layer over requests or httpx. You begin by writing a client. A few weeks later you realize you have accidentally started writing a client framework.
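The junk-drawer stage usually crystallizes into a shared request helper along these lines (a sketch with hypothetical routes; pagination and error handling are deliberately minimal):

```python
import requests


class ExampleClient:
    """Hypothetical shared-helper layer: one place for auth, timeouts, errors."""

    def __init__(self, base_url: str, token: str, timeout: float = 10.0):
        self.base_url = base_url.rstrip("/")
        self.timeout = timeout
        self.session = requests.Session()
        self.session.headers["Authorization"] = f"Bearer {token}"

    def _request(self, method: str, path: str, **kwargs) -> dict:
        # Every route funnels through here, so cross-cutting concerns live in one spot.
        resp = self.session.request(
            method,
            f"{self.base_url}/{path.lstrip('/')}",
            timeout=self.timeout,
            **kwargs,
        )
        resp.raise_for_status()
        return resp.json()

    def get_user(self, user_id: str) -> dict:
        return self._request("GET", f"/v1/users/{user_id}")

    def list_users(self, page: int = 1) -> dict:
        return self._request("GET", "/v1/users", params={"page": page})
```

Each new route adds one small method, and each new cross-cutting concern (retries, logging, rate limits) adds one more layer to `_request`. That is the framework sneaking in.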
At some point, if you have done this more than once, the obvious realization hits: most of this work is boilerplate. It is mechanical. It is not something you should be spending your time (or mental effort) on. So naturally the next stop is OpenAPI. If the server can describe itself with a spec, surely a machine should be able to generate the client. In theory this sounds like the mature answer. In practice, it is often where a different kind of pain begins.
The open source generator ecosystem looks promising at first glance. OpenAPI Generator in particular seems like the sensible choice: free, widely used, lots of targets, lots of switches. But the lived experience can be rough. Sometimes it just fails to build the client, complaining about something in the spec you don't understand. Sometimes you get inscrutable Pydantic errors. Sometimes you do not want Pydantic at all and discover that the tool has very strong opinions about what your Python client API should look like. And even when it works, the generated output can be horrible, horrible code: tens of thousands of lines for a tiny API surface, spread across a forest of files that no human would ever choose to write.
The debugging story is often worse than the code generation story. When something goes wrong (and something always will), what you want is very simple: what request was actually sent, what response actually came back, and where the transformation failed. Instead you may get a cryptic stack trace from deep inside a generated model layer, or a validation error that tells you almost nothing about the real problem. The payload is obscured. The actual request/response cycle is hard to inspect. You are trying to debug a network interaction, but the tool has helpfully inserted three or four extra layers of indirection between you and the wire.
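What you want in that moment is the wire, not the model layer. With requests, a single response hook surfaces it (a sketch; the logger name and the truncation limit are arbitrary choices):

```python
import logging

import requests

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("wire")


def log_exchange(resp, *args, **kwargs):
    """Response hook: show exactly what was sent and what came back."""
    req = resp.request
    log.debug("-> %s %s", req.method, req.url)
    log.debug("-> body: %r", req.body)
    log.debug("<- %s in %.0f ms", resp.status_code,
              resp.elapsed.total_seconds() * 1000)
    log.debug("<- body: %r", resp.text[:2000])  # truncate huge payloads


session = requests.Session()
session.hooks["response"].append(log_exchange)
```

Four debug lines per request, zero layers of indirection. Generated clients rarely make this the default, and that is most of the complaint.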
Commercial generators such as Speakeasy and Stainless improve on some of this. The output is often nicer. There is more customization. The ergonomics are better in spots. There is a human being you can talk to when something breaks. But the basic feeling remains clunky. There is still a lot of machinery. There is still latency in the workflow. Something that should take milliseconds starts to feel oddly ceremonial. And because these are companies, not just tools, they are incentivized to keep expanding the surface area: dashboards, workflows, managed experiences, and now of course AI features, often bolted on, and breaking in new and interesting ways.
What changed recently is not that client generation became possible. It always was. What changed is that large language models made it cheap to generate the exact client you want, rather than the client some generator author imagined would work for everyone. Once you recognize that a Python API client is mostly a mechanical transcription problem, the right move stops being "pick the least annoying generator" and starts being "describe the client precisely and have the machine write it."
That is a subtle but important shift. With an LLM, you can specify the client in terms that actually matter to you. Maybe you do not want async code (because you're working inside Jupyter and don't want to deal with the delight that is the now dead nest_asyncio). Maybe you want httpx but not requests. Maybe you want modules split by resource, or by domain, or by stability tier. Maybe you want a thin client with minimal modeling that traffics in raw JSON, or a heavier one with typed request/response objects. Maybe you want retries handled in one place, authentication injected from the environment, and a debug mode that logs raw request and response bodies when something fails. All of that is expressible -- you just keep asking an LLM for this and for that as your preferences crystallize.
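For instance, "retries handled in one place, authentication injected from the environment" compresses into a few lines once you have said it out loud (a sketch using requests plus urllib3's Retry; the environment variable name is made up):

```python
import os

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session() -> requests.Session:
    """Build a session with retries and auth configured exactly once."""
    retry = Retry(
        total=3,
        backoff_factor=0.5,                       # 0.5s, 1s, 2s between attempts
        status_forcelist=(429, 502, 503, 504),    # what counts as retriable
        allowed_methods=("GET", "PUT", "DELETE"),  # retry only idempotent verbs
    )
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry))
    # Auth injected from the environment, never hardcoded.
    token = os.environ.get("EXAMPLE_API_TOKEN", "")
    session.headers["Authorization"] = f"Bearer {token}"
    return session
```

The point is not this particular policy; it is that once the policy is written down in words, the code is a mechanical consequence.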
The next step is to stop treating those preferences as one-off prompts and start treating them as design constraints. You write them down. Decide how authentication works and where tokens come from. Decide on a transport library. Decide how retries behave, what counts as a retriable error, how timeouts are configured, how exceptions surface, how modules are organized, what the user-facing API should feel like, and even what the functions should be named. Once you have that document, the LLM can refer back to it every time a route is added or updated, and the codebase stays coherent.
This matters because client generation is not a continuously running industrial pipeline in most real projects. Typically, you do 90% of the work once. After that, the maintenance is incremental: a new route here, a schema change there, one auth flow adjusted, one endpoint deprecated. In that world, a heavyweight generator starts to look less like infrastructure and more like overhead. If an LLM can produce the code directly in the style you want, and you can trivially revise it when the API changes, the old bargain starts to fall apart.
More broadly, this feels like a small example of a larger change in software development. In the Before Times, a lot of programming was basically high-speed bricolage: type something into Google, click the first Stack Overflow link, copy it, smash it against reality repeatedly until the error message goes away. LLMs have automated a large chunk of that process. This makes bespoke code dramatically cheaper to produce, and that quantitative change really does have a quality of its own.
Fortunately for some, and unfortunately for most, our role as human workers shifts to a more elevated plane. The scarce skill is no longer "write the boilerplate correctly"; it is "design the interface well." The important question is not how to type out yet another wrapper function, but what the wrapper should do. How should authentication work? Where should credentials come from? What should the error model be? What information should surface when debugging? How should the code be laid out so it is maintainable six months from now? What should the end-user experience feel like? Those are design questions. Once answered clearly, they are exactly the kind of thing a machine can implement.
So the real lesson is not that LLMs magically write software for free. It is that a large class of software tasks that used to require awkward general-purpose generators or a lot of manual labor can now be handled as direct synthesis from a spec plus a style guide. Writing code is increasingly machine work. Thinking is still human work. You should be the designer, not the constructor.
And if that is true, the future for companies built around OpenAPI code generation looks shaky. Their value proposition depended on the idea that client generation was a specialized capability worth paying for every month ($79/month?). But if I can spend ~$10 on tokens, generate the exact client I want, keep the result forever, and update it later with the same design document, why would I keep renting someone else's generic pipeline? There will still be room for hosted tooling, compliance, governance, and enterprise workflow glue. But as pure client generators, they look increasingly like products from before the automation apocalypse we're living through.
