Microservice Foundations: Commands
For me, all of software architecture, and in fact product-development organization, exists in an ecosystem. The parameters of this ecosystem change over time with respect to your products, related products, and the tech landscape overall. And the great challenge of a tech organization writ large is to evolve.
I'm writing my method to develop an evolutionary software architecture as a series of blog posts instead of a library or framework, because I think that you should extend the principles for your environment rather than try to shoehorn my exact framework into your ecosystem.
In this first post, I'm going to introduce you to my current iteration of a Command base class and how it serves the purpose of being a core building block of a massively scalable microservice system.
Motivation
I want a core idea of a microservice that I can build into anything. I want to be able to hire someone, promote them, move them across projects or split or merge projects without them having to relearn everything. I want to be able to incorporate a new company's tech as quickly and cleanly as possible in the event of an M&A.
There's a technical practice and infrastructure to doing that. There's a method to follow that allows you to do this over and over and over again and reproduce good results. One of the core ideas in my arsenal is my approach to microservice design.
During my tenure at Teamworks as Principal Engineer, then VP and then Chief Architect, I helped the department grow from 12 engineers to hundreds. We refined the fundamental principles of microservice design into a bit of an art. I've been using these abstractions for years, although I daresay we've gotten better at them.
I should preface all this by saying I think of the unit of "microservice" as a small collection of tightly related cogs focused on facilitating a core idea. I don't subscribe to the orthodoxy that says every function is a microservice, or that it all has to be serverless all the way down. I go with the casual notion that my microservice should avoid scope creep, and that I should be able to describe what it does to another technical person with a single-sentence thesis and maybe a paragraph of descriptive matter.
For my money, the best microservice "system" is Command Query Responsibility Segregation (CQRS). I don't think you need to go whole-hog into event sourcing or use separate databases for queries vs. mutations from the get-go. Sometimes you do. Sometimes you don't. Often you eventually need one of these things if your microservice becomes a critical piece of kit, and you don't want to write code up front that makes more work for you when you have to do that.
Starting with the idea of commands (or mutations) and queries as your base unit of architecture affords you the ability to flex into these things. It also allows you to conveniently break up your microservice if it's starting to feel less "micro." We'll talk about Queries in another post. This is about Commands and how to build them.
The Command base class
Before I get into the code, let me tell you what I'm looking for in a realized Command:
- Well-defined. The command needs to have a well-defined schema.
- Serializable. Every command needs to serialize and deserialize into its schema in whatever format (JSON, Protobuf).
- Orderable. Every command needs to be able to be sequenced in the system.
- Versioned. Having the ability to version commands means you can version an API, and if at some point you do forensics of "how did this data get into this state anyway?" having versioned commands means you can replay a command log reliably.
- Composable. You should strive to write commands so they can be composed into macros without losing throughput for the user or the coder.
If your command base class has all these it is also:
- Auditable. You probably don't need to start with an audit trail for every command, but if your command class supports adding one, you get a lot of other useful things for free.
Now as promised, it's time for code. This is a Pydantic model meant to be used inside of a FastAPI-based microservice. First I'll show you the class, then I'll talk about how we achieve the above with it.
import abc
from datetime import datetime, timezone
from functools import cached_property

import pydantic as pyd
from fastapi import BackgroundTasks
from uuid_utils import uuid7  # UUIDv7 has the timestamp built in

# see https://jeffersonheard.ghost.io/production-python-setting-up-fastapi-to-talk-to-a-real-database/
from myapp.resources import Context


class Command(pyd.BaseModel, abc.ABC):
    # update this when you update the class
    version: str = "1.0"

    # this will automatically be generated when the class is instantiated
    # in the request flow.
    event_id: str = pyd.Field(default_factory=lambda: uuid7().hex)

    # for orderability
    @pyd.computed_field
    @cached_property
    def origination_ts(self) -> datetime:
        # the first 48 bits of a UUIDv7 are milliseconds since the Unix epoch
        ms = int(self.event_id[:12], 16)
        return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

    # for log streams
    @pyd.computed_field
    @cached_property
    def fq_event_id(self) -> str:
        return f"{self.typename}:{self.event_id}"

    # this serializes the type name when it's written out for audit.
    @pyd.computed_field
    @property
    def typename(self) -> str:
        return self.__class__.__name__

    async def check_perm(self, ctx: Context, raise_on_fail=True) -> bool:
        return True

    @abc.abstractmethod
    async def cmd(self, ctx: Context, bkg: BackgroundTasks):
        pass
Note, for the definition of Context, see my previous post on databases.
"But this is so simple, Jeff!" Yes, it is. A lot of the interesting functionality is covered by Pydantic, but here we have a command that is treated like an event. It has a versioned schema, a total ordering, and can be serialized in a log stream to JSON or protobuf or msgpack. Then it can be deserialized and executed the same way each time regardless of whether the code has changed (assuming you have the need to do that, and thus are doing the required upkeep of storing old versions) and regardless of the number of log streams you're collating.
For many microservices, this is as far as you have to go. You don't really need the log stream or event sourcing to start with, and when you're doing something new it's almost always more setup than it's worth. But this command class affords you a ton that you definitely do need to start out. Even using it as a traditional API endpoint, you get:
- Good logging metadata with typename, origination_ts, and fq_event_id.
- Complete schema, including API docs for the endpoint (this comes from FastAPI and Pydantic working together).
- Good refactorability, both because you can compose command classes and because, if your units of design are commands and queries, you can break a few selected commands and queries out and create a new microservice with a minimum of rewritten code and boilerplate.
- Good evolvability. You can collect related commands into a single file; you can break them up into single files in a package; and you can split a package off into a new microservice with almost no extra work. When my team and I build services this way, it's really easy to get the lay of the land when you hop from one microservice to the other.
- Because all of the execution logic lives in the cmd method, if you want to switch to stream-processing-style execution at some point, you can just factor that out into its own processing function or write a bunch of non-cmd stream processors to read the event log. It requires more work to monitor things written that way, but there are good reasons to go that way, such as when your single Command message is ubiquitous to a lot of different asynchronous processes.

The one thing that isn't well-defined in the abstract base class is composability, because it depends more on practice than framework. For that, we want to see what a concrete Command class will look like.
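Before we get there, a quick sketch of that last point: reading the event log back. Assuming commands were appended as serialized JSON, and assuming you keep a registry mapping the stored typename back to its class (the COMMAND_TYPES registry and replay function below are hypothetical, not part of the base class), replaying looks roughly like this:

import json

# hypothetical registry mapping a stored typename back to its class
COMMAND_TYPES: dict[str, type[Command]] = {
    # e.g. "AlterDashboard": AlterDashboard (defined in the next section)
}

async def replay(log_lines: list[str], ctx: Context, bkg: BackgroundTasks):
    # re-execute a serialized command log in its original order
    for line in log_lines:
        raw = json.loads(line)
        cls = COMMAND_TYPES[raw["typename"]]  # dispatch on the stored typename
        # if you keep old schema versions around, you'd dispatch on
        # raw["version"] here too
        command = cls.model_validate(raw)
        await command.cmd(ctx, bkg)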
Writing a real command
This is a simplified version of a real command in Tableroq, my new solopreneur project. I've left out the validators and gotten rid of some of the fields to condense it, but the essence of a command is here.
I realize it's a relatively simple CRUD operation, but that's because most mutations still are that. Even if you're married to CRUD, you will eventually stray from the spirit of it for one thing or another – like, say, mass-altering permissions. Then you either fit the proverbial square peg into the round CRUD hole (ouch!), or you write some RPC-like API endpoint and feel bad about it because it doesn't look like all your other endpoints. With this abstraction, everything's just a Command.
from typing import Any

from fastapi import BackgroundTasks, HTTPException
from pydantic import Field
from sqlalchemy import update

# Dashboard, DashboardDefinition, DatasetDefinition, Context, and the
# fetch_current_object_or_404, fresh_id, and alt helpers come from the
# application's own models and utilities.


class AlterDashboard(Command):
    "Creates a new version of an existing dashboard with changes."

    name: str | None = Field(default=None, description="...")
    description: str | None = Field(default=None, description="...")
    num_cols_xs: int | None = Field(default=None, description="...")
    num_cols_sm: int | None = Field(default=None, description="...")
    num_cols_md: int | None = Field(default=None, description="...")
    num_cols_lg: int | None = Field(default=None, description="...")
    num_cols_xl: int | None = Field(default=None, description="...")
    aspect_ratio: float | None = Field(default=None, description="...")
    dataset_definitions: dict[str, DatasetDefinition]
    charts: list[dict[str, Any]]
    app_metadata: dict | None = None

    async def check_perm(self, ctx: Context, raise_on_fail=True) -> bool:
        if not ctx.perms.check_permission(
            "dashboard", ctx.object_id, "can_modify", "user", ctx.user_id
        ):
            if raise_on_fail:
                raise HTTPException(403, "Forbidden")
            else:
                return False
        return True

    async def target(self, ctx):
        return await fetch_current_object_or_404(ctx, Dashboard)

    async def cmd(self, ctx: Context, bkg: BackgroundTasks):
        dash0 = await self.target(ctx)

        # save old version
        await ctx.writer.execute(
            update(Dashboard)
            .where(
                Dashboard.slug == dash0.slug,
                Dashboard.customer_id == ctx.customer_id,
            )
            .values(version=Dashboard.version - 1)
        )

        # create new version
        frm = Dashboard(
            id=fresh_id(Dashboard, ctx.customer_id),
            customer_id=ctx.customer_id,
            name=self.name or dash0.name,
            slug=dash0.slug,
            definition=DashboardDefinition(
                dataset_definitions=self.dataset_definitions,
                chart_definitions=self.charts,
                num_cols_xs=alt(self, dash0.definition, 'num_cols_xs'),
                num_cols_sm=alt(self, dash0.definition, 'num_cols_sm'),
                num_cols_md=alt(self, dash0.definition, 'num_cols_md'),
                num_cols_lg=alt(self, dash0.definition, 'num_cols_lg'),
                num_cols_xl=alt(self, dash0.definition, 'num_cols_xl'),
                aspect_ratio=alt(self, dash0.definition, 'aspect_ratio'),
            ),
            app_metadata=alt(self, dash0, 'app_metadata'),
            created_by=dash0.created_by,
            updated_by=ctx.user_id,
            created_at=dash0.created_at,
            updated_at=self.origination_ts,
            version=0,
        )
        # note we don't commit, this becomes important.
        ctx.writer.add(frm)
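One helper above deserves a note: alt isn't defined in this post. Based on its call sites, it presumably takes the command's value for a field when one was provided and falls back to the current object's value otherwise. A guess at what it might look like:

def alt(cmd, current, attr: str):
    # take the command's value for attr if one was given; otherwise keep
    # the current object's value (a guess based on the call sites above)
    value = getattr(cmd, attr, None)
    return value if value is not None else getattr(current, attr)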
I keep talking about composability. This is where practice comes in. The first thing I do is separate a command from the request and resource context in which it was made.
I particularly avoid committing in atomic commands. I leave session creation and transaction management to the API endpoint or process or to a "macro" command. This way I can string commands together and compose them into macros without having API calls that involve, say, 12-100 database transactions. Imagine that we have a SaveUpdatesAndPublish command at some point (sketched after the list below). It's a Create or an Alter, followed by a Publish, followed by a NotifySubscribers. If I've written each command to use a separate transaction:
- I've needlessly replicated the most expensive part of the command three times
- I've also added a "partial commit" scenario that could turn into a hard-to-track-down error later if a bug in one of the commands or a network problem causes only part of the work to fail. You just have to write so much more "error correcting" code when the transaction isn't all-or-nothing.
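Here's that SaveUpdatesAndPublish idea as a minimal sketch. Publish and NotifySubscribers are hypothetical commands invented for illustration; the point is that the macro just sequences its children, none of them commits, and the caller's single transaction owns the whole operation:

class Publish(Command):
    "Hypothetical: mark the new dashboard version as live."
    async def cmd(self, ctx: Context, bkg: BackgroundTasks): ...

class NotifySubscribers(Command):
    "Hypothetical: queue notifications for watchers."
    async def cmd(self, ctx: Context, bkg: BackgroundTasks): ...

class SaveUpdatesAndPublish(Command):
    "Hypothetical macro composing three atomic commands."
    alter: AlterDashboard
    publish: Publish
    notify: NotifySubscribers

    async def cmd(self, ctx: Context, bkg: BackgroundTasks):
        # one transaction, owned by the caller's async with ctx, wraps
        # all three; no sub-command commits on its own
        await self.alter.cmd(ctx, bkg)
        await self.publish.cmd(ctx, bkg)
        await self.notify.cmd(ctx, bkg)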
Putting it together in FastAPI
import logging
from typing import Annotated

from fastapi import Depends, BackgroundTasks

from .auth import get_user_context
from .routes import app_router
from trq.data import Context
from trq.commands.dashboards import AlterDashboard

log = logging.getLogger(__name__)

Ctx = Annotated[Context, Depends(Context())]
TokenCtx = Annotated[Context, Depends(get_user_context)]


@app_router.api_route(
    "/dash.alter/{object_id:path}",
    methods=["HEAD", "POST", "OPTIONS"],
)
async def dashboard(
    cmd: AlterDashboard,
    ctx0: TokenCtx,
    bkg: BackgroundTasks,
):
    log.info(
        "%s from API. Altering dashboard %s for customer %s by user %s",
        cmd.fq_event_id,
        ctx0.object_id,
        ctx0.customer_id,
        ctx0.user_id,
    )
    async with ctx0:
        await cmd.check_perm(ctx0)
        await cmd.cmd(ctx0, bkg)
    log.info("%s completed successfully.", cmd.fq_event_id)
So there's a lot going on here in a small space. We've taken our command, yes, and we're executing it, but let's spend a bit on what surrounds it. Context in particular does a lot of work, as do Annotated and Depends.
Effectively what's happening here is that:
- FastAPI constructs the request.
- FastAPI's dependency injection helps construct the Context given the object_id in the path and the data about who the user is and what customer they're acting on behalf of from get_user_context (OAuth, JWT, whatever auth happens, happens inside of there).
- Because FastAPI knows what Pydantic classes are, it automatically validates and constructs our AlterDashboard command for us from the JSON post body.
So that means that validation and unauthorized conditions are automatically handled for us, as well as acquiring the resources needed to perform our alter, like the database connection. Now we just need to:
- Begin a transaction.
- Check object existence.
- Check the user's permission to alter the object.
- Run our command.
To begin a transaction, we use async with ctx0. To check existence, see the target method in AlterDashboard; if that fails, it raises a 404. To check permissions, we hit the check_perm method; if that fails, it raises a 403. Then we await our cmd.cmd(ctx0, bkg).
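One more thing before wrapping up: remember the "auditable" bullet from the wish list. Because every command serializes uniformly, an audit trail is one log line away. A hypothetical wrapper (audit_log and audited are my illustrative names, not part of the codebase above):

import logging

audit_log = logging.getLogger("audit")

async def audited(cmd: Command, ctx: Context, bkg: BackgroundTasks):
    # append the full serialized command to an audit stream, then execute
    # it inside the caller-owned transaction
    audit_log.info("%s %s", cmd.fq_event_id, cmd.model_dump_json())
    async with ctx:
        await cmd.check_perm(ctx)
        await cmd.cmd(ctx, bkg)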
And that, my friends, is our first command from soup to nuts.
Wrapping it up
I realize all this has been in the weeds and technical, so let me take a moment to bring it back.
The point of all this is to balance productivity, simplicity, flexibility, and power for the developer. The most expensive and hardest-to-plan-for resource in your technical organization is never infrastructure. It's not subscriptions. It's not the 5 extra pods you run in EKS to provide the same response times you would have if you'd written clever code in your favorite framework. It's people.
You will, as an architect or a manager or an executive or a team leader, spend more time, money, and effort getting people to row their boat in the same direction than you ever will writing code. Starting with a strong, clear foundation based on good principles will pay dividends throughout the life of the software you steward. It will evolve faster and more reliably, and people will be happier when you value clear over clever.
And that's the goal of this series, to provide a framework that makes the engineering clear. I hope this gives you a good taste of what's to come.