Devin Bernosky

Free Datadog for the fleet in one Nix module

2026-06-07T00:00:00-07:00

There’s a pattern in this blog: something that should eat a weekend turns out to eat an afternoon, because Nix and comin do most of the work. This one’s another.

I’d been meaning to put SigNoz on the fleet for months. Fleetwide host telemetry felt like the kind of project that needed a full weekend. Sunday morning, with coffee and Claude, I gave it a shot.

Two pieces. Netdata for the realtime side: “carbon’s CPU is pegged right now, what process?” Its per-process accounting is excellent, and its 18-model ML detector flags anomalies automatically. SigNoz for the historical side: “search journald across every host for that error from Tuesday.” ClickHouse-backed log search, structured fields, OTLP ingest. Both self-hostable. Both with sane Docker compositions.

The stack

Both run as Compose Manager projects on my always-on Unraid box, racer5. The compose files come straight from upstream with two tweaks: ports bind to my Tailscale IP only (no LAN exposure), and bind-mounts point at /mnt/user/appdata/... because Unraid’s /boot is a vfat partition that refuses real Unix permissions. ClickHouse runs as UID 101 inside its container, so when it tried to read its config off the vfat mount it got “Access to file denied”. The fix was moving the configs onto the actual storage array.

The compose configs are checked in alongside everything else in /boot/config/plugins/compose.manager/projects/. Once they’re up, racer5 has a Netdata Parent on :19999 and a SigNoz on :3301, both reachable only over Tailscale.

The Nix module

The host side is one file, modules/nixos/observability.nix, that does the whole thing:

config = lib.mkIf cfg.enable {
  sops.secrets.netdata_stream_api_key = {};
  sops.templates."netdata-stream.conf" = {
    owner = "netdata";
    content = ''
      [stream]
          enabled = yes
          destination = ${cfg.netdataParent}
          api key = ${config.sops.placeholder.netdata_stream_api_key}
    '';
  };

  services.netdata = {
    enable = true;
    configDir."stream.conf" = config.sops.templates."netdata-stream.conf".path;
  };

  environment.etc."otel-collector/config.yaml".text = ''
    receivers:
      hostmetrics: {...}
      journald: {directory: /var/log/journal, all: true}
    exporters:
      otlphttp/signoz:
        endpoint: ${cfg.signozEndpoint}
    ...
  '';

  systemd.services.otel-collector = {
    serviceConfig = {
      ExecStart = "${pkgs.opentelemetry-collector-contrib}/bin/otelcol-contrib --config=/etc/otel-collector/config.yaml";
      # ...sandboxed per my hardening rule
    };
  };
};

One import in modules/nixos/common.nix, and every NixOS laptop and workstation in the flake suddenly streams metrics to the Parent and ships journald to SigNoz. No per-host setup, no installer to run, no agent to babysit. The two appliance VMs (hermes and herqules, my agent gateways) are lean one-off nixosSystem entries that skip common.nix, so I added the same import directly to their module lists in flake.nix. Three lines total to bring them in.

The fan-out

I push the commit. Every host running comin, my GitOps deploy daemon polling main every 60 seconds, pulls it, rebuilds itself, and starts streaming. From git push to gram, hermes, and herqules all appearing in the Netdata Parent’s host picker took about four minutes, most of which was waiting for comin to poll. Nix and comin made this absurdly easy.

Making it laptop-friendly

Out of the box, Netdata samples everything at one second and burns about 14% of one core. On the desktop that’s fine. On a laptop already short on battery it’s a problem. Two changes brought it down to under 5%:

Bump Netdata’s “update every” setting from 1 to 5 seconds. Single biggest lever. Applies to the daemon and most plugins. The per-host page still feels live because every metric is fresh within 5 seconds, and the “which process is eating CPU right now” answer is unchanged.
Drop plugins that earn nothing on a workstation. debugfs reads ZSWAP/BTRFS/intel-rapl out of /sys/kernel/debug; I run ZRAM, have no BTRFS on the laptops, and hostmetrics already covers rapl. go.d ships postgres/redis/nginx/k8s collectors, none of which any laptop runs. systemd-journal is the Netdata UI’s journal tail, fully redundant once SigNoz has the full journald firehose. otel-signal-viewer is for using Netdata as an OTel sink, which I don’t. freeipmi is for server BMCs, and on a laptop it doesn’t just sit idle, it spams internal error into the system journal.

I kept apps.plugin, the per-process accounting, because it’s the killer feature, the reason I’d reach for Netdata over an alternative in the first place. The disables and the polling bump are all in the same services.netdata.config block, so it’s one commit. Comin propagates it everywhere.

Why Nix did most of the work

The bit that always lands twice in my own posts about Nix is that the same module is what wires up a workstation, a Proxmox VM, and an LG Gram. There’s no “configuration management” beyond writing the file and importing it. The whole fleet converges by polling and rebuilding.

The other piece is that the observability module is itself a small file alongside a much larger system config. The hardening section sits next to the Netdata block. The sops secret is declared in three lines. The dedicated user, the systemd unit, the sandboxing, all in the same file. When I want to know what observability does to a host, there’s exactly one place to look.

The compose files were written by SigNoz and Netdata. The streaming protocol was written by Netdata. The hostmetrics receiver was written by the OpenTelemetry project. The fan-out was written by comin. All I really did was the small glue module. The Nix part is what makes the glue stick to every machine I own at the same time, over a single cup of coffee.

Agentic memory: a shared brain for my coding agents

2026-06-03T00:00:00-07:00

The thing that makes coding agents feel disposable is that they forget everything. Every session starts from nothing. Claude Code works out some detail about my setup, a port already taken by another container, a quirk in how a service boots, and then the session ends and that knowledge is gone. Next time it rediscovers the same thing from scratch, burning tokens and my patience. If I switch to Codex, it never knew in the first place. Three agents, three separate cases of amnesia.

I wanted one memory they all share. Local, so it stays private and free. Persistent, so what one agent learns sticks around. Shared, so a fact Claude figures out is there for Codex and Hermes too. This was the bigger sibling to the scheduler bridge I wrote about separately, and the part I learned the most from.

The store

The memory itself is unglamorous, which is the point. It’s a vector database (Qdrant) running on a box I already keep on, with a small server in front of it that speaks MCP, the protocol these CLIs use to reach external tools. It exposes two operations: store a piece of text, and search for the text most relevant to a query. Each of my agents points at the same endpoint. One writes, the others read.
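To make the two operations concrete, here is a minimal in-memory sketch. It is not the Qdrant-backed server: the ToyMemory class and its method names are hypothetical, and the "embedding" is a plain bag of words, so it ranks by word overlap rather than by meaning the way a real embedding model would.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag of lowercase words."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyMemory:
    """Hypothetical two-operation memory: store text, search by similarity."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def store(self, text: str) -> None:
        self._items.append((text, embed(text)))

    def search(self, query: str, limit: int = 3) -> list[str]:
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:limit]]
```

Every agent talking to one shared instance of something shaped like this is the whole idea: one writes with store, the others read with search.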

The question I expected to wrestle with was how to decide what’s worth storing. The common answer is to run a language model over each session and have it pull out the important facts. I didn’t want a paid API in the loop, and then I realized I didn’t need one. The agent doing the writing is already a language model. Whatever it chooses to save is already distilled. The memory layer doesn’t have to be smart, it has to store text and find it again by meaning. The intelligence is the agent I’m already paying for, so the memory itself costs nothing to run.

What a hook is, and why the memory needs them

A memory the agents can reach is not automatically a memory they use. Agents are focused on the task; they don’t stop to take notes. To make the memory part of the loop, I leaned on hooks.

A hook is a script your CLI runs on its own at a specific moment: when a session starts, when the agent finishes a turn, before it runs a tool, and so on. The CLI passes the script some JSON about what’s going on, and the script can pass text back that becomes part of the conversation. It wraps the model in a bit of scripted scaffolding without touching the model itself, which is how you get deterministic behavior out of something that’s otherwise improvising.

I wired up two. One runs when a session starts: it asks the memory for anything relevant to the project I’m in and drops it into the context, so the agent opens already knowing what earlier sessions worked out. The other runs when the agent finishes a turn: it reads the final message and saves anything marked with an XML-style tag. The agent chooses what’s worth keeping, and the hook guarantees the saving actually happens instead of relying on the agent to call a tool it might forget.

My first marker was [REMEMBER: ...]. It broke instantly, because facts are full of brackets, an array index here, a regex there, and the script kept slicing each fact off at the first ]. The XML-style tag has no such problem, so I switched.
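The failure is easy to reproduce. The sketch below assumes the hook pulls facts out with a regular expression, which the post does not confirm, and the tag name remember is a placeholder chosen for illustration, not necessarily the tag actually used.

```python
import re

# The bracket marker is the one described above; the XML-style tag name
# "remember" is a hypothetical stand-in.
BRACKET = re.compile(r"\[REMEMBER: (.*?)\]", re.DOTALL)
TAGGED = re.compile(r"<remember>(.*?)</remember>", re.DOTALL)


def extract(pattern: re.Pattern, message: str) -> list[str]:
    """Return every marked fact found in an agent's final message."""
    return [fact.strip() for fact in pattern.findall(message)]


bracket_msg = "Done. [REMEMBER: hosts[0] is the cache box]"
tagged_msg = "Done. <remember>hosts[0] is the cache box</remember>"

print(extract(BRACKET, bracket_msg))  # the fact is cut at the first "]"
print(extract(TAGGED, tagged_msg))    # the fact survives intact
```

The bracket pattern stops at the first closing bracket it sees, so any fact containing an index or a character class is truncated. The closing tag is a multi-character sequence that almost never appears inside a fact.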

Why Nix makes this practical

Wiring a memory into three different agents on every machine I own sounds like the kind of fiddly per-host setup that never stays consistent. In Nix it’s a single description. One entry in my config defines the memory server, and it emits the correct configuration for both Claude Code and Codex on its own. The hooks are scripts pinned to exact versions of their dependencies, so they behave the same everywhere instead of depending on whatever Python happens to be on the machine. I commit it once, and every host wires its agents to the same brain.

The store itself, the database on my always-on box, is the one stateful piece that lives outside Nix. Everything that connects to it, every agent, every hook, every credential, is declared and version-controlled. If I rebuild a laptop from scratch, it rejoins the shared memory automatically, because rejoining is just part of what the config already says the machine is.

What I expect to learn

This is the part I’m genuinely unsure about, and that’s why I built it instead of theorizing. The open question is whether a memory like this compounds into something useful or just fills with noise. Does the store gradually become a real shared base of knowledge about my systems, or a junk drawer of half-duplicated facts? I already know some of the limits. There’s no deduplication yet, so the same fact saved twice in different words shows up twice. And coverage is partial by design, since it only remembers what an agent thought to flag, which means it catches explicit decisions well and lets the ambient stuff slip by.

The honest test is time. In a few weeks I’ll be able to tell whether sessions actually feel more continuous, whether recall surfaces things I’m glad to see or things I scroll past, and whether the gaps annoy me enough to add the next layer. That layer would be a job that runs a local model over old sessions to backfill what got missed, still on my own hardware, still free. I set myself a reminder to check in, by handing it to my assistant, which felt like the right way to close the loop.

Either way, the foundation is the part I’m confident about. The agents share one local memory, it costs nothing to run, and the whole thing is described in a config I can read top to bottom. What grows on top of that is the experiment.

A scheduler skill: delegating to my assistant from the terminal

2026-06-01T00:00:00-07:00

I get my best ideas for things to remember while I’m in the middle of something else. I’ll be deep in a refactor and think “I should check whether that upstream fix landed next week,” and the only way to capture it is to stop, switch to my phone or a calendar, type it out, and climb back into what I was doing. The context switch is small and it happens constantly, and small constant friction is exactly the kind I want to remove.

I already have an assistant that’s good at this. Hermes is a personal agent on a small VM that I talk to over Telegram. It sets reminders and runs scheduled jobs. The problem was that the agents I spend my day with, Claude Code and Codex in the terminal, had no way to reach it. So I built a bridge, and the more interesting half of the story is how Nix makes that bridge something I never have to think about again.

The bridge

Hermes already knew how to schedule work. It just needed a door other programs could knock on. Most agent frameworks can expose an HTTP API, so I turned Hermes’ on, put a bearer key in front of it, and limited it to my Tailscale network. Reachable from my laptops, invisible to the public internet.

Then I gave Claude Code a skill: when I ask for a reminder or a scheduled task, send a plain-English instruction to that API. I don’t teach the coding agent anything about cron syntax or job formats. I hand Hermes a sentence and it works out the rest with its own scheduling tool.

So now I type “remind me in two weeks to check on the new setup” into the same terminal I’m already working in. Claude passes the sentence along, Hermes makes the job, and two weeks later my phone buzzes. One agent doing the part it’s good at, handing the rest to an agent that’s good at something else.
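The request the skill sends can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the /schedule path, the JSON field name instruction, and the example host are hypothetical, and the real Hermes API may look different. Only the overall shape (a bearer key plus one plain-English sentence) comes from the description above.

```python
import json
import urllib.request


def build_request(base_url: str, bearer_key: str, instruction: str) -> urllib.request.Request:
    """Build a POST carrying one plain-English instruction for the assistant."""
    body = json.dumps({"instruction": instruction}).encode("utf-8")  # hypothetical payload shape
    return urllib.request.Request(
        url=f"{base_url}/schedule",  # hypothetical endpoint path
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {bearer_key}",
            "Content-Type": "application/json",
        },
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

The coding agent never builds a cron expression. It forwards the sentence and leaves the scheduling decisions to the side that already has a scheduling tool.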

The reminders don’t have to be dumb timers. The same day I built this, I had a few temporary workarounds in my config that I wanted to remove once the upstream fixes shipped. Instead of a reminder that just nags me on a date, I had Claude set up a recurring job that checks whether each fix has actually landed and only messages me when one is ready to pull out. A reminder that does the legwork before deciding to bother me.

Why Nix is doing the heavy lifting

This is the part that makes it more than a one-off script. Every piece of that bridge is declared in my Nix config: turning on the API, the firewall rule that keeps it on the tailnet, the encrypted secret for the bearer key, and the skill itself. None of it is something I set up by hand on one machine and hope I remember to repeat on the next.

The skill lives in my config as a directory that gets discovered automatically and handed to both Claude Code and Codex. When I push it to my repo, every machine I own picks it up and rebuilds itself. The laptop I set up next month gets the same skill, wired to the same assistant, without me touching it. The config is the setup, and the config is the documentation.

That matters more than it sounds. The failure mode for this kind of personal automation is that it works great on the machine you built it on and quietly rots everywhere else. You tweak a dotfile here, forget to copy it there, and six months later you can’t remember how any of it was wired. Describing it in Nix instead means the wiring is one source of truth that applies identically across every machine. Rather than configuring each host by hand, I describe what I want once and let them all converge on it.

Why you’d want this, and what I expect to learn

The immediate payoff is the friction that disappears, but the part I’m actually curious about is the pattern underneath. This is one specialized agent calling another in plain language, and it works because each side is good at a different thing. The coding agent is where my attention already is; the assistant is where scheduling and notifications already live. Letting them talk turns two separate tools into something closer to a team.

I don’t know yet how far that goes, which is most of why I find it interesting. Does plain natural language stay good enough as the interface between agents, or do I start wanting more structure once they hand each other more than reminders? Does this stay a one-way bridge, or do I end up with the assistant delegating back into the coding tools? How much does erasing that little context switch actually change how often I capture a thought, versus letting it slip? I’ll have a better sense after a few weeks of living with it.

If you want to try the same thing, the shape is small: expose your assistant’s API, lock it to a private network, and teach your coding tool to send it natural language. The scheduling smarts already exist on the assistant side. You’re just giving it a sentence, and if you describe the whole thing declaratively, giving it to every machine you own at once.

I also gave these agents a shared memory, which turned out to be the bigger project. That’s the next post.

Hands-off NixOS across my laptops with Attic + comin

2026-05-25T00:00:00-07:00

Nix’s whole appeal is reproducibility: declare a system once and rebuild it anywhere to get the same machine, the same configuration everywhere. The catch is that reproducibility is a property of one build on one machine. It says nothing about keeping a fleet in step, or about sharing build work so the same package doesn’t get compiled on each box. That’s the gap I wanted to close across my laptops.

I run NixOS on three laptops, an LG Gram, an ASUS Zenbook Duo, and a ThinkPad, all built from a single flake. I only ever use one at a time, whichever suits my mood that day.

The goal: whichever laptop I open should already be running my latest config, and any builds it needs should come from a local cache instead of compiling on the machine. No rebuild waiting when I lift the lid, and no battery burned on something another machine already built.

Two pieces get there. A self-hosted Attic binary cache holds the builds, and comin handles automatic switching as a GitOps pull-deploy.

The problem: catch-up rebuilds eat laptop battery

Before this, picking a laptop meant catching it up. Whichever one I grabbed had fallen behind whatever I’d changed since I last used it, so I’d rebuild, and on a laptop some of those rebuilds are a real battery and thermal event.

The reason is that some packages have to be built locally no matter what. Unfree packages aren’t on the public Hydra cache by policy, and veracrypt is a good example. Its license was historically marked unfree, so Hydra hasn’t built it since 2021. Every flake bump that perturbs one of its transitive deps (wxGTK, fuse, lvm2, gtk3) means a local rebuild of around ten minutes. Electron apps are worse, often 30 to 60 minutes from source. Pinning to an older nixpkgs doesn’t help, because no commit has a cached binary either.

On a desktop that’s just annoying. On a laptop it costs you fan noise and a chunk of charge. And because I rotate between machines, the same ten-minute veracrypt build happened independently on each one, three separate compiles for a single config change, spread out over whenever I happened to pick each laptop up.

Attic: build once, pull everywhere

Attic is a self-hosted Nix binary cache. Mine runs as a native container on my always-on Unraid box, reachable over Tailscale. Pull is unauthenticated over the tailnet, since it’s only ever reachable on my private network.

The point is that nothing builds twice across the fleet. Every host pushes new store paths as it produces them, so the first machine to rebuild after a change seeds the cache and the others just pull. There’s no dedicated builder. Whichever laptop I’m on when I first rebuild eats that compile once, and my desktop (a 5950X with an RTX 3090) chips in its CUDA builds on the occasions I wake it over Wake-on-LAN for GPU work, like running a 27b model in opencode. Pushing is an async systemd service rather than a build hook:

systemd.services.attic-watch-store = {
  description = "Push new nix store paths to the racer5 attic cache";
  wantedBy = ["multi-user.target"];
  serviceConfig = {
    ExecStart = "${pkgs.attic-client}/bin/attic watch-store nix-config";
    Restart = "always";
    RestartSec = 30;
  };
};

watch-store runs in the background and never blocks a build. I tried a nix post-build-hook first and it was the wrong tool, since it’s synchronous and pushes whole closures, which stalled deploys. Attic filters out cache.nixos.org paths automatically, so only genuinely uncached stuff gets stored, and it’s content addressed, so two hosts pushing the same path store it once. One caveat: keep -j at 5 or below, because atticd’s SQLite serializes writes and a higher count exhausts the connection pool.

The payoff is exactly the problem above. That veracrypt build now happens once, anywhere, and every other machine substitutes the binary.

comin: push to main, laptops switch themselves

The cache solves what gets built. comin solves who triggers the build. It’s a GitOps pull-deploy: each host polls the repo and rebuilds its own config, selected by hostname, whenever main advances, with automatic rollback if the new generation fails.

services.comin = {
  enable = true;
  remotes = [{
    name = "origin";
    url = "https://github.com/devindudeman/nix-config.git";
    branches.main.name = "main";
    auth.access_token_path = config.sops.secrets.github_pat.path;
    poller.period = 60;  # new commits land within 60s
  }];
};

No control node pushing anywhere; each laptop pulls for itself. comin runs as a service, and the result is that whichever machine I open is already current. It caught up in the background the last time it was awake. If it ever gets intrusive on battery, I can raise poller.period or require a deploy confirmation.
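The pull-deploy idea reduces to a small decision per poll. This is a conceptual sketch only, not comin's implementation: poll_once, fetch_head, and switch_to are hypothetical names, and the rollback is modelled as simply switching back to the last commit that worked.

```python
from typing import Callable


def poll_once(
    deployed: str,
    fetch_head: Callable[[], str],
    switch_to: Callable[[str], bool],
) -> str:
    """Run one poll cycle and return the commit the host ends up on."""
    head = fetch_head()
    if head == deployed:
        return deployed        # main has not advanced, nothing to do
    if switch_to(head):
        return head            # new generation built and activated
    switch_to(deployed)        # activation failed: go back to the last good commit
    return deployed
```

Run on a timer on every host, this is why no machine needs to be told about a change: each one notices for itself the next time it is awake.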

Every device on the same config, all the time

This is the part I didn’t expect to love so much. Because every host converges on main automatically, the laptop I’m not using stays just as current as the one I am. Switching machines on a whim stops being a “let me rebuild first” moment. That matters most for the things I tweak constantly:

Agent skills. My Claude Code and Codex skills are declared in the flake. I add a skill in one place, push, and the next time each laptop is awake it has it. No copying files around, no wondering which machine has the good version.
MCP servers. Same story. The declarative MCP config lands on every host, so an agent behaves identically whether I’m on the Gram or the ThinkPad.
Everything else. Shell, editor, fonts, secrets wiring. The same on whichever machine I reach for, continuously, instead of being the same only on the day I last remembered to rebuild it.

A just update recipe ties it together: bump flake inputs, rebuild locally, auto-commit, push, and the bump fans out to every laptop via comin.

What’s next: CI that builds each host ahead of time

Today the cache only warms when a machine happens to build first, and that machine is usually a laptop eating the compile I was trying to avoid. The next step is to make seeding deliberate, with a CI pipeline that builds every host’s full configuration on each push to main and pushes the results to Attic before any laptop polls.

That flips the timing. Instead of the first machine I happen to open eating the rebuild and seeding the cache for the rest, the cache would already be warm by the time any host checks for changes. Every laptop would do a pure substitute with no local compilation at all, even for the unfree and Electron packages. It also turns a broken commit into a red build instead of a failed deploy I notice later. Garnix or a small self-hosted runner pointed at the same Attic cache would both do the job.

Gotchas worth knowing

comin watches main, so merge before you activate comin on a new host, or it’ll happily deploy the old main.
Pull is unauthenticated over the tailnet by design. It’s only reachable on the private network, so don’t expose that port.
Keep -j at 5 or below on pushes. A higher count exhausts atticd’s SQLite connection pool.

Just Plane Mosher: Real-Time Flights on a 7-Color E-Ink Display

2026-04-10T00:00:00-07:00

A friend of mine, Mosher, lives in San Francisco. He’s into planes. I wanted to build him something that would show what’s flying overhead right now, updated every few minutes, displayed on something that looks good sitting on a shelf. Not a phone app, not a web dashboard. A physical thing.

The result is just-plane-mosher: a Raspberry Pi Zero 2 W connected to a Pimoroni Inky Impression 7.3” e-ink display. It pulls live aircraft positions from free ADS-B APIs, plots them on a Stamen Watercolor map centered on the Haight, and renders the whole thing to a 7-color 800x480 screen. Planes show up as little colored arrows pointing in their heading direction, labeled with callsigns and routes. An info bar along the bottom shows the flight count and last update time.

The display refreshes every five minutes. Between refreshes the screen itself draws no power. The whole thing runs headless off a micro USB cable, tucked into a 3D-printed black frame that makes it look like a small picture frame.

The display

The Inky Impression is a 7-color ACeP (Advanced Color ePaper) panel. “7-color” means it can show black, white, red, orange, yellow, green, and blue. That’s it. Every pixel on the screen is exactly one of those seven colors. No gradients, no alpha blending, no antialiasing. If you want to show a photograph or a watercolor map, you have to quantize the entire image down to seven values per pixel and use dithering to fake the rest.

This is where the interesting problem starts.

Two-layer rendering

Floyd-Steinberg dithering does a good job of making a 7-color image look like it has a much wider palette. The watercolor map tiles come back from Stadia Maps as full RGB, and after dithering they look beautiful on the display. Soft blues for the bay, warm tans for land, the kind of thing you’d actually want on a shelf.

But dithering destroys small details. Text becomes unreadable. Thin lines dissolve into noise. A callsign label like “SWA2046” rendered onto the map before dithering comes out as a smeared mess of scattered pixels. The dithering algorithm doesn’t know that those pixels are supposed to be letters. It just sees color values and spreads the quantization error around.

The fix is to never dither the things that need to be crisp. The renderer works in two passes:

Layer 1 renders the watercolor map and a 10-nautical-mile range ring as a normal RGB image, then quantizes it to the 7-color palette with Floyd-Steinberg dithering. This produces a beautiful, soft background.

Layer 2 draws directly onto the palette-indexed result using exact palette indices. Aircraft arrows, callsign labels, the altitude legend, and the info bar are all placed after dithering, pixel by pixel, in pure palette colors. Black text on white backgrounds. Colored arrows with black borders for contrast.

The key insight is that the Inky library skips its own internal dithering when it receives a pre-quantized palette image. So the crisp Layer 2 content passes through to the hardware untouched. Text stays sharp. Arrows stay clean. The map underneath stays beautifully dithered. Two rendering strategies on one screen, and the display driver doesn’t need to know about either of them.
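The two passes can be shown on a tiny grid in pure Python. This is a sketch of the technique, not the project's renderer: the RGB values in PALETTE are rough guesses at the seven inks, the function names are made up, and it works on nested lists rather than on a real image or the Inky library.

```python
# Approximate RGB values for the seven inks (assumption; real panels differ).
PALETTE = [
    (0, 0, 0),        # 0 black
    (255, 255, 255),  # 1 white
    (0, 255, 0),      # 2 green
    (0, 0, 255),      # 3 blue
    (255, 0, 0),      # 4 red
    (255, 255, 0),    # 5 yellow
    (255, 140, 0),    # 6 orange
]


def nearest(pixel):
    """Index of the palette color closest to pixel in squared RGB distance."""
    return min(
        range(len(PALETTE)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(pixel, PALETTE[i])),
    )


def dither(image):
    """Layer 1: Floyd-Steinberg dither a grid of RGB tuples into palette indices."""
    height, width = len(image), len(image[0])
    work = [[[float(ch) for ch in px] for px in row] for row in image]
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            old = work[y][x]
            idx = nearest(old)
            out[y][x] = idx
            err = [o - c for o, c in zip(old, PALETTE[idx])]
            # Spread the quantization error to the unvisited neighbors (7, 3, 5, 1 over 16).
            for dx, dy, weight in ((1, 0, 7), (-1, 1, 3), (0, 1, 5), (1, 1, 1)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < width and ny < height:
                    for ch in range(3):
                        work[ny][nx][ch] += err[ch] * weight / 16
    return out


def overlay(indexed, pixels, color_index):
    """Layer 2: write exact palette indices on top of the dithered result."""
    for x, y in pixels:
        indexed[y][x] = color_index
    return indexed
```

Because overlay runs after dither and writes indices directly, nothing it draws ever passes through error diffusion, which is what keeps text and arrows crisp.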

Flight data

Aircraft positions come from ADSB.lol, which aggregates data from volunteer-run ADS-B receivers worldwide. The API is free, requires no authentication, and returns every aircraft within a configurable radius of a lat/lon point. Each aircraft record includes position, altitude, heading, ground speed, callsign, registration, and aircraft type.

Callsigns alone aren’t that interesting. “UAL875” tells you it’s a United flight but not where it’s going. So each flight gets enriched with route data from ADSBdb, another free API that maps callsigns to airline names and origin/destination airports. The label for a United flight becomes two lines: “UAL875” on top, “SFO>NRT” underneath. Now you’re looking at a map and you can see that one is headed to Tokyo.

ADSBdb gets rate-limited to one request every 200ms, and results are cached for an hour. Callsigns that return 404 (charter flights, military, private aviation) get cached as misses so they don’t keep hammering the API.
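A rough sketch of that policy, with the network call abstracted away: RouteCache and its parameters are hypothetical names, and fetch stands in for whatever function actually queries ADSBdb and returns None on a 404. The clock and sleep functions are injectable so the behavior can be checked without waiting.

```python
import time
from typing import Callable, Optional


class RouteCache:
    """Throttled route lookups with a one-hour cache; misses are cached too."""

    def __init__(
        self,
        fetch: Callable[[str], Optional[str]],
        min_interval: float = 0.2,   # one request every 200 ms
        ttl: float = 3600.0,         # cache entries live for an hour
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._fetch = fetch
        self._min_interval = min_interval
        self._ttl = ttl
        self._clock = clock
        self._sleep = sleep
        self._cache: dict[str, tuple[float, Optional[str]]] = {}
        self._last_request: Optional[float] = None

    def lookup(self, callsign: str) -> Optional[str]:
        now = self._clock()
        hit = self._cache.get(callsign)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # may be None: a remembered miss
        if self._last_request is not None:
            wait = self._min_interval - (now - self._last_request)
            if wait > 0:
                self._sleep(wait)  # space requests at least min_interval apart
        self._last_request = self._clock()
        route = self._fetch(callsign)
        self._cache[callsign] = (self._clock(), route)
        return route
```

Storing the miss as a regular cache entry is the important detail: a private or military callsign costs one request per hour instead of one per refresh.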

Altitude as color

Four of the seven available colors map naturally to altitude bands:

Red: below 5,000 feet (departures, arrivals, low approaches)
Orange: 5,000–15,000 feet (climbing, descending)
Yellow: 15,000–30,000 feet (mid-altitude)
Blue: above 30,000 feet (cruise)

Against the watercolor map, this works well. You can glance at the display and immediately tell which planes are coming or going (red/orange near SFO and OAK) versus which are passing through at cruise altitude (blue dots crossing the bay). Aircraft without heading data render as circles instead of arrows, which usually means they’re on the ground or the receiver has incomplete data.
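The banding is a simple threshold function. In this sketch the function name is made up, and the handling of the exact boundaries (5,000, 15,000, and 30,000 feet each fall into the higher band) is an assumption, since the list above does not say which side they land on.

```python
def altitude_color(altitude_ft: float) -> str:
    """Map a barometric altitude in feet to one of the four band colors."""
    if altitude_ft < 5_000:
        return "red"      # departures, arrivals, low approaches
    if altitude_ft < 15_000:
        return "orange"   # climbing, descending
    if altitude_ft < 30_000:
        return "yellow"   # mid-altitude
    return "blue"         # cruise
```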

Labels that don’t collide

Thirteen flights over San Francisco means thirteen labels, and they overlap. The renderer checks each label’s bounding box against every previously placed label. If there’s a collision, it shifts the new label down. If the label would run off the right edge of the screen, it flips to the left side of the arrow. It’s simple box collision, not a layout solver, but it handles the common case of three planes stacked on the SFO approach without turning the display into an unreadable mess.
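A minimal version of that placement rule might look like the following. The Box type, the place_label name, the 4-pixel gap, and the choice to shift by one label height per step are all assumptions made for the sketch. It also does nothing about labels pushed past the bottom edge.

```python
from typing import NamedTuple


class Box(NamedTuple):
    x: int
    y: int
    w: int
    h: int

    def overlaps(self, other: "Box") -> bool:
        """Axis-aligned rectangle intersection test."""
        return (
            self.x < other.x + other.w and other.x < self.x + self.w
            and self.y < other.y + other.h and other.y < self.y + self.h
        )


def place_label(anchor_x, anchor_y, w, h, placed, screen_w=800, gap=4):
    """Place a label beside an arrow, flipping left and shifting down as needed."""
    x = anchor_x + gap
    if x + w > screen_w:
        x = anchor_x - gap - w  # would run off the right edge: flip to the left
    box = Box(x, anchor_y, w, h)
    while any(box.overlaps(p) for p in placed):
        box = box._replace(y=box.y + h)  # collision: shift down one label height
    placed.append(box)
    return box
```

Each new label is checked against all earlier ones, so placement is quadratic in the number of flights, which is irrelevant at a dozen or so aircraft.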

Map tiles and caching

The background map is assembled from Stamen tiles fetched through Stadia Maps. The tiles are 256x256 PNGs that get stitched together and cropped to fit the display’s viewport. Three styles are available (Watercolor, Toner, and Terrain), and you can cycle between them with the buttons on the back of the display.

Map tiles are cached to disk. Stamen’s tile set is static (the watercolor paintings aren’t going to change), so the cache effectively never expires. The setup script pre-fetches all the tiles needed for the configured location and zoom level, so the first boot doesn’t have to wait for network requests before it can render.
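Working out which tiles to pre-fetch is standard Web Mercator arithmetic. The sketch below assumes the usual XYZ slippy-map scheme with 256-pixel tiles and an 800x480 viewport centered on the configured point. The function names are hypothetical, and it does not clamp or wrap at the map edges, so it is only meant for zoom levels where the viewport sits well inside the map.

```python
import math

TILE_PX = 256


def world_pixel(lat_deg, lon_deg, zoom):
    """Project lat/lon to pixel coordinates in the Web Mercator world at a zoom."""
    n = 2 ** zoom
    x = (lon_deg + 180.0) / 360.0 * n * TILE_PX
    y = (1.0 - math.asinh(math.tan(math.radians(lat_deg))) / math.pi) / 2.0 * n * TILE_PX
    return x, y


def tile_for(lat_deg, lon_deg, zoom):
    """Tile (x, y) containing the given point."""
    x, y = world_pixel(lat_deg, lon_deg, zoom)
    return int(x // TILE_PX), int(y // TILE_PX)


def tiles_covering(lat_deg, lon_deg, zoom, width_px=800, height_px=480):
    """All tiles needed to cover a viewport centered on the given point."""
    cx, cy = world_pixel(lat_deg, lon_deg, zoom)
    x0, x1 = int((cx - width_px / 2) // TILE_PX), int((cx + width_px / 2) // TILE_PX)
    y0, y1 = int((cy - height_px / 2) // TILE_PX), int((cy + height_px / 2) // TILE_PX)
    return [(x, y) for y in range(y0, y1 + 1) for x in range(x0, x1 + 1)]
```

The stitched image is then the grid of those tiles, cropped so the center pixel of the viewport lands on the configured coordinate.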

Change detection

E-ink refreshes are slow. The 7-color ACeP panel takes about 40 seconds for a full refresh. You can watch the colors settle in waves across the screen. You don’t want to do that if nothing has changed.

Before pushing a frame to the display, the renderer computes a SHA-256 hash of the image buffer and compares it to the last one sent. If the hash matches, it skips the refresh entirely. Late at night when air traffic drops off, the display might go an hour without updating. During the morning departure rush, it refreshes every cycle.
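The check itself is a few lines with hashlib. FrameGate is a hypothetical name, and the sketch assumes the rendered frame is available as raw bytes.

```python
import hashlib
from typing import Optional


class FrameGate:
    """Remembers the last frame's SHA-256 and reports whether a new one differs."""

    def __init__(self) -> None:
        self._last_digest: Optional[str] = None

    def should_refresh(self, frame: bytes) -> bool:
        digest = hashlib.sha256(frame).hexdigest()
        if digest == self._last_digest:
            return False  # identical to what is already on the panel
        self._last_digest = digest
        return True
```

One consequence worth noticing: anything that changes on every render, such as a last-update timestamp in the info bar, would change the hash every cycle, so it has to be excluded from the hashed buffer or only updated when the flight data changes.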

Buttons

The Inky Impression has four physical buttons on the back, exposed via GPIO. Two of them are wired up:

Button A: force an immediate refresh (wakes the main loop from its sleep)
Button B: cycle through map styles

The button listener is interrupt-driven using gpiod edge detection, so it burns zero CPU while waiting. A press just sets a flag and nudges the main loop.
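The "set a flag and nudge the main loop" part can be sketched with a threading.Event. The GPIO side is left out entirely: on_button_a is a hypothetical callback that a gpiod edge listener would invoke, and MainLoop is an illustrative name rather than the project's class.

```python
import threading


class MainLoop:
    """Sleeps between refresh cycles but wakes early on a button press."""

    def __init__(self, interval_s: float = 300.0) -> None:
        self.interval_s = interval_s       # five minutes between refreshes
        self.force_refresh = False
        self._wake = threading.Event()

    def on_button_a(self) -> None:
        """Called from the button listener thread when Button A is pressed."""
        self.force_refresh = True
        self._wake.set()

    def sleep_until_next_cycle(self) -> bool:
        """Block until the interval elapses or a press arrives; True if woken early."""
        woken = self._wake.wait(timeout=self.interval_s)
        self._wake.clear()
        return woken
```

Waiting on the event rather than calling time.sleep is what lets a press take effect immediately instead of at the end of the current five-minute nap.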

Running it

The whole thing runs as a systemd service on Raspbian. A setup script handles SPI/I2C configuration, Python venv creation, dependency installation, tile pre-caching, and service registration. After setup, it starts on boot and restarts automatically on failure with exponential backoff.

The project is on GitHub. It’s built for one specific display and one specific location, but the location is configurable via .env and the rendering approach would work for any 7-color e-ink panel. If you have an Inky Impression and want to watch planes, it’s a git clone and a setup script away.

Streaming Expedition 33 from a Headless NixOS Desktop

2026-04-07T00:00:00-07:00

I wanted to play Expedition 33 well, and the Steam Deck couldn’t do it. The game is built on Unreal Engine 5 and it asks for a lot of GPU. On the Deck it launched as “Unsupported,” eventually got upgraded to “Playable,” and even after the post-launch optimization update the recommended settings cap you at 30fps with the rendering preset cranked all the way down. Combat still drops below 25fps in some areas. People have written entire performance mods just to make it tolerable. The Deck does its best, but the APU is being asked to do something it can’t.

The display side was already fine. I plug a pair of Viture Pro XR glasses into the Deck over USB-C and get a 1080p 120Hz virtual screen at around 135 inches floating in front of me. The Deck plus the glasses is a great portable display setup. The rendering is what falls over.

Meanwhile, my actual gaming desktop (5950X, 3090) sits in another room, doing nothing most of the time, and I’d rather play a JRPG on the couch than at a desk. So: Sunshine on the desktop, Moonlight on the Deck, full-fat NVENC streaming over the LAN. The 3090 renders Expedition 33 at high settings and a real frame rate, NVENC encodes the result, the Deck decodes it and pipes it straight to the glasses. The Deck stops trying to be a renderer and goes back to what it’s actually good at: receiving input, decoding video, and sitting in my hands.

The catch is that the desktop has to behave like a normal GNOME session with no human and no monitor present, because there isn’t one. Every weird piece of this config exists because of that constraint.

Lying to the GPU

The 3090 won’t bring up a video output without EDID data on the wire. With no monitor plugged in, GNOME boots into a dummy headless mode and Sunshine has nothing to capture. The fix is a passive HDMI EDID emulator: a $10 dongle from Amazon that returns a fake “I am a 4K monitor” handshake. The kernel and NVIDIA driver bring up DP-1 with real modes, GNOME boots a normal session, and Sunshine has something to capture.

This is the cheapest part of the build. Without it, the rest of this config has nothing to point at.

Lying to GDM

For Sunshine to capture a session, a session has to exist. So GDM auto-logs me in on boot:

services.displayManager.autoLogin = {
  enable = true;
  user = "devinbernosky";
};

A few extra knobs disable the foot-guns:

services.xserver.displayManager.gdm.autoSuspend = false. By default GDM suspends the box at the login screen if nobody moves a mouse, which would defeat the entire point.
The gdm-autologin PAM stack gets enableGnomeKeyring = true so the keyring unlocks without typing a password. Otherwise Steam, browsers, and everything else spam keyring prompts forever.
Screen lock and idle activation are off in dconf, but I left inactive suspend after 30 minutes alone to save power. Wake-on-LAN handles the rest. Moonlight has built-in WoL support, so the Deck can cold-start the box from the couch.

Lying to bwrap about Sunshine’s parent

This was the hardest part to figure out and the most worth writing about.

Sunshine on Wayland captures via KMS, and KMS capture needs CAP_SYS_ADMIN. The NixOS module exposes that as one knob:

services.sunshine = {
  capSysAdmin = true;
  package = pkgs.sunshine.override { cudaSupport = true; };
};

cudaSupport = true flips Sunshine onto NVENC, which is the entire reason a 3090 is worth using as a streaming source. It also means this build can’t come from the binary cache; it has to compile locally.

The non-obvious part is that capSysAdmin = true is poison for bubblewrap. Steam (and anything else sandboxed via bwrap) refuses to launch from a process tree that carries elevated capabilities, which is reasonable on bwrap’s part but breaks every Sunshine “launch app” entry that just calls steam. I tracked this down through nixpkgs#463989 after spending way too long on Steam launching and immediately dying with no useful error.

The fix is that any command Sunshine launches has to drop back to a normal user context first. My “Steam Big Picture” entry looks like this:

sudo -u devinbernosky setsid steam -bigpicture

sudo -u strips the inherited capabilities. setsid detaches the new process from Sunshine’s process group, so closing the stream from the Deck doesn’t kill Steam along with it. Two small wrappers, a lot of pain saved.
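To see what the setsid half does in isolation, here is a small sketch that has nothing to do with Sunshine or capabilities. It relies on Python's start_new_session=True, which calls setsid() in the child before exec, and simply compares session IDs. It assumes a POSIX system.

```python
import os
import subprocess
import sys

# The child prints its own session ID. start_new_session=True makes Python
# call setsid() in the child before exec, like the setsid wrapper does.
child_code = "import os; print(os.getsid(0))"
out = subprocess.run(
    [sys.executable, "-c", child_code],
    start_new_session=True,
    capture_output=True,
    text=True,
    check=True,
)
child_sid = int(out.stdout.strip())
parent_sid = os.getsid(0)
print("parent session:", parent_sid, "child session:", child_sid)
```

The two IDs differ, which is the property that keeps a signal aimed at the launcher's group from reaching the launched process.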

Lying to mutter about which monitor it has

The desktop’s actual panel, when one is plugged in, is a 5120x1440 ultrawide at 120 Hz. The Steam Deck with the Viture glasses attached asks for 1080p at 120 Hz, which is what the glasses want. I don’t want games launching at ultrawide resolutions and getting downscaled, and I don’t want the Deck and the glasses negotiating with weird non-standard modes.

Sunshine has a global_prep_cmd setting that runs a script when a client connects and an “undo” script when it disconnects. Mine uses gdctl, the new GNOME display CLI that ships with mutter 49+, to actually reconfigure the compositor on the fly:

sunshine-switch reads $SUNSHINE_CLIENT_WIDTH and $SUNSHINE_CLIENT_HEIGHT, asks gdctl show -v for a matching mode on DP-1, and switches to it.
sunshine-restore puts the desktop back to 5120x1440@119.999 when the stream ends.
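The matching step is the only real logic in sunshine-switch. A rough sketch of it, assuming mode IDs of the form WIDTHxHEIGHT@RATE (the shape of the 5120x1440@119.999 string above); the mode list is made up, and parsing real gdctl output is deliberately left out.

```python
import re

MODE_RE = re.compile(r"^(\d+)x(\d+)@(\d+(?:\.\d+)?)$")


def pick_mode(modes, width, height):
    """Return the matching mode with the highest refresh rate, or None."""
    best = None
    best_rate = -1.0
    for mode in modes:
        m = MODE_RE.match(mode)
        if not m:
            continue  # ignore anything that is not WIDTHxHEIGHT@RATE
        w, h, rate = int(m.group(1)), int(m.group(2)), float(m.group(3))
        if (w, h) == (width, height) and rate > best_rate:
            best, best_rate = mode, rate
    return best


# Hypothetical mode list; real IDs would come from the compositor.
modes = [
    "5120x1440@119.999",
    "1920x1080@60.000",
    "1920x1080@119.982",
    "1280x720@60.000",
]
print(pick_mode(modes, 1920, 1080))  # 1920x1080@119.982
```

Returning None when nothing matches leaves the caller free to keep the current mode instead of guessing.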

The wrinkle is that gdctl has to talk to the user’s mutter over D-Bus, and Sunshine’s user service doesn’t inherit that environment cleanly. Each call gets wrapped:

sudo -u devinbernosky DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus gdctl ...

Same sudo -u trick as the Steam launcher, doing double duty: dropping caps and re-entering the right session bus.

Cleaning up after suspend

Resuming from suspend leaves NVENC in a strange state. Sunshine keeps running but its encoder handles are stale, so streams fail to start until I restart the service manually. A small oneshot fixes this automatically:

systemd.services.sunshine-resume = {
  after = [ "systemd-suspend.service" "nvidia-resume.service" ];
  wantedBy = [ "suspend.target" "hibernate.target" "hybrid-sleep.target" ];
  serviceConfig = {
    Type = "oneshot";
    ExecStartPre = "${pkgs.coreutils}/bin/sleep 5";
    ExecStart = "${pkgs.systemd}/bin/systemctl --user --machine=devinbernosky@.host restart sunshine.service";
  };
};

The --machine=devinbernosky@.host flag is what lets a system unit poke the user’s systemd instance. It’s the cleanest way to bounce a user service from PID 1 without writing a polkit rule.

There’s also an ExecStartPre = sleep 10 on the Sunshine user service itself, to give GNOME enough time to bring DP-1 up before Sunshine probes for displays on first boot. Without it, Sunshine occasionally latches onto a “no monitors” state and just sulks.

Input

hardware.uinput.enable = true, plus adding my user to the input group, is the workaround for nixpkgs#455737. Sunshine needs to write to /dev/uinput to inject controller and keyboard events from the Deck. Without it the stream connects fine but the controller does nothing, which is its own kind of frustrating.

Desktop-only gaming polish

A bunch of this stuff doesn’t belong on my travel laptops. They shouldn’t be opening Steam Remote Play firewall ports or disabling OpenSnitch. So I split it out into hosts/desktop/gaming.nix and only the desktop pulls it in:

proton-ge-bin and protontricks for games that need community Proton builds. Expedition 33 was a GE-fork target early on.
programs.gamemode enabled with renice = 10 so launched games get nice -10.
OpenSnitch off, because per-connection prompts will absolutely ruin a streaming session.
programs.steam.remotePlay.openFirewall = true and localNetworkGameTransfers.openFirewall = true.

And on the Home Manager side, in hosts/desktop/home.nix:

Tiling Shell instead of Pop Shell. Pop’s tiling assumes 16:9-ish geometry and is miserable at 5120x1440. Tiling Shell lets me draw custom snap zones for the ultrawide.
NVIDIA shader cache pinned to 10 GB with __GL_SHADER_DISK_CACHE_SIZE and __GL_SHADER_DISK_CACHE_SKIP_CLEANUP=1. The driver’s default tiny cache evicts compiled shaders mid-game and you get stutter every time it has to recompile.
An XDG autostart entry that launches steam -silent on login. The moment GDM auto-logs in, Steam is already sitting in the tray waiting for Moonlight to connect.

The little things

boot.kernelPackages = pkgs.linuxPackages_latest and boot.initrd.systemd.enable = true for a fast headless boot. I dropped Plymouth because there’s nobody to look at the splash screen.
hardware.nvidia.open = true because Ampere is on the open kernel modules now per NVIDIA’s recommendation.
services.ollama.acceleration = "cuda". The same 24 GB of VRAM that streams Expedition 33 also runs local LLMs when nobody is gaming.
networking.interfaces.enp39s0.wakeOnLan.enable = true so the Deck can wake the box from a cold suspend.
CoolerControl for the NZXT Kraken AIO, so the 5950X doesn’t thermal-throttle mid-session.

Does it work?

Yes, perfectly. The desktop sits suspended most of the time. When I want to play, I put on the glasses, pick up the Deck, and open Moonlight. Moonlight’s built-in Wake-on-LAN wakes the box, GDM auto-logs in, Sunshine comes up, the resolution switches to 1080p120 to match what the glasses want, and Steam launches into Big Picture. The 3090 renders Expedition 33, NVENC encodes the stream, the Deck decodes it, and the glasses show me the result. When I close the stream the desktop drops back to idle, hits the 30-minute inactivity timeout, and goes back to sleep on its own.

The Wake-on-LAN piece matters more than it might sound. A 3090 at idle still pulls real wattage, and the whole system sitting up 24/7 just to be “available” would burn 80-100W around the clock for nothing. With WoL doing the heavy lifting, the desktop is at near-zero power most of the day. The Deck wakes it on demand, I get full 3090 performance for as long as I want, and then it puts itself back to sleep without any thought from me.

The whole thing is in my NixOS config. If you find this post by searching for some variant of “Sunshine launches Steam and Steam immediately dies on Wayland,” the answer is to wrap the launch command in sudo -u $USER setsid. The bubblewrap-vs-CAP_SYS_ADMIN interaction took me longer to track down than I’d like to admit, and I’m leaving this here so the next person doesn’t have to.

What Does Claude Code Actually Do? Building laudec to Find Out

2026-03-26T00:00:00-07:00

If you use Claude Code, you’ve probably had the thought: what is actually happening right now?

You type a prompt, Claude does… something, files change, tokens get burned, and you pay for it. But you can’t really see what happened between your prompt and the result. How many API calls did that take? What did the system prompt look like? Did it spawn subagents? How fast is the context window filling up? What tools did it decide to use, and which did it reject?

I wanted to know. So I built laudec.

Claude Code is more transparent than you think

Claude Code already exposes a lot about its own operation. The surface area is there, it’s just that nobody has wired it all together in one place. There are three channels worth knowing about.

1. The API proxy surface

Claude Code reads the ANTHROPIC_BASE_URL environment variable. If set, all API traffic routes through that URL instead of going directly to api.anthropic.com. This is a first-class configuration point, not a hack. It means you can place anything you want between Claude Code and Anthropic’s API: a logging proxy, a cache, a rate limiter, an audit trail.

Every request that flows through this proxy carries the full conversation context: the system prompt, the message history, the tool definitions, the model parameters. Every response carries token usage, cache statistics, rate limit headers, and the complete model output (streamed as SSE events). This is the richest data source available, and capturing it requires nothing more than an HTTP server and an environment variable.
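As a rough idea of what a proxy can pull out of a streamed response, here is a sketch that reads token usage and the stop reason from SSE text. The sample payload is invented and trimmed; the message_start and message_delta event shapes follow my reading of Anthropic's streaming format, and the function name is hypothetical.

```python
import json


def summarize_sse(stream_text: str) -> dict:
    """Collect token usage and stop reason from a streamed response body."""
    summary = {}
    for line in stream_text.splitlines():
        if not line.startswith("data:"):
            continue  # skip "event:" lines and blank separators
        payload = json.loads(line[len("data:"):].strip())
        if payload.get("type") == "message_start":
            summary.update(payload["message"].get("usage", {}))
        elif payload.get("type") == "message_delta":
            # Usage reported later in the stream supersedes earlier values.
            summary.update(payload.get("usage", {}))
            summary["stop_reason"] = payload["delta"].get("stop_reason")
    return summary


# Invented, heavily trimmed sample stream.
sample = "\n".join([
    "event: message_start",
    'data: {"type": "message_start", "message": {"usage": '
    '{"input_tokens": 12, "cache_read_input_tokens": 3400, "output_tokens": 1}}}',
    "",
    "event: message_delta",
    'data: {"type": "message_delta", "delta": {"stop_reason": "tool_use"}, '
    '"usage": {"output_tokens": 87}}',
])
summary = summarize_sse(sample)
print(summary)
```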

2. OpenTelemetry

Claude Code ships with native OpenTelemetry support. Set a few environment variables and it starts emitting structured telemetry over gRPC:

CLAUDE_CODE_ENABLE_TELEMETRY=1
OTEL_METRICS_EXPORTER=otlp
OTEL_LOGS_EXPORTER=otlp
OTEL_EXPORTER_OTLP_PROTOCOL=grpc
OTEL_EXPORTER_OTLP_ENDPOINT=http://127.0.0.1:14317

Two categories of data come out. Metrics are counters and gauges exported on a regular interval: session.count, token.usage, cost.usage, active_time.total, lines_of_code.count, commit.count, pull_request.count, and code_edit_tool.decision (tracking accept/reject rates on edits). Events are point-in-time log records for discrete actions (user_prompt, api_request, tool_decision, tool_result). Each event carries a session.id attribute that ties everything back to a single Claude Code session, and a prompt.id that links all the events triggered by a single user prompt: the API calls it caused, the tools it invoked, the decisions that were made. This is the correlation key that makes it possible to trace a single prompt through the entire chain of actions it triggered.

This is a different view than the proxy gives you. The proxy sees raw HTTP traffic. OTEL sees Claude Code’s internal model of what happened: “I decided to use the Read tool,” “the tool succeeded in 120ms,” “that API call cost $0.0043.” You want both.

3. Settings and hooks

Claude Code reads project-level configuration from .claude/settings.local.json. This file can set environment variables, sandbox rules, and tool permissions. It’s the glue that connects the first two channels: you write a settings file that points Claude Code’s OTEL exporter at your collector and its API base URL at your proxy, and everything starts flowing.

But Claude Code also has a full lifecycle hook system. Hooks are shell commands, HTTP endpoints, or even LLM prompts that fire at specific points during a session. There are 20+ hook events: SessionStart, PreToolUse, PostToolUse, PermissionRequest, Stop, SubagentStart, SubagentStop, PreCompact, Notification, and more. Each one receives structured JSON about what’s happening and can respond with decisions (allow, deny, block, modify the tool input, inject context into the conversation).

A PreToolUse hook can inspect every Bash command before it runs and block destructive operations. A PostToolUse hook can auto-format every file after Claude edits it. A SessionStart hook can inject git status and open TODOs into the conversation at the top of every session. A Stop hook can run your test suite before letting Claude declare it’s done, and force it to keep working if tests fail (exit code 2).
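A sketch of the first of those, the Bash gatekeeper. The deny-list is purely illustrative, and the tool_name and tool_input.command field names are my understanding of the hook payload, so treat them as assumptions to check against the hooks reference.

```python
import json

# Illustrative deny-list; a real policy would be project-specific.
BLOCKED_SUBSTRINGS = ("rm -rf /", "git push --force", "mkfs")


def decide(payload: dict):
    """Return (exit_code, message) for a PreToolUse hook payload."""
    if payload.get("tool_name") != "Bash":
        return 0, ""  # only Bash commands are inspected here
    command = payload.get("tool_input", {}).get("command", "")
    for needle in BLOCKED_SUBSTRINGS:
        if needle in command:
            # Exit code 2 is the "block this" signal.
            return 2, f"blocked: command contains {needle!r}"
    return 0, ""


def run_hook(stdin_text: str):
    """Parse the JSON a hook would receive on stdin and decide."""
    return decide(json.loads(stdin_text))


allowed = run_hook('{"tool_name": "Bash", "tool_input": {"command": "ls -la"}}')
denied = run_hook(
    '{"tool_name": "Bash", "tool_input": {"command": "rm -rf / --no-preserve-root"}}'
)
print(allowed, denied)
```

As an actual hook script, this would read sys.stdin, print the message to stderr, and exit with the returned code. Substring matching is crude; it is here to show the shape, not to be a real safety net.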

These hooks are deterministic. They don’t depend on the model remembering your instructions. They fire every time.

For observability purposes, hooks are a third channel alongside the proxy and OTEL. You could log every tool call, capture every permission decision, track subagent lifecycle events, and feed all of it into whatever backend you want. laudec doesn’t use hooks yet, just the settings file to wire up the proxy and OTEL collector. But the hook system is sitting right there as a future extension point for even finer-grained visibility.

What laudec does with all of this

laudec is a single Rust binary that wires up all three channels and gives you a place to look at the results. When you run laudec . in a project directory, it:

Starts a local HTTP proxy on port 18080
Starts a gRPC OTEL collector on port 14317
Writes a temporary .claude/settings.local.json that routes Claude Code’s traffic through both
Launches Claude Code as a child process
Serves a web dashboard on port 18384
Stores everything in a single SQLite database

When the session ends, it restores the original settings file, computes a session summary (duration, cost, tokens, git diff, tool usage), and prints it to the terminal.
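The write-then-restore dance around the settings file can be sketched like this. It is not laudec's code (laudec is Rust), and it makes two loud assumptions: that environment variables go under an "env" key in the settings file, and that overwriting the file wholesale is acceptable, where a careful implementation would probably merge with what is already there. The ports are the ones listed above.

```python
import json
import tempfile
from contextlib import contextmanager
from pathlib import Path

SESSION_ENV = {
    "ANTHROPIC_BASE_URL": "http://127.0.0.1:18080",
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_METRICS_EXPORTER": "otlp",
    "OTEL_LOGS_EXPORTER": "otlp",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://127.0.0.1:14317",
}


@contextmanager
def temporary_settings(project_dir: Path, env: dict):
    """Write settings.local.json for the session, then put back what was there."""
    path = project_dir / ".claude" / "settings.local.json"
    original = path.read_text() if path.exists() else None
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"env": env}, indent=2))
    try:
        yield path
    finally:
        if original is None:
            path.unlink()  # there was no file before, so leave none behind
        else:
            path.write_text(original)


with tempfile.TemporaryDirectory() as tmp:
    with temporary_settings(Path(tmp), SESSION_ENV) as settings_path:
        written = json.loads(settings_path.read_text())
    restored_cleanly = not settings_path.exists()
print(written["env"]["ANTHROPIC_BASE_URL"], restored_cleanly)
```

The finally block is the important part: the original file comes back even if the child process crashes.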

No Docker, no external services, no configuration for the default case. The proxy, collector, dashboard, and database are all in the same binary.

The proxy view

The proxy tab shows Claude Code’s actual API conversations. Every call is classified by type:

MAIN calls are the primary conversation turns, where Claude Code sends the full context window with extended thinking enabled. These are labeled by turn number so you can track the flow of a session.

SUBAGENT calls are spawned by Claude Code’s internal delegation system. When it decides a subtask is better handled by a focused agent, it creates a new API call with a specialized system prompt and a constrained tool set. laudec detects these by inspecting the request body and tags them by role: EXPLORE (file search), WEB SEARCH, CC GUIDE, and so on.

QUOTA calls are lightweight checks (max_tokens=1) that Claude Code uses to verify API access before committing to an expensive request.

TOKEN COUNT calls hit the count_tokens endpoint to measure context size without generating a response.

For each call, you see the user query and model response rendered as markdown, the tool usage summary (e.g., “Read x3, Edit x2”), token counts, cache statistics, latency, and the raw request/response bodies with syntax highlighting. System-injected blocks, such as system reminders, are parsed out and displayed in collapsible sections so you can see exactly what Claude Code appends to your messages behind the scenes.

The OTEL view

The events tab groups telemetry by conversation turn, anchored by each user prompt. Within a turn, you see the chain of decisions Claude Code made: which API calls it fired, which tools it considered, which it used, whether they succeeded, and how long they took.

Cost visibility comes from this channel. Each api_request event carries the exact cost breakdown from Claude Code’s own accounting: input tokens, output tokens, cache read tokens, cache creation tokens, and the computed USD cost. The proxy can tell you token counts from the response body, but only the OTEL data gives you the cost as Claude Code calculated it.

Insights

The insights tab derives higher-order patterns from the raw data:

Context growth shows input tokens per API call over the session. Cache analysis shows hit rate and estimated cost savings. Rate limits track x-ratelimit-remaining-requests and x-ratelimit-remaining-tokens from Anthropic’s response headers per call. Stop reasons aggregate why each API call ended: end_turn, tool_use, or max_tokens.

What I learned by watching

Building laudec was partly about the tool and partly about what it showed me. I spent a lot of time staring at the dashboard during real sessions, and some of the behavior I found was not what I expected.

The system prompt is not one thing

Claude Code doesn’t have a single monolithic system prompt. What you see in the proxy is a composite assembled from dozens of modular pieces at runtime. Thanks to projects like Piebald-AI/claude-code-system-prompts, which extracts and catalogs these pieces from each Claude Code release, we know the current version (v2.1.x) contains over 110 distinct prompt strings that get composed based on context.

The pieces include: the core system section, tool descriptions for each of the 20+ builtin tools (Bash, Read, Write, Edit, Glob, Grep, WebFetch, TodoWrite, and others), behavioral guidelines for tone and output style, task-doing instructions (avoid over-engineering, read before modifying, no premature abstractions, no unnecessary error handling, minimize file creation), git safety rules, sandbox policy, fork/subagent delegation guidelines, and whatever CLAUDE.md context exists in your project.

The tool descriptions alone are substantial. The Bash tool description is assembled from over 30 fragments covering sandboxing policy, sleep behavior, git commit conventions, parallel command execution, when to prefer builtin tools over shell equivalents, and more. The TodoWrite tool description runs over 2,000 tokens. These aren’t decorative. They’re the behavioral contract that shapes how Claude Code wields each tool.

When you open a session, all of this gets packed into the first API call and cached. Watching it happen in the proxy, you can see the system field span tens of thousands of tokens. Then on the second call, cache_read_input_tokens lights up and cache_creation_input_tokens drops to zero. The entire system prompt is served from cache at a fraction of the cost for every subsequent call in the session.

System reminders are injected into your messages

This surprised me. Claude Code doesn’t just set a system prompt at the start of the conversation. It actively injects content into subsequent user messages as the session progresses. These show up as XML-tagged blocks appended to what you typed.

System reminder blocks carry context-sensitive instructions: file-was-modified-externally notifications, TodoWrite reminders, token usage stats, plan mode activation (which alone is over 1,000 tokens of multi-phase planning instructions). Other tagged blocks remind the model about tool constraints, and another lists tools that can be loaded on demand.

The catalog of known system reminders is extensive. There are ~40 distinct reminder types covering everything from “file exists but is empty” warnings, to hook success/failure notifications, to LSP diagnostic alerts, to team coordination instructions for multi-agent swarm mode. These are injected conditionally based on session state. You might never see most of them, but the ones that fire directly shape what the model does next.

In laudec’s proxy tab, these blocks are parsed out and displayed in collapsible sections beneath your actual message. You can see exactly what Claude Code appends on your behalf, and how much context budget it eats.

Subagents are a parallel conversation you can’t see

A single user prompt can spawn half a dozen API calls. When Claude Code decides to explore a codebase, it doesn’t do it in the main conversation thread. It launches an Explore subagent with its own system prompt, a read-only tool set (Read, Glob, Grep, LS), and its own conversation history. The subagent does its work, returns a summary, and the main agent incorporates the result.

The subagent architecture goes deeper than just Explore. Claude Code has specialized agents for planning (Plan mode, with its own enhanced prompt), web fetching (a summarizer agent that distills verbose page content), bash command risk assessment (a policy spec evaluator that classifies command prefix risk levels), conversation compaction (for summarizing history when context gets long), session title generation, CLAUDE.md creation, security review, verification, and even “dream memory consolidation” for cross-session knowledge synthesis.

In the proxy, this looks like a MAIN call, then two or three SUBAGENT calls firing in quick succession with noticeably smaller context windows (no extended thinking, focused system prompts, limited tool sets), then the main conversation resuming. The subagent calls are often cheaper per-call, but they add up. A complex refactoring prompt might generate 15+ API calls total, and you’d never know from the terminal output.

laudec tags each subagent by role by inspecting the system prompt content. “file search specialist” becomes EXPLORE. “web search tool use” becomes WEB SEARCH. “Claude Code Guide” becomes CC GUIDE. These heuristics are fragile (they depend on prompt wording that Anthropic changes between releases), but they make the multi-agent orchestration visible.
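Put together with the call types from the proxy view, the tagging could look roughly like the sketch below. The three marker strings come from the paragraph above; everything else is my assumption, in particular treating "system prompt matches a known marker" as the test for SUBAGENT versus MAIN, which may not be how laudec draws that line.

```python
ROLE_MARKERS = {
    "file search specialist": "EXPLORE",
    "web search tool use": "WEB SEARCH",
    "Claude Code Guide": "CC GUIDE",
}


def system_text(body: dict) -> str:
    """Flatten the system field, which may be a string or a list of text blocks."""
    system = body.get("system", "")
    if isinstance(system, str):
        return system
    return "\n".join(b.get("text", "") for b in system if isinstance(b, dict))


def classify(path: str, body: dict):
    """Return (call_type, role) for one proxied request."""
    if path.split("?")[0].rstrip("/").endswith("/count_tokens"):
        return "TOKEN COUNT", None
    if body.get("max_tokens") == 1:
        return "QUOTA", None
    text = system_text(body)
    for marker, role in ROLE_MARKERS.items():
        if marker in text:
            return "SUBAGENT", role
    return "MAIN", None  # assumption: anything unrecognized is a main turn


print(classify("/v1/messages", {
    "max_tokens": 8192,
    "system": [{"type": "text", "text": "You are a file search specialist."}],
}))  # ('SUBAGENT', 'EXPLORE')
```

Keeping the markers in one table makes the fragility explicit: when the wording changes upstream, this is the only place to update.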

Tool decisions happen before tool results

The OTEL telemetry separates tool_decision events from tool_result events. This means you can see what Claude Code considered doing, not just what it did. A tool_decision fires when Claude Code evaluates whether to allow a tool the model requested, capturing the accept/reject outcome and the decision source (config rule, hook, user approval). A tool_result fires when the tool actually executes. The gap between them is where permission checks, sandbox validation, and user approval happen.

In the events tab, you can trace the full chain: user_prompt → api_request → tool_decision (Read) → tool_result (success, 45ms) → tool_decision (Edit) → tool_result (success, 12ms) → api_request → … You can see exactly where time goes. In sessions with many tool calls, the cumulative tool execution time can rival or exceed the API latency.

The tool_result events also carry a success boolean, and when OTEL_LOG_TOOL_DETAILS=1 is set, they include the tool_input with file paths, search patterns, and command arguments (truncated to ~4K characters). This means you can see not just that a tool was used, but what it was asked to do and whether it worked. Failed tool calls show up in laudec’s metrics tab as red failure counts next to each tool name.

Context growth is predictable, mostly

In a typical session, input tokens grow roughly linearly. Each turn adds your prompt, the model’s response, and any tool results to the conversation history. The context growth chart in laudec’s insights tab makes this staircase pattern visible.

But there are disruptions. A large file read (the Read tool pulling in a 2,000-line source file) causes a sudden spike. Claude Code’s internal conversation compaction, which fires when context approaches the model’s limit, causes a sharp drop. And subagent calls don’t grow the main context at all since they have their own isolated conversation.

Worth paying attention to: the relationship between cache reads and context size. As the session progresses and the context window fills, the ratio of cached tokens to fresh input tokens increases. The system prompt and early conversation history stay cached while only the newest messages are “fresh.” Longer sessions are actually more cost-efficient per-turn than short ones, up to the point where compaction fires and reshuffles the cache.
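The ratio is easy to compute from the usage block of each response. A small sketch with invented per-turn numbers, assuming the usual split where input_tokens counts only the uncached part of the input:

```python
def cached_share(usage: dict) -> float:
    """Fraction of this call's input tokens that were served from cache."""
    cached = usage.get("cache_read_input_tokens", 0)
    total = (
        cached
        + usage.get("cache_creation_input_tokens", 0)
        + usage.get("input_tokens", 0)
    )
    return cached / total if total else 0.0


# Invented usage blocks for three consecutive turns of one session.
turns = [
    {"input_tokens": 40, "cache_creation_input_tokens": 30000, "cache_read_input_tokens": 0},
    {"input_tokens": 60, "cache_creation_input_tokens": 1500, "cache_read_input_tokens": 30000},
    {"input_tokens": 25, "cache_creation_input_tokens": 900, "cache_read_input_tokens": 31500},
]
shares = [round(cached_share(u), 3) for u in turns]
print(shares)  # [0.0, 0.951, 0.971]
```

The first call pays to create the cache, and every call after it reads most of its input back at the cheaper rate.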

The quota check

Before the first real API call in a session, Claude Code sends a request with max_tokens: 1. This is a quota check: a near-zero-cost probe to verify that the API key is valid and rate limits haven’t been hit before committing tokens to a real call.

You can see these in the proxy tab as QUOTA-type calls. They return almost instantly (usually under 200ms) and consume negligible tokens. If you’re troubleshooting authentication or rate limit issues, these are the first calls to inspect.

Rate limit headroom

Anthropic’s API responses include headers like x-ratelimit-remaining-requests and x-ratelimit-remaining-tokens. Claude Code doesn’t surface this information anywhere in its UI. But the proxy captures every response header, and laudec’s insights tab tracks these values over time.

In normal usage, rate limits are a non-issue. But in heavy sessions, especially those with many subagent calls, you can watch the remaining-requests counter drop. If you’re running multiple Claude Code instances or using agentic orchestration tools that spawn parallel sessions, this visibility matters. laudec’s threshold warnings (red highlights when remaining requests drop below 10 or remaining tokens below 10,000) make it possible to anticipate rate limit problems rather than discovering them mid-session.
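The threshold check itself is a few lines. This sketch uses the header names and the 10 / 10,000 floors mentioned above; the function name and message wording are hypothetical.

```python
REQUEST_FLOOR = 10
TOKEN_FLOOR = 10_000


def rate_limit_warnings(headers: dict) -> list:
    """Return human-readable warnings for any counter below its floor."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    warnings = []
    requests_left = lowered.get("x-ratelimit-remaining-requests")
    tokens_left = lowered.get("x-ratelimit-remaining-tokens")
    if requests_left is not None and int(requests_left) < REQUEST_FLOOR:
        warnings.append(f"only {int(requests_left)} requests remaining")
    if tokens_left is not None and int(tokens_left) < TOKEN_FLOOR:
        warnings.append(f"only {int(tokens_left)} tokens remaining")
    return warnings


print(rate_limit_warnings({
    "X-RateLimit-Remaining-Requests": "7",
    "x-ratelimit-remaining-tokens": "250000",
}))  # ['only 7 requests remaining']
```

Missing headers produce no warning rather than an error, since not every response is guaranteed to carry them.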

Stop reasons tell you how the model is being used

Every API call ends with a stop_reason: end_turn (the model finished its response), tool_use (the model wants to call a tool and is yielding control), or max_tokens (the response hit the token limit).

In a healthy session, you’ll see a mix of tool_use and end_turn stops. tool_use stops dominate during active work (the model is in a loop of reading, editing, running commands), and end_turn appears when the model reports back to you.

A session full of max_tokens stops tells a different story. It means the model is repeatedly hitting the output ceiling, which usually indicates the context window is nearing its limit and responses are getting truncated. Watching the stop reason distribution in laudec’s insights tab alongside the context growth chart gives you early warning that a session is running hot.
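A sketch of that distribution, with a made-up cutoff for "running hot". The 25% threshold is an arbitrary number I picked for illustration, not something laudec defines.

```python
from collections import Counter


def stop_reason_shares(stop_reasons):
    """Map each stop reason to its share of all calls in the session."""
    counts = Counter(stop_reasons)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {reason: count / total for reason, count in counts.items()}


def running_hot(stop_reasons, threshold=0.25):
    """True when max_tokens stops make up at least `threshold` of the calls."""
    return stop_reason_shares(stop_reasons).get("max_tokens", 0.0) >= threshold


healthy = ["tool_use"] * 6 + ["end_turn"] * 2
truncated = ["max_tokens"] * 3 + ["tool_use"]
print(stop_reason_shares(healthy))  # {'tool_use': 0.75, 'end_turn': 0.25}
print(running_hot(healthy), running_hot(truncated))  # False True
```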

Cost scales with decisions, not prompts

What you pay has almost nothing to do with how many prompts you type. It depends on what Claude Code decides to do with each one. A 10-prompt session where each prompt triggers a single tool call costs far less than a 3-prompt session where each prompt triggers a multi-step tool chain with subagent exploration, file reads, edits, and verification.

The OTEL api_request events make this visible. Each event carries a cost_usd attribute calculated by Claude Code itself. Sorting sessions by cost and comparing them to prompt count reveals that the biggest cost driver is usually one or two prompts that trigger deep exploration or complex multi-file edits. The “fix the tests” prompt that spawns 8 subagent calls and reads 15 files costs more than the rest of the session combined.
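Rolling cost up by prompt is a short aggregation once the events are in hand. In this sketch the events are plain dicts with invented values; the "event" key is my own stand-in for however the event name is stored, while prompt.id and cost_usd are the attributes described above.

```python
from collections import defaultdict


def cost_by_prompt(events):
    """Sum api_request cost per prompt.id, most expensive prompt first."""
    totals = defaultdict(float)
    for event in events:
        if event.get("event") == "api_request":
            totals[event["prompt.id"]] += event["cost_usd"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Invented events: p2 fans out into two API calls, p1 into two cheap ones.
events = [
    {"event": "api_request", "prompt.id": "p1", "cost_usd": 0.125},
    {"event": "api_request", "prompt.id": "p2", "cost_usd": 0.25},
    {"event": "tool_result", "prompt.id": "p2"},
    {"event": "api_request", "prompt.id": "p2", "cost_usd": 0.5},
    {"event": "api_request", "prompt.id": "p1", "cost_usd": 0.0625},
]
ranking = cost_by_prompt(events)
print(ranking)  # [('p2', 0.75), ('p1', 0.1875)]
```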

Once you see this, it affects how you write prompts. Specific, well-scoped requests (“fix the type error in parser.rs line 42”) generate simple tool chains. Broad requests (“refactor the authentication system”) trigger deep subagent exploration. Both are fine. But without visibility into the actual call graph, you can’t know what each one costs or why.

Try it

laudec is on GitHub. It’s MIT licensed, written in Rust with a Svelte dashboard, and I’d welcome feedback on what’s useful and what’s missing.

The point of laudec is to learn. I wanted to see what was actually happening when I handed my project to an AI coding agent and said “fix the tests.” Now I can. If you’re curious about the same thing, give it a try.

NixClaw: Declarative AI Agents on NixOS

2026-03-11T00:00:00-07:00

I wanted AI agents I could spin up per-project, each with its own workspace and chat channel, running on infrastructure I control. A proper declarative system where the entire machine — disk layout, services, secrets, agent bindings — lives in version-controlled Nix.

NixClaw is what I ended up building. It’s a dedicated NixOS VM on Proxmox that runs an OpenClaw agent gateway connected to my self-hosted Mattermost. Each agent gets a private Gitea repo, a Mattermost channel, and a workspace that auto-syncs every 15 minutes. The whole thing deploys from my MacBook in one command.

The Stack

The pieces:

Proxmox 8.x — QEMU/KVM hypervisor, already running my homelab
NixOS 25.11 — the OS, declared in ~300 lines of Nix
nixos-anywhere + disko — remote provisioning, declarative disk partitioning
OpenClaw — the agent gateway, MIT licensed, Mattermost-native
Mattermost — self-hosted chat at mattermost.fitzsky.com
Gitea — self-hosted git at gitea.fitzsky.com, HTTPS auth
sops-nix + age — encrypted secrets, one key, one file
Tailscale — SSH access, zero open ports
Podman — rootless containers for agent tool sandboxing
Brave Search API — gives agents web search as a built-in tool

Everything is Nix except the mutable agent bindings file (agents.json5), which the gateway hot-reloads. That’s intentional — I don’t want to nixos-rebuild every time I create an agent.

Architecture

Proxmox VM "nixclaw" (NixOS 25.11, x86_64-linux)
├─ devinbernosky (admin) — SSH over Tailscale
└─ openclaw (service user)
   ├─ openclaw-gateway — HM user service, loopback-only :18789
   ├─ Mattermost bot "Operator" — routes messages to agents
   ├─ Per-project workspaces — git repos on Gitea
   ├─ Shared files — USER.md + TOOLS.md symlinked into all workspaces
   ├─ Podman sandbox — rootless containers for tool execution
   ├─ Brave web search — built-in tool
   └─ git-sync timer — auto-commits every 15min

Two users. The admin (devinbernosky) SSHs in over Tailscale and manages config. The service user (openclaw) runs the gateway as a Home Manager user service and owns all the workspaces. Clean separation.

Prerequisites

Before touching Nix, I needed three external services ready.

Mattermost: Create a bot account (I called mine “Operator”), save the token. Grab your team ID:

curl -s https://mattermost.fitzsky.com/api/v4/teams \
  -H "Authorization: Bearer <bot-token>" | jq '.[0].id'

Gitea: Create a “NixClaw” organization, a skills repo for shared agent skills, and an access token. One thing to know: if Gitea runs behind Docker, SSH won’t work because the host’s sshd grabs port 22 first. Everything goes over HTTPS.

Brave Search: Grab an API key from brave.com/search/api. That’s it.

The VM

In Proxmox, create a VM with UEFI (OVMF), q35 machine type, VirtIO SCSI, and QEMU agent enabled. I gave mine 256GB disk, 10 CPU threads, and 8GB RAM. That’s probably overkill for what amounts to a gateway process, but I had the headroom.

Boot the NixOS 25.11 minimal ISO. At the boot menu, pick “Linux LTS” — that’s a kernel option in the boot menu, not a separate ISO.

Deploying in One Shot

This is where it gets fun. nixos-anywhere lets you go from a live ISO to a fully configured NixOS install in one command, from your local machine.

On the VM console, set a temp root password and note the IP:

sudo passwd root
ip addr

On your Mac, stage the age key so sops-nix can decrypt secrets on the new system:

mkdir -p /tmp/nixclaw-extra/root/.config/sops/age
cp ~/.config/sops/age/keys.txt /tmp/nixclaw-extra/root/.config/sops/age/keys.txt
chmod 600 /tmp/nixclaw-extra/root/.config/sops/age/keys.txt

Then deploy:

nix run github:nix-community/nixos-anywhere -- \
  --flake "path:$HOME/Github/nix-config#nixclaw" \
  --extra-files /tmp/nixclaw-extra \
  root@<vm-ip>

This partitions the disk (via disko), installs NixOS from the flake, copies the age key into place, and reboots. The whole system — users, services, secrets, agent gateway — materializes from the flake definition.
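
The --extra-files directory mirrors the target filesystem: whatever sits under the staging root is copied to the same path under / on the new system. As a quick sanity check before deploying, here is a minimal sketch. staged_targets is a hypothetical helper of mine, not part of nixos-anywhere.

```shell
# Hypothetical helper: list where each staged file should land on the target.
# $1 is the staging directory that gets passed to --extra-files.
staged_targets() (
  cd "$1" || exit 1
  # Paths relative to the staging root become absolute paths on the new system.
  find . -type f | sed 's|^\.||' | sort
)
```

Running staged_targets /tmp/nixclaw-extra after the staging steps above should print /root/.config/sops/age/keys.txt, which is where sops-nix is expected to find the key in this setup.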

One subtlety: nixos-anywhere SSHs into the live ISO’s sshd, which allows root login by default. The PermitRootLogin = "no" in my system config only takes effect after install. No conflict.

Post-Boot

SSH in as the admin user with the default password:

ssh devinbernosky@<vm-ip>
# password: changeme

Tailscale first:

sudo tailscale up --ssh
passwd  # change from default immediately

After Tailscale is up, ssh nixclaw works from any device on the tailnet. No passwords, no port forwarding, no DNS records. From here on, everything goes through Tailscale.

Clone the nix-config repo. On a fresh system, Home Manager hasn’t activated yet, so gh isn’t on PATH. Bootstrap with nix run:

nix run nixpkgs#gh -- auth login
nix run nixpkgs#gh -- repo clone devindudeman/nix-config ~/nix-config

Rebuild:

cd ~/nix-config/hosts/nixclaw
just deploy

This activates everything — sops secrets get decrypted, the gateway starts, GH_TOKEN lands in fish shell, SSH sessions start auto-cd’ing to the host config directory. Reconnect and verify:

ssh nixclaw
just status   # gateway should be active
just logs     # should show "connected as Operator"

One more thing — clone the shared skills repo:

just clone-skills

Creating Agents

This is the daily workflow. One command:

just new-agent

Behind the scenes, this:

Creates a private Gitea repo in the NixClaw org
Initializes a workspace from template (real files, not symlinks)
Symlinks shared USER.md and TOOLS.md into the workspace
Pushes initial commit to Gitea
Creates a public Mattermost channel (or restores it if soft-deleted)
Adds the bot and my user as channel members
Patches agents.json5 — the gateway hot-reloads, no restart needed

Send a message in the new channel. The agent responds immediately, no @mention required (chatmode: "onmessage").

Tearing one down is just as clean:

just delete-agent

Day-to-Day

SSH sessions land in the host config directory automatically. Everything runs through the justfile:

# Gateway
just status     # is it running?
just logs       # tail the log
just restart    # bounce it

# Config updates
just pull       # git pull --rebase
just deploy     # nixos-rebuild switch
just push       # push changes back

# Agents
just new-agent 
just delete-agent 
just sync       # trigger manual git sync

Config changes go through the normal Nix workflow: edit, rebuild, push. Agent lifecycle is entirely outside Nix — just the justfile and the mutable agents.json5.

What’s Automated, What’s Not

Automated:

Gateway starts on boot (systemd lingering)
Mattermost config injected via ExecStartPre
Secrets decrypted from sops at service start
Workspaces sync to Gitea every 15 minutes
GH_TOKEN available in shell from sops

Manual:

Initial gh auth login (chicken-and-egg with GH_TOKEN on first deploy)
One-time Tailscale auth
Agent creation (just new-agent)
Updating shared USER.md and TOOLS.md content

The line between automated and manual is intentional. Agent creation is a human decision. Everything after that decision is automated.
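
To illustrate the automated side, one pass of the 15-minute workspace sync could look roughly like this. It is a sketch under assumptions, not the actual timer script: sync_workspace, the commit message format, and the remote name origin are all hypothetical.

```shell
# Hypothetical sketch of one sync pass for a single workspace directory.
sync_workspace() (
  cd "$1" || exit 1
  # Nothing to do when the working tree is clean.
  [ -n "$(git status --porcelain)" ] || exit 0
  git add -A
  git commit -q -m "auto-sync $(date -u +%Y-%m-%dT%H:%M:%SZ)"
  # Push only when a remote named origin is configured (assumed name).
  if git remote get-url origin >/dev/null 2>&1; then
    git push -q origin HEAD
  fi
)
```

Skipping the commit when nothing changed keeps the history free of empty commits, so a timer can call this as often as it likes.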

Design Decisions Worth Noting

Why not SSH for Gitea? I run Gitea in Docker, and the host’s sshd intercepts port 22. I could remap ports, but HTTPS with token auth works fine and is one less thing to debug.

Why a mutable agents.json5? I didn’t want agent creation to require a full Nix rebuild. The gateway watches this file and hot-reloads when it changes. Nix manages the system, the justfile manages agents.
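
I haven’t looked at how the gateway detects changes internally, so the following is only a polling approximation of the idea, not OpenClaw’s implementation. file_changed and the state-file path are hypothetical names.

```shell
# Hypothetical change detector: succeeds when the file's checksum differs
# from the one recorded on the previous call, then records the new checksum.
file_changed() (
  file=$1
  state=$2
  new=$(cksum < "$file")
  old=$(cat "$state" 2>/dev/null || true)
  printf '%s\n' "$new" > "$state"
  [ "$new" != "$old" ]
)

# Usage sketch (the paths are assumptions):
#   while sleep 2; do
#     file_changed agents.json5 /tmp/agents.cksum && echo "reload agents"
#   done
```

The first call always reports a change because no checksum has been recorded yet, which is a reasonable default for loading config at startup.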

Why public Mattermost channels? I want to be able to browse agent conversations from any device. The Mattermost instance is self-hosted and private anyway — “public” just means visible within the team.

Why file-based gateway logs? OpenClaw logs to /tmp/openclaw/openclaw-gateway.log, not journald. That’s how the upstream packages it. just logs wraps tail -f on that path.
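
Since the log is a plain file, a scripted health check is just a grep. A minimal sketch, assuming the “connected as Operator” line from the post-boot check is what a healthy gateway writes. gateway_connected is a hypothetical helper, not a justfile recipe I actually have.

```shell
# Hypothetical health check: succeeds if the gateway log contains the
# connection message. The default path is the one the gateway logs to.
gateway_connected() (
  log=${1:-/tmp/openclaw/openclaw-gateway.log}
  grep -q 'connected as Operator' "$log"
)

# Usage:
#   gateway_connected && echo "gateway is up" || echo "gateway not connected"
```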

Why Podman, not Docker? Rootless. The openclaw user runs containers without root privileges. This matters when you’re giving AI agents the ability to execute code.

What I’d Do Differently

Honestly, not much. The deploy story with nixos-anywhere is excellent — going from bare VM to running agents in one command still feels like magic. If I were starting over, I might explore running the gateway in a container itself for even more isolation, but the current setup with a dedicated service user and rootless Podman for tool execution is clean enough.

The biggest friction point is the initial gh auth login bootstrap. On a completely fresh system, you need gh to clone the repo that provides gh. The nix run nixpkgs#gh workaround handles it, but it’s one of those things that makes you appreciate the chicken-and-egg problems in declarative systems.

If you’re running OpenClaw or thinking about self-hosted AI agents, the NixOS approach is worth the investment. Declarative config means I can blow away the VM and rebuild it from scratch in minutes. Every decision is documented in code. And when something breaks, just logs is one command away from the answer.

Setting Up This Blog

2026-03-05T00:00:00-08:00

Been meaning to set one of these up for a while. I’ve always liked GitHub Pages, so I went with that. I’d never used Jekyll before, but it felt like the right path for a personal site where the main job is writing and publishing cleanly.

The Structure

I started from an empty repo and scaffolded a basic Jekyll structure — layouts, includes, posts, a few pages. The first design pass was a little too playful, so I tightened it into something minimal. Sharper typography, cleaner spacing, less decoration. The style should stay out of the way of the writing.

The config is intentionally small:

title: Devin Bernosky
url: "https://devin.fitzsky.com"
permalink: /:title/
markdown: kramdown
plugins:
  - jekyll-feed
  - jekyll-seo-tag
defaults:
  - scope:
      path: ""
      type: "posts"
    values:
      layout: post
      author: Devin

The permalink: /:title/ is key — URLs are just the slug with no date prefix. Posts live in _posts/ as YYYY-MM-DD-title.md, but the URL comes out clean: devin.fitzsky.com/setting-up-this-blog/.
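
To make the filename-to-URL mapping concrete, here is a small shell sketch that mimics it for the simple case. post_url is a hypothetical helper, and it only handles plain YYYY-MM-DD-title.md names. Jekyll’s own slug handling covers more edge cases, including a slug override in front matter.

```shell
# Hypothetical helper: derive the /:title/ URL path from a post filename.
# Assumes the plain YYYY-MM-DD-title.md naming used in _posts/.
post_url() (
  name=$(basename "$1")   # drop the _posts/ directory
  name=${name%.*}         # drop the .md extension
  # Strip the leading YYYY-MM-DD- date prefix.
  slug=$(printf '%s\n' "$name" | sed -E 's/^[0-9]{4}-[0-9]{2}-[0-9]{2}-//')
  printf '/%s/\n' "$slug"
)
```

For example, post_url _posts/2026-03-05-setting-up-this-blog.md prints /setting-up-this-blog/.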

Deploy

Deploys run through GitHub Actions on pushes to main. The workflow builds the Jekyll site and ships it to Pages automatically.

I hit one gotcha early: configure-pages returned a Not Found response on the first run. Turns out GitHub Pages isn’t enabled on the repo until you either flip it on manually in Settings, or — what I did — set enablement: true in the action:

- name: Setup Pages
  uses: actions/configure-pages@v5.0.0
  with:
    enablement: true

That bootstraps Pages on the first deploy. After that it’s invisible.

Nix Dev Environment

I use Nix and devenv everywhere, so this was an easy choice. The repo has a flake with a devenv shell that pins Ruby, Bundler, and the system dependencies:

{ pkgs, ... }:
{
  packages = [
    pkgs.ruby_3_4
    pkgs.bundler
    pkgs.libyaml
    pkgs.gnumake
  ];

  scripts.setup.exec = "bundle install";
  scripts.serve.exec = "bundle exec jekyll serve --livereload --host 0.0.0.0 --port 4000";
}

Enter the shell, run devenv run setup once, then devenv run serve. Same result on every machine.

This mattered fast. GitHub Pages gems currently require commonmarker, which caps at Ruby < 4.0. I pinned Ruby 3.4.8 and locked the github-pages gem at version 232. Without the pin, you’ll hit dependency resolution failures the moment Ruby 4.x shows up on your system.
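
If you want the failure to be loud instead of buried in a resolver error, a guard like this could run before bundle install. It is a sketch, not something in my repo: ruby_ok is a hypothetical name, and it only looks at the major version.

```shell
# Hypothetical guard: succeeds only when the given Ruby version is below 4.0.
ruby_ok() (
  major=${1%%.*}          # "3.4.8" -> "3"
  [ "$major" -lt 4 ]
)

# Usage (assumes ruby is on PATH):
#   ruby_ok "$(ruby -e 'print RUBY_VERSION')" || echo "Ruby is too new for github-pages 232"
```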

The Gemfile is three lines:

source "https://rubygems.org"
gem "github-pages", "= 232", group: :jekyll_plugins
gem "webrick", "= 1.9.2"

webrick is there because Ruby 3.0 removed it from the standard library and Jekyll’s local server needs it.

DNS and Domain

DNS lives in Cloudflare for all things Fitzsky. Most of it routes through Cloudflare tunnels to self-hosted services — Mattermost, Gitea, that kind of thing. The blog is the exception: it’s just a CNAME pointing at GitHub Pages.

devin  CNAME  devindudeman.github.io

A CNAME file in the repo root tells GitHub Pages to serve the custom domain, and HTTPS enforcement is on in the repo settings.
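
A quick way to confirm the record resolves as intended is to compare what dig returns against the expected target. This is a minimal sketch: cname_matches is a hypothetical helper, and it assumes the record is DNS-only in Cloudflare, since a proxied record would not return the CNAME.

```shell
# Hypothetical check: compare a CNAME answer against the expected target.
# dig prints the target with a trailing dot, so strip it before comparing.
cname_matches() (
  actual=${1%.}
  [ "$actual" = "$2" ]
)

# Usage (needs network access and dig):
#   cname_matches "$(dig +short CNAME devin.fitzsky.com)" devindudeman.github.io \
#     && echo "CNAME ok"
```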

Writing Flow

This is the best part. Open a file, write Markdown, push. That’s it.

# new post
hx _posts/2026-03-11-whatever-im-writing-about.md

# front matter
---
title: Whatever I'm Writing About
description: One-line summary.
---

# write, commit, push
git add . && git commit -m "New post" && git push

Live in about 90 seconds. No build step to think about, no deploy to trigger. Push to main and it’s on the internet.
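
If typing the filename and front matter by hand gets old, the scaffolding is easy to script. A sketch under assumptions: new_post is a hypothetical helper, the slug rule (lowercase, apostrophes dropped, everything else collapsed to hyphens) is mine, and the description line is a placeholder to edit.

```shell
# Hypothetical helper: create a dated post file with minimal front matter
# and print its path. Arguments: title, date (YYYY-MM-DD), posts directory.
new_post() (
  title=$1
  date=$2
  dir=${3:-_posts}
  # Build the slug: drop apostrophes, lowercase, collapse other runs to "-".
  slug=$(printf '%s' "$title" | tr -d "'" | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//')
  file="$dir/$date-$slug.md"
  mkdir -p "$dir"
  printf '%s\n' '---' "title: $title" 'description: One-line summary.' '---' > "$file"
  printf '%s\n' "$file"
)

# Usage:
#   hx "$(new_post "Whatever I'm Writing About" 2026-03-11)"
```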

Let’s see how well I can keep this updated.