Automating Rollbar Bug Fixes with Codex App Daily Runs

06 March 2026 at 13:54 ~ 6 min read

Production errors should be a queue you can work through, not just a dashboard you occasionally inspect and create tickets from.

I wanted a daily Codex App automation that would look at new Rollbar issues, work out which ones had not already been addressed, prepare a fix, and open a pull request for each one.

The main blocker was not Rollbar itself. It was the lack of a good command-line interface for querying and triaging Rollbar items in a way an agent could use reliably.

The gap: Rollbar had a CLI, but not the one I needed

The official Rollbar CLI is useful, but its focus is deployment reporting and sourcemap uploads.

That is a sensible scope for Rollbar, but it does not cover the workflows I wanted to automate:

list new production errors from the last 24 hours
fetch occurrences and request payload context
filter by status, environment, level, and time window
update item state after a fix is prepared

For that, I wrote rollbar-cli, a small Go CLI for querying and triaging Rollbar items and occurrences.

It gives me both human-friendly terminal output and stable JSON/NDJSON output for automation. That second part matters more than it sounds. If you want an agent to do useful work repeatedly, it needs stable machine-readable input, not arbitrary terminal text that it has to scrape.
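To make the point about machine-readable output concrete, here is a minimal Python sketch of consuming NDJSON, where each line is one self-contained JSON record. The field names (`id`, `level`) and the sample records are hypothetical; the actual rollbar-cli output schema may differ.

```python
import json
from typing import Iterable, Iterator


def parse_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty line of NDJSON input."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        yield json.loads(line)


# Hypothetical sample; real field names depend on rollbar-cli's output.
sample = '{"id": 1, "level": "error"}\n\n{"id": 2, "level": "critical"}\n'
records = list(parse_ndjson(sample.splitlines()))
```

Because every line parses on its own, a consumer can stream records as they arrive and a single malformed line does not invalidate the rest of the output.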

The first building block: a Rollbar CLI that agents can trust

The CLI covers the operations I needed for daily triage:

rollbar-cli items list \
  --status active \
  --environment production \
  --level error \
  --level critical \
  --last 24h \
  --sort counter_desc \
  --limit 25 \
  --json

Then, when an item looks worth fixing:

rollbar-cli items get --id 275123456 --instances --json
rollbar-cli occurrences list --item-id 275123456 --ndjson

That sounds simple, but this is the point where the workflow becomes composable.

Instead of asking Codex to “go and inspect Rollbar somehow”, I can give it a concrete tool with predictable flags, stable output, and a small command surface. That removes a lot of ambiguity from the run.
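As an illustration of what "predictable flags" buys you, the following Python sketch assembles the `items list` invocation shown earlier and decodes its output. It uses only the flags that appear in this post; the helper names (`build_items_list_argv`, `list_items`) are hypothetical, and the shape of the decoded JSON is deliberately not assumed.

```python
import json
import subprocess
from typing import List, Sequence


def build_items_list_argv(levels: Sequence[str] = ("error", "critical"),
                          window: str = "24h",
                          limit: int = 25) -> List[str]:
    """Build the argv for the daily triage query (flags taken from the post)."""
    argv = ["rollbar-cli", "items", "list",
            "--status", "active",
            "--environment", "production"]
    for level in levels:
        argv += ["--level", level]  # the flag is repeated once per level
    argv += ["--last", window,
             "--sort", "counter_desc",
             "--limit", str(limit),
             "--json"]
    return argv


def list_items(argv: List[str]) -> object:
    """Run the command and decode stdout as JSON; raises if the CLI fails."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

Passing the arguments as a list rather than a shell string avoids quoting problems, and `check=True` turns a non-zero exit code into an exception instead of a silently empty result.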

The second building block: teach the agent the workflow once

The CLI alone is not enough. An agent also needs a repeatable operating manual.

So I added a rollbar-cli skill to the same repo and documented the core commands and triage flow there. That means I can give Codex a short task and let the skill supply the detailed process:

start with production items at the error and critical levels
narrow to the last 24 hours
inspect occurrences for payload context
decide whether the issue is already addressed
only then move into a code change

I also paired that with the Yeet skill for the GitHub side of the workflow, so opening and managing PRs became part of the same repeatable system.

This combination matters more than one giant prompt. The prompt stays short, while the operational detail lives in tools and skills that can be re-used across repos.

The daily Codex App automation

Once the tooling existed, the automation itself became straightforward.

I set up a daily Codex App automation to run this prompt:

Can you look at rollbar errors in the last day that are new and for those not already addressed create fixes with an associated PR for each fix

I run that against repositories that already use Rollbar for production error reporting.

The key phrase is “not already addressed”. Without that constraint, the automation would happily re-open work that already has a branch, a fix in progress, or an existing PR.

At a high level, the daily run looks like this:

Query recent Rollbar items from the last 24 hours.
Filter down to new, relevant production issues.
Inspect item details and occurrences for enough debugging context.
Check whether the error already has a fix or PR in flight.
If not, create the code change, verify it, and open a dedicated PR.
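The filtering steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the logic the agent actually runs: the item fields `id` and `first_occurrence_timestamp` (Unix seconds) are assumed, the `rollbar/<id>` branch naming convention is hypothetical, and the set of existing branches is supplied by the caller.

```python
from typing import Iterable, List, Set

DAY_SECONDS = 24 * 60 * 60


def branch_for(item_id: int) -> str:
    """Hypothetical convention: one branch per Rollbar item."""
    return f"rollbar/{item_id}"


def items_needing_fix(items: Iterable[dict],
                      existing_branches: Set[str],
                      now: float) -> List[dict]:
    """Keep items first seen in the last 24 hours that have no branch in flight."""
    selected = []
    for item in items:
        # Assumed field: first_occurrence_timestamp, in Unix seconds.
        is_new = now - item["first_occurrence_timestamp"] <= DAY_SECONDS
        in_flight = branch_for(item["id"]) in existing_branches
        if is_new and not in_flight:
            selected.append(item)
    return selected
```

A deterministic branch name per item is one simple way to make "not already addressed" checkable: if the branch or its PR exists, the run can skip the item without any further reasoning.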

I prefer one bug, one branch, one PR. It keeps the review surface small and makes it obvious what should be deployed if the fix is correct.

Why this works better than treating the agent like a chatbot

There are four useful layers here:

Rollbar provides the production signal.
rollbar-cli turns that signal into a stable command-line interface.
Skills teach the agent how to use that interface and how to handle GitHub PR flow.
Codex App automations run the whole process on a schedule.

If any one of those layers is missing, the system gets much weaker.

Without the CLI, the agent has no clean way to query incidents. Without the skill, it has to rediscover the workflow on every run. Without the PR skill, you still end up doing the last operational step manually. Without the automation, it remains a clever demo rather than an ongoing process.

The broader lesson is that good agentic systems are usually built from small reliable interfaces, not from increasingly long prompts.

Guardrails I would keep in place

I would not let this sort of workflow merge directly to main without review.

Opening PRs daily is already useful. It means the expensive part (identifying the issue, reproducing context, preparing the patch, and packaging it for review) has happened before a human even starts looking.

The guardrails that matter most are:

only inspect fresh issues in a bounded time window
scope runs to production errors that matter
skip work already covered by an existing fix or PR
keep fixes small and isolated
run repo checks before opening the PR
require normal review and CI before merge
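Two of these guardrails, small fixes and passing checks, can be expressed as an explicit pre-PR gate. The sketch below is hypothetical throughout: the `ProposedFix` type, its fields, and the `MAX_FILES_CHANGED` threshold are illustrative choices, not part of the workflow described here.

```python
from dataclasses import dataclass
from typing import List

MAX_FILES_CHANGED = 5  # hypothetical threshold for "small and isolated"


@dataclass
class ProposedFix:
    item_id: int
    files_changed: int
    checks_passed: bool


def guardrail_violations(fix: ProposedFix) -> List[str]:
    """Return the reasons a fix should not be opened as a PR (empty if none)."""
    problems = []
    if fix.files_changed > MAX_FILES_CHANGED:
        problems.append("change touches too many files")
    if not fix.checks_passed:
        problems.append("repo checks failed")
    return problems
```

Returning the list of reasons, rather than a bare boolean, leaves a record of why a given item was skipped on a given day.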

That gives you acceleration without pretending observability data is always enough to guarantee a safe patch.

The practical payoff

The interesting part is not that an AI tool can draft code from an error report. Plenty of tools can do that once.

The interesting part is that this now runs as a recurring operational loop:

production errors arrive in Rollbar
a daily automation queries only the new ones
Codex uses the Rollbar skill to investigate them
Codex uses the PR workflow to open reviewable fixes

That is a different category of usefulness from ad hoc prompting. It turns error triage into a scheduled engineering routine.

In other words, the value was not just “use Codex on a bug”. The value was creating the missing interfaces so Codex could do that job repeatedly with less supervision.

Final thoughts

The hardest part of this was not writing the prompt. It was making the environment legible to the agent.

That meant:

a CLI for the missing Rollbar query surface
a skill that explains how to triage with that CLI
a PR skill to handle the GitHub hand-off
a Codex App automation to run the workflow every day

Once those pieces were in place, the prompt became almost boring. That is usually a good sign.