Building an Automated Bug-Fixing Pipeline with Claude Cowork


The manual bug-fixing loop is a tax on every engineering team. Sentry fires an alert, someone creates a Linear ticket, a developer picks it up hours later, digs through stack traces, writes a fix, writes a test, opens a PR. Each step has a hand-off. Each hand-off has latency. With Claude Cowork's scheduled task feature, you can automate everything up to the human review — and end up with a Pest test that proves the fix works.

The Architecture of an Automated Bug-Fixing Pipeline

The pipeline connects four systems:

  • Sentry — where production errors land, with stack traces, breadcrumbs, and Seer AI analysis. If you're self-hosting, my guide to installing Sentry self-hosted on EC2 with Docker, Forge, and SSL covers the full setup.
  • Linear — where bugs get triaged and tracked
  • Your Laravel codebase — where the actual fix lives
  • GitHub — where the PR lands for human review

flowchart LR
    A[Sentry Error] --> B[Linear Ticket<br/>Triage status]
    B --> C[Claude Cowork<br/>Scheduled Task]
    C --> D[Pull error context<br/>from Sentry + Seer]
    D --> E[Read source files<br/>Create git branch]
    E --> F[Implement fix<br/>Write Pest test]
    F --> G[Run composer test]
    G --> H[Open GitHub PR]
    H --> I[Update Linear<br/>In Review]
    H --> J[Resolve Sentry issue]

The scheduled task runs every 4 hours. It scans Linear for tickets in "Triage" status that have a Sentry issue ID attached. For each one, it pulls the full error context, implements a fix, and hands the PR to you for review.
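For reference, the scan Claude performs through the Linear MCP is roughly equivalent to this GraphQL query against Linear's API — the team and label names come from my prompt, and the exact filter shape the MCP sends may differ:

```graphql
query TriagedBugs {
  issues(
    first: 10
    filter: {
      team: { name: { eq: "MyTeam" } }
      state: { name: { eq: "Triage" } }
      labels: { some: { name: { eq: "bug" } } }
      assignee: { null: true }
    }
  ) {
    nodes {
      identifier
      title
      description
    }
  }
}
```

The description field is what matters here — that's where the Sentry issue URL lives, and it's what qualifies a ticket for the pipeline.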

Setting Up the Automated Bug-Fixing Scheduled Task

Cowork's schedule feature lets you define a task with a name, cron expression, and a prompt. The prompt is the full set of instructions Claude runs autonomously on each tick.

Create the task via the schedule shortcut in Cowork, then give it the following prompt — adapt the project names to your own setup:

Name: fix-production-bugs
Schedule: 0 */4 * * * (minute 0 of every fourth hour)
Plugins required: Sentry, Linear, GitHub

The prompt drives everything. Here's a condensed version of what mine looks like:

You are an automated bug-fixing agent for my-laravel-app.

## Step 1: Scan Linear for bugs to fix

Search Linear for issues in team "MyTeam" with:
- Status: Triage
- Label: bug
- A Sentry issue URL in the description

Take the first unassigned ticket. If none exist, stop and report "No bugs to fix."

## Step 2: Move ticket to In Progress

Update the Linear ticket (e.g. TICKET-123) status to "In Progress".
Post a comment: "🤖 Bug-fixing agent starting work on this issue."

## Step 3: Pull error context from Sentry

Extract the Sentry issue ID from the ticket description.
Fetch from Sentry:
- Full stack trace and affected files
- Breadcrumbs from the last 10 events
- Seer AI root cause analysis if available

## Step 4: Read the codebase

Read the source files identified in the stack trace.
Read the relevant test files in tests/Feature/.
Understand the data flow that leads to the error.

## Step 5: Create a branch and implement the fix

Create a git branch: fix/TICKET-123-{short-description}

Implement a targeted fix following Laravel best practices.
Run Pint on changed files only: ./vendor/bin/pint {changed-files}
Do not refactor unrelated code.

## Step 6: Write a Pest test

Write a test that reproduces the original bug using the exact input
shape from the Sentry breadcrumbs. It must fail on unfixed code
and pass with the fix applied.

## Step 7: Run the full test suite

Run: composer test

If any test fails, diagnose and fix before proceeding.
Do not open a PR if tests are failing.

## Step 8: Commit and push

Commit: "fix: {description} (closes TICKET-123)"
Push the branch to origin.

## Step 9: Open a GitHub PR against develop

Title: "[BUG] {description} (TICKET-123)"
Body: stack trace summary, Seer root cause, files changed, test added.

## Step 10: Close the loop

Mark the Sentry issue as "Resolved in next release".
Update the Linear ticket to "In Review" with a comment linking the PR.

## If you cannot determine a safe fix

Post a detailed analysis comment on the Linear ticket explaining
what was found and why a fix couldn't be determined.
Move the ticket back to "Triage" with label "needs-human".
Stop — do not open a partial or speculative PR.

That's the full pipeline. Claude reads it top-to-bottom, executing each step with the Sentry, Linear, and GitHub MCPs connected as plugins.

The Pipeline Step by Step

Here's what actually happens for a real bug — a TypeError thrown in a payment webhook handler because an optional field arrives as null.

Sentry gives Claude everything it needs. The MCP pulls the stack trace, the breadcrumbs showing the exact payload that triggered the error, and Seer's root cause analysis. For a bug like this, Seer typically identifies the missing null guard right away.

The fix is targeted. Claude reads the webhook handler, identifies the line, adds the null guard or validation. It doesn't refactor adjacent code or second-guess the wider design.
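In code, a fix of that shape is a couple of lines. This is an illustrative sketch — the property and method names here are hypothetical, not pulled from a real diff:

```php
<?php

// app/Http/Controllers/StripeWebhookController.php (names are hypothetical)

// Before — assumed customer_email was always a string:
// $this->receipts->send($object['customer_email'], $object['id']);

// After — guard the optional field before using it:
$email = $object['customer_email'] ?? null;

if ($email !== null) {
    $this->receipts->send($email, $object['id']);
}
```

The diff stays small on purpose: a guard at the point of failure, nothing else touched.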

The Pest test reproduces the original failure. That's where the real value is — a test that would have caught this before it hit production.

Writing Tests That Reproduce the Bug

The test has one job: fail on the unfixed code, pass after the fix. Here's the pattern I want Claude to generate:

it('handles null customer_email in Stripe webhook payload', function () {
    // Arrange — the exact payload shape that triggered the Sentry error
    $payload = json_encode([
        'type' => 'payment_intent.succeeded',
        'data' => [
            'object' => [
                'id'             => 'pi_test_123',
                'customer_email' => null, // This was the trigger in production
            ],
        ],
    ]);

    // Act — post to the actual handler endpoint
    $response = $this->postJson('/webhooks/stripe', json_decode($payload, true), [
        'Stripe-Signature' => computeStripeSignature($payload), // project test helper that signs the payload
    ]);

    // Assert — graceful handling, not a 500
    $response->assertOk();
});

The key is using the exact broken input from the Sentry breadcrumbs — not a sanitised version, not a happy-path payload with the field restored. The actual data that caused the failure, flowing through the actual handler. Anything softer than that and you're not really testing the fix.

Graceful Failure Handling

Not every bug is automatable. When Claude can't determine a safe fix — ambiguous root cause, insufficient context, a race condition, anything touching auth or billing logic at a system level — it should not guess.

The explicit escape hatch in the prompt handles this: post a structured analysis comment to Linear and move the ticket back to "Triage" with needs-human. The comment includes what was pulled from Sentry, what was considered, and why a fix couldn't be safely implemented. That's still useful output — it's a head start for whoever picks it up next.
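It helps to give the prompt an explicit shape for that comment rather than leaving it to taste. A template like this works — the headings are my convention, not anything Cowork produces by default:

```markdown
🤖 **No safe automated fix — needs human review**

**Sentry issue:** {link}
**What I found:** {stack trace summary, affected files, Seer analysis}
**Fixes considered:** {each candidate, and why it was rejected}
**Why I stopped:** {e.g. the correct behaviour depends on a product decision}
**Suggested starting point:** {file and line from the stack trace}
```

With a fixed structure, the needs-human tickets become scannable in Linear instead of each one being a wall of freeform agent output.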

The discipline here matters. A partial or speculative PR is worse than no PR. Forcing an explicit analysis comment before stopping means the agent's failures are productive, not silent.

Reusing Across Projects

The prompt structure is stable across projects. What changes per project:

  • Linear team name and project identifier
  • Sentry organisation/project slug
  • The git remote and base branch (develop vs main)
  • Project-specific test conventions — factory names, helper functions, test database setup
  • The composer test command (some projects alias this differently)

I keep a base prompt in my vault and copy it for each project, swapping five or six lines. Setting up the pipeline for a new service takes about ten minutes once the MCPs are connected.
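That last item is the one that bites: the prompt runs composer test verbatim, so the alias has to exist before the first tick. A typical Laravel scripts block looks like this — adjust it to however your project wires in Pest:

```json
{
    "scripts": {
        "test": [
            "@php artisan config:clear --ansi",
            "@php artisan test"
        ]
    }
}
```

If your suite lives behind a different alias, either add a test script like this or update Step 7 of the prompt to match.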

The Human in the Loop

Every PR from this pipeline requires human review before it merges. That's intentional.

Claude handles null guards, type coercions, missing input validation, simple off-by-one errors — the class of bug that has a clear, localised fix visible in the stack trace. It does not handle bugs that require understanding business intent, data migration edge cases, or anything where the "correct" fix depends on product decisions.

For those, the graceful failure path is the right outcome. A thorough analysis comment is more valuable than a wrong fix.

Trust builds incrementally. Watch the first five or six PRs closely. Check that the Pest tests actually reproduce the original failure. Check that the diff is minimal. If the quality is there, you start merging them faster. If not, tighten the prompt — usually the escape hatch criteria or the test reproduction instructions.

Wrapping Up

The pipeline earns its keep not just by fixing bugs, but by leaving a test behind every time. After a month of running this, you end up with a regression suite that reflects your actual production failures — not just happy-path cases someone wrote on day one. Set it up, run it for a week with close review, and watch how many Sentry issues close before they reach your morning standup.

The PRs this pipeline opens flow directly into your existing deployment process — Zero-Downtime Laravel Deployments with GitHub Actions and Forge covers the CI/CD pipeline that runs tests and ships to production once a fix is reviewed and merged. For deepening the test coverage that catches regressions before they hit Sentry again, enforcing Laravel architecture rules with Pest's arch() helper adds structural guardrails that run in milliseconds on every commit.

Steven Richardson

Steven is a software engineer with a passion for building scalable web applications. He enjoys sharing his knowledge through articles and tutorials.