A customer emails to say checkout has been "slow all morning." You open your production server, tail laravel.log, and scroll. There are no traces, no latency percentiles, no dashboard — just a wall of text and a growing sense that you are guessing. This is where most Laravel apps live, and it is a solvable problem.
Laravel observability is the practice of instrumenting your app so that when something is slow or broken, the system can tell you what and where without you SSH-ing into a box. In this guide we will wire up the three pillars — metrics, traces, and logs — using Laravel Pulse, Laravel Nightwatch, and OpenTelemetry, then finish with an alerting strategy tied to the golden signals so you get paged for things that actually hurt users.
What you'll learn
- The difference between monitoring and observability, and why the three pillars matter for a Laravel app
- How to run Laravel Pulse in production with a custom recorder, sensible thresholds, and a locked-down dashboard
- How to add hosted monitoring and alerting with Laravel Nightwatch, including sampling
- How to add OpenTelemetry distributed tracing with auto-instrumentation and propagate trace IDs into your logs and queue jobs
Monitoring versus observability, and why Laravel needs both
Monitoring and observability get used interchangeably, but they answer different questions. Monitoring is checking known failure modes against thresholds you set in advance: CPU over 80%, error rate over 2%, queue depth over 1,000. You decide what to watch, and the system tells you when a number crosses a line. Observability is the broader ability to ask new questions of your running system after the fact — "why was this one user's checkout slow at 09:14?" — without shipping new code to find the answer. Monitoring tells you that something is wrong; observability lets you explore why.
You need both, and in a Laravel app they map onto familiar pieces. Monitoring is your Pulse dashboard and your alert rules. Observability is the trace data and structured logs that let you reconstruct a single request's journey through middleware, controller, database, cache, and queue. The mistake teams make is buying one tool and assuming it delivers both. A metrics dashboard will never explain a single slow request, and a pile of raw traces will never page you at 3 a.m. — you assemble observability from layers.
The conventional model is the three pillars: metrics, traces, and logs. Metrics are cheap, aggregated numbers over time — request rate, p95 latency, queue length. Traces follow one request end to end and show where the milliseconds went. Logs are timestamped events with context. Each pillar answers a different question, and the magic happens when you can pivot between them: spot a latency spike in metrics, jump to a slow trace, then read the logs for that exact request. If you are new to the tooling landscape, the comparison of Telescope, Debugbar, and Pulse is a good primer on which tool is built for local debugging versus production.
Layered on top of the three pillars are the four golden signals from Google's SRE practice: latency, traffic, errors, and saturation. Latency is how long requests take. Traffic is how much demand the system is under. Errors is the rate of failed requests. Saturation is how full your resources are: queue backlog, connection pool, memory. These four are the shortlist worth alerting on because each maps directly to user pain. Keep them in mind; they are the thread that ties the rest of this guide together.
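To make the signals concrete, here is a minimal sketch in plain PHP that derives three of them from a batch of request samples. The function names and the sample shape (duration_ms, status) are assumptions made for illustration, not part of Pulse, Nightwatch, or OpenTelemetry. Saturation is left out because it comes from resource gauges such as queue depth rather than from request records.

```php
<?php
declare(strict_types=1);

/**
 * Nearest-rank percentile: the smallest sample with at least $p percent
 * of all samples at or below it.
 *
 * @param list<float|int> $values
 */
function percentile(array $values, float $p): float
{
    if ($values === []) {
        throw new InvalidArgumentException('No samples to summarise.');
    }

    sort($values);
    $rank = (int) ceil($p * count($values) / 100);

    return (float) $values[max($rank, 1) - 1];
}

/**
 * Summarise one time window of requests (hypothetical sample shape).
 *
 * @param list<array{duration_ms: float|int, status: int}> $requests
 * @return array{traffic_rps: float, error_rate: float, p95_ms: float}
 */
function goldenSignals(array $requests, int $windowSeconds): array
{
    $total = count($requests);
    // Treat 5xx responses as errors; 4xx are usually the client's fault
    $errors = count(array_filter($requests, fn (array $r): bool => $r['status'] >= 500));

    return [
        'traffic_rps' => (float) $total / $windowSeconds,
        'error_rate' => $total === 0 ? 0.0 : (float) $errors / $total,
        'p95_ms' => $total === 0 ? 0.0 : percentile(array_column($requests, 'duration_ms'), 95),
    ];
}
```

A percentile is used for latency rather than an average because a handful of very slow requests can hide behind a healthy mean.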
Map each observability tool to the question it answers
Before installing anything, it helps to decide which tool owns which question. Laravel gives you three first-party-or-standard options, and they overlap less than their feature lists suggest.
Laravel Pulse answers "is my app healthy right now, and where are the obvious bottlenecks?" It is a self-hosted dashboard of aggregated metrics — slow queries, slow jobs, slow routes, busy users, cache hit rates — stored in your own database. It is fast, free, and gives you the golden-signal metrics pillar with almost no effort.
Laravel Nightwatch answers "what happened in production, across every event, and who was affected?" It is a hosted service that ingests requests, jobs, queries, exceptions, mail, notifications, and outgoing HTTP, then connects them into a single timeline. It turns exceptions into trackable, assignable issues and sends smart alerts. Where Pulse aggregates, Nightwatch retains the detail and the relationships, which is what you need when a bug only reproduces for one customer.
OpenTelemetry answers "for this specific request, where did the time actually go, including across service boundaries?" It is a vendor-neutral standard for distributed tracing that you export to a backend of your choice — Grafana Tempo, Jaeger, Honeycomb, Uptrace, SigNoz, or a commercial APM. It is the traces pillar done properly, and it is the only one of the three that follows a request out of your app into a queue worker or a downstream API.
The pragmatic order is metrics first (Pulse), then hosted monitoring and alerting (Nightwatch), then deep tracing (OpenTelemetry) once you need to find where time goes. You do not have to run all three, but understanding what each is for stops you from forcing one tool to do a job it was never built for.
Run Laravel Pulse in production with a custom recorder
Pulse is the fastest win. Install it, publish its assets, and run the migration that creates its tables:
composer require laravel/pulse
php artisan vendor:publish --provider="Laravel\Pulse\PulseServiceProvider"
php artisan migrate
The dashboard lives at /pulse. The single most important production step is to authorize it — by default it is only viewable in local, and if you expose it you must gate it. Define the viewPulse gate in a service provider so only admins can see your internal metrics:
use App\Models\User;
use Illuminate\Support\Facades\Gate;

public function boot(): void
{
    Gate::define('viewPulse', function (User $user) {
        return $user->is_admin;
    });
}
Pulse ships with recorders for slow queries, slow requests, slow jobs, exceptions, and more, all configured in config/pulse.php (the official Pulse documentation lists every built-in recorder). The defaults are tuned for demos, not production. The slow-query threshold defaults to 1,000ms; if your app normally answers queries in under 100ms, a one-second threshold hides everything interesting. Lower it:
// config/pulse.php
'recorders' => [
    Recorders\SlowQueries::class => [
        'enabled' => true,
        'threshold' => 300, // ms; anything slower gets recorded
        'location' => true, // capture the calling file and line
        'sample_rate' => 1,
    ],
    // ...
],
The complement to slow-query tracking is preventing the slow queries in the first place; pairing Pulse with Eloquent strict mode to catch N+1 problems in development means most of them never reach production.
The real power of Pulse is custom recorders. A recorder listens for an event and records a metric you care about — order value, signup rate, webhook latency. Here is a recorder that captures the value of every shipped order so you can watch revenue throughput on the same dashboard as your infrastructure:
<?php

namespace App\Pulse\Recorders;

use App\Events\OrderShipped;
use Laravel\Pulse\Pulse;

class OrderValueRecorder
{
    // The event this recorder subscribes to
    public string $listen = OrderShipped::class;

    public function __construct(protected Pulse $pulse) {}

    public function record(OrderShipped $event): void
    {
        // One entry per shipped order, keyed by currency, aggregated as sum and count
        $this->pulse->record(
            type: 'order_value',
            key: (string) $event->order->currency,
            value: $event->order->total_in_cents,
        )->sum()->count();
    }
}
Register it in the recorders array and Pulse handles aggregation and storage for you. You can also call Pulse::record() anywhere in your code — inside a job, a listener, or a controller — without writing a recorder class at all, which is handy for one-off metrics.
Pulse trims its own tables so they never grow unbounded. Trimming runs probabilistically as data is ingested, using a lottery (for example [1, 1000], meaning roughly one ingest in a thousand triggers a trim) and a keep window such as '7 days'. Seven days is plenty for an operational dashboard; you are not using Pulse for long-term analytics. On a busy app, point Pulse at a dedicated database connection via PULSE_DB_CONNECTION so its writes never contend with your application queries.
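The lottery idea is easier to see in code. What follows is a rough sketch of probabilistic trimming in plain PHP, not Pulse's actual implementation: the function names and the in-memory array standing in for the Pulse tables are assumptions made for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Returns true roughly $lottery[0] times out of every $lottery[1] calls.
 *
 * @param array{0: int, 1: int} $lottery for example [1, 1000]
 */
function winsLottery(array $lottery): bool
{
    [$chances, $outOf] = $lottery;

    return random_int(1, $outOf) <= $chances;
}

/**
 * Drops entries older than the keep window (hypothetical in-memory store).
 *
 * @param list<array{timestamp: int}> $entries
 * @return list<array{timestamp: int}>
 */
function trimOlderThan(array $entries, int $now, int $keepSeconds): array
{
    return array_values(array_filter(
        $entries,
        fn (array $entry): bool => $entry['timestamp'] >= $now - $keepSeconds,
    ));
}

/**
 * Appends an entry, and only occasionally pays the cost of a trim.
 *
 * @param list<array{timestamp: int}> $entries
 * @param array{timestamp: int} $new
 * @param array{0: int, 1: int} $lottery
 * @return list<array{timestamp: int}>
 */
function ingest(array $entries, array $new, int $now, array $lottery, int $keepSeconds): array
{
    $entries[] = $new;

    return winsLottery($lottery)
        ? trimOlderThan($entries, $now, $keepSeconds)
        : $entries;
}
```

The design trade-off is that old rows can outlive the keep window for a while on a quiet app, in exchange for not running a delete on every write.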
Add hosted monitoring and alerting with Laravel Nightwatch
Pulse aggregates, which is exactly why it cannot answer "show me the one request that failed for customer #4821." That is Nightwatch's job. Nightwatch is Laravel's first-party hosted monitoring product: you install a package, point it at your account with a token, and it streams every event — requests, jobs, commands, scheduled tasks, exceptions, queries, notifications, mail, and outgoing requests — into a dashboard that links them together.
Create a project at nightwatch.laravel.com, copy your token, then install the package:
composer require laravel/nightwatch
# .env
NIGHTWATCH_TOKEN=your-project-token
On Laravel Cloud, Nightwatch is a toggle. On Forge or a self-managed server, the package buffers events locally and a lightweight agent process forwards them, so you run the agent as a supervised service alongside your queue workers. Because it ships every event, the cost lever is sampling — Nightwatch's free tier includes 200,000 events per month and the Pro plan starts at $20/month for 5 million events with 30-day retention, so on a high-traffic app you sample. Use the fine-grained sampling controls to keep 100% of exceptions and a fraction of healthy requests; that focuses your event quota on the data that helps you debug rather than on the happy path you already trust.
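As a back-of-the-envelope aid, the sketch below estimates what fraction of healthy traffic fits in a monthly quota once the events you always keep are subtracted. It is simplified arithmetic under stated assumptions: the function name and inputs are hypothetical, and Nightwatch's real event accounting may count things differently, so treat the result as a starting point rather than a billing calculation.

```php
<?php
declare(strict_types=1);

/**
 * Estimate the sample rate for healthy traffic that fits a monthly quota.
 *
 * @param int $monthlyQuota     events included in the plan
 * @param int $alwaysKeptEvents events kept at 100% (for example exceptions)
 * @param int $sampleableEvents events you are willing to sample
 */
function sampleRateForQuota(int $monthlyQuota, int $alwaysKeptEvents, int $sampleableEvents): float
{
    if ($sampleableEvents <= 0) {
        return 1.0; // nothing to sample, so keep everything
    }

    // Quota left over after the always-kept events are accounted for
    $remaining = max($monthlyQuota - $alwaysKeptEvents, 0);

    return min(1.0, $remaining / $sampleableEvents);
}

// 5M quota, 100k exceptions kept in full, 49M healthy events to sample
$rate = sampleRateForQuota(5_000_000, 100_000, 49_000_000);
printf("Sample healthy traffic at about %.0f%%\n", $rate * 100);
```

With those illustrative numbers the remaining 4.9 million events cover about a tenth of the healthy traffic.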
What you get for that is the production story Pulse cannot tell. Exceptions are automatically turned into trackable issues you can assign to a teammate and mark resolved, with visibility into the exact route, message, and affected users. Timelines for requests, queries, and jobs show how long middleware, the controller, and each query took. This is the "team-ready" layer — the same data Geocodio uses to monitor 550M+ HTTP requests. If you have built anything like the Nginx-and-Nightwatch approach to blocking malicious 404 traffic, you have already seen how its event stream doubles as a security signal, not just a performance one.
Add OpenTelemetry distributed tracing
Pulse and Nightwatch are Laravel-aware and easy. OpenTelemetry is neither, and that is the point: it is a vendor-neutral standard, so the traces you produce are portable across every major backend, and it follows a request out of your app — into a queue worker, into a downstream microservice, into a third-party API. When you need to answer "where did the 1,200ms go?" with span-level precision, this is the pillar that delivers.
OpenTelemetry for PHP has two parts: a PHP extension that enables zero-code instrumentation hooks, and Composer packages for the SDK, the exporter, and the framework instrumentation. Install the extension (PHP 8.1+ is required, 8.3+ recommended), then the packages:
pecl install opentelemetry
# add `extension=opentelemetry.so` to your php.ini
composer require \
    open-telemetry/sdk \
    open-telemetry/exporter-otlp \
    open-telemetry/opentelemetry-auto-laravel \
    php-http/guzzle7-adapter
Configuration is entirely environment-driven, which fits Laravel's twelve-factor instincts. Point it at an OTLP endpoint — your own OpenTelemetry Collector, or a backend like Tempo, Jaeger, or SigNoz:
OTEL_PHP_AUTOLOAD_ENABLED=true
OTEL_SERVICE_NAME=checkout-api
OTEL_RESOURCE_ATTRIBUTES=service.version=2026.06.0
OTEL_TRACES_EXPORTER=otlp
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318
OTEL_PROPAGATORS=tracecontext,baggage
OTEL_TRACES_SAMPLER=parentbased_traceidratio
OTEL_TRACES_SAMPLER_ARG=0.1
With opentelemetry-auto-laravel installed, you immediately get spans for incoming requests, Eloquent queries, cache operations, HTTP client calls, and queue dispatches — no code changes. When the framework spans are not granular enough, drop a manual span around a chunk of business logic. Always activate the span, record exceptions onto it, and end it in a finally so it closes even when the code throws:
use OpenTelemetry\API\Globals;
use OpenTelemetry\API\Trace\StatusCode;

$tracer = Globals::tracerProvider()->getTracer('checkout');
$span = $tracer->spanBuilder('checkout.charge')->startSpan();
$scope = $span->activate();

try {
    $span->setAttribute('order.id', $order->id);
    $this->paymentGateway->charge($order);
} catch (\Throwable $e) {
    $span->recordException($e);
    $span->setStatus(StatusCode::STATUS_ERROR);
    throw $e;
} finally {
    // Detach the scope before ending the span so the previous context is restored
    $scope->detach();
    $span->end();
}
Note the sampler in the config above: parentbased_traceidratio at 0.1 keeps roughly 10% of traces while always respecting an upstream sampling decision. Tracing every request in production is expensive and rarely necessary — sample, and rely on Pulse and Nightwatch for the always-on metrics.
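If the sampler name reads as opaque, the idea behind it can be sketched in a few lines of plain PHP. This is an illustration of the concept, not the OpenTelemetry SDK's exact algorithm: the function names are hypothetical, and the choice of the last seven hex digits of the trace ID is an assumption made to keep the arithmetic simple.

```php
<?php
declare(strict_types=1);

/**
 * Ratio decision derived from the trace ID itself, so every service that
 * sees the same ID reaches the same answer without coordination.
 * Uses the last 7 hex digits (28 bits) of the ID as a bucket in [0, 1).
 */
function ratioSampled(string $traceId, float $ratio): bool
{
    $bucket = hexdec(substr($traceId, -7)) / 0x10000000;

    return $bucket < $ratio;
}

/**
 * Parent-based wrapper: a decision made upstream always wins. The ratio
 * only applies to traces that start here ($parentSampled === null).
 */
function shouldSample(string $traceId, ?bool $parentSampled, float $ratio): bool
{
    return $parentSampled ?? ratioSampled($traceId, $ratio);
}
```

Deriving the decision from the trace ID rather than a fresh random number is what keeps a trace whole: a downstream service with the same ratio would make the same call even without the parent flag.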
Correlate logs with traces using a shared trace ID
A trace tells you where time went; a log tells you what the code was thinking. They are far more valuable together, and the bridge is a shared identifier on both. Laravel's Context facade (Laravel 11+) is purpose-built for this: anything you add to the context is automatically attached to every subsequent log entry, and it survives across queue job boundaries. Stamp the active trace ID into context once, in middleware, and every log line for that request carries it:
namespace App\Http\Middleware;

use Closure;
use Illuminate\Support\Facades\Context;
use OpenTelemetry\API\Trace\Span;

class AddTraceIdToLogs
{
    public function handle($request, Closure $next)
    {
        $spanContext = Span::getCurrent()->getContext();

        // An invalid context (all-zero IDs) means no span is active
        if ($spanContext->isValid()) {
            Context::add('trace_id', $spanContext->getTraceId());
        }

        return $next($request);
    }
}
Now a Log::info('Payment captured', ['order' => $order->id]) anywhere downstream automatically includes trace_id, so you can copy it from your tracing backend, search your logs, and read the exact narrative for that request. To make those logs machine-parseable, switch your production channel to JSON formatting so a log pipeline can index trace_id as a field:
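// config/logging.php
'channels' => [
    'stack' => [
        'driver' => 'stack',
        'channels' => ['stderr'],
    ],
    'stderr' => [
        'driver' => 'monolog',
        'handler' => \Monolog\Handler\StreamHandler::class,
        'with' => ['stream' => 'php://stderr'],
        // Set the formatter on the concrete channel; the stack driver ignores it
        'formatter' => \Monolog\Formatter\JsonFormatter::class,
    ],
],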
On a default Laravel install this switch can be a one-line change, because the stderr channel reads its formatter from the LOG_STDERR_FORMATTER environment variable. If you then ship those JSON logs off-box, for example from the Kubernetes deployment pattern where every container writes to stdout or stderr and a collector forwards it, trace_id becomes the join key between your log store and your trace backend. This same trace-and-log correlation pattern is what makes LLM observability with Prism and Langfuse legible when an AI call is one slow span among many.
Trace across queue jobs and external HTTP calls
Distributed tracing earns its name the moment a request hands work to a queue. Without propagation, the web request is one trace and the job that runs three seconds later is a completely separate, orphaned trace — and you lose the causal link that explains why a user's action triggered a slow background process. The fix is to propagate the trace context from the dispatcher into the job.
Two mechanisms cooperate here. First, Laravel's Context automatically dehydrates when a job is queued and rehydrates when it runs, so any trace_id you stored is already present inside the job for log correlation. Second, for the span parentage you inject the OpenTelemetry context into the job's payload and extract it on the other side:
use App\Models\Order; // assumed model location
use Illuminate\Contracts\Queue\ShouldQueue;
use OpenTelemetry\API\Globals;
use OpenTelemetry\API\Trace\Propagation\TraceContextPropagator;

class ProcessRefund implements ShouldQueue
{
    // Public, so it is serialized with the job and travels in the payload
    public array $otelCarrier = [];

    public function __construct(public Order $order) {}

    public static function fromRequest(Order $order): self
    {
        $job = new self($order);
        // Writes the active span's traceparent into the carrier
        TraceContextPropagator::getInstance()->inject($job->otelCarrier);

        return $job;
    }

    public function handle(): void
    {
        // Rebuild the dispatcher's context and use it as this span's parent
        $context = TraceContextPropagator::getInstance()->extract($this->otelCarrier);
        $span = Globals::tracerProvider()
            ->getTracer('jobs')
            ->spanBuilder('refund.process')
            ->setParent($context)
            ->startSpan();
        $scope = $span->activate();

        try {
            // refund logic; the outgoing HTTP span to the payment API
            // is auto-instrumented and nests under this job span
        } finally {
            $scope->detach();
            $span->end();
        }
    }
}
The result is a single trace that spans the HTTP request, the queued job, and the outgoing API call — exactly the picture you want when a refund "sometimes takes ten seconds." If you are running queues at any real volume, this tracing pairs naturally with the operational tuning in the guide to scaling Laravel queues in production, where worker saturation is one of the golden signals you will be watching.
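It helps to know what actually travels in that carrier. With the tracecontext propagator it is a W3C traceparent header of the form version-traceid-parentid-flags. The sketch below parses one in plain PHP to show the moving parts; the function name is hypothetical, it only handles the version 00 layout, and in real code you would let the propagator do this for you.

```php
<?php
declare(strict_types=1);

/**
 * Parse a W3C traceparent value such as
 * 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
 *
 * @return array{trace_id: string, parent_span_id: string, sampled: bool}|null
 */
function parseTraceparent(string $header): ?array
{
    $pattern = '/^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/';
    if (preg_match($pattern, $header, $matches) !== 1) {
        return null;
    }

    [, $version, $traceId, $spanId, $flags] = $matches;

    // Version ff and all-zero IDs are invalid per the spec
    if ($version === 'ff'
        || $traceId === str_repeat('0', 32)
        || $spanId === str_repeat('0', 16)) {
        return null;
    }

    return [
        'trace_id' => $traceId,
        'parent_span_id' => $spanId,
        // Bit 0 of the flags byte carries the upstream sampling decision
        'sampled' => (hexdec($flags) & 1) === 1,
    ];
}
```

The trace_id here is the same value the middleware stamps into your logs, and the sampled flag is what a parent-based sampler in the worker reads to stay consistent with the web request.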
Build an alerting strategy around the golden signals
Instrumentation without alerting just means you find out about outages from customers. But over-alerting is worse than no alerting: a channel that pages for every blip trains the team to ignore it, and the one real incident drowns. The discipline is to alert on the golden signals — latency, traffic, errors, saturation — because each maps to something a user feels, and to route by severity.
Split alerts into two tiers. Page alerts wake someone up: error rate above a threshold, p95 latency past your SLO, queue saturation that will soon drop jobs. Dashboard alerts are everything else — they inform but do not interrupt. A useful rule of thumb: if the alert does not require a human to act within minutes, it is a dashboard, not a page. If you export metrics to Prometheus, an error-rate page rule reads cleanly:
groups:
  - name: laravel-golden-signals
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_server_request_errors_total[5m]))
            / sum(rate(http_server_requests_total[5m])) > 0.02
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "Error rate above 2% for 5 minutes"
      - alert: QueueSaturation
        expr: laravel_queue_pending_jobs > 5000
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "Queue backlog growing: workers cannot keep up"
If you do not run a metrics backend, Nightwatch's built-in smart alerts cover the same ground without the YAML — it groups exceptions into issues and notifies the right people based on the type of issue. Either way, the principle holds: alert on saturation and error rate, dashboard the vanity metrics, and tie every page to a runbook so the person woken up knows what to do.
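The part of an alert rule that does the most to prevent noise is the hold duration: the condition has to stay true across consecutive evaluations before anyone is paged. Here is a minimal sketch of that logic in plain PHP, assuming one sample per evaluation interval. The function name and sample shape are hypothetical, and a real alerting engine tracks more state than this.

```php
<?php
declare(strict_types=1);

/**
 * Page only when the error rate has exceeded the threshold for the most
 * recent $requiredConsecutive evaluations in a row.
 *
 * @param list<array{requests: int, errors: int}> $samples oldest first
 */
function shouldPage(array $samples, float $threshold, int $requiredConsecutive): bool
{
    $streak = 0;

    foreach ($samples as $sample) {
        $rate = $sample['requests'] === 0
            ? 0.0
            : $sample['errors'] / $sample['requests'];

        // A single healthy evaluation resets the streak
        $streak = $rate > $threshold ? $streak + 1 : 0;
    }

    return $streak >= $requiredConsecutive;
}
```

A one-evaluation blip never pages under this rule, which is the behaviour you want from anything routed to a human at night.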
Test your observability instrumentation
Instrumentation is code, and code that is never tested rots silently — a refactor removes the listener, the recorder stops firing, and you do not notice until an incident reveals the gap. You can assert that spans are produced without exporting anything to a real backend by wiring the SDK to an in-memory exporter inside a Pest test:
use OpenTelemetry\SDK\Trace\SpanExporter\InMemoryExporter;
use OpenTelemetry\SDK\Trace\SpanProcessor\SimpleSpanProcessor;
use OpenTelemetry\SDK\Trace\TracerProvider;

it('emits a charge span during checkout', function () {
    $exporter = new InMemoryExporter();
    $tracerProvider = new TracerProvider(
        new SimpleSpanProcessor($exporter)
    );

    // run the code under test using $tracerProvider...

    $spans = $exporter->getSpans();

    expect($spans)->toHaveCount(1)
        ->and($spans[0]->getName())->toBe('checkout.charge')
        ->and($spans[0]->getAttributes()->get('order.id'))->not->toBeNull();
});
For Pulse recorders, dispatch the event the recorder listens for, call Pulse::ingest() to flush the buffered entries, and assert that rows land in the pulse_entries and pulse_aggregates tables (or pulse_values for recorders that call Pulse::set()). For log correlation, fake the log channel and assert your trace_id is present in the context. None of this needs a running collector, so it belongs in your normal composer test run and CI pipeline alongside everything else.
Common mistakes
The most expensive mistake is cardinality blowup — putting high-uniqueness values like user IDs, order IDs, or full URLs with query strings into metric labels or span attributes that get indexed as dimensions. Each unique value multiplies storage and can take down your metrics backend. Keep IDs as log fields and low-cardinality values (route name, status code, queue name) as metric dimensions.
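The multiplication is worth seeing with numbers. This sketch computes the worst-case count of distinct time series for a metric as the product of the distinct values of each label; the function name and the label sets are made up for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Upper bound on distinct time series for one metric: every combination
 * of label values can become its own series.
 *
 * @param array<string, list<int|string>> $labelValues label name => observed values
 */
function worstCaseSeries(array $labelValues): int
{
    $series = 1;

    foreach ($labelValues as $values) {
        $series *= count(array_unique($values));
    }

    return $series;
}

// Low-cardinality labels: a few hundred series at most
$safe = [
    'route' => array_map(fn (int $i): string => "route_{$i}", range(1, 20)),
    'status' => [200, 302, 404, 422, 500],
    'queue' => ['default', 'emails', 'refunds'],
];

// Adding a user_id label multiplies that by the number of users
$risky = $safe + ['user_id' => range(1, 10000)];

printf("safe: %d series, risky: %d series\n", worstCaseSeries($safe), worstCaseSeries($risky));
```

Real traffic rarely hits every combination, but the bound shows why one unbounded label is enough to turn hundreds of series into millions.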
The second is alert fatigue from paging on causes instead of symptoms. A spike in CPU is a cause; a user-facing latency SLO breach is a symptom. Page on symptoms — the golden signals — and let the causes live on a dashboard you consult during the investigation.
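The third is sampling carelessly: either tracing 100% of traffic and drowning in cost, or sampling so aggressively that you never capture the rare slow request. Keep only a small fraction of healthy requests, keep all errors, and use parent-based sampling so a sampled trace stays whole across service boundaries. Bear in mind that a head sampler such as parentbased_traceidratio decides before the request's outcome is known, so keeping every error trace requires tail-based sampling, for example in an OpenTelemetry Collector.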
Finally, do not leave Pulse or Telescope ungated in production. An open /pulse or /telescope route leaks query patterns, user activity, and internal structure to anyone who finds the URL. Gate them, every time.
Wrapping up
Production observability in Laravel is not one tool — it is the three pillars assembled deliberately. Pulse gives you always-on metrics and a health dashboard for almost no effort. Nightwatch adds the hosted, team-ready detail and alerting that aggregated metrics cannot provide. OpenTelemetry delivers vendor-neutral distributed tracing that follows a request across queues and external services, and a shared trace_id stitches your traces to your logs. Layer alerting on the golden signals on top, and a vague "checkout is slow" becomes a trace you can open and a log you can read.
If you are rolling this out, start with Pulse this week, add Nightwatch when you need the production detail, and reach for OpenTelemetry once you have a real "where did the time go?" question to answer. From here, three next steps: harden the surrounding stack with the 2026 Laravel developer toolchain, make sure your background work is observable by tuning queue workers with max-jobs and max-time, and if you self-host your error tracking, see the walkthrough on running self-hosted Sentry on EC2 with Forge.
FAQ
What is the difference between monitoring and observability?
Monitoring checks known failure modes against thresholds you define in advance — error rate over 2%, CPU over 80% — and tells you when a number crosses a line. Observability is the broader ability to ask new questions of your running system after the fact, like "why was this one request slow?", without shipping new code to find the answer. Monitoring tells you something is wrong; observability helps you explore why. In a Laravel app, your Pulse dashboard and alert rules are monitoring, while your traces and structured logs are what make the system observable.
Is Laravel Pulse enough for production monitoring?
Pulse is excellent for the metrics pillar — slow queries, slow jobs, busy routes, and custom business metrics on a self-hosted dashboard — and for many small to mid-sized apps it is enough to catch the obvious bottlenecks. What it cannot do is retain per-request detail or relationships, so it will never show you the single failed request for one customer or trace a request across a queue. For that you add Nightwatch for production detail and alerting, and OpenTelemetry for distributed tracing. Think of Pulse as the always-on health view, not the whole story.
What is Laravel Nightwatch and how does it differ from Pulse?
Nightwatch is Laravel's first-party hosted monitoring service. It ingests every event in your app — requests, jobs, queries, exceptions, mail, notifications, and outgoing HTTP — and connects them into a single timeline, turning exceptions into trackable, assignable issues with smart alerts. Pulse is self-hosted and stores aggregated metrics in your own database; Nightwatch is hosted and retains the detailed, connected event data with team workflows on top. Pulse answers "is the app healthy?"; Nightwatch answers "what exactly happened in production, and who was affected?"
How do I add OpenTelemetry tracing to a Laravel app?
Install the opentelemetry PHP extension via PECL, then add the Composer packages open-telemetry/sdk, open-telemetry/exporter-otlp, open-telemetry/opentelemetry-auto-laravel, and php-http/guzzle7-adapter. Configure it through environment variables — set OTEL_PHP_AUTOLOAD_ENABLED=true, a service name, the OTLP exporter and endpoint, and a sampler. The auto-instrumentation package then produces spans for requests, queries, cache, HTTP client calls, and queue dispatches with no code changes, and you can add manual spans around business logic where you need more detail. Export the traces to any OTLP-compatible backend like Tempo, Jaeger, or SigNoz.
What are the golden signals and which should I alert on?
The four golden signals from Google's SRE practice are latency (how long requests take), traffic (how much demand the system is under), errors (the rate of failed requests), and saturation (how full your resources are, such as queue backlog or memory). Alert — meaning page a human — on the signals that map directly to user pain: error rate, latency past your SLO, and saturation that will soon cause failures. Traffic and most other metrics belong on a dashboard you consult during an investigation rather than as a page, because they describe conditions rather than user-facing harm.
How do I trace a job that runs on a queue worker?
You propagate the trace context from the dispatcher into the job so the job's span becomes a child of the originating request's trace instead of an orphaned, separate trace. Inject the OpenTelemetry context into a carrier on the job before dispatch, then extract it in handle() and pass it as the parent when you start the job's span. Laravel's Context facade helps too: it automatically survives the queue boundary, so any trace_id you stored for log correlation is already present inside the job. Together you get one trace spanning the request, the job, and any outgoing HTTP calls the job makes.
Should I self-host observability or use a SaaS APM?
It depends on your data-sensitivity, scale, and how much operational time you want to spend. Self-hosting with Pulse plus an OpenTelemetry Collector and a backend like Tempo or SigNoz keeps all data in your own infrastructure and avoids per-event pricing, but you own the upkeep and scaling. A SaaS option like Nightwatch or a commercial APM gets you running in minutes with alerting and retention handled for you, at the cost of event-based billing and sending data off-site. Many teams blend the two — self-hosted Pulse for always-on metrics, a hosted service for production detail and paging — and OpenTelemetry's vendor-neutral format means you can switch backends later without re-instrumenting.