# 5 Laravel Queue Failures That Only Show Up in Production

**Author:** Mozex | **Published:** 2026-04-07 | **Tags:** Laravel, PHP, DevOps | **URL:** https://mozex.dev/blog/12-5-laravel-queue-failures-that-only-show-up-in-production

---


Your queue works perfectly in local. Every job dispatches, processes, and completes without a hitch. Then you deploy to production with real traffic, real concurrency, and real third-party APIs, and things start breaking in ways your test suite never predicted.

I've been running Laravel queues in production for years across multiple applications. Every failure on this list caught me off guard at least once. Not because the documentation doesn't cover them, but because you don't think about them until they bite you at 2 AM.

<!--more-->

## 1. Your Job Runs Before the Data Exists

This one is subtle and maddening. You create a record, dispatch a job to process it, and the job fails with "model not found." The record is right there in the database when you check manually. So what happened?

You dispatched inside a database transaction.

```php
DB::transaction(function () {
    $order = Order::create([
        'user_id' => $user->id,
        'total' => $cart->total(),
    ]);

    ProcessOrder::dispatch($order);
});
```

The job gets pushed to Redis immediately, but the transaction hasn't committed yet. If the queue worker picks it up before the commit, the `Order` row doesn't exist. The job fails, retries a few times, and maybe succeeds on the third attempt when the transaction has finally landed. Or it exhausts retries and dies.

The fix is one method call:

```php
ProcessOrder::dispatch($order)->afterCommit();
```

Or set it globally in `config/queue.php`:

```php
'connections' => [
    'redis' => [
        'driver' => 'redis',
        'after_commit' => true,
        // ...
    ],
],
```

With `after_commit` enabled, Laravel holds the dispatch until the transaction commits. If the transaction rolls back, the job never enters the queue. I set this globally on every app now. I've never once wanted a job to fire before its transaction commits.
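If you want to see why this works, here's a framework-free sketch of the after-commit idea (the class name and shape are mine, not Laravel's implementation): dispatches are buffered until the surrounding transaction commits, and a rollback discards them.

```php
// Toy model of after_commit semantics -- not Laravel's actual code.
class TransactionScope
{
    /** @var callable[] Jobs buffered until commit. */
    private array $pending = [];

    public function dispatch(callable $job): void
    {
        $this->pending[] = $job;    // buffered, not pushed to the queue yet
    }

    public function transaction(callable $work): void
    {
        try {
            $work($this);           // the "transaction" body
        } catch (\Throwable $e) {
            $this->pending = [];    // rollback: buffered jobs are discarded
            throw $e;
        }
        foreach ($this->pending as $job) {
            $job();                 // commit succeeded: now it's safe to push
        }
        $this->pending = [];
    }
}
```

The worker can never see a job whose transaction hasn't landed, because the job doesn't exist outside the committed state.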

## 2. Workers Silently Eating All Your Memory

Queue workers are long-running PHP processes. Unlike HTTP requests that die after serving a response, workers persist for hours or days. Every job they process leaves a tiny memory footprint behind. Eloquent model caches, event listeners, service container bindings, log contexts. None of it gets fully cleaned up.

After a few thousand jobs, your worker is sitting at 500MB. Your server starts swapping. Other processes slow to a crawl. And because nothing "failed," there's no alert.

The fix is to let workers die on purpose:

```bash
php artisan queue:work redis --max-jobs=1000 --max-time=3600 --memory=128
```

`--max-jobs=1000` means the worker exits after processing 1,000 jobs. `--max-time=3600` gives it a one-hour hard cap. `--memory=128` exits if memory crosses 128MB. When the worker exits, Supervisor restarts it fresh. Clean slate.
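Under the hood, the memory flag is just a periodic check against `memory_get_usage()` between jobs. A rough sketch of that check (the function name is my own, not Laravel's API):

```php
// Approximation of the worker's memory check: report real usage in MB and
// decide whether the process should exit so Supervisor can restart it fresh.
function memoryExceeded(int $limitMb): bool
{
    // memory_get_usage(true) returns memory allocated from the OS, in bytes.
    $usedMb = memory_get_usage(true) / 1024 / 1024;

    return $usedMb >= $limitMb;
}
```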

Here's the Supervisor config that makes this work:

```ini
[program:laravel-worker]
process_name=%(program_name)s_%(process_num)02d
command=php /var/www/app/artisan queue:work redis --sleep=3 --tries=3 --max-jobs=1000 --max-time=3600 --memory=128
autostart=true
autorestart=true
stopasgroup=true
killasgroup=true
numprocs=4
stopwaitsecs=300
stdout_logfile=/var/www/app/storage/logs/worker.log
```

Two settings here matter more than people realize: `stopwaitsecs=300` gives a running job up to 5 minutes to finish before Supervisor kills it. And `stopasgroup=true` ensures child processes die with the parent, preventing zombie workers that consume resources but don't process anything.

If you're using Horizon, it manages restarts and memory limits for you. But with raw Supervisor, these settings are the difference between a stable queue and a server that degrades over time.

## 3. Deployments Killing Jobs Mid-Execution

You deploy new code. Supervisor restarts your workers. But one of those workers was halfway through a job: it had charged the customer's credit card but hadn't yet recorded the payment in your database.

The worker dies. The job returns to the queue. It runs again with fresh code. The customer gets charged twice.

Two defenses prevent this.

First, give workers time to finish. That `stopwaitsecs=300` from the previous section is critical. When Supervisor sends `SIGTERM`, Laravel's worker finishes its current job before exiting. But only if Supervisor waits long enough. Without sufficient `stopwaitsecs`, Supervisor sends `SIGKILL` almost immediately, and the job gets interrupted at whatever point it reached.
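Conceptually, the graceful stop looks like this (my sketch, not Laravel's actual worker loop): a signal handler flips a flag, and the flag is only checked between jobs, so the job in flight always completes.

```php
// Toy worker loop showing why SIGTERM is safe but SIGKILL is not:
// the quit flag is consulted only between jobs.
class GracefulWorker
{
    private bool $shouldQuit = false;

    // What a pcntl SIGTERM handler would call.
    public function stop(): void
    {
        $this->shouldQuit = true;
    }

    /** @param callable[] $jobs */
    public function run(array $jobs): int
    {
        $processed = 0;
        foreach ($jobs as $job) {
            if ($this->shouldQuit) {
                break;      // checked between jobs, never mid-job
            }
            $job();         // a job in flight always runs to completion
            $processed++;
        }
        return $processed;
    }
}
```

`SIGKILL` skips this entirely: the process dies at whatever line it happened to be executing, which is exactly the half-finished-charge scenario above.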

Second, make every job idempotent. Assume it will run more than once, because eventually it will:

```php
public function handle(): void
{
    $existing = Payment::where('order_id', $this->order->id)
        ->where('idempotency_key', $this->idempotencyKey)
        ->first();

    if ($existing) {
        return;
    }

    $charge = $this->paymentGateway->charge(
        $this->order->total_in_cents,
        $this->order->payment_method_id,
    );

    Payment::create([
        'order_id' => $this->order->id,
        'charge_id' => $charge->id,
        'idempotency_key' => $this->idempotencyKey,
    ]);
}
```

The idempotency key gets generated when the job is dispatched, not when it runs. If the same job executes twice, the second run finds the existing payment and exits cleanly. No double charges.
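The dispatch-time part is worth spelling out. A framework-free sketch (class and property names are illustrative; `bin2hex(random_bytes(16))` stands in for Laravel's `Str::uuid()`):

```php
// The key is minted once, when the job object is built at dispatch time,
// and serialized with the job -- so every retry carries the same key.
class ChargeOrderJob
{
    public int $orderId;
    public string $idempotencyKey;

    public function __construct(int $orderId)
    {
        $this->orderId = $orderId;
        $this->idempotencyKey = bin2hex(random_bytes(16));
    }
}
```

Generate the key in `handle()` instead and every retry gets a fresh key, which defeats the whole check.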

## 4. Unique Job Locks That Expire Too Early

Laravel's `ShouldBeUnique` interface prevents duplicate jobs. Dispatch a job, and any identical dispatch is dropped until the first one completes.

Except when the lock expires before the job finishes.

```php
class ProcessReport implements ShouldQueue, ShouldBeUnique
{
    public $uniqueFor = 60;

    public function handle(): void
    {
        // Takes 90 seconds on large datasets
        $this->generateReport();
        $this->emailReport();
    }
}
```

If report generation takes longer than 60 seconds on a large dataset, the lock expires. A second `ProcessReport` starts. Both send the email. The user gets duplicate reports.

Set `uniqueFor` well above your worst-case execution time. If a job normally takes 30 seconds but occasionally takes 90, set the lock to 300. A lock that lasts too long is harmless. A lock that expires too early causes duplicate processing.
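The failure mode is easy to reproduce with a toy lock (my sketch; the real unique locks live in your cache store): once the TTL lapses, a second acquire succeeds even though the first job is still running.

```php
// Minimal expiring lock demonstrating why uniqueFor must exceed runtime.
class ExpiringLock
{
    /** @var array<string, int> key => expiry timestamp */
    private array $locks = [];

    public function acquire(string $key, int $ttlSeconds, int $now): bool
    {
        if (isset($this->locks[$key]) && $this->locks[$key] > $now) {
            return false;                        // still held
        }
        $this->locks[$key] = $now + $ttlSeconds; // take (or retake) the lock
        return true;
    }
}
```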

Also watch your cache driver. Unique locks use your application's cache. If that's the `file` driver, locks don't work across multiple servers. If it's Redis and Redis restarts, every lock vanishes and all pending unique jobs can run simultaneously.

## 5. Retry Storms Against External APIs

A third-party API goes down. Your jobs that call it start failing. Each failure triggers a retry. With 3 retries per job and 50 failed jobs, you're suddenly sending 150 requests to a service that's already struggling. Multiply by the number of workers, and you're making things worse for everyone.

The default retry behavior is fast: fail, wait a few seconds, try again. Fine for transient blips. Terrible for sustained outages.

Use exponential backoff:

```php
public function backoff(): array
{
    return [30, 60, 300];
}
```

First retry waits 30 seconds. Second waits a minute. Third waits 5 minutes. This gives the external service breathing room instead of piling on.
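If you'd rather not hard-code the array, the same schedule can be computed. A small helper (the name, base, and cap are my choices, not a Laravel API):

```php
// Exponential backoff with a cap: 30s, 60s, 120s, 240s, then 300s thereafter.
function backoffDelay(int $attempt, int $baseSeconds = 30, int $capSeconds = 300): int
{
    return min($capSeconds, $baseSeconds * (2 ** ($attempt - 1)));
}
```

Adding a little random jitter on top also keeps a fleet of workers from retrying in lockstep against the same struggling service.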

For critical integrations, add a circuit breaker:

```php
public function handle(): void
{
    $failures = Cache::get('external-api:failures', 0);

    if ($failures > 10) {
        $this->release(300);
        return;
    }

    try {
        $response = Http::timeout(10)
            ->throw()
            ->post('https://api.example.com/process', $this->payload);

        Cache::forget('external-api:failures');
    } catch (ConnectionException | RequestException $e) { // timeouts and 4xx/5xx both count
        Cache::increment('external-api:failures');
        throw $e;
    }
}
```

When failures accumulate past a threshold, jobs stop attempting the call and release themselves back to the queue with a 5-minute delay. The first successful request clears the counter. Simple, effective, and it prevents your queue from participating in someone else's outage.

## The Common Thread

Every failure here shares a root cause: production has concurrency, timing, and persistence characteristics that local development doesn't. One worker, one database, no real traffic, and no external services having a bad day. That's local. Production is the opposite of that.

The fix isn't more testing, though that helps. It's defensive design: assume jobs will run twice, assume workers will die mid-execution, assume external services will fail, and assume your deploy will happen at the worst possible moment. Build for those assumptions and your queues will survive the real world.