Processing large CSV files with Laravel's lazy collections

Use Laravel's LazyCollection to process large CSV files line-by-line. Avoid memory limit errors while keeping the full Collection API with generators.

Steven Richardson
· 5 min read

Importing large files with collect(file(...)) loads the entire CSV into memory before you touch a single row. On a 100k-row file that's 80–120 MB gone before any processing starts. Laravel lazy collections solve this: LazyCollection reads one row at a time using PHP generators, keeps memory under 5 MB, and still gives you the full Collection API.

The memory problem with eager collections#

Here's the pattern that bites people. Read the file, collect it, process each row:

// ❌ Loads the entire file into an array before iteration begins
$rows = collect(file(storage_path('imports/products.csv')));

$rows->each(function (string $line) {
    $fields = str_getcsv($line);
    Product::updateOrCreate(['sku' => $fields[0]], [
        'name'  => $fields[1],
        'price' => (float) $fields[2],
    ]);
});

file() reads the whole file into an array. collect() wraps that array. By the time your each() runs, PHP has already allocated memory for every row. On modest imports this works fine. Past 50k rows you start seeing Allowed memory size exhausted — especially if each row triggers an Eloquent call.

How LazyCollection works#

LazyCollection wraps a PHP generator. A generator yields one value at a time and suspends between yields. The collection methods (filter, map, chunk, each) operate on these yielded values one-by-one rather than materialising the entire sequence upfront.

use Illuminate\Support\LazyCollection;

$rows = LazyCollection::make(function () {
    $handle = fopen(storage_path('imports/products.csv'), 'r');

    try {
        fgetcsv($handle); // skip the header row

        while (($row = fgetcsv($handle)) !== false) {
            yield $row; // one row at a time, then pause
        }
    } finally {
        fclose($handle); // runs even if iteration stops early, e.g. after take()
    }
});

The file handle opens once. Each yield hands one parsed row to the collection pipeline and waits. At no point is the full file in memory.
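Outside Laravel, the same pattern works with a bare generator, which makes it easy to see what `LazyCollection` wraps. This is a standalone sketch (no framework), with a hypothetical `lazyCsvRows()` helper and a throwaway temp file so it runs on its own:

```php
<?php

// Hypothetical standalone helper: same generator shape as the
// LazyCollection::make() closure above, minus the framework.
function lazyCsvRows(string $path): Generator
{
    $handle = fopen($path, 'r');

    try {
        fgetcsv($handle); // skip the header row

        while (($row = fgetcsv($handle)) !== false) {
            yield $row; // parse one row, hand it over, pause
        }
    } finally {
        fclose($handle); // runs even if the consumer stops early
    }
}

// Small sample file so the sketch is self-contained.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "sku,name,price\nA1,Widget,9.99\nA2,Gadget,4.50\n");

$skus = [];
foreach (lazyCsvRows($path) as $row) {
    $skus[] = $row[0]; // only the current row is ever in scope
}

unlink($path);
// $skus === ['A1', 'A2']
```

The `foreach` drives the generator exactly the way a lazy collection pipeline does: each iteration resumes the function, parses one line, and suspends again.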

Chaining collection methods lazily#

LazyCollection implements the same interface as Collection, so you chain methods exactly as you would with an eager collection:

$rows
    ->filter(fn (array $row) => ! empty($row[0]))      // skip blank/malformed rows
    ->map(fn (array $row) => [                          // transform to named keys
        'sku'   => trim($row[0]),
        'name'  => trim($row[1]),
        'price' => (float) $row[2],
    ])
    ->chunk(500)                                        // group into 500-row batches
    ->each(function (LazyCollection $chunk) {
        ImportProductsJob::dispatch($chunk->all());     // all() materialises the chunk array
    });

filter() and map() stay lazy — they don't run until a terminal method pulls values through. chunk() returns a LazyCollection of LazyCollection objects; each $chunk inside each() is itself a lazy collection. Calling ->all() on the chunk converts it to a plain array for the job payload — that's intentional here since the job needs a serialisable value.
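To make the chunking step concrete without pulling in Laravel, here is a plain-generator sketch of what `chunk()` does under the hood. `chunkGenerator()` is a hypothetical helper, not Laravel API: it buffers up to `$size` values, emits the batch, and resets:

```php
<?php

// Hypothetical helper mimicking lazy chunk(): buffer $size rows,
// yield the batch downstream, start a fresh buffer.
function chunkGenerator(iterable $rows, int $size): Generator
{
    $batch = [];

    foreach ($rows as $row) {
        $batch[] = $row;

        if (count($batch) === $size) {
            yield $batch; // hand one full batch to the consumer
            $batch = [];
        }
    }

    if ($batch !== []) {
        yield $batch; // flush the final partial batch
    }
}

// A lazy source of 7 rows, standing in for the CSV generator.
$rows = (function () {
    for ($i = 1; $i <= 7; $i++) {
        yield ['sku' => "SKU-$i"];
    }
})();

$batchSizes = [];
foreach (chunkGenerator($rows, 3) as $batch) {
    $batchSizes[] = count($batch); // in the real pipeline: dispatch a job here
}

// 7 rows in batches of 3 gives sizes 3, 3, 1
```

Only one batch exists in memory at a time, which is exactly the trade the `->chunk(500)->each(...)` pipeline above is making.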

Dispatching chunk jobs for database upserts#

The job receives a plain array of prepared rows and hands them to upsert() in a single query:

// app/Jobs/ImportProductsJob.php

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;

class ImportProductsJob implements ShouldQueue
{
    use Dispatchable, Queueable;

    public function __construct(private readonly array $rows) {}

    public function handle(): void
    {
        Product::upsert(
            $this->rows,
            uniqueBy: ['sku'],
            update: ['name', 'price', 'updated_at'],
        );
    }
}

500 rows per job means a single INSERT ... ON DUPLICATE KEY UPDATE per batch — far cheaper than 500 individual updateOrCreate() calls. Adjust the chunk size based on your row width and database write latency. I've found 250–500 works well for typical product imports.
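One prerequisite worth stating: `upsert()` resolves conflicts through the database's own unique index, so the `sku` column needs one or the `uniqueBy` argument has nothing to act on. A migration sketch, assuming the `products` table and `sku` column from the examples above:

```php
<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        Schema::table('products', function (Blueprint $table) {
            $table->unique('sku'); // required for upsert(uniqueBy: ['sku'])
        });
    }

    public function down(): void
    {
        Schema::table('products', function (Blueprint $table) {
            $table->dropUnique(['sku']);
        });
    }
};
```

Without the index, MySQL's `INSERT ... ON DUPLICATE KEY UPDATE` never sees a duplicate key and every batch inserts fresh rows.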

Gotchas and edge cases#

count(), all(), toArray(), and reverse() force eager evaluation. Calling $rows->count() on a LazyCollection pulls every value through the generator to produce a number. Avoid these at the top of your pipeline — use them only inside chunk callbacks where you want a concrete value.
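A plain-PHP sketch makes the cost visible: counting a lazy sequence means generating every value just to produce one number. The `$pulled` counter below is illustrative, tracking how many values the terminal call forced through:

```php
<?php

$pulled = 0;

// A lazy source of 1,000 values, standing in for the CSV generator.
$rows = (function () use (&$pulled) {
    for ($i = 0; $i < 1000; $i++) {
        $pulled++; // count how many values the consumer forces
        yield $i;
    }
})();

// Terminal operation: drains the entire generator to get a number.
$total = iterator_count($rows);

// $total === 1000 and $pulled === 1000 — every value was generated
// just to count them, and this raw generator is now exhausted.
```

A raw generator also cannot be rewound once run, which is why `LazyCollection::make()` takes a closure: each enumeration calls it again for a fresh generator.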

Call ->values() before ->chunk() if you use yield from. When your generator delegates to another generator with yield from, each inner generator restarts its numeric keys at 0, and the resulting key collisions can cause chunk() to silently drop rows. Reset the indices with ->values() before chunking:

LazyCollection::make(function () {
    yield from $this->firstBatch();  // yields [0 => ..., 1 => ..., ...]
    yield from $this->secondBatch(); // same indices — chunk() will lose rows
})
->values()   // re-key to 0, 1, 2, 3... across both generators
->chunk(500)
->each(...);

Don't use this pattern for database queries. Eloquent's ->lazy() and ->lazyById() already return lazy collections: ->lazy() pages through results in fixed-size chunks behind a generator, while ->lazyById() paginates by primary key. The pattern above is for file imports. For large Eloquent queries prefer ->lazyById() — ordering by a unique key avoids rows being skipped or duplicated when the table changes mid-iteration.

Memory isn't truly zero. Each chunk of 500 rows is materialised in memory when ->all() is called. You're trading 80 MB for ~2–5 MB of working memory per chunk. The file itself is never fully loaded.

Wrapping up#

Swap collect(file(...)) for a LazyCollection::make() generator and your memory footprint drops from triple digits to single digits — no change to the Collection API you already know. Add ->chunk() with a queued job and you get parallel database writes on top. For any import over a few thousand rows, this is the pattern I reach for first.

Once you're dispatching chunk jobs at scale, Scaling Laravel queues in production covers the queue topology and Horizon configuration to handle high-throughput batch imports reliably. Monitoring Laravel queues with Horizon gives you the dashboard visibility to track batch job progress in real time. If your import rows map to domain objects, PHP readonly classes as value objects is a clean pattern for typing each row before passing it to upsert().

FAQ#

How do I know if LazyCollection is the right choice for my import?

Use LazyCollection for files larger than a few thousand rows where you can't afford to load them all into memory at once. For typical CSV imports (products, users, orders), LazyCollection + chunked batch jobs is the pattern. If your file is under 1,000 rows, collect(file(...)) is fine.

What happens if a job fails while processing a chunk?

The chunk is re-queued for retry. upsert() is idempotent as long as the uniqueBy columns carry a real unique index — a retried chunk rewrites the same rows rather than duplicating them. Without that index the database has no conflict to detect, and every retry inserts fresh copies, so put the unique constraint in place before relying on queue retries.

Can I use LazyCollection with database queries instead of files?

Yes, but use Eloquent's ->lazy() or ->lazyById() methods instead of LazyCollection::make(). Both return a LazyCollection backed by chunked queries; ->lazyById() paginates by primary key, which avoids the skipped or duplicated rows that offset-based iteration can produce when the table changes mid-import.

Why would chunk() drop rows if I use yield from incorrectly?

When you delegate between generators with yield from, both generators use the same numeric indices (0, 1, 2...). When the second generator starts, its index 0 collides with the first's index 0. Calling ->values() re-keys the collection to 0, 1, 2, 3... across all values, preventing the collision.

Steven Richardson

CTO at Digitonic. Writing about Laravel, architecture, and the craft of leading software teams from the west coast of Scotland.