Processing large CSV files with Laravel's lazy collections


Importing large files with collect(file(...)) loads the entire CSV into memory before you touch a single row. On a 100k-row file that's 80–120 MB gone before any processing starts. Laravel lazy collections solve this: LazyCollection reads one row at a time using PHP generators, keeps memory under 5 MB, and still gives you the full Collection API.

The memory problem with eager collections

Here's the pattern that bites people. Read the file, collect it, process each row:

// ❌ Loads the entire file into an array before iteration begins
$rows = collect(file(storage_path('imports/products.csv')));

$rows->each(function (string $line) {
    $fields = str_getcsv($line);
    Product::updateOrCreate(['sku' => $fields[0]], [
        'name'  => $fields[1],
        'price' => (float) $fields[2],
    ]);
});

file() reads the whole file into an array. collect() wraps that array. By the time your each() runs, PHP has already allocated memory for every row. On modest imports this works fine. Past 50k rows you start seeing Allowed memory size exhausted — especially if each row triggers an Eloquent call.
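You can measure this yourself before and after the switch. A rough sketch using PHP's peak-memory counter (the path is a placeholder; run it against your own import file):

```php
<?php

$before = memory_get_peak_usage(true);

// ❌ Eager: file() allocates an array element for every line up front
$rows = collect(file(storage_path('imports/products.csv')));

$allocated = (memory_get_peak_usage(true) - $before) / 1024 / 1024;
echo number_format($allocated, 1) . " MB allocated before any row was processed\n";
```

Run the same measurement around the LazyCollection version later in this post and the difference is the whole argument.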

How LazyCollection works with large files

LazyCollection wraps a PHP generator. A generator yields one value at a time and suspends between yields. The collection methods (filter, map, chunk, each) operate on these yielded values one-by-one rather than materialising the entire sequence upfront.

use Illuminate\Support\LazyCollection;

$rows = LazyCollection::make(function () {
    $handle = fopen(storage_path('imports/products.csv'), 'r');

    fgetcsv($handle); // skip the header row

    while (($row = fgetcsv($handle)) !== false) {
        yield $row; // one row at a time, then pause
    }

    fclose($handle);
});

The file handle opens once. Each yield hands one parsed row to the collection pipeline and waits. At no point is the full file in memory.
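You can see the laziness directly: the closure passed to make() doesn't execute until something iterates the collection. A small sketch with made-up sample rows (the echo is only there to show when execution starts):

```php
<?php

use Illuminate\Support\LazyCollection;

$rows = LazyCollection::make(function () {
    echo "generator started\n"; // prints only once iteration begins

    yield ['SKU-1', 'Widget', '9.99'];
    yield ['SKU-2', 'Gadget', '4.50'];
});

// Nothing has printed yet — no iteration has happened.

$first = $rows->first(); // "generator started" prints here; one value is pulled
```

One detail worth knowing: because make() stores the closure, each fresh iteration re-invokes it — so iterating the same LazyCollection twice re-reads the file from the top.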

Chaining collection methods lazily

LazyCollection implements the same Enumerable contract as Collection, so you chain methods exactly as you would with an eager collection:

$rows
    ->filter(fn (array $row) => ! empty($row[0]))      // skip blank/malformed rows
    ->map(fn (array $row) => [                          // transform to named keys
        'sku'   => trim($row[0]),
        'name'  => trim($row[1]),
        'price' => (float) $row[2],
    ])
    ->chunk(500)                                        // group into 500-row batches
    ->each(function (LazyCollection $chunk) {
        ImportProductsJob::dispatch($chunk->all());     // all() materialises the chunk array
    });

filter() and map() stay lazy — they don't run until a terminal method pulls values through. chunk() returns a LazyCollection of LazyCollection objects; each $chunk inside each() is itself a lazy collection. Calling ->all() on the chunk converts it to a plain array for the job payload — that's intentional here since the job needs a serialisable value.
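A minimal way to see chunk()'s shape is a toy pipeline in tinker — a sketch using LazyCollection::range rather than a real file:

```php
<?php

use Illuminate\Support\LazyCollection;

LazyCollection::range(1, 6)
    ->chunk(2)
    ->each(function (LazyCollection $chunk) {
        // each $chunk is itself a lazy collection; all() materialises it
        print_r($chunk->all());
    });
```

Note that chunk() preserves the keys the values arrived with — which is exactly why the yield from gotcha later in this post matters.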

Dispatching chunk jobs for database upserts

The job receives a plain array of prepared rows and hands them to upsert() in a single query:

// app/Jobs/ImportProductsJob.php
namespace App\Jobs;

use App\Models\Product;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;

class ImportProductsJob implements ShouldQueue
{
    use Dispatchable, Queueable; // Dispatchable provides the static dispatch() call

    public function __construct(private readonly array $rows) {}

    public function handle(): void
    {
        Product::upsert(
            $this->rows,
            uniqueBy: ['sku'],
            update: ['name', 'price', 'updated_at'],
        );
    }
}

500 rows per job means a single INSERT ... ON DUPLICATE KEY UPDATE per batch — far cheaper than 500 individual updateOrCreate() calls. Adjust the chunk size based on your row width and database write latency. I've found 250–500 works well for typical product imports.
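For reference, this is the shape upsert() expects — a list of column => value arrays whose keys match your table columns (the SKUs here are made up):

```php
<?php

use App\Models\Product;

Product::upsert(
    [
        ['sku' => 'A1', 'name' => 'Widget', 'price' => 9.99],
        ['sku' => 'B2', 'name' => 'Gadget', 'price' => 4.50],
    ],
    uniqueBy: ['sku'],   // rows matching an existing sku are updated...
    update: ['name', 'price'], // ...but only these columns are touched
);
```

That's why the map() step earlier transforms each CSV row into named keys — the arrays go straight from the chunk into upsert() unchanged.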

Gotchas and edge cases

count(), all(), toArray(), and reverse() force eager evaluation. Calling $rows->count() on a LazyCollection pulls every value through the generator to produce a number. Avoid these at the top of your pipeline — use them only inside chunk callbacks where you want a concrete value.
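If you need a total for progress reporting, count inside the single streaming pass rather than calling count() up front — a sketch:

```php
<?php

$imported = 0;

$rows->each(function (array $row) use (&$imported) {
    // ...process the row...
    $imported++;
});

// $imported now holds the total without a separate pass over the file
```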

Call ->values() before ->chunk() if you use yield from. When your generator delegates to other generators with yield from, each delegated generator restarts its keys at 0. chunk() preserves keys, so the duplicate indices silently overwrite earlier rows within a chunk. Reset the indices with ->values() first:

LazyCollection::make(function () {
    yield from $this->firstBatch();  // yields [0 => ..., 1 => ..., ...]
    yield from $this->secondBatch(); // same indices — chunk() will lose rows
})
->values()   // re-key to 0, 1, 2, 3... across both generators
->chunk(500)
->each(...);

Don't use this file pattern for database queries. Eloquent ships its own lazy helpers: ->lazy() pages through results with offset-based chunks under the hood, while ->lazyById() pages by primary key. The generator pattern above is for file imports. For large Eloquent queries, prefer ->lazyById() — keyset pagination avoids the skipped or duplicated rows that offset chunking can produce when records are inserted or deleted mid-iteration.
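For comparison, the database-backed equivalent looks like this — a sketch assuming a Product model with an `active` column (the column name is illustrative):

```php
<?php

use App\Models\Product;

Product::where('active', true)
    ->lazyById(500) // pages by primary key, 500 models per underlying query
    ->each(function (Product $product) {
        // one hydrated model at a time; memory stays flat
    });
```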

Memory isn't truly zero. Each chunk of 500 rows is materialised in memory when ->all() is called. You're trading 80 MB for ~2–5 MB of working memory per chunk. The file itself is never fully loaded.

Wrapping up

Swap collect(file(...)) for a LazyCollection::make() generator and your memory footprint drops from triple digits to single digits — no change to the Collection API you already know. Add ->chunk() with a queued job and you get parallel database writes on top. For any import over a few thousand rows, this is the pattern I reach for first.

laravel
laravel-12
collections
performance
Steven Richardson

Steven is a software engineer with a passion for building scalable web applications. He enjoys sharing his knowledge through articles and tutorials.