Working with Large Datasets

Loading an entire table into memory works fine for small datasets. It does not work when your table has millions of rows. Laravel provides several tools to process large volumes of data efficiently without exhausting your server's memory.

Chunking

chunk() retrieves a fixed number of rows at a time and passes them to a callback. Laravel issues a new query for each chunk, keeping memory usage flat:

User::orderBy('id')->chunk(1000, function (Collection $users) {
    foreach ($users as $user) {
        // Process each user
        $user->update(['last_synced_at' => now()]);
    }
});

Return false from the callback to stop early:

User::orderBy('id')->chunk(500, function (Collection $users) {
    foreach ($users as $user) {
        if ($user->balance < 0) {
            // Stop processing when we hit a negative balance
            return false;
        }
    }
});

Lazy Collections

Laravel's lazy() method streams results through a LazyCollection backed by PHP generators. Behind the scenes it still queries in chunks, but it hands models to your code one at a time, so you get the memory profile of chunk() with a flat, callback-free API:

User::orderBy('id')->lazy()->each(function (User $user) {
    $user->update(['status' => 'migrated']);
});

You can chain standard collection methods on a lazy collection — filter, map, take — and they evaluate lazily:

User::orderBy('id')->lazy()
    ->filter(fn(User $user) => $user->is_active)
    ->take(100)
    ->each(function (User $user) {
        SendNewsletterJob::dispatch($user);
    });

Cursor Pagination

Offset-based pagination (LIMIT x OFFSET y) becomes painfully slow on large tables because the database must scan and skip y rows every time. Cursor pagination avoids this by using a WHERE id > last_seen_id clause instead:

$users = User::orderBy('id')->cursorPaginate(25);

The resulting paginator includes nextCursor and previousCursor values that the client passes back on the next request (by default as the cursor query string parameter). This works efficiently regardless of table size.
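
For example, a minimal API route might look like the sketch below (the /api/users path is only an illustration; Laravel reads the cursor query string parameter automatically on follow-up requests):

use App\Models\User;
use Illuminate\Support\Facades\Route;

Route::get('/api/users', function () {
    // Returning the paginator serializes it to JSON, including
    // next_cursor and prev_cursor for the client to send back.
    return User::orderBy('id')->cursorPaginate(25);
});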

paginate() vs simplePaginate() vs cursorPaginate()

Method            Knows total pages?  Efficient on large tables?  Use when
paginate()        Yes                 No (runs a COUNT query)     Small-to-medium tables, admin interfaces
simplePaginate()  No                  Yes                         You only need next/previous links
cursorPaginate()  No                  Yes                         Very large tables, infinite scroll
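
Side by side, the three calls look like this (the per-page value of 25 is arbitrary; each paginator reads its page or cursor from the request automatically):

$users = User::orderBy('id')->paginate(25);        // numbered pages, runs a COUNT query
$users = User::orderBy('id')->simplePaginate(25);  // next/previous links only, no COUNT
$users = User::orderBy('id')->cursorPaginate(25);  // keyset-based, reads ?cursor=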

Inserting Large Datasets

Inserting rows one at a time is slow. Use insert() to batch them:

$records = collect(range(1, 10000))->map(fn($i) => [
    'name'       => "User {$i}",
    'email'      => "user{$i}@example.com",
    'created_at' => now(),
    'updated_at' => now(),
]);

// Insert in batches of 1000
$records->chunk(1000)->each(function (Collection $batch) {
    User::insert($batch->toArray());
});

For upserts (insert or update if a row already exists), Laravel provides upsert():

User::upsert($records->all(), ['email'], ['name', 'updated_at']);

The second argument is the unique column(s) used to detect conflicts; the third is the list of columns to update on conflict.
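
For very large imports you can batch upserts the same way as the insert() example above; a minimal sketch reusing the $records collection:

// Upsert in batches of 1,000 so a single statement never carries 10,000 rows
$records->chunk(1000)->each(function (Collection $batch) {
    User::upsert($batch->all(), ['email'], ['name', 'updated_at']);
});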

Tips

  • Prefer lazyById() over lazy() when your table has an auto-incrementing primary key. It uses keyed chunking under the hood, which is more efficient and avoids issues with rows being added or deleted mid-scan; see the sketch after this list.
  • Always orderBy when chunking or using cursor pagination. Without a stable sort order, rows can appear in different positions between queries and you may miss or duplicate data.
  • For long-running batch work, dispatch it as a queued Laravel job so it does not block a web request.
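
A minimal sketch of the lazyById() tip, reusing the status column from the earlier lazy() example (the 'pending' value is purely illustrative):

// lazyById() pages with WHERE id > ? instead of OFFSET, so updating the
// column you filter on cannot shift the result window mid-scan.
User::where('status', 'pending')
    ->lazyById(1000)
    ->each(function (User $user) {
        $user->update(['status' => 'migrated']);
    });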