Building a RAG Pipeline with Laravel AI SDK and pgvector

8 min read

Most AI tutorials have you sending prompts to a generic model and hoping for the best. Building a Laravel RAG pipeline with pgvector changes that: you embed your own documents, store vectors in PostgreSQL, and retrieve the most relevant context before the model sees the question. The model responds based on your data, not its training weights.

This article walks through the full implementation — chunking, embedding, storing, searching, and generating grounded answers — using Laravel 13's native AI SDK.

Setting Up pgvector for Laravel RAG

Before writing any PHP, you need the pgvector extension enabled in PostgreSQL. On most managed databases (RDS, Supabase, Neon) it's a single SQL command:

CREATE EXTENSION IF NOT EXISTS vector;

For local development, use the official pgvector Docker image:

# docker-compose.yml
services:
  postgres:
    image: pgvector/pgvector:pg17
    environment:
      POSTGRES_DB: laravel
      POSTGRES_USER: laravel
      POSTGRES_PASSWORD: secret
    ports:
      - "5432:5432"

Install the AI SDK if you haven't already:

composer require laravel/ai

Add your OpenAI key to .env:

AI_DEFAULT_PROVIDER=openai
OPENAI_API_KEY=sk-...

If you're new to the AI SDK, The Complete Guide to the Laravel AI SDK covers the full provider setup — Anthropic, Gemini, Ollama, and how to configure multiple providers.

Creating the Embeddings Migration

Laravel 13's migration builder has native vector column support. No extra PHP packages needed — the vector() method maps directly to pgvector:

<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        // Enables the pgvector extension if not already active
        Schema::ensureVectorExtensionExists();

        Schema::create('documents', function (Blueprint $table) {
            $table->id();
            $table->string('title');
            $table->text('content');       // full source text, kept for reference
            $table->text('chunk');         // the specific chunk that was embedded
            $table->unsignedInteger('chunk_index');
            $table->vector('embedding', dimensions: 1536)->index(); // HNSW index with cosine distance
            $table->timestamps();
        });
    }

    public function down(): void
    {
        Schema::dropIfExists('documents');
    }
};

The ->index() call on a vector column creates an HNSW (Hierarchical Navigable Small World) index with cosine distance by default. This gives you approximate nearest-neighbour search at scale — much faster than a full table scan once you have thousands of rows.
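
If you'd rather manage the index by hand — or want to see roughly what that call maps to — pgvector's documented HNSW DDL looks like this (the index name here is illustrative; Laravel generates its own):

```sql
-- Approximate nearest-neighbour index using cosine distance
CREATE INDEX documents_embedding_index
    ON documents
    USING hnsw (embedding vector_cosine_ops);
```

Swap `vector_cosine_ops` for `vector_l2_ops` or `vector_ip_ops` if you need Euclidean or inner-product distance instead.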

Dimensions are 1536 because that's what OpenAI's text-embedding-3-small model outputs. If you switch to text-embedding-3-large, change this to 3072 — and re-embed everything, as there's no in-place resize.

Create the Eloquent model:

<?php

namespace App\Models;

use Illuminate\Database\Eloquent\Model;

class Document extends Model
{
    protected $fillable = ['title', 'content', 'chunk', 'chunk_index', 'embedding'];

    protected function casts(): array
    {
        return [
            // Cast the vector column to a PHP array for storage and retrieval
            'embedding' => 'array',
        ];
    }
}

Chunking Documents Before Embedding

You can't embed an entire document in one API call. Token limits aside, large chunks produce blurry, averaged-out embeddings that surface poorly in similarity search. Smaller, focused chunks return sharper results.

I use a sliding-window chunker that tries to break at sentence boundaries:

<?php

namespace App\Services;

class DocumentChunker
{
    public function __construct(
        private int $chunkSize = 500,   // target characters per chunk
        private int $overlap = 50,       // overlap to preserve context across boundaries
    ) {}

    /**
     * Split text into overlapping chunks, breaking at sentence boundaries where possible.
     *
     * @return string[]
     */
    public function chunk(string $text): array
    {
        // Collapse whitespace so chunk sizes are predictable
        $text = trim(preg_replace('/\s+/', ' ', $text));
        $chunks = [];
        $length = mb_strlen($text); // mb_* functions keep multibyte characters intact
        $start = 0;

        while ($start < $length) {
            $chunk = mb_substr($text, $start, $this->chunkSize);

            // Try to end at a sentence boundary rather than mid-word
            if ($start + $this->chunkSize < $length) {
                $lastPeriod = mb_strrpos($chunk, '. ');
                if ($lastPeriod !== false && $lastPeriod > $this->chunkSize * 0.5) {
                    $chunk = mb_substr($chunk, 0, $lastPeriod + 1);
                }
            }

            $chunks[] = trim($chunk);

            // Advance by at least one character so a final chunk shorter
            // than the overlap cannot send the loop backwards forever
            $start += max(1, mb_strlen($chunk) - $this->overlap);
        }

        return array_values(array_filter($chunks));
    }
}

500-character chunks with a 50-character overlap work well for most prose. Technical documentation with dense code examples might need larger windows; FAQs and short-form content often do better with smaller ones. Experiment and check your similarity scores.
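
Tuning is just constructor arguments. A quick way to compare window sizes against a real document during development (this uses the DocumentChunker from above; the sizes are illustrative, not recommendations):

```php
<?php

use App\Services\DocumentChunker;

$text = file_get_contents(storage_path('docs/queues.txt'));

foreach ([[300, 30], [500, 50], [800, 80]] as [$size, $overlap]) {
    $count = count((new DocumentChunker(chunkSize: $size, overlap: $overlap))->chunk($text));
    echo "chunkSize {$size} / overlap {$overlap}: {$count} chunks\n";
}
```

Run it in Tinker against a representative document and eyeball where the chunk boundaries fall before committing to a size.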

Generating and Storing Embeddings

Create an Artisan command to drive the embedding pipeline:

<?php

namespace App\Console\Commands;

use App\Models\Document;
use App\Services\DocumentChunker;
use Illuminate\Console\Command;
use Laravel\Ai\Embeddings;

class EmbedDocuments extends Command
{
    protected $signature = 'documents:embed {--fresh : Delete existing embeddings and start over}';
    protected $description = 'Generate vector embeddings for all source documents';

    public function handle(DocumentChunker $chunker): int
    {
        if ($this->option('fresh')) {
            Document::query()->delete();
            $this->info('Cleared existing embeddings.');
        }

        // Swap this for your real data source — database query, file scan, API fetch, etc.
        $sources = collect([
            [
                'title'   => 'Laravel Queue Documentation',
                'content' => file_get_contents(storage_path('docs/queues.txt')),
            ],
        ]);

        $bar = $this->output->createProgressBar($sources->count());
        $bar->start();

        foreach ($sources as $source) {
            $chunks = $chunker->chunk($source['content']);

            foreach ($chunks as $index => $chunk) {
                $response = Embeddings::for([$chunk])
                    ->cache()   // cache for 30 days — avoids redundant API calls on re-runs
                    ->generate();

                Document::create([
                    'title'       => $source['title'],
                    'content'     => $source['content'],
                    'chunk'       => $chunk,
                    'chunk_index' => $index,
                    'embedding'   => $response->embeddings[0],
                ]);
            }

            $bar->advance();
        }

        $bar->finish();
        $this->newLine();
        $this->info('All documents embedded.');

        return self::SUCCESS;
    }
}

Run it:

php artisan documents:embed

# Start fresh (re-embed everything)
php artisan documents:embed --fresh

For large document sets — thousands of files or pages — dispatch each document as a queued job instead of looping synchronously. See scaling Laravel queues in production for the Horizon setup that handles this well. If your workers run for long periods between restarts, controlling worker memory with --max-jobs and --max-time prevents gradual memory creep during large embedding batches.
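
A rough sketch of that queued variant — assuming a hypothetical EmbedDocumentJob and reusing the SDK calls from the command above:

```php
<?php

namespace App\Jobs;

use App\Models\Document;
use App\Services\DocumentChunker;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Queue\Queueable;
use Laravel\Ai\Embeddings;

class EmbedDocumentJob implements ShouldQueue
{
    use Queueable;

    public function __construct(
        public string $title,
        public string $content,
    ) {}

    public function handle(DocumentChunker $chunker): void
    {
        foreach ($chunker->chunk($this->content) as $index => $chunk) {
            $response = Embeddings::for([$chunk])->cache()->generate();

            Document::create([
                'title'       => $this->title,
                'content'     => $this->content,
                'chunk'       => $chunk,
                'chunk_index' => $index,
                'embedding'   => $response->embeddings[0],
            ]);
        }
    }
}
```

The command's inner loop then collapses to `EmbedDocumentJob::dispatch($source['title'], $source['content']);` and your workers fan the documents out in parallel.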

Semantic Search with Vector Similarity — the Laravel RAG Pipeline Core

With embeddings in the database, semantic search is a single query builder call. whereVectorSimilarTo accepts either a pre-computed vector array or a plain string — when you pass a string, Laravel generates the embedding behind the scenes:

<?php

namespace App\Http\Controllers;

use App\Models\Document;
use Illuminate\Http\Request;

class SearchController extends Controller
{
    public function search(Request $request): \Illuminate\Http\JsonResponse
    {
        $request->validate(['q' => 'required|string|max:500']);

        // Pass the raw query string — Laravel handles embedding generation automatically
        $results = Document::query()
            ->whereVectorSimilarTo('embedding', $request->string('q'), minSimilarity: 0.5)
            ->select(['id', 'title', 'chunk'])
            ->limit(5)
            ->get();

        return response()->json($results);
    }
}

Results come back ordered by similarity. minSimilarity: 0.5 is a reasonable default — 1.0 means identical, 0.0 means completely unrelated. Technical documentation often needs 0.7+; broader knowledge bases work better around 0.4. Check your scores in development and tune accordingly.
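
If the thresholds feel abstract, the underlying maths is small enough to check by hand. Here's a minimal sketch of cosine similarity in plain PHP — not the SDK's implementation, just the formula pgvector's cosine distance is built on (distance = 1 − similarity):

```php
<?php

// Cosine similarity: dot(a, b) / (|a| * |b|)
function cosineSimilarity(array $a, array $b): float
{
    $dot = $normA = $normB = 0.0;

    foreach ($a as $i => $value) {
        $dot   += $value * $b[$i];
        $normA += $value ** 2;
        $normB += $b[$i] ** 2;
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}

echo cosineSimilarity([1.0, 0.0], [1.0, 0.0]); // identical direction → 1
echo cosineSimilarity([1.0, 0.0], [0.0, 1.0]); // orthogonal → 0
```

Real embedding vectors have 1,536 components rather than two, but the geometry is the same: the score measures direction, not magnitude.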

To expose the raw distance score alongside results:

$results = Document::query()
    ->whereVectorSimilarTo('embedding', $query, minSimilarity: 0.4)
    ->selectVectorDistance('embedding', $query, as: 'distance')
    ->orderByVectorDistance('embedding', $query)
    ->limit(5)
    ->get();

foreach ($results as $doc) {
    // Distance is the cosine distance (lower = more similar)
    echo "{$doc->title}: {$doc->chunk} (distance: {$doc->distance})\n";
}

Grounding LLM Responses with Retrieved Context

Search alone isn't RAG. The full loop is: retrieve relevant chunks → inject as context → generate an answer constrained to that context.

<?php

namespace App\Services;

use App\Models\Document;
use Illuminate\Support\Facades\AI;

class RagService
{
    public function answer(string $question): string
    {
        // Step 1: retrieve the most relevant chunks for this question
        $context = Document::query()
            ->whereVectorSimilarTo('embedding', $question, minSimilarity: 0.5)
            ->select(['chunk'])
            ->limit(5)
            ->get()
            ->pluck('chunk')
            ->implode("\n\n---\n\n");

        if ($context === '') {
            return 'I could not find relevant information to answer that question.';
        }

        // Step 2: inject retrieved context into the system prompt and generate a grounded answer
        return AI::text(
            messages: [
                [
                    'role'    => 'system',
                    'content' => <<<PROMPT
                        You are a helpful assistant. Answer the user's question using ONLY the context
                        provided below. If the answer is not present in the context, say so clearly —
                        do not invent information.

                        Context:
                        {$context}
                    PROMPT,
                ],
                [
                    'role'    => 'user',
                    'content' => $question,
                ],
            ]
        );
    }
}

The critical phrase is "using ONLY the context provided below". Without that constraint, the model fills gaps with training knowledge and the grounding breaks down.

For agentic setups where the model should decide when to search — and potentially rephrase and search again — the AI SDK's built-in SimilaritySearch tool handles retrieval automatically:

<?php

namespace App\Agents;

use App\Models\Document;
use Laravel\Ai\Contracts\Agent;
use Laravel\Ai\Contracts\HasTools;
use Laravel\Ai\Promptable;
use Laravel\Ai\Tools\SimilaritySearch;

class KnowledgeAgent implements Agent, HasTools
{
    use Promptable;

    public function instructions(): string
    {
        return 'You are a helpful assistant with access to a knowledge base. '
             . 'Use the similarity search tool to find relevant documentation before answering. '
             . 'Only answer based on what you find — do not make things up.';
    }

    public function tools(): iterable
    {
        return [
            SimilaritySearch::usingModel(
                model: Document::class,
                column: 'embedding',
                minSimilarity: 0.5,
                limit: 5,
            ),
        ];
    }
}

// Usage
$answer = (new KnowledgeAgent)->prompt('How do I configure Redis queues in Laravel?');

The agent decides when to call the search tool and can call it multiple times with different phrasings. Compare this approach with building tool-calling agents with Laravel Prism — Prism gives you a more explicit agentic loop if you need fine-grained control over each step.

Chunking and Indexing Strategies

A few things worth knowing before you go to production:

Fixed-size vs. semantic chunking. The sliding-window approach above works for most prose. For structured content — product descriptions, FAQ entries, changelog notes — chunk at the natural record boundary instead. One database row or one FAQ item per embedding is often better than slicing across them.
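
For the record-boundary case the chunker drops out entirely — each record becomes its own embedding. A sketch assuming a hypothetical Faq model with question and answer columns, reusing the documents table from earlier:

```php
<?php

use App\Models\Document;
use App\Models\Faq; // hypothetical source model
use Laravel\Ai\Embeddings;

Faq::query()->each(function (Faq $faq) {
    // One FAQ entry = one chunk = one embedding
    $chunk = "{$faq->question}\n{$faq->answer}";

    $response = Embeddings::for([$chunk])->cache()->generate();

    Document::create([
        'title'       => $faq->question,
        'content'     => $chunk,
        'chunk'       => $chunk,
        'chunk_index' => 0,
        'embedding'   => $response->embeddings[0],
    ]);
});
```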

Chunk size trade-offs. Smaller chunks (200–400 chars) give precise retrieval but lose surrounding context. Larger chunks (800–1000 chars) preserve context but produce blurrier embeddings. If precision matters, store the parent document reference alongside each chunk and retrieve the full parent passage when building the LLM context.
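
One way to sketch that parent-passage variant with the schema from earlier — the documents table already keeps the full source text in content, so you can match on chunks but feed the model the parent:

```php
$context = Document::query()
    ->whereVectorSimilarTo('embedding', $question, minSimilarity: 0.5)
    ->limit(5)
    ->get()
    ->unique('title')          // one parent per source document
    ->pluck('content')         // full source text, not just the matched chunk
    ->implode("\n\n---\n\n");
```

You pay for the extra tokens in the generation call, but the model sees the chunk in its surrounding context rather than in isolation.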

Metadata filtering. You can combine vector similarity with standard SQL filters. Scope searches by user, category, or date before the similarity comparison:

$results = Document::query()
    ->where('category', 'billing')   // filter first, then rank by similarity
    ->whereVectorSimilarTo('embedding', $question, minSimilarity: 0.5)
    ->limit(5)
    ->get();

This keeps the result set relevant without sacrificing semantic ranking.

Gotchas and Edge Cases

Stale embeddings after content edits. If you update a document, its stored embedding no longer matches the new content. Add an Eloquent observer or a dirty flag in your embed command to detect changed records and re-embed them. The .cache() helper won't mask this — cache keys include the content, so edited text generates a fresh API call automatically.
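
A sketch of the observer approach, assuming a hypothetical Article source model whose embeddings should be rebuilt when its body changes:

```php
<?php

namespace App\Observers;

use App\Models\Article; // hypothetical source model for your content
use App\Models\Document;

class ArticleObserver
{
    public function saved(Article $article): void
    {
        // Only react when the embedded text actually changed
        if (! $article->wasChanged('body')) {
            return;
        }

        // Drop the stale chunks; the embed command (or a queued job) rebuilds them
        Document::where('title', $article->title)->delete();
    }
}
```

Register it on the model (for example via the `#[ObservedBy(ArticleObserver::class)]` attribute) and stale vectors clean themselves up on every edit.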

Wrong dimensions on existing data. Switching from text-embedding-3-small (1536 dims) to text-embedding-3-large (3072 dims) requires dropping and recreating the vector column, then re-embedding everything. There's no in-place conversion.

Token limits on input. text-embedding-3-small accepts up to 8,191 tokens per input. At 500 characters per chunk you're well under that, but strip HTML tags and minify JSON before chunking structured data — whitespace inflates token counts quickly.
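
A small pure-PHP pre-processing step covers the HTML case — strip_tags and whitespace collapsing are standard library; the exact cleaning you need depends on your source format:

```php
<?php

// Normalise an HTML source document before handing it to the chunker
function cleanForEmbedding(string $html): string
{
    $text = strip_tags($html);                      // drop markup
    $text = html_entity_decode($text);              // &amp; becomes &, etc.
    return trim(preg_replace('/\s+/', ' ', $text)); // collapse whitespace
}

echo cleanForEmbedding("<p>Queues  let you\n defer work.</p>");
// Queues let you defer work.
```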

Cost at scale. text-embedding-3-small costs $0.02 per million tokens. A 1,000-word document split into 10 chunks of ~75 tokens each costs roughly $0.000015 per document — negligible until you're processing millions of documents. The .cache() call eliminates re-embedding costs when you re-run the command on unchanged content.
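
The arithmetic behind that per-document figure, if you want to plug in your own volumes:

```php
<?php

$pricePerToken     = 0.02 / 1_000_000; // text-embedding-3-small: $0.02 per 1M tokens
$tokensPerDocument = 10 * 75;          // 10 chunks at roughly 75 tokens each

printf('$%.6f per document', $tokensPerDocument * $pricePerToken);
// $0.000015 per document
```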

pgvector requires PostgreSQL 14+. SQLite and MySQL don't support vector columns. If you're on shared hosting without PostgreSQL, Supabase and Neon both offer managed PostgreSQL with pgvector enabled by default.

Wrapping Up

The full Laravel RAG pipeline with pgvector comes down to five steps: chunk your documents, embed each chunk, store the vectors in PostgreSQL, retrieve the top-k similar chunks at query time, and inject them as context before calling the LLM. Each step is a handful of lines with the Laravel AI SDK doing the heavy lifting.

The natural next step is moving the embedding command off the main process — scaling Laravel queues in production covers the Horizon configuration for processing thousands of documents in parallel without blocking your web workers. If you want to push further on the agent side, building tool-calling agents with Laravel Prism walks through a manual agentic loop that gives you explicit control over every retrieval and generation step.

laravel
ai
pgvector
rag
postgresql
Steven Richardson

Steven is a software engineer with a passion for building scalable web applications. He enjoys sharing his knowledge through articles and tutorials.