The Complete Guide to the Laravel AI SDK in Laravel 13
Laravel 13 shipped with a first-party AI SDK, and within a week I'd ripped three vendor SDKs and a home-rolled OpenAI wrapper out of a client project and replaced them with laravel/ai. The official docs are deliberately concise — they show you the shape of the API but leave the "how do I actually build something with this" part to you. This guide fills that gap. We'll walk the full surface of the Laravel AI SDK by building a real docs-search chatbot: agents, streaming, structured output, tools, embeddings, pgvector RAG, testing, and production hardening.
What you'll learn
- How the Laravel AI SDK works end-to-end and how it differs from Prism
- How to build typed agents with instructions, conversation memory, and tools
- How to stream responses over SSE and into a Livewire component
- How to generate embeddings and run cosine-similarity search with pgvector
- How to build a production-grade RAG pipeline with failover, caching, and cost controls
- How to test AI-backed code with fake() and preventStrayPrompts()
Why Laravel 13 ships its own AI SDK
Before November 2026, if you wanted AI in a Laravel app you had three realistic choices. You could bolt on openai-php/client (or the vendor SDK of your chosen provider), accept the vendor lock-in, and scatter API-shaped code through your services. You could adopt Prism, which gave you a clean fluent API and provider abstraction but lived outside the framework. Or you could build your own service layer over Http::post() and regret it at 2 a.m. three months later.
The Laravel AI SDK, introduced in Laravel 13 and maintained by the core team, absorbs the best ideas from Prism and from Vercel's AI SDK for JavaScript. It ships with a unified API across OpenAI, Anthropic, Gemini, Groq, xAI, DeepSeek, Mistral and Ollama. It adds structured output backed by Laravel's own JsonSchema contract, tool calling with automatic execution, streaming over SSE, file storage, vector stores, reranking, audio (TTS/STT), and first-class testing helpers. Crucially, it's tied into other Laravel 13 additions: native vector column support on PostgreSQL via pgvector, a Str::of(...)->toEmbeddings() helper, and Eloquent query builder methods like whereVectorSimilarTo.
You should prefer the first-party SDK over Prism on any new Laravel 13 project. Prism is still excellent and is not going away — its author TJ Miller now works at Laravel and much of the SDK's design reflects that lineage — but for a greenfield app the reduced dependency surface, the tighter framework integration, and the built-in migrations for conversation memory are hard to beat. If you're on Laravel 12 or earlier, stick with Prism for now.
For this guide I'm assuming Laravel 13.0 or later (the SDK requires it) and PHP 8.3+. Version numbers matter here because the SDK's JsonSchema contract only exists in Laravel 13, and whereVectorSimilarTo is a new query builder method.
Installing and configuring the SDK
Install the package via Composer, publish its config and migrations, and run them:
composer require laravel/ai
php artisan vendor:publish --provider="Laravel\Ai\AiServiceProvider"
php artisan migrate
The migrations create two tables — agent_conversations and agent_conversation_messages — that back the optional RemembersConversations trait. You can skip the migrate step if you plan to manage conversation history yourself, but I'd keep it: the default store is good enough for most use cases and you can always ignore it.
Next, add API keys to .env for whichever providers you plan to use:
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
OLLAMA_API_KEY=
Ollama needs no key for local use, but the SDK expects the variable to exist. In config/ai.php you can pin default models per capability and configure custom base URLs for gateways like LiteLLM or Azure OpenAI. For this guide, we'll accept the defaults and override per-agent where it matters.
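For orientation, a per-provider block in config/ai.php looks something like the sketch below. The key names here are illustrative only; treat the published config file as the source of truth.

```php
// config/ai.php — sketch only; key names are illustrative,
// check the file you published via vendor:publish.
return [
    'providers' => [
        'openai' => [
            'key' => env('OPENAI_API_KEY'),
            // Point at LiteLLM, Azure OpenAI, or another gateway if needed.
            'base_url' => env('OPENAI_BASE_URL'),
        ],
        'anthropic' => [
            'key' => env('ANTHROPIC_API_KEY'),
        ],
    ],
];
```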
A quick sanity check before writing real code — in routes/web.php:
use function Laravel\Ai\agent;
Route::get('/ping-ai', function () {
    return (string) agent(
        instructions: 'You are terse. Reply in five words or fewer.',
    )->prompt('Is Laravel 13 out?');
});
The agent() helper gives you an anonymous agent for quick experiments. If you see a five-word response, your credentials are good.
Your first agent: text generation with memory
Anonymous agents are fine for spikes, but real applications want dedicated agent classes. Generate one with the included Artisan command:
php artisan make:agent DocsAssistant
That scaffolds app/Ai/Agents/DocsAssistant.php. We'll flesh it out as a conversational assistant that remembers previous turns for the current user:
<?php

namespace App\Ai\Agents;

use Laravel\Ai\Attributes\MaxTokens;
use Laravel\Ai\Attributes\Model;
use Laravel\Ai\Attributes\Provider;
use Laravel\Ai\Attributes\Temperature;
use Laravel\Ai\Concerns\RemembersConversations;
use Laravel\Ai\Contracts\Agent;
use Laravel\Ai\Contracts\Conversational;
use Laravel\Ai\Enums\Lab;
use Laravel\Ai\Promptable;

#[Provider(Lab::Anthropic)]
#[Model('claude-haiku-4-5-20251001')]
#[MaxTokens(1024)]
#[Temperature(0.3)]
class DocsAssistant implements Agent, Conversational
{
    use Promptable, RemembersConversations;

    public function instructions(): string
    {
        return <<<PROMPT
        You are a helpful Laravel documentation assistant. Answer concisely,
        quote APIs verbatim, and never invent package names. If you are not
        sure, say so and suggest where the user should look in the docs.
        PROMPT;
    }
}
A few things are worth pausing on. PHP attributes configure the agent declaratively — #[Provider], #[Model], #[Temperature] and #[MaxTokens] are read at runtime to configure each request to the provider. RemembersConversations plugs in the built-in conversation store so that prior messages are loaded automatically; Conversational is the interface that trait satisfies. The Promptable trait is what gives your class the fluent prompt(), stream() and queue() methods.
Calling the agent from a controller looks like this:
use App\Ai\Agents\DocsAssistant;
use Illuminate\Http\Request;

Route::post('/chat', function (Request $request) {
    $user = $request->user();

    $response = (new DocsAssistant)
        ->forUser($user)
        ->prompt($request->string('message'));

    return [
        'reply' => (string) $response,
        'conversation' => $response->conversationId,
    ];
});
The first call creates a new row in agent_conversations. Pass the returned conversationId back on subsequent requests with ->continue($conversationId, as: $user)->prompt(...) and the SDK reloads the last N messages for you. This is the single biggest ergonomics win over rolling your own: you get persistent chat without touching a migration.
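Wired as its own route, a follow-up turn is just the continue() call described above in context (the route path and parameter name are my choices):

```php
use App\Ai\Agents\DocsAssistant;
use Illuminate\Http\Request;

// Follow-up turn: the client sends back the conversation id
// it received from the first /chat response.
Route::post('/chat/{conversation}', function (Request $request, string $conversation) {
    $response = (new DocsAssistant)
        ->continue($conversation, as: $request->user())
        ->prompt($request->string('message'));

    return ['reply' => (string) $response];
});
```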
Streaming responses into a Livewire component
For anything longer than a sentence, streaming is mandatory — users will abandon a 10-second blank screen. The AI SDK makes this trivial. Returning a StreamableAgentResponse directly from a route gives you a Server-Sent Events endpoint:
Route::post('/chat/stream', function (Request $request) {
    return (new DocsAssistant)
        ->forUser($request->user())
        ->stream($request->string('message'));
});
That's the entire backend. On the frontend, you can consume it with vanilla EventSource, but I usually reach for Livewire 4 because it already understands streamed attributes. Here's a minimal component — if you haven't upgraded yet, see the Livewire 3 to 4 migration guide for the relevant breaking changes.
<?php

namespace App\Livewire;

use App\Ai\Agents\DocsAssistant;
use Livewire\Component;

class DocsChat extends Component
{
    public string $message = '';

    public string $reply = '';

    public ?string $conversationId = null;

    public function send()
    {
        $this->reply = '';

        $agent = (new DocsAssistant)->forUser(auth()->user());

        $stream = $this->conversationId
            ? $agent->continue($this->conversationId, as: auth()->user())->stream($this->message)
            : $agent->stream($this->message);

        foreach ($stream as $event) {
            if ($event->isText()) {
                $this->reply .= $event->text;
                $this->stream('reply'); // push token to the browser
            }
        }

        $this->conversationId = $stream->response()->conversationId;
    }

    public function render()
    {
        return view('livewire.docs-chat');
    }
}
In the Blade view, <div wire:stream="reply">{{ $reply }}</div> renders tokens as they arrive. If you're building a client that speaks the Vercel AI SDK protocol (e.g. a Next.js frontend), swap ->stream(...) for ->stream(...)->usingVercelDataProtocol() and you get a compatible wire format for free.
One production caveat: streaming holds a PHP-FPM worker open for the full duration of the response. Size the pool that serves this route for long-held requests (or serve it from Laravel Octane, which copes better with long-lived connections), and put the endpoint behind fine-grained rate limiting so runaway clients can't exhaust your workers.
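The rate limiting half of that is plain Laravel, nothing SDK-specific. A named limiter keyed by user (falling back to IP for guests) could look like this; the limiter name and the ten-per-minute limit are my choices:

```php
use Illuminate\Cache\RateLimiting\Limit;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\RateLimiter;

// In a service provider's boot() method.
RateLimiter::for('ai-stream', function (Request $request) {
    return Limit::perMinute(10)->by(
        $request->user()?->id ?: $request->ip()
    );
});
```

Attach it to the streaming route with ->middleware('throttle:ai-stream').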
Structured output: from raw text to typed objects
Free-form text is fine for chat, but most business logic wants typed data. Structured output makes the model return JSON that matches a schema you define, and the SDK deserialises it to an array you can access like a DTO. Implement the HasStructuredOutput interface and add a schema() method:
<?php

namespace App\Ai\Agents;

use Illuminate\Contracts\JsonSchema\JsonSchema;
use Laravel\Ai\Attributes\Model;
use Laravel\Ai\Attributes\Provider;
use Laravel\Ai\Contracts\Agent;
use Laravel\Ai\Contracts\HasStructuredOutput;
use Laravel\Ai\Enums\Lab;
use Laravel\Ai\Promptable;

#[Provider(Lab::OpenAI)]
#[Model('gpt-4.1-mini')]
class SupportTicketClassifier implements Agent, HasStructuredOutput
{
    use Promptable;

    public function instructions(): string
    {
        return 'Classify the support ticket into a category, urgency, and suggested reply.';
    }

    public function schema(JsonSchema $schema): array
    {
        return [
            'category' => $schema->string()
                ->enum(['billing', 'bug', 'feature-request', 'other'])
                ->required(),
            'urgency' => $schema->string()
                ->enum(['low', 'medium', 'high'])
                ->required(),
            'suggested_reply' => $schema->string()->required(),
            'tags' => $schema->array()->items($schema->string())->required(),
        ];
    }
}
Call it and access the response like an array:
$classification = (new SupportTicketClassifier)->prompt($ticket->body);

$ticket->update([
    'category' => $classification['category'],
    'urgency' => $classification['urgency'],
    'tags' => $classification['tags'],
]);
I usually take this one step further and map the raw array into a PHP readonly value object at the service boundary, so the rest of my app never touches loosely typed arrays from the AI. It turns the model output into something a static analyser can actually reason about.
final readonly class TicketClassification
{
    public function __construct(
        public string $category,
        public string $urgency,
        public string $suggestedReply,
        /** @var string[] */
        public array $tags,
    ) {}

    public static function fromResponse(array $response): self
    {
        return new self(
            category: $response['category'],
            urgency: $response['urgency'],
            suggestedReply: $response['suggested_reply'],
            tags: $response['tags'],
        );
    }
}
Tool calling: giving the agent real abilities
Tools let the model call functions in your application. The SDK handles the back-and-forth loop — you just describe what the tool does and what it takes. Generate one:
php artisan make:tool LookupOrder
Then implement its handle() and schema():
<?php

namespace App\Ai\Tools;

use App\Models\Order;
use Illuminate\Contracts\JsonSchema\JsonSchema;
use Laravel\Ai\Contracts\Tool;
use Laravel\Ai\Tools\Request;

class LookupOrder implements Tool
{
    public function description(): string
    {
        return 'Look up an order by its public order number and return status, total, and line items.';
    }

    public function handle(Request $request): string
    {
        $order = Order::where('number', $request['number'])->first();

        if (! $order) {
            return "No order found with number {$request['number']}.";
        }

        return json_encode([
            'number' => $order->number,
            'status' => $order->status,
            'total' => $order->total->format(),
            'items' => $order->items->map->only(['name', 'quantity'])->all(),
        ]);
    }

    public function schema(JsonSchema $schema): array
    {
        return [
            'number' => $schema->string()
                ->description('The public order number, e.g. ORD-12345')
                ->required(),
        ];
    }
}
Now expose the tool on any agent that needs it, alongside a MaxSteps attribute so the agentic loop can't run away:
use App\Ai\Tools\LookupOrder;
use Laravel\Ai\Attributes\MaxSteps;
use Laravel\Ai\Contracts\HasTools;

#[MaxSteps(5)]
class SupportAgent implements Agent, Conversational, HasTools
{
    use Promptable, RemembersConversations;

    public function instructions(): string
    {
        return 'You are a support agent. Use tools to look up real data. Never guess order details.';
    }

    public function tools(): iterable
    {
        return [
            new LookupOrder,
        ];
    }
}
MaxSteps(5) is the single most important production guard rail you can add. Without it, an agent that gets into a bad state can loop on tools until you hit your monthly budget. Five is plenty for almost any support workflow; I've only ever needed to raise it when building multi-step research agents.
Embeddings and vector search with pgvector
Embeddings turn text into high-dimensional vectors that capture meaning. Two pieces of text with similar meaning have vectors that are close together in cosine space — that's the foundation of every RAG, semantic search and deduplication feature built on LLMs.
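pgvector does this math for you inside the database, but a plain-PHP sketch makes "close together in cosine space" concrete:

```php
/**
 * Cosine similarity between two equal-length vectors:
 * dot(a, b) / (|a| * |b|). Returns a value in [-1, 1];
 * the closer to 1, the more similar the meanings.
 */
function cosineSimilarity(array $a, array $b): float
{
    $dot = $normA = $normB = 0.0;

    foreach ($a as $i => $value) {
        $dot   += $value * $b[$i];
        $normA += $value ** 2;
        $normB += $b[$i] ** 2;
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}

// Same direction => 1.0; orthogonal => 0.0.
cosineSimilarity([1.0, 0.0], [2.0, 0.0]); // 1.0
cosineSimilarity([1.0, 0.0], [0.0, 1.0]); // 0.0
```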
Laravel 13 ships two APIs for this. The quick one is a Stringable macro:
use Illuminate\Support\Str;
$vector = Str::of('How do I queue a Laravel job?')->toEmbeddings();
// float[], e.g. [0.0123, -0.0451, ...]
The full-featured one is the Embeddings class, which supports batching, dimensions, caching and provider selection:
use Laravel\Ai\Embeddings;
use Laravel\Ai\Enums\Lab;

$response = Embeddings::for([
    'How do I queue a Laravel job?',
    'Configuring Horizon supervisors.',
])
    ->dimensions(1536)
    ->cache(seconds: 3600)
    ->generate(Lab::OpenAI, 'text-embedding-3-small');

$response->embeddings; // [[...1536 floats...], [...1536 floats...]]
To store embeddings, Laravel 13 adds a native vector column type on PostgreSQL via pgvector. Here's the migration for a documents table we'll use for the RAG demo:
use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration {
    public function up(): void
    {
        Schema::ensureVectorExtensionExists();

        Schema::create('documents', function (Blueprint $table) {
            $table->id();
            $table->string('title');
            $table->text('content');
            $table->vector('embedding', dimensions: 1536)->index();
            $table->timestamps();
        });
    }
};
Schema::ensureVectorExtensionExists() creates the pgvector extension if it's missing, and ->index() on the vector column builds an HNSW index with cosine distance — good enough for up to a million rows on a reasonably sized database.
The Eloquent model needs the vector column cast to array:
class Document extends Model
{
    protected $fillable = ['title', 'content', 'embedding'];

    protected function casts(): array
    {
        return [
            'embedding' => 'array',
        ];
    }
}
Searching is a one-liner thanks to whereVectorSimilarTo:
$results = Document::query()
    ->whereVectorSimilarTo('embedding', $userQuery, minSimilarity: 0.5)
    ->limit(5)
    ->get();
Pass a raw string and the query builder automatically embeds it using your default provider. Pass a float[] if you've already generated the embedding (e.g. for reuse in the same request). Under the hood this generates a pgvector cosine distance query and orders results — no manual SQL required.
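When you need the same query vector more than once in a request (say, to search and then to store it), embed once and pass the float[] through:

```php
use App\Models\Document;
use Illuminate\Support\Str;

// Embed the query once...
$vector = Str::of($userQuery)->toEmbeddings();

// ...then reuse the same float[] wherever it's needed,
// instead of paying for a second embedding call.
$results = Document::query()
    ->whereVectorSimilarTo('embedding', $vector, minSimilarity: 0.5)
    ->limit(5)
    ->get();
```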
Building a RAG chatbot that actually ships
Now we tie everything together. The flow for Retrieval-Augmented Generation is straightforward: embed the user's question, retrieve the top-k most similar documents, inject them into the prompt as context, and let the agent generate an answer grounded in those documents.
The cleanest way to implement this in the AI SDK is the built-in SimilaritySearch tool, which exposes your document table as a tool the agent can call. Add it alongside your other tools:
<?php

namespace App\Ai\Agents;

use App\Models\Document;
use Laravel\Ai\Attributes\MaxSteps;
use Laravel\Ai\Contracts\Agent;
use Laravel\Ai\Contracts\Conversational;
use Laravel\Ai\Contracts\HasTools;
use Laravel\Ai\Promptable;
use Laravel\Ai\Tools\SimilaritySearch;

#[MaxSteps(4)]
class DocsRagAgent implements Agent, Conversational, HasTools
{
    use Promptable;

    public function instructions(): string
    {
        return <<<PROMPT
        You are a Laravel documentation assistant. For every user question,
        call the `similarity_search` tool first to retrieve the three most
        relevant doc chunks, then answer using only those results. Quote
        verbatim where possible and cite the doc titles you used.
        PROMPT;
    }

    public function messages(): iterable
    {
        return []; // or load from your own table
    }

    public function tools(): iterable
    {
        return [
            SimilaritySearch::usingModel(
                model: Document::class,
                column: 'embedding',
                minSimilarity: 0.5,
                limit: 3,
            )->withDescription('Search the Laravel docs knowledge base.'),
        ];
    }
}
To populate the knowledge base, you'll need a job that chunks your docs, embeds each chunk, and stores it. The naive version looks like this:
use App\Models\Document;
use Illuminate\Support\Facades\File;
use Illuminate\Support\Str;

collect(File::glob(resource_path('docs/*.md')))->each(function (string $path) {
    $markdown = File::get($path);

    // Rough chunking: split on H2s, keep under ~1000 tokens each.
    collect(preg_split('/^## /m', $markdown))
        ->filter()
        ->each(function (string $chunk) use ($path) {
            Document::create([
                'title' => basename($path, '.md'),
                'content' => $chunk,
                'embedding' => Str::of($chunk)->toEmbeddings(cache: true),
            ]);
        });
});
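The loop above is fine for a one-off seed; for anything bigger, the same chunk-and-embed work belongs in a queued job per file so failures retry independently. A sketch (the IndexDocsFile name and retry count are mine, not the SDK's):

```php
<?php

namespace App\Jobs;

use App\Models\Document;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Support\Facades\File;
use Illuminate\Support\Str;

class IndexDocsFile implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable;

    public int $tries = 3; // retry a failed file without re-running the rest

    public function __construct(public string $path) {}

    public function handle(): void
    {
        $markdown = File::get($this->path);

        // Same rough chunking as the seed script above.
        collect(preg_split('/^## /m', $markdown))
            ->filter()
            ->each(fn (string $chunk) => Document::create([
                'title' => basename($this->path, '.md'),
                'content' => $chunk,
                'embedding' => Str::of($chunk)->toEmbeddings(cache: true),
            ]));
    }
}
```

Dispatch one per file with IndexDocsFile::dispatch($path) and let your queue workers fan out.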
In production, run this inside a queued job per file so you can parallelise and retry individual failures — see scaling Laravel queues in production for the worker topology I'd pair this with. Prompting the agent now transparently performs RAG:
$response = (new DocsRagAgent)->prompt('How do I write a custom Blade directive?');
return (string) $response;
The agent calls similarity_search under the hood, the SDK wires the results back into the conversation, and the model answers grounded in your documents.
Advanced patterns and edge cases
A few things will bite you once this hits real traffic.
Cost control starts with model selection. Agents default to whatever you configure in config/ai.php. For pipelines where 70% of requests are simple ("summarise this", "classify this") and 30% need reasoning, use #[UseCheapestModel] on the simple agents and #[UseSmartestModel] on the others. Don't hardcode model names — a new cheaper or smarter model will ship every quarter and you want the upgrade for free.
Failover is a one-liner, use it. Pass an array to the provider argument and the SDK automatically falls over to the backup on rate limits or provider outages:
$response = (new DocsRagAgent)->prompt(
    $question,
    provider: [Lab::Anthropic, Lab::OpenAI],
);
You can also bake failover into the agent class by passing multiple values to #[Provider]. The fallback fires on 429s, 5xxs and connection errors; it does not fire on 4xx validation errors, so a bad prompt still fails fast.
Cache embeddings aggressively. Embedding the same piece of text twice is pure waste. Enable the global cache in config/ai.php (ai.caching.embeddings.cache = true) and you get 30 days of free deduplication on identical inputs. For transient per-request work use ->cache(seconds: 3600) instead.
Watch out for PII in prompts and logs. Middleware (HasMiddleware) lets you scrub sensitive data before it leaves your application. I normally add one middleware that redacts email addresses and credit cards from $prompt->prompt and another that logs token usage to Pulse or Sentry. Never ship an agent that logs raw prompts at INFO level.
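As a sketch of the redaction idea only — the middleware signature below is an assumption modelled on Laravel's HTTP middleware convention, so check the SDK's HasMiddleware contract for the real shape; $prompt->prompt is from the description above:

```php
// Sketch: assumed middleware signature, Laravel-style $next pipeline.
class RedactsPii
{
    public function handle($prompt, \Closure $next)
    {
        // Redact email addresses.
        $prompt->prompt = preg_replace(
            '/[\w.+-]+@[\w-]+\.[\w.]+/',
            '[redacted-email]',
            $prompt->prompt,
        );

        // Redact card-number-shaped digit runs (13-16 digits).
        $prompt->prompt = preg_replace(
            '/\b(?:\d[ -]?){13,16}\b/',
            '[redacted-card]',
            $prompt->prompt,
        );

        return $next($prompt);
    }
}
```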
Rate limit per user, not just per IP. AI endpoints are expensive. A login loop on a public demo can cost you real money in a few hours. Combine route-level throttling with a per-user monthly token budget persisted in your own database — check it in middleware, deny the request before you call ->prompt().
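The budget check itself is ordinary HTTP middleware. A sketch, assuming you maintain a tokens_used_this_month counter on the user yourself (the SDK does not provide one):

```php
<?php

namespace App\Http\Middleware;

use Closure;
use Illuminate\Http\Request;

class EnforceTokenBudget
{
    public function handle(Request $request, Closure $next, int $monthlyBudget = 500_000)
    {
        // tokens_used_this_month is a column you maintain yourself,
        // e.g. incremented from the usage data on each AI response.
        if ($request->user()->tokens_used_this_month >= $monthlyBudget) {
            abort(429, 'Monthly AI budget exhausted.');
        }

        return $next($request);
    }
}
```

Register it on the /chat routes so the request is denied before any provider is called.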
Testing AI-backed code
This is the feature that convinced me to migrate. Every agent, image call, transcription and embedding generation is fakeable with a single method call, and you get Pest-friendly assertions for free:
use App\Ai\Agents\SupportTicketClassifier;
use Laravel\Ai\Prompts\AgentPrompt;

it('classifies billing tickets as billing', function () {
    SupportTicketClassifier::fake([
        [
            'category' => 'billing',
            'urgency' => 'medium',
            'suggested_reply' => 'We have refunded your card.',
            'tags' => ['refund', 'card'],
        ],
    ]);

    $result = (new SupportTicketClassifier)
        ->prompt('I was double charged yesterday.');

    expect($result['category'])->toBe('billing');

    SupportTicketClassifier::assertPrompted(
        fn (AgentPrompt $prompt) => $prompt->contains('double charged')
    );
});
For structured output, ::fake() without arguments will auto-generate data matching your schema — which is perfect when you only care about the shape, not the values. For a stronger guarantee, add ->preventStrayPrompts() so that any unmocked AI call in the test suite blows up loudly:
beforeEach(function () {
    SupportTicketClassifier::fake()->preventStrayPrompts();
});
This is the single best hygiene pattern I've found. It means a test that accidentally hits the real OpenAI API will fail in CI rather than silently cost you $0.02 per run. Pair it with an architectural Pest test that forbids direct instantiation of HTTP clients in the Ai\ namespace and you have a hard wall between your code and real providers in the test suite.
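That architectural test is only a few lines with Pest's arch expectations (adjust the namespace and the forbidden symbols to your app):

```php
// tests/Architecture/AiBoundaryTest.php
test('ai code never talks to providers directly')
    ->expect('App\Ai')
    ->not->toUse([
        'GuzzleHttp\Client',
        'Illuminate\Support\Facades\Http',
        'curl_init',
    ]);
```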
Common mistakes
Forgetting MaxSteps on tool-using agents. I said it above but it bears repeating. Without a step limit, an agent can loop indefinitely and bankrupt you. Start at 5, raise it only if you measure need.
Over-prompting the system message. Every token in instructions() is billed on every turn. Keep the system prompt under ~500 tokens; move lengthy rules into tool descriptions where the model only pays for them when it invokes the tool.
Using structured output for free-form chat. Structured output forces the model into JSON, which hurts the quality of prose responses. Use it for classification, extraction and anything downstream code needs to parse. Use plain text agents for chat.
Storing embeddings without an index. Omitting ->index() on the vector column makes similarity search an O(N) sequential scan. At 10k rows you won't notice. At 500k rows your queries take 30 seconds. Always add the index up front.
Streaming behind a buffering proxy. If your nginx sits in front of PHP-FPM with default proxy_buffering on, streamed responses arrive in one chunk at the end. Add proxy_buffering off on the streaming route and set X-Accel-Buffering: no on the response.
Wrapping up
We've built a real RAG chatbot: installed the Laravel AI SDK, configured multiple providers, wrote a conversational agent with memory, streamed it into Livewire, classified data with structured output, added a tool-calling loop, generated and stored embeddings in pgvector, wired up similarity search, faked every call for tests, and hardened the whole thing with MaxSteps, failover, caching, and rate limiting. That's the complete surface of the first-party SDK, end-to-end.
If you're migrating from a vendor SDK or from Prism, pick one agent and port it. Start with something low-risk — a classifier or summariser — and move business logic behind the new service class. Once you've got one green, the rest follow.
For the next step up from here, you might want to read the getting-started guide to Prism if you still need to support Laravel 12 in parallel, scaling Laravel queues in production to run your embedding jobs reliably, or immutable value objects with PHP readonly classes to put a typed boundary between your AI code and the rest of your app.
FAQ
What is the Laravel AI SDK?
The Laravel AI SDK (laravel/ai) is the first-party AI toolkit shipped with Laravel 13. It provides a unified, Laravel-native API for text generation, tool-calling agents, structured output, streaming, embeddings, vector stores, image generation, audio and transcription across providers like OpenAI, Anthropic, Gemini, Ollama and more. It replaces the need for vendor-specific SDKs or community packages in new projects.
How is the Laravel AI SDK different from Prism?
Prism is a community package (by TJ Miller, now at Laravel) that pioneered a fluent, provider-agnostic AI API for Laravel. The Laravel AI SDK is the first-party evolution of that idea, shipped as part of the framework in Laravel 13. It absorbs Prism's best patterns, adds first-party features like RemembersConversations, whereVectorSimilarTo, built-in vector stores and tighter Artisan integration, and has one fewer dependency to manage. Prism is still supported and is the right choice on Laravel 12 and earlier.
How do I stream AI responses in Laravel?
Call ->stream($prompt) on any agent and return the result from a route — the SDK sends a Server-Sent Events stream automatically. Inside a Livewire 4 component, iterate the returned stream and call $this->stream('property') to push tokens to the browser. You can also emit the Vercel AI SDK protocol with ->usingVercelDataProtocol() for Next.js clients.
What is RAG in PHP Laravel?
RAG (Retrieval-Augmented Generation) is a pattern where you embed your source documents into vectors, store them in a database, retrieve the most similar ones to a user query at prompt time, and inject those snippets into the LLM's context so it answers grounded in your data. In Laravel 13 you build this with Str::of($text)->toEmbeddings(), a pgvector-backed vector column, whereVectorSimilarTo, and the SDK's SimilaritySearch tool.
How do I generate text embeddings in Laravel?
The fastest way is Str::of('your text')->toEmbeddings(), which returns a float[] using your default embedding provider. For batching, dimensions control and caching, use Embeddings::for([...])->dimensions(1536)->cache(seconds: 3600)->generate(Lab::OpenAI, 'text-embedding-3-small'). Store the result in a vector column on PostgreSQL with pgvector.
How do I test AI calls in Laravel without hitting the real API?
Call YourAgent::fake() — with an array, closure, or no arguments — to stub responses, then use ::assertPrompted(), ::assertNotPrompted() and ::assertNeverPrompted() to verify behaviour. Add ->preventStrayPrompts() at the top of your test suite to fail any unmocked AI call loudly. The same pattern works for Image::fake(), Audio::fake(), Embeddings::fake() and the other top-level classes.
Does Laravel 13 support Anthropic Claude?
Yes. The Laravel AI SDK supports Anthropic (including Claude Haiku, Sonnet and Opus) as a first-class provider for text generation, tool calling and file attachments. Set ANTHROPIC_API_KEY in .env, reference the provider with Lab::Anthropic, and pick a model with #[Model('claude-haiku-4-5-20251001')] or by passing the model name to ->prompt().
Steven is a software engineer with a passion for building scalable web applications. He enjoys sharing his knowledge through articles and tutorials.