Stop Using parse_url(): PHP 8.5's URI Extension Explained

parse_url() has been silently returning incomplete arrays and swallowing garbage input since PHP 4. PHP 8.5's new URI extension fixes that — immutable objects, typed getters, and actual exceptions when something goes wrong.

Meet the PHP 8.5 URI Extension

The extension ships two classes, bundled and always available — no composer require, no PECL:

  • Uri\Rfc3986\Uri — strict URI parsing following RFC 3986. Use this for backend services, custom schemes (ftp://, mailto:, your own), and anywhere that needs hard validation.
  • Uri\WhatWg\Url — browser-compatible URL parsing following the WHATWG URL standard. Use this when you need behaviour that matches Chrome or Firefox — Unicode hostnames, lenient normalisation, percent-encoding handled for you.

Both classes are immutable. Every modification returns a new instance; the original is untouched. It is the same design PHP developers already know from PSR-7 HTTP messages, and the same principle behind building immutable value objects with PHP readonly classes.

Parsing and Reading URI Components

The old way with parse_url():

$parts = parse_url('https://richdynamix.com/articles?tag=php#top');

echo $parts['host'];     // richdynamix.com
echo $parts['fragment']; // top
echo $parts['port'];     // Warning: Undefined array key "port"

The array returned by parse_url() only contains keys for components that are present in the URL. Miss an isset() check and PHP 8 raises an "Undefined array key" warning at runtime: no exception, no early failure.
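
For code that has to stay on parse_url() for now, the missing-key problem can be contained with the null coalescing operator. The sketch below is illustrative only; portOrDefault() is a hypothetical helper name, not part of PHP.

```php
<?php
// Hypothetical helper: read the port from parse_url() output without
// triggering an "Undefined array key" warning.
function portOrDefault(string $url, int $default): int
{
    $parts = parse_url($url);

    if ($parts === false) {
        return $default; // seriously malformed input
    }

    // The 'port' key only exists when the URL carries an explicit port.
    return $parts['port'] ?? $default;
}

echo portOrDefault('https://richdynamix.com/articles', 443), PHP_EOL;      // 443
echo portOrDefault('https://richdynamix.com:8443/articles', 443), PHP_EOL; // 8443
```

This removes the warning but not the underlying weakness: nothing forces every call site to remember the fallback.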

With the PHP 8.5 URI extension:

use Uri\Rfc3986\Uri;

$uri = new Uri('https://richdynamix.com/articles?tag=php#top');

echo $uri->getScheme();   // https
echo $uri->getHost();     // richdynamix.com
echo $uri->getPath();     // /articles
echo $uri->getQuery();    // tag=php
echo $uri->getFragment(); // top
var_dump($uri->getPort()); // NULL — absent components return null, not an undefined key

No undefined array key warnings. Every getter is typed and returns null when that component is absent.
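
Because absent components come back as null, fallbacks can be written against the return type instead of against array keys. A minimal sketch, assuming PHP 8.5; hostWithPort() is a hypothetical helper introduced only for this example.

```php
<?php
use Uri\Rfc3986\Uri;

// Hypothetical helper: build "host:port", using a caller-supplied port
// when the URI does not carry an explicit one.
function hostWithPort(Uri $uri, int $defaultPort): ?string
{
    $host = $uri->getHost();

    if ($host === null) {
        return null; // e.g. a relative reference with no authority
    }

    return $host . ':' . ($uri->getPort() ?? $defaultPort);
}

echo hostWithPort(new Uri('https://richdynamix.com/articles'), 443), PHP_EOL; // richdynamix.com:443
var_dump(hostWithPort(new Uri('/articles'), 443));                            // NULL
```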

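The RFC 3986 class also exposes raw variants for components where the exact encoding matters. The plain getters return the normalised component, in which percent-encoded unreserved characters (letters, digits, -, ., _, ~) are decoded; the raw getters return the component exactly as it was given:

$uri = new Uri('https://example.com/caf%C3%A9/%6Denu');

echo $uri->getPath();    // /caf%C3%A9/menu   (normalised: %6D becomes "m", %C3%A9 stays encoded)
echo $uri->getRawPath(); // /caf%C3%A9/%6Denu (original encoding preserved)

Neither getter hands back a fully decoded "café"; pass the result through rawurldecode() if that is what you need. Use the raw getters when you need to forward the URI elsewhere without altering its encoding.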

Modifying URIs Immutably

Each withX() method returns a new Uri instance with that component replaced. Chain them freely:

use Uri\Rfc3986\Uri;

$original = new Uri('https://richdynamix.com/articles?tag=php');

$updated = $original
    ->withQuery('tag=laravel&sort=recent')
    ->withFragment('results');

echo $original->toString(); // https://richdynamix.com/articles?tag=php
echo $updated->toString();  // https://richdynamix.com/articles?tag=laravel&sort=recent#results

toString() returns a normalised representation. If you need the URI exactly as you gave it, call toRawString() instead.
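
A quick way to see the difference is to feed in a URI with mixed case and a dot segment, then print both forms. This is a small sketch assuming PHP 8.5; only the raw round trip is asserted, since the exact normalised output is best checked against your own runtime.

```php
<?php
use Uri\Rfc3986\Uri;

$input = 'HTTPS://RichDynamix.COM/articles/../articles?tag=php';
$uri   = new Uri($input);

echo $uri->toRawString(), PHP_EOL; // the input, byte for byte
echo $uri->toString(), PHP_EOL;    // the normalised form

// The raw string always round-trips the original input.
var_dump($uri->toRawString() === $input); // bool(true)
```

Log or forward the raw string when byte-for-byte fidelity matters; compare or cache on the normalised one.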

Available modifiers on Uri\Rfc3986\Uri:

$uri->withScheme(?string $scheme): static
$uri->withUserInfo(?string $userinfo): static   // a single "user:password" string
$uri->withHost(?string $host): static
$uri->withPort(?int $port): static
$uri->withPath(string $path): static
$uri->withQuery(?string $query): static
$uri->withFragment(?string $fragment): static

This is the same immutable chaining pattern PHP 8.5 introduces elsewhere. If you haven't yet seen how the new clone-with syntax updates readonly objects without boilerplate, the two features complement each other nicely.
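
The modifiers replace whole components, so changing a single query parameter means rebuilding the query string yourself. One possible sketch, assuming PHP 8.5: withQueryParam() is a hypothetical helper, and it leans on parse_str() and http_build_query(), which apply their own encoding rules and rewrite some key names (dots and spaces, for instance).

```php
<?php
use Uri\Rfc3986\Uri;

// Hypothetical helper: return a copy of $uri with one query parameter set.
function withQueryParam(Uri $uri, string $key, string $value): Uri
{
    $params = [];
    parse_str($uri->getQuery() ?? '', $params); // existing parameters, if any
    $params[$key] = $value;

    // withQuery() returns a new instance; $uri itself is never modified.
    return $uri->withQuery(http_build_query($params));
}

$first  = new Uri('https://richdynamix.com/articles?tag=php');
$second = withQueryParam($first, 'page', '2');

echo $first->getQuery(), PHP_EOL;  // tag=php
echo $second->getQuery(), PHP_EOL; // tag=php&page=2
```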

RFC 3986 vs WHATWG: Which to Use?

Reach for Uri\Rfc3986\Uri when:

  • You're building backend APIs or processing URIs programmatically
  • Your URIs use non-HTTP schemes
  • You want strict validation — invalid input must throw, full stop

Reach for Uri\WhatWg\Url when:

  • You're processing URLs submitted by users or browsers
  • You need IDNA/Unicode hostname support
  • You want normalisation that matches browser behaviour

The WHATWG class has its own serialisation methods to match the spec:

use Uri\WhatWg\Url;

$url = new Url('HTTPS://Example.COM/Path');

echo $url->toAsciiString();   // https://example.com/Path  (lowercase, machine-readable)
echo $url->toUnicodeString(); // https://example.com/Path  (display-friendly, Unicode hosts decoded)

The RFC 3986 class uses toString() / toRawString(); toAsciiString() does not exist on that class.
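
The two WHATWG serialisations only diverge once the host contains non-ASCII characters. The following sketch, assuming PHP 8.5 and a UTF-8 source file, uses a made-up hostname to show the Punycode and Unicode forms side by side.

```php
<?php
use Uri\WhatWg\Url;

// "münchen.example" is an illustrative hostname, not a real site.
$url = new Url('https://münchen.example/bier');

echo $url->getAsciiHost(), PHP_EOL;    // xn--mnchen-3ya.example
echo $url->getUnicodeHost(), PHP_EOL;  // münchen.example
echo $url->toAsciiString(), PHP_EOL;   // https://xn--mnchen-3ya.example/bier
echo $url->toUnicodeString(), PHP_EOL; // https://münchen.example/bier
```

Store and compare the ASCII form; keep the Unicode form for display.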

The pattern is similar to type safety in domain modelling: just as replacing string constants with PHP backed enums eliminates the ambiguity of raw strings, choosing the right URI class encodes the spec you're working against into your type system.

PHP 8.5 URI Extension Error Handling

parse_url() returns false on severely malformed input — but stays quiet for a lot of edge cases that return incomplete arrays:

// parse_url() says this is fine
$parts = parse_url('not://a valid uri ///');
var_dump($parts);
// An array, not false: the scheme "not" plus whatever parse_url() could salvage
// from the rest, spaces included. No warning, no exception.

The PHP 8.5 URI extension throws. Use the constructor when you want an exception on bad input:

use Uri\Rfc3986\Uri;
use Uri\InvalidUriException;

try {
    $uri = new Uri($userInput);
} catch (InvalidUriException $e) {
    // $userInput was not a valid RFC 3986 URI
    return response()->json(['error' => 'Invalid URL provided'], 422);
}

Or use the static parse() factory if you prefer a null-return style:

use Uri\Rfc3986\Uri;

$uri = Uri::parse($userInput);

if ($uri === null) {
    // invalid input — handle without a try/catch
    return null;
}

echo $uri->getHost();
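
The null-return style pairs well with the nullsafe operator when all you need is one component. A minimal sketch, assuming PHP 8.5; hostOrNull() is a hypothetical helper.

```php
<?php
use Uri\Rfc3986\Uri;

// Hypothetical helper: the host of a valid URI, or null for anything
// that fails RFC 3986 parsing or has no host.
function hostOrNull(string $input): ?string
{
    return Uri::parse($input)?->getHost();
}

var_dump(hostOrNull('https://richdynamix.com/articles')); // string(15) "richdynamix.com"
var_dump(hostOrNull('not://a valid uri ///'));            // NULL (spaces are not valid in a URI)
```

Note that null here means either "invalid" or "valid but without a host"; use the constructor and catch the exception when you need to tell those apart.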

The WHATWG version adds a third argument for collecting "soft errors" — validation warnings that do not prevent parsing:

use Uri\WhatWg\Url;

$errors = [];
$url = Url::parse('  https://example.com  ', null, $errors);
// $url is a valid Url instance — WHATWG parsed it despite the leading/trailing whitespace
// $errors contains UrlValidationError objects describing the soft failures
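
To see what was collected, loop over the array. This sketch assumes PHP 8.5 and assumes that each UrlValidationError exposes a type property (an enum case) and a boolean failure property; check both names against the manual before relying on them.

```php
<?php
use Uri\WhatWg\Url;

$errors = [];
$url = Url::parse('  https://example.com  ', null, $errors);

// Guard against $errors being left as null when nothing was reported.
foreach ($errors ?? [] as $error) {
    // Assumed properties: $error->type (enum case), $error->failure (bool).
    printf("%s (fatal: %s)\n", $error->type->name, $error->failure ? 'yes' : 'no');
}

var_dump($url !== null); // bool(true): the URL still parsed
```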

Gotchas and Edge Cases

No fromString() method — some early blog posts mention Uri::fromString(). It does not exist. Use new Uri($string) (throws on failure) or Uri::parse($string) (returns null on failure).

Exception class: the thrown exception is Uri\InvalidUriException, not \ValueError. The exception hierarchy is Uri\UriException → Uri\InvalidUriException → Uri\WhatWg\InvalidUrlException.

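RFC 3986 does not support IDNA: Uri\Rfc3986\Uri performs no Punycode conversion, and RFC 3986 itself only permits ASCII characters, so do not expect a raw Unicode hostname to be converted for you (it may be rejected as invalid outright). Use Uri\WhatWg\Url for user-facing URLs with Unicode hosts.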

Fragment equality — both classes ignore the fragment by default when comparing URIs. https://example.com#foo equals https://example.com. Pass UriComparisonMode::IncludeFragment to equals() for strict matching.
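
The two comparison modes can be seen with a pair of URIs that differ only in their fragment. A short sketch, assuming PHP 8.5:

```php
<?php
use Uri\Rfc3986\Uri;
use Uri\UriComparisonMode;

$plain    = new Uri('https://example.com/docs');
$anchored = new Uri('https://example.com/docs#install');

// Default mode: the fragment is ignored.
var_dump($plain->equals($anchored)); // bool(true)

// Strict mode: the fragment takes part in the comparison.
var_dump($plain->equals($anchored, UriComparisonMode::IncludeFragment)); // bool(false)
```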

WHATWG requires a scheme: Uri\WhatWg\Url throws on schemeless input unless you supply a base URL to resolve it against. Uri\Rfc3986\Uri is more permissive and accepts relative URIs.
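
The difference shows up as soon as the same relative reference goes through both classes. A sketch assuming PHP 8.5, using the optional base URL argument of the WHATWG parser:

```php
<?php
use Uri\Rfc3986\Uri;
use Uri\WhatWg\Url;

$relative = '/articles?tag=php';

// WHATWG: no scheme and no base URL means the parse fails.
var_dump(Url::parse($relative) === null); // bool(true)

// WHATWG: with a base URL, the reference is resolved against it.
$base = new Url('https://richdynamix.com/');
echo Url::parse($relative, $base)->toAsciiString(), PHP_EOL; // https://richdynamix.com/articles?tag=php

// RFC 3986: a relative reference is valid on its own.
echo Uri::parse($relative)->getPath(), PHP_EOL; // /articles
```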

PHP 8.5+ only — the extension is always available from PHP 8.5 with no install step. Until your app (and any packages you depend on) requires PHP 8.5 as a minimum, you can't use it in shared library code. Before upgrading, audit your PHP dependencies to check the minimum PHP versions your key packages already enforce.
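
Library code that must still run on older PHP versions can feature-detect the class and fall back. A rough sketch only; extractHost() is a hypothetical helper, and the two branches will not agree on every edge case, because parse_url() accepts input the strict parser rejects.

```php
<?php
// Hypothetical helper: prefer the URI extension when it is available,
// fall back to parse_url() on PHP versions before 8.5.
function extractHost(string $url): ?string
{
    if (class_exists(\Uri\Rfc3986\Uri::class)) {
        return \Uri\Rfc3986\Uri::parse($url)?->getHost();
    }

    $parts = parse_url($url);

    return is_array($parts) ? ($parts['host'] ?? null) : null;
}

echo extractHost('https://richdynamix.com/articles'), PHP_EOL; // richdynamix.com
```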

Wrapping Up

Swap parse_url() for new Uri\Rfc3986\Uri($url) or Uri::parse($url) in any PHP 8.5 codebase. You get typed components, immutable modification, and a real exception on bad input. Use Uri\WhatWg\Url when browser compatibility matters.

PHP 8.5 is shipping several ergonomic improvements at once — the |> pipe operator for cleaner function chaining is another worth picking up in the same upgrade window.

Steven Richardson

Steven is a software engineer with a passion for building scalable web applications. He enjoys sharing his knowledge through articles and tutorials.