URL Encoding: Why Special Characters Break Links and How to Fix It
You've seen it before: a URL with %20 instead of a space, or %2F in a path that should have a slash. These %XX sequences aren't bugs; they're percent-encoding, the mechanism that allows URLs to contain characters they otherwise couldn't. Understanding how URL encoding works will save you from mysterious broken links, failed API calls, and frustrating debugging sessions.
Why URLs Have Character Restrictions
A URL (Uniform Resource Locator) is governed by RFC 3986, which specifies that URLs can only contain a specific set of ASCII characters. The "unreserved" characters, which are safe anywhere in a URL without encoding, are:
A-Z a-z 0-9 - _ . ~
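Reserved characters such as :, /, ?, #, &, and = can appear unencoded when they act as delimiters in the URL's structure, but should be percent-encoded when they appear as data. All other characters (spaces, quotation marks, non-ASCII text, and so on) must always be percent-encoded before inclusion in a URL.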
This restriction exists for good reasons. URLs are passed through many systems: browsers, proxy servers, load balancers, web servers, logging systems. Each of these components needs to parse the URL reliably. Special characters like ?, &, =, #, and / have specific meanings in URL structure. A literal & in a query string value would be misinterpreted as a query parameter separator.
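To make the & problem concrete, here is a minimal Python sketch using the standard library's urllib.parse. The parameter name dept and the value "R&D" are hypothetical, chosen only for illustration.

```python
from urllib.parse import parse_qs, quote

value = "R&D"  # hypothetical value containing a literal "&"

# Unencoded: the parser splits on "&", keeps "R" as the value,
# and discards the leftover "D" because it has no "="
broken = parse_qs("dept=" + value)

# Encoded: "&" becomes %26 and survives as data
fixed = parse_qs("dept=" + quote(value, safe=""))

print(broken)  # {'dept': ['R']}
print(fixed)   # {'dept': ['R&D']}
```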
How Percent-Encoding Works
Percent-encoding (also called URL encoding) replaces each unsafe byte with a % sign followed by two hexadecimal digits representing the byte's value.
For ASCII characters, this is straightforward:
- Space (ASCII 32, hex 0x20) → %20
- # (ASCII 35, hex 0x23) → %23
- & (ASCII 38, hex 0x26) → %26
- = (ASCII 61, hex 0x3D) → %3D
- ? (ASCII 63, hex 0x3F) → %3F
- / (ASCII 47, hex 0x2F) → %2F
For non-ASCII characters (accented letters, emoji, characters from non-Latin scripts), the character is first encoded as UTF-8, then each resulting byte is percent-encoded.
Example: the French character é (U+00E9):
- UTF-8 encoding: 0xC3 0xA9 (two bytes)
- Percent-encoded: %C3%A9
So "café" in a URL becomes caf%C3%A9.
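The two-step rule (UTF-8 first, then %XX for each byte) is small enough to write by hand. The sketch below uses a hypothetical helper named percent_encode, not a library API. It treats only the unreserved characters as safe and compares its result with Python's urllib.parse.quote called with safe="", which applies the same rule.

```python
from urllib.parse import quote

# RFC 3986 unreserved characters: these never need encoding
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_.~"
)

def percent_encode(text: str) -> str:
    """Hypothetical helper: UTF-8 encode, then write each unsafe byte as %XX."""
    out = []
    for byte in text.encode("utf-8"):  # non-ASCII characters become 2-4 bytes here
        char = chr(byte)
        if char in UNRESERVED:         # only ASCII bytes can match this set
            out.append(char)
        else:
            out.append(f"%{byte:02X}")  # uppercase hex, as RFC 3986 recommends
    return "".join(out)

print(percent_encode("café"))     # caf%C3%A9
print(percent_encode("a b&c=d"))  # a%20b%26c%3Dd

# The standard library agrees when nothing extra is marked safe
print(quote("café & more", safe="") == percent_encode("café & more"))  # True
```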
URL Structure and What Goes Where
A full URL has several components, and encoding rules differ between them:
https://example.com:8080/path/to/page?key=value&other=data#section
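Reading left to right, the components are the scheme (https), host (example.com), port (8080), path (/path/to/page), query string (key=value&other=data), and fragment (section).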
Scheme (https): Fixed identifier, no encoding needed.
Host (example.com): Domain names use their own encoding system (Punycode) for internationalized domain names, not percent-encoding. münchen.de becomes xn--mnchen-3ya.de.
Path (/path/to/page): Slashes separate path segments. If a segment itself contains a slash (as literal data, not a separator), encode it as %2F. Spaces, non-ASCII characters, and characters that would otherwise end the path (? and #) must also be encoded within path segments.
Query string (key=value&other=data): The = separates keys from values; & separates pairs. If a key or value contains literal & or =, these must be encoded. This is the most common source of URL encoding bugs.
Fragment (#section): The part after #, used for page anchors. It is not sent to the server; only the browser processes it.
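Python's urllib.parse.urlsplit separates these components, which is a quick way to check how a parser sees a URL. The sketch below also shows the two different encodings at work: IDNA (Punycode) for the host, via Python's built-in "idna" codec, and percent-encoding for a path segment that contains a literal slash. The segment value "reports/2024" is a made-up example.

```python
from urllib.parse import quote, urlsplit

# Split the example URL into its components
parts = urlsplit("https://example.com:8080/path/to/page?key=value&other=data#section")
print(parts.scheme)    # https
print(parts.hostname)  # example.com
print(parts.port)      # 8080 (an int)
print(parts.path)      # /path/to/page
print(parts.query)     # key=value&other=data
print(parts.fragment)  # section

# Host: internationalized domain names use IDNA/Punycode, not percent-encoding
ascii_host = "münchen.de".encode("idna").decode("ascii")
print(ascii_host)      # xn--mnchen-3ya.de

# Path: a literal "/" inside one segment must become %2F (safe="" disables the default)
segment = quote("reports/2024", safe="")
print(segment)         # reports%2F2024
```

Note that the built-in "idna" codec follows the older IDNA 2003 rules. For most everyday domain names the result is the same as under newer rules.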
The + vs %20 Problem
Spaces can be encoded two ways in URLs, and mixing them up causes bugs:
- %20: the percent-encoding for a space, valid anywhere in a URL
- +: a shorthand for a space, recognized only in query strings decoded as form data (application/x-www-form-urlencoded); in a path, + is a literal plus sign
To send an actual plus sign in a query string value, encode it as %2B; otherwise a form-data decoder will turn it into a space.
This distinction causes real bugs. If the client encodes spaces as + but the server decodes with plain percent-decoding (which leaves + untouched), spaces arrive as literal plus signs. Conversely, an unencoded + sent to a server that decodes form data arrives as a space. Encoding spaces as %20 avoids the first problem, because both kinds of decoder turn %20 into a space.
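The mismatch is easy to reproduce with Python's urllib.parse, where quote/unquote use %20 and quote_plus/unquote_plus use the form-data + convention. The sample value is arbitrary.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

value = "1 + 1 = 2"

as_plus = quote_plus(value)     # '1+%2B+1+%3D+2' (form-data style)
as_pct = quote(value, safe="")  # '1%20%2B%201%20%3D%202'

# Matching conventions round-trip cleanly
print(unquote_plus(as_plus) == value)  # True
print(unquote(as_pct) == value)        # True

# %20 survives either decoder, because both turn %20 into a space
print(unquote_plus(as_pct) == value)   # True

# Mismatch: plain percent-decoding leaves "+" as a literal plus sign
print(unquote(as_plus))                # 1+++1+=+2
```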
Encoding in Practice: Code Examples
JavaScript
// Encode a single query parameter value
encodeURIComponent("hello world & more")
// → "hello%20world%20%26%20more"
// Encode a full URL (preserves :// and /)
encodeURI("https://example.com/path with spaces")
// → "https://example.com/path%20with%20spaces"
// DO NOT use encodeURI for query values: it doesn't encode & = ? #
// ALWAYS use encodeURIComponent for individual values
// Building a query string properly
const params = new URLSearchParams({
q: "café & more",
page: "2"
});
`https://example.com/search?${params}`
// → "https://example.com/search?q=caf%C3%A9+%26+more&page=2"
Python
from urllib.parse import quote, urlencode, quote_plus
# Encode a path segment
quote("hello world") # "hello%20world"
quote("path/with/slashes") # "path/with/slashes" ("/" is safe by default)
quote("path/with/slashes", safe="") # "path%2Fwith%2Fslashes"
# Encode a query value (uses + for spaces)
quote_plus("hello world & more") # "hello+world+%26+more"
# Build a full query string
urlencode({"q": "café", "page": 2})
# → "q=caf%C3%A9&page=2"
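Decoding is the mirror image. This sketch uses the same urllib.parse module to round-trip a query string through parse_qs. The encoded form matches the URLSearchParams output in the JavaScript example, since both use the form-data (+) convention.

```python
from urllib.parse import parse_qs, urlencode

query = urlencode({"q": "café & more", "page": 2})
print(query)    # q=caf%C3%A9+%26+more&page=2

# parse_qs decodes both + and %XX, and returns every value as a list of strings
decoded = parse_qs(query)
print(decoded)  # {'q': ['café & more'], 'page': ['2']}
```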
Common mistake: encoding a complete URL
A frequent error is encoding a whole URL instead of just the parts that need it:
// Wrong: also encodes the :, /, ?, and = delimiters, breaking the URL
encodeURIComponent("https://example.com/search?q=hello world")
// → "https%3A%2F%2Fexample.com%2Fsearch%3Fq%3Dhello%20world"
// Right: encode only the value
"https://example.com/search?q=" + encodeURIComponent("hello world")
// → "https://example.com/search?q=hello%20world"
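The same rule applies in Python: build the structure from literal pieces and encode only the data. build_search_url is a hypothetical helper. Passing quote_via=quote to urlencode is an assumption about what your server expects; it is chosen here so spaces become %20 rather than +.

```python
from urllib.parse import quote, urlencode, urlunsplit

def build_search_url(term: str, page: int) -> str:
    """Hypothetical helper: encode only the query data, keep the delimiters literal."""
    # quote_via=quote encodes spaces as %20 (the default, quote_plus, would use +)
    query = urlencode({"q": term, "page": page}, quote_via=quote)
    # urlunsplit adds the "://", "?" and "#" delimiters as needed, so they are never encoded
    return urlunsplit(("https", "example.com", "/search", query, ""))

print(build_search_url("hello world", 1))
# https://example.com/search?q=hello%20world&page=1
```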
Double Encoding: A Common Bug
Double encoding happens when encoded URLs get encoded again:
Original: "hello world"
Encoded once: "hello%20world" (correct)
Encoded twice: "hello%2520world" (wrong: %25 is the encoding of %)
When the server decodes %2520, it gets %20, a literal percent sign followed by "20", not a space. This is a frequent bug when storing or displaying encoded URLs in encoded contexts.
To check whether a URL is double-encoded, look for %25 followed by two hex digits, such as %2520 or %253A. Because %25 is the encoding of % itself, that pattern usually means already-encoded content was encoded again. A lone %25 can be legitimate, though: the text "100%" correctly encodes to 100%25.
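Here is a quick way to reproduce and flag the problem in Python. The looks_double_encoded helper is hypothetical and only a heuristic: data that legitimately contains a % followed by two hex digits (such as the literal text "%20") would also trigger it.

```python
import re
from urllib.parse import quote, unquote

once = quote("hello world")  # 'hello%20world'
twice = quote(once)          # 'hello%2520world': the "%" itself became %25

# Heuristic: %25 immediately followed by two hex digits
DOUBLE_ENCODED = re.compile(r"%25[0-9A-Fa-f]{2}")

def looks_double_encoded(url: str) -> bool:
    """Hypothetical helper; flags likely double encoding, not a guarantee."""
    return DOUBLE_ENCODED.search(url) is not None

print(looks_double_encoded(twice))          # True
print(looks_double_encoded(once))           # False
print(looks_double_encoded(quote("100%")))  # False: '100%25' is a legitimate encoding
print(unquote(twice))                       # hello%20world (one decode is not enough)
```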
Decoding: When to Be Careful
URL decoding is generally safe, but a few edge cases matter:
Null bytes: %00 decodes to a null byte. Some server software treats this as a string terminator. Historically, attackers used it to bypass file-extension checks in directory traversal attacks: ../../../../etc/passwd%00.jpg passes a check for .jpg, but the underlying C file API treats the null byte as the end of the filename and opens /etc/passwd.
Path traversal: %2F (encoded slash) and %2E%2E (encoded ..) in paths can bypass path sanitization that only looks for literal slashes and dots. Always decode and normalize paths before validating them.
Encoding mismatch: If a client encodes with ISO-8859-1 but the server decodes as UTF-8, accented characters decode to the wrong values. The modern web has standardized on UTF-8, but legacy systems may differ.
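The first two points suggest an order of operations: decode, reject null bytes, normalize, and only then validate. The sketch below assumes a POSIX-style document root at the hypothetical path /srv/files and uses only the standard library. It does not resolve symlinks, so treat it as an illustration rather than a complete defense. The last lines reproduce the encoding mismatch from the third point.

```python
import posixpath
from typing import Optional
from urllib.parse import quote, unquote

BASE = "/srv/files"  # hypothetical document root

def resolve_request_path(raw_path: str) -> Optional[str]:
    """Return a normalized path under BASE, or None if the request is rejected."""
    decoded = unquote(raw_path)  # decode first, so %2F and %2E%2E become / and ..
    if "\x00" in decoded:        # a null byte is never valid in a POSIX filename
        return None
    # Normalize after decoding so ".." segments are collapsed before the check
    full = posixpath.normpath(posixpath.join(BASE, decoded.lstrip("/")))
    if full != BASE and not full.startswith(BASE + "/"):
        return None
    return full

print(resolve_request_path("/docs/report.pdf"))                 # /srv/files/docs/report.pdf
print(resolve_request_path("/%2E%2E%2F%2E%2E%2Fetc%2Fpasswd"))  # None
print(resolve_request_path("/etc/passwd%00.jpg"))               # None

# Encoding mismatch: Latin-1 bytes decoded as UTF-8 turn into U+FFFD
latin1_encoded = quote("café", encoding="latin-1")      # 'caf%E9'
garbled = unquote(latin1_encoded)                       # 'caf' + U+FFFD (the é is lost)
restored = unquote(latin1_encoded, encoding="latin-1")  # 'café'
```

Because the function decodes exactly once, a double-encoded %252E%252E stays as the literal text %2E%2E. That is harmless only if nothing downstream decodes it again.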
Internationalized URLs (IRIs)
Modern URLs can technically contain Unicode characters in paths and query strings; these are called IRIs (Internationalized Resource Identifiers). Browsers display https://münchen.de/straße in the address bar, but internally convert it to https://xn--mnchen-3ya.de/stra%C3%9Fe for actual transmission.
If you're storing or comparing URLs, be careful about whether you're working with display form (IRI) or wire form (percent-encoded ASCII).
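Converting from display form to wire form involves two different encodings. This is a minimal sketch with the hypothetical name iri_to_uri. It assumes no userinfo (user:password@) and converts only the host and path, passing the query and fragment through unchanged. It keeps % as safe in the path so already-encoded sequences are not double-encoded, which also means any % in the input is assumed to start an existing escape.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def iri_to_uri(iri: str) -> str:
    """Sketch: convert host (IDNA) and path (UTF-8 percent-encoding) only.

    Assumes no userinfo; query and fragment are passed through unchanged.
    """
    parts = urlsplit(iri)
    host = parts.hostname.encode("idna").decode("ascii")
    if parts.port is not None:
        host += f":{parts.port}"
    # Keep "/" as a separator and "%" so existing escapes are not double-encoded
    path = quote(parts.path, safe="/%")
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))

wire = iri_to_uri("https://münchen.de/straße")
print(wire)  # https://xn--mnchen-3ya.de/stra%C3%9Fe
```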
FAQ
What is the difference between encodeURI and encodeURIComponent in JavaScript?
encodeURI is for encoding a complete URL: it leaves structural characters like :, /, ?, #, and & unencoded. encodeURIComponent is for encoding a single value (like a query parameter): it encodes those structural characters too, including ?, &, =, and /. Always use encodeURIComponent for individual parameter values.
Do I need to encode the path if I control the server?
Yes. Even if your server handles unencoded paths, proxies and caches in between may not. It's safer to always encode correctly.
Why does my URL have %0A or %0D in it?
%0A is a newline (LF) and %0D is a carriage return (CR). These appear when form data contained multi-line text or when line endings crept in from copy-pasting.
Is URL encoding reversible?
Yes. Decoding recovers the original string, provided you decode with the same character encoding (typically UTF-8) and the same space convention (+ or %20) that was used to encode it. URL encoding is not encryption.
Should I encode slashes in path segments?
Only if the slash is literal data within a path segment (not a path separator). In most URL routing, / separates segments. If a segment value itself contains a slash (e.g., a filename with /), encode it as %2F, but be aware that some web servers and proxies reject %2F outright or decode it back into a separator before routing. Test your specific server's behavior.