URI encoding, more commonly known as percent-encoding or URL encoding, is a method for converting characters into a format that can be safely transmitted over the internet within a Uniform Resource Identifier (URI). It ensures that every character in a URI is represented using only the US-ASCII characters that the URI syntax allows.
Why is URI Encoding Necessary?
Certain characters are reserved or unsafe for use in URIs for various reasons:
- Reserved Characters: These characters have special meanings within the URI syntax (e.g., "/", "?", "#").
- Unsafe Characters: These characters may not be handled correctly by all systems or may be misinterpreted by some browsers or servers (e.g., spaces, "<", ">").
- Non-ASCII Characters: URIs were originally designed to use the US-ASCII character set. Non-ASCII characters (like accented letters or characters from other alphabets) need to be encoded.
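To make the reserved-character problem concrete, here is a minimal sketch using Python's standard urllib.parse module. The parameter name q and the sample value are made up for illustration; the point is only that an unencoded "&" or "=" inside a value is read as URI syntax rather than as data.

```python
from urllib.parse import parse_qs, quote

value = "a&b=c"  # hypothetical query value containing reserved characters

# Unencoded: the parser treats "&" and "=" as delimiters and splits the value.
broken = parse_qs("q=" + value)

# Encoded: "&" becomes %26 and "=" becomes %3D, so the value survives intact.
intact = parse_qs("q=" + quote(value, safe=""))

print(broken)  # {'q': ['a'], 'b': ['c']}
print(intact)  # {'q': ['a&b=c']}
```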
How Does URI Encoding Work?
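URI encoding replaces each unsafe or reserved character with a "%" followed by the two-digit hexadecimal value of the character's ASCII code. A non-ASCII character is first converted to bytes (typically using UTF-8), and each byte is then encoded the same way. For example: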
- Space: Encoded as %20
- "&": Encoded as %26
- "=": Encoded as %3D
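The replacement rule can be sketched in a few lines of Python. This is an illustrative implementation, not how any particular library does it: the function name percent_encode is hypothetical, and it assumes UTF-8 and encodes everything outside the RFC 3986 unreserved set (letters, digits, "-", ".", "_", "~").

```python
# Characters that RFC 3986 calls "unreserved"; they never need encoding.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

def percent_encode(text: str) -> str:
    """Percent-encode every byte of text that is not an unreserved character."""
    out = []
    for byte in text.encode("utf-8"):  # work byte by byte, assuming UTF-8
        ch = chr(byte)
        if ch in UNRESERVED:
            out.append(ch)
        else:
            out.append(f"%{byte:02X}")  # "%" plus two uppercase hex digits
    return "".join(out)

print(percent_encode(" "))  # %20
print(percent_encode("&"))  # %26
print(percent_encode("="))  # %3D
print(percent_encode("é"))  # %C3%A9 (two UTF-8 bytes, two escapes)
```

In practice a library function is preferable, since it also handles details such as which characters are safe in which part of a URI.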
Examples
Let's say you want to include the string "Hello world!" in a URI. Because the space character is not allowed in a URI, it needs to be encoded; in this example the "!" is encoded as well:
- Original String: Hello world!
- Encoded String: Hello%20world%21
Therefore, the complete URL might look like this:
https://www.example.com/search?q=Hello%20world%21
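As a rough sketch, the same URL can be produced with Python's urllib.parse.quote(). Passing safe="" is a choice made here so that no character is exempted from encoding; the variable names are illustrative.

```python
from urllib.parse import quote

term = "Hello world!"

# Encode the value on its own, then append it to the rest of the URL.
encoded = quote(term, safe="")
url = "https://www.example.com/search?q=" + encoded

print(encoded)  # Hello%20world%21
print(url)      # https://www.example.com/search?q=Hello%20world%21
```

Only the value is encoded, not the whole URL; encoding the full string would also escape the ":", "/", "?" and "=" that give the URL its structure.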
Characters That Usually Need Encoding
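The following characters are commonly encoded in URIs. Which of them must be encoded depends on where in the URI they appear; the tilde, for instance, is an unreserved character under RFC 3986, and many modern encoders leave it unchanged.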
- " (Double quote): %22
- # (Hash): %23
- % (Percent): %25
- & (Ampersand): %26
- + (Plus): %2B
- / (Slash): %2F
- : (Colon): %3A
- ; (Semicolon): %3B
- < (Less than): %3C
- = (Equals): %3D
- > (Greater than): %3E
- ? (Question mark): %3F
- @ (At sign): %40
- [ (Left square bracket): %5B
- \ (Backslash): %5C
- ] (Right square bracket): %5D
- ^ (Caret): %5E
- ` (Grave accent): %60
- { (Left curly brace): %7B
- | (Vertical bar): %7C
- } (Right curly brace): %7D
- ~ (Tilde): %7E
- Space: %20
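Each code in the list is simply "%" followed by the character's ASCII value in hexadecimal. The short sketch below regenerates the codes from the characters themselves; the helper name percent_code is hypothetical and the function only makes sense for single ASCII characters.

```python
# The characters from the list above, ending with the space character.
CHARS = '"#%&+/:;<=>?@[\\]^`{|}~ '

def percent_code(ch: str) -> str:
    """Return "%" plus the two-digit uppercase hex code of one ASCII character."""
    return f"%{ord(ch):02X}"

# Print each character next to its percent-encoded form.
for ch in CHARS:
    print(repr(ch), percent_code(ch))
```

This shows the encoded form of each character, not whether a given library will choose to encode it in a particular context.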
Programming Languages and URI Encoding
Most programming languages offer built-in functions or libraries to handle URI encoding and decoding. Examples include:
- JavaScript: encodeURIComponent() and decodeURIComponent()
- Python: urllib.parse.quote() and urllib.parse.unquote()
- Java: java.net.URLEncoder.encode() and java.net.URLDecoder.decode()
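As an illustration of the Python pair, the sketch below encodes a string and decodes it back. The sample string is made up. It also shows quote_plus(), the form-style variant in the same module that writes spaces as "+" instead of %20; Java's URLEncoder follows that same form-encoding convention.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

original = "name=Ann & Bob/é"  # hypothetical input with reserved and non-ASCII characters

# Percent-encoding: safe="" means "/" is encoded too.
encoded = quote(original, safe="")
print(encoded)  # name%3DAnn%20%26%20Bob%2F%C3%A9

# Decoding restores the original string exactly.
decoded = unquote(encoded)
print(decoded == original)  # True

# Form-style encoding: identical except that spaces become "+".
form = quote_plus(original)
print(form)  # name%3DAnn+%26+Bob%2F%C3%A9
print(unquote_plus(form) == original)  # True
```

Whichever function is used, decoding should use the matching counterpart, since a "+" means a space only under the form-style convention.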
URI encoding is essential for ensuring that data is transmitted correctly and consistently across the internet within URLs.