askvity

What is URI encoding?

Published in Web Development

URI encoding, more accurately known as percent-encoding and often called URL encoding, converts characters into a form that can be transmitted safely within a Uniform Resource Identifier (URI). It ensures that every character in a URI is represented using only the subset of US-ASCII characters that URIs allow.

Why is URI Encoding Necessary?

Certain characters are reserved or unsafe for use in URIs for various reasons:

  • Reserved Characters: These characters have special meanings within the URI syntax (e.g., "/", "?", "#").
  • Unsafe Characters: These characters may not be handled correctly by all systems or may be misinterpreted by some browsers or servers (e.g., spaces, "<", ">").
  • Non-ASCII Characters: URIs were originally designed to use the US-ASCII character set. Non-ASCII characters (like accented letters or characters from other alphabets) must be encoded, typically by converting them to UTF-8 bytes and percent-encoding each byte.
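To make the problem with reserved characters concrete, here is a minimal Python sketch. The value "rock&roll=fun" and the parameter name q are made up for illustration. It shows how a standard query-string parser misreads an unencoded value and recovers it correctly once it is percent-encoded.

```python
from urllib.parse import parse_qs, quote

# Hypothetical value that happens to contain two reserved characters.
value = "rock&roll=fun"

# Placed in a query string as-is, "&" and "=" are read as delimiters.
raw_query = "q=" + value
raw_parsed = parse_qs(raw_query)

# Percent-encoding the value first keeps it in one piece.
encoded_query = "q=" + quote(value, safe="")
encoded_parsed = parse_qs(encoded_query)

print(raw_parsed)      # the value is split into two parameters
print(encoded_parsed)  # the value survives intact
```

The unencoded query is parsed as two parameters (q and roll), while the encoded one yields a single q whose value is the original string.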

How Does URI Encoding Work?

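URI encoding replaces each unsafe or reserved character with a "%" followed by the two-digit hexadecimal value of the character's byte. For ASCII characters this is the ASCII code. A non-ASCII character is first converted to bytes (UTF-8 in modern practice) and each byte is encoded separately, so "é" becomes %C3%A9. For example: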

  • Space: Encoded as %20
  • "&": Encoded as %26
  • "=": Encoded as %3D
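The mechanism above can be sketched in a few lines of Python. This is an illustrative implementation, not a replacement for a library encoder: the helper name percent_encode is hypothetical, it assumes UTF-8 for non-ASCII text, and it treats only the RFC 3986 unreserved characters (letters, digits, "-", ".", "_", "~") as safe.

```python
# Characters that RFC 3986 classifies as "unreserved"; they never need encoding.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)


def percent_encode(text: str) -> str:
    """Percent-encode every byte of `text` that is not an unreserved character."""
    out = []
    for byte in text.encode("utf-8"):  # non-ASCII characters become several bytes
        ch = chr(byte)
        if ch in UNRESERVED:
            out.append(ch)
        else:
            out.append(f"%{byte:02X}")  # "%" plus two uppercase hex digits
    return "".join(out)


print(percent_encode("a b&c=d"))
print(percent_encode("é"))
```

In real code, the standard library's urllib.parse.quote(text, safe="") does the same job and should be preferred.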

Examples

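Let's say you want to include the string "Hello world!" in a URI. The space character is not allowed, so it must be encoded as %20. The exclamation mark is a reserved character that many encoders also encode (as %21), although some, such as JavaScript's encodeURIComponent(), leave it as-is: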

  • Original String: Hello world!
  • Encoded String: Hello%20world%21

Therefore, the complete URL might look like this:

https://www.example.com/search?q=Hello%20world%21
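As a rough sketch, the same URL can be built in Python with the standard urllib.parse module. The variable names are illustrative. The second variant uses urlencode, which applies form-style encoding and therefore writes the space as "+" instead of %20; both forms are commonly accepted in query strings.

```python
from urllib.parse import quote, urlencode

base = "https://www.example.com/search"
term = "Hello world!"

# Percent-encode the value; safe="" means no reserved character is left as-is.
url = f"{base}?q={quote(term, safe='')}"

# Form-style encoding (application/x-www-form-urlencoded): space becomes "+".
form_url = f"{base}?{urlencode({'q': term})}"

print(url)
print(form_url)
```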

Characters That Usually Need Encoding

The following characters are commonly encoded when they appear as data in a URI rather than as part of its syntax:

  • " (Double quote): %22
  • # (Hash): %23
  • % (Percent): %25
  • & (Ampersand): %26
  • + (Plus): %2B
  • / (Slash): %2F
  • : (Colon): %3A
  • ; (Semicolon): %3B
  • < (Less than): %3C
  • = (Equals): %3D
  • > (Greater than): %3E
  • ? (Question mark): %3F
  • @ (At sign): %40
  • [ (Left square bracket): %5B
  • \ (Backslash): %5C
  • ] (Right square bracket): %5D
  • ^ (Caret): %5E
  • ` (Grave accent): %60
  • { (Left curly brace): %7B
  • | (Vertical bar): %7C
  • } (Right curly brace): %7D
  • ~ (Tilde): %7E (unreserved under RFC 3986, so it can normally be left as-is; some older encoders still encode it)
  • Space: %20
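A table like the one above can be checked rather than memorized. The following sketch runs each listed character through Python's urllib.parse.quote, assuming Python 3.7 or later. One difference shows up: quote leaves the tilde unencoded, because RFC 3986 classifies it as an unreserved character.

```python
from urllib.parse import quote

# The characters from the list above, including the space at the end.
chars = '"#%&+/:;<=>?@[\\]^`{|}~ '

# safe="" tells quote() not to exempt any reserved character (it exempts "/" by default).
table = {ch: quote(ch, safe="") for ch in chars}

for ch, encoded in table.items():
    print(repr(ch), "->", encoded)
```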

Programming Languages and URI Encoding

Most programming languages offer built-in functions or libraries to handle URI encoding and decoding. Examples include:

  • JavaScript: encodeURIComponent() and decodeURIComponent()
  • Python: urllib.parse.quote() and urllib.parse.unquote()
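  • Java: java.net.URLEncoder.encode() and java.net.URLDecoder.decode() (these use the application/x-www-form-urlencoded format, so a space is encoded as "+" rather than %20)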
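As a brief illustration of the Python functions listed above, the sketch below encodes a made-up string and decodes it again. Note that quote() treats "/" as safe by default, so safe="" is passed to encode it too. The quote_plus() and unquote_plus() variants use form-style encoding, where a space is written as "+".

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

# Hypothetical input mixing reserved, unsafe, and non-ASCII characters.
original = "name=José & co/100%"

encoded = quote(original, safe="")   # space -> %20, "/" -> %2F
decoded = unquote(encoded)           # back to the original text

form_encoded = quote_plus(original)  # space -> "+"
form_decoded = unquote_plus(form_encoded)

print(encoded)
print(form_encoded)
print(decoded == original, form_decoded == original)
```

Decoding must match the encoding style: passing form-encoded text to unquote() would leave the "+" characters in place instead of turning them back into spaces.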

URI encoding is essential for ensuring that data is transmitted correctly and consistently across the internet within URLs.
