A URL (acronym for Uniform Resource Locator) is the address of a resource on the World Wide Web. URLs have a well-defined structure that was first defined in RFC 1738 by Tim Berners-Lee, the inventor of the World Wide Web.

A URL has the following syntax:

protocol:[//[user:password@]host[:port]]path[?parameters][#fragment]

The best-known use of URLs is to access websites, as in the example below.

https://google.com.br
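To see how the parts of the syntax map onto a concrete address, here is a minimal sketch using Python's standard urllib.parse.urlsplit. The long example URL, including its user name and password, is made up for illustration. Note that Python (like RFC 3986) calls the protocol the "scheme" and the parameters the "query".

```python
from urllib.parse import urlsplit

# Hypothetical URL that uses every optional part of the syntax
parts = urlsplit("https://user:secret@example.com:8080/docs/page?lang=en#intro")

print(parts.scheme)    # https
print(parts.username)  # user
print(parts.password)  # secret
print(parts.hostname)  # example.com
print(parts.port)      # 8080
print(parts.path)      # /docs/page
print(parts.query)     # lang=en
print(parts.fragment)  # intro

# The simple example from the text: only the scheme and host are present
simple = urlsplit("https://google.com.br")
print(simple.scheme, simple.hostname, repr(simple.path))  # https google.com.br ''
```

Optional parts that are absent simply come back empty (or None for the port), which is why the path of the Google URL is an empty string.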

Several improvements were made to the initial RFC. The current RFC defining the URI syntax is RFC 3986, and this post is based on that most recent document.

Difference between URL and URI

You will probably see the term URL in some places and URI in others.

  • A URI is an identifier for a specific resource, such as a page, book, or document.
  • A URL is a special type of URI that also tells you how to access the resource, for example via HTTPS or FTP. An example is the website https://marquesfernandes.com

If the protocol (HTTPS, FTP, etc.) is present or implied for a domain, you should call it a URL, even though it is also a URI. All URLs are URIs, but not all URIs are URLs.
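A quick way to feel the difference is to parse a URI that only names a resource next to a URL that also says where to find it. This is a heuristic illustration, not a formal test; the URN below is just an example identifier for a book by its ISBN, and urlsplit is Python's standard parser.

```python
from urllib.parse import urlsplit

# A URN identifies a resource (a book, by ISBN) but gives no location to fetch it from
urn = urlsplit("urn:isbn:0451450523")
# A URL identifies a resource AND tells you how to reach it (protocol + host)
url = urlsplit("https://marquesfernandes.com")

print(urn.scheme, repr(urn.netloc), urn.path)  # urn '' isbn:0451450523
print(url.scheme, url.netloc)                  # https marquesfernandes.com
```

Both strings are valid URIs, but only the second one carries a host you could connect to.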

URL encoding (percent encoding)

A URL is made up of a limited set of characters from the US-ASCII character set. The characters that can always be used as-is (called unreserved characters in RFC 3986) are digits (0-9), letters (A-Z, a-z), and a few special characters ("-", ".", "_", "~"). Because the character set is US-ASCII, it does not include accented letters like those found in Portuguese.

Some characters have a special use in URLs; these are called reserved characters. Examples include ?, /, # and :. Any data passed as part of a URL, whether in the query string or in a path segment, must not contain these characters directly.

Also, unsafe characters such as the space, <, >, { and }, as well as any character outside the ASCII character set, are not allowed directly in URLs.
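The three groups of characters can be summarized in a few lines of Python. This is a minimal sketch: the sets follow RFC 3986 (section 2.3 for unreserved, section 2.2 for reserved characters), and classify is a hypothetical helper written only for this illustration.

```python
import string

# RFC 3986, section 2.3: unreserved characters can be used as-is
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
# RFC 3986, section 2.2: reserved characters (gen-delims followed by sub-delims)
RESERVED = set(":/?#[]@" + "!$&'()*+,;=")

def classify(ch: str) -> str:
    """Describe how a single character must be handled when it is URL data."""
    if ch in UNRESERVED:
        return "unreserved: use as-is"
    if ch in RESERVED:
        return "reserved: encode when used as data"
    return "not allowed: always encode"

for ch in ["a", "~", "?", "/", " ", "<", "é"]:
    print(repr(ch), "->", classify(ch))
```

Anything that is neither unreserved nor reserved (the space, <, accented letters, and so on) falls into the last group.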

So what do we do when we need to send data in a URL that contains these disallowed characters? We use the magic of encoding.

URL encoding converts reserved and unsafe characters into a format understood by all web browsers and servers. First, the character is converted to one or more bytes (normally using UTF-8). Then each byte is represented by two hexadecimal digits prefixed with % (for example, %20). The percent sign acts as the escape character.
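The two steps can be written out by hand. The sketch below assumes UTF-8 as the byte encoding and treats only the RFC 3986 unreserved characters as safe; percent_encode is a hypothetical name for illustration. In real code, Python's urllib.parse.quote (with safe="") does the same job, and the snippet checks that both agree on this input.

```python
from urllib.parse import quote, unquote

# RFC 3986 unreserved characters, stored as byte values (ints)
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)

def percent_encode(text: str) -> str:
    """Hypothetical helper: percent-encode every byte that is not unreserved."""
    out = []
    for byte in text.encode("utf-8"):   # step 1: characters -> UTF-8 bytes
        if byte in UNRESERVED:
            out.append(chr(byte))       # unreserved byte: keep the character as-is
        else:
            out.append(f"%{byte:02X}")  # step 2: '%' + two uppercase hex digits
    return "".join(out)

encoded = percent_encode("São Paulo?")
print(encoded)                                  # S%C3%A3o%20Paulo%3F
print(encoded == quote("São Paulo?", safe=""))  # True
print(unquote(encoded))                         # São Paulo?
```

Note how the single character "ã" becomes two bytes in UTF-8 and therefore two percent-encoded triplets (%C3%A3), while "?" is encoded because it is a reserved character.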

URL encoding example

Space: one of the most frequently encoded characters you are likely to encounter is the space. The decimal ASCII value of the space character is 32, which becomes 20 in hexadecimal. Adding the percent prefix (%) gives us the URL-encoded value: %20.
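The same worked example, step by step in Python (variable names are only for illustration):

```python
space = " "
code = ord(space)           # decimal ASCII value of the space character
hex_digits = f"{code:02X}"  # the same value as two uppercase hexadecimal digits
encoded = "%" + hex_digits  # add the percent prefix

print(code, hex_digits, encoded)  # 32 20 %20
```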

ASCII Character Percent Encoding Reference Table

The following table maps each ASCII character to its URL-encoded form.

Decimal  Character                        URL encoding (UTF-8)
0        NUL (null character)             %00
1        SOH (start of heading)           %01
2        STX (start of text)              %02
3        ETX (end of text)                %03
4        EOT (end of transmission)        %04
5        ENQ (enquiry)                    %05
6        ACK (acknowledge)                %06
7        BEL (bell)                       %07
8        BS (backspace)                   %08
9        HT (horizontal tab)              %09
10       LF (line feed)                   %0A
11       VT (vertical tab)                %0B
12       FF (form feed)                   %0C
13       CR (carriage return)             %0D
14       SO (shift out)                   %0E
15       SI (shift in)                    %0F
16       DLE (data link escape)           %10
17       DC1 (device control 1)           %11
18       DC2 (device control 2)           %12
19       DC3 (device control 3)           %13
20       DC4 (device control 4)           %14
21       NAK (negative acknowledge)       %15
22       SYN (synchronous idle)           %16
23       ETB (end of transmission block)  %17
24       CAN (cancel)                     %18
25       EM (end of medium)               %19
26       SUB (substitute)                 %1A
27       ESC (escape)                     %1B
28       FS (file separator)              %1C
29       GS (group separator)             %1D
30       RS (record separator)            %1E
31       US (unit separator)              %1F
32       space                            %20
33       !                                %21
34       "                                %22
35       #                                %23
36       $                                %24
37       %                                %25
38       &                                %26
39       '                                %27
40       (                                %28
41       )                                %29
42       *                                %2A
43       +                                %2B
44       ,                                %2C
45       -                                %2D
46       .                                %2E
47       /                                %2F
48       0                                %30
49       1                                %31
50       2                                %32
51       3                                %33
52       4                                %34
53       5                                %35
54       6                                %36
55       7                                %37
56       8                                %38
57       9                                %39
58       :                                %3A
59       ;                                %3B
60       <                                %3C
61       =                                %3D
62       >                                %3E
63       ?                                %3F
64       @                                %40
65       A                                %41
66       B                                %42
67       C                                %43
68       D                                %44
69       E                                %45
70       F                                %46
71       G                                %47
72       H                                %48
73       I                                %49
74       J                                %4A
75       K                                %4B
76       L                                %4C
77       M                                %4D
78       N                                %4E
79       O                                %4F
80       P                                %50
81       Q                                %51
82       R                                %52
83       S                                %53
84       T                                %54
85       U                                %55
86       V                                %56
87       W                                %57
88       X                                %58
89       Y                                %59
90       Z                                %5A
91       [                                %5B
92       \                                %5C
93       ]                                %5D
94       ^                                %5E
95       _                                %5F
96       `                                %60
97       a                                %61
98       b                                %62
99       c                                %63
100      d                                %64
101      e                                %65
102      f                                %66
103      g                                %67
104      h                                %68
105      i                                %69
106      j                                %6A
107      k                                %6B
108      l                                %6C
109      m                                %6D
110      n                                %6E
111      o                                %6F
112      p                                %70
113      q                                %71
114      r                                %72
115      s                                %73
116      t                                %74
117      u                                %75
118      v                                %76
119      w                                %77
120      x                                %78
121      y                                %79
122      z                                %7A
123      {                                %7B
124      |                                %7C
125      }                                %7D
126      ~                                %7E
127      DEL (delete)                     %7F
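Every row of the table follows the same rule (the byte value written as two hexadecimal digits after %), so the encoding column can be regenerated with a short sketch. The CONTROL list holds the standard ASCII abbreviations for codes 0-31; label is a hypothetical helper, and only the short names (not the descriptions in parentheses) are reproduced.

```python
# Standard ASCII abbreviations for the control characters 0-31, in order
CONTROL = ["NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
           "BS", "HT", "LF", "VT", "FF", "CR", "SO", "SI",
           "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
           "CAN", "EM", "SUB", "ESC", "FS", "GS", "RS", "US"]

def label(code: int) -> str:
    """Hypothetical helper: printable name for an ASCII code (0-127)."""
    if code < 32:
        return CONTROL[code]
    if code == 32:
        return "space"
    if code == 127:
        return "DEL"
    return chr(code)

# (decimal, character, percent-encoded form) for all 128 ASCII codes
rows = [(code, label(code), f"%{code:02X}") for code in range(128)]

for code, name, encoded in rows:
    print(f"{code:<9}{name:<10}{encoded}")
```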
