URLs

URL stands for Uniform Resource Locator. It is a common form of URI (Uniform Resource Identifier) that describes a path to some resource, often over the Web. The following sections give a brief technical description of the format of common URLs, mostly in the context of links.

Syntax

A URL is made up of a number of components that describe how to access the destination resource. Any of these components may be omitted and a default will be assumed. The components, if specified, must be written in the following order:

Scheme (http:)

The scheme indicates the protocol (the type of computer communication language) that will be used to make the request for the destination resource. On the Web, this is usually HTTP, the HyperText Transfer Protocol, indicated by http:. For a secure (encrypted) HTTP connection, https: may be used. ftp: is another common scheme on the Web.

If the scheme is omitted, the scheme that was used to reach the current document is assumed (usually HTTP on the Web).
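As a rough illustration, Python's standard urllib.parse module can show this behavior. In this sketch, urlsplit reports the scheme of a URL, and urljoin resolves a reference that begins with // (and therefore has no scheme) against a base URL, inheriting the base's scheme. The www.example.com address is only a placeholder.

```python
from urllib.parse import urlsplit, urljoin

# urlsplit breaks a URL into (scheme, netloc, path, query, fragment).
parts = urlsplit("https://www.w3.org/TR/html401/")
print(parts.scheme)  # https

# A reference starting with "//" omits the scheme, so it inherits
# the scheme of the base (current) document, here https.
resolved = urljoin("https://www.w3.org/TR/html401/", "//www.example.com/page.html")
print(resolved)  # https://www.example.com/page.html
```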

Authority (//)

If present, this identifier indicates that “authority” information will follow. Authority information includes one or more of the following components:

User information (username@)

User information may be used for logins or other identification purposes. It is often written username@ or username:password@, although providing password information in this manner could result in the password being leaked and thus isn't recommended.

If user information is omitted, no such information will be given. In this case, if the server requires user information for the request, it may ask the user for the information before continuing.
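A minimal sketch with Python's urllib.parse shows how user information is separated from the host. The username alice, the password secret, and the host ftp.example.com are made up for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical credentials, for illustration only. A real password
# written this way can end up in browser history or server logs.
parts = urlsplit("ftp://alice:secret@ftp.example.com/pub/file.txt")
print(parts.username)  # alice
print(parts.password)  # secret
print(parts.hostname)  # ftp.example.com

# With no user information, both attributes are None.
anon = urlsplit("ftp://ftp.example.com/pub/file.txt")
print(anon.username, anon.password)  # None None
```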

Host (www.w3.org)

The host identifier is the IP address of the server holding the destination resource, or a domain name that resolves to that IP address (for example, www.w3.org).

If the host identifier is omitted, the URL is assumed to refer to the same host as the current document.
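The sketch below, again using Python's urllib.parse, extracts the host from a URL and shows that a reference without authority information keeps the host of the base document. The base URL is chosen only for illustration.

```python
from urllib.parse import urlsplit, urljoin

base = "http://www.w3.org/TR/html401/"

# hostname is lower-cased and excludes any user information or port.
host = urlsplit("http://WWW.W3.ORG:8080/TR/").hostname
print(host)  # www.w3.org

# A reference with no authority keeps the base document's host.
joined = urljoin(base, "/Style/CSS/")
print(joined)  # http://www.w3.org/Style/CSS/
```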

Port (:80)

The communication port is specified with a colon and the port number to use (for example, :8080).

If the port is omitted, a default port is assumed based on the scheme (for example, :80 for HTTP and :443 for HTTPS).
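The default-port rule can be sketched as a small lookup. The effective_port helper and the DEFAULT_PORTS table below are hypothetical and cover only a few well-known schemes; urlsplit's port attribute is None when no port is written in the URL.

```python
from urllib.parse import urlsplit

# Well-known default ports for a few schemes (an illustrative subset).
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}

def effective_port(url):
    """Hypothetical helper: the explicit port if given, else the scheme default."""
    parts = urlsplit(url)
    if parts.port is not None:  # a port was written explicitly
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)  # None for schemes not in the table

print(effective_port("http://www.w3.org/"))       # 80
print(effective_port("http://www.w3.org:8080/"))  # 8080
print(effective_port("https://www.w3.org/"))      # 443
```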

Path (/path/to/resource.html)

This is the rest of the path to the resource (for example, /TR/html401/). This is a UNIX-style path, which may be an absolute path (beginning with /), or otherwise — if and only if the authority information is omitted — it may be a relative path from the base established by the current document. The default base is the location of the current document, but the document may manually specify a different base such as with the HTML base element.

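If the path is omitted from a URL that includes a host, the root (/) is assumed. In a relative URL with no path at all (for example, one consisting only of a query string or fragment identifier), the path of the current document is kept.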
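Relative path resolution can be tried out with Python's urljoin, which follows the standard resolution rules closely enough for simple cases like these. The base URL and the relative paths are chosen only for illustration.

```python
from urllib.parse import urljoin

base = "http://www.w3.org/TR/html401/"

# A relative path is resolved against the base document's directory.
print(urljoin(base, "struct/links.html"))  # http://www.w3.org/TR/html401/struct/links.html
# ".." steps up one directory level.
print(urljoin(base, "../css2/"))           # http://www.w3.org/TR/css2/
# An absolute path replaces the whole path but keeps scheme and host.
print(urljoin(base, "/Consortium/"))       # http://www.w3.org/Consortium/
```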

Query (?key=value&key2=value2)

A query string is used to send special parameters to the resource, such as simple form data. A query string begins with a question mark (?) and then contains a series of parameters separated by a delimiter. The delimiter is usually an ampersand (&) character, but depending on the website it may be a semicolon (;) or some other character. The parameters themselves may be simple keywords or key-value pairs. A key-value pair consists of a parameter name, followed by an equal sign (=), followed by the parameter's value. For historical reasons, a plus (+) in a parameter value is conventionally treated as a space. The purpose and usage of these parameters depend entirely on the destination resource; the browser merely sends the query string as it is written in the URL.

If the query string is omitted, no additional query information will be included in the request.
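As a sketch of how a server-side program might read such a query string, Python's parse_qsl splits on ampersands and decodes + as a space. The URL and its parameters are hypothetical.

```python
from urllib.parse import urlsplit, parse_qsl

url = "http://www.example.com/search?q=url+syntax&lang=en&verbose"
query = urlsplit(url).query
print(query)  # q=url+syntax&lang=en&verbose

# parse_qsl splits on "&", decodes "+" as a space, and (with
# keep_blank_values=True) keeps bare keywords such as "verbose".
pairs = parse_qsl(query, keep_blank_values=True)
print(pairs)  # [('q', 'url syntax'), ('lang', 'en'), ('verbose', '')]
```

Python 3.10 and later also accept a separator argument to parse_qsl for sites that use a different delimiter, such as a semicolon.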

Fragment identifier (#contents)

There may also be a fragment identifier at the end of a URI. A fragment identifier refers to a specific section of the resource, such as a certain paragraph on a webpage. It consists of a hash symbol (#) followed by the name of the fragment in the destination resource. In HTML, fragment names are specified using the id or name attributes. In XHTML, they are only specified using the id attribute.

If omitted, no particular fragment will be referenced.
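The fragment can be separated from the rest of a URL with Python's urldefrag, as in this small sketch. Browsers do not send the fragment to the server; the part before # is what is requested, and the fragment is used afterwards to locate the section within the document.

```python
from urllib.parse import urlsplit, urldefrag

url = "http://www.w3.org/TR/html401/#contents"
print(urlsplit(url).fragment)  # contents

# urldefrag splits the URL into the requestable part and the fragment.
result = urldefrag(url)
print(result.url)       # http://www.w3.org/TR/html401/
print(result.fragment)  # contents
```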

To prevent confusion between the different parts of a URI and the special characters that separate them, parts of the URI are “encoded”: characters that would otherwise be read as delimiters are replaced with an escape sequence that can later be “decoded” back to the original characters. Different sections may require different levels of encoding. For example, an ampersand (&) holds no special meaning in the resource path, but it does in a parameter value. The usual mechanism is percent-encoding: a percent sign (%) followed by a two-digit hexadecimal number giving the character's code. For example, an encoded ampersand looks like %26 and an encoded space looks like %20. Characters outside ASCII are normally converted to UTF-8 bytes first, and each byte is then encoded this way.
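Python's quote, quote_plus, and unquote functions give a quick way to experiment with percent-encoding. In this sketch the safe argument lists characters that should be left alone, which is how the different encoding needs of a path and a parameter value can be expressed; the sample strings are arbitrary.

```python
from urllib.parse import quote, quote_plus, unquote

# safe="" encodes every reserved character, as needed for a query value.
print(quote("m&m's", safe=""))             # m%26m%27s
# In a path, "/" and "&" can be left as-is; the space is still encoded.
print(quote("/docs/a&b c.html", safe="/&"))  # /docs/a&b%20c.html
# quote_plus follows the HTML form convention of "+" for spaces.
print(quote_plus("hello world"))           # hello+world
# Decoding reverses the process.
print(unquote("m%26m%27s"))                # m&m's
```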

Examples

Here are some examples to illustrate different forms of URLs (the current base URL is http://www.webdevout.net/articles/):
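One way to see how different forms of URL behave is to resolve them against that base with Python's urljoin. The relative references below (such as example-page.html) are hypothetical and chosen only to cover each form.

```python
from urllib.parse import urljoin

base = "http://www.webdevout.net/articles/"

# Hypothetical references of different forms, each resolved against the base.
references = [
    "example-page.html",         # relative path
    "../",                       # parent directory
    "/index.html",               # absolute path, same host
    "?page=2",                   # query only: keeps the base path
    "#syntax",                   # fragment only
    "//www.w3.org/TR/html401/",  # no scheme: inherits http:
    "https://www.w3.org/",       # complete URL: used as-is
]
resolved = {ref: urljoin(base, ref) for ref in references}
for ref, full in resolved.items():
    print(f"{ref:26} -> {full}")
```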

Additional notes

Although mailto:nanobot@gmail.com is a URI, it is not a URL because it does not describe a path. Whether a URI may also be a URL depends on the URI scheme. http: and ftp: both describe path information, while mailto: and data: do not.
Special attention should be drawn to the ampersand (&) character in SGML-based languages like HTML, XML, and consequently XHTML. While it commonly has a special purpose in URIs (separating parameters), it also has a special purpose in common SGML and XML languages (character references). An ampersand appearing in a URI may accidentally be interpreted as the start of a character reference, thus causing part of the URI to be converted to a different character, or else cause a validation error. Therefore, it is important to change all occurrences of & to &amp; in the URI. This should be done after the basic URI encoding. For example, the href attribute might look like this: href="http://www.google.com/search?num=20&amp;hl=en&amp;q=m%26m%27s". Notice that the ampersand in the q value is already percent-encoded (%26) and doesn't require a character reference.
In HTML-based languages, if there is a base element present in the head section of the current document, a relative URL will be relative to the specified base instead of the path used to reach the current document.
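The encode-then-escape order described in the note on ampersands can be sketched with Python's standard library: urlencode percent-encodes the parameter values, and html.escape then turns each & into &amp; for use inside an HTML attribute. This only reproduces the Google search URL from that note; the link text is arbitrary.

```python
import html
from urllib.parse import urlencode

# Step 1: percent-encode the parameters ("&" and "'" in the value become %26, %27).
query = urlencode({"num": 20, "hl": "en", "q": "m&m's"})
url = "http://www.google.com/search?" + query
print(url)  # http://www.google.com/search?num=20&hl=en&q=m%26m%27s

# Step 2: escape the finished URL for an HTML attribute ("&" becomes "&amp;").
attr = html.escape(url)
print(f'<a href="{attr}">search</a>')
```

Doing the steps in the other order would percent-encode the & of each &amp; reference, which is why the escaping comes last.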
