HTTP is a protocol used by hundreds of millions of people every day for all sorts of activities: getting news, posting news, sharing photos, socializing, educating themselves, educating others, reading and sending mail, and relaxing. HTTP is the Internet’s glue, which connects distributed web pages and WWW users. Below is a quick and dirty explanation of how it actually works.
The HTTP protocol first appeared in the early 90s as a solution for document management at CERN, a huge European research center. They needed to maintain and manage a large number of scientific documents, each having many links to other documents (think of Wikipedia).
Using hypertext (which was not a new concept in itself; it had existed for decades) to build the documents was an obvious decision. Then, Sir Tim Berners-Lee came up with a quite elegant solution: to store documents on a server and manage them using client applications (browsers). HTTP was invented to build the link between the server and the client application.
Nowadays, HTTP/1.1 is the most commonly used version of the HTTP protocol. It was specified in the RFC 2616 Internet standard document, which has since been superseded by newer RFCs.
Before I go into HTTP itself, I would like to give a brief description of the URL, a term that is tightly related to HTTP. URL stands for Uniform Resource Locator, a sequence of symbols that allows one to identify and fetch a resource (a document, image, or video) from the Internet. A URL is often called a “link” or an “address”. A URL consists of a scheme, a hostname, an optional port, a path, and optional parameters.
Here is an example of a URL: http://www.worksforweb.com/company/. Here, the scheme is “http”, the hostname is “www.worksforweb.com”, the port is not given (it defaults to 80), and the requested path is “/company/”. No parameters are passed to the resource because it is a static HTML document.
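The breakdown of the example URL can be checked with a few lines of Python. This is only a minimal sketch using the standard library's urllib.parse module; the DEFAULT_PORTS table is my own helper for illustration, not something defined by the library.

```python
from urllib.parse import urlsplit

url = "http://www.worksforweb.com/company/"
parts = urlsplit(url)

# Hypothetical helper table: the conventional default port for each scheme.
DEFAULT_PORTS = {"http": 80, "https": 443}

# parts.port is None when the URL does not name a port explicitly.
port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)

print("scheme:  ", parts.scheme)
print("hostname:", parts.hostname)
print("port:    ", port)
print("path:    ", parts.path)
print("query:   ", repr(parts.query))  # empty: no parameters are passed
```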
The interaction between the client and the server is pretty straightforward: the client sends an HTTP request to the server, then the server sends an HTTP response to the client. No magic.
An HTTP request identifies the information requested, so that the server can tell whether this information is available and, if so, send it back to the client (browser). The requested resource is identified by its URL.
Both the request and the response have two parts: the headers and the body, the latter being optional and required only for certain request methods and response statuses.
Let us look closely at the HTTP request and response. Suppose a user enters the following URL into the address bar: http://www.worksforweb.com/company/. The browser then connects to the www.worksforweb.com server at port 80 and sends the following:
GET /company/ HTTP/1.1
Host: www.worksforweb.com
Accept-Encoding: gzip,deflate
Then the server at www.worksforweb.com sends the following response:
HTTP/1.1 200 OK
Content-Type: text/html

<!DOCTYPE html PUBLIC "- …
The response is truncated because its body contains the page’s HTML, which is 16 KB. The request contains no body, only headers, which is standard for GET requests (see the “Request Methods” paragraph below). The body of the response comes after a blank line, which separates the body from the headers.
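The request shown has three header lines: the request line (the method, the requested path, and the protocol version), the Host header, which names the server being addressed, and the Accept-Encoding header, which lists the compression methods the browser understands.
The response contains two header lines and a body. The two lines shown in the example are the status line (HTTP/1.1 200 OK), which is mandatory, and the Content-Type header, which tells the browser how to interpret the body.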
The body of the response (the requested resource itself) comes after a blank line.
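To make the "headers, blank line, body" layout concrete, here is a minimal Python sketch that splits a raw HTTP message at the blank line. The function name parse_http_message is my own, and the response body is a hypothetical placeholder, since the real page is truncated in the example above. Real messages have more corner cases (repeated headers, chunked bodies) that this sketch ignores.

```python
def parse_http_message(raw: bytes):
    """Split a raw HTTP message into its first line, headers, and body."""
    # The first blank line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]  # request line or status line
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


request = (
    b"GET /company/ HTTP/1.1\r\n"
    b"Host: www.worksforweb.com\r\n"
    b"Accept-Encoding: gzip,deflate\r\n"
    b"\r\n"
)
# Hypothetical body: a stand-in for the truncated HTML of the real page.
response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<html>placeholder body</html>"
)

req_line, req_headers, req_body = parse_http_message(request)
status_line, resp_headers, resp_body = parse_http_message(response)

print(req_line, req_headers, req_body)
print(status_line, resp_headers, resp_body)
```

The GET request parses to an empty body, while the response body is everything after the blank line.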
Typically, there are more request and response headers than shown in this example. The number and composition of request and response headers depend on the browser and server software, their configuration, the type of the resource requested, and other factors.
Each response sent by the server to the browser has a status line, which comes first. The status value indicates whether the request was successful and what can be done next. Probably the best-known response status is “404 Not Found”, which indicates that the requested document is not available on the server.
Here is a list of the most popular statuses:
The other statuses are less common. HTTP/1.1 defines five status groups: 1xx (informational), 2xx (success), 3xx (redirection), 4xx (client error), and 5xx (server error).
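The five groups correspond to the first digit of the status code, so a status can be classified with simple integer division. The sketch below uses Python's standard http.HTTPStatus enum for the reason phrases; the STATUS_GROUPS table and the status_group function are names I made up for illustration.

```python
from http import HTTPStatus

# Hypothetical lookup table: first digit of the code -> status group.
STATUS_GROUPS = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_group(code: int) -> str:
    """Return the name of the group a status code belongs to."""
    return STATUS_GROUPS.get(code // 100, "Unknown")


for code in (200, 301, 404, 500):
    print(code, HTTPStatus(code).phrase, "->", status_group(code))
```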
There are eight methods defined in HTTP/1.1: OPTIONS, GET, HEAD, POST, PUT, DELETE, TRACE, and CONNECT. However, only two of them are really widely used by browsers: GET and POST. (Strictly speaking, the methods that RFC 2616 requires general-purpose servers to support are GET and HEAD.)
GET is what browsers actually use to fetch pages, images, CSS style sheets, etc. Each time we click on a link, the browser issues a GET request to the server. Then the browser issues a sequence of further GET requests to fetch the design graphics, styles, and scripts needed to complete the page load. Thus, GET accounts for the vast majority of requests. POST is mostly used for form submission.
The difference between the GET and POST methods lies in the request’s body. A GET request normally has only headers, while a POST request can have both headers and a body. Since the length of headers is limited (the limit is different for different browsers and servers), GET can be used to send only a limited amount of data to the server. Usually, it is only the path to the requested file and a few more headers. Thus, GET cannot be used to submit bigger forms and cannot be used to send files to servers.
When the browser makes a POST request, the body of the request contains the form data, including uploaded files. The server can then parse this data and take appropriate action (store the file on disk or in a database, modify it and send it back to the browser, etc.).
GET requests can be used to send form data as well, but in the case of GET, the form data becomes part of the requested URL. It is appended to the URL after a question mark. One can see this when using google.com: the requested search term appears in the URL of the search results page after “q=”. This actually shows how data is encoded when sent from the browser to the server: each form field is represented by a fieldname=fieldvalue pair, and the pairs are delimited by the “&” sign. Then the encoded data is either appended to the requested URL (in the case of the GET method) or sent in the request body (in the case of the POST method).
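The fieldname=fieldvalue encoding is easy to try out in Python with urllib.parse. In this sketch the form fields and the example.com search URL are made up for illustration; only the encoding itself is the point.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical form fields, as a browser might collect them from a form.
form = {"q": "http protocol", "hl": "en"}

# Each field becomes a name=value pair; pairs are joined with "&".
encoded = urlencode(form)
print(encoded)

# GET: the encoded data goes into the URL after the question mark.
get_url = "http://www.example.com/search?" + encoded

# POST: the very same encoded string is sent as the request body instead.
post_body = encoded.encode("ascii")

# The server decodes the pairs back into field names and values.
decoded = parse_qs(urlsplit(get_url).query)
print(decoded)
```

Note that the space in the search term is encoded as "+", because a space cannot appear in a URL as-is.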
The client-server interaction, as defined in HTTP, is stateless and request-response-based. This means that when the server serves a request, it does not take into account any prior requests made by the same or a different browser, from the same or a different IP address, or by the same or a different user. The same is true for future requests: the current request will not affect any future request.
The statement above seems to contradict the basic observation one would make using any modern website. For example, when I first open a website, I can see a login box. Once I submit my login information, I get a greeting displayed in the same place where the login box was located. Even though the URL of the page remained the same (I requested the same page), the content of the page has changed because I logged in during my previous request. Thus, prior requests do affect the behavior of an HTTP-based website.
The above is true, but it is achieved outside the HTTP specification. HTTP provides a way to transfer data between the server and the browser, but it does not specify how the browser or the server should use this data. One possible way of creating a stateful operation is to pass a token between the browser and the server to indicate a special state of the browser-server interaction (and this is where cookies come in very handy). However, HTTP specifies neither the form of such a token, nor the way it should be passed between the server and the browser, nor how long the token should be valid.
Cookies are small chunks of text data stored by the browser and sent back to the server with each request. Each cookie has a name and a value. Cookies can be tied to a specific domain name (e.g. google.com) and can have a limited lifetime.
HTTP cookies are not actually part of the HTTP standard; they are described in a separate document. However, the modern WWW relies heavily on cookies, and hardly any HTTP-based application (e.g. a website or a web interface such as Gmail) can work without them.
Here is how it works: when the browser requests a web page of a certain website for the first time, the server sends a cookie to the browser in the response headers. Then, each time the browser makes a request to the server, it sends the cookie back to the server. If the server sets another cookie, then both cookies will be sent back to the server in subsequent requests, and so on. The browser sends all cookies that are tied to the website’s domain name and path and have not yet expired.
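The cookie round trip can be sketched with Python's standard http.cookies module. The cookie name session_id, its value, and the one-hour lifetime are assumptions chosen for illustration; a real browser also applies domain, path, and expiry rules that this sketch leaves out.

```python
from http.cookies import SimpleCookie

# Server side: build the value of a Set-Cookie response header.
response_cookie = SimpleCookie()
response_cookie["session_id"] = "abc123"          # hypothetical name and value
response_cookie["session_id"]["path"] = "/"
response_cookie["session_id"]["max-age"] = 3600   # limited lifetime, in seconds
set_cookie_header = response_cookie["session_id"].OutputString()
print("Set-Cookie:", set_cookie_header)

# Browser side: remember the cookie, then send only name=value pairs back.
browser_jar = SimpleCookie()
browser_jar.load(set_cookie_header)
cookie_header = "; ".join(
    f"{name}={morsel.value}" for name, morsel in browser_jar.items()
)
print("Cookie:", cookie_header)

# Server side again: read the cookie from the next request.
received = SimpleCookie(cookie_header)
print(received["session_id"].value)
```

The attributes (path, lifetime) travel only from the server to the browser; the browser sends back just the name and the value.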
What is the point of using cookies? In most cases, cookies are the best solution to the statelessness of the HTTP protocol. Cookies are the best choice for passing an identification token between the server and the browser. On the first request, the server generates a token (made up of letters, digits, etc.) and sends it to the browser as a cookie. Then the browser sends it back with each request, which allows the server to tell requests from this particular browser (and the user behind it) from other requests.
This pattern of browser-server interaction is usually called a session, and, consequently, such a cookie is called a session token. The server then uses the session token as a key to the client’s state information stored on the server. The advantage of this method is that no username or password is sent to the server with each request. On the other hand, if eavesdropping malware steals the cookie, it can then be used to impersonate the user by sending the session token cookie to the server, thus stealing the user’s identity.
However, cookies are limited in size (the limit is different for different browsers) and are not reliable. The user can delete the cookies or configure the browser not to accept cookies at all. Cookies are not shared between browsers: if one used Mozilla Firefox to log into a website, opening the same website in Internet Explorer will show no user logged in.
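The session idea can be sketched in a few lines of Python: the server keeps the state in a table keyed by a random token, and the browser only ever holds the token. All names here (sessions, start_session, log_in, greeting) are hypothetical, and an in-memory dictionary stands in for whatever storage a real server would use.

```python
import secrets

# Server-side state, keyed by session token. The browser never sees this.
sessions = {}


def start_session() -> str:
    """Create a new session and return its hard-to-guess token."""
    token = secrets.token_hex(16)  # 32 hexadecimal characters
    sessions[token] = {"user": None}
    return token


def log_in(token: str, username: str) -> None:
    """Record that the browser holding this token has logged in."""
    sessions[token]["user"] = username


def greeting(token: str) -> str:
    """Render the login box or a greeting, depending on the stored state."""
    state = sessions.get(token)
    if state is None or state["user"] is None:
        return "Please log in"
    return f"Hello, {state['user']}!"


token = start_session()          # sent to the browser as a cookie
print(greeting(token))           # first request: not logged in yet
log_in(token, "alice")           # login form submitted
print(greeting(token))           # same URL, different content
print(greeting("stolen-or-bogus-token"))
```

This also shows why a stolen token is dangerous: whoever presents it gets the state that belongs to it, with no password required.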
Although the core of HTTP is as simple as “request a resource, receive it in response”, the HTTP/1.1 standard contains provisions for many related features. Below are the most commonly used ones:
The list above highlights only a few of the numerous features of HTTP. There are more features provided in the HTTP/1.1 specification and even more in other standards that augment HTTP with additional functionality, such as cookies, request methods, authentication schemes, security layers, etc.
The usage of HTTP is not limited to fetching web pages by browsers and submitting web forms. Many other application-level protocols use HTTP as a client-server interaction layer. These protocol specifications include so-called HTTP bindings, which specify how HTTP can be used to transfer data between two applications (one acts as an HTTP client, while the other assumes the server role). The most prominent examples are:
Moreover, HTTP-based application interfaces are getting more and more popular each day, thanks to the maturity and variety of modern HTTP-compliant software (both server and client).
[http://www.ietf.org/rfc/rfc2616.txt RFC2616] – contains the HTTP/1.1 standard specification. Though written in very formal language, it is not hard to read. It is a must-read for Web developers and recommended reading for webmasters.
[http://en.wikipedia.org/wiki/HTTP_cookie HTTP Cookie] – a Wikipedia article that gives a comprehensive overview of HTTP cookies, their usage, advantages, security threats, and alternatives for session management.
[http://en.wikipedia.org/wiki/URL Uniform Resource Locator] – gives an overview of URL types and how to read them.
[http://www.w3.org/TR/uri-clarification/ World Wide Web Consortium] – an article that clarifies the terms URL, URI, and URN, and highlights the differences between them.
Author: Max Kosyakov, WorksForWeb Development Team Leader