Network Programming

This website demonstrates using wikis as a teaching and learning tool.

The course instructor is happy to share the teaching materials here with anyone who finds them useful.

HTTP - Hypertext Transfer Protocol

A Network Programming Lecture by Steven Choy

Overview: What is HTTP? - Structure of HTTP Transactions - What are HTTP Headers? - HTTP Request Structure - HTTP Request Method: GET - HTTP Request Method: POST - HTTP Response Structure - HTTP Status Codes - HTTP Headers in HTTP Requests - HTTP Headers in HTTP Responses


What is HTTP?

  • HTTP stands for Hypertext Transfer Protocol.
"Almost everything you see in your browser is transmitted to your computer over HTTP. For example, when you opened a particular web page, your browser probably sent over 40 HTTP requests and received an HTTP response for each."
  • It's the network protocol used to deliver virtually all resources on the web.
  • Resources on the web include all files and other data such as HTML files, image files, query results, or anything else.
  • Usually, HTTP takes place through TCP/IP sockets.
  • Knowing HTTP enables you to write Web browsers, Web servers, automatic page downloaders, link-checkers, and other useful tools.

Structure of HTTP Transactions

  • HTTP uses the client-server model:
    • An HTTP client opens a connection and sends a request message to an HTTP server.
    • The server then returns a response message, usually containing the resource that was requested.
HTTP GET transaction (Source: http://oreilly.com/openbook/webclient/ch03.html)
  • The format of an HTTP (request or response) message is:
        <initial request/response line>
        Header1: value1
        Header2: value2
        Header3: value3

        <optional message body goes here, like file contents or query data;
         it can be many lines long, or even binary data>
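The message format and the request/response exchange described above can be sketched in Python. This is a minimal illustration, not a real client or server: it runs both ends inside one process over `socket.socketpair()`, and the path `/hello.txt`, the host `example.test`, and the helper names are all made up for the example. Lines are terminated with CRLF, and the blank line marks the end of the headers.

```python
import socket

CRLF = "\r\n"

def build_message(start_line, headers, body=b""):
    """Serialize an HTTP message: start line, header lines, blank line, body."""
    lines = [start_line] + [f"{name}: {value}" for name, value in headers.items()]
    head = CRLF.join(lines) + CRLF + CRLF   # the blank line ends the headers
    return head.encode("ascii") + body

def read_head(conn):
    """Read from the socket until the blank line that ends the headers."""
    data = b""
    while b"\r\n\r\n" not in data:
        chunk = conn.recv(1024)
        if not chunk:
            break
        data += chunk
    return data

def demo_transaction():
    """Run one request/response exchange over an in-process socket pair."""
    client, server = socket.socketpair()
    with client, server:
        # Client side: send a request message (hypothetical path and host).
        client.sendall(build_message("GET /hello.txt HTTP/1.1", {"Host": "example.test"}))

        # Server side: read the request, then return a response with a body.
        request = read_head(server)
        body = b"Hello, world!\n"
        server.sendall(build_message(
            "HTTP/1.1 200 OK",
            {"Content-Type": "text/plain", "Content-Length": str(len(body))},
            body,
        ))
        server.shutdown(socket.SHUT_WR)     # signal the end of the response

        # Client side: read the whole response until the server stops sending.
        response = b""
        while chunk := client.recv(1024):
            response += chunk
    return request, response
```

Calling `demo_transaction()` returns the raw request and response bytes, which can be printed to see the start line, headers, blank line, and body in order.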

What are HTTP Headers?

  • HTTP headers are the core part of these HTTP requests and responses, and they carry information about the client browser, the requested page, the server and more.

Example HTTP Request

      GET /tutorials/other/top-20-mysql-best-practices/ HTTP/1.1
      Host: net.tutsplus.com
      User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5 (.NET CLR 3.5.30729)
      Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
      Accept-Language: en-us,en;q=0.5
      Accept-Encoding: gzip,deflate
      Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
      Keep-Alive: 300
      Connection: keep-alive
      Cookie: PHPSESSID=r2t5uvjq435r4q7ib3vtdjq120
      Pragma: no-cache
      Cache-Control: no-cache
  • The first line is the Request Line which contains some basic info on the request.
  • The rest are the HTTP headers.

Example HTTP Response

      HTTP/1.x 200 OK
      Transfer-Encoding: chunked
      Date: Sat, 28 Nov 2009 04:36:25 GMT
      Server: LiteSpeed
      Connection: close
      X-Powered-By: W3 Total Cache/0.8
      Pragma: public
      Expires: Sat, 28 Nov 2009 05:36:25 GMT
      Etag: "pub1259380237;gz"
      Cache-Control: max-age=3600, public
      Content-Type: text/html; charset=UTF-8
      Last-Modified: Sat, 28 Nov 2009 03:50:37 GMT
      X-Pingback: http://net.tutsplus.com/xmlrpc.php
      Content-Encoding: gzip
      Vary: Accept-Encoding, Cookie, User-Agent
  • The first line is the Status Line
  • It is followed by HTTP headers, until the blank line.
  • After that, the content starts.
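The three parts listed above (status line, headers up to the blank line, then content) can be separated with a few lines of Python. This is a simplified sketch under stated assumptions: the function name `parse_response` is hypothetical, the whole response is already in memory, and repeated headers such as multiple `Set-Cookie` lines would overwrite each other in the dictionary.

```python
def parse_response(raw):
    """Split raw response bytes into (version, status, reason, headers, body)."""
    # The first blank line separates the headers from the content.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Status line: protocol version, status code, optional reason phrase.
    parts = lines[0].split(" ", 2)
    version, status = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""

    # Header lines: "Name: value". Names are case-insensitive, so store them lowercased.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, status, reason, headers, body
```

The body is returned as bytes and left undecoded, because it may be compressed (see `Content-Encoding`) or binary.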

How to view HTTP Headers

  • Your browser normally shows you only the content of an HTTP response, not the HTTP response headers.
  • To view HTTP headers, use the browser's developer tools, a browser add-on or plugin, or another dedicated program.
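A small script is one such "other program". The sketch below uses Python's standard `http.client` module to send a GET request and return the status and response headers. The function name `fetch_headers` and its defaults are assumptions made for this example, and it handles plain HTTP only (not HTTPS).

```python
import http.client

def fetch_headers(host, path="/", port=80):
    """Send a GET request and return (status, reason, list of (name, value) headers)."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()                 # drain the body; only the headers are of interest
        return resp.status, resp.reason, resp.getheaders()
    finally:
        conn.close()
```

For example, `status, reason, headers = fetch_headers("example.com")` followed by a loop that prints each `name: value` pair shows roughly what a browser receives before the content.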

HTTP Request Structure

  • Request line = request method + path + protocol
  • HTTP headers: name-value pairs
  • Request method: indicates what kind of request this is; the most common methods are GET, POST and HEAD.
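As a rough illustration of this structure, the following Python sketch splits the head of a request into the three parts of the request line and a dictionary of headers. The names `Request` and `parse_request` are hypothetical, and the sketch assumes a well-formed request with exactly one space between the parts of the request line.

```python
from typing import NamedTuple

class Request(NamedTuple):
    method: str      # e.g. GET, POST, HEAD
    path: str        # e.g. /index.html
    protocol: str    # e.g. HTTP/1.1
    headers: dict    # header name -> value

def parse_request(text):
    """Parse the request line and headers of a raw HTTP request string."""
    head = text.split("\r\n\r\n", 1)[0]           # ignore any message body
    request_line, *header_lines = head.split("\r\n")
    method, path, protocol = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")      # split at the first colon only
        headers[name.strip()] = value.strip()
    return Request(method, path, protocol, headers)
```

Splitting each header at the first colon only matters because values such as URLs and dates contain colons themselves.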

HTTP Request Method: GET

  • This is the main method used for retrieving HTML, images, JavaScript, CSS, etc.
  • Most data that loads in your browser was requested using this method.
  • Web forms can be set to use the GET method. Here is an example:
      <form method="GET" action="foo.php">
        First Name: <input name="first_name" type="text"> <br />
        Last Name: <input name="last_name" type="text"> <br />
        <input type="submit" name="action" value="Submit" />
      </form>
  • When that form is submitted, the HTTP request line is:
      GET /foo.php?first_name=John&last_name=Doe&action=Submit HTTP/1.1
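The way form fields end up in the request line can be reproduced with Python's `urllib.parse` module. This is a sketch of the encoding step only; the helper names are made up for the example, and a real browser resolves the form's `action` against the page URL before sending.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def form_to_request_line(action, fields):
    """Build the request line for a GET form submission with the given fields."""
    return f"GET {action}?{urlencode(fields)} HTTP/1.1"

def query_fields(request_line):
    """Recover the submitted fields from the path in a request line."""
    _method, target, _protocol = request_line.split(" ")
    parsed = parse_qs(urlsplit(target).query)
    # parse_qs returns a list per name; keep the first value of each field.
    return {name: values[0] for name, values in parsed.items()}

line = form_to_request_line(
    "/foo.php",
    {"first_name": "John", "last_name": "Doe", "action": "Submit"},
)
print(line)   # GET /foo.php?first_name=John&last_name=Doe&action=Submit HTTP/1.1
```

`urlencode` also escapes characters that would otherwise break the URL, so a value such as `a b&c` is sent as `a+b%26c`.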

HTTP Request Method: POST

  • POST requests are most commonly sent by web forms.
      <form method="POST" action="foo.php">
        First Name: <input type="text" name="first_name" /> <br />
        Last Name: <input type="text" name="last_name" /> <br />
        <input type="submit" name="action" value="Submit" />
      </form>
  • When that form is submitted, the HTTP request is:
      POST /foo.php HTTP/1.1
      Host: localhost
      User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5 (.NET CLR 3.5.30729)
      Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
      Accept-Language: en-us,en;q=0.5
      Accept-Encoding: gzip,deflate
      Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
      Keep-Alive: 300
      Connection: keep-alive
      Referer: http://localhost/test.php
      Content-Type: application/x-www-form-urlencoded
      Content-Length: 43

      first_name=John&last_name=Doe&action=Submit
  • Note the following points:
    • Content-Type and Content-Length headers have been added.
    • All the data is now sent after the HTTP headers, in the message body.
    • POST method requests can also be made via AJAX, applications, cURL, etc.
    • All file upload forms are required to use the POST method.
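A program can build the same kind of POST request itself. The sketch below is a minimal illustration with a hypothetical helper name: it sends only the headers needed to describe the body, and computes `Content-Length` from the encoded form data rather than hard-coding it.

```python
from urllib.parse import urlencode

def build_post_request(path, host, fields):
    """Build a raw POST request with a URL-encoded form body."""
    body = urlencode(fields).encode("ascii")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"   # size of the body in bytes
        "\r\n"                               # blank line: headers end, body begins
    )
    return head.encode("ascii") + body

request = build_post_request(
    "/foo.php",
    "localhost",
    {"first_name": "John", "last_name": "Doe", "action": "Submit"},
)
print(request.decode("ascii"))
```

For the example form data the computed `Content-Length` is 43, matching the request shown above.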

HTTP Request Method: HEAD

  • HEAD is similar to GET, except the server does not return the content in the HTTP response.
"When you send a HEAD request, it means that you are only interested in the response code and the HTTP headers, not the document itself."

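The difference between GET and HEAD can be sketched from the server's side. The function below is a simplified illustration (the name `respond` and the fixed `200 OK` status are assumptions): it produces the same status line and headers for both methods, and omits the body for HEAD.

```python
def respond(method, body, content_type="text/html"):
    """Return raw response bytes for a GET or HEAD request for the same resource."""
    headers = [
        ("Content-Type", content_type),
        ("Content-Length", str(len(body))),   # describes the body a GET would return
    ]
    head = "HTTP/1.1 200 OK\r\n"
    head += "".join(f"{name}: {value}\r\n" for name, value in headers)
    head += "\r\n"
    if method == "HEAD":
        return head.encode("ascii")           # headers only, no document
    return head.encode("ascii") + body
```

Because the headers are identical, a client can use HEAD to check the size, type, or modification date of a resource without downloading it.
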
HTTP Response Structure

  • Status line = protocol version + status code + reason phrase (for example, HTTP/1.1 200 OK)
  • HTTP headers: name-value pairs
  • Most of the name-value pairs in the HTTP headers are optional.

HTTP Status Codes

  • The status code is a three-digit integer
  • The first digit identifies the general category of response:
      1xx indicates an informational message only
      2xx indicates success of some kind (i.e. for successful requests)
      3xx redirects the client to another URL (i.e. for redirections)
      4xx indicates an error on the client's part (i.e. there was a problem with the request)
      5xx indicates an error on the server's part (i.e. there was a problem with the server)
  • The most common status codes are:
    • 200 OK: The request succeeded, and the resulting resource is returned in the message body.
    • 404 Not Found: The requested resource doesn't exist.
    • 301 Moved Permanently
    • 302 Found (named "Moved Temporarily" in HTTP/1.0)
      Browsers handle 301 and 302 very similarly, but the two can have different meanings to search engine spiders. For instance, if your website is down for maintenance, you may redirect to another location using 302, and the search engine spider will check your page again later. If you redirect using 301, it tells the spider that your website has moved to that location permanently.
    • 304 Not Modified: If the client has performed a conditional GET request and access is allowed, but the document has not been modified, the server SHOULD respond with this status code. The 304 response MUST NOT contain a message-body, and thus is always terminated by the first empty line after the header fields.
    • 500 Internal Server Error: An unexpected server error. The most common cause is a server-side script that has bad syntax, fails, or otherwise can't run correctly.
    • 401 Unauthorized: Password-protected web pages send this code. If you don't enter a login correctly, the browser displays an error page instead of the protected content.
    • 403 Forbidden: If you are not allowed to access a page, this code may be sent to your browser. This often happens when you try to open a URL for a folder that contains no index page. If the server settings do not allow the display of the folder contents, you will get a 403 error.
      There are other ways in which access can be blocked and 403 can be sent. For example, you can block by IP address with the help of some .htaccess directives:
         order allow,deny
         deny from 192.168.44.201
         deny from 172.16.7.92
         allow from all
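The rule that the first digit of a status code identifies its category is easy to express in code. The sketch below uses Python's standard `http.HTTPStatus` enumeration to look up the reason phrase; the `CATEGORIES` table and the `describe` function are names made up for this example.

```python
from http import HTTPStatus

CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def describe(code):
    """Return a one-line description of a three-digit HTTP status code."""
    category = CATEGORIES.get(code // 100, "unknown")   # first digit picks the category
    try:
        phrase = HTTPStatus(code).phrase                # standard reason phrase, if any
    except ValueError:
        phrase = "(no standard reason phrase)"
    return f"{code} {phrase}: {category}"

for code in (200, 301, 304, 401, 403, 404, 500):
    print(describe(code))
```

A client that does not recognize a specific code can still fall back on the category, which is why the first digit is worth checking on its own.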

HTTP Headers in HTTP Requests

  • Host
"An HTTP request is sent to a specific IP address. But since most servers are capable of hosting multiple websites under the same IP, they must know which domain name the browser is looking for."
  • User-Agent
"This header can carry several pieces of information such as: browser name and version, operating system name and version, and default language. This is how websites can collect certain general information about their users' systems."
  • Accept-Language
"This header displays the default language setting of the user. If a website has different language versions, it can redirect a new surfer based on this data."
  • Accept-Encoding
"Most modern browsers support gzip, and will send this in the header. The web server then can send the HTML output in a compressed format. This can reduce the size by up to 80% to save bandwidth and time."
  • If-Modified-Since
"If a web document is already cached in your browser, and you visit it again, your browser can check if the document has been updated. If it was not modified since that date, the server will send a "304 Not Modified" response code, and no content - and the browser will load the content from the cache."
  • Cookie
This sends the cookies stored in your browser for that domain.
  • Referer
This HTTP header contains the referring URL.
  • Authorization
When a web page asks for authorization, the browser opens a login window. When you enter a username and password in this window, the browser sends another HTTP request, but this time it contains this header.
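As an illustration of what the Authorization header carries, the sketch below assumes the Basic authentication scheme, in which the browser sends `username:password` encoded with Base64. The function names are hypothetical. Base64 is an encoding, not encryption, so anyone who can read the request can decode the credentials.

```python
import base64

def basic_auth_header(username, password):
    """Build the Authorization header line for the Basic scheme."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Authorization: Basic {token}"

def decode_basic_auth(header_value):
    """Recover (username, password) from a value such as 'Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=='."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    decoded = base64.b64decode(token).decode("utf-8")
    username, _, password = decoded.partition(":")   # the password may itself contain colons
    return username, password
```

The second function shows why Basic authentication should only be used over an encrypted connection: decoding the header takes one line.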

HTTP Headers in HTTP Responses

  • Cache-Control
          Cache-Control: max-age=3600, public
Definition from w3.org: "The Cache-Control general-header field is used to specify directives which MUST be obeyed by all caching mechanisms along the request/response chain." These "caching mechanisms" include gateways and proxies that your ISP may be using.
Serving a stored copy without first checking with the server can be prevented by using the "no-cache" directive.
          Cache-Control: no-cache
  • Content-Type
This header indicates the "mime-type" of the document. The browser then decides how to interpret the contents based on this.
Some examples of content types:
             Content-Type: text/html; charset=UTF-8

             Content-Type: image/gif

             Content-Type: application/pdf
Here is a list of common MIME types and their corresponding file extensions
  • Content-Disposition
This header instructs the browser to open a file download box, instead of trying to parse the content. Example:
          Content-Disposition: attachment; filename="download.zip"
  • Content-Length
When content is going to be transmitted to the browser, the server can indicate the size of it using this header. This is especially useful for file downloads. That's how the browser can determine the progress of the download.
  • Etag
The web server may send this header with every document it serves. The value can be based on the last-modified date, the file size, or even a checksum of the file.
          Etag: "pub1259380237;gz"  
The browser then saves this value as it caches the document. Next time the browser requests the same file, it sends this in the HTTP request:
          If-None-Match: "pub1259380237;gz"
If the Etag value of the document matches that, the server will send a "304 Not Modified" code instead of "200 OK", and no content. The browser will load the contents from its cache.
  • Last-Modified
This header indicates the last-modified date of the document, in GMT:
          Last-Modified: Sat, 28 Nov 2009 03:50:37 GMT
  • Location
This header is used for redirections. If the response code is 301 or 302, the server should also send this header so that the client knows where to go.
  • Set-Cookie
When a website wants to set or update a cookie in your browser, it will use this header.
      Set-Cookie: skin=noskin; path=/; domain=.amazon.com; expires=Sun, 29-Nov-2009 21:42:28 GMT
      Set-Cookie: session-id=120-7333518-8165026; path=/; domain=.amazon.com; expires=Sat Feb 27 08:00:00 2010 GMT
  • WWW-Authenticate
A website may send this header to authenticate a user through HTTP. When the browser sees this header, it will open up a login dialogue window.
          WWW-Authenticate: Basic realm="Restricted Area"
  • Content-Encoding
This header is usually set when the returned content is compressed.
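The ETag and If-None-Match exchange described in this section can be sketched from the server's side. This is one possible scheme, not how any particular server does it: the ETag is derived from a SHA-256 checksum of the content, and the names `make_etag` and `conditional_get` are made up for the example.

```python
import hashlib

def make_etag(content):
    """Derive a quoted ETag value from a checksum of the content."""
    return '"' + hashlib.sha256(content).hexdigest()[:16] + '"'

def conditional_get(content, if_none_match=None):
    """Return (status, headers, body) for a GET that may carry If-None-Match."""
    etag = make_etag(content)
    if if_none_match == etag:
        # The client's cached copy is still current: send no content.
        return 304, {"ETag": etag}, b""
    # First visit, or the document changed: send the full content.
    return 200, {"ETag": etag, "Content-Length": str(len(content))}, content
```

On the first request the client receives 200 and stores the ETag; repeating the request with that value in If-None-Match yields 304 until the content changes.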

Thanks for Reading

If you find any problem in this lecture note, please feel free to tell Steven via steven@findaway.hk.

Page last modified on January 14, 2010, at 09:26 AM