Network Programming: This website demonstrates using wikis as a teaching and learning tool. The course instructor is happy to share the teaching materials here with those who find them readable.
HTTP - Hypertext Transfer Protocol
A Network Programming Lecture by Steven Choy
Overview: What is HTTP? - Structure of HTTP Transactions - What are HTTP Headers? - HTTP Request Structure - HTTP Request Method: GET - HTTP Request Method: POST - HTTP Response Structure - HTTP Status Codes - HTTP Headers in HTTP Requests - HTTP Headers in HTTP Responses
What is HTTP?
"Almost everything you see in your browser is transmitted to your computer over HTTP. For example, when you opened a particular web page, your browser probably have sent over 40 HTTP requests and received HTTP responses for each."
Structure of HTTP Transactions
HTTP GET transaction (Source: http://oreilly.com/openbook/webclient/ch03.html)
<initial request/response line>
Header1: value1
Header2: value2
Header3: value3
<optional message body goes here, like file contents or query data;
it can be many lines long, or even binary data>
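The generic layout above can be sketched in Python as a small parser that splits a raw message into its start line, headers, and body. This is a minimal illustration only: the function name `parse_http_message` and the sample response are made up for this note, and the sketch ignores details such as repeated or folded headers.

```python
def parse_http_message(raw: bytes):
    """Split a raw HTTP message into (start_line, headers, body)."""
    # The first empty line (CRLF CRLF) separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


# Hypothetical sample response, used only for illustration.
sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<html></html>"
)

start_line, headers, body = parse_http_message(sample)
print(start_line)
print(headers)
print(body)
```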
What are HTTP Headers?
Example HTTP Request
GET /tutorials/other/top-20-mysql-best-practices/ HTTP/1.1
Host: net.tutsplus.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5 (.NET CLR 3.5.30729)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie: PHPSESSID=r2t5uvjq435r4q7ib3vtdjq120
Pragma: no-cache
Cache-Control: no-cache
Example HTTP Response
HTTP/1.x 200 OK
Transfer-Encoding: chunked
Date: Sat, 28 Nov 2009 04:36:25 GMT
Server: LiteSpeed
Connection: close
X-Powered-By: W3 Total Cache/0.8
Pragma: public
Expires: Sat, 28 Nov 2009 05:36:25 GMT
Etag: "pub1259380237;gz"
Cache-Control: max-age=3600, public
Content-Type: text/html; charset=UTF-8
Last-Modified: Sat, 28 Nov 2009 03:50:37 GMT
X-Pingback: http://net.tutsplus.com/xmlrpc.php
Content-Encoding: gzip
Vary: Accept-Encoding, Cookie, User-Agent
How to view HTTP Headers
HTTP Request Structure
HTTP Request Method: GET
<form method="GET" action="foo.php">
First Name: <input name="first_name" type="text"> <br />
Last Name: <input name="last_name" type="text"> <br />
<input type="submit" name="action" value="Submit" />
</form>
GET /foo.php?first_name=John&last_name=Doe&action=Submit HTTP/1.1
HTTP Request Method: POST
<form method="POST" action="foo.php">
First Name: <input type="text" name="first_name" /> <br />
Last Name: <input type="text" name="last_name" /> <br />
<input type="submit" name="action" value="Submit" />
</form>
POST /foo.php HTTP/1.1
Host: localhost
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5 (.NET CLR 3.5.30729)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://localhost/test.php
Content-Type: application/x-www-form-urlencoded
Content-Length: 43

first_name=John&last_name=Doe&action=Submit
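As a rough sketch of what the browser does with the two forms above, the same three fields can be URL-encoded with Python's standard `urllib.parse.urlencode`. For GET the encoded string goes into the request line after `?`; for POST it becomes the message body, and its byte length is what appears in Content-Length. The variable names here are illustrative only.

```python
from urllib.parse import urlencode

# The form fields from the examples above, in submission order.
fields = {"first_name": "John", "last_name": "Doe", "action": "Submit"}
encoded = urlencode(fields)

# GET: the encoded fields are appended to the URL as a query string.
get_request_line = f"GET /foo.php?{encoded} HTTP/1.1"

# POST: the encoded fields travel in the body instead.
post_body = encoded.encode("ascii")
content_length = len(post_body)

print(get_request_line)
print(content_length, post_body)
```

The computed length matches the `Content-Length: 43` header in the POST example.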
HTTP Request Method: HEAD
"When you send a HEAD request, it means that you are only interested in the response code and the HTTP headers, not the document itself."
HTTP Response Structure
HTTP Status Codes
1xx indicates an informational message only
2xx indicates success of some kind (i.e. for successful requests)
3xx redirects the client to another URL (i.e. for redirections)
4xx indicates an error on the client's part (i.e. there was a problem with the request)
5xx indicates an error on the server's part (i.e. there was a problem with the server)
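The five classes above depend only on the first digit of the status code, which a client can check with integer division. The following is a small sketch; the `status_class` helper is a made-up name, while `http.HTTPStatus` is part of Python's standard library and supplies the standard reason phrases.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    """Return the class of a status code based on its first digit."""
    return STATUS_CLASSES.get(code // 100, "unknown")


for code in (200, 301, 404, 500):
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
```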
200 OK (The request succeeded, and the resulting resource is returned in the message body.)
404 Not Found (The requested resource doesn't exist.)
301 Moved Permanently
302 Found (named "Moved Temporarily" in HTTP/1.0)
(Browsers handle 301 and 302 very similarly, but the two codes can mean different things to search engine spiders. For instance, if your website is down for maintenance, you may redirect to another location using 302, and the spider will check your page again later. If you redirect using 301, you tell the spider that your website has moved to that location permanently.)
304 Not Modified (If the client has performed a conditional GET request and access is allowed, but the document has not been modified, the server SHOULD respond with this status code. The 304 response MUST NOT contain a message-body, and thus is always terminated by the first empty line after the header fields.)
500 Internal Server Error (An unexpected server error. The most common cause is a server-side script that has bad syntax, fails, or otherwise can't run correctly.)
401 Unauthorized (Password-protected web pages send this code. If you don't enter a valid login, the browser shows an authorization error page.)
403 Forbidden (If you are not allowed to access a page, this code may be sent to your browser. This often happens when you try to open a URL for a folder that contains no index page. If the server settings do not allow the display of the folder contents, you will get a 403 error.)
(There are other ways in which access can be blocked and a 403 sent. For example, you can block by IP address with the help of some .htaccess directives:)
order allow,deny
deny from 192.168.44.201
deny from 172.16.7.92
allow from all
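The effect of these directives can be approximated in a few lines of Python. With `order allow,deny`, the allow rules are evaluated first and the deny rules second, so a matching deny wins. This is only a sketch of that decision for the two addresses listed above, not of Apache's actual implementation; `is_allowed` and `status_for` are hypothetical helpers.

```python
import ipaddress

# The two addresses from the "deny from" lines above.
DENIED = [
    ipaddress.ip_address("192.168.44.201"),
    ipaddress.ip_address("172.16.7.92"),
]


def is_allowed(client_ip: str) -> bool:
    addr = ipaddress.ip_address(client_ip)
    allowed = True           # "allow from all" matches every client
    denied = addr in DENIED  # deny rules are checked last and override allow
    return allowed and not denied


def status_for(client_ip: str) -> int:
    """Status code the server would send for this client."""
    return 200 if is_allowed(client_ip) else 403


print(status_for("192.168.44.201"))
print(status_for("10.0.0.5"))
```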
HTTP Headers in HTTP Requests
"An HTTP Request is sent to a specific IP Addresses. But since most servers are capable of hosting multiple websites under the same IP, they must know which domain name the browser is looking for."
"This header can carry several pieces of information such as: browser name and version, operating System name and version, and default language. This is how websites can collect certain general information about their users' systems."
"This header displays the default language setting of the user. If a website has different language versions, it can redirect a new surfer based on this data."
"Most modern browsers support gzip, and will send this in the header. The web server then can send the HTML output in a compressed format. This can reduce the size by up to 80% to save bandwidth and time."
"If a web document is already cached in your browser, and you visit it again, your browser can check if the document has been updated. If it was not modified since that date, the server will send a "304 Not Modified" response code, and no content - and the browser will load the content from the cache."
Cookie
This header sends the cookies stored in your browser for that domain.
Referer
This HTTP header contains the referring URL.
Authorization
When a web page asks for authorization, the browser opens a login window. When you enter a username and password in this window, the browser sends another HTTP request, but this time it contains this header.
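For the Basic scheme, the value of the Authorization header is the word `Basic` followed by the Base64 encoding of `username:password`. The sketch below builds that value; the function name and the credentials are placeholders for illustration. Note that Base64 is an encoding, not encryption, so Basic authentication is only safe over an encrypted connection.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an Authorization header for the Basic scheme."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Basic {token}"


# Placeholder credentials, for illustration only.
print("Authorization:", basic_auth_header("Aladdin", "open sesame"))
```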
HTTP Headers in HTTP Responses
Cache-Control
Cache-Control: max-age=3600, public
Definition from w3.org: "The Cache-Control general-header field is used to specify directives which MUST be obeyed by all caching mechanisms along the request/response chain." These "caching mechanisms" include gateways and proxies that your ISP may be using.
Caching can also be prevented by using the "no-cache" directive.
Cache-Control: no-cache
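A Cache-Control value is a comma-separated list of directives, some of which carry an argument. A cache could read the two examples above roughly as follows; `parse_cache_control` is a hypothetical helper, and the sketch does not handle quoted arguments that themselves contain commas.

```python
def parse_cache_control(value: str) -> dict:
    """Parse a Cache-Control value into {directive: argument or True}."""
    directives = {}
    for part in value.split(","):
        name, sep, arg = part.strip().partition("=")
        if name:
            # Directives without "=" (such as "public") are stored as flags.
            directives[name.lower()] = arg.strip('"') if sep else True
    return directives


print(parse_cache_control("max-age=3600, public"))
print(parse_cache_control("no-cache"))
```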
Content-Type
This header indicates the MIME type of the document. The browser then decides how to interpret the contents based on this.
Some examples of content types:
Content-Type: text/html; charset=UTF-8
Content-Type: image/gif
Content-Type: application/pdf
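On the server side, the MIME type is often derived from the file extension. Python's standard `mimetypes` module offers such a lookup; the sketch below wraps it in a hypothetical `content_type_for` helper that also appends a charset for text types, which is an assumption of this example rather than something every server does.

```python
import mimetypes


def content_type_for(filename: str) -> str:
    """Guess a Content-Type header value from a file name."""
    mime, _encoding = mimetypes.guess_type(filename)
    if mime is None:
        return "application/octet-stream"  # generic fallback for unknown types
    if mime.startswith("text/"):
        return f"{mime}; charset=UTF-8"  # assumes text files are stored as UTF-8
    return mime


for name in ("index.html", "logo.gif", "manual.pdf"):
    print(f"{name}: Content-Type: {content_type_for(name)}")
```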
Content-Disposition
This header instructs the browser to open a file download box, instead of trying to parse the content. Example:
Content-Disposition: attachment; filename="download.zip"
Content-Length
When content is going to be transmitted to the browser, the server can indicate its size in bytes using this header. This is especially useful for file downloads, because it lets the browser determine the progress of the download.
Etag
The web server may send this header with every document it serves. The value can be based on the last modification date, the file size, or even a checksum of the file.
Etag: "pub1259380237;gz"
The browser then saves this value as it caches the document. The next time the browser requests the same file, it sends this in the HTTP request:
If-None-Match: "pub1259380237;gz"
If the Etag value of the document matches that, the server will send a "304 Not Modified" code instead of "200 OK", and no content. The browser will load the contents from its cache.
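The server-side decision described above can be sketched as a single function: compare the document's current Etag with the value sent in If-None-Match, and answer 304 with an empty body on a match. The function name `respond` and the sample document are invented for this illustration, and weak-validator handling is left out.

```python
def respond(current_etag: str, document: bytes, if_none_match: str = None):
    """Return (status_code, body) for a GET that may carry If-None-Match."""
    if if_none_match is not None:
        # The header may list several Etags separated by commas, or "*".
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates or current_etag in candidates:
            return 304, b""  # the cached copy is still valid: send no content
    return 200, document


etag = '"pub1259380237;gz"'
page = b"<html>...</html>"  # placeholder document

print(respond(etag, page))                      # first visit, nothing cached
print(respond(etag, page, if_none_match=etag))  # revisit with a matching Etag
```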
Last-Modified
This header indicates the last modification date of the document, in GMT:
Last-Modified: Sat, 28 Nov 2009 03:50:37 GMT
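When a browser later sends this date back in an If-Modified-Since request header, the server only has to compare two timestamps. A minimal sketch using the standard `email.utils.parsedate_to_datetime`, which understands this date format, might look like this; `not_modified` is a hypothetical helper name.

```python
from email.utils import parsedate_to_datetime


def not_modified(if_modified_since: str, last_modified: str) -> bool:
    """True if the document has not changed since the client's cached copy."""
    cached = parsedate_to_datetime(if_modified_since)
    current = parsedate_to_datetime(last_modified)
    return current <= cached


last_modified = "Sat, 28 Nov 2009 03:50:37 GMT"

# The client cached the document at its current version: answer 304.
print(304 if not_modified("Sat, 28 Nov 2009 03:50:37 GMT", last_modified) else 200)

# The client's copy is older than the document: answer 200 with content.
print(304 if not_modified("Fri, 27 Nov 2009 12:00:00 GMT", last_modified) else 200)
```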
Location
This header is used for redirections. If the response code is 301 or 302, the server should also send this header so that the browser knows which URL to request next.
Set-Cookie
When a website wants to set or update a cookie in your browser, it will use this header.
Set-Cookie: skin=noskin; path=/; domain=.amazon.com; expires=Sun, 29-Nov-2009 21:42:28 GMT
Set-Cookie: session-id=120-7333518-8165026; path=/; domain=.amazon.com; expires=Sat Feb 27 08:00:00 2010 GMT
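Each Set-Cookie value consists of a `name=value` pair followed by attributes separated by semicolons. The sketch below splits the first example above into those parts. It is a simplified, hand-written parser with a made-up name (`parse_set_cookie`), meant only to show the structure of the header.

```python
def parse_set_cookie(value: str) -> dict:
    """Split a Set-Cookie value into its name, value, and attributes."""
    first, *attrs = [part.strip() for part in value.split(";")]
    name, _, cookie_value = first.partition("=")
    cookie = {"name": name, "value": cookie_value, "attributes": {}}
    for attr in attrs:
        if not attr:
            continue
        key, _, attr_value = attr.partition("=")
        # Attribute names such as Path and Domain are case-insensitive.
        cookie["attributes"][key.lower()] = attr_value
    return cookie


header = "skin=noskin; path=/; domain=.amazon.com; expires=Sun, 29-Nov-2009 21:42:28 GMT"
print(parse_set_cookie(header))
```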
WWW-Authenticate
A website may send this header to authenticate a user through HTTP. When the browser sees this header, it will open up a login dialogue window.
WWW-Authenticate: Basic realm="Restricted Area"
Content-Encoding
This header is usually set when the returned content is compressed.
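The negotiation between the request's Accept-Encoding header and the response's Content-Encoding header can be sketched with Python's standard `gzip` module. The `encode_body` helper and the sample page are assumptions made for this example; the sketch ignores `q` values and every coding other than gzip.

```python
import gzip


def encode_body(body: bytes, accept_encoding: str):
    """Compress the body if the client advertised gzip support.

    Returns (body_to_send, extra_response_headers).
    """
    codings = [c.split(";")[0].strip().lower() for c in accept_encoding.split(",")]
    if "gzip" in codings:
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    return body, {}  # send the content unchanged


# Repetitive sample HTML, used only to show the size difference.
page = b"<p>Hello, HTTP!</p>\n" * 200

sent, extra_headers = encode_body(page, "gzip,deflate")
print(len(page), "bytes before,", len(sent), "bytes after", extra_headers)
```

How much is saved depends heavily on the content; repetitive markup like this sample compresses far better than typical pages.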
Thanks for Reading
If you find any problem in this lecture note, please feel free to tell Steven via steven@findaway.hk.