World Wide Web
Uniform Resource Locators (URL)
HTTP Hypertext Transfer Protocol
RFC 1945 (HTTP 1.0) RFC 2616 (HTTP 1.1)
The Web consists of a large set of documents, called Web pages, that are accessible over the Internet. Each Web page is classified as a hypermedia document.
The suffix media is used to indicate that a document can contain items other than text (e.g., graphics images); the prefix hyper is used because a document can contain selectable links that refer to other, related documents.
Two main building blocks are used to implement the Web on top of the global Internet: the Web browser and the Web server.
Pages that contain a mixture of text and other items are represented using HyperText Markup Language (HTML). An HTML document consists of a file that contains text along with embedded commands, called tags, that give guidelines for display.
Each Web page is assigned a unique name that is used to identify it. The name, which is called a Uniform Resource Locator (URL), begins with a specification of the scheme used to access the item.
[Link] parameters][?query]
URLs
HTTP
HTTP characteristics
[Link]
[Link] /[Link]
[Link] .aspx
Relative URLs:
/arcd/arc_nucleus.htm
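As a rough illustration of how the pieces of a URL fit together, the following Python sketch splits an absolute URL into its components and resolves the relative URL shown above against it. The host name `www.example.com`, the port, the path, and the query string are made-up values chosen for the example; they do not come from the slides.

```python
from urllib.parse import urlsplit, urljoin

# Hypothetical absolute URL (every component here is an illustrative assumption).
base = "http://www.example.com:8080/docs/page.html?item=test1"

parts = urlsplit(base)
print(parts.scheme)    # the scheme used to access the item
print(parts.hostname)  # the server's host name
print(parts.port)      # optional port number
print(parts.path)      # path of the item on the server
print(parts.query)     # optional query string

# A relative URL is interpreted against the URL of the current page.
resolved = urljoin(base, "/arcd/arc_nucleus.htm")
print(resolved)
```

A relative URL that starts with `/` keeps the scheme, host, and port of the base URL and replaces only the path.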
HTTP is the protocol that supports communication between web browsers and web servers. HTTP is an application-level protocol with the lightness and speed necessary for distributed, hypermedia information systems. The RFC states that HTTP communication generally takes place over a TCP connection, but the protocol itself is not dependent on a specific transport layer.
Application Level.
Request/Response.
Stateless.
Each HTTP request is self-contained; the server does not keep a history of previous requests or previous sessions.
Bi-Directional Transfer.
Capability Negotiation.
Support For Caching.
To improve response time, a browser caches a copy of each Web page it retrieves. If a user requests a page again, HTTP allows the browser to interrogate the server to determine whether the contents of the page have changed since the copy was cached.
Support For Intermediaries.
HTTP allows a machine along the path between a browser and a server to act as a proxy server that caches Web pages.
Request - Response
Well Known Address
HTTP Versions
HTTP has a simple structure:
The client sends a request.
The server returns a reply.
The well-known TCP port for HTTP servers is port 80. Other ports can be used as well.
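The request/reply structure can be sketched in a few lines of Python. To keep the example runnable without a network, it uses `socket.socketpair()` as a stand-in for the TCP connection that a real client would open to port 80 of a server. The function names, the path `/index.html`, and the host `example.test` are assumptions made for illustration.

```python
import socket

def send_request(sock, path, host):
    """Client side: send one complete request."""
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
    sock.sendall(request.encode("ascii"))

def serve_once(sock):
    """Server side: read one request, return one reply, close the socket."""
    request = sock.recv(4096).decode("ascii")
    request_line = request.split("\r\n", 1)[0]
    body = f"You asked for: {request_line}\n"
    reply = (
        "HTTP/1.0 200 OK\r\n"
        "Content-Type: text/plain\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
        f"{body}"
    )
    sock.sendall(reply.encode("ascii"))
    sock.close()

def read_reply(sock):
    """Client side: read until the server closes its end."""
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks).decode("ascii")

# A connected socket pair stands in for a TCP connection to port 80.
client, server = socket.socketpair()
send_request(client, "/index.html", "example.test")
serve_once(server)
reply = read_reply(client)
client.close()
print(reply.split("\r\n", 1)[0])
```

With a real server, the client would instead call `socket.create_connection((host, 80))` and use the same request and reply logic.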
The original version now goes by the name HTTP Version 0.9.
HTTP 0.9 was used for many years.
HTTP can support multiple request-reply exchanges over a single TCP connection.
Starting with HTTP 1.0, the version number is part of every request.
This tells the server which version the client speaks (which options are supported, etc.).
HTTP 1.0+ Request
Request Line
Method URI HTTP-Version\r\n
Request Method
Lines of text (ASCII).
Lines end with CRLF (\r\n).
The first line is called the Request-Line.
The Request Method can be:
GET, HEAD, POST, DELETE, OPTIONS, PUT, TRACE
The request line contains 3 tokens (words). Space characters separate the tokens.
A newline (\n) by itself seems to work, but the protocol requires CRLF.
Future expansion is supported.
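A minimal sketch of splitting a Request-Line into its three tokens is given below. The function name `parse_request_line` and the sample URI are illustrative assumptions. The sketch insists on CRLF and does not reject unknown methods, since the method set may grow.

```python
KNOWN_METHODS = {"GET", "HEAD", "POST", "DELETE", "OPTIONS", "PUT", "TRACE"}

def parse_request_line(line: str) -> tuple[str, str, str]:
    """Split a Request-Line (Method, URI, HTTP-Version) into its tokens."""
    if not line.endswith("\r\n"):
        raise ValueError("request line must end with CRLF")
    tokens = line[:-2].split(" ")
    if len(tokens) != 3:
        raise ValueError("request line must contain exactly 3 tokens")
    method, uri, version = tokens
    return method, uri, version

method, uri, version = parse_request_line("GET /index.html HTTP/1.1\r\n")
print(method, uri, version, method in KNOWN_METHODS)
```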
Methods
Methods (cont.)
More Methods
TRACE: used to trace HTTP forwarding through proxies, tunnels, etc.
OPTIONS: used to determine the capabilities of the server, or characteristics of a named resource.
GET: retrieve information identified by the URI.
HEAD: retrieve meta-information about the URI.
POST: send information to a URI and retrieve result.
PUT: Store information in location named by URI.
DELETE: remove entity identified by URI.
HTTP Version Number
HTTP/1.0 or HTTP/1.1
HTTP 0.9 did not include a version number in a request line.
If a server gets a request line with no HTTP version number, it assumes 0.9.
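The version rule can be expressed as a small check on the number of tokens in the request line. This is only a sketch; the function name `request_version` and the sample request lines are assumptions.

```python
def request_version(request_line: str) -> str:
    """Return the HTTP version implied by a request line."""
    tokens = request_line.split()
    if len(tokens) == 2:
        # No version token: the server assumes HTTP 0.9.
        return "HTTP/0.9"
    if len(tokens) == 3 and tokens[2].startswith("HTTP/"):
        return tokens[2]
    raise ValueError("malformed request line")

print(request_version("GET /index.html"))
print(request_version("GET /index.html HTTP/1.1"))
```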
The Header Lines
After the Request-Line come a number (possibly zero) of HTTP header lines. Each header line contains an attribute name followed by a : followed by a space and the attribute value.
The Name and Value are just text.
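Because each header line is just a name, a colon, and a value, parsing needs little more than a string split. The sketch below is one possible way to do it; the function name `parse_headers` and the `Host` value are assumptions. Names are lowercased because HTTP header names are case-insensitive.

```python
def parse_headers(block: str) -> dict[str, str]:
    """Parse CRLF-terminated header lines up to the first blank line."""
    headers = {}
    for line in block.split("\r\n"):
        if line == "":
            break  # a blank line marks the end of the headers
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers

block = (
    "Host: www.example.com\r\n"
    "Connection: Keep-Alive\r\n"
    "Accept-Encoding: gzip\r\n"
    "Accept-Language: en\r\n"
    "\r\n"
)
headers = parse_headers(block)
print(headers["connection"])
```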
Headers
Request headers provide information to the server about the client:
what kind of client
what kind of content will be accepted
who is making the request
There can be 0 headers (HTTP 1.0).
HTTP 1.1 requires a Host: header.
Example HTTP Headers
GET /[Link] HTTP/1.1
Host: [Link] // required in HTTP 1.1
Connection: Keep-Alive
User-Agent: Mozilla/4.06 [en] (X11; U; Linux 2.1.121 i686)
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png
Accept-Encoding: gzip
Accept-Language: en
Accept-Charset: iso-8859-1,utf-8
<blank line>
End of the Headers
POST
A POST request includes some content (some data) after the headers (after the blank line). There is no format for the data (just raw bytes). A POST request must include a Content-Length line in the headers:
Content-length: 267
Each header ends with a CRLF (\r\n).
The end of the header section is marked with a blank line (just CRLF).
For GET and HEAD requests, the end of the headers is the end of the request!
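The following sketch assembles a POST request so that the Content-Length value always matches the number of bytes sent after the blank line. The function name `build_post`, the path `/cgi-bin/register`, the host, and the `Content-Type` value are assumptions added for illustration.

```python
def build_post(path: str, host: str, body: bytes) -> bytes:
    """Build a POST request: request line, headers, blank line, then content."""
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"  # assumed form encoding
        f"Content-Length: {len(body)}\r\n"
        "\r\n"  # blank line: end of the headers
    )
    return head.encode("ascii") + body

body = b"idno=2007A1PS001&item=test1&name=Krishna"
request = build_post("/cgi-bin/register", "www.example.com", body)
head, _, sent_body = request.partition(b"\r\n\r\n")
print(head.decode("ascii"))
print(len(sent_body))
```

Computing the length from the encoded body, rather than typing it by hand, avoids a mismatch between the header and the bytes that follow.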
Example POST Request
POST /[Link] HTTP/1.1
Host: [Link] // required in HTTP 1.1
Connection: Keep-Alive
User-Agent: Mozilla/4.06 [en] (X11; U; Linux 2.1.121 i686)
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png
Accept-Encoding: gzip
Accept-Language: en
Accept-Charset: iso-8859-1,utf-8
Content-Length: 35
Typical Method Usage
GET: used to retrieve an HTML document.
HEAD: used to find out if a document has changed.
POST: used to submit a form.
HTTP Response
ASCII Status Line
Headers Section
Status-Line
Headers
. . .
blank line
Content...
Content can be anything (not just text), typically an HTML document or some kind of image.
idno=2007A1PS001&item=test1&name=Krishna
Response Status Line
HTTP-Version Status-Code Message
Status Code is a 3-digit number (for computers).
Message is text (for humans).
Status Codes
1xx Informational
2xx Success
3xx Redirection
4xx Client Error
5xx Server Error
Example Status Lines
HTTP/1.0 200 OK
HTTP/1.0 301 Moved Permanently
HTTP/1.0 400 Bad Request
HTTP/1.0 500 Internal Server Error
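As a small sketch of how a client might read a status line, the code below splits it into version, numeric code, and message, and maps the first digit of the code to its class. The function names are assumptions made for the example.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

def parse_status_line(line: str) -> tuple[str, int, str]:
    """Split 'HTTP-Version Status-Code Message' into its three parts."""
    # The message may itself contain spaces, so split at most twice.
    version, code, message = line.rstrip("\r\n").split(" ", 2)
    return version, int(code), message

def status_class(code: int) -> str:
    """Classify a status code by its first digit."""
    return STATUS_CLASSES[code // 100]

version, code, message = parse_status_line("HTTP/1.0 301 Moved Permanently\r\n")
print(version, code, message, status_class(code))
```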
Response Headers
Response Header Examples
Date: Sat, 30 Jan 2010 [Link] IST
Server: Apache/1.17
Content-Type: text/html
Content-Length: 1756 // length of the content that arrives after the headers
Content-Encoding: gzip
Content
Response headers provide the client with information about the returned entity (document):
what kind of document
how big the document is
how the document is encoded
when the document was last modified
Content can be anything (sequence of raw bytes).
A Content-Length header is required for any response that includes content, unless the server marks the end of the content by closing the connection.
A Content-Type header is also required.
Response headers end with a blank line.
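Putting the pieces of a response together, the sketch below separates the status line, the headers, and the content, using the blank line as the divider and Content-Length to decide how many bytes belong to the content. The function name `split_response` and the sample document are assumptions.

```python
def split_response(raw: bytes) -> tuple[str, dict[str, str], bytes]:
    """Split a raw response into status line, headers, and content."""
    head, sep, rest = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no blank line after the headers")
    lines = head.decode("ascii").split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # The content is raw bytes; Content-Length says how many to take.
    length = int(headers["content-length"])
    return status_line, headers, rest[:length]

document = b"<html>hi</html>"
raw = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    + f"Content-Length: {len(document)}\r\n".encode("ascii")
    + b"\r\n"
    + document
)
status_line, headers, content = split_response(raw)
print(status_line, headers["content-type"], content)
```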
Single Request/Reply
The client sends a complete request. The server sends back the entire reply. The server closes its socket.
Persistent Connections
HTTP 1.1 supports persistent connections (this is the default).
Multiple requests can be handled over a single TCP connection.
The Connection: header is used to exchange information about persistence (HTTP/1.1).
HTTP 1.0 clients used a Keep-Alive: header.
Persistent Connections And Lengths
In HTTP 1.0, a client opens a TCP connection and sends a GET request. The server transmits a copy of the requested item, and then closes the TCP connection. Until it encounters an end of file condition, the client reads data from the TCP connection. Finally, the client closes its end of the connection.
If the client needs another document it must open a new connection.
This was the default for HTTP 1.0
Data Length And Program Output
HTTP/1.1 200 OK
Server: Microsoft-IIS/5.0
Date: Fri, 08 Oct 2010 [Link] GMT
Connection: close
Content-Type: text/html
The chief advantage of persistent connections lies in reduced overhead. A browser using a persistent connection can further optimize by pipelining requests (i.e., sending requests back-to-back without waiting for a response).
The chief disadvantage of using a persistent connection lies in the need to identify the beginning and end of each item sent over the connection. There are two possible techniques that handle the situation:
either send a length followed by the item or send a sentinel value after the item to mark the end.
To avoid ambiguity between sentinel values and data, HTTP uses the approach of sending a length followed by an item of that size.
It may not be convenient or even possible for a server to know the length of an item before sending. Servers use the Common Gateway Interface (CGI) mechanism to create dynamic documents. To provide for dynamic Web pages, the HTTP standard specifies that if the server does not know the length of an item a priori, the server can inform the browser that it will close the connection after transmitting the item.
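The two ways of delimiting an item described above can be sketched as a single reader: take exactly Content-Length bytes when the length is known, otherwise read until the server closes the connection. An in-memory `io.BytesIO` stream stands in for the TCP connection here. The function name `read_body` and the sample data are assumptions.

```python
import io

def read_body(stream, headers: dict[str, str]) -> bytes:
    """Read one item from the connection, using the length if it is known."""
    if "content-length" in headers:
        length = int(headers["content-length"])
        body = stream.read(length)
        if len(body) != length:
            raise EOFError("connection closed before the full item arrived")
        return body
    # Length unknown (e.g. program output): the item ends when the
    # server closes the connection, i.e. at end of file.
    return stream.read()

# Persistent connection: two items back to back, each preceded by its length.
persistent = io.BytesIO(b"firstsecond!")
item1 = read_body(persistent, {"content-length": "5"})
item2 = read_body(persistent, {"content-length": "7"})

# Non-persistent connection: the server announced 'Connection: close'.
closing = io.BytesIO(b"dynamic output")
item3 = read_body(closing, {"connection": "close"})
print(item1, item2, item3)
```

Without the lengths, the reader could not tell where the first item ends and the second begins, which is why the close-the-connection approach rules out reusing the connection.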
Conditional Requests
HTTP Proxy Server
Proxy Server
Security by filtering
Performance by caching
HTTP allows a sender to make a request conditional. For example:
If-Modified-Since: Sat, 01 Jan 2000 [Link] GMT
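A server-side view of the conditional request might look like the sketch below: compare the date in If-Modified-Since with the time the document last changed, and send the full document only if it is newer. The function name `conditional_status`, the time of day in the header, and the modification times are assumptions. The status code 304 (Not Modified) is the code HTTP defines for the unchanged case.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def conditional_status(if_modified_since, last_modified: datetime) -> int:
    """Return 200 if the item must be sent, 304 if the cached copy is current."""
    if if_modified_since is None:
        return 200  # unconditional request: always send the item
    cached_at = parsedate_to_datetime(if_modified_since)
    return 200 if last_modified > cached_at else 304

header = "Sat, 01 Jan 2000 12:00:00 GMT"  # hypothetical time of day
unchanged = datetime(1999, 12, 31, tzinfo=timezone.utc)
changed = datetime(2000, 1, 2, tzinfo=timezone.utc)

print(conditional_status(header, unchanged))
print(conditional_status(header, changed))
print(conditional_status(None, unchanged))
```

A browser or caching proxy that receives 304 can serve its stored copy without transferring the content again.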
Figure: Browser, Proxy, HTTP Server