Web Engineering
HTTP Protocol
Internet and Web
HTML tells the browser how to present the
content to the user.
Web and HyperText Transfer Protocol (HTTP)
First some jargon
◻ Web page consists of objects
◻ Object can be HTML file, JPEG image, Java applet, audio
file,…
◻ Web page consists of base HTML-file which includes several
referenced objects
◻ Each object is addressable by a URL
◻ Example URL:
[Link]/someDept/[Link]
(the URL consists of a host name followed by a path name)
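The host name / path name split can be checked with Python's standard library. This is a minimal sketch; the URL below is a hypothetical stand-in, not the one from the slide.

```python
from urllib.parse import urlsplit

# Hypothetical URL, used only for illustration.
url = "http://www.example.edu/someDept/pic.gif"

parts = urlsplit(url)
host_name = parts.hostname   # the part that names the server
path_name = parts.path       # the part that names the object on that server

print("host name:", host_name)
print("path name:", path_name)
```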
HTTP overview
HTTP: hypertext transfer protocol
◻ Web’s application layer protocol
◻ client/server model
client: browser that requests, receives, “displays” Web objects
server: Web server sends objects in response to requests
◻ HTTP 1.0: RFC 1945
◻ HTTP 1.1: RFC 2068
[Figure: a PC running Explorer and a Mac running Navigator each send HTTP requests to, and receive HTTP responses from, a server running Apache Web server]
Ports
❑ The TCP port numbers from
0 to 1023 are reserved for
well-known services.
❑ Don’t use these ports for your
own custom server programs!
HTTP overview (continued)
Uses TCP:
◻ client initiates TCP connection (creates socket) to server, port 80
◻ server accepts TCP connection from client
◻ HTTP messages (application-layer protocol messages) exchanged between browser (HTTP client) and Web server (HTTP server)
◻ TCP connection closed
HTTP is “stateless”
◻ server maintains no information about past client requests
aside: Protocols that maintain “state” are complex!
◻ past history (state) must be maintained
◻ if server/client crashes, their views of “state” may be inconsistent, must be reconciled
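The four TCP steps (connect, accept, exchange messages, close) can be sketched with plain sockets on the local machine. This is an illustration under assumptions: the server is a toy that answers a single request, the request path and response body are made up, and it listens on an OS-chosen port rather than port 80, in line with the advice not to use ports 0 to 1023 for custom programs.

```python
import socket
import threading

def serve_once(listener):
    # Server side: accept one TCP connection, read the request, reply, close.
    conn, _addr = listener.accept()
    with conn:
        conn.recv(4096)  # the HTTP request message (ignored by this toy server)
        conn.sendall(b"HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nhello")
    # Leaving the "with" block closes the TCP connection.

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]

server_thread = threading.Thread(target=serve_once, args=(listener,))
server_thread.start()

# Client side: initiate the TCP connection (creates a socket), send a request.
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"GET /index.html HTTP/1.0\r\nHost: localhost\r\n\r\n")
    chunks = []
    while True:
        data = client.recv(4096)
        if not data:                 # empty read: server closed the connection
            break
        chunks.append(data)

reply = b"".join(chunks)
server_thread.join()
listener.close()
print(reply.decode("ascii"))
```

The server keeps nothing once the connection is closed, which is the sense in which HTTP is “stateless”.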
HTTP connections
Nonpersistent HTTP
◻ At most one object is sent over a TCP connection.
◻ HTTP/1.0 uses nonpersistent HTTP
Persistent HTTP
◻ Multiple objects can be sent over single TCP connection between client and server.
◻ HTTP/1.1 uses persistent connections in default mode
Nonpersistent HTTP
Suppose user enters URL [Link]/someDepartment/[Link]
(contains text, references to 10 jpeg images)
1a. HTTP client initiates TCP connection to HTTP server (process) at [Link] on port 80
1b. HTTP server at host [Link] waiting for TCP connection at port 80. “accepts” connection, notifying client
2. HTTP client sends HTTP request message (containing URL) into TCP connection socket. Message indicates that client wants object someDepartment/[Link]
3. HTTP server receives request message, forms response message containing requested object, and sends message into its socket
Nonpersistent HTTP (cont.)
4. HTTP server closes TCP connection.
5. HTTP client receives response message containing html file, displays html. Parsing html file, finds 10 referenced jpeg objects
6. Steps 1-5 repeated for each of 10 jpeg objects
Response time modeling
Definition of RTT: time for a small packet to travel from client to server and back.
Response time:
◻ one RTT to initiate TCP connection
◻ one RTT for HTTP request and first few bytes of HTTP response to return
◻ file transmission time
total = 2RTT + transmit time
[Figure: client/server timeline showing initiate TCP connection (RTT), request file (RTT), time to transmit file, file received]
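A worked instance of the formula, as a small sketch. The numbers (100 ms RTT, 50 ms transmit time) are assumed values chosen only to make the arithmetic easy to follow.

```python
def response_time_ms(rtt_ms, transmit_ms):
    """Time to fetch one object over a new TCP connection: 2 RTT + transmit time."""
    tcp_setup = rtt_ms             # one RTT to initiate the TCP connection
    request_and_first_bytes = rtt_ms  # one RTT for request and start of response
    return tcp_setup + request_and_first_bytes + transmit_ms

# Assumed values, for illustration only.
total = response_time_ms(rtt_ms=100, transmit_ms=50)
print(total, "ms")
```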
Persistent HTTP
Nonpersistent HTTP issues:
◻ requires 2 RTTs per object
◻ OS must work and allocate host resources for each TCP connection
◻ but browsers often open parallel TCP connections to fetch referenced objects
Persistent HTTP
◻ server leaves connection open after sending response
◻ subsequent HTTP messages between same client/server are sent over connection
Persistent without pipelining:
◻ client issues new request only when previous response has been received
◻ one RTT for each referenced object
Persistent with pipelining:
◻ default in HTTP/1.1
◻ client sends requests as soon as it encounters a referenced object
◻ as little as one RTT for all the referenced objects
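The RTT counts for the three modes can be compared with a rough model. This sketch makes simplifying assumptions: transmission time is ignored, nonpersistent fetches are serial (no parallel connections), the base HTML file always costs 2 RTTs, and pipelining achieves its best case of one RTT for all referenced objects. The function names are illustrative.

```python
def rtts_nonpersistent(n_referenced):
    # Base HTML file plus each referenced object, one after another,
    # each on its own TCP connection: 2 RTTs per object.
    return 2 * (1 + n_referenced)

def rtts_persistent_no_pipelining(n_referenced):
    # 2 RTTs for the base HTML file (TCP setup + request/response),
    # then one RTT per referenced object on the same connection.
    return 2 + n_referenced

def rtts_persistent_pipelining(n_referenced):
    # 2 RTTs for the base HTML file, then as little as one RTT
    # for all referenced objects requested back to back.
    return 2 + (1 if n_referenced > 0 else 0)

n = 10  # the page with 10 jpeg images from the nonpersistent example
for label, fn in [
    ("nonpersistent", rtts_nonpersistent),
    ("persistent, no pipelining", rtts_persistent_no_pipelining),
    ("persistent, pipelining", rtts_persistent_pipelining),
]:
    print(f"{label}: {fn(n)} RTTs")
```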
HTTP request message
◻ two types of HTTP messages: request, response
◻ HTTP request message: ASCII (human-readable format)
request line (GET, POST, HEAD commands):
GET /somedir/[Link] HTTP/1.1
header lines:
Host: [Link]
User-agent: Mozilla/4.0
Connection: close
Accept-language: fr
(extra carriage return, line feed)
Carriage return, line feed indicates end of message
Anatomy of an HTTP GET request
Anatomy of an HTTP POST request
HTTP request message: general format
GET /somedir/[Link] HTTP/1.1
Host: [Link]
User-agent: Mozilla/4.0
Connection: close
Accept-language: fr
(extra carriage return, line feed)
Now let's look at the header lines in the example. The header line Host: [Link] specifies the host on which the
object resides. You might think that this header line is unnecessary, as there is already a TCP connection in place to the host. But,
as we'll see in Section 2.2.6, the information provided by the host header line is required by Web proxy caches. By including
the Connection: close header line, the browser is telling the server that it doesn't want to use persistent connections; it wants the
server to close the connection after sending the requested object. Thus the browser that generated this request message
implements HTTP/1.1 but it doesn't want to bother with persistent connections. The User-agent: header line specifies the user
agent, that is, the browser type that is making the request to the server. Here the user agent is Mozilla/4.0, a Netscape browser.
This header line is useful because the server can actually send different versions of the same object to different types of user
agents. (Each of the versions is addressed by the same URL.) Finally, the Accept-language: header indicates that the user prefers
to receive a French version of the object, if such an object exists on the server; otherwise, the server should send its default
version.
The Entity Body is not used with the GET method, but is used with the POST method. The HTTP client uses the POST method
when the user fills out a form.
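The structure of the request message (request line, header lines, blank line, optional entity body) can be made concrete with a small parser. This is a sketch, not a full HTTP parser; the path and host values are hypothetical placeholders, and `parse_request` is an illustrative helper name.

```python
# Example request; the path and host are hypothetical placeholders.
raw_request = (
    "GET /somedir/page.html HTTP/1.1\r\n"
    "Host: www.example.edu\r\n"
    "User-agent: Mozilla/4.0\r\n"
    "Connection: close\r\n"
    "Accept-language: fr\r\n"
    "\r\n"                      # extra CR LF: end of the header section
)

def parse_request(message):
    # Split the header section from the entity body at the blank line.
    head, _, body = message.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # Request line: method, path, protocol version.
    method, path, version = lines[0].split(" ")
    # Header lines: "Name: value"; header names are case-insensitive.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers, body

method, path, version, headers, body = parse_request(raw_request)
print(method, path, version)
print(headers)
```

For this GET request the entity body is empty; a POST carrying form data would have its content after the blank line.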
Method types
HTTP/1.0
◻ GET
◻ POST
◻ HEAD
asks server to leave requested object out of response
HTTP/1.1
◻ GET, POST, HEAD
◻ PUT
uploads file in entity body to path specified in URL field
◻ DELETE
deletes file specified in the URL field
HTTP response message
status line (protocol, status code, status phrase):
HTTP/1.1 200 OK
header lines:
Connection: close
Date: Thu, 06 Aug 1998 [Link] GMT
Server: Apache/1.3.0 (Unix)
Last-Modified: Mon, 22 Jun 1998 …...
Content-Length: 6821
Content-Type: text/html
data, e.g., requested HTML file:
data data data data data ...
HTTP response status codes
The status code appears in the first line of the server->client response message.
A few sample codes:
200 OK
request succeeded, requested object later in this message
301 Moved Permanently
requested object moved, new location specified later in this message
(Location:)
400 Bad Request
request message not understood by server
404 Not Found
requested document not found on this server
505 HTTP Version Not Supported
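The status line can be split into its three parts, and the sample codes looked up in Python's standard `http.HTTPStatus` table. A small sketch; `parse_status_line` is an illustrative helper name.

```python
from http import HTTPStatus

def parse_status_line(line):
    # "HTTP/1.1 200 OK" -> protocol version, numeric status code, status phrase.
    # maxsplit=2 keeps multi-word phrases such as "Not Found" intact.
    version, code, phrase = line.split(" ", 2)
    return version, int(code), phrase

version, code, phrase = parse_status_line("HTTP/1.1 200 OK")
print(version, code, phrase)

# Standard phrases for the sample codes listed above.
for sample in (200, 301, 400, 404, 505):
    print(sample, HTTPStatus(sample).phrase)
```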
User-Server Interaction: Authorization and Cookies
◻ HTTP server is stateless – simplifies server design
◻ Sometimes server needs to identify user
◻ Two mechanisms for identification:
1. Authorization & 2. Cookies
Authorization:
1) Provide username and password to access documents on server
2) Status code 401: Authorization Required
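One common way to send a username and password is the HTTP Basic scheme. That choice is an assumption here, since the slide does not name a scheme. The sketch below builds the request header a client would resend after receiving a 401, and shows how a server could decode it; the credentials and helper names are made up.

```python
import base64

def basic_auth_header(username, password):
    # Basic scheme: base64 of "username:password" (encoding, not encryption).
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Authorization: Basic " + token.decode("ascii")

def decode_basic_auth(header_line):
    # Server side: recover username and password from the header line.
    _name, _, value = header_line.partition(": ")
    scheme, _, token = value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic authorization header")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password

# Hypothetical credentials, for illustration only.
header = basic_auth_header("susan", "secret")
print(header)
print(decode_basic_auth(header))
```

Because HTTP is stateless, the client has to include this header on every request for a protected document.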
User-server state: cookies
Many major Web sites use cookies
Four components:
1) cookie header line in the HTTP response message
2) cookie header line in HTTP request message
3) cookie file kept on user’s host and managed by user’s browser
4) back-end database at Web site
Example:
◻ Susan always accesses the Internet from the same PC
◻ She visits a specific e-commerce site for the first time
◻ When the initial HTTP request arrives at the site, the site creates a unique ID and creates an entry in its backend database for the ID
Cookies: keeping “state” (cont.)
[Figure: message exchange between client and server]
◻ Client's cookie file initially holds ebay: 8734. Client sends usual http request msg.
◻ Server creates ID 1678 for user (entry in backend database) and replies with usual http response + Set-cookie: 1678
◻ Client's cookie file now holds amazon: 1678 and ebay: 8734. Next usual http request msg carries cookie: 1678; server takes cookie-specific action (accessing the backend database) and sends usual http response msg.
◻ one week later: usual http request msg again carries cookie: 1678; server again takes cookie-specific action (accessing the backend database) and sends usual http response msg.
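The exchange can be simulated with the four components as plain Python objects. This is a sketch under assumptions: the cookie is given the name `id` (the slide shows only the value 1678), the backend database is a dict, the visit counter stands in for the “cookie-specific action”, and `handle_request` is an illustrative function, not a real server API.

```python
from http.cookies import SimpleCookie
import itertools

backend_db = {}                       # component 4: back-end database (ID -> state)
_next_id = itertools.count(1678)      # start at 1678 to mirror the example

def handle_request(cookie_header):
    """Server side: return (Set-Cookie header line or None, user ID)."""
    if cookie_header:
        jar = SimpleCookie()
        jar.load(cookie_header)       # parse the request's cookie header value
        if "id" in jar and jar["id"].value in backend_db:
            user_id = jar["id"].value
            backend_db[user_id]["visits"] += 1   # cookie-specific action
            return None, user_id
    # First visit: create a unique ID and a database entry for it.
    user_id = str(next(_next_id))
    backend_db[user_id] = {"visits": 1}
    out = SimpleCookie()
    out["id"] = user_id
    return out.output(), user_id      # component 1: header line in the response

cookie_file = {}                      # component 3: kept by the browser

# First request: no cookie yet, so the server sets one.
set_cookie, uid = handle_request(None)
cookie_file["amazon"] = uid

# Later request: the browser sends the cookie back (component 2).
set_cookie_again, uid_again = handle_request("id=" + cookie_file["amazon"])

print(set_cookie)
print(backend_db)
```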
Cookies (continued)
What cookies can bring:
◻ authorization
◻ shopping carts
◻ recommendations
◻ user session state (Web e-mail)
aside: Cookies and privacy:
◻ cookies permit sites to learn a lot about you
◻ you may supply name and e-mail to sites
◻ search engines use redirection & cookies to learn yet more
◻ advertising companies obtain info across sites
Thank you