HTTP Protocol

This article examines the structure of HTTP, the protocol that enables communication on the Web.

The Hypertext Transfer Protocol (HTTP) is an application-layer protocol for transferring messages over the Internet.

HTTP defines the structure and the types of the messages exchanged between the two ends of a communication.

HTTP follows a client-server architecture. Version 1.0 is defined in RFC 1945, and version 1.1 was originally defined in RFC 2616. RFC 2616 was later obsoleted by a set of newer documents, RFCs 7230 to 7235, which include RFC 7231.

In addition, RFC 2817 (Upgrading to TLS Within HTTP/1.1) describes how an HTTP/1.1 connection can be upgraded to use TLS encryption.

The second version of the protocol, HTTP/2, is specified in RFC 7540, and RFC 8740 covers the use of TLS 1.3 with HTTP/2.

In short, a client-server architecture separates responsibilities, setting clear limits on what each end of the communication does.

Each end performs only its own role, and everything else is transparent to it.

Client/Server Architecture

So what role does each end play in this pair? The client is responsible for requesting web objects.

These requests are typically made through a browser. The client is concerned only with sending them as messages.

The client does not need to know how the server will produce the response to its request.

However, the request messages must contain all the information the server needs to fulfill them.

The web objects requested by the client are usually hosted on the server. However, in most cases, data is stored in a DBMS physically separate from the main server.

Once the server receives the request, a query is made to the DBMS, returning the requested objects. Afterward, the server will respond with the requested content.

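HTTP itself does no packet loss handling, i.e., the server does not need to verify at the HTTP level whether its message was received. That task is left to TCP, the transport protocol underneath.

Leaving loss handling out of HTTP keeps the protocol simple and reduces the work the server does per message.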

Furthermore, HTTP is a stateless protocol: the server keeps no information about previous requests.

As a result, repeated requests receive no special treatment, because the server does not know the client’s state.

It responds to every message, even a redundant one. However, there are cases in which state must persist across requests.

In these cases, cookies are used. They are covered in a separate article.
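As a rough illustration of how a cookie lets the client carry state that the stateless server does not keep, the sketch below uses Python's standard http.cookies module. The cookie name session_id and its value are hypothetical, chosen only for this example.

```python
from http.cookies import SimpleCookie


def make_set_cookie(session_id: str) -> str:
    """Build the header line a server could send to hand the client an identifier."""
    jar = SimpleCookie()
    jar["session_id"] = session_id  # "session_id" is a hypothetical cookie name
    return jar.output(header="Set-Cookie:")


def read_session_id(cookie_header: str):
    """Extract the identifier from the Cookie header of a later request, if present."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    morsel = jar.get("session_id")
    return morsel.value if morsel is not None else None
```

The server sends the identifier once, and the client repeats it in each later request, so the server can recognize the client without storing connection state in the protocol itself.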

TCP Overview

HTTP communication therefore involves two types of messages: requests and responses.

This exchange of requests and responses takes place over a transport-layer protocol called TCP.

TCP (Transmission Control Protocol) is the protocol of the TCP/IP transport layer that provides reliable delivery, including recovery from packet loss.

Within a TCP transmission, the data is split into numbered segments so that the protocol can keep track of what has been sent.

This control allows the destination to reassemble the data according to the sequence of the packets.

Furthermore, if a packet is lost on the path, the protocol detects the gap and the sender retransmits that specific packet.
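The idea of numbering the pieces, reordering them at the destination, and spotting a gap can be sketched with a toy model. This is not how TCP is implemented (the operating system does that, with acknowledgments, timers, and windows); the function names and the use of the byte offset as a sequence number are assumptions made for illustration.

```python
def split_into_segments(data: bytes, size: int):
    """Split data into (sequence_number, chunk) pairs; the number is the byte offset."""
    return [(offset, data[offset:offset + size]) for offset in range(0, len(data), size)]


def reassemble(segments):
    """Rebuild the byte stream from segments that may have arrived out of order."""
    return b"".join(chunk for _, chunk in sorted(segments))


def missing_offsets(segments, total_length: int, size: int):
    """List the offsets of segments that never arrived and would need retransmission."""
    received = {offset for offset, _ in segments}
    return [offset for offset in range(0, total_length, size) if offset not in received]
```

For example, reversing the list returned by split_into_segments and passing it to reassemble yields the original bytes, and dropping one segment makes its offset appear in missing_offsets.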

TCP communication modes

It is also worth noting that HTTP has two modes of communication: persistent and non-persistent.

In the case of the HTTP protocol, the type of communication used is determined when the client sends its request.

What does a non-persistent connection mean?

It means that the connection is closed after each request and its response, so every object requires a new connection.

With persistent communication, by contrast, the connection is established once, kept open while all the page objects are sent, and closed only afterward.

We will explore this protocol in a future article. For now, this explanation is enough for us to understand the dynamics of HTTP.
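A small sketch of the difference between the two modes follows. It relies on one piece of background that is not stated above: HTTP/1.1 treats connections as persistent unless the Connection header says "close", while HTTP/1.0 needs "keep-alive" to request persistence. The function names are hypothetical, and the connection count ignores the parallel connections that real browsers open.

```python
def is_persistent(http_version: str, headers: dict) -> bool:
    """Decide the connection mode from the protocol version and the Connection header."""
    connection = headers.get("Connection", "").lower()
    if connection == "close":
        return False
    if connection == "keep-alive":
        return True
    return http_version == "HTTP/1.1"  # HTTP/1.1 defaults to persistent


def tcp_connections_needed(num_objects: int, persistent: bool) -> int:
    """Count sequential TCP connections needed to fetch a page with num_objects objects."""
    if num_objects == 0:
        return 0
    return 1 if persistent else num_objects
```

Under these assumptions, a page with one HTML file and nine images needs ten connections in non-persistent mode and a single connection in persistent mode.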

HTTP Request Structure

We have defined the message types and how the protocol operates. But how is an HTTP message organized? What is its structure?

HTTP request messages are formed by three fields: the request line, the header lines, and the entity body.

HTTP Request Fields

The first field, the request line, is the first line of the message. It contains the method, the path of the requested resource, and the protocol version.

The method is accompanied by a path that identifies the target resource on the server. The method used in most HTTP requests is GET.

We will look at methods in more detail later in this article, but in short, GET requests a web object from the server, for example, a page.

The second field, the header lines, covers the second through fourth lines of the example request shown in the figure.

Looking at each line, we see that the Host header gives the host address, that is, the server to which the request is sent.

In addition, the message defines the type of connection between the parties through the Connection header.

In the example, this header is set to “close”, i.e., a non-persistent TCP connection.

This field also carries information about the user agent and the user’s preferences. In our example, the User-Agent header identifies the client as Mozilla/5.0.

Several other preferences can be configured on the client side.

In our example, the “Accept-language: fr” header states the user’s preferred language, French.

If the server does not have the requested option, it sends its default version of the resource.
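The layout described above (request line, header lines, blank line, entity body) can be sketched by building and parsing a request as plain text. The host www.example.com and the path /index.html are placeholders, not values taken from the article's figure, and the parser is a minimal sketch that assumes well-formed input.

```python
def build_request(method: str, path: str, headers: dict, body: str = "") -> str:
    """Assemble request line + header lines + blank line + entity body."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return "\r\n".join(lines) + "\r\n\r\n" + body


def parse_request(raw: str) -> dict:
    """Split a well-formed request back into its three fields."""
    head, _, body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    method, path, version = request_line.split(" ", 2)
    headers = dict(line.split(": ", 1) for line in header_lines)
    return {"method": method, "path": path, "version": version,
            "headers": headers, "body": body}


example = build_request("GET", "/index.html", {
    "Host": "www.example.com",   # placeholder host
    "Connection": "close",       # non-persistent connection
    "User-Agent": "Mozilla/5.0",
    "Accept-language": "fr",
})
```

Printing example shows one item per line, with the empty line marking the end of the headers; a GET request normally has an empty entity body.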

HTTP Methods – GET and POST

Before we continue with the structure of a response message, let us go back to the methods available in the HTTP protocol.

This article covers only some of them. Readers who want to go deeper can consult the RFCs cited here.

So far, we have talked briefly about the GET method. However, we still need to define what an HTTP method is.

An HTTP method informs the server which actions to perform for a particular resource.

The GET method, therefore, requests the server to send web objects for a specific item, such as a page.

Web objects are the elements that make up a page, such as text, images, video, and even the page itself.

Another widespread method is POST. Unlike GET, it is used when the client submits information to the server.

The server, in turn, receives and processes this information, which is sent in the entity body of the message.

An intuitive example of the POST method would be filling out a form or registering on a shopping site.
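The contrast between the two methods can be seen without sending anything over the network. The sketch below only constructs request objects with Python's urllib.request; the URLs and form fields are hypothetical. In this library, supplying data is what turns the request into a POST.

```python
from urllib.parse import urlencode
from urllib.request import Request

# GET: no body, the client only asks for a resource (placeholder URL).
get_request = Request("http://www.example.com/index.html")

# POST: the form fields travel in the message body (hypothetical fields).
form = {"name": "Alice", "email": "alice@example.com"}
post_request = Request(
    "http://www.example.com/register",
    data=urlencode(form).encode("ascii"),
)

print(get_request.get_method())   # GET
print(post_request.get_method())  # POST
print(post_request.data)          # the encoded form, as bytes
```

Passing either object to urllib.request.urlopen would actually send it, which is deliberately left out here.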

Access restriction with POST

Note that because the server must process the submitted data, many sites place restrictions on this method.

For example, a malicious user could use the message body to send a Trojan or other malicious code in the hope that it will be executed on the server.

Therefore, a POST request made by an agent other than a browser, for example, a Java API, may be rejected by such a site with an error status instead of being processed.

To learn more about the methods mentioned above, see the HTTP Methods article.

HTTP Response Structure

So far, we have discussed HTTP request messages and their fields. What, then, is the structure of an HTTP response message?

In response to a request, the server can perform a series of actions in the background, invisible to the user.

One example is the database queries run to return the requested data.

The actions performed on the server vary according to the method used by the client and the server configuration.

Generally speaking, an HTTP message of the RESPONSE type has three fields: Status Line, Header Lines, and Entity Body.

These three fields are found in the figure below, but let us describe each.

HTTP Response Fields

The first item on our list is the status line. In this line, the server states the version of the HTTP protocol, which in our example is 1.1.

It also carries the status code of the message, which tells us whether the request was answered successfully (200 OK).

If an error occurred, the message carries the corresponding error status code instead.

We will go into more detail about the status code later in another article on the site.
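As a brief preview, Python's standard http.HTTPStatus enumeration maps codes to their reason phrases, and the first digit of a code gives its general class. The helper name describe_status is made up for this sketch.

```python
from http import HTTPStatus

CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe_status(code: int) -> str:
    """Return the code, its standard reason phrase, and its class."""
    status = HTTPStatus(code)  # raises ValueError for a code the enum does not know
    return f"{status.value} {status.phrase} ({CATEGORIES[code // 100]})"
```

For instance, describe_status(200) returns "200 OK (success)" and describe_status(404) returns "404 Not Found (client error)".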

Header lines provide configuration information and metadata about the message. We have already mentioned that HTTP uses TCP.

In this case, the server responds the same way as the client in our previous example, with a non-persistent connection.

We also have the date of the response and the sending server, in our case, Apache 2.2.3 installed on a CentOS machine.

Finally, the size and type of the content describe the data being returned. Likewise, the Last-Modified field refers to that data.

We may wonder, “Why send the date of the page’s last modification in the message?” This information is useful when dealing with caching.

We set this subject aside for now. It will be covered in another article.

Figure 2 – Example HTTP Response message structure
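The three fields can also be picked apart in code. The sample response below is written for this sketch and is not a transcription of the figure: the body and the Content-Length value are assumptions, while the Server value mirrors the Apache 2.2.3 on CentOS mentioned in the text.

```python
SAMPLE_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Connection: close\r\n"
    "Server: Apache/2.2.3 (CentOS)\r\n"
    "Content-Length: 13\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<p>Hello</p>\n"
)


def parse_response(raw: str) -> dict:
    """Split a well-formed response into status line, header lines, and entity body."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = dict(line.split(": ", 1) for line in header_lines)
    return {"version": version, "status": int(code), "reason": reason,
            "headers": headers, "body": body}
```

Calling parse_response(SAMPLE_RESPONSE) gives the version and status code from the status line, a dictionary of headers, and the body, whose length matches the Content-Length header.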

How does an HTTP connection occur?

We already mentioned that the connection occurs via the TCP transport layer protocol in a non-persistent way.

Therefore, the user only needs to send a request to initiate the connection. The server will answer this request with the requested content.

An error message will be sent in case of an error or lack of content.
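The exchange described above can be tried on a single machine. The sketch below starts a tiny server from Python's standard http.server module on the loopback address, sends it one GET request with http.client, and shuts it down. The handler, its page content, and the helper name fetch are all invented for this example; it stands in for a real web server only in the loosest sense.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve one known page; anything else gets an error status.
        if self.path == "/":
            status, body = 200, b"<h1>Hello</h1>"
        else:
            status, body = 404, b"not found"
        self.send_response(status)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.send_header("Connection", "close")  # non-persistent
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch(path: str):
    """Start the server, send one GET over a new TCP connection, return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: let the OS pick
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        conn.request("GET", path, headers={"Connection": "close"})
        response = conn.getresponse()
        result = (response.status, response.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()
```

Requesting "/" returns the page with status 200, while any other path returns status 404, which is the error message case mentioned above.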

GET Example

Let us look at one example for each of the two main methods, GET and POST. Suppose user A (Alice) is browsing the Internet and accesses the Amazon page for the first time.

Alice is requesting the content of the page from the server. Which method is used in this case?

Alice’s request is sent as a request message that uses the GET method.

The packet containing the HTTP request carries:

the method and the target URL, e.g., GET https://www.amazon.com;
the destination host;
the connection type, here non-persistent (close);
the user agent, here the Mozilla browser.

Upon receiving this message, the server checks its files, database, and other repositories to locate the requested content.

Once this content is found, a response message is sent to the client.

The HTTP response returns the web objects in its entity body. In our example, that is the Amazon homepage.

Any other page would serve equally well for this demonstration. The control fields described earlier, the status line and the header lines, carry the metadata of this packet.

Exemplifying a request with HTTP GET method

POST Example

The POST method, in contrast, submits content to the server in the entity body field. The server then processes that content.

Suppose this time Alice fills out a registration form on the Amazon website. She sends a packet containing an HTTP request with the content of the form.

The fields vary from form to form, but in every case a set of information is sent to the server.

Upon identifying a request with the POST method, the server processes the content of the entity body.

It checks the data and then updates the database. In this case, the POST request triggers a succession of queries in the back-end database.

If the update succeeds, an HTTP response message is sent with the same structure as in the previous example (GET method).

Otherwise, the client still receives a response, and the status code of the message identifies the type of error that occurred.

Example of a request with HTTP POST method
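The server-side steps for the POST case (read the entity body, check the data, update the database, choose a status code) can be sketched as a single function. Everything here is hypothetical: the required fields, the dictionary standing in for the database, and the choice of 400 and 409 as error codes are assumptions for illustration, not a description of how any real site works.

```python
from urllib.parse import parse_qs

REQUIRED_FIELDS = ("name", "email")  # hypothetical form fields


def handle_registration(entity_body: str, database: dict):
    """Process a form-encoded POST body and return (status_code, reason)."""
    form = {key: values[0] for key, values in parse_qs(entity_body).items()}

    # Check: all required fields must be present.
    if any(field not in form for field in REQUIRED_FIELDS):
        return 400, "Bad Request"

    # Check: the e-mail must not be registered already.
    if form["email"] in database:
        return 409, "Conflict"

    # Update: store the new user in the stand-in database.
    database[form["email"]] = form["name"]
    return 200, "OK"
```

A first call with "name=Alice&email=alice%40example.com" stores the entry and returns 200, a repeat of the same call returns 409, and a body without the e-mail field returns 400.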



Juliana Mascarenhas

Data Scientist and Master in Computer Modeling by LNCC.
Computer Engineer
