This article examines the structure of HTTP, the protocol that enables communication on the Web.
The Hypertext Transfer Protocol (HTTP) is a protocol for transferring messages over the Internet.
It specifies the structure and the types of the messages exchanged between peers.
In addition, RFC 2817 (Upgrading to TLS Within HTTP/1.1) defines how the protocol can be used with TLS encryption.
The second version of the protocol, HTTP/2, is specified in RFC 7540, and RFC 8740 covers the use of TLS 1.3 with HTTP/2.
Briefly, a client-server architecture separates responsibilities, creating specific limits for each end of the communication.
Each end performs only its own role, and the rest is transparent to it.
So what role does each end play in this pair? The client is responsible for requesting the web objects.
These requests are made through the browser. In this case, the client is concerned only with sending them as messages.
The client does not even need to know how the server will respond to its request.
However, the request messages must contain all the information necessary for them to be served by the server.
The web objects requested by the client are usually hosted on the server. However, in most cases, data is stored in a DBMS physically separate from the main server.
Once the server receives the request, a query is made to the DBMS, returning the requested objects. Afterward, the server will respond with the requested content.
HTTP itself does not handle packet loss, i.e., the server does not need to verify whether the client received its message.
Leaving loss handling out of HTTP lowers server overhead, which favors performance.
Furthermore, HTTP is a stateless protocol, meaning the server stores no state about its clients.
Because the server does not know the client’s state, repeated requests are not treated differently.
It will respond to all messages, even if they are redundant. However, there are cases where a persistent state must be maintained.
In those cases, cookies are used. We cover cookies in a separate article.
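The idea of a stateless server that recognizes a returning client only through a cookie can be sketched as follows. This is a minimal illustration, not how any particular server is implemented: the `handle` function, the `visitor_id` cookie name, the value `1678`, and the `X-Greeting` header are all hypothetical. Only `http.cookies.SimpleCookie` comes from the Python standard library.

```python
from http.cookies import SimpleCookie


def handle(request_headers: dict) -> dict:
    """Stateless handler: everything it knows comes from the request itself."""
    cookie = SimpleCookie(request_headers.get("Cookie", ""))
    if "visitor_id" in cookie:
        # The client sent the cookie back, so the server can recognize it.
        return {"X-Greeting": "welcome back " + cookie["visitor_id"].value}
    # No cookie: ask the client to store an identifier (hypothetical value).
    return {"Set-Cookie": "visitor_id=1678", "X-Greeting": "hello, new visitor"}


first = handle({})
again = handle({})  # a redundant request gets exactly the same answer
returning = handle({"Cookie": "visitor_id=1678"})

print(first["X-Greeting"])
print(returning["X-Greeting"])
```

The server keeps no variable between calls. The state lives in the header that the client sends back.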
Thus, HTTP communication has two types of messages: request and response.
This exchange of a request and its reply takes place over a transport layer protocol called TCP.
TCP (Transmission Control Protocol) is the protocol of the TCP/IP transport layer that handles packet loss.
Within a TCP transmission, information is partitioned into segments so that there is control over what has been sent.
This control allows the destination to recompose the information according to the sequence of packets.
Furthermore, in the event of packet loss on the path, the protocol can request the sender to resend the specific packet.
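The partitioning, reordering, and resending described above can be illustrated with a toy model. It is only a sketch of the idea and not real TCP, which relies on acknowledgements and timers handled by the operating system. The function names and the segment size of 5 bytes are assumptions made for the example.

```python
import random


def segment(data: bytes, size: int) -> list[tuple[int, bytes]]:
    """Split data into (sequence_number, chunk) pairs."""
    return [(offset, data[offset:offset + size]) for offset in range(0, len(data), size)]


def missing_offsets(received, total_len: int, size: int) -> list[int]:
    """Sequence numbers that never arrived and would need to be resent."""
    have = {seq for seq, _ in received}
    return [offset for offset in range(0, total_len, size) if offset not in have]


def reassemble(received) -> bytes:
    """Rebuild the data by sorting the chunks by sequence number."""
    return b"".join(chunk for _, chunk in sorted(received))


message = b"GET /index.html HTTP/1.1"
segments = segment(message, 5)

lost = segments[2]                       # pretend this segment is lost on the path
arrived = [s for s in segments if s is not lost]
random.shuffle(arrived)                  # segments may arrive out of order

to_resend = missing_offsets(arrived, len(message), 5)
arrived.append(lost)                     # simulated retransmission
rebuilt = reassemble(arrived)
print(to_resend, rebuilt == message)
```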
TCP communication modes
It is also worth noting that HTTP has two modes of communication: persistent and non-persistent.
In the case of the HTTP protocol, the type of communication used is determined when the client sends its request.
What does a non-persistent connection mean?
It means that the connection is closed as soon as a request has been answered, so each object needs a new connection.
With a persistent connection, in contrast, the connection is established once and closed only after all the page objects have been sent.
We will explore this protocol in a future article. For now, this explanation is enough for us to understand the dynamics of HTTP.
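To make the difference between the two modes concrete, the sketch below starts a small local server with Python's standard `http.server` module and counts how many TCP connections it sees when three objects are fetched. The handler class, the `fetch_all` helper, and the three paths are hypothetical names chosen for the illustration. It assumes the loopback interface is available.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

connections_opened = 0


class CountingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the connection stay open between requests

    def setup(self):
        # setup() runs once per TCP connection, not once per request.
        global connections_opened
        connections_opened += 1
        super().setup()

    def do_GET(self):
        body = b"object"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_all(port, paths, persistent):
    """Fetch every path and return how many TCP connections the server saw."""
    global connections_opened
    connections_opened = 0
    if persistent:
        conn = http.client.HTTPConnection("127.0.0.1", port)
        for path in paths:
            conn.request("GET", path)
            conn.getresponse().read()
        conn.close()
    else:
        for path in paths:
            conn = http.client.HTTPConnection("127.0.0.1", port)
            conn.request("GET", path, headers={"Connection": "close"})
            conn.getresponse().read()
            conn.close()
    return connections_opened


server = HTTPServer(("127.0.0.1", 0), CountingHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

paths = ["/index.html", "/logo.png", "/style.css"]
non_persistent_count = fetch_all(port, paths, persistent=False)
persistent_count = fetch_all(port, paths, persistent=True)

server.shutdown()
server.server_close()
print(non_persistent_count, persistent_count)
```

With three objects, the non-persistent run should open one connection per object, while the persistent run reuses a single one.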
HTTP Request Structure
We have defined the messages and how the protocol operates. OK, but how is an HTTP message organized? What is its structure?
HTTP request messages are formed by three fields: request line, header lines, and entity body.
HTTP Request Fields
The first field, the request line, is the first line of the message and contains the method used, the path, and the protocol version.
The method is accompanied by a path that indicates the target resource on the server. For example, the method used in most HTTP requests is GET.
We will see this in more detail later in this article, but in short, GET requests a web object from the server, for example, a page.
The second field, named header line, encompasses the information between the second and fourth lines of the example shown in Figure 2.
Observing each line, we see that the structure specifies the host address, that is, the server to which the request will be sent.
In addition, the message also defines the type of connection between the parties through the “connection” parameter.
This parameter is set to “close”, i.e., a non-persistent TCP connection.
The user agent and the user’s preferences are other information we find in this field. In our example, the request was made from a Mozilla browser in version 5.0.
In addition, several features can be configured according to the client’s preferences.
In our example, the “accept-language: fr” parameter states the user’s preferred language.
If the server does not have the requested option, it sends the default version of the resource.
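The three fields of a request can be assembled and taken apart with a few lines of code. The sketch below is illustrative only. The helper names `build_request` and `parse_request`, the host `www.example.com`, and the path `/somedir/page.html` are assumptions, while the header values mirror the ones discussed above.

```python
CRLF = "\r\n"


def build_request(method: str, path: str, headers: dict, body: str = "") -> str:
    """Assemble request line, header lines, a blank line, and the entity body."""
    request_line = f"{method} {path} HTTP/1.1"
    header_lines = [f"{name}: {value}" for name, value in headers.items()]
    return CRLF.join([request_line, *header_lines, "", body])


def parse_request(message: str) -> dict:
    """Split a request message back into its three fields."""
    head, _, body = message.partition(CRLF + CRLF)
    request_line, *header_lines = head.split(CRLF)
    method, path, version = request_line.split(" ")
    headers = dict(line.split(": ", 1) for line in header_lines)
    return {"method": method, "path": path, "version": version,
            "headers": headers, "body": body}


request = build_request("GET", "/somedir/page.html", {
    "Host": "www.example.com",      # hypothetical host
    "Connection": "close",          # non-persistent connection
    "User-agent": "Mozilla/5.0",
    "Accept-language": "fr",
})
parsed = parse_request(request)
print(request)
```

A GET request normally has an empty entity body, which is why the message ends right after the blank line.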
HTTP Methods – GET and POST
Before we continue to the structure of a response message, let us go back to the methods that exist in the HTTP protocol.
In this article, we will mention only some of them; readers who want to go deeper can read the RFCs cited here.
So far, we have talked briefly about the GET method. However, we still need to define what an HTTP method is.
An HTTP method informs the server which actions to perform for a particular resource.
The GET method, therefore, requests the server to send web objects for a specific item, such as a page.
Web objects are the elements that make up a page, such as text, images, video, and even the page itself.
Another widely used method is POST. Unlike GET, here the client is submitting information to the server.
The server, in turn, receives and processes this information, which is sent encapsulated in the message body field.
An intuitive example of the POST method would be filling out a form or registering on a shopping site.
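As a rough illustration of how form data travels in the body of a POST message, the sketch below encodes a form with `urllib.parse.urlencode`, places it after the blank line, and decodes it again as a server would. The `/register` path, the host, and the form fields are hypothetical. The URL-encoded content type is one common choice for forms, not the only one.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical form fields filled in by the user.
form = {"name": "Alice", "email": "alice@example.com"}
body = urlencode(form)

post_message = "\r\n".join([
    "POST /register HTTP/1.1",                      # hypothetical path
    "Host: www.example.com",                        # hypothetical host
    "Content-Type: application/x-www-form-urlencoded",
    f"Content-Length: {len(body.encode('ascii'))}",
    "Connection: close",
    "",                                             # blank line ends the headers
    body,                                           # entity body carries the form
])

# Server side: everything after the blank line is the submitted content.
received_body = post_message.split("\r\n\r\n", 1)[1]
fields = {key: values[0] for key, values in parse_qs(received_body).items()}
print(post_message)
```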
Access restriction with POST
Note that because the server has to process POST data, many sites place limitations on this method.
For example, a malicious user could use this field to send a Trojan or malicious code to be executed on the server.
Therefore, a site may reject a POST request made by an agent other than a browser, for example, a Java API, and return an error saying the request is not allowed.
To learn more about the methods mentioned above, see the HTTP Methods article.
HTTP Response Structure
So far, we have discussed HTTP request messages and their fields. However, what is the structure of an HTTP response message?
To answer a request, the server can perform a series of actions in the background, out of the user’s view.
One example is the queries run against the database to return the requested data.
The actions performed on the server vary according to the method used by the client and the server configuration.
Generally speaking, an HTTP message of the RESPONSE type has three fields: Status Line, Header Lines, and Entity Body.
These three fields are found in the figure below, but let us describe each.
HTTP Response Fields
The first item on our list is the Status Line. In this line, the server states the version of the HTTP protocol, which in our example is 1.1.
It also carries the status code of the message, which tells us whether the request was answered successfully (200 OK).
If there is an error, the message carries the corresponding status code instead.
We will go into more detail about the status code later in another article on the site.
Header lines provide message structure, configuration information, and metadata. We have already mentioned that HTTP uses TCP.
In this case, the server responds the same way as the client (in our previous example) with a non-persistent connection.
We also have information such as the date of the response and the sending server, in our case, an Apache 2.2.3 installed on a CentOS machine.
Finally, the size and type of content are information related to the data. Likewise, the Last-Modified field also refers to the data.
However, we may wonder, “Why send the last page modification in the message?” This information will be handy when dealing with caching.
We will set this subject aside for now and cover it in another article.
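The response fields described above can be read back with a short parser. This is a sketch under assumptions: the sample message is hand-written for the illustration, its dates and body are made up, and `parse_response` is a hypothetical helper. Only the server string, the protocol version, and the non-persistent connection follow the example in the text.

```python
from email.utils import parsedate_to_datetime

# Hand-written sample response; dates and body are illustrative values.
raw_response = "\r\n".join([
    "HTTP/1.1 200 OK",
    "Connection: close",
    "Date: Tue, 09 Aug 2011 15:44:04 GMT",
    "Server: Apache/2.2.3 (CentOS)",
    "Last-Modified: Tue, 09 Aug 2011 15:11:03 GMT",
    "Content-Length: 13",
    "Content-Type: text/html",
    "",
    "<html></html>",
])


def parse_response(message: str) -> dict:
    """Split a response into status line, header lines, and entity body."""
    head, _, body = message.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = dict(line.split(": ", 1) for line in header_lines)
    return {"version": version, "status": int(code), "reason": reason,
            "headers": headers, "body": body}


parsed = parse_response(raw_response)
# HTTP dates use the same format as e-mail dates, so the standard parser works.
last_modified = parsedate_to_datetime(parsed["headers"]["Last-Modified"])
print(parsed["status"], parsed["reason"], parsed["headers"]["Server"], last_modified)
```

A cache could later compare `last_modified` with the time it stored its copy to decide whether the copy is still current.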
How does an HTTP connection occur?
We already mentioned that the connection occurs via the TCP transport layer protocol in a non-persistent way.
Therefore, the user only needs to send a request to initiate the connection. The server will answer this request with the requested content.
An error message will be sent in case of an error or lack of content.
Let us cite two examples for the main methods: GET and POST. Suppose that user A (Alice) is browsing the internet and accessing the Amazon page for the first time.
Alice will be requesting the content of the page from the server. In this case, which method is used?
Alice’s request will be sent with the GET method set in the request message.
The packet containing the HTTP request has:
- the method and the target URL, e.g., GET;
- the destination host;
- the connection type, non-persistent (close);
- the user agent, in this case the Mozilla browser.
Upon receiving this message, the server will check the files, database, and other repositories to deliver the requested content.
Once this content is found, a response-type message is sent to the client.
The HTTP response returns the web objects in its entity body field. In our example, it will be the Amazon homepage.
However, we could use other pages to demonstrate this. Control fields like the header lines and status line (described earlier) define the metadata of this packet.
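The GET exchange can be reproduced locally by writing the request text straight onto a TCP socket. The sketch below is a stand-in for the real scenario: a small server built with Python's `http.server` plays the role of the website, and `HOMEPAGE`, `PageHandler`, and `raw_get` are hypothetical names. It assumes the loopback interface is available.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

HOMEPAGE = b"<html><body>home page</body></html>"  # stand-in for the real page


class PageHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        if self.path == "/":
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(HOMEPAGE)))
            self.end_headers()
            self.wfile.write(HOMEPAGE)
        else:
            self.send_error(404)  # lack of content: answer with an error message

    def log_message(self, *args):
        pass  # keep the demo quiet


def raw_get(port: int, path: str) -> bytes:
    """Send the request fields listed above and return the raw response bytes."""
    request = (
        f"GET {path} HTTP/1.1\r\n"
        "Host: localhost\r\n"
        "Connection: close\r\n"
        "User-Agent: Mozilla/5.0\r\n"
        "\r\n"
    )
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closed the non-persistent connection
                break
            chunks.append(data)
    return b"".join(chunks)


server = HTTPServer(("127.0.0.1", 0), PageHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

head, _, body = raw_get(port, "/").partition(b"\r\n\r\n")
status_line = head.split(b"\r\n")[0].decode("ascii")
missing_status = raw_get(port, "/missing").split(b"\r\n")[0].decode("ascii")

server.shutdown()
server.server_close()
print(status_line)
print(missing_status)
```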
The POST method, in contrast, submits to the server some content contained in the entity body field. The server then handles and processes that content.
Suppose this time Alice fills out a registration form on the Amazon website. Thus, the user (Alice) sends a packet containing an HTTP request with the content of her form.
The fields vary according to the form. In any case, a set of information is sent to the server.
Upon receiving this message, the server will check its files, database, and other repositories in order to process it, and a response-type message is then sent to the client.
On identifying a request with the POST method, the server handles the content of the entity body.
A check and subsequent update of the database will be carried out. In this case, the POST request caused a succession of queries in the back-end database.
If the update succeeds, an HTTP response message will be sent with fields like those in the previous example (GET method).
Otherwise, the client will still receive an answer. However, we can identify the type of error that occurred by the message’s status code.
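The POST flow can be sketched in the same way. In this illustration a dictionary stands in for the back-end database, and the server answers 200 when the form is complete and 400 when a field is missing. The choice of 400, the `/register` path, and the names `RegisterHandler`, `registered_users`, and `submit_form` are assumptions made for the example, not something prescribed by the scenario above.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlencode

registered_users = {}  # stands in for the back-end database


class RegisterHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # The form arrives in the entity body; Content-Length gives its size.
        length = int(self.headers.get("Content-Length", 0))
        fields = parse_qs(self.rfile.read(length).decode("utf-8"))
        name = fields.get("name", [""])[0]
        email = fields.get("email", [""])[0]
        if name and email:
            registered_users[email] = name   # the "database update"
            self._reply(200, b"registered")
        else:
            self._reply(400, b"missing fields")

    def _reply(self, status, body):
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def submit_form(port, form):
    """POST the form and return (status code, response body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request("POST", "/register", body=urlencode(form), headers={
        "Content-Type": "application/x-www-form-urlencoded",
        "Connection": "close",
    })
    response = conn.getresponse()
    result = (response.status, response.read().decode("utf-8"))
    conn.close()
    return result


server = HTTPServer(("127.0.0.1", 0), RegisterHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

ok_result = submit_form(port, {"name": "Alice", "email": "alice@example.com"})
bad_result = submit_form(port, {"name": "Alice"})  # incomplete form

server.shutdown()
server.server_close()
print(ok_result, bad_result, registered_users)
```

In both cases the client receives a response. Only the status code tells it whether the registration went through.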