Basics of computer networks

While studying computer networks, I gathered the most useful points and put them here.

What is a three-way handshake?

-First handshake: The client sets SYN to 1, randomly generates an initial sequence number seq, sends the segment to the server, and enters the SYN_SENT state.

-Second handshake: On receiving the segment with SYN=1, the server knows that the client is requesting a connection. It sets SYN to 1 and ACK to 1, sets the acknowledgment number to the client's sequence number + 1, randomly generates its own initial sequence number, sends the segment to the client, and enters the SYN_RCVD state.

-Third handshake: The client checks that ACK is 1 and that the acknowledgment number equals its own sequence number + 1. If so, it sets ACK to 1, sets the acknowledgment number to the server's sequence number + 1, sends the segment to the server, and enters the ESTABLISHED state. The server checks that ACK is 1 and that the acknowledgment number equals its own sequence number + 1, then also enters the ESTABLISHED state. The three-way handshake is complete and the connection is established.
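The numbers exchanged in the three segments can be sketched as follows. This is only an illustrative model, not a real TCP API: the function name `three_way_handshake` and the dict layout are assumptions made for the example.

```python
import random

def three_way_handshake(client_isn=None, server_isn=None):
    """Return the three handshake segments as dicts (illustrative model only)."""
    # Each side picks a random 32-bit initial sequence number.
    if client_isn is None:
        client_isn = random.getrandbits(32)
    if server_isn is None:
        server_isn = random.getrandbits(32)
    mod = 2 ** 32  # sequence numbers wrap around at 2^32

    # 1) Client -> server: SYN with the client's initial sequence number.
    syn = {"flags": {"SYN"}, "seq": client_isn}
    # 2) Server -> client: SYN+ACK, acknowledging client_isn + 1.
    syn_ack = {"flags": {"SYN", "ACK"}, "seq": server_isn,
               "ack": (syn["seq"] + 1) % mod}
    # 3) Client -> server: ACK, acknowledging server_isn + 1.
    ack = {"flags": {"ACK"}, "seq": syn_ack["ack"],
           "ack": (syn_ack["seq"] + 1) % mod}
    return [syn, syn_ack, ack]

segments = three_way_handshake(client_isn=100, server_isn=300)
for seg in segments:
    print(sorted(seg["flags"]), seg["seq"], seg.get("ack"))
```

With the fixed values 100 and 300, the second segment acknowledges 101 and the third acknowledges 301.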

Can TCP establish a connection with only a two-way handshake, and why?

No. There are two reasons:

First, a stale connection request segment may reach the server after it is no longer valid.

Suppose the first connection request segment sent by the client was not lost but was held up at some network node for a long time, so that it reached the server only after the connection had already been released. This segment expired long ago, but when the server receives it, the server mistakes it for a new connection request from the client and sends an acknowledgment agreeing to establish a connection. Without the three-way handshake, the new connection would be established as soon as the server sent that acknowledgment. Since the client did not request a connection, it ignores the server's acknowledgment and sends no data. The server, however, believes a new transport connection exists and keeps waiting for data, so server resources are wasted. The three-way handshake prevents this: in the case above, the client does not acknowledge the server's acknowledgment, and because the server never receives that final acknowledgment, it knows that the client did not request a connection.

Second, a two-way handshake cannot guarantee that the client correctly received the second segment (the server cannot confirm whether the client received it), nor can it guarantee that both sides have successfully exchanged initial sequence numbers.

Can a four-way handshake be used, and why?

Yes, but it reduces transmission efficiency.

A four-way handshake would work as follows: in the second handshake the server sends only ACK and the acknowledgment number; the server's SYN and initial sequence number are sent in the third handshake; and the third handshake of the original protocol becomes the fourth. As an optimization, steps two and three can be combined, which gives the three-way handshake.

In the third handshake, what happens if the client's ACK is not delivered to the server?

Server side: Because the server has not received the ACK, it retransmits its SYN+ACK (five times by default; after that it closes the connection and enters the CLOSED state). When the client receives a retransmitted SYN+ACK, it resends the ACK.

Client side, there are two cases:

During the server's retransmission period, if the client sends data, the data segment has ACK set to 1, so the server reads the acknowledgment number on receiving it and enters the ESTABLISHED state.
After the server has entered the CLOSED state, if the client sends data, the server replies with an RST segment.

What happens if the connection has been established but the client fails?

The server resets a keepalive timer every time it receives data from the client. The timer is usually set to 2 hours. If no data arrives from the client for 2 hours, the server sends a probe segment, and then one more every 75 seconds. If there is still no response after 10 probes, the server concludes that the client has failed and closes the connection.

What is the initial sequence number?

One side A of a TCP connection randomly selects a 32-bit sequence number as the initial sequence number (ISN) for the data it sends, for example 1000, and numbers the data bytes starting from it: 1001, 1002, ... During the three-way handshake, the ISN is sent to the other side B, so that during data transfer B can tell which sequence numbers are valid, and A can confirm every byte that B has received. If A receives an acknowledgment number of 2001 from B, bytes 1001 to 2000 have been successfully received by B.
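The arithmetic in this example can be written out as a small helper. The name `acked_range` is hypothetical, and the sketch ignores 32-bit wraparound for simplicity.

```python
def acked_range(isn, ack_number):
    """Return (first, last) byte numbers confirmed by ack_number, or None.

    Assumes no sequence-number wraparound (a simplification).
    """
    first = isn + 1          # the first data byte follows the ISN
    last = ack_number - 1    # the acknowledgment names the next byte expected
    if last < first:
        return None          # nothing beyond the SYN has been acknowledged yet
    return first, last

print(acked_range(1000, 2001))  # (1001, 2000)
```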

What are the four waves (connection termination)?

-First wave: The client sets FIN to 1, sends a segment with sequence number seq to the server, and enters the FIN_WAIT_1 state.

-Second wave: After receiving the FIN, the server sends a segment with ACK=1 and acknowledgment number = received sequence number + 1, and enters the CLOSE_WAIT state. From this point the client has no more data to send, but it can still receive data from the server. (On receiving this ACK the client moves to FIN_WAIT_2.)

-Third wave: The server sets FIN to 1, sends a segment with its sequence number to the client, and enters the LAST_ACK state.

-Fourth wave: After receiving the server's FIN, the client enters the TIME_WAIT state, sets ACK to 1, and sends acknowledgment number = received sequence number + 1 to the server. After checking the acknowledgment number, the server enters the CLOSED state and no longer sends data to the client. The client also enters the CLOSED state after waiting 2*MSL (MSL is the maximum segment lifetime). The four waves are complete.
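The state changes on both sides can be sketched as two lookup tables. The event labels and the `run` helper are made up for this illustration; FIN_WAIT_2 is the standard TCP state the closing side occupies between receiving the ACK and receiving the peer's FIN.

```python
# (state, event) -> next state for the side that closes first ("client").
CLIENT = {
    ("ESTABLISHED", "send FIN"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv FIN, send ACK"): "TIME_WAIT",
    ("TIME_WAIT", "2*MSL timeout"): "CLOSED",
}
# Same for the side that closes second ("server").
SERVER = {
    ("ESTABLISHED", "recv FIN, send ACK"): "CLOSE_WAIT",
    ("CLOSE_WAIT", "send FIN"): "LAST_ACK",
    ("LAST_ACK", "recv ACK"): "CLOSED",
}

def run(table, events, state="ESTABLISHED"):
    """Apply events in order and return the list of states visited."""
    path = [state]
    for event in events:
        state = table[(state, event)]
        path.append(state)
    return path

client_path = run(CLIENT, ["send FIN", "recv ACK", "recv FIN, send ACK", "2*MSL timeout"])
server_path = run(SERVER, ["recv FIN, send ACK", "send FIN", "recv ACK"])
print(" -> ".join(client_path))
print(" -> ".join(server_path))
```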

Why can't the ACK and FIN sent by the server be combined into three waves (what is the meaning of the CLOSE_WAIT state)?

Because when the server receives the client's request to disconnect, it may still have data to send. It therefore replies with an ACK first to indicate that it has received the request, and sends its FIN only after it has finished sending its data, which closes the server-to-client direction.

What happens if the server's ACK is not delivered to the client during the second wave?

The client does not receive the ACK confirmation and will resend the FIN request.

What is the meaning of the client TIME_WAIT state?

In the fourth wave, the ACK sent by the client may be lost, and the TIME_WAIT state lets the client resend it. If the server does not receive the ACK, it retransmits the FIN. If the client receives that FIN within 2*MSL, it resends the ACK and waits another 2*MSL, so that the server does not keep retransmitting the FIN without ever receiving an ACK.

MSL (Maximum Segment Lifetime) is the longest time a segment can survive in the network, so 2*MSL is the maximum time needed for one transmission plus one reply. If the client receives no further FIN within 2*MSL, it concludes that the ACK was received successfully and closes the TCP connection.

How does TCP realize flow control?

TCP uses the sliding window protocol for flow control, which prevents the sender from sending so fast that the receiver's buffer overflows. The receiver maintains a receive window (measured in bytes) whose size it adjusts dynamically according to its available resources. When it returns an ACK, it places the current receive window size in the window field of the TCP header and sends it to the sender. The sending window cannot exceed the receive window, and the sending window slides to the right only after the sender has sent data and received its acknowledgment.

The upper limit of the sending window is the smaller of the receive window and the congestion window. The receive window reflects the receiver's capacity, and the congestion window reflects the network's capacity.
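As a rough sketch of that rule, the amount a sender may still transmit can be computed like this. The function name `usable_window` and the byte counts are illustrative assumptions.

```python
def usable_window(rwnd, cwnd, bytes_in_flight):
    """Bytes the sender may still transmit right now.

    rwnd: receive window advertised by the receiver
    cwnd: congestion window maintained by the sender
    bytes_in_flight: bytes sent but not yet acknowledged
    """
    send_window = min(rwnd, cwnd)   # upper limit of the sending window
    return max(0, send_window - bytes_in_flight)

print(usable_window(rwnd=8000, cwnd=4000, bytes_in_flight=1500))  # 2500
print(usable_window(rwnd=0, cwnd=4000, bytes_in_flight=0))        # 0
```

The second call shows the case where the receiver advertises a window of 0 and the sender has to stop.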

What is a zero window (what happens when the receiving window is 0)?

If the receiver cannot accept any more data, it sets the receive window to 0. The sender must then pause sending, but it starts a persist timer and, when the timer expires, sends a 1-byte probe segment to check the state of the receive window. If the receiver can accept data again, it reports the new window size in its reply and data transmission resumes.

How is TCP congestion control implemented?

Congestion control is mainly composed of four algorithms: Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery.

Slow start: When the sender starts sending data, it first sets the congestion window to one maximum segment size (MSS), and adds 1 MSS to the congestion window each time a new acknowledgment is received. As a result, the congestion window doubles in every transmission round (that is, every round-trip time RTT).
Congestion avoidance: When the congestion window reaches the slow start threshold, the congestion avoidance algorithm takes over. The congestion window no longer grows exponentially but linearly: it increases by only 1 MSS per transmission round.

Whether in the slow start phase or in the congestion avoidance phase, as soon as the sender judges that the network is congested (because an acknowledgment did not arrive in time), it sets the slow start threshold ssthresh to half of the sender window at the moment congestion occurred (but not less than 2 MSS), resets the congestion window cwnd to 1 MSS, and runs slow start again. (This is the behavior when fast retransmit is not used.)

Fast retransmit: The receiver must send a duplicate acknowledgment immediately after receiving an out-of-order segment (so that the sender learns early that a segment has not arrived), rather than waiting until it has data of its own to send and piggybacking the acknowledgment. As soon as the sender receives three duplicate acknowledgments in a row, it retransmits the segment the other side has not received, without waiting for the retransmission timer to expire.
Fast recovery: When the sender receives three duplicate acknowledgments in a row, it halves the slow start threshold, sets cwnd to the new threshold, and then runs congestion avoidance. Slow start is not used here because the sender would not be receiving several duplicate acknowledgments if the network were badly congested, so it assumes the network is probably not congested now.

Some fast recovery implementations make cwnd a little larger at the start, setting it to ssthresh + 3*MSS. The reasoning is that three duplicate acknowledgments show that three segments have left the network. Those segments no longer consume network resources but sit in the receiver's buffer, so there are three fewer segments in the network and the congestion window can be enlarged accordingly.
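The interplay of these rules can be sketched with a round-by-round simulation. This is a simplified model, not a real TCP stack: cwnd is counted in MSS units, one loop iteration stands for one RTT, and fast recovery uses the plain variant (cwnd = ssthresh, without the extra 3*MSS). The function name and the `events` parameter are assumptions for the example.

```python
def simulate_cwnd(rounds, ssthresh=16, events=None):
    """Return cwnd (in MSS) at the start of each transmission round.

    events maps a round index to "timeout" or "3dupack".
    """
    events = events or {}
    cwnd = 1
    history = []
    for r in range(rounds):
        history.append(cwnd)
        event = events.get(r)
        if event == "timeout":
            ssthresh = max(cwnd // 2, 2)   # halve the threshold, floor of 2
            cwnd = 1                        # restart with slow start
        elif event == "3dupack":
            ssthresh = max(cwnd // 2, 2)
            cwnd = ssthresh                 # fast recovery: skip slow start
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start: double per round
        else:
            cwnd += 1                       # congestion avoidance: +1 MSS per round
    return history

print(simulate_cwnd(12, ssthresh=8, events={6: "3dupack", 9: "timeout"}))
# [1, 2, 4, 8, 9, 10, 11, 5, 6, 7, 1, 2]
```

In the printed trace, the window grows exponentially to 8, then linearly to 11, drops to 5 after three duplicate acknowledgments, and falls back to 1 after the timeout.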

How does TCP make maximum use of bandwidth?

The TCP rate is affected by three factors:

-Window: the size of the sliding window; see "How does TCP realize flow control?" above.

-Bandwidth: Bandwidth here means the highest data rate that can pass from the sender to the receiver per unit time, which is a hardware limit. The data rate between a TCP sender and receiver cannot exceed the bandwidth between the two points. That bandwidth is the minimum bandwidth of the links along the path (for example, when the two are connected via the Internet).

-RTT: Round Trip Time, the time for data to travel from the sender to the receiver plus the time for the acknowledgment to return. TCP samples the RTT during data transmission (it measures the time between sending a segment and receiving its ACK, and updates the RTT estimate from the measurement). From the RTT estimate TCP derives the RTO (Retransmission TimeOut), the retransmission interval. The sender starts a timer for each segment it sends; if the corresponding ACK is not received within the RTO, the segment is considered lost and is retransmitted. The RTO is generally larger than the sampled RTT.
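How the RTO follows the RTT samples is not spelled out here. One common scheme is the smoothed estimator from RFC 6298, sketched below as an assumption about what a stack might do. The class name `RtoEstimator` is made up, and the clock-granularity term of the RFC is omitted.

```python
class RtoEstimator:
    """Smoothed RTT / RTO estimator using the RFC 6298 constants."""
    ALPHA = 1 / 8   # weight of a new sample in the smoothed RTT
    BETA = 1 / 4    # weight of a new sample in the RTT variation

    def __init__(self, min_rto=1.0):
        self.srtt = None
        self.rttvar = None
        self.min_rto = min_rto   # RFC 6298 recommends a 1-second floor
        self.rto = min_rto

    def sample(self, rtt):
        """Feed one RTT measurement (seconds) and return the new RTO."""
        if self.srtt is None:    # first measurement
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        self.rto = max(self.min_rto, self.srtt + 4 * self.rttvar)
        return self.rto

est = RtoEstimator(min_rto=0.0)   # floor disabled to make the arithmetic visible
for rtt in (0.100, 0.200, 0.110):
    print(round(est.sample(rtt), 4))
```

Because the RTO adds four times the variation on top of the smoothed RTT, it stays above the sampled RTT, which matches the statement that the RTO is generally larger.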

Bandwidth-delay product = bandwidth * RTT, which is twice the capacity of the one-way channel from the sender to the receiver. To picture the capacity of the one-way channel, think of it as a one-way road: the bandwidth is the number of lanes, and the cars on the road are the data (all cars travel at the same speed and nobody overtakes). The capacity of the one-way channel is then the total number of cars the road can hold when it is full, that is, the bandwidth multiplied by the one-way delay. Once the road is full, no more cars can be put on it.

Suppose the size of the sliding window is $W$, the bandwidth between the sender and the receiver is $B$, and the RTT is $T_r$.

As mentioned before, TCP is limited by the sliding window when sending data. Once TCP has sent all the data in the sliding window and before it receives the first ACK, the usable window is 0 and no more data can be sent; it has to wait for an ACK to slide the window. In the ideal case, the ACK arrives one RTT after the data was sent. This means that, ignoring packet loss and congestion, the maximum amount of data TCP can send within one RTT is $W$, so without considering the bandwidth limit, the maximum rate TCP can reach is $V = \frac{W}{T_r}$.

Now consider the bandwidth limit. As noted above, when the road is full of cars, no more can be put on it. For the same reason, within time $\frac{T_r}{2}$ the TCP sender can put at most $\frac{V \cdot T_r}{2}$ of data on the channel, while the capacity limit given by the bandwidth-delay product is $\frac{B \cdot T_r}{2}$. When $\frac{V \cdot T_r}{2} \le \frac{B \cdot T_r}{2}$, the one-way channel capacity is not the bottleneck and the rate is limited mainly by the window size. When $\frac{V \cdot T_r}{2} \ge \frac{B \cdot T_r}{2}$, the rate is limited by the capacity, that is, by the bandwidth.

Therefore, the maximum rate of TCP is $V_{\max} = \min\left(\frac{W}{T_r}, B\right)$.

In everyday broadband environments such as ADSL, the bandwidth is relatively small, so the bandwidth-delay product is also relatively small; in addition, network conditions are more complicated and congestion is more common. In these environments the main limits on the TCP rate are bandwidth, packet loss rate, and so on. Long fat pipes (paths with a large bandwidth-delay product) are less common and are mostly seen in leased-line networks used by some organizations. There the main limit on the rate is the window size, which is why traditional TCP cannot make full use of the bandwidth in such networks: the TCP window size field is 2 bytes, so the maximum is only 65535 bytes unless the window scale option is used. Beyond leased lines, as network hardware develops and 10-Gigabit switches appear, large bandwidth-delay products may also occur in local area networks.
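A short calculation makes the window limit concrete. It assumes the rate is the smaller of window / RTT and the link bandwidth; the function name `max_tcp_rate`, the 100 ms RTT, and the 1 Gbit/s link are illustrative values, not figures from the text.

```python
def max_tcp_rate(window_bytes, rtt_seconds, bandwidth_bytes_per_s):
    """Upper bound on throughput: min(window / RTT, bandwidth), in bytes/s."""
    return min(window_bytes / rtt_seconds, bandwidth_bytes_per_s)

link = 125_000_000  # 1 Gbit/s expressed in bytes per second

# Classic 65535-byte window on a 100 ms path: the window is the bottleneck.
small = max_tcp_rate(65535, 0.1, link)
print(round(small * 8 / 1e6, 2), "Mbit/s")   # 5.24 Mbit/s

# A 16 MiB window (possible with window scaling): the link is the bottleneck.
large = max_tcp_rate(16 * 1024 * 1024, 0.1, link)
print(round(large * 8 / 1e6, 2), "Mbit/s")   # 1000.0 Mbit/s
```

Under these assumptions the unscaled window uses only about half a percent of the link.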

The difference between TCP and UDP

TCP is connection-oriented, UDP is connectionless;

UDP does not need to establish a connection before sending data

TCP is reliable, UDP is unreliable;

After the UDP receiver receives the message, it does not need to give any confirmation

TCP supports only point-to-point (one-to-one) communication; UDP supports one-to-one, one-to-many, many-to-one, and many-to-many communication;
TCP is byte-stream oriented, UDP is message oriented;

Byte-stream oriented means that TCP treats the data as a stream of bytes, so one application message may be split across several segments. UDP sends each message whole, in a single datagram.

TCP has a congestion control mechanism, but UDP does not: network congestion does not reduce the sending rate of a UDP source host, which matters for some real-time applications such as media communication and games;
TCP header overhead (20 bytes) is larger than UDP header overhead (8 bytes)
UDP hosts do not need to maintain a complicated connection state table
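The connectionless, message-oriented behavior of UDP can be seen with two sockets on the loopback interface. This is a minimal sketch using Python's standard `socket` module; the payload and the 2-second timeout are arbitrary choices.

```python
import socket

# Receiver: bind to an ephemeral port on loopback. No listen() or accept().
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)
address = receiver.getsockname()

# Sender: no connect() and no handshake, just send one datagram.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello", address)

# Each recvfrom() returns exactly one datagram; message boundaries are kept.
data, peer = receiver.recvfrom(1024)
print(data, "from", peer)

sender.close()
receiver.close()
```

Nothing tells the sender whether the datagram arrived; any acknowledgment would have to be added by the application.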

When to choose TCP and when to choose UDP?

For situations with strict real-time requirements, such as games, media communication, and live video streaming, choose UDP, because occasional transmission errors can be tolerated. In most other cases choose TCP; HTTP, for example, uses TCP because the transmitted content must be reliable and must not be lost.

Can HTTP use UDP?

Traditionally no: HTTP needs a reliable transport protocol and UDP is unreliable, so HTTP/1.x and HTTP/2 run over TCP.

Note: HTTP/3 runs over QUIC, which is built on UDP and implements reliability itself.

https://wikipedia.org/wiki/HTTP/3

The difference between connection-oriented and connectionless

Connectionless network service (datagram service) versus connection-oriented network service (virtual circuit service)

Virtual circuit service: a connection is established first, all packets follow the same path, and quality of service is better guaranteed;

Datagram service: each packet carries the destination address and is routed independently (the path may change); the network delivers packets on a best-effort basis and does not guarantee that they arrive without loss, in order, or within a time limit; when the network is congested, some packets may be discarded;

How does TCP ensure the reliability of transmission

Checksum: each segment carries a checksum, and segments that fail verification are discarded;
Reordering of out-of-order segments (TCP segments carry sequence numbers);
Discarding of duplicate data;
Acknowledgment: after receiving data, the receiver sends an acknowledgment (usually delayed by a fraction of a second);
Timeout retransmission: after sending data, the sender starts a timer; if it does not receive the acknowledgment before the timer expires, it retransmits the data;
Flow control: ensures that the receiver can accept the sender's data without buffer overflow
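Reordering and duplicate removal can be sketched with sequence numbers alone. The helper below is a simplified model: it assumes segments never partially overlap, and the name `reassemble` and the sample data are made up for the example.

```python
def reassemble(segments, first_seq):
    """Rebuild the byte stream from (seq, payload) pairs that may arrive
    out of order or duplicated. Returns (data, next_expected_seq)."""
    buffered = {}
    for seq, payload in segments:
        buffered.setdefault(seq, payload)   # a repeated seq is a duplicate: drop it
    data = b""
    expected = first_seq
    while expected in buffered:             # deliver only contiguous data
        payload = buffered[expected]
        data += payload
        expected += len(payload)
    return data, expected

arrived = [(1006, b"world"), (1001, b"hello"), (1006, b"world"), (1016, b"late")]
print(reassemble(arrived, first_seq=1001))  # (b'helloworld', 1011)
```

The segment at 1016 stays buffered because bytes 1011 to 1015 are still missing; a real receiver would keep acknowledging 1011 until the gap is filled.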

What is the difference between HTTP and HTTPS?

Different ports: HTTP uses port 80 and HTTPS uses port 443;
HTTP (Hypertext Transfer Protocol) transmits information in plain text; HTTPS runs over SSL/TLS (Secure Sockets Layer / Transport Layer Security), which adds encryption and authentication and is more secure;
HTTPS brings greater CPU and memory overhead because of encryption and decryption;
HTTPS requires a certificate, which is generally obtained (often purchased) from a certificate authority (CA)

What is the HTTPS connection process?

The client sends a request to the server together with the set of encryption rules (cipher suites) it supports, covering symmetric encryption, asymmetric encryption, and digest algorithms;
The server selects a set of encryption and hash algorithms and sends its identity information back to the browser in the form of a certificate. The certificate contains the website address, the public key (used for asymmetric encryption), and the issuing authority (the matching private key is kept on the server and is used only there for decryption);
The client verifies the legitimacy of the server, including: whether the certificate has expired, whether the CA is trustworthy, whether the public key of the issuer's certificate correctly verifies the issuer's digital signature on the server certificate, and whether the domain name on the server certificate matches the server's actual domain name;
If the certificate is trusted, or the user chooses to accept an untrusted certificate, the browser generates a random key (for the symmetric algorithm) and encrypts it with the server's public key (asymmetric encryption). It computes a hash digest of the handshake messages and encrypts the digest with the random key (symmetric encryption), then sends the encrypted random key and the encrypted digest to the server;
The server decrypts with its private key to obtain the symmetric key, uses this key to decrypt the hash digest, and verifies that the handshake messages are consistent; if they are, the server encrypts the handshake messages with the symmetric key and sends them to the browser;
The browser decrypts and verifies the digest. If it is consistent, the handshake ends, and all subsequent data is encrypted with the symmetric key

Summary: asymmetric encryption protects the generated key during the handshake; symmetric encryption protects the actual data; the hash algorithm verifies data integrity.
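On the client side, the certificate checks are usually handled by a TLS library rather than written by hand. As one example, Python's standard `ssl` module enables chain validation and host-name matching by default; the snippet only inspects the default settings and opens no connection.

```python
import ssl

# Default client-side context: loads the system's trusted CA certificates.
context = ssl.create_default_context()

# The server certificate chain must validate against a trusted CA.
print(context.verify_mode == ssl.CERT_REQUIRED)
# The certificate must also match the host name being connected to.
print(context.check_hostname)
```

Turning either setting off removes the protection that the verification step provides, so it is best left to test environments.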

When you type www.google.com, how does it become https://www.google.com? How is the choice between HTTP and HTTPS made?

One way is the traditional 302 redirect: the server redirects all HTTP traffic to HTTPS. This has a weakness: a man in the middle may hijack the user's first visit to the site, which is made over HTTP.

The solution is HSTS (HTTP Strict Transport Security), which forces the user's browser to use HTTPS when visiting the site.

When connecting via HTTPS, how can I be sure that the received packet is from the server (man-in-the-middle attack)?

What are symmetric and asymmetric encryption, and how do they differ?

-Symmetric encryption: the same key is used for encryption and decryption. Examples: DES, RC2, RC4

-Asymmetric encryption: two keys are required, a public key and a private key. Data encrypted with the public key can only be decrypted with the private key. Example: RSA

-Difference: symmetric encryption is faster and is usually used to encrypt large amounts of data; asymmetric encryption makes key distribution safer, because the private key never has to be transmitted

Principles of digital signature and message digest

-The sender A signs with its private key, and the receiver B verifies the signature with A's public key. Because nobody except A has the private key, B can trust that the signature comes from A. A cannot deny having sent the message, and B cannot forge it.

-Digest algorithms: MD5, SHA
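Signing a message digest can be illustrated with a toy RSA key. This is a teaching sketch only: the primes are tiny, there is no padding, and the digest is reduced modulo n so that it fits the toy key. None of this is safe for real use, and the helper names are made up.

```python
import hashlib

# Toy RSA key pair built from tiny primes. Real keys are 2048 bits or more.
p, q = 61, 53
n = p * q                            # modulus, shared by both keys
e = 17                               # public exponent
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent (Python 3.8+)

def digest_as_int(message):
    # Reduce the SHA-256 digest modulo n so it fits the toy key.
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(message):
    return pow(digest_as_int(message), d, n)       # uses the private key

def verify(message, signature):
    return pow(signature, e, n) == digest_as_int(message)   # uses the public key

signature = sign(b"pay Bob 10")
print(verify(b"pay Bob 10", signature))   # True
```

Anyone holding the public pair (n, e) can run `verify`, but only the holder of d can produce a signature that passes.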

The difference between GET and POST?

GET is idempotent: reading the same resource repeatedly has the same effect; POST is not idempotent;
GET is generally used to obtain resources from the server, and POST may change resources on the server;
Request format: GET data is appended to the URL (in the request line); POST data is placed in the request body;
Security: GET requests can be cached, bookmarked, and saved in browser history, and their data appears in the URL in plain text; POST parameters are not saved in this way, so POST is relatively safer (though without HTTPS both are sent unencrypted);
GET parameters must be ASCII (other characters have to be URL-encoded); POST places no restriction on the data type and also allows binary data;
GET length is limited in practice by browsers and servers (URL length limits), while POST data size is not limited by the protocol
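Where the parameters travel can be shown by building both kinds of request with Python's standard `urllib`, without sending anything. The host `example.com`, the path, and the parameter names are placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request

params = {"q": "tcp handshake", "page": 2}
encoded = urlencode(params)   # spaces and special characters get URL-encoded

# GET: the parameters are part of the URL's query string.
get_req = Request("http://example.com/search?" + encoded)

# POST: the same parameters are carried in the request body.
post_req = Request("http://example.com/search", data=encoded.encode("ascii"))

print(get_req.get_method(), get_req.full_url, get_req.data)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

`Request` switches its method to POST as soon as a body is supplied through `data`.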

A session is the server-side scheme for keeping state, and a cookie is the client-side scheme for keeping state.

A cookie is stored locally on the client, which submits it with each request to the server. A session is stored on the server, and its state is looked up by the session ID. A cookie can be used to store the session ID. If cookies are disabled, URL rewriting can be used instead (the session ID is carried in the URL).
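The relationship between the two can be sketched with an in-memory session store and Python's standard `http.cookies` module. The cookie name `sessionid`, the helper functions, and the user name are assumptions for this illustration.

```python
import secrets
from http.cookies import SimpleCookie

sessions = {}   # server-side store: session ID -> state

def login(username):
    """Create a session and return the value of a Set-Cookie header."""
    session_id = secrets.token_hex(16)
    sessions[session_id] = {"user": username}
    cookie = SimpleCookie()
    cookie["sessionid"] = session_id
    cookie["sessionid"]["httponly"] = True
    return cookie["sessionid"].OutputString()

def current_user(cookie_header):
    """Look up the session named by the Cookie header the client sent back."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("sessionid")
    session = sessions.get(morsel.value) if morsel else None
    return session["user"] if session else None

set_cookie = login("alice")
# A browser would send back only the name=value part of the cookie.
print(current_user(set_cookie.split(";")[0]))   # alice
```

Only the random ID leaves the server; the state itself stays in `sessions`.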

The process from entering the URL to getting the page (the more detailed the better)?

The browser queries DNS to obtain the IP address for the domain name. It searches its own DNS cache, then the operating system's DNS cache, then the local hosts file, and then queries the local DNS server. At the local DNS server: if the queried domain name belongs to a zone configured on that server, the server returns the result to the client and resolution is complete (this answer is authoritative); if the name is not in a local zone but the server has cached the mapping, it returns the cached IP address (this answer is not authoritative); if the server has no cached mapping, it issues a recursive or iterative query, depending on its configuration;
After obtaining the IP address for the domain name, the browser asks the server to establish a connection and starts the three-way handshake;
After the TCP connection is established, the browser sends an HTTP request to the server;
The server receives the request, maps it to a specific request handler according to the path and parameters, and returns the result and the corresponding view to the browser;
The browser parses and renders the view. When it encounters references to static resources such as JS files, CSS files, and images, it repeats the steps above to request those resources from the server;
The browser renders the page from the resources and data it has received and finally presents the complete page to the user.

What are the common status codes for HTTP requests?

2xx status code: the operation is successful. 200 OK
3xx status code: redirection. 301 permanent redirect; 302 temporary redirect
4xx status code: client error. 400 Bad Request; 401 Unauthorized; 403 Forbidden; 404 Not Found;
5xx status code: server error. 500 Internal Server Error; 503 Service Unavailable
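The standard reason phrases can be looked up with `http.HTTPStatus` from Python's standard library. The helper `status_class` is a made-up function that groups codes by their first digit.

```python
from http import HTTPStatus

def status_class(code):
    """Map a status code to its class by the first digit."""
    classes = {2: "success", 3: "redirection", 4: "client error", 5: "server error"}
    return classes.get(code // 100, "other")

for code in (200, 301, 302, 404, 500, 503):
    print(code, HTTPStatus(code).phrase, "-", status_class(code))
```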

What is RIP (Routing Information Protocol, a distance-vector routing protocol), and what is its algorithm?

Each router maintains a table that records the "hop count" from itself to every other network. The hop count to a directly connected network is 1, and it increases by 1 for each additional router on the path. Neighboring routers periodically exchange routing information to update their tables. A path may contain at most 15 routers; a hop count of 16 means the network is unreachable. When delivering datagrams, the route with the shortest distance is preferred.

-Advantage: simple to implement, low overhead

-Disadvantage: as the network grows, the overhead increases;

-Disadvantage: the maximum distance is 15, which limits the scale of the network;

-Disadvantage: when the network fails, it takes a long time for this information to reach all routers
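The table update performed when a neighbor's routing information arrives can be sketched as a distance-vector merge. This is a simplified model without timers or split horizon; the table layout (network -> (hops, next_hop)) and the router and network names are illustrative.

```python
INFINITY = 16   # RIP treats a hop count of 16 as unreachable

def rip_update(table, neighbor, neighbor_table):
    """Merge a neighbor's advertised table into ours.

    Tables map network -> (hops, next_hop). Returns True if anything changed.
    """
    changed = False
    for network, (hops, _) in neighbor_table.items():
        new_hops = min(hops + 1, INFINITY)     # one extra hop via the neighbor
        current = table.get(network)
        if current is None:
            better = new_hops < INFINITY       # learn a new reachable network
        elif current[1] == neighbor:
            better = current[0] != new_hops    # always follow the current next hop
        else:
            better = new_hops < current[0]     # switch only to a shorter route
        if better:
            table[network] = (new_hops, neighbor)
            changed = True
    return changed

table_a = {"N1": (1, None)}                    # N1 is directly connected
advert_b = {"N1": (1, None), "N2": (1, None), "N3": (2, "C")}
print(rip_update(table_a, "B", advert_b), table_a)
```

After the update, router A reaches N2 in 2 hops and N3 in 3 hops through B, and keeps its direct route to N1.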

Computer Network Architecture

-Physical, Data Link, Network, Transport, Application

-Application layer: Common protocols:

-FTP (port 21): file transfer protocol

-SSH (port 22): remote login

-TELNET (port 23): remote login

-SMTP (port 25): send mail

-POP3 (Port 110): Receive mail

-HTTP (Port 80): Hypertext Transfer Protocol

-DNS (port 53): domain name resolution service, running mainly on UDP

-Transport layer: TCP/UDP

-Network layer: IP, ARP, NAT, RIP...

-Routers work at the network layer and forward according to IP addresses;

-Switches work at the data link layer and forward according to MAC addresses

Classification of IP addresses?

The router only forwards the packet according to the network number net-id. When the packet reaches the router of the destination network, it delivers the packet to the host according to the host number host-id; all hosts on the same network have the same network number.

What is subnetting?

Borrow several bits from the host number host-id to serve as the subnet number subnet-id. In the subnet mask, the bits of the network number and subnet number are all 1 and the bits of the host number are all 0. A datagram still reaches the destination network's router according to the network number; that router then finds the destination subnet from the network number and subnet number by ANDing the subnet mask with the destination address bit by bit. If the result is the network address of one of its subnets, the datagram is forwarded to that subnet.
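The bitwise AND can be tried out with Python's standard `ipaddress` module. The addresses and masks below are arbitrary examples, and `subnet_of` is a made-up helper name.

```python
import ipaddress

def subnet_of(address, mask):
    """Bitwise AND of an IPv4 address and a subnet mask gives the subnet's network address."""
    addr = int(ipaddress.IPv4Address(address))
    netmask = int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address(addr & netmask))

print(subnet_of("145.13.3.10", "255.255.255.0"))     # 145.13.3.0
print(subnet_of("145.13.3.200", "255.255.255.192"))  # 145.13.3.192
```

The second call borrows two more host bits for the subnet number, so the host at .200 falls into the subnet that starts at .192.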

What is the ARP protocol (Address Resolution Protocol)?

ARP maps IP addresses to physical (MAC) addresses. Each host has an ARP cache holding a table that maps the IP addresses of the hosts and routers on its own local area network to their hardware addresses. When the source host wants to send a packet to a destination host, it first checks its ARP cache for the destination's MAC address. If the address is there, it sends the packet directly to that MAC address. If not, it broadcasts an ARP request on the local area network (the request also carries the sender's own IP-to-hardware address mapping). Each host that receives the request compares its own IP address with the requested one; the host that matches first saves the source host's mapping in its own ARP cache and then sends an ARP reply to the source host. On receiving the reply, the source host adds the destination's IP-to-MAC mapping to its cache and then transmits the data. If the source host never receives a reply, the ARP query has failed.

If the destination host is not on the same LAN as the source host, the source host uses ARP to find the hardware address of a router on its LAN, sends the packet to that router, and lets the router forward it to the next network. The rest of the work is done by the next network.

What is NAT (Network Address Translation)?

NAT solves the problem of hosts on a private network needing to communicate with hosts on the Internet. The NAT router translates a host's local IP address into a global IP address. Translation is either static (the resulting global IP address is fixed) or dynamic.