The Internet is a global computer network made up of many smaller networks, whose computers are connected to one another by means of certain protocols.
A protocol is a set of rules defining the way in which data is formatted and transmitted.
First of all, we need to understand what a data packet is. A packet is a unit of data that travels over a network. When you give a computer data to send over a network, the data usually cannot be sent all at once. Instead, the protocol in use breaks the data down into packets, and those packets are then transmitted over the network. A packet usually contains the following:
- actual data
- details of sender
- details of receiver
- any other options related to the transmission of data
Everything except the actual data is bundled together into what is usually called the “header”. Different kinds of packets carry different details in their headers and handle data in different ways.
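The idea of breaking data into packets, each with its own header, can be sketched roughly as follows. This is only a toy illustration: the `Packet` class, its field names and the 8-byte payload size are assumptions made for this example and do not match the format of any real protocol.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    # Header fields (hypothetical names, for illustration only)
    sender: str
    receiver: str
    sequence: int   # position of this packet within the original data
    total: int      # number of packets the data was split into
    # The actual data carried by this packet
    payload: bytes


def fragment(data: bytes, sender: str, receiver: str, size: int = 8) -> list[Packet]:
    """Split data into packets carrying at most `size` bytes each."""
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    return [Packet(sender, receiver, seq, len(chunks), chunk)
            for seq, chunk in enumerate(chunks)]


def reassemble(packets: list[Packet]) -> bytes:
    """Rebuild the original data, even if the packets arrive out of order."""
    ordered = sorted(packets, key=lambda p: p.sequence)
    return b"".join(p.payload for p in ordered)


message = b"Hello, this is some data!"
packets = fragment(message, "device-a", "device-b")
print(len(packets))                 # 25 bytes in chunks of 8 gives 4 packets
print(reassemble(packets[::-1]))    # reversed order still yields the original data
```

The sequence number in the header is what lets the receiver put the pieces back in the right order.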
Now, to understand how the internet works, we need to know about a few protocols as follows.
Transmission Control Protocol (TCP)
TCP is a connection-oriented protocol which helps two devices communicate over a network. “Connection-oriented” means that the two devices first establish a connection with each other, and it is this connection that lets them communicate reliably over an unreliable network.
TCP works by first establishing a connection between the two devices. Once this connection is established, the actual data transfer takes place. The data is broken into small packets, which are then sent over the network. The packets are reassembled into the original data at the receiver's end. TCP also checks the data for errors and retransmits it if something was lost or corrupted in transmission. Finally, TCP ensures that the destination is not overwhelmed by the sender, by limiting the amount of data that is sent to the destination at a time.
When the transfer is done, the connection is properly closed as per the specifications of the protocol.
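The connect, transfer and close sequence described above can be tried out on a single machine with Python's standard `socket` module. The sketch below is a minimal example under a few assumptions: both ends run on the loopback address 127.0.0.1, the server handles just one connection, and the function names `run_echo_server` and `tcp_round_trip` are made up for this post.

```python
import socket
import threading


def run_echo_server(server: socket.socket) -> None:
    """Accept one connection and send back whatever is received."""
    conn, _addr = server.accept()       # the connection is now established
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:                # empty read: the client finished sending
                break
            conn.sendall(data)


def tcp_round_trip(message: bytes) -> bytes:
    """Send message to a local echo server over TCP and return the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        worker = threading.Thread(target=run_echo_server, args=(server,))
        worker.start()

        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            client.connect(("127.0.0.1", port))   # 1. establish the connection
            client.sendall(message)               # 2. transfer the data
            client.shutdown(socket.SHUT_WR)       # 3. signal that we are done sending
            received = b""
            while chunk := client.recv(1024):
                received += chunk
        worker.join()
    return received
```

Calling `tcp_round_trip(b"hello over TCP")` should return the same bytes. The splitting into packets, the acknowledgements and the retransmissions all happen inside the operating system and are not visible in the code.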
PORTS: A TCP connection between Device A and Device B is made using something called “ports”. (The combination of an IP address and a port number is often called an “internet socket”.) The purpose of a port can be understood by an analogy to the ports used by ships. When something needs to be sent by ship, it has to go through a port: the ship starts from a particular port and arrives at its destination port. Similarly, when data needs to be sent over a computer network, it is sent via numbered virtual endpoints on the device called “ports”. Different application protocols conventionally use different port numbers (for example, HTTP uses port 80). TCP port numbers range from 0 to 65535, because a port number is a 16-bit value. This gives 65536 possible port numbers on a device. It is not a limit on the number of simultaneous connections, since a single port can serve many connections at once.
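The range 0-65535 follows from the port number being stored as a 16-bit unsigned integer in the packet header. A small sketch of this, using the standard `struct` module (the helper name `pack_port` is an assumption for this example):

```python
import struct


def pack_port(port: int) -> bytes:
    """Encode a port number as a 16-bit unsigned integer in network byte order."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError(f"port {port} does not fit in 16 bits")
    return struct.pack("!H", port)


print(2 ** 16)             # 65536 possible port numbers
print(pack_port(80))       # two bytes
print(pack_port(65535))    # the largest value that fits
```

Any number above 65535 simply cannot be written into those two bytes, which is why `pack_port(65536)` raises an error.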
For a more detailed post on TCP, click here.
User Datagram Protocol (UDP)
Whereas TCP ensures that data is transmitted reliably between two devices, UDP is a connectionless protocol. “Connectionless” here means that there is no formal creation or closing of a connection as there is in TCP. UDP does not retransmit data, and its error checking is limited to a checksum that lets the receiver discard corrupted packets. It simply sends a packet and forgets about it, trusting the intermediate devices to deliver the packet to the correct destination. There is no acknowledgement mechanism in UDP, so data loss may occur. For this reason, UDP on its own is not suited to tasks where every piece of data must arrive intact. It is typically used where speed matters more than guaranteed delivery, such as live audio and video. UDP communication is again done over ports, and UDP port numbers also range from 0 to 65535.
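For comparison with TCP, here is a minimal sketch of sending one UDP packet between two sockets on the same machine. It assumes the loopback address 127.0.0.1, where loss is very unlikely; the function name `udp_send_and_receive` and the 2-second timeout are choices made for this example.

```python
import socket


def udp_send_and_receive(message: bytes) -> bytes:
    """Send one datagram to a local UDP socket and return what it received."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))    # port 0: let the OS pick a free port
        receiver.settimeout(2.0)           # give up if the packet never arrives
        port = receiver.getsockname()[1]

        # No connect(), no handshake: the packet is simply sent.
        sender.sendto(message, ("127.0.0.1", port))

        # The sender gets no acknowledgement; only the receiver knows it arrived.
        data, _addr = receiver.recvfrom(1024)
    return data
```

Notice that there is no step for establishing or closing a connection. If the packet were lost on a real network, `recvfrom` would simply time out and the sender would never find out.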
Internet Protocol (IP)
IP is a protocol for addressing and routing packets across the internet. It is again a connectionless protocol. It is easy to imagine two devices exchanging data when they are on the same network and can reach each other directly. But when they are on different networks, they cannot reach each other directly. Communication between two devices on different networks (i.e. inter-network communication) is handled by IP, which defines how packets on the internet move from a source to a destination. TCP runs on top of IP: IP gets the packets to the right device, and TCP makes the transfer reliable. Together they are responsible for the internet working seamlessly and are known as TCP/IP.
For a more detailed post on IP, click here.
How do the packets traverse the internet?
The devices on the internet that help a packet move from its source to its destination are called routers. A router receives a packet from a device, looks at its destination and then sends the packet onward along a route towards that destination. This is repeated until the packet reaches its destination. A router does not need to know the location of every IP address on the internet. For each packet it receives, it simply needs to know which neighbor it should send the packet to.
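The hop-by-hop idea can be sketched with a toy simulation. The router names, the `NEXT_HOP` table and the hop limit below are all made up for illustration; real routers work with IP addresses rather than names.

```python
# Hypothetical topology: each router only knows which neighbor is next
# on the way to a given destination, not the whole path.
NEXT_HOP = {
    "router-1": {"device-b": "router-2"},
    "router-2": {"device-b": "router-3"},
    "router-3": {"device-b": "device-b"},   # device-b is directly connected
}


def trace_route(first_router: str, destination: str, max_hops: int = 16) -> list[str]:
    """Follow next-hop decisions until the destination is reached."""
    path = [first_router]
    current = first_router
    while current != destination:
        if len(path) > max_hops:
            raise RuntimeError("too many hops, giving up")
        current = NEXT_HOP[current][destination]   # one local decision per router
        path.append(current)
    return path


print(trace_route("router-1", "device-b"))
# ['router-1', 'router-2', 'router-3', 'device-b']
```

No single router in this sketch holds the full path. The path only emerges from each router making its own local decision.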
How are the devices on the internet addressed?
Devices on the internet are addressed using something called an Internet Protocol address (IP address for short). An IP address is a set of 4 numbers, each between 0 and 255. The 4 numbers are written together and separated by dots, e.g. 12.0.123.255. Such an address is specifically called an IPv4 address, since this addressing was introduced in version 4 of the Internet Protocol. One part of the address identifies the network and the rest identifies the host within that network. In this example, assuming the split falls after the first two numbers, the network identifier is 12.0 and the host identifier is 123.255. Where the split falls varies from network to network. The network prefix is the same for all devices that connect to the internet through the same connection. Each device on the internet has an IP address, and a router knows the IP addresses of the devices directly connected to it.
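Python's standard `ipaddress` module can be used to take such an address apart. The sketch below uses the network identifier 12.0 and host identifier 123.255 from the text, and assumes the network part is the first 16 bits (written `/16`), which is what a split after the first two numbers means.

```python
import ipaddress

address = ipaddress.IPv4Address("12.0.123.255")

# The four dotted numbers are just a readable form of one 32-bit integer.
octets = [int(part) for part in str(address).split(".")]
as_integer = int(address)
print(octets)          # [12, 0, 123, 255]
print(as_integer)

# Assumption: the first 16 bits identify the network, the rest the host.
interface = ipaddress.IPv4Interface("12.0.123.255/16")
network = interface.network
host_part = int(interface.ip) & int(interface.hostmask)
print(network)         # 12.0.0.0/16
print(host_part)       # the host bits (123.255) as a single number
```

With a different prefix length, say `/24`, the same address would split into network 12.0.123 and host 255.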
Each router comes pre-configured with some routes, i.e. which way to send a particular packet. Routers also learn routes from their neighbors and store them for packets received later on. In this way, a router builds up what is called a “routing table”. Whereas communication with devices outside a local network is done using IP addresses, delivery within a local network is done using MAC (Media Access Control) addresses. A MAC address is intended to be unique for every network interface, whether it is on a public network or a private network. MAC addresses are tied to the actual hardware of the computer that does the network communication.
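A routing table lookup can be sketched as below. The table entries and neighbor names are invented for this example. The rule used to choose between overlapping entries, picking the most specific matching network (often called longest-prefix match), is a common approach that the text above does not spell out, so treat it as an assumption of this sketch.

```python
import ipaddress

# Hypothetical routing table: destination network -> neighbor to forward to.
ROUTING_TABLE = {
    ipaddress.ip_network("0.0.0.0/0"): "neighbor-default",   # matches everything
    ipaddress.ip_network("12.0.0.0/16"): "neighbor-a",
    ipaddress.ip_network("12.0.123.0/24"): "neighbor-b",
}


def next_hop(destination: str) -> str:
    """Return the neighbor for the most specific network containing destination."""
    address = ipaddress.ip_address(destination)
    matches = [net for net in ROUTING_TABLE if address in net]
    best = max(matches, key=lambda net: net.prefixlen)   # longest prefix wins
    return ROUTING_TABLE[best]


print(next_hop("12.0.123.255"))   # neighbor-b
print(next_hop("12.0.5.1"))       # neighbor-a
print(next_hop("8.8.8.8"))        # neighbor-default
```

The catch-all entry `0.0.0.0/0` plays the role of a default route, so the router always has somewhere to send a packet.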
A public IP address is the IP address of a device that is visible on the internet. Such a device can be directly addressed and accessed over the internet. A private IP address is the IP address of a device that is inside a local network and is not visible on the internet. Such a device communicates with the internet as described below. The IP addresses from 10.0.0.0 to 10.255.255.255, 172.16.0.0 to 172.31.255.255, and 192.168.0.0 to 192.168.255.255 are reserved as private IPs for use on local networks. The IP address of a device may be assigned manually, or automatically by another protocol called Dynamic Host Configuration Protocol (DHCP). To read more about it, click here.
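The three private ranges listed above can be checked in code. This is a small sketch using the standard `ipaddress` module; the ranges are written in prefix notation (`10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16`), which covers exactly the same addresses, and the function name `is_private` is chosen for this example.

```python
import ipaddress

# The three reserved private ranges, in prefix notation.
PRIVATE_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),       # 10.0.0.0 to 10.255.255.255
    ipaddress.ip_network("172.16.0.0/12"),    # 172.16.0.0 to 172.31.255.255
    ipaddress.ip_network("192.168.0.0/16"),   # 192.168.0.0 to 192.168.255.255
]


def is_private(address: str) -> bool:
    """Return True if address falls inside one of the private ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in PRIVATE_RANGES)


print(is_private("192.168.1.10"))   # True
print(is_private("172.32.0.1"))     # False, just outside the second range
print(is_private("12.0.123.255"))   # False
```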
But if you consider the number of IP address combinations possible with 4 numbers between 0 and 255, it comes to 2^32 (i.e. 2 raised to the power 32), or about 4.3 billion. Thus, in theory, there can be only 2^32 devices on the internet. But the number of devices in the world is far more than that, and the internet still functions fine. This is made possible by something called Network Address Translation (NAT), wherein a public IP address is assigned to one device (usually the router) and all devices within the local network use that public IP address for connecting with other devices on the internet. NAT modifies the network address information in the packet headers as they cross the router. This needs to be done because private addresses are not unique across networks and are not routed on the public internet, so a reply could never find its way back to a private address. Thus, a device with NAT hides the entire local network behind it, and all communication from the local network appears to come from the device with NAT functionality.
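How a NAT device keeps track of who is who can be sketched with a small table. This toy version rewrites both the address and the port (a common variant of NAT); the `Nat` class, the starting port 40000 and the public address 203.0.113.5 (taken from a range reserved for documentation) are all assumptions made for this example.

```python
class Nat:
    """Toy NAT: maps (private IP, private port) pairs to ports on one public IP."""

    def __init__(self, public_ip: str, first_port: int = 40000) -> None:
        self.public_ip = public_ip
        self.next_port = first_port
        self.outbound = {}   # (private_ip, private_port) -> public_port
        self.inbound = {}    # public_port -> (private_ip, private_port)

    def translate_outgoing(self, private_ip: str, private_port: int) -> tuple:
        """Rewrite the source of a packet leaving the local network."""
        key = (private_ip, private_port)
        if key not in self.outbound:
            self.outbound[key] = self.next_port
            self.inbound[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.outbound[key]

    def translate_incoming(self, public_port: int) -> tuple:
        """Find which local device a reply arriving on public_port belongs to."""
        return self.inbound[public_port]


nat = Nat("203.0.113.5")
print(nat.translate_outgoing("192.168.1.10", 5000))   # ('203.0.113.5', 40000)
print(nat.translate_outgoing("192.168.1.11", 5000))   # ('203.0.113.5', 40001)
print(nat.translate_incoming(40001))                  # ('192.168.1.11', 5000)
```

From the outside, both local devices appear as the same public IP address. Only the NAT device knows which reply belongs to which of them.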
A problem with IPv4 is that its address space is too small, given how fast technology is growing and internet usage is expanding. The approximately 4 billion IPv4 addresses are not enough for the number of devices present on the internet today. Thus IPv6 addresses have come up. These are 128 bits long and so provide a practically unlimited number of IP addresses. The differences between IPv4 and IPv6 will be discussed in another post.
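The difference in size between the two address spaces is easy to compute. The sketch below uses the standard `ipaddress` module, where `0.0.0.0/0` and `::/0` stand for "every IPv4 address" and "every IPv6 address" respectively.

```python
import ipaddress

ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses
ipv6_total = ipaddress.ip_network("::/0").num_addresses

print(ipv4_total == 2 ** 32)                # True: 32-bit addresses
print(ipv6_total == 2 ** 128)               # True: 128-bit addresses
print(ipv6_total // ipv4_total == 2 ** 96)  # True: IPv6 is 2^96 times larger
print(ipv4_total)                           # 4294967296, about 4.3 billion
```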
Since it would be difficult for humans to remember numbers, there are domain names which correspond to IP addresses (e.g. www.google.com corresponds to one or more numeric IP addresses). This mapping works seamlessly in the background through something called the Domain Name System (DNS), which helps computers convert human-readable domain names to IP addresses. To read more about it, click here.
In closing, data is transmitted between two devices over the internet as packets using the TCP/IP protocols, with the devices addressed by their IP addresses.
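You can ask your own computer to do this name-to-address conversion. The sketch below uses `socket.getaddrinfo` from the standard library, which hands the question to the operating system's resolver (and through it, normally, to DNS). The function name `resolve_ipv4` is made up for this example, and it only looks at IPv4 addresses.

```python
import socket


def resolve_ipv4(hostname: str) -> list[str]:
    """Return the IPv4 addresses the system resolver reports for hostname."""
    try:
        results = socket.getaddrinfo(hostname, None,
                                     family=socket.AF_INET,
                                     type=socket.SOCK_STREAM)
    except socket.gaierror:
        return []   # the name could not be resolved
    # Each result is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is (ip_address, port).
    return sorted({sockaddr[0] for *_rest, sockaddr in results})


print(resolve_ipv4("localhost"))
```

Calling `resolve_ipv4("www.google.com")` needs a working internet connection, and the addresses it returns can differ from place to place and from one moment to the next.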
For more posts in The Cyber Cops project, click here.