By now we know that a computer understands only two states, OFF or ON (0 or 1). This corresponds to the Binary Number System, which has been explained previously in another article. Since a computer can only understand 0 or 1, all data on a computer must be stored in that format. Thus, it becomes necessary to understand how data storage works on a computer.
Let us now understand that. From the article on the Binary Number System, we already know that 23 in decimal is equivalent to 10111 in binary (i.e. (23)₁₀ = (10111)₂). This number will be stored as 10111 in a computer. We need to learn a few more logical concepts before we can move on to how data is actually stored physically.
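As a quick check of the conversion above, here is a minimal Python sketch that turns 23 into its binary digits and back again using only built-in functions. The helper name `to_binary` is made up for this illustration.

```python
def to_binary(number: int) -> str:
    """Return the binary digits of a non-negative whole number as text."""
    return format(number, "b")


binary_digits = to_binary(23)          # "10111"
decimal_value = int(binary_digits, 2)  # read the digits back as base 2
print(binary_digits, decimal_value)    # 10111 23
```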
A bit is the smallest unit of data that can be stored in a computer. It can have only one of the 2 already familiar values, i.e. 0 or 1.
Since 23 is stored as 10111 (i.e. a total of five bits), if we had to store the number 23 two times, it could very well be stored as 1011110111 (i.e. 10111 and 10111 one after the other, a total of ten bits). But then how would the computer know whether it is 2 numbers of 5 bits each (10111 and 10111 one after the other), 5 numbers of 2 bits each (10, 11, 11, 01, 11 one after the other), or 4 numbers of varying bit length (10, 1, 1110, 111)?
To overcome such a situation, the easiest and best way is to fix how many bits are used to store each number. We could decide that every number will be stored as 8 bits. Thus, 23 will be stored as 00010111. As we see, if a number has fewer than 8 bits, we can add the required number of zeroes before it to make it 8 bits long. Doing this leaves the value of the number unchanged. 8 bits taken together are called a byte. Thus we can say that each number is stored as a byte. Now, if we have to store the number 23 two times, it will be stored as 0001011100010111 (i.e. 00010111 and 00010111 one after the other). Since the computer knows that the data is stored in 8-bit lengths, i.e. in bytes, it will read 8 bits at a time and hence be sure that it is the number 23 stored two times.
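The fixed-width idea can be tried out with a short Python sketch. It pads each number to 8 bits, joins the bytes into one stream, and then reads the stream back 8 bits at a time. The names `to_byte` and `read_bytes` are hypothetical helpers written for this example, not part of any library.

```python
BITS_PER_BYTE = 8


def to_byte(number: int) -> str:
    """Pad a number from 0 to 255 with leading zeroes to exactly 8 bits."""
    if not 0 <= number <= 255:
        raise ValueError("number does not fit in one byte")
    return format(number, "08b")


def read_bytes(bit_stream: str) -> list:
    """Split a stream of bits into 8-bit chunks and decode each chunk."""
    if len(bit_stream) % BITS_PER_BYTE != 0:
        raise ValueError("bit stream is not a whole number of bytes")
    return [
        int(bit_stream[start:start + BITS_PER_BYTE], 2)
        for start in range(0, len(bit_stream), BITS_PER_BYTE)
    ]


stored = to_byte(23) + to_byte(23)
print(stored)              # 0001011100010111
print(read_bytes(stored))  # [23, 23]
```

Because every chunk has the same length, there is only one way to split the stream, so the ambiguity described earlier disappears.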
Since every bit can have only 2 values, i.e. 0 or 1, we need to understand how much data can be stored in a byte. Let us see the possible values for a few bit lengths:
- One bit – With one bit, we can store only 2 numbers, i.e. either 0 or 1.
- Two bits – With two bits, we can store only 4 numbers, i.e. either 00,01,10,11.
- Three bits – With three bits, we can store only 8 numbers, i.e. 000,001,010,011,100,101,110,111.
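The listings above can be reproduced with a small Python sketch that generates every pattern of a given bit length. The function name `bit_patterns` is an illustrative choice.

```python
from itertools import product


def bit_patterns(bit_length: int) -> list:
    """Return every pattern of 0s and 1s of the given length, in counting order."""
    return ["".join(bits) for bits in product("01", repeat=bit_length)]


for bit_length in (1, 2, 3):
    patterns = bit_patterns(bit_length)
    print(bit_length, len(patterns), patterns)
```

Each extra bit doubles the number of patterns: 2 for one bit, 4 for two bits, 8 for three bits.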
Thus, we see that with a bit length of n, we can store only 2^n different numbers. The values of those numbers, as we see above, go from 0 to 2^n - 1. Thus, with 8 bits, the largest number we can store is 2^8 - 1, i.e. we can store numbers from 0 to 255. To store a number larger than 255, we need more bits. Even then, we cannot pick bit lengths as we please, or we face the same problem described above for storing the number 23 two times. We therefore need standard sizes. The most common sizes for storing numbers are 8 bits (1 byte), 16 bits (2 bytes), 32 bits (4 bytes) and 64 bits (8 bytes). Therefore:
- With 2 bytes, we can store numbers up to 2^16 - 1, i.e. 65535.
- With 4 bytes, we can store numbers up to 2^32 - 1, i.e. 4294967295.
- With 8 bytes, we can store numbers up to 2^64 - 1, i.e. 18446744073709551615.
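These limits can be worked out with a short Python sketch, assuming the numbers are unsigned (zero or positive only). Since counting starts at 0, the largest value is always one less than the number of values. The function name `unsigned_range` is made up for this example.

```python
def unsigned_range(byte_count: int) -> tuple:
    """Return (how many values, largest value) for an unsigned number of this size."""
    bit_count = byte_count * 8
    value_count = 2 ** bit_count
    return value_count, value_count - 1


for byte_count in (1, 2, 4, 8):
    value_count, largest = unsigned_range(byte_count)
    print(f"{byte_count} byte(s): {value_count} values, from 0 to {largest}")
```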
So far we have seen how numbers are stored on a computer. But what about storing letters and special characters like !@#$%^&*? The easiest way is to assign a number to every possible character, including letters, digits and special characters. Whenever we have to store a character, we simply store the number assigned to it. This assignment of numbers to characters is called Character Encoding. One such encoding is the American Standard Code for Information Interchange, or ASCII. ASCII is a 7-bit code, so it defines 128 values (0 to 127), but each character is normally stored in one byte (8 bits) with the leading bit set to 0. Encodings that extend ASCII use the remaining values, 128 to 255, for additional characters. The binary equivalent of each character's number is what gets saved onto the hard disk. ASCII assigns the following numbers:
|Characters|Numbers Assigned by ASCII|
|A to Z|65 to 90|
|a to z|97 to 122|
|0 to 9|48 to 57|
|Other characters (space, punctuation, control codes)|Remaining numbers from 0 to 127|
For a detailed look at the total ASCII character set, please go to the end of this article and click on the Wikipedia link for ASCII.
Thus, if we have to save “The Cyber Cops”, it can be represented as “84 104 101 32 67 121 98 101 114 32 67 111 112 115” in ASCII, which is then saved as “01010100 01101000 01100101 00100000 01000011 01111001 01100010 01100101 01110010 00100000 01000011 01101111 01110000 01110011” in the computer.
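The same conversion can be checked with a few lines of Python. This is only a sketch: the helper names `to_ascii_codes` and `to_stored_bits` are hypothetical, and the text is assumed to contain ASCII characters only (anything else makes `encode("ascii")` raise an error).

```python
def to_ascii_codes(text: str) -> list:
    """Return the ASCII number assigned to each character of the text."""
    return list(text.encode("ascii"))


def to_stored_bits(text: str) -> str:
    """Return the text as space-separated 8-bit groups, one group per character."""
    return " ".join(format(code, "08b") for code in to_ascii_codes(text))


message = "The Cyber Cops"
print(to_ascii_codes(message))
print(to_stored_bits(message))
```

The first line printed is the list of numbers and the second is the bit pattern for each character, in the same order as in the example above.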
Data Storage on a Hard Disk
Now, let us come to how data is actually stored physically on a computer. Since data is usually stored on a hard disk, let us understand it in terms of a hard disk. The picture below shows the internal structure of a hard drive.
A typical hard disk has the following parts:
- A central spindle – It rotates the platters at a high speed.
- Magnetic platter – It stores the actual data in magnetic form (i.e. 0 or 1).
- Actuator – It controls the movement of the actuator arm which reads the data from the platter.
- Actuator arm – It moves back and forth across the magnetic platter so that the read-write head can read or write data.
- Read-write head – It reads and writes the bits (0 or 1) stored on the magnetic platter.
- Actuator axis – It is the pivot that allows the actuator arms to move back and forth.
- Plug connection – It connects the hard disk to the computer’s motherboard to be used by the computer.
- Circuit board (below the hard disk) – It allows the flow of data to and from the platter through the read-write head.
- Connector (orange cable in picture below) – It allows movement of data from the circuit board to the read-write head.
The magnetic platter is divided into billions of tiny areas, each of which represents a bit. Each of those areas can be independently magnetized in one of two directions to store a bit, i.e. either 1 or 0. Magnetism lets us retain data even when the power is switched off, because each of these areas keeps its magnetism without power. Later on, the data is retrieved from the platter by the read-write head, which sits on the actuator arm, which in turn is mounted on the actuator. A hard disk may consist of multiple platters, one over the other.
The following terms are required to understand storage of data on a hard disk:
- Track – The hard disk platter is divided into concentric bands called tracks. The actual data is stored onto such tracks. Track numbers start at 0 (the outermost track) and increase as we move in towards the central spindle.
- Sector – It is a segment of a track. In the simple layout described here, each track has the same number of sectors. This means that the sectors are packed much closer together on tracks near the center of the disk.
- Cluster – It is a collection of sectors and is the smallest unit for storing files on the hard disk.
- Cylinder – A cylinder consists of the set of tracks that are at the same head position on the disk in different platters.
Now, files are stored on the hard disk by the Operating System based on the file structure of the File System (e.g. FAT, FAT32, NTFS, ext4, ZFS, etc.). Each file system has a different way of storing files, but the data in those files is always stored in clusters. Assuming the cluster size on a hard disk is 512 bytes, let us see how a file is stored on the hard disk.
Let us assume that the user wants to save 2 files: the first of size 200 bytes and the second of 600 bytes. Since the smallest unit for storing files on the hard disk is a cluster, two files cannot share the same cluster. Thus, the first file may be stored in cluster 1, where it occupies 200 bytes and leaves 312 bytes unused. Since the second file is 600 bytes, two clusters are needed to store it, so it will be stored in clusters 2 and 3, leaving 424 of their 1024 bytes unused. This unused space at the end of a file's last cluster cannot be used by any other file and is called “Slack Space”.
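The arithmetic behind this example can be sketched in Python as follows. The cluster size of 512 bytes is the one assumed above, and the function names `clusters_needed` and `slack_space` are made up for this illustration. Real file systems also keep bookkeeping data that this sketch ignores.

```python
CLUSTER_SIZE = 512  # bytes per cluster, as assumed in the example above


def clusters_needed(file_size: int, cluster_size: int = CLUSTER_SIZE) -> int:
    """Return how many whole clusters a file of this size occupies."""
    # Dividing the negated size rounds up without using decimal fractions.
    return -(-file_size // cluster_size)


def slack_space(file_size: int, cluster_size: int = CLUSTER_SIZE) -> int:
    """Return the unused bytes left in the file's last cluster."""
    return clusters_needed(file_size, cluster_size) * cluster_size - file_size


for file_size in (200, 600):
    print(file_size, clusters_needed(file_size), slack_space(file_size))
# 200 1 312
# 600 2 424
```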
A phenomenon associated with file storage is Disk Fragmentation. Whenever you store files on your hard disk, they are saved in the free space that exists on the disk. Usually the contents of a file are stored sequentially so as to make optimal use of the hard disk. But when you delete a few files, the space they free up may lie between files that are still on the disk. The Operating System and the software installed on the computer also create temporary files which are deleted often, leaving more such gaps. Now if you save a file into one of these gaps, but the file is bigger than the gap, the rest of the file will be saved in another free location. Thus, the contents of the file are not stored sequentially, i.e. they are fragmented. This is called File Fragmentation. It slows down access to files since the hard disk has to read from multiple locations to access one file. The process by which the file fragments are brought together into contiguous space is called Defragmentation. Defragmentation often involves shifting one or more files across clusters.
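To see how fragmentation arises, here is a toy model in Python. It treats the disk as a short list of clusters and always fills the first free clusters it finds. This is a deliberate simplification for illustration, not how any particular file system chooses clusters. All the names (`write_file`, `delete_file`, `is_fragmented`, `defragment`) are hypothetical.

```python
FREE = None  # marker for an unused cluster


def write_file(disk: list, name: str, cluster_count: int) -> list:
    """Place a file in the first free clusters found and return their positions."""
    free_positions = [i for i, owner in enumerate(disk) if owner is FREE]
    if len(free_positions) < cluster_count:
        raise ValueError("not enough free clusters")
    used = free_positions[:cluster_count]
    for position in used:
        disk[position] = name
    return used


def delete_file(disk: list, name: str) -> None:
    """Mark every cluster that belongs to the file as free again."""
    for position, owner in enumerate(disk):
        if owner == name:
            disk[position] = FREE


def is_fragmented(positions: list) -> bool:
    """A file is fragmented when its clusters do not form one unbroken run."""
    return any(b - a != 1 for a, b in zip(positions, positions[1:]))


def defragment(disk: list) -> list:
    """Return a new layout with each file's clusters together and free space last."""
    names = []
    for owner in disk:
        if owner is not FREE and owner not in names:
            names.append(owner)
    packed = [name for name in names for _ in range(disk.count(name))]
    return packed + [FREE] * (len(disk) - len(packed))


disk = [FREE] * 8
write_file(disk, "A", 2)                # clusters 0 and 1
write_file(disk, "B", 2)                # clusters 2 and 3
write_file(disk, "C", 2)                # clusters 4 and 5
delete_file(disk, "B")                  # leaves a 2-cluster gap
d_positions = write_file(disk, "D", 3)  # does not fit in the gap
print(d_positions, is_fragmented(d_positions))  # [2, 3, 6] True
print(defragment(disk))
```

File D needs three clusters but the gap left by B holds only two, so its last cluster lands after file C. The `defragment` step then shifts file C so that D's clusters sit next to each other.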
Some other terminology:
- 1 Kilobyte = 1000 Bytes
- 1 Megabyte = 1000 Kilobytes
- 1 Gigabyte = 1000 Megabytes
- 1 Terabyte = 1000 Gigabytes
It is to be noted that:
- 1 KiB (Kibibyte) = 1,024 B (Bytes) (2^10 Bytes)
- 1 kb (Kilobit) = 125 B (Bytes) (10^3 Bits ÷ (8 bits / byte) = 125 B)
- 1 kB (Kilobyte) = 1,000 B (Bytes) (10^3 Bytes)
- 1 MiB (Mebibyte) = 1,048,576 B (Bytes) (2^20 Bytes)
- 1 Mb (Megabit) = 125,000 B (Bytes) (10^6 Bits ÷ (8 bits / byte) = 125,000 B)
- 1 MB (Megabyte) = 1,000,000 B (Bytes) (10^6 Bytes)
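The units listed above can be collected into a small Python lookup table to convert any of them into bytes. This is a minimal sketch: the table name `BYTES_PER_UNIT` and the function `to_bytes` are made up for this example, and the unit symbols are case-sensitive (kb is kilobit, kB is kilobyte).

```python
BITS_PER_BYTE = 8

BYTES_PER_UNIT = {
    # Decimal byte units (powers of 10)
    "kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12,
    # Binary byte units (powers of 2)
    "KiB": 2**10, "MiB": 2**20,
    # Bit units, converted to bytes by dividing by 8
    "kb": 10**3 // BITS_PER_BYTE, "Mb": 10**6 // BITS_PER_BYTE,
}


def to_bytes(amount: int, unit: str) -> int:
    """Convert an amount in the given unit to a number of bytes."""
    return amount * BYTES_PER_UNIT[unit]


for unit in ("KiB", "kb", "kB", "MiB", "Mb", "MB"):
    print(f"1 {unit} = {to_bytes(1, unit)} B")
```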
For Wikipedia entry on ASCII, click here.