A checksum is a small block of data used to detect errors that creep into files during transmission or storage. Checksums are frequently used to verify data integrity, but on their own they do not verify data authenticity. In simple terms, a checksum is a short sequence of numbers and letters used to check data for errors.
Now you may be wondering: what is the point of a checksum? Checksums let you detect errors introduced while a file is being downloaded or stored, and they can help reveal whether a file has been tampered with. In this article, we will explore checksums, different checksum algorithms, and why you should use them when downloading files.
Brief Overview: What Are Checksums?
As stated, a checksum is a small block of data used to detect errors introduced into files during transmission or storage. Checksums are essentially digital fingerprints created from a series of bytes, most typically the contents of a file. They are normally generated for whole files but can also be generated at a finer level, such as for individual frames in a movie or records in a database. A good checksum is effectively unique to its input: any alteration to the data, no matter how minor, causes the checksum value to change completely. The checksums discussed in this article are produced by running the data through a cryptographic hash function. In the next part, we will go through the various hashing methods in further detail.
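To see how a tiny alteration changes the whole checksum, here is a minimal sketch that hashes two strings differing by one character. It assumes the GNU coreutils `sha256sum` tool is installed; the sample strings and variable names are made up for illustration.

```shell
# Hash two inputs that differ only in their final character.
# For standard input, sha256sum prints "<hash>  -"; cut keeps only the hash.
hash_a=$(printf 'hello world' | sha256sum | cut -d' ' -f1)
hash_b=$(printf 'hello worle' | sha256sum | cut -d' ' -f1)

echo "input A: $hash_a"
echo "input B: $hash_b"

if [ "$hash_a" != "$hash_b" ]; then
    echo "One changed character produced a different checksum."
fi
```

Placing the two printed values side by side shows that they have almost nothing in common, even though the inputs are nearly identical.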
What Are the Different Checksum Algorithms?
You generate a checksum by running a file through a program that implements a hashing algorithm. A cryptographic hash function takes an input, whether it be a file, a piece of data, or a web page, and produces a fixed-length string of characters known as a hash. With a secure algorithm, it is practically impossible to find two different pieces of data that produce the same hash value, and even the tiniest tweak to a piece of data results in a different hash. There are many algorithms out there, but the most popular ones are SHA-256, MD5, and SHA-1. Since every cryptographic hash function gives you a hashed output no matter which one you use, what are the differences between them, and why should you choose one over the other? Let's talk about that here.
SHA-256:
SHA-256 belongs to the SHA-2 family of algorithms, with SHA standing for Secure Hash Algorithm. It is a patented cryptographic hash algorithm that generates a 256-bit result, usually written as 64 hexadecimal characters. SHA-256 is used in several common authentication and encryption protocols, such as SSL, TLS, and SSH, and it is one of the most secure hash methods available. It is nearly impossible to recreate the original data from the hash value, and finding two different inputs that produce the same hash is just as impractical. SHA-256 also makes it simple to detect even minor changes: a small modification to the original data alters the hash so much that the new value gives no hint that it came from nearly identical data.
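The 256-bit result has the same length no matter how large the input is. The sketch below illustrates this by hashing an empty input and 1 MiB of zero bytes; it assumes GNU coreutils, and the variable names are only for this example.

```shell
# Hash an empty input and a 1 MiB input made of zero bytes.
empty_hash=$(printf '' | sha256sum | cut -d' ' -f1)
large_hash=$(head -c 1048576 /dev/zero | sha256sum | cut -d' ' -f1)

# Each hexadecimal character encodes 4 bits, so 64 characters = 256 bits.
echo "empty input: ${#empty_hash} hex characters"
echo "1 MiB input: ${#large_hash} hex characters"
```

Both lines report 64 hexadecimal characters, which is why a checksum stays small even when it covers a multi-gigabyte ISO.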
MD5:
MD5 runs a whole file through a mathematical hashing algorithm to create a signature that can be compared with the signature of the original file or data. This makes it possible to verify that a file received matches the one that was transmitted. MD5 produces a 128-bit hash, written as a string of 32 hexadecimal characters. Previously, we spoke about how difficult it is to find two inputs with the same hash using SHA-256. With MD5, this is practical: an attacker can deliberately construct two different files that share an MD5 hash, which is known as a collision. SHA-256 addresses this shortcoming, and its longer 64-character hashes also make the two kinds easy to tell apart at a glance. However, MD5 is sufficient when security isn't a concern, and it is still a useful way to detect accidental corruption.
SHA-1:
SHA-1 is an earlier member of the SHA family. It is a cryptographic hash function that accepts an input and returns a 160-bit hash value, commonly represented as a 40-digit hexadecimal number. SHA-1 attempted to improve security by addressing a flaw discovered in SHA-0; nevertheless, SHA-1 was also determined to be vulnerable. Researchers produced two different PDF files that share the same SHA-1 hash, which is a collision. When two different files can be made to produce the same hash value, the function is considered cryptographically broken and shouldn't be used where security matters. Despite this, SHA-1 still appears in older systems and remains acceptable for non-security purposes such as detecting accidental corruption.
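One quick way to tell the three algorithms apart is the length of the hash they print. The following sketch runs the same sample input through `md5sum`, `sha1sum`, and `sha256sum`, assuming all three GNU coreutils tools are available on your system.

```shell
# Hash the same input with each algorithm and report the digest length.
for tool in md5sum sha1sum sha256sum; do
    digest=$(printf 'hello' | "$tool" | cut -d' ' -f1)
    # Each hexadecimal character represents 4 bits of the digest.
    echo "$tool: ${#digest} hex characters ($(( ${#digest} * 4 )) bits)"
done
```

The loop reports 32, 40, and 64 hexadecimal characters respectively, so the length of a published checksum is usually a good hint about which algorithm produced it.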
What Does This Mean?
After seeing a few examples of cryptographic hash functions and how they differ from each other, you can decide which one to use. If you are a company or business that publishes downloadable files on your website, you might want to use a more secure algorithm such as SHA-256 if security is your main concern. On the other hand, a lot of companies today still use MD5 to verify files, so the choice depends on your situation. If you are a user downloading files that come with a checksum, you have to use the same algorithm the distributor used in order to confirm the integrity of the file. For example, if a website uses SHA-256 to create the checksum of a file, you also have to use SHA-256 to confirm that the checksum is correct. Keep following along below as we show you how to do that.
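The point about matching the distributor's algorithm can be sketched as a small helper. `verify_file` is a hypothetical function written for this article, not a standard command; it assumes the GNU coreutils hashing tools and uses a throwaway file in place of a real download.

```shell
# verify_file ALGO FILE EXPECTED_HASH
# Hypothetical helper: picks the tool that matches the publisher's algorithm
# and compares the computed hash with the published one.
verify_file() {
    local algo="$1" file="$2" expected="$3" tool actual
    case "$algo" in
        md5)    tool=md5sum ;;
        sha1)   tool=sha1sum ;;
        sha256) tool=sha256sum ;;
        *)      echo "unsupported algorithm: $algo" >&2; return 2 ;;
    esac
    # Read the file on standard input so the output is just "<hash>  -".
    actual=$("$tool" < "$file" | cut -d' ' -f1)
    # Hashes may be published in upper or lower case; normalise first.
    expected=$(printf '%s' "$expected" | tr 'A-F' 'a-f')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file matches the published $algo checksum"
    else
        echo "MISMATCH: $file does not match the published $algo checksum" >&2
        return 1
    fi
}

# Demo with a throwaway file standing in for a download.
demo_file=$(mktemp)
printf 'hello' > "$demo_file"
verify_file sha256 "$demo_file" 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
```

Calling the helper with `sha1` or `md5` and the same SHA-256 value reports a mismatch, even though the file is intact, because the hash was computed with a different algorithm than the one published.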
Doing a Checksum: Rocky Linux 9
For this example, I will be verifying a Rocky Linux 9 download using the checksum provided on the Rocky Linux website. To follow along, head over to the Download Rocky Linux page.
On the download page, right-click the ISO that matches the architecture of your system and click Copy link.
After you copy the link, let's open a terminal and run the command shown below:
wget <ISO link you copied>
After running the command above, you should have the minimal ISO file in the directory where you ran the command. We can then verify its checksum by returning to the same Rocky Linux download page and copying the link for the checksum file. After you copy that link, run the command below:
wget <checksum link you copied>
When downloading the checksum, make sure you copy the link for the one that matches the same architecture. Once it is downloaded, the ISO file and the checksum file should be in the same directory. You can confirm that by running the ls command in the terminal. After confirming that, we can verify the checksum in two ways, as shown below:
1st Way:
Run the command shown below:
sha256sum Rocky-9.0-x86_64-minimal.iso
This command generates a SHA-256 hash for the Rocky Linux 9 ISO we just downloaded, which we can check against the list of checksums from the Rocky Linux website. The output is a single line containing the 64-character hash followed by the name of the ISO file.
Now to check this against the known checksum, we can just use the cat command to view the contents of the checksum file we downloaded. Run the following command:
cat CHECKSUM
The output is a list of quite a few hashes, each next to the name of the file it belongs to. If you look a little closer, you can see that the third entry carries the name of our download, and its hash matches the one we just generated for the ISO file. This is one of the ways to verify the checksum. We will show you the other way below.
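Comparing 64 characters by eye is error-prone, so the lookup can be left to grep. This is only a sketch: it builds stand-in files in a scratch directory, and the file names (`demo.iso`, `other.iso`) and the `SHA256 (name) = hash` line layout are assumptions made for the demo rather than a statement about the real CHECKSUM file.

```shell
# Work in a scratch directory so the demo does not touch real downloads.
workdir=$(mktemp -d)

# Stand-ins for the ISO and the publisher's checksum list (hypothetical names).
printf 'pretend this is an ISO image' > "$workdir/demo.iso"
{
    printf 'SHA256 (other.iso) = %064d\n' 0
    printf 'SHA256 (demo.iso) = %s\n' "$(sha256sum < "$workdir/demo.iso" | cut -d' ' -f1)"
} > "$workdir/CHECKSUM"

# Compute the hash of the download, then search the list for a line that
# contains both that hash and the file name.
computed=$(sha256sum < "$workdir/demo.iso" | cut -d' ' -f1)
if grep -F "$computed" "$workdir/CHECKSUM" | grep -qF "demo.iso"; then
    result="match"
else
    result="no match"
fi
echo "demo.iso: $result"
```

With a real download, you would point the same two commands at the ISO and the CHECKSUM file you fetched instead of the generated stand-ins.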
2nd Way:
The second way to verify the checksum is with one simple command shown below:
sha256sum -c CHECKSUM --ignore-missing
For the command above to work, the CHECKSUM file and the ISO file have to be in the same directory. The command reads each entry in the CHECKSUM file, computes the SHA-256 hash of the matching file, and compares the two; the --ignore-missing option skips entries for files we did not download. If the check succeeds, the output shows the ISO file name followed by OK.
If the checksum values do not match, then most likely something happened during transit or during the download process. A checksum does not tell you what changed or why, only that the file is no longer identical to the original. The change could be nothing major, but it is best practice to treat it as a possible security issue and assume that the download is not safe to open or run.
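To see both outcomes without downloading anything, the sketch below runs the same check on a throwaway file before and after it is modified. The file names are invented for the demo, and it assumes a GNU coreutils version recent enough to support `--ignore-missing`.

```shell
workdir=$(mktemp -d)
printf 'original contents' > "$workdir/demo.iso"

# Record the checksum while the file is known to be good, and add an entry
# for a file that was never downloaded (64 zeros as a dummy hash).
( cd "$workdir" && sha256sum demo.iso > CHECKSUM )
printf '%064d  absent.iso\n' 0 >> "$workdir/CHECKSUM"

# First check: the file is unchanged, so the exit status is 0.
status_before=0
( cd "$workdir" && sha256sum -c CHECKSUM --ignore-missing ) || status_before=$?

# Simulate corruption by appending a single byte.
printf 'x' >> "$workdir/demo.iso"

# Second check: the hash no longer matches, so the exit status is non-zero.
status_after=0
( cd "$workdir" && sha256sum -c CHECKSUM --ignore-missing ) || status_after=$?

echo "exit status before: $status_before, after: $status_after"
```

Because the command signals failure through its exit status as well as its printed message, a script can refuse to continue with an installation when the check does not pass.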
Why Do You Need Checksums?
Checksums can have many uses. To list a few:
- Detection of data corruption or loss when data is stored, such as on disks.
- Detection of data corruption or loss during data transmission.
- Detection of purposeful and harmful efforts to modify the contents of data.
Checksums help you catch accidental data corruption or loss as well as deliberate tampering, whether it results from a cyber-attack or from an individual trying to alter data without detection, and they help you keep track of the integrity of your files and data. Whether you are using Ubuntu, Rocky Linux, or anything else, it is always good to create checksums for your own files and for files you download, both to keep track of them and to avoid potentially huge security issues. Additionally, checksum files are small and easy to store, so you won't have to worry about them taking up space.
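Creating checksums for your own files can be as simple as writing the output of `sha256sum` to a manifest and re-checking it later. The sketch below assumes GNU coreutils; the directory, the file names, and the manifest name `SHA256SUMS` are placeholders chosen for this example.

```shell
# Scratch directory with two files standing in for data worth protecting.
archive=$(mktemp -d)
printf 'report\n' > "$archive/report.txt"
printf 'backup\n' > "$archive/backup.tar"

# Create a manifest: one "<hash>  <filename>" line per file.
( cd "$archive" && sha256sum report.txt backup.tar > SHA256SUMS )

# The manifest stays small regardless of file size:
# 64 hex characters plus the file name per entry.
manifest_bytes=$(wc -c < "$archive/SHA256SUMS")
echo "manifest size: $manifest_bytes bytes"

# Later, re-run the check to see whether anything changed on disk.
# --quiet suppresses the per-file OK lines and prints only failures.
check_status=0
( cd "$archive" && sha256sum -c --quiet SHA256SUMS ) || check_status=$?
echo "check status: $check_status"
```

Keeping the manifest somewhere separate from the files it describes makes it harder for the same fault, or the same attacker, to alter both at once.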
Final Thoughts
If you value security and care about the integrity of your own files as well as the files you download from the internet, then checksums should matter to you. Whether you are downloading security tools for Linux, new Linux distributions, or development tools, it is important to verify that what you are downloading is what it claims to be. There are many cases where files and downloads have been tampered with, and the damage could have been avoided had a checksum been verified. Nowadays, users tend to download things without worrying about where they come from or what could happen if a file has been tampered with. As a security best practice, always verify checksums, especially on Linux servers, since admins often find themselves downloading a lot of files from the web. Thank you for following along with this article, and we hope you can now put checksums to use!