Elements of cryptography: Hashing algorithms

Don’t be scared, keep reading!

I decided to start the article this way because seeing words such as cryptography and hashing algorithms in the same title can be intimidating.

My aim, in this series of articles dedicated to cryptography, is to explain in a simple and comprehensible way (I won’t use a single math formula, I swear) how the main cryptographic techniques work and where they are applied. By the end you should at least be able to explain to your clients why, under certain conditions, they don’t have to be scared of using their credit card for online purchases.

The topics we will deal with in this series are the following:

  • Hashing algorithms.
  • Symmetric cryptography.
  • Asymmetric cryptography.

Let’s start with the first topic of the series, that is, hashing algorithms.

Putting strings in the grinder

In English, to hash means to mince: to chop something, typically meat, into small pieces. Hashing algorithms do exactly this. They take a string of arbitrary length and produce a fixed-length string (the length depends on the algorithm) from which it is not practically possible to go back to the original string. But let’s see what this means in practice.

To do this we will use a well-known and widely used algorithm (albeit considered outdated by some): MD5. The principles we will see below apply to all hashing algorithms anyway.

MD5 produces a 128-bit hash, written as 32 hexadecimal characters, starting from a string of arbitrary length.

If we pass “hello” to MD5, we obtain this hash:

5d41402abc4b2a76b9719d911017c592

If you want to try it yourself, PHP provides the md5() function, which is very simple to use:

echo md5("hello");

Now let’s try passing the entire content of this article to MD5. The result is the following:

fd2f23f583ab1ebf44952a744f65bd6e

Once again, a string of 32 characters.
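If you want to check the fixed-length behaviour yourself, here is a minimal sketch. The two sample inputs are my own placeholders (the long one just repeats a sentence a thousand times to stand in for a whole article); only md5() and strlen() come from PHP itself.

```php
<?php
// Two inputs of very different length (both are made-up sample values).
$inputs = [
    'hello',
    str_repeat('a much longer piece of text. ', 1000),
];

foreach ($inputs as $input) {
    $hash = md5($input);
    // strlen($hash) counts the hexadecimal characters of the hash.
    echo strlen($input) . ' characters in, ' . strlen($hash) . " characters out\n";
}
```

Whatever goes in, the "characters out" figure should always be 32.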

And now let’s see something very interesting: let’s pass the content of that same article again, but after removing the first comma from the text.

The result will be the following:

cae3c5fd98e04344be0cd2ccecde0a44

A completely different hash, just because of an imperceptible modification in a very long string. This takes us to the first important application of hashing algorithms.
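The same experiment can be tried on a much shorter text. This is only a sketch: the two sentences are sample values of mine, identical except for one comma, and the loop simply counts in how many of the 32 positions the two hashes differ.

```php
<?php
// Sample sentences (assumed for illustration): the second lacks the comma.
$original = 'Do not be scared, keep reading!';
$modified = 'Do not be scared keep reading!';

$hashA = md5($original);
$hashB = md5($modified);

// Count the positions at which the two 32-character hashes differ.
$differing = 0;
for ($i = 0; $i < 32; $i++) {
    if ($hashA[$i] !== $hashB[$i]) {
        $differing++;
    }
}

echo $hashA . "\n" . $hashB . "\n";
echo "Positions that differ: $differing of 32\n";
```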

Checksum: avoiding surprises

When downloading a file, you have surely come across a line like this:

md5 checksum: cae3c5fd98e04344be0cd2ccecde0a44

Well, what’s its function then?

It’s pretty simple. Whoever makes the file available for download also publishes the hash obtained by applying MD5 to that file.

Once you have downloaded the file, you can pass it to MD5 again, and the resulting hash should be identical. If it is not, there are two possible reasons:

  1. While downloading, part of the file was lost or corrupted (a single bit is enough: as we saw earlier, the slightest change produces a different hash).
  2. While downloading, the file was intercepted and modified (perhaps with the addition of malicious code).

Either way, by verifying the integrity of the file we have downloaded, this technique protects us from running unnecessary risks.
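The check described above can be sketched in a few lines of PHP. The function name checksumMatches() is hypothetical, and the temporary file only simulates a download; md5_file() and hash_equals() are standard PHP functions.

```php
<?php
/**
 * Returns true when the MD5 hash of the file at $path equals the
 * checksum published next to the download (hypothetical helper).
 */
function checksumMatches(string $path, string $publishedChecksum): bool
{
    $actual = md5_file($path); // false if the file cannot be read
    if ($actual === false) {
        return false;
    }
    // Normalise the published value, then compare the two hex strings.
    return hash_equals(strtolower(trim($publishedChecksum)), $actual);
}

// Simulated download: a temporary file stands in for the real one.
$path = tempnam(sys_get_temp_dir(), 'dl_');
file_put_contents($path, 'original file contents');
$published = md5_file($path); // what the publisher would list on the page

var_dump(checksumMatches($path, $published)); // bool(true)

// Simulate corruption or tampering by changing a single character.
file_put_contents($path, 'original file content$');
var_dump(checksumMatches($path, $published)); // bool(false)

unlink($path);
```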

You can find many applications on the Internet that calculate the MD5 hash of a file.

Storing passwords correctly

Unfortunately, I often come across small homemade applications (and sometimes even professional ones!) in which user passwords are stored in the database as they are, in plain text. This is wrong and must absolutely be avoided, for basically two reasons:

  1. A breach of the database would automatically expose the passwords of all our users. The attacker could then log in to the application and pass as any user (even an administrator, why not). Besides, consider that many people use the same password for everything, so this is a serious violation of our users’ privacy.
  2. Again on privacy grounds, it is not right that whoever has access to the database (even legitimately) can see the users’ passwords.

That’s why passwords must be passed through a hashing algorithm before being stored. So, before storing a password in the database, I will pass it to MD5, for example.

When the user enters their login details, I will pass the submitted password to MD5 and compare the resulting string with the one stored in the database.

This way, even if I knew the content of the password field in the database, it would be of no use to me. I could not trace it back to the original password, nor could I use that string as a password, since at login it would itself be passed to MD5, producing a different string.
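Here is a rough sketch of that flow. A plain array stands in for the users table, and the register() and login() functions, like the username, are names I made up for the example; a real application would read and write the database instead.

```php
<?php
// Stand-in for the users table: username => stored MD5 hash.
$users = [];

// Hypothetical helper: store only the hash, never the password itself.
function register(array &$users, string $username, string $password): void
{
    $users[$username] = md5($password);
}

// Hypothetical helper: hash the submitted password and compare the hashes.
function login(array $users, string $username, string $password): bool
{
    if (!isset($users[$username])) {
        return false;
    }
    return hash_equals($users[$username], md5($password));
}

register($users, 'mario', 'pippo');

var_dump(login($users, 'mario', 'pippo'));         // bool(true)
// Submitting the stored hash as if it were the password does not work,
// because it gets hashed again before the comparison.
var_dump(login($users, 'mario', $users['mario'])); // bool(false)
```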

Limits and remedies

At the beginning of this article we said that MD5 is considered outdated. There are two main reasons behind this statement:

Weak passwords

We know that users often choose unsuitable passwords that are too common and ordinary. And we also know that it’s easy to find large databases on the web that map strings to their MD5 hashes.

Thus, if my password is “pippo”, its hash will be:

0c88028bf3aa6a6a143ed846f2be1ea4

Now try to look up this hash in one of the many such databases.

See? Way too easy!

A possible remedy for this flaw is to make the users’ passwords a lot more complex, without the users even needing to know about it. How?

We define two constants in our configuration file:

define('PRE_PASSWORD', '#$[[a56?][*{00l45%!@wrv7');
define('POST_PASSWORD', 'Nel mezzo del cammin di nostra vita mi ritrovai in una selva oscura, che la diritta via era smarrita.  Ahi quanto a dir qual era è cosa dura');

Now, when we save the password, we won't just pass it to MD5; we will modify it first. So instead of this:

$passwordToBeStored = md5($password);

we will use this:

$passwordToBeStored = md5(PRE_PASSWORD . $password . POST_PASSWORD);

In this way the string “pippo” will become:

#$[[a56?][*{00l45%!@wrv7pippoNel mezzo del cammin di nostra vita mi ritrovai in una selva oscura, che la diritta via era smarrita. Ahi quanto a dir qual era è cosa dura.

Passed to MD5, this gives the following hash:

8b1cc7082c7706a4929c8fc26e92a09e

I am curious to see if you can find it in any database...

Obviously at login too, when the user enters “pippo”, we have to add PRE_PASSWORD and POST_PASSWORD and pass the result to MD5 before comparing it with the string stored in the database.
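Putting the two steps together, a minimal sketch could look like this. The function names saltedHash() and passwordMatches() are hypothetical, and POST_PASSWORD is shortened here just to keep the example compact.

```php
<?php
// Illustrative values; in a real project these live in the configuration file.
define('PRE_PASSWORD', '#$[[a56?][*{00l45%!@wrv7');
define('POST_PASSWORD', 'Nel mezzo del cammin di nostra vita');

// Hypothetical helper: wrap the password in the two constants, then hash.
function saltedHash(string $password): string
{
    return md5(PRE_PASSWORD . $password . POST_PASSWORD);
}

// Hypothetical helper: apply the same transformation at login and compare.
function passwordMatches(string $submitted, string $storedHash): bool
{
    return hash_equals($storedHash, saltedHash($submitted));
}

$stored = saltedHash('pippo'); // done once, when the password is saved

var_dump(passwordMatches('pippo', $stored)); // bool(true)
var_dump(passwordMatches('pluto', $stored)); // bool(false)
var_dump($stored === md5('pippo'));          // bool(false): not the plain hash
```

For what it's worth, PHP also ships password_hash() and password_verify(), which generate a random salt for every password; they are a different technique from the fixed constants shown here.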

Collision resistance

A collision in cryptography occurs when a hashing algorithm produces the same hash starting from two different strings.

We saw that MD5 always produces a 128-bit string. Thus the number of possible MD5 hashes is 2^128.

That is an astronomical number, but it is still finite. The possible input strings, on the other hand, are infinite, as there is no limit to their length.

We can conclude that theoretically there is an infinite number of collisions.
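To see why collisions are unavoidable, we can shrink the problem. The sketch below is a toy, not MD5 itself: tinyHash() (a name I made up) keeps only the first hexadecimal character of the MD5 hash, so there are just 16 possible outputs, and trying 17 different inputs is guaranteed to produce at least one collision.

```php
<?php
// Toy hash (hypothetical): only the first hex character, so 16 possible outputs.
function tinyHash(string $input): string
{
    return substr(md5($input), 0, 1);
}

// Try inputs "string-0", "string-1", ... until two share the same toy hash.
function findCollision(int $maxInputs): ?array
{
    $seen = []; // toy hash => first input that produced it
    for ($i = 0; $i < $maxInputs; $i++) {
        $input = "string-$i";
        $hash = tinyHash($input);
        if (isset($seen[$hash])) {
            return [$seen[$hash], $input, $hash];
        }
        $seen[$hash] = $input;
    }
    return null;
}

// 17 inputs, 16 possible outputs: at least two inputs must collide.
$collision = findCollision(17);
printf("\"%s\" and \"%s\" both give \"%s\"\n", $collision[0], $collision[1], $collision[2]);
```

With the full 128 bits the same reasoning holds; there are simply far too many possible outputs to stumble on a collision this way.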

That's why you hear about collision resistance. As we have just seen, unless we limit the length of the input strings, collisions will always exist; what matters is that it should be practically impossible to find them. Thus a hashing algorithm, to be solid, has to be resistant to collisions.

In other words, it should not be possible to devise a procedure (or, if one exists, it should be computationally absurd) that finds two strings producing the same hash.

In practice: if “pippo” produces 0c88028bf3aa6a6a143ed846f2be1ea4, is it possible to establish a procedure that lets me find another string that gives the same hash?

Theoretically yes. We know the MD5 algorithm, and we can study the peculiarities of the string “pippo” that led to that hash.

But if the procedure we come up with requires ten supercomputers running for three years, then it is computationally absurd and thus practically useless.

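Because its collision resistance has been broken (researchers have shown how to construct different inputs with the same MD5 hash), MD5 is considered outdated by many, and my advice is to use algorithms of the SHA family, such as SHA-256.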
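Switching algorithm in PHP is mostly a matter of calling the generic hash() function with a different name. A small sketch, using SHA-256 as one example from the SHA family (the choice of SHA-256 is mine, not a recommendation for every use case):

```php
<?php
$input = 'pippo';

// hash() takes the algorithm name as its first argument.
$md5    = hash('md5', $input);    // same value as md5($input)
$sha256 = hash('sha256', $input);

echo 'MD5:     ' . strlen($md5) . " hex characters\n";    // 32
echo 'SHA-256: ' . strlen($sha256) . " hex characters\n"; // 64
```

A longer hash means a larger space of possible outputs: 2^256 for SHA-256 instead of 2^128.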

Conclusion

In this article we started to get acquainted with the fascinating world of cryptography. I hope I have treated the topic simply enough, given how difficult the subject is. In the next article we'll see the principles of symmetric cryptography or, in a few words, how to exchange secret messages.

And you, do you use hashing algorithms to store passwords? Which algorithms do you consider the most secure? Do you always verify the hash of the files you download (when available)?


The Author

Maurizio is married to the triad PHP - MySql - Apache and, as if that were not enough, he has a lover called jQuery. He has a blog where he tries to describe all of "his lovers" in detail. His real specialty is building large business applications, although he never turns down a website project.
