Understanding Hash Functions: MD5, SHA-1, and SHA-256
Hash functions are one of the fundamental building blocks of computer security, yet many developers only encounter them as "pick MD5 or SHA-256" without understanding what those choices actually mean. This article explains how hash functions work, what makes them cryptographically useful, and when each algorithm is appropriate.
What Is a Hash Function?
A cryptographic hash function takes an input of any size and produces a fixed-size output: the hash, digest, or checksum. The key properties:
- Deterministic: The same input always produces the same output
- Fixed output size: Regardless of input length, output length is constant
- Fast to compute: Hashing is efficient even for large inputs; a typical file takes milliseconds
- Avalanche effect: A tiny change in input produces a completely different output
- One-way: Given only the hash, it is computationally infeasible to recover an input that produces it
- Collision-resistant: It should be computationally infeasible to find two different inputs with the same hash
Try it yourself: hash the word "hello" and then "hello!" with the Hash Generator and observe how completely different the outputs are.
SHA-256("hello") = 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
SHA-256("hello!") = ce06092fb948d9ffac7d1a376e404b26b7575bcc11ee05a4615fef4fec3a308b
The two outputs show no visible relationship, even though the inputs differ by just one character.
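The same comparison can be reproduced in a few lines of Python with the standard-library `hashlib` module. This sketch also counts how many of the 256 output bits changed; the helper names `sha256_hex` and `differing_bits` are illustrative, not part of any library.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def differing_bits(a_hex: str, b_hex: str) -> int:
    """Count the bit positions in which two equal-length hex digests differ."""
    return bin(int(a_hex, 16) ^ int(b_hex, 16)).count("1")

h1 = sha256_hex("hello")
h2 = sha256_hex("hello!")
print(h1)
print(h2)
print(differing_bits(h1, h2), "of 256 bits differ")
```

For a well-designed hash, roughly half of the bits (around 128) should differ between any two distinct inputs.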
The Three Major Algorithms
MD5 (1992)
MD5 produces a 128-bit (16-byte) hash, displayed as a 32-character hex string:
MD5("The quick brown fox jumps over the lazy dog") = 9e107d9d372bb6826bd81d3542a419d6
Status: Cryptographically broken. MD5 has practical collision attacks: researchers can generate two different files with identical MD5 hashes. A famous example: in 2008, researchers exploited an MD5 collision to create a rogue certificate authority (CA) certificate carrying a valid signature from a trusted CA, which could have been used to issue certificates impersonating any HTTPS website.
When it's still acceptable: Non-security file checksums where you control the data (verifying a file wasn't corrupted in transit, not verifying it wasn't tampered with maliciously). Legacy system compatibility. Anywhere the threat model doesn't involve an adversary who can craft collisions.
When to avoid it: Password hashing, digital signatures, certificate fingerprinting, any security-critical use.
SHA-1 (1995)
SHA-1 produces a 160-bit (20-byte) hash, displayed as a 40-character hex string:
SHA-1("The quick brown fox jumps over the lazy dog") = 2fd4e1c67a2d28fced849ee1bb76e7391b93eb12
Status: Deprecated for security use. In 2017, researchers from CWI Amsterdam and Google demonstrated the first practical SHA-1 collision ("SHAttered"), producing two different PDF files with identical SHA-1 hashes. The attack cost roughly $110,000 in cloud computing at the time: expensive in 2017, and increasingly affordable since.
Major browsers stopped accepting SHA-1 TLS certificates in 2017. Git still uses SHA-1 for its object model by default, but has added support for SHA-256 repositories as part of a gradual migration.
When it's still acceptable: Existing Git object IDs (Git uses a hardened SHA-1 implementation that detects known collision attacks), legacy checksums for non-adversarial integrity checking, compatibility with very old systems.
When to avoid it: Any new security application. SSL/TLS. Password hashing.
SHA-256 (2001)
SHA-256 is part of the SHA-2 family and produces a 256-bit (32-byte) hash, displayed as a 64-character hex string:
SHA-256("The quick brown fox jumps over the lazy dog") = d7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592
Status: Currently secure. No practical collision or preimage attacks are known against SHA-256. It is the current standard for security applications and is expected to remain secure for the foreseeable future.
SHA-384 and SHA-512 are SHA-2 variants with larger output sizes. SHA-512 uses 64-bit operations and can actually be faster than SHA-256 on 64-bit systems for large inputs, while providing a larger security margin.
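A short sketch with Python's `hashlib` computes these digests for the pangram "The quick brown fox jumps over the lazy dog" and prints each algorithm's output size. Some FIPS-restricted Python builds may reject MD5.

```python
import hashlib

PANGRAM = b"The quick brown fox jumps over the lazy dog"

# Compare output sizes of the algorithms discussed above.
for name in ("md5", "sha1", "sha256", "sha384", "sha512"):
    h = hashlib.new(name, PANGRAM)
    # digest_size is in bytes; the hex string is twice as long.
    print(f"{name:7} {h.digest_size * 8:4} bits  {h.hexdigest()}")
```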
Properties That Matter
Collision Resistance
A collision occurs when two different inputs produce the same hash. Because of the birthday paradox, a brute-force search finds a collision in an n-bit hash after roughly 2^(n/2) attempts, so for a 256-bit hash it would take about 2^128 operations, well beyond any foreseeable computing capability. The concern with MD5 and SHA-1 isn't brute force but structural weaknesses that allow targeted collision crafting far more efficiently.
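The birthday bound is easy to observe on a deliberately weakened hash. This sketch assumes we keep only the top 32 bits of SHA-256 (an illustrative truncation, not a real algorithm) and hashes distinct inputs until two outputs repeat:

```python
import hashlib
from itertools import count

def truncated_hash(data: bytes, bits: int) -> int:
    """Keep only the top `bits` bits of SHA-256, simulating a much smaller hash."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - bits)

def find_collision(bits: int):
    """Birthday search: hash distinct inputs until two truncated outputs repeat."""
    seen = {}  # truncated hash -> the input that produced it
    for i in count():
        msg = f"message-{i}".encode()
        h = truncated_hash(msg, bits)
        if h in seen:
            return seen[h], msg, i + 1  # the colliding pair and attempts used
        seen[h] = msg

a, b, attempts = find_collision(32)
print(a, b, attempts)
```

With 32 output bits, the search typically finishes after roughly 2^16 (tens of thousands of) attempts rather than 2^32. Each extra output bit doubles a preimage search but multiplies the collision cost by only about √2, which is why a 256-bit hash offers about 128 bits of collision resistance.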
Preimage Resistance
Given a hash value, can you find the input? This property ("one-wayness") must hold for password hashing. If you could reverse a hash, stored password hashes would be trivially compromised in a breach.
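Preimage resistance protects only inputs that are hard to guess. If the input space is small, an attacker never needs to reverse the hash; they can hash every candidate and compare. A sketch, assuming a hypothetical 4-digit PIN stored as an unsalted SHA-256 hex digest:

```python
import hashlib

def crack_pin(target_hex: str):
    """Hash every 4-digit PIN until one matches the target digest."""
    for n in range(10_000):
        pin = f"{n:04d}"
        if hashlib.sha256(pin.encode()).hexdigest() == target_hex:
            return pin
    return None  # no 4-digit PIN produces this hash

stored = hashlib.sha256(b"4821").hexdigest()  # hypothetical leaked hash
print(crack_pin(stored))  # 4821
```

Ten thousand candidates take a fraction of a second, which is why the password advice later in this article calls for deliberately slow algorithms.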
Second Preimage Resistance
Given an input, can you find a different input with the same hash? This matters for document signing: you don't want someone to be able to swap out the document you signed for a different one with the same hash.
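Second preimages are harder to find than collisions because the target is fixed: the birthday trick no longer helps, so brute force against an n-bit hash needs about 2^n attempts instead of 2^(n/2). The sketch below truncates SHA-256 to a hypothetical 16 bits so the search finishes quickly; the document string is made up.

```python
import hashlib
from itertools import count

BITS = 16  # illustrative truncation; real SHA-256 uses all 256 bits

def truncated_hash(data: bytes) -> int:
    """Top BITS bits of SHA-256, standing in for a tiny hash function."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") >> (256 - BITS)

def second_preimage(original: bytes):
    """Search for a different input whose truncated hash matches `original`."""
    target = truncated_hash(original)
    for i in count():
        candidate = f"forged-{i}".encode()
        if candidate != original and truncated_hash(candidate) == target:
            return candidate, i + 1  # the forgery and attempts used

doc = b"Pay Alice $100"
forged, attempts = second_preimage(doc)
print(forged, attempts)
```

Expect on the order of 2^16 (about 65,000) attempts here; at the full 256 bits the same search would need about 2^256.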
Practical Uses
File Integrity Verification
When you download software, the developer often provides a SHA-256 checksum. After downloading, you compute the hash of your downloaded file and compare:
```shell
# Linux/macOS
sha256sum downloaded-file.iso
# Windows PowerShell
Get-FileHash downloaded-file.iso -Algorithm SHA256
```
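If the hashes match, the file is intact. If they differ, the file was corrupted or tampered with. A match rules out tampering only if the published checksum itself comes from a source an attacker cannot modify, such as the developer's own HTTPS site. Use the Hash Generator for quick checksums without the command line.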
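The same check can be scripted. A minimal Python sketch, assuming the published checksum is a hex string copied from the download page; it reads the file in chunks so even large ISO images are hashed without loading them fully into memory:

```python
import hashlib

def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files never sit fully in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_download(path: str, published_hex: str) -> bool:
    """Compare the file's digest with the published checksum, ignoring case and surrounding whitespace."""
    return file_sha256(path) == published_hex.strip().lower()
```

Usage would look like `verify_download("downloaded-file.iso", expected)`, where `expected` is the checksum from the download page. On Python 3.11 and later, `hashlib.file_digest(f, "sha256")` can replace the manual chunk loop.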
Password Storage
Never store passwords as plaintext. Storing hashes allows verification without storing the password itself: when a user logs in, hash what they typed and compare to the stored hash.
But general-purpose hash functions (MD5, SHA-256) are too fast for password storage; a modern GPU can test billions of SHA-256 hashes per second. For passwords, use algorithms specifically designed to be slow: bcrypt, Argon2, or scrypt. These are intentionally computationally expensive to make brute-force attacks impractical.
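Of the recommended algorithms, scrypt is the one available in Python's standard library (`hashlib.scrypt`, present when Python is built against OpenSSL 1.1 or later); Argon2id and bcrypt require third-party packages. A minimal sketch of storing and verifying a password, with cost parameters that are illustrative assumptions rather than tuned recommendations:

```python
import hashlib
import hmac
import os

# Illustrative scrypt cost parameters (assumed, not tuned): n = CPU/memory cost,
# r = block size, p = parallelism. Memory use is about 128 * n * r bytes = 16 MiB.
N, R, P = 2**14, 8, 1

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, key); store both with the user record, never the password."""
    salt = os.urandom(16)  # fresh random salt for every password
    key = hashlib.scrypt(password.encode(), salt=salt, n=N, r=R, p=P, dklen=32)
    return salt, key

def verify_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    key = hashlib.scrypt(password.encode(), salt=salt, n=N, r=R, p=P, dklen=32)
    return hmac.compare_digest(key, stored_key)

salt, key = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, key))  # True
print(verify_password("wrong guess", salt, key))                   # False
```

Because each call draws a fresh salt, two users with the same password end up with different stored keys, and `hmac.compare_digest` avoids leaking how many bytes matched.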
Message Authentication Codes (HMAC)
A plain hash verifies integrity but not authenticity. Anyone can compute a hash. HMAC (Hash-based Message Authentication Code) combines a secret key with the hash:
HMAC-SHA256(key, message)
Only someone with the key can produce or verify the HMAC. This is used in API authentication, JWT signing, and webhook signature verification.
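HMAC maps directly onto Python's standard `hmac` module. Here is a minimal sketch of webhook-style verification; the secret, the payload, and how the signature is delivered (often an HTTP header whose name varies by provider) are hypothetical.

```python
import hashlib
import hmac

def sign(key: bytes, message: bytes) -> str:
    """Compute HMAC-SHA256(key, message) and return it as a hex string."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def verify(key: bytes, message: bytes, received_sig: str) -> bool:
    """Recompute the tag and compare in constant time to avoid timing leaks."""
    expected = sign(key, message)
    return hmac.compare_digest(expected.encode(), received_sig.encode())

secret = b"shared-webhook-secret"  # hypothetical key known to sender and receiver
body = b'{"event": "payment.succeeded"}'  # hypothetical webhook payload
sig = sign(secret, body)

print(verify(secret, body, sig))         # True
print(verify(secret, body + b" ", sig))  # False: the body was altered
print(verify(b"wrong-key", body, sig))   # False: wrong key
```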
Digital Signatures
In public-key cryptography (RSA, ECDSA), you don't sign the full document; you sign its hash. This is faster and allows signing documents of any size. The signature algorithm is paired with a hash function: RSA-SHA256, ECDSA-P256-SHA256, etc.
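A toy illustration of hash-then-sign using textbook RSA with tiny primes (p = 61, q = 53). It is deliberately insecure (no padding, and a 12-bit modulus that discards almost all of the digest) and exists only to show that the signature is computed over the SHA-256 digest rather than the document itself. Real code should use a vetted library and a scheme such as RSA-PSS or ECDSA.

```python
import hashlib

# Textbook RSA with toy primes: INSECURE, for illustration only.
p, q = 61, 53
n = p * q                           # modulus: 3233
e = 17                              # public exponent
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent: 2753 (Python 3.8+)

def digest_int(document: bytes) -> int:
    """SHA-256 the document, then squeeze the digest into the toy modulus."""
    return int.from_bytes(hashlib.sha256(document).digest(), "big") % n

def sign(document: bytes) -> int:
    """Sign the hash of the document with the private key."""
    return pow(digest_int(document), d, n)

def verify(document: bytes, signature: int) -> bool:
    """Undo the signature with the public key and compare against the hash."""
    return pow(signature, e, n) == digest_int(document)

doc = b"A long contract. " * 1000  # document size doesn't matter: only its digest is signed
sig = sign(doc)
print(verify(doc, sig))  # True
```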
Deduplication
Content-addressable storage systems (like Git, IPFS, and backup software) use hashes to identify content. With a collision-resistant hash, two files with the same hash can be treated as the same content, so you only need to store one copy. Git identifies every commit, tree, and blob by its hash (SHA-1 by default).
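A minimal sketch of content addressing, using a hypothetical in-memory `ContentStore` keyed by SHA-256. The `git_blob_id` helper shows how Git derives a blob's ID in a default (SHA-1) repository: it hashes a short header of the form `blob <size>` plus a NUL byte, followed by the file content.

```python
import hashlib

class ContentStore:
    """Hypothetical in-memory content-addressable store keyed by SHA-256."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store data under its hash and return the key."""
        key = hashlib.sha256(data).hexdigest()
        # Identical content always maps to the same key, so it is kept only once.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def __len__(self) -> int:
        return len(self._blobs)

def git_blob_id(content: bytes) -> str:
    """SHA-1 over b"blob <size>" + NUL + content, as Git does for blobs."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

store = ContentStore()
first = store.put(b"quarterly-report.pdf contents")
second = store.put(b"quarterly-report.pdf contents")  # duplicate upload
print(first == second, len(store))  # True 1
```

For an empty file, `git_blob_id(b"")` gives `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, the same ID `git hash-object` reports for an empty file.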
Choosing the Right Algorithm
| Use case | Recommended |
|---|---|
| Password hashing | Argon2id, bcrypt |
| File checksums (integrity) | SHA-256 |
| Digital signatures | SHA-256 or SHA-384 |
| HMAC / API authentication | SHA-256 |
| TLS certificates | SHA-256 |
| Non-security checksums (speed) | MD5 or SHA-1 (acceptable) |
| Blockchain / Bitcoin | SHA-256 |
FAQ
Can two files have the same SHA-256 hash? Theoretically yes (it's a 256-bit space, infinite possible inputs). In practice, no known SHA-256 collisions exist, and finding one by brute force would take longer than the age of the universe with all computing power on Earth.
Is a longer hash always more secure? Not necessarily. A 512-bit hash with structural weaknesses can be less secure than a sound 256-bit hash. Algorithm design matters more than output length alone.
Why is MD5 still used if it's broken? Legacy systems, convenience, and the fact that most MD5 uses are non-adversarial. If you're just checking that a file wasn't corrupted (not tampered with by an attacker), MD5 still works fine.
What does "salting" a hash mean? A salt is a random value added to each password before hashing, so two users with the same password get different hashes. This prevents precomputed "rainbow table" attacks. Modern password hashing libraries (bcrypt, Argon2) handle salting automatically.
Can I use SHA-256 to store passwords? Technically yes, but you shouldn't. SHA-256 is too fast; attackers can test billions of guesses per second. Use bcrypt or Argon2, which are deliberately slow.