Overview
The 2013 Adobe data breach, in which about 40 million account user names and other details were disclosed, along with other data breaches in which cleartext passwords were disclosed, underscores the need to hash passwords rather than store them in cleartext or encrypted form.
Adobe Data Breach: http://krebsonsecurity.com/2013/10/adobe-breach-impacted-at-least-38-million-users/
Cupid Media Data Breach: EDIT: 5/30/2014
Encryption is reversible, where hashing is not.
What is Hashing?
Hashing
Hashing uses an algorithm to scramble data in a deterministic way, but the process is not reversible. For a given piece of information, such as a password, hashing will produce the same hash code each time. The hashing process can’t be reversed, meaning that a stolen hash code can’t be turned back into the original password.
Hashing is like using a cross-cut paper shredder for your data. During the hashing process, most of the data is removed, and the resulting hash code is both effectively unique and very small compared to the original data.
Because data is removed, and the hashing process is so complex, it can’t be reversed — you can’t start with a hash code and derive the original data. This means that hashing is a “one way” process.
Changing the data ever so slightly results in a completely different hash code.
When used to hash passwords, each unique password results, for all practical purposes, in a unique hash.
Sample hashes:
Source | SHA-1 Hash |
---|---|
Password: 12345 | 8cb2237d0679ca88db6464eac60da96345513964 |
Password: 12346 | 94ae0a96d83a445d72a93417b63ac90d79db5eca |
Password: thisismyreallylongpassword | a04f5424328d9b7b7a4d8ce8e0ebf99ffe610c42 |
Contents of File: c:\windows\explorer.exe (Approximately 2.8 MB) | 7a0fd90576e08807bde2cc57bcf9854bbce05fe3 |
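The password rows in the table can be reproduced with any standard SHA-1 implementation. Here is a minimal sketch using Python's hashlib module, assuming the password text is encoded as UTF-8 before hashing. The helper name sha1_hex is only for illustration.

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Return the SHA-1 hash of a string as 40 hexadecimal characters."""
    # Assumption: the text is encoded as UTF-8 bytes before hashing.
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# Hash the three sample passwords and print them next to their hash codes.
for password in ("12345", "12346", "thisismyreallylongpassword"):
    print(f"{password:30} {sha1_hex(password)}")
```

Running the loop twice prints identical hash codes each time, and the first two passwords, which differ by one character, print hash codes that have nothing visibly in common.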
Key points – Hashing
- Hashing is a one-way process
- A specific source password will always yield the same hash using the same algorithm, regardless of which system performs the hash (hash codes are persistent and portable).
- Minor changes to the source can result in a significantly different hash code.
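- Hash codes are the same length regardless of the length of the source data: you can hash a password, a sentence, an e-mail, or a whole file, and the resulting hash codes will all be the same length. With a strong algorithm, it is extremely unlikely (although not strictly impossible) for two different inputs to produce the same hash code.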
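To illustrate the fixed-length point, the sketch below hashes a short password and a 1 MB file and compares the lengths of the two hash codes. The function name sha1_file, the chunk size, and the throwaway temporary file are assumptions made for this example. Any file path would behave the same way.

```python
import hashlib
import os
import tempfile


def sha1_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file of any size without loading it all into memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Feed the file to the hash in fixed-size chunks until it is exhausted.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Create a throwaway 1 MB file of random bytes for the demonstration.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(1024 * 1024))
    demo_path = tmp.name

file_hash = sha1_file(demo_path)
short_hash = hashlib.sha1(b"12345").hexdigest()
os.remove(demo_path)

# Both hash codes are 40 hexadecimal characters (160 bits).
print(len(short_hash), len(file_hash))
```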
Here is a website with an online hash generator:
http://www.sha1-online.com/
Secure Hashing
Using brute-force attacks and precomputed lookup tables called “rainbow tables”, it is possible to successfully attack weaker hash algorithms. The attacker does not reverse the hash; instead, they search for an initial value that results in the desired hash value.
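A brute-force attack can be sketched in a few lines. The example below assumes the attacker already knows that the password is made of at most five digits and that it was hashed with plain SHA-1. The function name brute_force_sha1 and the variable stolen_hash are hypothetical. Note that nothing is reversed: the code simply hashes guesses until one matches.

```python
import hashlib
import itertools
import string
from typing import Optional


def brute_force_sha1(target_hex: str, max_length: int = 5) -> Optional[str]:
    """Try every all-digit candidate up to max_length characters long."""
    for length in range(1, max_length + 1):
        for digits in itertools.product(string.digits, repeat=length):
            candidate = "".join(digits)
            guess = hashlib.sha1(candidate.encode("utf-8")).hexdigest()
            if guess == target_hex:
                return candidate  # found an input that produces the hash
    return None  # no candidate in the search space matched


# Pretend this hash code was taken from a compromised database.
stolen_hash = hashlib.sha1(b"12345").hexdigest()
print(brute_force_sha1(stolen_hash))
```

The search space here is only about 111,000 candidates, which is why short, simple passwords fall quickly. Longer passwords drawn from a larger character set make the same loop impractically slow.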
Early algorithms, such as CRC and CRC32, were used to ensure data integrity. Later, cryptographic hashing algorithms were used to ensure message and sender authenticity. As time progresses and computing power increases, older algorithms are no longer considered cryptographically-secure.
File Integrity | Cryptographically-secure | No longer secure |
---|---|---|
CRC, CRC32, Odd/Even (very old), CKSUM, MD-2, MD-4 | SHA-1*, SHA-256, SHA-2 | MD-2, MD-4, MD-5 |
(*SHA-1 is being deprecated in favor of SHA-2. A practical SHA-1 collision was demonstrated in 2017, so it should not be chosen for new designs.)
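The difference between an integrity checksum and a cryptographic hash shows up clearly in the size of the output. The sketch below uses Python's zlib and hashlib modules, with sample data chosen arbitrarily for the example.

```python
import hashlib
import zlib

data = b"The quick brown fox jumps over the lazy dog"

# CRC32: a 32-bit checksum designed to catch accidental corruption.
crc = zlib.crc32(data)

# SHA-256: a 256-bit cryptographic hash from the SHA-2 family.
sha256_hex = hashlib.sha256(data).hexdigest()

print(f"CRC32:   {crc:08x}")
print(f"SHA-256: {sha256_hex}")
```

A 32-bit checksum has only about 4 billion possible values, so an attacker can find a second input with the same checksum very quickly. That is acceptable for detecting transmission errors, but not for protecting passwords.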
Hash collisions occur when different data values result in the same hash code. Although highly unlikely with a strong algorithm, collisions are always possible in principle, because an unlimited number of input values map to a fixed number of hash codes.
Like rainbow tables, hash collisions can be exploited as an attack against the hash algorithm itself, to try to determine a source data value that would be equivalent for a given hash code.
These types of analyses require immense computing resources. Due to Moore’s Law, newer, more complex hash algorithms must be developed, and older algorithms are deprecated, to stay ahead of the cat-and-mouse game between attackers and their would-be targets.
What is Encryption?
Encryption
Encryption uses a cipher (encryption algorithm) to scramble data. The cipher uses a key, which is an arbitrary value such as a password, that can later be used to decrypt the data using the same cipher. Viewed while encrypted, the data appears to be random garbage. This is known as the encryption envelope.
Key points – Encryption
- Encryption uses a cipher (encryption algorithm) and a key to scramble data.
- To decrypt the data (reverse the encryption process), you must have the cipher and key.
- Attempting to decrypt with the wrong key results in unusable garbage
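These key points can be demonstrated with a toy cipher. The sketch below is NOT a secure cipher and is not taken from any real product: it is an illustrative XOR stream construction built from the standard library, and the names keystream and toy_cipher are hypothetical. A real application would use a vetted cipher such as AES from a maintained cryptography library.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Derive a pseudo-random byte stream from the key (toy construction)."""
    stream = b""
    counter = 0
    while len(stream) < length:
        # Each block of the stream depends on the key and a running counter.
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return stream[:length]


def toy_cipher(data: bytes, key: bytes) -> bytes:
    """XOR data with the keystream; the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


secret = b"thisismyreallylongpassword"
encrypted = toy_cipher(secret, b"correct key")

print(toy_cipher(encrypted, b"correct key"))  # recovers the original bytes
print(toy_cipher(encrypted, b"wrong key"))    # unusable garbage
```

Unlike a hash code, the encrypted value is the same length as the input and can be turned back into the original by anyone who holds the key.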
Hashing vs. Encryption
On the surface, hashing and encryption seem very similar, but there are some distinct differences that make one or the other more suitable in various situations.
- Hashing removes data, where encryption scrambles data.
- Hashing is one-way, where encryption is reversible.
- Hashing only uses the data itself as input, where encryption requires the input data plus an encryption key that is later used to decrypt the data.
- Hash codes are a fixed length, regardless of the amount of input data. Encrypted data is the same size as the input data, or slightly larger (due to encryption overhead).
- Hashing is geared toward ensuring the integrity of a process, while encryption allows data to be protected in such a way that it can be accessed later.
- Message authenticity uses both hashing and encryption to ensure that a sender is who they claim to be.
Authentication with Cleartext Passwords
As previously mentioned, storing passwords in cleartext (meaning neither hashed nor encrypted) means that a compromised database results in an attacker having full knowledge of the passwords. Users have bad habits, like using the same password for multiple websites, meaning YOUR WEBSITE could be the reason that someone’s bank account or e-mail becomes compromised!
Overview of cleartext authentication:
- User account is created, and credentials (username, password) are stored in the database
- To log in, the user enters their username (u1) and password (p1) into the application
- The application uses the u1 username to look up the password (p2) from the database
- If the two passwords match (p1 = p2) then the user is authenticated.
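The cleartext flow above can be sketched as follows. This is shown only to make the weakness concrete, not as something to deploy. The in-memory dictionary user_db stands in for a real database, and the function names are hypothetical.

```python
# Hypothetical in-memory "database" mapping username -> cleartext password.
user_db = {}


def create_account(username: str, password: str) -> None:
    # The password itself is stored, exactly as the user typed it.
    user_db[username] = password


def log_in(u1: str, p1: str) -> bool:
    p2 = user_db.get(u1)  # look up the stored password for this username
    return p2 is not None and p1 == p2


create_account("alice", "12345")
print(log_in("alice", "12345"))
print(user_db)  # anyone who can read the database sees every password
```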
Risks
- Passwords stored in the database can be retrieved from the database and used elsewhere
- Passwords stored in the database might be stored in cleartext on disk, depending on the database and file format used by the application.
- Passwords stored in cleartext on disk could be physically compromised by stealing the hard drive(s) containing the data.
Authentication with Encrypted Passwords
Encryption is a good way to protect data that will be used later. Encrypted data can later be decrypted to its original value. Although encrypting passwords protects them, typically, an application uses the same encryption key for storing all user passwords. Using the same encryption key for multiple arbitrarily short values gives an attacker the method to potentially compromise the cipher or derive the encryption key, resulting in the passwords eventually being decrypted by the attacker.
Overview of Encrypted Authentication:
- When the user account is created, the username is stored directly in the database.
- The password is encrypted using a single encryption key known to the application, and the encrypted password (e2) is stored in the database.
- To log in, the user enters their username (u1) and password (p1) into the application.
- The application uses the u1 username to look up the encrypted password e2.
- The application decrypts e2 and extracts the original password (p2).
- If the two passwords match (p1 = p2), then the user is authenticated.
Advantages over cleartext
- Passwords are stored in the database and on disk in an encrypted format.
- If the database or drives are compromised, an attacker will have to commit significant effort to decrypt them.
- Passwords can’t be arbitrarily compromised by an attacker reading the database (e.g. SQL injection)
Risks:
- The application must use a fixed encryption scheme and a key that is known to the application. Coupled with passwords being arbitrarily short values, this could provide a basis for attacking the encryption scheme or deriving the key.
- The application itself could be attacked and analyzed to determine the key. Attackers could obtain the application binaries, steal the source code, or compromise a running system (such as a buffer overrun).
Authentication with Hashed Passwords
Hashing removes the password itself from the database, so there is no stored password to read or decrypt.
Overview of Hashed Authentication:
- When the user account is created, the username is stored in the database.
- The password is hashed (h2), and the hash code h2 is stored in the database.
- When the user logs in, they enter the username (u1) and password (p1).
- The application immediately hashes the password p1 into hash code h1.
- The application uses the u1 username to look up the password hash code h2 from the database.
- If the two hash codes match (h1=h2), then the user is authenticated.
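The hashed flow above can be sketched as follows, using the same u1, p1, h1 and h2 names as the steps. The in-memory dictionary user_db stands in for a real database, the function names are hypothetical, and plain unsalted SHA-256 is an assumption chosen to mirror the steps as written.

```python
import hashlib
import hmac

# Hypothetical in-memory "database" mapping username -> password hash code.
user_db = {}


def hash_password(password: str) -> str:
    # Assumption: unsalted SHA-256 over UTF-8 bytes, to mirror the steps above.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def create_account(username: str, password: str) -> None:
    # Only the hash code h2 is stored; the password itself is discarded.
    user_db[username] = hash_password(password)


def log_in(u1: str, p1: str) -> bool:
    h1 = hash_password(p1)  # hash what the user typed
    h2 = user_db.get(u1)    # look up the stored hash code
    # compare_digest compares in constant time to avoid timing leaks.
    return h2 is not None and hmac.compare_digest(h1, h2)


create_account("alice", "12345")
print(log_in("alice", "12345"))
print(user_db)  # the database holds hash codes, not passwords
```

A production system would normally also mix in a per-user salt and use a deliberately slow password-hashing function, since a single fast hash of a weak password can still be guessed by brute force.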
Advantages
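- Hashed passwords can’t be reversed, so the original passwords can’t be read or decrypted from a stolen database.
- There is no encryption key that can be stolen, derived, or exploited.
- A stolen hash code is of little use by itself: it can’t be entered as a password elsewhere. An attacker has to guess the original password, which is why strong passwords and salting still matter.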
What Else Can Hashing Do?
Hashing can be used in a variety of situations where a data value (such as a credential) is well-known, but there isn’t a need to access the cleartext data itself. The data can simply be removed, and the hash code can be used in its place.
- Username. Aside from compromised passwords, privilege escalation is also a risk. Hashing the username prevents an attacker with access to the database from modifying application privileges, because the attacker won’t know which user record to update.
- Social Security Number (SSN). The GLBA and HIPAA acts specify certain levels of protection (security, privacy) for the SSN. For most uses, the SSN is used as an index (for tracking purposes) and could be removed (replaced with a hash code). If your application interfaces with other systems requiring the cleartext SSN, the cleartext value could be stored in a more secure, limited-access database whose sole purpose is to match the hash codes to the cleartext SSN. By reducing the places within your application (data footprint) where SSN is used in cleartext, you reduce the opportunity for an attacker to intercept it!
- Other sensitive fields that are ONLY used for verification (such as date of birth) can be hashed.
- Answers to security questions, if properly normalized, could be hashed, preventing an attacker from gaining knowledge of the user that can be used to exploit another system.
- Federation. If two systems contain the same sensitive data, they can simply exchange hash values, without needing the underlying data. The exchange itself can be protected by a 2nd round of hashing, so that the password hash doesn’t become a skeleton key.
- Salt. A salt is a random value that is combined with the source data prior to hashing, in order to frustrate brute-force attacks. Since the hash algorithm is well-known, an attacker can run candidate values through the hash algorithm until one produces a valid hash value (cracking). Using salt means an attacker must cover a much wider range of source values. For example, precomputed dictionary and rainbow-table attacks become useless, because the salted value would never be in the precomputed table.
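Salting can be sketched as follows. This example swaps in PBKDF2 (available in Python's hashlib as pbkdf2_hmac), which combines a salt with many rounds of hashing. The article itself does not name a specific algorithm, so treat the choice of PBKDF2, the 16-byte salt, the iteration count, and the function names as assumptions made for illustration.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # assumed work factor; tune for your own hardware


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, hash), generating a fresh random salt when none is given."""
    if salt is None:
        salt = os.urandom(16)  # a new random salt for each stored password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the candidate with the stored salt and compare."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


salt_a, hash_a = hash_password("12345")
salt_b, hash_b = hash_password("12345")

print(hash_a != hash_b)                          # same password, different salts
print(verify_password("12345", salt_a, hash_a))  # correct password verifies
```

The salt is stored alongside the hash code and does not need to be secret. Its job is to make each stored hash unique, so that one precomputed table cannot be reused against every account.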
Summary
Secure hashing is the proper way to protect passwords, and can be used to protect other data fields as well.
Hashing can also be used to securely share files! For more information, please review my paper, here:
A method for securely sharing files