Mathias Bynens

PBKDF2+HMAC hash collisions explained

Published · tagged with Bash, cryptography, JavaScript, Python

Crypto enthousiast Christian ‘CodesInChaos’ Winnerlein recently tweeted:

plnlrtfpijpuhqylxbgqiiyipieyxvfsavzgxbbcfusqkozwpngsyejqlmjsytrmd and eBkXQTfuBqp'cTcar&g* have the same PBKDF2-HMAC-SHA1 hash.

This intrigued me, so I decided to find out what is going on exactly, and why this happens. If you’re curious too, keep reading.

To confirm these findings, I wrote a Node.js script:

#!/usr/bin/env node

'use strict';

const crypto = require('crypto');
const assert = require('assert');

const salt = 'hunter2'; // can be anything
const iterations = 4; // can be any number
const keyLength = 16; // can be any number

const hash = (passphrase) => {
return crypto.pbkdf2Sync(passphrase, salt, iterations, keyLength).toString();
};

const string1 = 'plnlrtfpijpuhqylxbgqiiyipieyxvfsavzgxbbcfusqkozwpngsyejqlmjsytrmd';
const string2 = 'eBkXQTfuBqp\'cTcar&g*';

const hash1 = hash(string1);
const hash2 = hash(string2);

assert(string1 != string2, 'Passwords should be different');
assert(hash1 == hash2, 'Hashes should be the same (collision)');

Running the script confirms that both strings indeed have the same PBKDF2-HMAC-SHA1 hash.

Explanation

PBKDF2 is a widely used method to derive a key of given length based on a given password, salt and number of iterations. In this case it specifically uses HMAC with the SHA-1 hash function, which is the default as per RFC2898.

HMAC has an interesting property: if a supplied key is longer than the block size of the hash function that’s being used, it uses the hash of the key rather than the key itself.

SHA-1 has a block size of 512 bits, which equals 64 bytes.

So in this case, if the supplied key takes up more than 64 bytes, then SHA1(key) is used as the key. More generally, for any chosen_password larger than 64 bytes, the following holds true (pseudo-code):

PBKDF2_HMAC_SHA1(chosen_password) == PBKDF2_HMAC_SHA1(HEX_TO_STRING(SHA1(chosen_password)))

Note that the smallest password of the two always has a length of 20 characters, because SHA1 hashes always consist of exactly 40 hexadecimal digits representing 20 bytes. One byte, i.e. two hexadecimal digits are used for each character in the colliding password.

For example, in Bash:

$ printf 'plnlrtfpijpuhqylxbgqiiyipieyxvfsavzgxbbcfusqkozwpngsyejqlmjsytrmd' | sha1sum | xxd -r -p
eBkXQTfuBqp'cTcar&g*

That is why plnlrtfpijpuhqylxbgqiiyipieyxvfsavzgxbbcfusqkozwpngsyejqlmjsytrmd and eBkXQTfuBqp'cTcar&g* have the same PBKDF2-HMAC-SHA1 hash.

Consequences

This effectively means you can come up with as many PBKDF2-HMAC-SHA1 collisions as you like. In fact, as long as PBKDF2 is used in combination with HMAC and any hashing algorithm, the same trick can be applied — the only variable is the hash function’s block size. It’s trivial to find colliding passwords when hashing with PBKDF2-HMAC-anything.

So why did Chris choose plnlrtfpijpuhqylxbgqiiyipieyxvfsavzgxbbcfusqkozwpngsyejqlmjsytrmd and eBkXQTfuBqp'cTcar&g*, of all possible collisions? Wouldn’t it be more fun to tweet about a collision with, say, lolololololololololololololololololololololololololololololololol?

$ printf 'lolololololololololololololololololololololololololololololololol' | sha1sum
6ff0b2d76dae24f8b58472ba19f918f07359c0c0

$ printf 'lolololololololololololololololololololololololololololololololol' | sha1sum | xxd -r -p
o��m�$�r���sY��

While it’s easy to find a collision for any given string larger than 64 bytes (just run the above Bash command), it gets trickier if you want the colliding password to consist of readable ASCII characters only. SHA1 hashes can contain any hexadecimal digits, and converting such a hash back into a string is likely to result in at least one character outside of the printable ASCII range ([\x20-\x7E]).

I wrote a Python script to brute-force PBKDF2-HMAC-SHA1 collisions where the large (> 64 bytes) password has a prefix of choice, and where the colliding password consists of printable ASCII characters only.

#!/usr/bin/env python
# coding=utf-8

import hashlib
import itertools
import re
import string
import sys

TOTAL_LENGTH = 65
PREFIX = sys.argv[1] if len(sys.argv) > 1 else ''

prefix_length = len(PREFIX)
brute_force_length = TOTAL_LENGTH - prefix_length
passwords = itertools.product(string.ascii_lowercase, repeat=brute_force_length)
regex_printable = re.compile('[\x20-\x7E]+$')
base_hasher = hashlib.sha1()
base_hasher.update(PREFIX)

for item in itertools.imap(''.join, passwords):
hasher = base_hasher.copy()
hasher.update(item)
sha1_hash = hasher.digest()
if regex_printable.match(sha1_hash):
print u'%s \U0001F4A5 %s'.encode('utf-8') % (PREFIX + item, sha1_hash)

First, I let it run for a few hours without specifying a prefix, and it found the following collisions:

$ ./brute-force.py
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaadzuyfdt 💥 /JRb+z%,6f{$;*|#\LHT
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaebihgje 💥 @3d9ggezHn@iy,vV/#YC
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaagkpicoe 💥 m;Ec4m@1JW)TOSgGl3ZO
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaidwsoeu 💥 Y#pt*^.[}~.6jx!:fu'P
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaitkpvbh 💥 RFvc?%tbygGt(fy7G*+,
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaajhixpyq 💥 @x!iEK2B*N]X`S$u"CEV
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaakidzupe 💥 1t_lP?o}R;YWoJPF7!GY
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaamyrpmpv 💥 &nbSlEfC.X`D0(l)x[tV
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaandkfdci 💥 %U;/> ,3S/4dv!fUku*N
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaowdgicp 💥 b<-'^;Qt7~G[8>\6=wH(
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaqdpodre 💥 EmjaaG|_\Eq;+Wgl%<@)
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaqsmqyjo 💥 oZD49:*Cd)PFCubU[^)_
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaasbvoipq 💥 Woyw!itp af;uJo'Z-x#
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaslsvwra 💥 +ME:wn{F[<f_Zw%yWN\j
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaatljisbf 💥 w<0k!([95gEP%G^?&tP*
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaavdramcj 💥 ?!`e6 m]e/JJubY`|ZM1
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaavutrypa 💥 d63mH`L=IW3Ucwb.FRec
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaawdxljmb 💥 h?2O+Pm5^x|^`du`A:@^
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaxdovzru 💥 ks*A]XD!U1I4[`!:+@s)

Brute-forcing collisions with chosen prefixes is much more fun, though:

$ ./brute-force.py 'trololololololololololololololololololololololololololo'
trololololololololololololololololololololololololololoaaabaslrsc 💥 M,2p[;|4rkZQ,G5`W 9j
trololololololololololololololololololololololololololoaaablkqqic 💥 Fs>"Y0,'Pj:.D8SYiJ!4
trololololololololololololololololololololololololololoaaacblecnb 💥 _0jZI5O+23MGs-1*I\oQ
trololololololololololololololololololololololololololoaaaltyeuck 💥 &8T-gvxEwb`>r\E_.E2@
trololololololololololololololololololololololololololoaaalyxrcss 💥 TGPwn!L?}]x20&U}[^*&
trololololololololololololololololololololololololololoaaaotcvgrg 💥 _b7z7m(^!?`wI;!_~vl<
trololololololololololololololololololololololololololoaaarbaxieg 💥 zR;cd7\'~6zr,*NHQsd$
trololololololololololololololololololololololololololoaaarwxcedc 💥 D`nPK5Is$>.vO9Pi!8\c
trololololololololololololololololololololololololololoaaasghxurd 💥 VI\T]xa:oZ{H04g0`|).
trololololololololololololololololololololololololololoaaayavvpbc 💥 {Aj_idpOz)S/mmc$k/aT
trololololololololololololololololololololololololololoaaazdtxalo 💥 xgdMs~n(Y<jD"?3^#"?}
trololololololololololololololololololololololololololoaabdfbdoeb 💥 Bz+\his88.X1=5@{%<vW
trololololololololololololololololololololololololololoaabhtispji 💥 _z)7#`~I8$!q$x#h_VUI
trololololololololololololololololololololololololololoaabjvvgkfw 💥 h7Kt8oO&#r$jG=BM^DVh
trololololololololololololololololololololololololololoaabkgljluc 💥 agtQL=tZ55nk['ts1t'|
trololololololololololololololololololololololololololoaabmaaoavm 💥 ~;Zp2M=!k>hE[c]6C'nI
trololololololololololololololololololololololololololoaabmhyfjbt 💥 \pEYgz=BP`W3e89zov-R
trololololololololololololololololololololololololololoaabnyplvil 💥 &_f[/u^R@CL|PnNgYI{V
trololololololololololololololololololololololololololoaabogonxnp 💥 FD(}rO@+GIpZo<];#<^o
trololololololololololololololololololololololololololoaabpskyycf 💥 j./w=Ct}HQF:Y5>@a\\Y
trololololololololololololololololololololololololololoaabsnvfhys 💥 ;>[}h6^ha.SNtW:]0`|1
trololololololololololololololololololololololololololoaabtuuecis 💥 emmP*T3"k4$iObPFRFxm
trololololololololololololololololololololololololololoaabuhtrkfz 💥 bMYCnc[o}r\}gcRaL^CG


$ ./brute-force.py 'collision_my_mission_when_the_dawn_breaks_with_a_hand_'
collision_my_mission_when_the_dawn_breaks_with_a_hand_aaaaaappxze 💥 uYmwXw'c04\LHh=i!E|*
collision_my_mission_when_the_dawn_breaks_with_a_hand_aaaaarcgqpq 💥 s%K>G0KHOdx#l*_y"pDj
collision_my_mission_when_the_dawn_breaks_with_a_hand_aaaabejvnge 💥 zI%j'd1f4$U&iZ@w|$!
collision_my_mission_when_the_dawn_breaks_with_a_hand_aaaabzkzrce 💥 <9.f8Zu5e?H"U&;ha!8%
collision_my_mission_when_the_dawn_breaks_with_a_hand_aaaacbszkmz 💥 k^VyGxWJJNy&{^Aa"}Nl
collision_my_mission_when_the_dawn_breaks_with_a_hand_aaaaddcwxmd 💥 oSnr*x=`Ky4`b<{Uh+C@
collision_my_mission_when_the_dawn_breaks_with_a_hand_aaaadqzyizo 💥 &6JZj4jQ0(E0[LrN'K:Y
collision_my_mission_when_the_dawn_breaks_with_a_hand_aaaadsjngbc 💥 DRq9</HIxBLD9Q]Ll6Bj
collision_my_mission_when_the_dawn_breaks_with_a_hand_aaaafcgqfrb 💥 6f.ZfjBU2TLkd2k\+Qxu
collision_my_mission_when_the_dawn_breaks_with_a_hand_aaaafgrkiwn 💥 Ac#Db(cOs\VWz6PU6/sU
collision_my_mission_when_the_dawn_breaks_with_a_hand_aaaafzbytnn 💥 M"F<(|/*K5TS~`:?[kjG
collision_my_mission_when_the_dawn_breaks_with_a_hand_aaaafzwvfen 💥 Z)Jd;6Ql[j5#EE8)=f/U
collision_my_mission_when_the_dawn_breaks_with_a_hand_aaaagkntuew 💥 voC,;s}M;vfggl}}6sr'
collision_my_mission_when_the_dawn_breaks_with_a_hand_aaaagssyiwu 💥 F..Ncx3Z.C]/2n.XX%bu
collision_my_mission_when_the_dawn_breaks_with_a_hand_aaaaibapezf 💥 v/+|qU1erE#4(<ko0D P
collision_my_mission_when_the_dawn_breaks_with_a_hand_aaaajdxotoa 💥 dNLKU-sFik_=AK[FKJTb
collision_my_mission_when_the_dawn_breaks_with_a_hand_aaaakhddjzb 💥 n(?}=^N^52WaozN7*UAh
collision_my_mission_when_the_dawn_breaks_with_a_hand_aaaakxafnmx 💥 "Dy.9Giik;[i$cC#hP_H
collision_my_mission_when_the_dawn_breaks_with_a_hand_aaaamdzdfpk 💥 5wE}l`Rn]nhiX]N/"EII
collision_my_mission_when_the_dawn_breaks_with_a_hand_aaaamofydxb 💥 z\ c_>Q`F?s;jybKeM#(


$ ./brute-force.py 'chosen-prefix_hash_collisions_ftw_'
chosen-prefix_hash_collisions_ftw_aaaaaaaaaaaaaaaaaaaaaaaabcjenbr 💥 e=D&^kQ^qs+^PRL!j_.q
chosen-prefix_hash_collisions_ftw_aaaaaaaaaaaaaaaaaaaaaaaafikpjor 💥 |BHwN@zatb0TT:5I3|7<
chosen-prefix_hash_collisions_ftw_aaaaaaaaaaaaaaaaaaaaaaaaguatnrr 💥 #f.FDRuJz1V%"i*8nLOz
chosen-prefix_hash_collisions_ftw_aaaaaaaaaaaaaaaaaaaaaaaahqmauyt 💥 dUH}]4wi($\O/L`fej0[
chosen-prefix_hash_collisions_ftw_aaaaaaaaaaaaaaaaaaaaaaaaifuaosw 💥 iKd3djP<]g(WLt%2>Uq;
chosen-prefix_hash_collisions_ftw_aaaaaaaaaaaaaaaaaaaaaaaajkzsniz 💥 D>e{,OD%KwQb-y\67nn2
chosen-prefix_hash_collisions_ftw_aaaaaaaaaaaaaaaaaaaaaaaajzgkoyv 💥 WGoWj5#JC+N! w*oH.A
chosen-prefix_hash_collisions_ftw_aaaaaaaaaaaaaaaaaaaaaaaanlmxgoc 💥 >IZASesr9T+_bA?64 LU
chosen-prefix_hash_collisions_ftw_aaaaaaaaaaaaaaaaaaaaaaaaogmbbtc 💥 #f,U~l|=7p<L{'l;wrcr
chosen-prefix_hash_collisions_ftw_aaaaaaaaaaaaaaaaaaaaaaaaolrkjhw 💥 D.^GkKUq$%yn$kmT_,'Z
chosen-prefix_hash_collisions_ftw_aaaaaaaaaaaaaaaaaaaaaaaapetgdgr 💥 6I}z8;8M4Nts,D~D&Z-@
chosen-prefix_hash_collisions_ftw_aaaaaaaaaaaaaaaaaaaaaaaapgkvttr 💥 auk+ *^c%1^\PC2qVi;g
chosen-prefix_hash_collisions_ftw_aaaaaaaaaaaaaaaaaaaaaaaapvkfwkv 💥 .,vNus(7m0'TyIEKHaKX
chosen-prefix_hash_collisions_ftw_aaaaaaaaaaaaaaaaaaaaaaaaqlxcssl 💥 ri%kSz\<mF +|,}=~NB9
chosen-prefix_hash_collisions_ftw_aaaaaaaaaaaaaaaaaaaaaaaasacwjzq 💥 Zx25[b K:94U>DPvV/(P
chosen-prefix_hash_collisions_ftw_aaaaaaaaaaaaaaaaaaaaaaaasqapqmw 💥 sc'S^?vOtrg;"8gF)U}f
chosen-prefix_hash_collisions_ftw_aaaaaaaaaaaaaaaaaaaaaaaatczlqqs 💥 bMob/6%C,=?an0fCtl^O
chosen-prefix_hash_collisions_ftw_aaaaaaaaaaaaaaaaaaaaaaaaxtjsnyi 💥 E0|^i;HKYSht*@T'FAqc
chosen-prefix_hash_collisions_ftw_aaaaaaaaaaaaaaaaaaaaaaaazjcrflx 💥 `]|D]>=4b8O>0{vwZ],e
chosen-prefix_hash_collisions_ftw_aaaaaaaaaaaaaaaaaaaaaaaazxwmsbm 💥 G0FFrj-[w]m pfx>W$Un

About me

Hi there! I’m Mathias. I work on Chrome at Google. HTML, CSS, JavaScript, Unicode, performance, and security get me excited. Follow me on Twitter, Bluesky, and GitHub.

Comments

Thomas Geraghty wrote on :

Cool article as always! I’m just curious as to why you switched from Node to Python for the brute force collisions? I’m not trying to push some sort of Lang A vs. Lang B agenda, just wondering if there was some reasoning behind your decision to switch that I’m missing :)

wrote on :

Thomas: The reason I used Python for the brute-force script is the amazingly powerful itertools.product. You could do something similar in JavaScript, but you’d have to write it yourself.

On the other hand, I used Node.js to confirm the collisions because its built-in crypto module supports PBKDF2 out of the box.

Solar Designer wrote on :

A couple of weeks ago, I tweeted this:

scrypt(PBKDF2-HMAC-SHA256-fail-affects-scrypt-no-security-issue-bGoDFpr8) = scrypt(;`B3nR6wQ2-_LSg"mH #yszm`[#z8B&L) for any salt, N, r, p

The embedded " and ` are not great, but on the other hand there’s no '. The total number of different characters in the shorter password is 26. (The total number of different characters across both passwords could be made much smaller by repeating the same character in place of the message. Then it’d probably contain only 9 different characters.)

This may be usable to check whether a site uses something based on HMAC-SHA256 (with PBKDF2’s order of key and salt inputs) or not.

CodesInChaos wrote on :

I chose a random long password because I preferred it looking random. More mysterious that way — a chosen password makes it too obvious it’s just a cheap trick.

My code is quite similar to yours. I created a random 65-char string as starting point. Then on each iteration increment it in Base26. Compute its SHA-1 hash, check if the hash is nice and printable. Repeat.

Z wrote on :

The moral of the story? Using too looooooong passwords is not as secure as one might think.

I saw admins using 25-character-long passwords, while the hash was based on DES (yes, we are talking about Oracle 9 passwords).

Tom wrote on :

Z: Long passwords are still safe. Nobody has dictionaries neither uses brute force with unreadable characters… yet.

David wrote on :

How useful is this… I mean is it possible to find a collision for common words such as password123 or ChangeMe? Also, how long does it take to find a collision, as a plain old brute-force against the password (with GPU) may be faster?

ILGUIZ wrote on :

Finding printable hashes of arbitrary strings:

find all s whose H(s) is printable

…does not look like brute-forcing where we would search for a string whose hash matches the given hash:

find an s whose H(s) == h

Leave a comment

Comment on “PBKDF2+HMAC hash collisions explained”

Your input will be parsed as Markdown.