PART 2: Insights from Entropy
In the first part of this series, we clarified what the entropy of a password is. We saw that it represents the quantity of information, in bits, that must be recovered in order to disclose the password. This second part builds on that definition, so make sure you’ve read the first part. If we’re all aligned on the definition of entropy, let’s see what information an attacker could get from knowing the entropy of a password. Is this data sensitive?
What information does the entropy give?
To determine whether the entropy is sensitive information, let's first focus on what an attacker can learn from knowing it.
We saw in the first part that the entropy formula is simple and relates 3 elements:
- the resulting entropy itself
- the size of the character set
- the length of the password
With simple maths, we can derive the size of the character set from the entropy and the password length. We can also derive the password length from the entropy and the size of the character set.
entropy = passwordLength x log2(characterSetSize)
passwordLength = entropy / log2(characterSetSize)
characterSetSize = 2 ^ (entropy / passwordLength)
However, it would be strange to type 2.45 characters or to pick characters from a set of 5.34 elements. It doesn't make any sense, does it? So we can deduce that `passwordLength` and `characterSetSize` are integers. This limits the possible values of all 3 pieces of information (`entropy`, `passwordLength` and `characterSetSize`).
To dig deeper into this limitation, let's summon a 2D table of pre-computed entropies:
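To make the three formulas concrete, here is a minimal Python sketch. The function names are made up for this illustration; they simply mirror the formulas above.

```python
import math

def entropy_bits(password_length: int, charset_size: int) -> float:
    # entropy = passwordLength x log2(characterSetSize)
    return password_length * math.log2(charset_size)

def password_length(entropy: float, charset_size: int) -> float:
    # passwordLength = entropy / log2(characterSetSize)
    return entropy / math.log2(charset_size)

def charset_size(entropy: float, length: int) -> float:
    # characterSetSize = 2 ^ (entropy / passwordLength)
    return 2 ** (entropy / length)

e = entropy_bits(8, 16)
print(e)                       # 32.0
print(password_length(e, 16))  # 8.0
print(charset_size(e, 8))      # 16.0
```

The two inverse functions return floats on purpose: a result that is not (almost) an integer tells us the pair we assumed cannot be the right one.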
Mask (character set) size ↓ / Password length → | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|---|---|---|
3 | 1.584962501 | 3.169925001 | 4.754887502 | 6.339850003 | 7.924812504 | 9.509775004 | 11.09473751 | 12.67970001 |
4 | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 |
6 | 2.584962501 | 5.169925001 | 7.754887502 | 10.33985 | 12.9248125 | 15.509775 | 18.09473751 | 20.67970001 |
7 | 2.807354922 | 5.614709844 | 8.422064766 | 11.22941969 | 14.03677461 | 16.84412953 | 19.65148445 | 22.45883938 |
8 | 3 | 6 | 9 | 12 | 15 | 18 | 21 | 24 |
9 | 3.169925001 | 6.339850003 | 9.509775004 | 12.67970001 | 15.84962501 | 19.01955001 | 22.18947501 | 25.35940001 |
10 | 3.321928095 | 6.64385619 | 9.965784285 | 13.28771238 | 16.60964047 | 19.93156857 | 23.25349666 | 26.57542476 |
11 | 3.459431619 | 6.918863237 | 10.37829486 | 13.83772647 | 17.29715809 | 20.75658971 | 24.21602133 | 27.67545295 |
13 | 3.700439718 | 7.400879436 | 11.10131915 | 14.80175887 | 18.50219859 | 22.20263831 | 25.90307803 | 29.60351775 |
14 | 3.807354922 | 7.614709844 | 11.42206477 | 15.22941969 | 19.03677461 | 22.84412953 | 26.65148445 | 30.45883938 |
15 | 3.906890596 | 7.813781191 | 11.72067179 | 15.62756238 | 19.53445298 | 23.44134357 | 27.34823417 | 31.25512476 |
16 | 4 | 8 | 12 | 16 | 20 | 24 | 28 | 32 |
This table is limited for display reasons. However, if we generate a much bigger table we can notice an interesting fact: almost all entropy values in the table appear once!
That's important information to consider as it means that for a given entropy you can recover both the password length AND the character set used.
When a value does appear multiple times, the number of occurrences is usually small (for example, `12` appears only 3 times here).
Information: From the entropy value alone, an attacker can most of the time recover both the size of the character set used and the password length.
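As a rough illustration of that lookup, the sketch below searches for every (character set size, password length) pair that produces a given entropy. The function name, the search bounds and the tolerance are assumptions chosen for this example, not something taken from a real tool.

```python
import math

def candidates(entropy, max_charset=100, max_length=64, tol=1e-6):
    """Return every (charset_size, length) pair whose entropy matches."""
    found = []
    for length in range(1, max_length + 1):
        # Invert the formula, then keep the pair only if the size is an integer
        # that reproduces the observed entropy within the tolerance.
        nearest = round(2 ** (entropy / length))
        if 2 <= nearest <= max_charset and math.isclose(
            length * math.log2(nearest), entropy, abs_tol=tol
        ):
            found.append((nearest, length))
    return found

# Same bounds as the table above: sets of up to 16 characters, up to 8 characters typed
print(candidates(12, max_charset=16, max_length=8))  # [(16, 3), (8, 4), (4, 6)]
```

With the table's bounds, `12` gives the three pairs already spotted by eye. A displayed entropy is usually rounded, so in practice the tolerance has to be loosened accordingly.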
How sensitive is the password length and the character set?
Let's imagine the following scenario: An attacker sees the entropy of a password and tries to use the information to break the password. Technically, knowing the entropy of a password doesn't really change the password strength, as the quantity of information to recover remains the same. Be careful though: it can be valuable information if the password is too weak, for instance because it has probably leaked into a dictionary or is short enough to brute-force easily.
But we need to put ourselves in the shoes of a hacker to understand. Classical brute-force attacks take one of two approaches: using a dictionary of leaked passwords, which is a fast way to break a secret, or trying all possible combinations of characters. Without any data about a password, an attacker would have to try different password lengths, starting from 1 character and going up, and for each length try the biggest character set possible. This last process is much more tedious to run and mostly requires patience (provided the password is weak enough not to take multiple lifetimes to break). With knowledge of the password length, the first rounds can be skipped, and knowing for sure which character set is used plays even more in the attacker's favour.
This risk, however, has to be put in perspective. First, as said earlier, the password strength doesn't change per se. Second, if it's too weak, showing the entropy won't change much: it's weak, you're broken.
We said, "Show me your entropy and I'll break your password!"
This statement is a bit bold, admittedly. But not everything is covered yet. We were watching a video, and not only did we learn the entropy of the password, but also its evolution with each character typed!
In other words, the character set size can be detected at each keystroke. The password length is actually just displayed in the video. We could also count how many times the entropy changed (no computation involved).
For the pleasure of having a side note, the resulting entropy of the password in the video is high enough to discourage any attacker from attempting a brute-force attack IMO. Therefore, we'll go on with much simpler passwords to illustrate how this knowledge is interesting to a hacker.
Let's assume we'll have to break the following password: `123abcDEF`.
"How scandalous sir! We know the password therefore there is no information to recover and the entropy is null!"
Wow, you read the first article right? That's a real pleasure to see. But, let's pretend we don't know it just for the example please.
✨ Invoking some magical graphs right now! ✨
On the chart, we see 4 curves. The straight ones show the evolution of the entropy for a given mask size as the password length grows. The one drawn on top of the others is the entropy evolution of the password we want to break.
From the final entropy of the password, we can tell that it is composed of 9 characters taken from a set of 62 characters.
We can also notice the following:
- The first 3 characters follow the first curve
- The next 3 characters follow the second curve
- The last 3 characters follow the last curve
- The final entropy of the password confirms it's a 9 character long password and that it uses 62 different characters
- There are jumps on the curve at characters 4 and 7
This matches the password pattern we have:
- 3 digits
- 3 lowercase letters
- 3 capital letters
- characters 4 and 7 are characters from a new set
Based on that we can deduce the size of the character set for each character. Moreover, each character that produces a jump on another curve is part of the additional character set only and not the entire character set.
It means that we can have an idea of the structure of the password with its entropy evolution.
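Here is a small sketch of how that reading of the curve could be automated. It assumes the strength meter displays `typed characters x log2(size of the character set seen so far)` after each keystroke, which is what the straight curves above represent; the readings are simulated rather than taken from a real video.

```python
import math

def charset_sizes(entropy_per_keystroke):
    """Recover the character set size in use after each keystroke."""
    return [
        round(2 ** (bits / typed))
        for typed, bits in enumerate(entropy_per_keystroke, start=1)
    ]

# Simulated readings for a "3 digits, 3 lowercase, 3 uppercase" password
true_sizes = [10, 10, 10, 36, 36, 36, 62, 62, 62]
observed = [typed * math.log2(n) for typed, n in enumerate(true_sizes, start=1)]

sizes = charset_sizes(observed)
# 1-based positions where the set size differs from the previous keystroke
jumps = [i + 1 for i in range(1, len(sizes)) if sizes[i] != sizes[i - 1]]

print(sizes)  # [10, 10, 10, 36, 36, 36, 62, 62, 62]
print(jumps)  # [4, 7]
```

The jump positions match the ones read from the chart: characters 4 and 7 each bring in a new set of 26 characters.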
What could be done as an attacker if entropy is leaked?
Let's wear hacker shoes again. We spoke earlier about how an attacker drives a brute-force attack. A way to do it is by using tools such as hashcat and John the Ripper. They could be used to run brute-force attacks with given password structures.
By using a well-known structure, the brute-force can focus on only the potential candidates that could work and thus eliminate all the passwords that are sure not to be working (they don't follow the right structure).
The more an attacker knows about the structure of a password the less information is to be recovered. Thus, the real entropy of the secret is reduced. In other words, showing the evolution of the entropy of a password could make the password weaker than what is actually measured.
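To give an order of magnitude, the sketch below measures the search space of a hashcat-style mask, where `?d`, `?l` and `?u` stand for a digit, a lowercase letter and an uppercase letter. The mask used here is a hypothetical first guess an attacker might make for our `123abcDEF` example; it bets that every character stays in its block's class, which is more than the entropy curve strictly proves. The helper name is made up and only these three placeholders are handled.

```python
import math

# Sizes of the three placeholders handled by this sketch
PLACEHOLDERS = {"d": 10, "l": 26, "u": 26}

def mask_entropy_bits(mask: str) -> float:
    """Entropy, in bits, of the search space described by a ?x?x... mask."""
    if len(mask) % 2 or any(c != "?" for c in mask[::2]):
        raise ValueError("only ?d, ?l and ?u placeholders are supported here")
    return sum(math.log2(PLACEHOLDERS[c]) for c in mask[1::2])

guessed = mask_entropy_bits("?d?d?d?l?l?l?u?u?u")
full = 9 * math.log2(62)

print(round(guessed, 2))  # 38.17
print(round(full, 2))     # 53.59
```

If the bet is right, the attacker searches a space of about 38 bits instead of about 54 bits.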
The next question that naturally comes up is "by how much is the entropy reduced once we have learnt the structure?". The answer is not straightforward, as it depends on the discovered structure.
Let's imagine a password is composed of only capital letters. Knowing the final entropy will show the size of the password and the size of the character set right?
Information: If the character set is unchanged from the beginning, no more information is recovered. We know all the information from the final entropy already.
If there are structural changes in the password when typing (the jumps on the plot) we can disclose interesting information as an attacker.
Let's pick 5 passwords that have 1 structural change in a password but at different places:
- abcdefgh1: the change happens on the very last character
- abcdefg1h: the change happens on the second-to-last character
- 1abcdefgh: the change happens on the second character
- a1bcdefgh: the change happens on the second character but it starts with a higher entropy
- abcd1efgh: the change happens in the middle of the password
Their final entropies are the same regardless of where the `1` is placed (and it is 46.53 bits).
To compute the entropy once the structure is known, we can adapt the formula a bit and turn it into a sum of the information carried by each character:
- abcdefgh1: 8 x log2(26) + 1 x log2(10) = 40.93
- abcdefg1h: 7 x log2(26) + 1 x log2(10) + 1 x log2(36) = 41.39
- 1abcdefgh: 1 x log2(10) + 1 x log2(26) + 7 x log2(36) = 44.21
- a1bcdefgh: 1 x log2(26) + 1 x log2(10) + 7 x log2(36) = 44.21
- abcd1efgh: 4 x log2(26) + 1 x log2(10) + 4 x log2(36) = 42.80
The loss compared to not knowing the password structure is the following:
- abcdefgh1: 46.53 - 40.93 = 5.60
- abcdefg1h: 46.53 - 41.39 = 5.14
- 1abcdefgh and a1bcdefgh: 46.53 - 44.21 = 2.32
- abcd1efgh: 46.53 - 42.80 = 3.73
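The per-character sum can be sketched in a few lines of Python. This is only an illustration of the accounting used above: the three character classes and the function names are assumptions made for the example, and any character outside those classes is not handled.

```python
import math
import string

# Character classes assumed for this sketch
CLASSES = [string.digits, string.ascii_lowercase, string.ascii_uppercase]

def structure_aware_entropy(password: str) -> float:
    seen = []  # classes already revealed by earlier keystrokes
    bits = 0.0
    for ch in password:
        cls = next(c for c in CLASSES if ch in c)
        if cls not in seen:
            # Jump on the curve: the character belongs to the new class only
            seen.append(cls)
            bits += math.log2(len(cls))
        else:
            # No jump: the character may come from any class seen so far
            bits += math.log2(sum(len(c) for c in seen))
    return bits

def final_entropy(password: str) -> float:
    used = [c for c in CLASSES if any(ch in c for ch in password)]
    return len(password) * math.log2(sum(len(c) for c in used))

for pw in ["abcdefgh1", "abcdefg1h", "1abcdefgh", "a1bcdefgh", "abcd1efgh"]:
    print(f"{pw}: {structure_aware_entropy(pw):.2f} / {final_entropy(pw):.2f}")
# abcdefgh1: 40.93 / 46.53
# abcdefg1h: 41.39 / 46.53
# 1abcdefgh: 44.21 / 46.53
# a1bcdefgh: 44.21 / 46.53
# abcd1efgh: 42.80 / 46.53
```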
We have a loss of around 2 to 6 bits, as we can see. Even with the same final entropy, the loss differs depending on the structure found. At first glance, it looks like the earlier a structural change happens, the better the entropy is preserved.
This can be explained by the fact that the earlier a structural change happens, the earlier the character set grows, and the less the attacker learns about the password structure.
Without any knowledge of the structure of the password, a single character carries 5.17 bits of entropy, so in some cases knowing the structure reveals about 1 symbol's worth of information.
In this example, for the worst case, the loss represents 1 less character to find in comparison with the full entropy.
Here's a quick demonstration:
Not knowing the structure of the password, the final entropy is 46.53 for 9 characters. It means each symbol carries 46.53 / 9 = 5.17 bits.
The worst case scenario is about losing 5.6 bits of information.
passwordLength x 5.17 = 46.53 - 5.60
passwordLength = 40.93 / 5.17
passwordLength = 7.92 characters
We had 9 characters to guess; in the worst case we now have 8 (7.92) characters to guess, one of them partially found. So, there is around 1 less character for an attacker to find out.
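The same conversion from bits to "characters left to guess" as a tiny helper. The function name is made up for this sketch, and the inputs are the unrounded entropies of the worst case above.

```python
import math

def equivalent_length(effective_bits: float, full_bits: float, length: int) -> float:
    """Number of 'full strength' characters the effective entropy is worth."""
    bits_per_char = full_bits / length
    return effective_bits / bits_per_char

full = 9 * math.log2(36)                   # entropy without knowing the structure
worst = 8 * math.log2(26) + math.log2(10)  # abcdefgh1 once the structure is known

print(round(equivalent_length(worst, full, 9), 1))  # 7.9
```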
A more realistic scenario
How much information is lost if we follow NIST and OWASP recommendations and we use an 8 character long password that is randomly generated by passbolt and an attacker sees the evolution of our entropy while typing?
Let's consider `q!@8/F.P` as our password, and say it has been typed in a video, for instance, where we can see its entropy evolution.
The final entropy is 51.00 bits, which means each symbol carries in theory 6.375 bits (`51 / 8`).
In that case, there is almost 1 change of character set per character typed.
In the passbolt application, the character sets are the following:
- `q`: 26 new chars in the set
- `!`: 6 new chars in the set
- `@`: 7 new chars in the set
- `8`: 10 new chars in the set
- `/`: 4 new chars in the set
- `F`: 26 new chars in the set
- `.`: 4 new chars in the set
- `P`: 0 new chars in the set
Learning that structure we can deduce the "effective" entropy:
entropy = log2(26) + log2(6) + log2(7) + log2(10) + log2(4) + log2(26) + log2(4) + log2(83)
entropy = 28.49 bits
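The same sum, written as a short Python sketch. It takes the number of new characters revealed at each keystroke (the list above) as input, so it does not need to know how passbolt groups characters; the function name is made up for the example.

```python
import math

def effective_entropy(new_chars_per_keystroke):
    known = 0   # size of the character set revealed so far
    bits = 0.0
    for new in new_chars_per_keystroke:
        known += new
        # A jump pins the character to the new set; otherwise the whole known set applies
        bits += math.log2(new if new > 0 else known)
    return bits

steps = [26, 6, 7, 10, 4, 26, 4, 0]  # q ! @ 8 / F . P

print(round(effective_entropy(steps), 2))            # 28.49
print(round(len(steps) * math.log2(sum(steps)), 2))  # 51.0
```

The second line recomputes the final entropy from the full set of 83 characters, for comparison.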
Ouch! This one hurts. The loss is not a small one: it's almost dividing the final entropy by 2.
Following the previous logic, let's deduce how many characters we have to find now:
passwordLength x 6.375 = 28.49
passwordLength = 28.49 / 6.375
passwordLength = 4.47 characters
There are 5 characters to find out, one of which is partially known (as it's not exactly 5 chars to guess but 4.47).
We've lost the equivalent of a bit more than 3 characters here 🥲.
Is it sensitive?
We can rarely answer such a question with a straight yes or no. I would say yes, even if, of course, everything depends on the situation. In the video, John Hammond was creating his own password for his account. Even if you guessed the structure of the password used, there would still be work to do to hack his account.
- Only the structure is known, not the password
- The final entropy is sky high
- His passbolt instance may not be reachable for an attacker to test against
- The password is not the password to access passbolt directly but a password to decrypt an OpenPGP private key. You need access to the key in order to test which password is the right one
Could we break his secret used in the video? Probably not. There could be other ways, but, with only the entropy evolution approach, I wouldn't even try.
Another scenario would be, for example, a user who shows the entropy evolution of a password and uses that password on a service reachable by an attacker.
Let's consider that the password is `q!@8/F.P` like previously:
- Only the structure is known, not the password
- The final entropy is the minimum recommended by NIST and OWASP
- The service is reachable
- The account email or username is known
- The "effective" entropy after knowing the structure is way under NIST and OWASP recommendation (28.49 bits against about 50 bits)
Are we at risk? Hooooooo yes!!!!
Obviously, it depends on the service, the authentication mechanism, the protections implemented by the service, etc. But in any case the effective entropy is almost half the very minimum recommended. It's way too low, and we could consider our account broken here.
Conclusion
TL;DR;
We wanted to answer the question "is the entropy sensitive information?". We saw that, yes, it could be considered sensitive. The final entropy alone could tell an attacker whether a brute-force attempt is realistic or not and, in the end, help the brute-force procedure by eliminating passwords that are sure not to work.
We learnt that showing the evolution of the entropy is even more critical, as it can considerably reduce the strength of the password.
In some scenarios it is very critical and would put an account in immediate danger; in others it would show that a brute-force attack is not worth attempting.
What to do if it happened?
Be aware that you should always play it safe: if the entropy is shown, it could be used for a brute-force attempt (if the conditions are met), and it's even more critical if the evolution of the entropy is shown.
First, it's better not to show the evolution of an entropy, or even the final entropy, in a video. We should blur out the entropy just like we would for a password, just in case.
Second, if you're in any doubt, don't hesitate to update your password, again, just in case.
An important notice
I would like to emphasise the "if you're in any doubt": even though a high entropy could convince an attacker not to attempt a brute-force attack, knowing the structure of a password could still give away some interesting information.
For memorability and practical reasons, users very often create passwords that follow a pattern. An attacker obviously knows that and, unfortunately, it's not rare to find a password like `companyname-user-year`.
It's easy to remember, it can fit the password rules, etc., so it's practical for users. However, the evolution of an entropy could be used to guess whether a password follows such a structure.
Imagine now that 2 users from the same company each showed a video where we can see the evolution of the entropy of their password, and that the company uses a pattern for its passwords. With the first video, the attacker would have a strong suspicion; with the second, it becomes a certainty that there is a pattern. Entropies just collapse in such a case.
Some recommendations
Safety first!
Consider the entropy as sensitive information. Don't show the entropy of a password publicly. Blur out the fields if you need to show some screenshots, just in case. You never know.