Basic Maths and Ideas Behind Password Complexity

If you want to "crack on" with checking the strength of your own passwords, and you have a test box and plenty of time on your/its hands/CPU, read the tut here:

As passwords are such an important part of everyday IT experience, I want to look at some of the basics of what is involved in their creation and use, and appreciate how added complexity does (or does not) make a password harder to guess or brute force for someone with no prior working knowledge of the system and no other tools to hand.

The best beginner example I know for getting going on the maths, and one everyone is familiar with, is the 4 digit PIN (Personal Identification Number) used with a cash card at the ATM, or Automated Teller Machine.

The numerical complexity comes from increases in powers of 10 for each extra digit allowed in the PIN length. ATMs were initially designed and programmed to accept a maximum of 4 digits, which seems quite simplistic nowadays given the length of phone numbers most of us remember to some degree, yet a 4 digit PIN still has a high chance of not being guessed at the first try if your card is stolen: 1:10,000. Why is that?

If you imagine you only were allowed to have a 1 digit PIN, then you can simply see that you could choose any number from 0-9, a choice out of ten digits:

0 1 2 3 4 5 6 7 8 9

Mathematically, this is represented by ten to the power of one = 10^1 = 10.

If this card was stolen, someone would have a 1:10 chance of guessing my one digit PIN on the first go.

The other feature of card security that is involved at this point is a maximum number of attempts before the card is retained by the machine, which is usually 3 attempts. If this safety measure was not implemented, then the person could just try all the numbers from 0-9 until they got access.

The chances of success would therefore increase with each successive attempt, the next being 1:9 as one digit has already been tried, then 1:8 for the third and final attempt. OK, simple to understand so far.

If a 2 digit PIN was allowed, the complexity jumps 10 fold, because now you can have the first ten digits combined in any order with the second ten digits giving possibilities of 10 x 10 = 100:

00 01 02 03 04 05 06 07 08 09

10 11 12 13 14 15 16 17 18 19

20 21 22 23 24 25 26 27 28 29…all the way to 99.

You can see this is a total of 100 or 10^2 combinations, 00 – 99 inclusive. You should now see the pattern emerging from 10^1 and 10^2.

For a 3 digit PIN, you would get 000-999 or 10 x 10 x 10 = 1,000 combinations, so for our standard 4 digit PIN we have 10 x 10 x 10 x 10 = 10,000 combinations overall. This gives only a 1:10,000 chance of someone guessing the PIN in the first random attempt, 1:9,999 for the second attempt and 1:9,998 for the final go.
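The pattern can be checked with a few lines of Python. This is only an illustrative sketch: the function names `pin_keyspace` and `chance_within` are made up for this example, and it assumes the thief never repeats a guess.

```python
from fractions import Fraction

def pin_keyspace(digits: int) -> int:
    """Number of possible PINs of a given length (10 choices per digit)."""
    return 10 ** digits

def chance_within(attempts: int, digits: int) -> Fraction:
    """Overall chance of hitting the PIN within `attempts` distinct guesses."""
    total = pin_keyspace(digits)
    return Fraction(min(attempts, total), total)

# Keyspace and the overall chance of success inside the 3 attempt limit
for n in range(1, 5):
    print(n, "digits:", pin_keyspace(n), "combinations,",
          "chance within 3 attempts:", chance_within(3, n))
```

The per-attempt odds of 1:10,000, then 1:9,999, then 1:9,998 add up to an overall chance of 3 in 10,000 across the three allowed attempts.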

Note one other important aspect of the combinations here though. If you chose simple, easy to remember, repeating digits like 0000, 1111 etc. then this cuts down the guessing by the thief massively, as you now hand them back the same odds as if you had only a 1 digit PIN. There are only 10 such PINs between 0000 and 9999, and with 3 attempts to try them the thief is back to a 1:8 chance on the final go, instead of around 1:10,000! That's why ATMs don't (or shouldn't) allow repeating characters if you change your PIN.
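To see how few repeated-digit PINs there really are, here is a small Python sketch that lists every 4 digit PIN and picks out the ones made of a single repeated digit. The variable names are just for illustration.

```python
# Every 4 digit PIN as a zero-padded string: "0000" ... "9999"
all_pins = [f"{n:04d}" for n in range(10_000)]

# Keep only the PINs where all four digits are the same
repeated = [pin for pin in all_pins if len(set(pin)) == 1]

print(len(all_pins), "possible PINs")
print(len(repeated), "of them are a single repeated digit:", repeated)
print(f"3 guesses cover {3 / len(repeated):.0%} of the repeated-digit PINs")
```

A thief who suspects a repeated-digit PIN only has to pick from the short list, not from the full 10,000.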

Do you see where that is going for easily guessable passwords?

To get a perspective, as most passwords are now required to be a minimum of 8 characters, the maximum complexity for an 8 digit, digit-only PIN is 100,000,000 or one hundred million (10^8) combinations.

So, in summary, each extra digit you can use for a PIN increases complexity 10 times, but repeating digits can reduce it massively again. Setting a maximum number of attempts before lockout is imperative, to stop people being able to spend all day going through the possibilities methodically.

For example, if you had unlimited PIN attempts with no lockout, and could physically input a new code every second, it would take 10,000 seconds or only 10,000 / (60 x 60) = 2.8 hours to crack your PIN, and that is the maximum, if it was the very last attempt that got access and not an earlier one. If the thief can withdraw £350 maximum per day from your account, then that is a good hourly rate of pay eh?
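The same sum as a rough Python sketch. The helper name `hours_to_exhaust` is hypothetical, and one guess per second is simply the assumption used in the text.

```python
def hours_to_exhaust(keyspace: int, guesses_per_second: float = 1.0) -> float:
    """Worst-case hours needed to try every combination once."""
    seconds = keyspace / guesses_per_second
    return seconds / (60 * 60)

worst_case = hours_to_exhaust(10 ** 4)   # every 4 digit PIN, one per second
average = worst_case / 2                 # on average the hit comes half way through

print(f"worst case: {worst_case:.1f} hours, average: {average:.1f} hours")
```

Note the brackets around 60 x 60: the seconds are divided by the number of seconds in an hour, not divided by 60 and then multiplied by 60.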

You see where that idea is going with speed of attempts (brute forcing, computing power), and number of attempt access restrictions (stolen password files/dictionary lists)?

The classic historical example of repeating patterns for an 8 character password, back when most people were unfamiliar with computer keyboards, was "abcdefgh" or "12345678", or indeed "password". That is the same claim Gary McKinnon made: that the US Defence networks he accessed in 2001-2002 looking for UFO evidence had either no security at all, or just the word "password" required to gain admin access to Windows machines:

I remember a classic example of the usual MS lack of testing and foresight back in 1998, when I bought a new Win98 laptop in the US while working there on a subsea optical fibre project in Rhode Island: a superfast 233MHz machine with a massive 256MB RAM! All for the bargain price of $2000!! Ouch. Still a lot cheaper than rip off Britain at the time though. I set a user password for my account login screen, which my colleague, with almost no PC knowledge at all, bypassed with trial and error by hitting the escape key! Unbelievable now isn't it?

So, how does adding letters increase complexity? The idea is the same as above with 10 available digits, but now you can have massive multiplication factors included with just the English alphabet of 26 letters, each of which can be lower or upper case.

Starting with lower case letters only, you would have the same idea as for digits, but instead of powers of base 10, the powers are to base 26. A single lower case character allows only a 1:26 chance of a correct guess. If the password could be 2 characters long, then you would have 26 x 26 (26^2) = 676 possible 2 character combinations. An 8 character password would have 26^8 combinations = 208,827,064,576 or nearly 209 billion possibilities.

Already, it's impossible to crack at one attempt per second as a human, doing it methodically with no dictionaries, at about 208,827,064,576 / (60 secs x 60 mins x 24 hours x 365 days) = 6,622 years.

For a computer, being billions of times quicker, that is nothing though eh? Divide that back by a 1GHz PC at one guess per cycle and it is back to about 209 seconds, or 3.5 minutes to cycle all possible 8 letter lower case combinations.
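Here is the human versus machine comparison as a quick Python sketch. The one guess per clock cycle on a 1GHz PC is the same unrealistic simplification used in the text, kept only for perspective.

```python
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

keyspace = 26 ** 8                            # 8 lower case letters

human_years = keyspace / SECONDS_PER_YEAR     # one guess per second
machine_seconds = keyspace / 1_000_000_000    # assumed: 1e9 guesses per second

print(f"{keyspace:,} combinations")
print(f"human:   about {human_years:,.0f} years")
print(f"machine: about {machine_seconds:.0f} seconds "
      f"({machine_seconds / 60:.1f} minutes)")
```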

If both lower AND upper case characters were allowed, then from the maths logic above, upper case alone also gives 26^8 for an 8 character password, and mixing both gives a character set of 52 letters in total, so an 8 character password now has 52^8 = 53,459,728,531,456 or 53 trillion combinations.

If you then add the extra possible 10 digits of 0-9, this adds up and makes a total character set of 26 upper + 26 lower + 10 digits = 62 characters, in combinations of 8 characters long = 62^8 combinations = 218,340,105,584,896 or 218 trillion combinations.
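The growth with character set size can be tabulated in Python. This sketch uses the standard `string` module constants for the character sets; the dictionary keys are just labels for this example.

```python
import string

LENGTH = 8

# Character sets discussed so far, built from the standard library constants
charsets = {
    "digits": string.digits,
    "lower case": string.ascii_lowercase,
    "lower + upper": string.ascii_letters,
    "lower + upper + digits": string.ascii_letters + string.digits,
}

# Keyspace = (size of character set) ^ (password length)
keyspaces = {name: len(chars) ** LENGTH for name, chars in charsets.items()}

for name, total in keyspaces.items():
    print(f"{name:<24} {len(charsets[name]):>2}^{LENGTH} = {total:,}")
```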

This hasn't even considered special characters like !"£$%^&*()_+=- or accented characters with umlauts, graves and circumflexes, or Chinese or Arabic symbols that have no equivalent key in a particular ASCII keyboard map. Each of these adds an extra possible character (regardless of language map) to the already growing set of 62 standard letters and numbers above.

So, if 8 character passwords are so potentially complex, why is there a password complexity issue?

Well, we're human and so predictable to a large degree, and we like to keep things simple for ourselves where possible. We like to use short, common words that are easy to type and remember. These words are common in dictionaries and easily available as word lists that scripts can cycle through. The Oxford English Dictionary contains about 200,000 entries, the bulk of which would be 8 characters or less depending on how you define and count them.

https://www.oxforddictionaries.com/words/how-many-words-are-there-in-the-english-language

As an entry comparison is not the same as a character combination attempt, it takes a lot fewer attempts to compare a password against a dictionary of only about 200,000 entries. If the password exists there, it will be found in fewer than the full 200,000 tries, and fewer still if only words of 8 characters or less are tried. At my human rate of one entry per second, the full list takes about 200,000 / (60 x 60 x 24) = 2.3 human days.

My 1GHz PC, at one comparison per cycle (not in reality I know, but for perspective), would take only about 200 microseconds in comparison.

That's a big difference between a word comparison and a methodical combination trial of the 62^8 number/letter mix, which takes 218,340,105,584,896 / 1,000,000,000 = 218,340 seconds or about 2.5 days for my theoretical 1GHz PC.
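Side by side in Python, under the same assumptions as the text (a dictionary of roughly 200,000 entries and a theoretical 1GHz PC managing one try per cycle):

```python
GUESSES_PER_SECOND = 1_000_000_000   # assumption: one try per cycle at 1GHz
SECONDS_PER_DAY = 60 * 60 * 24

dictionary_entries = 200_000         # rough dictionary size quoted above
brute_force_space = 62 ** 8          # every 8 character letter/number mix

dict_seconds = dictionary_entries / GUESSES_PER_SECOND
brute_seconds = brute_force_space / GUESSES_PER_SECOND

print(f"dictionary:  {dict_seconds * 1e6:.0f} microseconds")
print(f"brute force: {brute_seconds / SECONDS_PER_DAY:.1f} days")
print(f"ratio:       about {brute_force_space // dictionary_entries:,} to 1")
```

The ratio line is the real point: a word list is roughly a billion times less work than the full 8 character search.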

That's why you shouldn't use dictionary words for passwords.

If you want an idea of what makes a complex password, there are plenty of programs that generate random passwords, such as:

apt-get install pwgen

man pwgen

"The pwgen program generates passwords which are designed to be easily memorized by humans, while being as secure as possible. Human-memorable passwords are never going to be as secure as completely random passwords. In particular, passwords generated by pwgen without the -s option should not be used in places where the password could be attacked via an off-line brute-force attack. On the other hand, completely randomly generated passwords have a tendency to be written down, and are subject to being compromised in that fashion."

Examples of random but more easily remembered 8 char passwords are:

root@debianP4:~# pwgen

Raong9ti aed9UuNa huwai5Ri ood5oo5E zahx2Ree phais5Th Lohh6eij aigh8Quo

Ow2eexai EiPee2wi eeNah6uH aijaiV4s AeR1reQu iG3aomoh aePu3jai thiv8soR

cah1Ohti ueshei1U fenaij0I teiph5Oh aeKei9de zoh4AiY2 ieVei4ci Dahdoh6e

aiBai5ch aiqu3xuB kii6eoN2 li7Wei1e Ax5taith pheDah7e dieReeT9 Ohsh9ahb

There are switches for random capitals, or more overall security:

root@debianP4:~# pwgen -sc

TNFNC5qu ZwNS7jXT z3IUbG8i mhseH8ch gpO1cqKw yHVa94Bj Wpa3Isba zHJlGJD0

1zvEQkMM mkf9hTXJ UEJTzu21 Nx3bcjVM Ue1mP2ag o3qKSls8 7kBA1YZz 6AwN02bg

Pretty horrible though eh?
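If you don't have pwgen to hand, something similar in spirit to its fully random mode can be sketched with Python's standard `secrets` module. To be clear, this is not pwgen's own algorithm, and the function name `random_password` is made up for this example.

```python
import secrets
import string

# The 62 character letter/number set discussed earlier
ALPHABET = string.ascii_letters + string.digits

def random_password(length: int = 8) -> str:
    """Pick each character independently using a cryptographic random source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

# Print a row of candidates, pwgen style
print(" ".join(random_password() for _ in range(8)))
```

`secrets` is used rather than `random` because it draws from the operating system's secure random source, which is what you want for anything password related.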

Obviously, as humans we have to have easily remembered words which generally exist in dictionaries, and given the predictable social nature of humans, we tend to choose something that relates personally, like child, car, pet, relatives, sports names etc. The risk is that relevant info about a particular person can be found from sources such as social media, or through social engineering.

If something gets too complicated we usually write it down – the office "Post It" note problem, a password stuck to the underside of a calculator etc.

Easier ways of adding complexity without memory difficulty are shown in many "hacker" style names on forums, where a number that looks like a character is substituted, or "reversed" like using a "3" for an "e" as in "R3V3RS3" or a "0" for an "o" or "5" for an "s", a "9" for a "p" etc:

"ch0053 y00r 9a55w0rd"

These substitutions have been around for decades so it is reasonable to assume that plenty of lists exist to include these variations, so again where word comparison over character combination can occur, it is of no real security gain – just harder to remember for you.
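To show how little these swaps add, here is a Python sketch that expands one word into every variant using only the four substitutions mentioned above. Real cracking lists use far more rules than this, and the names `LEET` and `leet_variants` are just for this example.

```python
from itertools import product

# Only the substitutions mentioned in the text
LEET = {"e": "3", "o": "0", "s": "5", "p": "9"}

def leet_variants(word: str) -> list:
    """Return every combination of swapped and unswapped characters."""
    options = [(ch, LEET[ch]) if ch in LEET else (ch,) for ch in word.lower()]
    return ["".join(combo) for combo in product(*options)]

variants = leet_variants("password")
print(len(variants))   # 16: four swappable letters, two choices each
print(variants)
```

One dictionary word turns into just 16 candidates, which is nothing to a machine already cycling through a whole word list.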

A slightly better way to balance usability against complexity may be to use words from another language you are familiar with, or place names from other countries you have visited, in the local language: Londres for London etc. These are an easy association for you, more difficult to guess, and almost certainly not in standard dictionaries of your native language. Again though, this does not defeat pre-compiled hacker lists, and no doubt every language dictionary with variations has been pre-hashed by someone somewhere, in many hash types, so again it's just a question of hash comparison, which is easy for fast computers as seen in the last video below.

Do you see the scale and importance of hacks on social media type systems with large user numbers, if the password files are stolen and the hashes then released to the Internet? If other important info about a user can be found out, like what bank they use, and they used the same username and password for that as for their social media account…they may just get a "Wish You Were Here..?" postcard from Tahiti, posted after the thief has enjoyed the free holiday…

This brings me to Passphrases over Passwords. They add complexity through overall string length and the addition of spaces as characters and, combined with capitals and/or special characters, make very robust but easy to remember strings, depending on how they are written. Take a relatively insecure known phrase like:

A Rolling Stone Gathers No Moss

This can then be modified away from the well known phrase, depending on how your mind works:

A Rolling Stone Gathers No Mick Jagger!

"Who would guess that?" you may think, let alone how it was spelled? But it still uses common, public and potentially guessable or comparable dictionary words which can be attacked using statistical analysis or other methods. Statistical analysis is based on the relative occurrence frequency of particular letters in words of a given language, the most common being the letter "e" in English with the letter "s" the most common used at a word's start.

https://www.oxforddictionaries.com/words/what-is-the-frequency-of-the-letters-of-the-alphabet-in-english
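The gap between "lots of characters" and "a handful of dictionary words" can be put into rough numbers. This Python sketch makes two loud assumptions: that a character by character attack draws on the 95 printable ASCII characters, and that a word by word attack draws on the rough 200,000 entry dictionary figure used earlier and that every word of the phrase is in it.

```python
phrase = "A Rolling Stone Gathers No Mick Jagger!"

PRINTABLE_ASCII = 95         # letters, digits, punctuation and the space
DICTIONARY_WORDS = 200_000   # assumption: rough dictionary size from earlier

words = phrase.split()

# Search space if every character must be guessed independently
by_character = PRINTABLE_ASCII ** len(phrase)

# Search space if the attacker only has to pick the right words in order
by_word = DICTIONARY_WORDS ** len(words)

print(len(phrase), "characters,", len(words), "words")
print("character by character: a", len(str(by_character)), "digit number")
print("word by word:           a", len(str(by_word)), "digit number")
```

Both numbers are huge, but the word by word one is far smaller, and it shrinks much further if the attacker starts from well known phrases instead of random words.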

Better to use a phrase with no public aspect at all, and based around something only you would know, such as a personal event – the more obscure to others but easy for you, the better.

So, with a bit of thought it should be simple to create really easy to remember but relatively hard to guess or compare passwords or phrases that even the most powerful computer would find too time consuming to break when they don't appear in dictionary type lists. It may be good to mentally prepare these too, for when you are faced with rushed decisions on password creation, like creating a job agency account, or a workplace monthly change of password where you shouldn't just add a 1 or ! to the last one you had. Many server OS password policies can disallow this level of similarity to prior passwords, forcing users to be less predictable.

A simple idea that you could expand on in other character forms may be a numerical keypad layout where you choose numbers dependent on the shape they make when pressed instead of the actual number string, so for a keypad layout, you may make an X shape of 159357:

1 2 3

4 5 6

7 8 9

0

Using these as part of your password would seem random to most people except you.

This next bit should be common sense: keep social media passwords COMPLETELY different to those used for a bank or PayPal account, and use very different ones for each money related service you use. Be aware too that if a Google, MS or other major service account is hacked and you use the browser password storage option (which is of course very handy), then every site you have used in your Chrome or IE browser is potentially compromised, even if the stored passwords are encrypted. It also depends on how the web service handles passwords:

Why do you think these corporations offer that service, knowing since Snowden the type of relationships that exist between these large corporations and the government agencies? Your security and convenience are not what they are primarily interested in at all, and some would argue that this situation was always the intention of Big Brother: to have the gullible masses willingly give away all their personal, if mostly mundane, secrets on social media, along with their common passwords to boot, rather than waste agency resources actively attacking systems for them. If you use these browser options, just make sure you don't care about anything relating to you being found out, though having a bank account emptied or identity cloned would not be pleasant for anyone.

Needless to say, having a Gdrive or Picasa photo album of child pornography protected only by a password would get you little sympathy from most people for many reasons, and of course, rightly so, stupidity notwithstanding.

For hackers and secret agencies, this comes round to the principle of how much time and effort is required to break a code compared to the intrinsic value of the data accessed and advantage gained – a variation on risk over reward.

This is highlighted specifically in the topical case of the WW2 Enigma machine. A recent film, The Imitation Game, is based around Bletchley Park maths and logic genius Alan Turing, the father of computing theory as the inventor of the Turing Machine. It is well worth a watch if you like encryption and code breaking, and does a great job of covering the main complexities faced by the code breakers when working against the clock.

Their most pressing issue every day – what use is cracking the German codes if the information gained is out of date or there is no time to act on it?

Does it make sense now that the Numberphile video on NSA mail access shows backdoors or known PQ numbers HAD to exist for the agency to have a reasonable chance – despite their resources – of getting easy access to accounts and encrypted traffic within the lifetime of this Universe?:

So, back to John the Ripper. It's good to have an idea what real world PCs can actually do in terms of cracking speed, so I installed john on an AMD64 dual core 2.8GHz PC with 6GB RAM and on an old HP Pavilion 32 bit Intel Celery with 1GB RAM:

sudo apt-get install john

I set up 3 new users from Rush, with their first names as passwords:

cat /etc/passwd | grep 100.:

alexlifeson:x:1001:1001:,,,:/home/alexlifeson:/bin/bash
geddylee:x:1002:1002:,,,:/home/geddylee:/bin/bash
neilpeart:x:1003:1003:,,,:/home/neilpeart:/bin/bash

Pointing john at /etc/shadow, the AMD64 cracked all 3 pwords in about 10 secs:

Loaded 5 password hashes with 5 different salts (crypt, generic crypt(3) [?/64])
Press 'q' or Ctrl-C to abort, almost any other key for status
alex (alexlifeson)
neil (neilpeart)
geddy (geddylee)

The Pent32 cracked all 3 pwords in more like 10 minutes, and in a different order, with neil first for some reason.

A way to generate some data would be to run john against the same file (as it logs all prior successes to not repeat itself) after changing the pword for the same user by one char extra and graphing the time taken for each successive attempt, to see how much difference each extra char makes to crack time. Starting with user geddy say, as his pword is 5 chars long and took seconds, you could change his pword to geddy0, then geddy00 etc. and time each successive success with:

Loaded 5 password hashes with 5 different salts (crypt, generic crypt(3) [?/32])
Press 'q' or Ctrl-C to abort, almost any other key for status

….yeah, I'm bored now – 6 hours later and the AMD still hasn't got the new password, geddy0…That extra digit makes a big diff to the hash and so the pw security eh…?!
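You can watch the same effect on a toy scale without waiting 6 hours. The Python sketch below is NOT how john works: it uses a plain unsalted SHA-256 as a stand-in hash (real /etc/shadow hashes are salted and deliberately slow), a 36 character lower case plus digit set, and made-up helper names. It just times a methodical search as the secret grows by one character at a time.

```python
import hashlib
import string
import time
from itertools import product

ALPHABET = string.ascii_lowercase + string.digits   # 36 characters

def toy_hash(password: str) -> str:
    # Stand-in only: fast and unsalted, unlike real login hashes
    return hashlib.sha256(password.encode()).hexdigest()

def brute_force(target_hash: str, max_length: int):
    """Try every candidate up to max_length; return (password, guesses)."""
    guesses = 0
    for length in range(1, max_length + 1):
        for combo in product(ALPHABET, repeat=length):
            guesses += 1
            candidate = "".join(combo)
            if toy_hash(candidate) == target_hash:
                return candidate, guesses
    return None, guesses

# Grow the secret one character at a time and time each crack
for secret in ("g", "ge", "ged", "gedd"):
    start = time.perf_counter()
    found, guesses = brute_force(toy_hash(secret), len(secret))
    elapsed = time.perf_counter() - start
    print(f"{found!r:8} {guesses:>9,} guesses  {elapsed:.3f}s")
```

Each extra character multiplies the work by roughly the size of the character set, so the guess counts climb by a factor of about 36 per line. With a slow, salted hash in place of the toy one, that same multiplication is what turns seconds into hours.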

BUT – just to scare ya for passwords of 8 chars or less – if a REALLY fast PC is available: