Introduction
In this article, I’ll go over how we can leverage machine learning to classify the strength of a password. You've probably asked yourself the following questions: "What makes a good password?", "Is my password strong?", "Could someone guess my password?" The National Institute of Standards and Technology (NIST) defines a password as a secret value that is intended to be chosen and easily memorized or recorded by the subscriber. A password must be strong enough that it is impractical for an attacker to guess it. Password security has evolved over time from simple watchwords to more complex requirements that protect against unauthorized access. Many companies still follow rule-based systems that contain inconsistencies. For instance, these systems often rely on LUDS rules, requiring a password to contain at least one lowercase letter, one uppercase letter, one digit, and one special character, in addition to being eight or more characters long. A rule-based system like this would mistakenly classify a weak, easily guessable password like "P@ssw0rd" as strong (see the sketch after the list below). There are several important metrics to consider when evaluating what makes a strong password. These include:
- The length of the password. Longer is stronger.
- The randomness of a password. A password should be hard to predict.
- The uniqueness of a password. A password should not be reused across accounts.
- The commonality of a password. Passwords that appear in leaked wordlists are guessed first.
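To make the LUDS problem concrete, here is a minimal sketch of such a rule-based check (a hypothetical validator written for illustration, not part of any library used later). It happily accepts "P@ssw0rd":
import re

# A naive LUDS rule check: at least one lowercase letter, one uppercase letter,
# one digit, one special character, and eight or more characters.
def luds_is_strong(password):
    return (
        len(password) >= 8
        and re.search(r"[a-z]", password) is not None
        and re.search(r"[A-Z]", password) is not None
        and re.search(r"\d", password) is not None
        and re.search(r"[^a-zA-Z0-9]", password) is not None
    )

print(luds_is_strong("P@ssw0rd"))  # True, even though this password is weak and common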
Getting Started
If you're using Google Colab, you can skip this section. If not, to get started make sure you have the following installed on your system:
- Python 3.10 or newer
- An integrated development environment of choice (Jupyter Notebook, PyCharm, or VSCode)
- Install the following libraries required for this exercise:
- pandas, zxcvbn, scikit-learn, and numpy
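All four libraries are available on PyPI and can be installed in one command:
pip install pandas zxcvbn scikit-learn numpy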
Imports
Copy the following imports into your environment:
import pandas as pd
from zxcvbn import zxcvbn
from pandas import json_normalize
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
import numpy as np
import warnings
warnings.filterwarnings("ignore", category=UserWarning)
What is zxcvbn?
zxcvbn is a password strength estimation library originally released by Dropbox. Through pattern matching and conservative estimation, it recognizes common passwords based on human names and popular English words from various sources, and it accounts for other common patterns like dates, repeats, sequences, keyboard patterns, and l33t speak. Based on these estimates, the library provides metrics we can use to train our model: it scores a given password, returns estimates of how long it would take to guess, and more. zxcvbn was originally written in JavaScript, but others have graciously ported the library to different languages. Dropbox's developers recommend the Python port, zxcvbn-python, by Daniel Wolf and other amazing contributors.
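Before collecting any data, it helps to see what a single call returns. A minimal sketch (the keys shown below are part of the result dictionary the Python port returns):
from zxcvbn import zxcvbn

# Inspect zxcvbn's analysis of a single password
result = zxcvbn("P@ssw0rd")
print(result["score"])                    # Integer strength score from 0 (weakest) to 4 (strongest)
print(result["guesses"])                  # Estimated number of guesses needed to crack it
print(result["feedback"]["suggestions"])  # Human-readable suggestions for improvement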
Data Collection
The dataset used here is the Password Strength Classifier dataset by Bhavik Bansal (linked in the Sources below). The passwords in this dataset come from the 000webhost leak that is available online, and the dataset has been uploaded to GitHub for easy access. We're only interested in reading the passwords column, not the strength column, as we'll be using the zxcvbn library to extract features from the passwords ourselves. Be advised that some passwords may contain vulgar language.
df = pd.read_csv("https://raw.githubusercontent.com/dug22/datasets/refs/heads/main/password-strength-dataset.csv", usecols=["password"])
Since the zxcvbn library only works with passwords of 72 characters or fewer, we must drop any passwords that exceed that maximum length.
df = df[df['password'].str.len() <= 72]
After filtering, we should be left with 669,634 passwords. To verify the shape of the filtered dataframe, run:
df.shape
Output:
(669634, 1)
Next, we'll iterate through each row in the dataframe and append zxcvbn's analysis of each password to a list. The result is a list of dictionaries in which zxcvbn provides crucial information about each password, including the estimated number of guesses required to crack it, the time it would take to crack under several attack scenarios, and more. Analyzing all 669,634 passwords takes a while, which is why the loop prints its progress.
results = []
for i, row in enumerate(df.itertuples(index=False), start=1):
    results.append(zxcvbn(row.password))
    if i % 1000 == 0:
        print("Passwords Iterated:", i)
To create a dataframe from our resulting list, we pass it to json_normalize(). The json_normalize() function flattens semi-structured data into a tabular format; nested keys become dotted column names such as crack_times_seconds.online_throttling_100_per_hour.
df = json_normalize(results)
df
Sample Output
| | password | guesses | guesses_log10 | sequence | calc_time | score | crack_times_seconds.online_throttling_100_per_hour | crack_times_seconds.online_no_throttling_10_per_second | crack_times_seconds.offline_slow_hashing_1e4_per_second | crack_times_seconds.offline_fast_hashing_1e10_per_second | crack_times_display.online_throttling_100_per_hour | crack_times_display.online_no_throttling_10_per_second | crack_times_display.offline_slow_hashing_1e4_per_second | crack_times_display.offline_fast_hashing_1e10_per_second | feedback.warning | feedback.suggestions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | kzde5577 | 100000001 | 8.000000 | [{'pattern': 'bruteforce', 'token': 'kzde5577', 'i': 0, 'j': 7, 'guesses': 100000000, 'guesses_log10': 8.0}] | 0 days 00:00:00.001859 | 2 | 3600000036.000000199840146431 | 10000000.1 | 10000.0001 | 0.0100000001 | centuries | 4 months | 3 hours | less than a second | | [Add another word or two. Uncommon words are better.] |
| 1 | kino3434 | 1010000 | 6.004321 | [{'pattern': 'bruteforce', 'token': 'kino', 'i': 0, 'j': 3, 'guesses': 10000, 'guesses_log10': 4.0}, {'pattern': 'repeat', 'i': 4, 'j': 7, 'token': '3434', 'base_token': '34', 'base_guesses': 21, 'base_matches': [{'pattern': 'sequence', 'i': 0, 'j': 1, 'token': '34', 'sequence_name': 'digits', 'sequence_space': 10, 'ascending': True, 'guesses': 20, 'guesses_log10': 1.301029995663981}], 'repeat_count': 2.0, 'guesses': 50, 'guesses_log10': 1.6989700043360185}] | 0 days 00:00:00.001334 | 2 | 36360000.00000000201838545877 | 101000 | 101 | 0.000101 | 1 year | 1 day | 2 minutes | less than a second | | [Add another word or two. Uncommon words are better.] |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 669632 | marken22a | 10010000 | 7.000434 | [{'pattern': 'dictionary', 'i': 0, 'j': 3, 'token': 'mark', 'matched_word': 'mark', 'rank': 14, 'dictionary_name': 'male_names', 'reversed': False, 'l33t': False, 'base_guesses': 14, 'uppercase_variations': 1, 'l33t_variations': 1, 'guesses': 50, 'guesses_log10': 1.6989700043360185}, {'pattern': 'bruteforce', 'token': 'en22a', 'i': 4, 'j': 8, 'guesses': 100000, 'guesses_log10': 5.0}] | 0 days 00:00:00.000387 | 2 | 360360000.0000000200039984577 | 1001000 | 1001 | 0.001001 | 11 years | 12 days | 17 minutes | less than a second | | [Add another word or two. Uncommon words are better.] |
| 669633 | fxx4pw4g | 100000001 | 8.000000 | [{'pattern': 'bruteforce', 'token': 'fxx4pw4g', 'i': 0, 'j': 7, 'guesses': 100000000, 'guesses_log10': 8.0}] | 0 days 00:00:00.000321 | 2 | 3600000036.000000199840146431 | 10000000.1 | 10000.0001 | 0.0100000001 | centuries | 4 months | 3 hours | less than a second | | [Add another word or two. Uncommon words are better.] |
Cleaning and Preparing the Data
It's time to clean up our data and prepare it for our model to train on. For starters, we need to map each zxcvbn score to a descriptive rating (e.g., 0 → weak, 1 → somewhat weak, and so on) to use as our target labels:
mapping = {
    0: "weak",
    1: "somewhat weak",
    2: "average",
    3: "good",
    4: "strong"
}
df["label"] = df["score"].map(mapping)
Before training any machine learning model, we first need to identify which pieces of information (features) the model will learn from. In this case, we're using several metrics generated by zxcvbn, such as the estimated number of guesses required to crack the password and multiple cracking-time scenarios (online, offline, throttled, etc.). These metrics capture the complexity of a password from different attack perspectives, which makes them valuable inputs for a classifier.
We start by listing the columns we want to use:
features = [
    "guesses_log10",
    "crack_times_seconds.online_throttling_100_per_hour",
    "crack_times_seconds.online_no_throttling_10_per_second",
    "crack_times_seconds.offline_slow_hashing_1e4_per_second",
    "crack_times_seconds.offline_fast_hashing_1e10_per_second"
]
Some of the cracking-time values returned by zxcvbn are extremely large, and some may even be represented as infinity. Values on such wildly different scales make training unstable. To fix this, we apply a logarithmic transformation (log10) to each cracking-time column, which compresses large values into a manageable range for the model to train on.
df["crack_times_seconds.online_throttling_100_per_hour"] = np.log10(df["crack_times_seconds.online_throttling_100_per_hour"])
df["crack_times_seconds.online_no_throttling_10_per_second"] = np.log10(df["crack_times_seconds.online_no_throttling_10_per_second"])
df["crack_times_seconds.offline_slow_hashing_1e4_per_second"] = np.log10(df["crack_times_seconds.offline_slow_hashing_1e4_per_second"])
df["crack_times_seconds.offline_fast_hashing_1e10_per_second"] = np.log10(df["crack_times_seconds.offline_fast_hashing_1e10_per_second"])
After cleaning and preparing the data, we separate it into X and y variables: X holds our input features, and y holds our target labels.
X = df[features]
y = df["label"]
Dividing our dataset into two subsets (a training set and a testing set) is crucial for evaluating the model's performance on unseen data. We will split our data 80/20: the training set will contain 80% of the data, and the testing set the remaining 20%.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
Training Our Model
A RandomForestClassifier will be used as our model of choice to train on the cleaned and prepared data. A RandomForestClassifier is an ideal candidate in this scenario, as it builds multiple decision trees on different sub-samples of the dataset and averages their predictions to improve accuracy and reduce overfitting. To implement our model we can simply do the following:
model = RandomForestClassifier(
    n_estimators=300,        # Build 300 decision trees
    random_state=42,         # Seed to ensure reproducible results
    class_weight="balanced"  # Handle class imbalance by weighting classes inversely to their frequency
)
model.fit(X_train, y_train)
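Once the model is fitted, you can optionally inspect which inputs the forest leaned on most via its feature_importances_ attribute (the exact numbers will vary with the data and random seed):
# Show how much each feature contributed to the forest's splits
for name, importance in zip(features, model.feature_importances_):
    print(f"{name}: {importance:.3f}")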
Evaluating Our Model
According to the classification report below, our model performs perfectly on the held-out test set. A perfect score here is expected rather than suspicious: our labels are derived directly from zxcvbn's score, which is itself computed from the same guess estimates we use as features, so the model is essentially learning a deterministic mapping.
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
Output:
precision recall f1-score support
average 1.00 1.00 1.00 37883
good 1.00 1.00 1.00 44349
somewhat weak 1.00 1.00 1.00 25355
strong 1.00 1.00 1.00 26145
weak 1.00 1.00 1.00 195
accuracy 1.00 133927
macro avg 1.00 1.00 1.00 133927
weighted avg 1.00 1.00 1.00 133927
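For a per-class view of the same result, a confusion matrix can complement the report (a short sketch using scikit-learn's confusion_matrix):
from sklearn.metrics import confusion_matrix

# Rows are true labels, columns are predicted labels, in the order given below
labels = ["weak", "somewhat weak", "average", "good", "strong"]
print(confusion_matrix(y_test, y_pred, labels=labels))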
Let's put our model to the test by predicting the strength of a variety of passwords and seeing what strength label each one receives.
def predict_password_strength(pw, model, features):
    z = zxcvbn(pw)
    row = {
        "guesses_log10": z["guesses_log10"],
        "crack_times_seconds.online_throttling_100_per_hour": z["crack_times_seconds"]["online_throttling_100_per_hour"],
        "crack_times_seconds.online_no_throttling_10_per_second": z["crack_times_seconds"]["online_no_throttling_10_per_second"],
        "crack_times_seconds.offline_slow_hashing_1e4_per_second": z["crack_times_seconds"]["offline_slow_hashing_1e4_per_second"],
        "crack_times_seconds.offline_fast_hashing_1e10_per_second": z["crack_times_seconds"]["offline_fast_hashing_1e10_per_second"],
    }
    X = np.array([row[f] for f in features], dtype=float)
    # Apply the same log10 transform used during training to every cracking-time feature
    for i in range(1, len(features)):
        X[i] = np.log10(X[i])
    pred = model.predict([X])[0]
    return pred
passwords = [
    "123456", "123456789", "qwerty", "password", "1234567", "12345678",
    "12345", "iloveyou", "111111", "123123", "abc123", "qwerty123",
    "1q2w3e4r", "admin", "qwertyuiop", "654321", "555555", "lovely",
    "7777777", "welcome", "888888", "princess", "dragon", "password1",
    "123qwe", "monkey", "letmein", "sunshine", "football", "shadow",
    "baseball", "superman", "starwars", "master", "hello123",
    "purplecat", "summer2020", "winter2021", "jessica1", "qwerty!@#",
    "ilovebooks", "happydays", "banana12", "coffee123", "mountain1",
    "mydogspot", "starbright", "winner2022", "seattle99", "trustno1",
    "PurpleSunset42", "CoffeeBean2023", "GoldenRetriever12",
    "SkylineView88", "OceanBreeze77", "GalaxyNote2024", "ForestTrail55",
    "SilverCloud19", "BookLover321", "NightRunner2000",
    "correct-horse-battery-staple",
    "sunset-lizard-harp-93",
    "paper-train-orange-lamp",
    "silent-river-candle-night",
    "fast-turtle-apple-stone-24",
    "supercalifragilisticexpialidocious",
    "wN3zQ9pM4fTXbRkx0uJg1hFV"
]
for password in passwords:
    prediction = predict_password_strength(password, model, features)
    print(f"{password} is a {prediction} password!")
Output:
123456 is a weak password!
123456789 is a weak password!
qwerty is a weak password!
password is a weak password!
1234567 is a weak password!
12345678 is a weak password!
12345 is a weak password!
iloveyou is a weak password!
111111 is a weak password!
123123 is a weak password!
abc123 is a weak password!
qwerty123 is a weak password!
1q2w3e4r is a weak password!
admin is a weak password!
qwertyuiop is a weak password!
654321 is a weak password!
555555 is a weak password!
lovely is a weak password!
7777777 is a weak password!
welcome is a weak password!
888888 is a weak password!
princess is a weak password!
dragon is a weak password!
password1 is a weak password!
123qwe is a weak password!
monkey is a weak password!
letmein is a weak password!
sunshine is a weak password!
football is a weak password!
shadow is a weak password!
baseball is a weak password!
superman is a weak password!
starwars is a weak password!
master is a weak password!
hello123 is a weak password!
purplecat is a somewhat weak password!
summer2020 is a somewhat weak password!
winter2021 is a average password!
jessica1 is a weak password!
qwerty!@# is a somewhat weak password!
ilovebooks is a average password!
happydays is a somewhat weak password!
banana12 is a somewhat weak password!
coffee123 is a somewhat weak password!
mountain1 is a somewhat weak password!
mydogspot is a average password!
starbright is a somewhat weak password!
winner2022 is a average password!
seattle99 is a somewhat weak password!
trustno1 is a weak password!
PurpleSunset42 is a good password!
CoffeeBean2023 is a strong password!
GoldenRetriever12 is a good password!
SkylineView88 is a good password!
OceanBreeze77 is a good password!
GalaxyNote2024 is a strong password!
ForestTrail55 is a good password!
SilverCloud19 is a good password!
BookLover321 is a good password!
NightRunner2000 is a good password!
correct-horse-battery-staple is a strong password!
sunset-lizard-harp-93 is a strong password!
paper-train-orange-lamp is a strong password!
silent-river-candle-night is a strong password!
fast-turtle-apple-stone-24 is a strong password!
supercalifragilisticexpialidocious is a strong password!
wN3zQ9pM4fTXbRkx0uJg1hFV is a strong password!
Conclusion
In this article, we demonstrated how to use machine learning to classify password strength. By leveraging zxcvbn metrics as input features and mapping them to labeled outcomes, our model was able to learn the patterns that distinguish weak, somewhat weak, average, good, and strong passwords. Employing machine learning in this way can enhance security systems by automatically identifying weak passwords and encouraging stronger choices.
Sources
- NIST Special Publication 800-63B by NIST
- Use Strong Passwords by CISA
- Passwords have a long history – how much do you know…? by Cisco
- Five Algorithms to Measure Real Password Strength by Nulab
- zxcvbn original GitHub repository by Dropbox
- zxcvbn-python GitHub repository by dwolfhub (Daniel Wolf) and other contributors
- Password Strength Classifier Dataset by Bhavik Bansal