Phishing Website Prediction Using Random Forest
INTRODUCTION:
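Phishing is a social-engineering attack in which attackers send fraudulent emails or text messages containing links to malicious websites. These websites may steal credentials or deliver malware (such as ransomware) that can disrupt systems and organisations. The script below trains a Random Forest classifier to predict whether a website is phishing or legitimate.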
CODE:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.preprocessing import StandardScaler
# Load dataset
# Replace 'phishing_data.csv' with the actual file path
df = pd.read_csv('phishing_data.csv')
# Display dataset information
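print("Dataset Info:")
df.info()  # info() prints its summary directly and returns None
print("\nFirst 5 Rows of Dataset:")
print(df.head())
# Handle missing values, if any, by filling numeric columns with their column means
df.fillna(df.mean(numeric_only=True), inplace=True)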
# Separate features and target
X = df.drop(columns=['id', 'CLASS_LABEL'])  # Drop the 'id' and 'CLASS_LABEL' columns
y = df['CLASS_LABEL']  # Target variable (1 for phishing, 0 for legitimate)
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Scale the features (optional for tree-based models such as Random Forest, which do not depend on feature scale)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Train a Random Forest Classifier
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
# Predict on the test set
y_pred = model.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
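A single train/test split gives only one accuracy estimate, and the script above does not show which features drive the predictions. The sketch below is one possible way to add 5-fold cross-validation and a feature-importance ranking. It is an illustration under assumptions, not part of the original script: it uses synthetic data from make_classification as a stand-in for the phishing dataset, and names such as X_demo, demo_model, and feature_0 are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Assumption: synthetic data stands in for the real phishing dataset
X_demo, y_demo = make_classification(
    n_samples=500, n_features=10, n_informative=4, random_state=42
)
# Hypothetical feature names, used only for display
feature_names = [f"feature_{i}" for i in range(X_demo.shape[1])]

# Stratify so both classes keep the same proportion in train and test
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

demo_model = RandomForestClassifier(n_estimators=100, random_state=42)

# 5-fold cross-validated accuracy on the training portion
cv_scores = cross_val_score(demo_model, X_tr, y_tr, cv=5, scoring="accuracy")
print(f"CV accuracy: {cv_scores.mean():.2f} (+/- {cv_scores.std():.2f})")

# Fit on the full training portion, then rank features by impurity-based importance
demo_model.fit(X_tr, y_tr)
importances = demo_model.feature_importances_
ranked = sorted(zip(feature_names, importances), key=lambda p: p[1], reverse=True)

print("Top 5 features:")
for name, score in ranked[:5]:
    print(f"{name}: {score:.3f}")
```

To try this on the real data, X_demo and y_demo could be replaced with the X and y built from the CSV, and feature_names with list(X.columns).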
DATASET:
https://www.kaggle.com/datasets/shashwatwork/phishing-dataset-for-machine-learning