Instagram Fake Account Detection Using Logistic Regression
INTRODUCTION:
Logistic regression is a statistical technique that models the relationship between one or more input features and a categorical outcome. It uses that relationship to estimate the probability of each outcome from the feature values. The outcome usually takes one of a finite set of values, such as yes or no; here, whether an Instagram account is fake or genuine.
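To make that concrete, here is a minimal sketch of the calculation behind a single logistic regression prediction: a weighted sum of the features is passed through the sigmoid function to give a probability between 0 and 1, which is then thresholded into a yes/no answer. The feature meanings, weights, and bias below are made-up illustration values, not numbers learned from any real Instagram dataset.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_fake_probability(features, weights, bias):
    """Weighted sum of the features, squashed through the sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical, hand-picked values for illustration only
weights = [1.5, -2.0]   # e.g. [no_profile_pic, followers_scaled]
bias = -0.5
account = [1.0, -1.0]   # no profile picture, below-average follower count

probability = predict_fake_probability(account, weights, bias)
label = int(probability >= 0.5)  # 1 = fake, 0 = genuine
print(f"P(fake) = {probability:.3f}, predicted label = {label}")
```

Training a model amounts to finding the weights and bias that make these probabilities match the labelled examples as closely as possible; scikit-learn does that step in the code that follows.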
CODE 😃👇 :
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix
# Load the dataset
# Replace 'your_dataset.csv' with the actual path to your dataset
df = pd.read_csv('your_dataset.csv')
# Display the first few rows of the dataset
print(df.head())
# Check for missing values
print("\nMissing values:\n", df.isnull().sum())
# Drop rows with missing values if any
df.dropna(inplace=True)
# Separate features (X) and target variable (y)
X = df.drop(columns=['is_fake'])
y = df['is_fake']
# Encode categorical columns if necessary
# Assuming 'is_private', 'is_joined_recently', 'has_channel', etc. are binary features
# If they are not binary, you may need to apply one-hot encoding to the non-binary categorical features
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Initialize and train the Logistic Regression model
log_reg = LogisticRegression()
log_reg.fit(X_train, y_train)
# Make predictions on the test set
y_pred = log_reg.predict(X_test)
# Evaluate the model
print("\nAccuracy:", accuracy_score(y_test, y_pred))
print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
# Optional: Display coefficients to understand feature importance
feature_importance = pd.DataFrame({
    'Feature': X.columns,
    'Coefficient': log_reg.coef_[0]
}).sort_values(by='Coefficient', ascending=False)
print("\nFeature Importance:\n", feature_importance)
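If you do not have a dataset to hand, the same workflow can be tried on synthetic data. The sketch below is only an illustration under stated assumptions: the data comes from scikit-learn's make_classification rather than real Instagram accounts, and the feature names are hypothetical placeholders. It also wraps the scaler and the model in one pipeline, and converts the coefficients to odds ratios, which is one common way to read them.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset (assumption: 4 numeric features)
X_demo, y_demo = make_classification(
    n_samples=500, n_features=4, n_informative=3, n_redundant=1,
    n_clusters_per_class=1, random_state=42,
)
feature_names = ["feature_a", "feature_b", "feature_c", "feature_d"]  # hypothetical
X_demo = pd.DataFrame(X_demo, columns=feature_names)

# stratify keeps the fake/genuine ratio the same in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# The pipeline fits the scaler on the training data only, then the model
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_tr, y_tr)

# Probability of the positive class (1 = fake) for each test row
proba_fake = model.predict_proba(X_te)[:, 1]

# exp(coefficient) is the multiplicative change in the odds of "fake"
# for a one-standard-deviation increase in that feature
coefs = model.named_steps["logisticregression"].coef_[0]
odds_ratios = pd.DataFrame({
    "Feature": feature_names,
    "Coefficient": coefs,
    "Odds ratio": np.exp(coefs),
}).sort_values(by="Coefficient", key=lambda s: s.abs(), ascending=False)

print("Test accuracy:", model.score(X_te, y_te))
print(odds_ratios)
```

Sorting by the absolute value of the coefficient puts the most influential features first whether they push towards "fake" or towards "genuine". An odds ratio above 1 means the feature raises the odds of the fake class, and one below 1 means it lowers them.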