Thursday, November 14, 2024

Diabetes Prediction Project Using Logistic Regression

 



INTRODUCTION:

Logistic regression is a statistical modeling technique that estimates the probability of a categorical outcome from one or more input features. It passes a weighted sum of the features through the logistic (sigmoid) function to get a probability between 0 and 1, then turns that probability into a prediction with a finite number of outcomes, like yes or no. In this project, the outcome is whether or not a patient has diabetes.
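
To make the idea concrete, here is a minimal sketch of how a logistic regression model turns feature values into a yes/no prediction. The weights, bias, and feature values are made-up numbers for illustration only; they are not learned from the diabetes dataset.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(features, weights, bias):
    """Weighted sum of the features, passed through the sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical weights and bias (assumed values, not fitted to any data)
weights = [0.8, 0.5]
bias = -0.2

# Hypothetical, already-standardized feature values for one patient
patient = [1.5, -0.3]

probability = predict_probability(patient, weights, bias)
label = 1 if probability >= 0.5 else 0  # 0.5 is the usual default cutoff

print("Predicted probability:", round(probability, 4))
print("Predicted label:", label)
```

Training a model means finding the weights and bias that best fit the data, which is what the fit() call in the code below does.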

CODE 😃👇:

# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer

# Load the dataset
# Replace 'path_to_csv' with the path where the dataset is saved
df = pd.read_csv("path_to_csv/diabetes.csv")

# Display the first few rows of the dataset
print("Dataset Preview:\n", df.head())

# Check for missing values
print("\nMissing values in the dataset:\n", df.isnull().sum())

# Replace zeros with NaN in columns where a zero value doesn't make sense
zero_as_missing_cols = ['Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI']
df[zero_as_missing_cols] = df[zero_as_missing_cols].replace(0, np.nan)

# Impute the missing values with the mean of each column
imputer = SimpleImputer(strategy='mean')
df[zero_as_missing_cols] = imputer.fit_transform(df[zero_as_missing_cols])

# Separate the features (X) and target variable (y)
X = df.drop("Outcome", axis=1)
y = df["Outcome"]

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the feature values
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Initialize and train the Logistic Regression model
log_reg_model = LogisticRegression()
log_reg_model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = log_reg_model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print("\nAccuracy of the Logistic Regression model:", accuracy)
print("\nClassification Report:\n", classification_report(y_test, y_pred))
print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))
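
The code above imputes missing values on the full dataset before splitting it, so the column means are computed with the test rows included. One possible variation, sketched below, is to put the imputer, scaler, and model into a scikit-learn Pipeline so that every preprocessing step is fitted on the training rows only. This is a sketch under assumptions, not part of the original project: it uses small synthetic data in place of diabetes.csv so that it runs on its own, and the step names ("impute", "scale", "clf") are arbitrary labels.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset (assumption: 200 rows, 3 features)
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Blank out roughly 10% of the entries to mimic missing measurements
missing_mask = rng.random(X.shape) < 0.1
X[missing_mask] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Each step is fitted on the training data only when fit() is called
model = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
    ("clf", LogisticRegression()),
])
model.fit(X_train, y_train)

# score() reports accuracy; predict_proba() gives the class-1 probability
test_accuracy = model.score(X_test, y_test)
probabilities = model.predict_proba(X_test)[:, 1]

print("Test accuracy:", test_accuracy)
print("First five predicted probabilities:", probabilities[:5])
```

To try this on the real data, X and y would come from the DataFrame as in the code above, with the zeros already replaced by NaN and the separate imputer and scaler steps dropped.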


DATASET LINK 😃👇:

https://www.kaggle.com/datasets/mathchi/diabetes-data-set




