Friday, November 15, 2024

Ecommerce Product Category Classification Project Using Logistic Regression

Introduction:

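Logistic regression is a supervised classification technique. It multiplies each input feature by a learned weight, adds the results together with a bias term, and passes that score through the logistic (sigmoid) function to get the probability that an example belongs to a class. Because the output is a class label, the prediction has a finite number of outcomes, like yes or no. When there are more than two classes, as with product categories, the multinomial form computes one score per class and turns the scores into probabilities with the softmax function. In this project, each product title is converted into TF-IDF features, and the model learns to predict the title's category.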
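
The idea can be sketched in plain Python. This is a minimal illustration, not how scikit-learn implements it: the three category names, the two input features, and all weights below are hypothetical, hand-picked numbers rather than values learned from data.

```python
import math


def sigmoid(z: float) -> float:
    """Binary logistic regression: map a linear score to P(y = 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores: list[float]) -> list[float]:
    """Multinomial logistic regression: turn one score per class into probabilities."""
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear_score(weights: list[float], bias: float, x: list[float]) -> float:
    """Weighted sum of the features plus a bias term."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


# Hypothetical 2-feature input and hand-picked (weights, bias) per class
x = [1.0, 0.5]
class_weights = {
    "Electronics": ([2.0, -1.0], 0.1),
    "Books": ([-0.5, 1.5], 0.0),
    "Toys": ([0.0, 0.2], -0.3),
}

scores = [linear_score(w, b, x) for w, b in class_weights.values()]
probs = softmax(scores)
# The predicted category is the one with the highest probability
predicted = list(class_weights)[probs.index(max(probs))]
print({label: round(p, 3) for label, p in zip(class_weights, probs)}, predicted)
```

In the project below, `LogisticRegression` learns these weights from the training titles instead; with its default `lbfgs` solver and more than two classes it fits the multinomial (softmax) form shown here. With exactly two classes, softmax over the scores `[z, 0]` reduces to `sigmoid(z)`.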

CODE 😃👇:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import nltk
from nltk.corpus import stopwords
import re
import string

# Download NLTK stopwords (only needed the first time)
nltk.download('stopwords')
stop_words = set(stopwords.words('english'))

# Step 1: Load the Dataset
# Load the CSV file
data_path = '/path/to/amazon_reviews.csv'  # Update with your dataset path
df = pd.read_csv(data_path)

# Keep only the relevant columns and drop rows with missing values
# Ensure the dataset has the columns 'product_title' and 'category'
# (rename them first if your CSV uses different column names)
df = df[['product_title', 'category']]
df = df.dropna()

# Step 2: Data Preprocessing
def clean_text(text):
    text = str(text).lower()
    text = re.sub(f'[{re.escape(string.punctuation)}]', '', text)  # Remove punctuation
    text = re.sub(r'\d+', '', text)  # Remove numbers
    text = ' '.join(word for word in text.split() if word not in stop_words)  # Remove stopwords
    return text

df['cleaned_title'] = df['product_title'].apply(clean_text)

# Step 3: Vectorization (convert text data to numerical features)
vectorizer = TfidfVectorizer(max_features=5000)
y = df['category']

# Step 4: Split Data into Training and Testing Sets
# Split the cleaned titles first, then fit the vectorizer on the training titles only,
# so the test set does not influence the vocabulary or the IDF weights.
X_train_text, X_test_text, y_train, y_test = train_test_split(
    df['cleaned_title'], y, test_size=0.2, random_state=42
)
# The results stay sparse: LogisticRegression accepts sparse input directly,
# and .toarray() would allocate a dense (rows x 5000) array in memory.
X_train = vectorizer.fit_transform(X_train_text)
X_test = vectorizer.transform(X_test_text)

# Step 5: Train the Model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Step 6: Make Predictions
y_pred = model.predict(X_test)

# Step 7: Evaluate the Model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.4f}')
print("Classification Report:\n", classification_report(y_test, y_pred))

# Confusion matrix (optional): rows are true labels, columns are predicted labels
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:\n", conf_matrix)
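
Step 3 is where titles become numbers. As a rough picture of what `TfidfVectorizer` computes with its defaults, the sketch below uses only the standard library and the documented default formula idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of documents and df(t) is the number of documents containing term t, followed by L2 normalization of each row. It is an approximation, not a drop-in replacement: it splits on whitespace (scikit-learn's default token pattern ignores one-character tokens), it has no `max_features` cap, and the three titles are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """Smoothed TF-IDF with L2-normalized rows.

    Assumption: tokens come from a plain whitespace split, which differs
    from scikit-learn's regex tokenizer.
    """
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n = len(docs)
    # Document frequency: in how many documents each term appears
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {t: math.log((1 + n) / (1 + doc_freq[t])) + 1 for t in vocab}

    rows = []
    for toks in tokenized:
        counts = Counter(toks)  # raw term frequency within this document
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row))
        rows.append([v / norm if norm else 0.0 for v in row])
    return vocab, rows


# Hypothetical cleaned titles, as clean_text() might produce them
titles = [
    "wireless mouse usb",
    "usb charging cable",
    "kids wooden puzzle",
]
vocab, rows = tfidf(titles)
for title, row in zip(titles, rows):
    weights = {t: round(w, 3) for t, w in zip(vocab, row) if w > 0}
    print(title, "->", weights)
```

A term shared by several titles (here `usb`) gets a lower IDF weight than a term that appears in only one title, so words that set one title apart from the others carry more weight in its row.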

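Step 7 relies on three scikit-learn helpers. To show what their numbers mean, here is a stdlib-only sketch that recomputes accuracy, the per-class precision, recall and F1 rows listed by `classification_report`, and a confusion matrix with true labels as rows and predicted labels as columns in sorted label order (the orientation scikit-learn's `confusion_matrix` also uses). The six labels are hypothetical, and a precision or recall that would divide by zero is scored 0.0 here.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_scores(y_true, y_pred):
    """Precision, recall and F1 for each label (one row of classification_report)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)  # correctly predicted
        fp = sum(t != label and p == label for t, p in pairs)  # wrongly predicted as label
        fn = sum(t == label and p != label for t, p in pairs)  # label that was missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def confusion(y_true, y_pred):
    """Rows are true labels, columns are predicted labels, both in sorted order."""
    labels = sorted(set(y_true) | set(y_pred))
    counts = Counter(zip(y_true, y_pred))
    return labels, [[counts[(t, p)] for p in labels] for t in labels]


# Hypothetical true and predicted categories for six test titles
y_true = ["Books", "Books", "Electronics", "Electronics", "Toys", "Toys"]
y_pred = ["Books", "Electronics", "Electronics", "Electronics", "Toys", "Books"]

print("Accuracy:", round(accuracy(y_true, y_pred), 4))
for label, (p, r, f) in per_class_scores(y_true, y_pred).items():
    print(f"{label:12s} precision={p:.2f} recall={r:.2f} f1={f:.2f}")
labels, matrix = confusion(y_true, y_pred)
print(labels)
for row in matrix:
    print(row)
```

Reading the matrix row by row shows which categories get confused: in this toy example, one `Toys` title was predicted as `Books` and one `Books` title as `Electronics`.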


DATASET 😃👇

https://www.kaggle.com/datasets/lakritidis/product-classification-and-categorization

SUPPORT ME 😟

FREE C++ SKILLSHARE COURSE

https://skl.sh/3AUpE4C

FREE C SKILLSHARE COURSE

https://skl.sh/3Ynolmw

All Courses 😃👇

https://linktr.ee/Freetech2024

All Products 😃👇

https://linktr.ee/rockstararun

HP Laptop 🤩👇

https://dir.indiamart.com/impcat/hp-laptop.html?utm_source=freetech-xu1ob&utm_medium=affiliate&utm_campaign=1024&utm_content=29&mTd=1

Asus Laptop 🤩👇

https://www.indiamart.com/proddetail/24957009948.html?utm_source=freetech-xu1ob&utm_medium=affiliate&utm_campaign=1024&utm_content=43&mTd=1

