Machine Learning Projects Using Regression

1.PROBLEM STATEMENT

we have to predict the car prices .
so we have several information about used cars by existing data we are going to predict the data.

COLLECT THE DATA FROM KAGGLE USING THIS LINK kaggle.com/code/mohaiminul101/car-price-pre..

2.WORKFLOW

CAR DATA->DATA PREPROCESSING->TRAIN AND TEST SPLIT->REGRESSION MODEL->TRAIN NEW MODEL

***********************************************************************

Importing the Dependencies

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Lasso
from sklearn import metrics

Data Collection and Processing

loading the data from csv file to pandas dataframe

car_dataset = pd.read_csv("C:/Users/shravani/Downloads/car data.csv")

inspecting the first 5 rows of the dataframe

car_dataset.head()

Output:

Car_Name	Year	Selling_Price	Present_Price	Kms_Driven	Fuel_Type	Seller_Type	Transmission	Owner
ritz	2014	3.35	5.59	27000	Petrol	Dealer	Manual	0
sx4	2013	4.75	9.54	43000	Diesel	Dealer	Manual	0
ciaz	2017	7.25	9.85	6900	Petrol	Dealer	Manual	0
wagon r	2011	2.85	4.15	5200	Petrol	Dealer	Manual	0
swift	2014	4.6	6.87	42450	Diesel	Dealer	Manual	0

# checking the number of rows and columns
car_dataset.shape

output:(301, 9)

getting some information about the dataset

car_dataset.info()

# checking the number of missing values
car_dataset.isnull().sum()# checking the distribution of categorical data
print(car_dataset.Fuel_Type.value_counts())
print(car_dataset.Seller_Type.value_counts())
print(car_dataset.Transmission.value_counts())

Encoding the Categorical Data

# encoding "Fuel_Type" Column
car_dataset.replace({'Fuel_Type':{'Petrol':0,'Diesel':1,'CNG':2}},inplace=True)

# encoding "Seller_Type" Column
car_dataset.replace({'Seller_Type':{'Dealer':0,'Individual':1}},inplace=True)

# encoding "Transmission" Column
car_dataset.replace({'Transmission':{'Manual':0,'Automatic':1}},inplace=True)

car_dataset.head()

Car_Name	Year	Selling_Price	Present_Price	Kms_Driven	Fuel_Type	Seller_Type	Transmission	Owner
ritz	2014	3.35	5.59	27000	0	0	0	0
sx4	2013	4.75	9.54	43000	1	0	0	0
ciaz	2017	7.25	9.85	6900	0	0	0	0
wagon r	2011	2.85	4.15	5200	0	0	0	0
swift	2014	4.6	6.87	42450	1	0	0	0

Splitting the data and Target

X = car_dataset.drop(['Car_Name','Selling_Price'],axis=1)
Y = car_dataset['Selling_Price']X = car_dataset.drop(['Car_Name','Selling_Price'],axis=1)
Y = car_dataset['Selling_Price']

Splitting Training and Test data

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.1, random_state=2)

Model Training

1.Linear Regression

# loading the linear regression model
lin_reg_model = LinearRegression()
lin_reg_model.fit(X_train,Y_train)

Model Evaluation

# prediction on Training data
training_data_prediction = lin_reg_model.predict(X_train)

# R squared Error
error_score = metrics.r2_score(Y_train, training_data_prediction)
print("R squared Error : ", error_score)

output: R squared Error : 0.8799451660493711

Visualize the actual prices and Predicted prices

plt.scatter(Y_train, training_data_prediction)
plt.xlabel("Actual Price")
plt.ylabel("Predicted Price")
plt.title(" Actual Prices vs Predicted Prices")
plt.show()

# prediction on Testing data
test_data_prediction = lin_reg_model.predict(X_test)

# R squared Error
error_score = metrics.r2_score(Y_test, test_data_prediction)
print("R squared Error : ", error_score)

plt.scatter(Y_test, test_data_prediction)
plt.xlabel("Actual Price")
plt.ylabel("Predicted Price")
plt.title(" Actual Prices vs Predicted Prices")
plt.show()

2.Lasso Regression

# loading the linear regression model
lass_reg_model = Lasso()

lass_reg_model.fit(X_train,Y_train)