Thursday, November 23, 2023

Simple Linear Regression - Python Code

To illustrate simple linear regression in Python, we can use synthetic data. Let's create a small dataset that simulates the relationship between engine size (in liters) and fuel efficiency (in miles per gallon) for a set of cars. We'll use NumPy to generate the data, the scikit-learn library for the regression analysis, and matplotlib for plotting.

Here's a step-by-step guide along with the Python code:

  1. Generate Synthetic Data: Create a dataset of engine sizes and corresponding fuel efficiencies.
  2. Create a Linear Regression Model: Use scikit-learn to fit a linear regression model.
  3. Predict and Plot: Predict fuel efficiency for a range of engine sizes and plot the results.

Python Code

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Generate Synthetic Data
np.random.seed(0)  # for reproducibility
engine_sizes = np.random.rand(100, 1) * 5  # Engine sizes between 0 and 5 liters
fuel_efficiency = 30 - 3 * engine_sizes + np.random.randn(100, 1) * 2  # MPG

# Create a Linear Regression Model
model = LinearRegression()
model.fit(engine_sizes, fuel_efficiency)

# Predict and Plot
engine_sizes_range = np.linspace(0, 5, 100).reshape(-1, 1)
predicted_efficiency = model.predict(engine_sizes_range)

# Plotting the results
plt.scatter(engine_sizes, fuel_efficiency, color='blue', label='Data Points')
plt.plot(engine_sizes_range, predicted_efficiency, color='red', label='Regression Line')
plt.xlabel('Engine Size (Liters)')
plt.ylabel('Fuel Efficiency (MPG)')
plt.title('Simple Linear Regression Example')
plt.legend()
plt.show()



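# Coefficients and Performance Metrics (computed on the same data used for fitting)
slope = model.coef_[0][0]
intercept = model.intercept_[0]
fitted_efficiency = model.predict(engine_sizes)  # predictions at the observed engine sizes
mse = mean_squared_error(fuel_efficiency, fitted_efficiency)
r2 = r2_score(fuel_efficiency, fitted_efficiency)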

print(f"Slope: {slope}")
print(f"Intercept: {intercept}")
print(f"Mean Squared Error: {mse}")
print(f"R-squared: {r2}")



Running the code generates a synthetic dataset, fits a simple linear regression model to it, and plots the result. Here's a summary of the results:

  • Scatter Plot: The blue dots represent the synthetic data points, indicating the relationship between engine size (in liters) and fuel efficiency (in miles per gallon).
  • Regression Line: The red line is the best-fit linear regression line through the data points, showing the predicted relationship.

Regression Equation

From the regression model, we obtained:

  • Slope (b): -3.03. The negative sign means that for each additional liter in engine size, the predicted fuel efficiency decreases by approximately 3.03 MPG.
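  • Intercept (a): 30.44. This is the predicted fuel efficiency when the engine size is 0 liters. No real engine has that size, so the intercept mainly fixes the height of the line and should not be read as a physical quantity.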
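
As a cross-check on where these two numbers come from, here is a minimal sketch that computes the slope and intercept with the closed-form least-squares formulas for a single predictor, using only NumPy. It regenerates the data with the same seed as the code above. The helper predict_mpg is a hypothetical name introduced for illustration and is not part of scikit-learn. The results should agree with model.coef_ and model.intercept_ up to floating-point error.

```python
import numpy as np

# Recreate the synthetic data from the post (same seed, same draw order)
np.random.seed(0)
engine_sizes = np.random.rand(100, 1) * 5
fuel_efficiency = 30 - 3 * engine_sizes + np.random.randn(100, 1) * 2

# Flatten the column vectors to 1-D arrays for the formulas below
x = engine_sizes.ravel()
y = fuel_efficiency.ravel()

# Closed-form least squares for one predictor:
#   b = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
#   a = mean_y - b * mean_x
x_centered = x - x.mean()
slope = np.sum(x_centered * (y - y.mean())) / np.sum(x_centered ** 2)
intercept = y.mean() - slope * x.mean()


def predict_mpg(engine_size_liters):
    """Hypothetical helper: evaluate the fitted line a + b * x."""
    return intercept + slope * engine_size_liters


print(f"Slope: {slope:.2f}")
print(f"Intercept: {intercept:.2f}")
print(f"Predicted MPG for a 2.0 L engine: {predict_mpg(2.0):.2f}")
```

The prediction for any engine size is just the intercept plus the slope times that size, which is what model.predict computes for a single feature.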

Performance Metrics

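  • Mean Squared Error (MSE): 3.97. This is the average squared difference between the observed and predicted fuel efficiencies, so its units are MPG squared. Its square root is about 1.99 MPG, close to the noise standard deviation of 2 used to generate the data.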
  • R-squared (R²): 0.83. This value indicates that about 83% of the variance in fuel efficiency in this dataset is explained by engine size.
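
To make the two definitions concrete, the following sketch computes MSE and R² by hand from the residuals instead of calling mean_squared_error and r2_score. It is an illustration under a few assumptions: the data is regenerated with the same seed as above, and the line is fitted with np.polyfit (degree 1) rather than LinearRegression, which yields the same least-squares line for a single predictor.

```python
import numpy as np

# Recreate the synthetic data from the post (same seed, same draw order)
np.random.seed(0)
engine_sizes = np.random.rand(100, 1) * 5
fuel_efficiency = 30 - 3 * engine_sizes + np.random.randn(100, 1) * 2

x = engine_sizes.ravel()
y = fuel_efficiency.ravel()

# Fit a straight line; np.polyfit returns coefficients from highest degree down
slope, intercept = np.polyfit(x, y, 1)
y_pred = intercept + slope * x

# Residual sum of squares and total sum of squares
residuals = y - y_pred
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)

mse = ss_res / len(y)      # mean of the squared residuals
rmse = np.sqrt(mse)        # same units as fuel efficiency (MPG)
r2 = 1 - ss_res / ss_tot   # share of variance explained by the line

print(f"Mean Squared Error: {mse:.2f}")
print(f"Root Mean Squared Error: {rmse:.2f}")
print(f"R-squared: {r2:.2f}")
```

For a simple linear regression with an intercept, R² equals the square of the Pearson correlation between engine size and fuel efficiency, which gives a quick way to sanity-check the value.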
