To illustrate a simple linear regression example in Python, we can use synthetic data. Let's create a small dataset that simulates the relationship between engine size (in liters) and fuel efficiency (in miles per gallon) for a set of cars. We'll use the scikit-learn
library for the regression analysis and matplotlib
for plotting.
Here's a step-by-step guide along with the Python code:
- Generate Synthetic Data: Create a dataset of engine sizes and corresponding fuel efficiencies.
- Create a Linear Regression Model: Use
scikit-learn
to fit a linear regression model. - Predict and Plot: Predict fuel efficiency for a range of engine sizes and plot the results.
Python Code
import numpy as np import matplotlib.pyplot as plt from sklearn.linear_model import LinearRegression from sklearn.metrics import mean_squared_error, r2_score # Generate Synthetic Data np.random.seed(0) # for reproducibility engine_sizes = np.random.rand(100, 1) * 5 # Engine sizes between 0 and 5 liters fuel_efficiency = 30 - 3 * engine_sizes + np.random.randn(100, 1) * 2 # MPG # Create a Linear Regression Model model = LinearRegression() model.fit(engine_sizes, fuel_efficiency) # Predict and Plot engine_sizes_range = np.linspace(0, 5, 100).reshape(-1, 1) predicted_efficiency = model.predict(engine_sizes_range) # Plotting the results plt.scatter(engine_sizes, fuel_efficiency, color='blue', label='Data Points') plt.plot(engine_sizes_range, predicted_efficiency, color='red', label='Regression Line') plt.xlabel('Engine Size (Liters)') plt.ylabel('Fuel Efficiency (MPG)') plt.title('Simple Linear Regression Example') plt.legend() plt.show()
# Coefficients and Performance Metrics slope = model.coef_[0][0] intercept = model.intercept_[0] mse = mean_squared_error(fuel_efficiency, model.predict(engine_sizes)) r2 = r2_score(fuel_efficiency, model.predict(engine_sizes)) print(f"Slope: {slope}") print(f"Intercept: {intercept}") print(f"Mean Squared Error: {mse}") print(f"R-squared: {r2}")
The Python code successfully generated and analyzed a synthetic dataset using simple linear regression. Here's a summary of the results:
- Scatter Plot: The blue dots represent the synthetic data points, indicating the relationship between engine size (in liters) and fuel efficiency (in miles per gallon).
- Regression Line: The red line is the best-fit linear regression line through the data points, showing the predicted relationship.
Regression Equation
From the regression model, we obtained:
- Slope (b): . This means for each additional liter in engine size, the fuel efficiency decreases by approximately 3.03 MPG.
- Intercept (a): . This represents the predicted fuel efficiency when the engine size is 0 liters.
Performance Metrics
- Mean Squared Error (MSE): . This is the average squared difference between the observed actual outcomes and the outcomes predicted by the model.
- R-squared (R²): . This value indicates that 83% of the variance in fuel efficiency is explained by the engine size.
No comments:
Post a Comment