Sales prediction based on advertising expenditure
Optimizing the Ridge regression model to predict sales based on advertising expenditure.
In this scenario, we have a dataset that contains information about advertising expenditure in different channels (TV, Radio, and Newspaper) and the corresponding sales. We want to build a regression model that can predict sales based on the advertising expenditure.
A company that spreads its advertising budget across several channels can use such a model to estimate the sales it should expect from a given spending plan. With the optimized Ridge regression model, the company can make informed decisions on how to allocate its advertising budget to maximize sales.
We first visualize the relationship between sales and advertising expenditure in each channel (TV, Radio, Newspaper) using scatter plots, each with a fitted trend line.
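A minimal sketch of this kind of plot follows. The original dataset is not reproduced here, so the data below is synthetic and purely illustrative; the coefficients used to generate it, the variable names, and the choice of a degree-1 `np.polyfit` fit as the trend line are all assumptions.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

# Synthetic stand-in data (assumption): NOT the dataset used in the analysis.
rng = np.random.default_rng(0)
n_samples = 200
data = {
    "TV": rng.uniform(0, 300, n_samples),
    "Radio": rng.uniform(0, 50, n_samples),
    "Newspaper": rng.uniform(0, 100, n_samples),
}
sales = 3 + 0.05 * data["TV"] + 0.2 * data["Radio"] + rng.normal(0, 1, n_samples)

fig, axes = plt.subplots(1, 3, figsize=(15, 4), sharey=True)
slopes = {}
for ax, (channel, spend) in zip(axes, data.items()):
    # Least-squares straight line through the points; polyfit returns
    # coefficients from the highest degree down: [slope, intercept].
    slope, intercept = np.polyfit(spend, sales, deg=1)
    slopes[channel] = slope

    ax.scatter(spend, sales, alpha=0.5)
    xs = np.linspace(spend.min(), spend.max(), 100)
    ax.plot(xs, slope * xs + intercept, color="red")
    ax.set_xlabel(f"{channel} expenditure")
    ax.set_title(channel)

axes[0].set_ylabel("Sales")
fig.tight_layout()
# Call fig.savefig("some_file.png") or switch to an interactive backend to view the figure.
```

The slope of each trend line gives a rough first impression of how strongly sales move with spending in that channel, before any multivariate model is fitted.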
The Ridge regression model was built and evaluated in several steps. First, the data was split into the features (X), i.e. the advertising expenditure per channel, and the target variable (y), i.e. sales. Next, the numerical features were standardized with scikit-learn's StandardScaler, which rescales each feature to zero mean and unit variance so that all variables are on the same scale; this matters for Ridge regression because its penalty depends on the magnitude of the coefficients. Then, the data was split into training and test sets using the train_test_split function, so that the model's performance could be evaluated on data it had not seen during fitting.
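The preparation steps described above could look roughly like the sketch below. It uses synthetic data in place of the real dataset, and the `test_size` and `random_state` values are assumptions, since the text does not state them.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): columns are TV, Radio, Newspaper.
rng = np.random.default_rng(0)
n_samples = 200
X = np.column_stack([
    rng.uniform(0, 300, n_samples),   # TV
    rng.uniform(0, 50, n_samples),    # Radio
    rng.uniform(0, 100, n_samples),   # Newspaper
])
y = 3 + 0.05 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0, 1, n_samples)

# Standardize every feature to zero mean and unit variance.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Hold out 20% of the rows for testing (split ratio and seed are assumptions).
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42
)
```

This follows the order given in the text (scale, then split). A common alternative is to split first and fit the scaler on the training rows only, so that no statistics from the test set influence the preprocessing.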
Subsequently, a Ridge regression model was built to predict sales from advertising expenditure in TV, Radio, and Newspaper. The model's performance was estimated using 5-fold cross-validation, which indicates how well the model generalizes across different subsets of the data. To optimize the model, a grid search was performed over different values of alpha, the regularization parameter that controls how strongly large coefficients are penalized. Finally, the model was fitted on the training data using the best alpha found during the grid search.
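As a hedged illustration of these steps, the sketch below runs 5-fold cross-validation and a grid search over alpha with scikit-learn. The data is synthetic, and the alpha grid, the scoring metric, and the decision to cross-validate on the training split are assumptions rather than details taken from the analysis.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): columns are TV, Radio, Newspaper.
rng = np.random.default_rng(0)
n_samples = 200
X = np.column_stack([
    rng.uniform(0, 300, n_samples),
    rng.uniform(0, 50, n_samples),
    rng.uniform(0, 100, n_samples),
])
y = 3 + 0.05 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0, 1, n_samples)

X_scaled = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42
)

# 5-fold cross-validation of a baseline Ridge model.
# scikit-learn reports negated MSE so that "higher is better" holds for all scorers.
cv_scores = cross_val_score(
    Ridge(alpha=1.0), X_train, y_train, cv=5, scoring="neg_mean_squared_error"
)
cv_mse = -cv_scores.mean()

# Grid search over the regularization strength (candidate values are assumptions).
param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}
search = GridSearchCV(Ridge(), param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X_train, y_train)

# With the default refit=True, best_estimator_ is already refitted on all of
# X_train using the best alpha.
best_model = search.best_estimator_
print("Baseline CV MSE:", cv_mse)
print("Best alpha:", search.best_params_["alpha"])
```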
To assess the final model, the Mean Squared Error (MSE), i.e. the average of the squared differences between predicted and actual sales, was computed on the test data. A lower MSE indicates better predictive capability. Finally, the trained model was used to predict sales for a new sample of advertising expenditure in TV, Radio, and Newspaper.
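To make the metric concrete, the short sketch below computes the MSE by hand and with scikit-learn's `mean_squared_error`. The four actual and predicted values are made up for illustration and do not come from the analysis.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical actual and predicted sales values (made up for illustration).
y_true = np.array([20.0, 10.0, 15.0, 12.0])
y_pred = np.array([21.0, 9.5, 15.0, 13.5])

# MSE is the mean of the squared residuals.
mse_manual = np.mean((y_true - y_pred) ** 2)
mse_sklearn = mean_squared_error(y_true, y_pred)

print(mse_manual)  # 0.875
```

Because the errors are squared, the MSE is expressed in squared sales units and penalizes a few large misses more heavily than many small ones.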

The new sample of features was [230.1, 37.8, 69.2], corresponding to advertising expenditure in TV, Radio, and Newspaper respectively. The resulting prediction is 21.36950524, meaning that, according to the model, sales of approximately 21.37 units are expected for these expenditure values.
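A sketch of how such a prediction might be made is shown below. Because it is trained on synthetic data with an assumed alpha, it will not reproduce the value 21.36950524 reported above. The point it illustrates is that, when the model was trained on standardized features, a new sample has to pass through the same fitted scaler before being handed to `predict`.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): columns are TV, Radio, Newspaper.
rng = np.random.default_rng(0)
n_samples = 200
X = np.column_stack([
    rng.uniform(0, 300, n_samples),
    rng.uniform(0, 50, n_samples),
    rng.uniform(0, 100, n_samples),
])
y = 3 + 0.05 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0, 1, n_samples)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# alpha=1.0 is an assumption; in the analysis it would come from the grid search.
model = Ridge(alpha=1.0).fit(X_scaled, y)

# New sample in raw expenditure units: TV, Radio, Newspaper.
new_sample = np.array([[230.1, 37.8, 69.2]])

# Reuse the scaler fitted above (transform, not fit_transform) so the sample
# is expressed on the same scale as the training features.
new_sample_scaled = scaler.transform(new_sample)
prediction = model.predict(new_sample_scaled)
print(prediction[0])
```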
The result of this analysis is a model that predicts sales from advertising spending with reasonable accuracy. Such a model could be useful to companies looking to allocate their advertising budget so as to maximize sales. However, it is a simplified model: in practice, sales are influenced by many other factors that were not taken into account in this analysis.