🌸 Iris Project – Species Classification with Machine Learning
Project Overview
This project focuses on the analysis and supervised classification of the well-known Fisher's Iris dataset, aiming to accurately predict the species of iris flowers (Iris setosa, Iris versicolor, and Iris virginica) based on numerical features: sepal length, sepal width, petal length, and petal width.
Objectives
- Perform exploratory data analysis (EDA) to identify key patterns and relationships between variables.
- Create interactive 3D visualizations to explore class separability.
- Build and evaluate machine learning classification models.
- Optimize model hyperparameters using cross-validation techniques.
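As a rough illustration of the first two objectives, the sketch below loads the dataset, summarizes the features per species, and defines a helper that builds a 3D scatter plot. It assumes the copy of the data bundled with scikit-learn; the project may load it from elsewhere. The helper name `make_3d_figure` and the three features chosen for the axes are illustrative, not taken from the project.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Assumption: use scikit-learn's bundled copy of the Iris data.
iris = load_iris(as_frame=True)
df = iris.frame.copy()

# Replace the integer target with readable species names.
df["species"] = df["target"].map(dict(enumerate(iris.target_names)))
df = df.drop(columns="target")

# Per-species feature means and the feature correlation matrix.
species_means = df.groupby("species").mean()
correlations = df.drop(columns="species").corr()
print(species_means.round(2))
print(correlations.round(2))


def make_3d_figure(data):
    """Return a 3D scatter of three features, colored by species (hypothetical helper)."""
    # Imported lazily so the summaries above run without plotly installed.
    import plotly.express as px

    return px.scatter_3d(
        data,
        x="sepal length (cm)",
        y="petal length (cm)",
        z="petal width (cm)",
        color="species",
    )
```

Calling `make_3d_figure(df).show()` opens the interactive plot in a notebook or browser.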
Key Tasks Performed
- Exploratory Data Analysis: Investigated feature distributions, correlations, and class separability using pandas, seaborn, and matplotlib.
- Interactive 3D Visualization: Built a 3D scatter plot with plotly, mapping three features and coloring points by species to enhance interpretability.
- Supervised Modeling: Implemented and compared two classic classification algorithms using scikit-learn:
  - Logistic Regression
  - K-Nearest Neighbors (KNN)
- Model Evaluation: Assessed performance using metrics such as accuracy, precision, recall, F1-score, and confusion matrices.
- Hyperparameter Tuning: Applied GridSearchCV to optimize KNN parameters (e.g., number of neighbors, weighting scheme, distance metric) and improve generalization.
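The modeling, evaluation, and tuning steps above could look roughly like the following sketch. The split size, random seed, `max_iter` value, and grid values are assumptions chosen for illustration. A 20% hold-out gives 30 test samples, which is consistent with the reported 93.3% (28 of 30), but the project's actual split is not stated.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Assumed split: 20% held out, stratified by species, fixed seed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit both baseline models and report precision, recall, and F1 per class.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name)
    print(classification_report(y_test, model.predict(X_test)))

# Grid over the three KNN parameters named above; the values are illustrative.
param_grid = {
    "n_neighbors": [3, 5, 7, 9, 11],
    "weights": ["uniform", "distance"],
    "metric": ["euclidean", "manhattan"],
}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print("mean CV accuracy:", round(search.best_score_, 3))
print("test accuracy:", round(search.score(X_test, y_test), 3))
```

Comparing the mean cross-validation accuracy with the test accuracy is what shows whether tuning gains carry over to unseen data.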
Results
- Both models achieved 93.3% accuracy on the test set.
- Cross-validation suggested a marginal improvement for the optimized KNN, although this did not translate into significantly better test set performance.
- Iris setosa was classified with perfect accuracy; minor misclassifications occurred between versicolor and virginica, consistent with their natural overlap.
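Per-class behavior like this is usually read off a confusion matrix. The sketch below builds a labeled one for a default KNN model. The split, seed, and model settings are assumptions, so the counts it prints need not match the project's results.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()

# Assumed split: 20% held out, stratified by species, fixed seed.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, stratify=iris.target, random_state=42
)
model = KNeighborsClassifier().fit(X_train, y_train)

# Rows are true species, columns are predicted species.
cm = confusion_matrix(y_test, model.predict(X_test), labels=[0, 1, 2])
cm_table = pd.DataFrame(cm, index=iris.target_names, columns=iris.target_names)
print(cm_table)

# Per-class recall: correct predictions (diagonal) divided by row totals.
recall = pd.Series(cm.diagonal() / cm.sum(axis=1), index=iris.target_names)
print(recall.round(3))
```

Off-diagonal counts in the versicolor and virginica rows correspond to the misclassifications described above.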