As mentioned in the Technical requirements section, the dataset can be downloaded directly from the UCI website. Now, let's use the pandas pd.read_csv() function to load the dataset into the Python environment. By now, this operation should be relatively easy and intuitive:
- We start by importing the pandas library and creating two dataframes, df_red for the red wine dataset and df_white for the white wine dataset:
import pandas as pd
df_red = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv", delimiter=";")
df_white = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv", delimiter=";")
- With the two dataframes created, let's check the names of the available columns:
df_red.columns
The output of the preceding code is given here:
Index(['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
       'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
       'pH', 'sulphates', 'alcohol', 'quality'],
      dtype='object')
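The loading step above relies on delimiter=";" because the wine files separate fields with semicolons rather than commas. The following is a minimal sketch of what happens when that argument is omitted. It assumes a small in-memory sample (SAMPLE_CSV, a hypothetical name) with made-up values and only five of the columns, so that it runs without downloading anything:

```python
from io import StringIO

import pandas as pd

# Hypothetical three-row sample in the same semicolon-separated layout.
# The values are made up for illustration, not copied from the UCI files.
SAMPLE_CSV = """fixed acidity;volatile acidity;pH;alcohol;quality
7.0;0.50;3.40;9.5;5
8.1;0.30;3.20;10.8;6
6.5;0.65;3.55;11.2;7
"""

# The default separator is a comma, so each line is read as one wide column.
df_wrong = pd.read_csv(StringIO(SAMPLE_CSV))

# With the semicolon delimiter, every field becomes its own column.
df_right = pd.read_csv(StringIO(SAMPLE_CSV), delimiter=";")

print(df_wrong.shape)   # (3, 1)
print(df_right.shape)   # (3, 5)
print(list(df_right.columns))
```

If a dataframe loaded from a CSV file reports a single column, a mismatched delimiter is a likely cause and worth checking first.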
As the output of df_red.columns shows, the dataset contains the following columns:
- Fixed acidity: It indicates the amount of tartaric acid in the wine and is measured in g/dm3.
- Volatile acidity: It indicates the amount of acetic acid in the wine. It is measured in g/dm3.
- Citric acid: It indicates the amount of citric acid in the wine. It is also measured in g/dm3.
- Residual sugar: It indicates the amount of sugar left in the wine after the fermentation process is done. It is also measured in g/dm3.
- Chlorides: It indicates the amount of salt (sodium chloride) in the wine. It is also measured in g/dm3.
- Free sulfur dioxide: It measures the amount of sulfur dioxide (SO2) in free form. It is measured in mg/dm3.
- Total sulfur dioxide: It measures the total amount of SO2 in the wine. This chemical works as an antioxidant and antimicrobial agent. It is also measured in mg/dm3.
- Density: It indicates the density of the wine and is measured in g/cm3.
- pH: It indicates the pH value of the wine. The value ranges from 0 to 14, where 0 indicates very high acidity and 14 indicates a strongly basic solution.
- Sulphates: It indicates the amount of potassium sulphate in the wine. It is also measured in g/dm3.
- Alcohol: It indicates the alcohol content of the wine as a percentage by volume.
- Quality: It indicates the quality of the wine, scored on a scale from 0 to 10. The higher the value, the better the wine.
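The column descriptions above imply value ranges that can be checked once the data is loaded. The following is a rough sketch of such a check, not part of the original workflow. The helper out_of_range, the EXPECTED_RANGES bounds, and the df_sample rows are all hypothetical and made up for illustration. In practice, df_red or df_white would be passed in place of df_sample:

```python
import pandas as pd

# Hypothetical rows with made-up values, standing in for df_red or df_white.
df_sample = pd.DataFrame(
    {
        "pH": [3.51, 3.20, 3.38],
        "alcohol": [9.4, 10.5, 12.1],
        "quality": [5, 6, 7],
    }
)

# Assumed inclusive bounds per column; adjust them to the checks you need.
EXPECTED_RANGES = {
    "pH": (0.0, 14.0),   # the pH scale
    "quality": (0, 10),  # quality is scored on a 10-point scale
}


def out_of_range(df, ranges):
    """Return {column: number of values outside [low, high]}."""
    report = {}
    for column, (low, high) in ranges.items():
        inside = df[column].between(low, high)  # inclusive on both ends
        report[column] = int((~inside).sum())
    return report


report = out_of_range(df_sample, EXPECTED_RANGES)
print(report)  # {'pH': 0, 'quality': 0}
```

A non-zero count for any column would suggest a parsing problem or a data-entry error that is worth investigating before computing statistics.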
Having discussed the different columns in the dataset, let's now look at some basic statistics of the data in the next section.