SVD applied on handwritten digits using scikit-learn

SVD can be applied to the same handwritten digits data to make an apples-to-apples comparison of the techniques.

# SVD 
>>> import matplotlib.pyplot as plt 
>>> from sklearn.datasets import load_digits 
 
>>> digits = load_digits() 
>>> X = digits.data 
>>> y = digits.target 

In the following code, 15 singular vectors with 300 iterations are used, but we encourage the reader to change these values and check the performance of SVD. We have used two types of SVD functions: randomized_svd provides the actual decomposition of the original matrix, whereas TruncatedSVD provides the total explained variance ratio. In practice, users may not need to view all the decomposition matrices and can simply use the TruncatedSVD function for their practical purposes.

>>> from sklearn.utils.extmath import randomized_svd 
>>> U,Sigma,VT = randomized_svd(X,n_components=15,n_iter=300,random_state=42) 
 
>>> import pandas as pd 
>>> VT_df = pd.DataFrame(VT) 
 
>>> print ("
Shape of Original Matrix:",X.shape) 
>>> print ("
Shape of Left Singular vector:",U.shape) 
>>> print ("Shape of Singular value:",Sigma.shape) 
>>> print ("Shape of Right Singular vector",VT.shape) 

By looking at the preceding output, we can see that the original matrix of dimension (1797 x 64) has been decomposed into a left singular vector matrix (1797 x 15), a vector of 15 singular values (the diagonal of a 15 x 15 matrix), and a right singular vector matrix (15 x 64). We can approximately reconstruct the original matrix by multiplying all three matrices in order.
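The following short snippet is a sketch of this reconstruction (not part of the original listing): the three truncated factors are multiplied back together with NumPy, and the names X_approx and the error measure are our own illustrative choices:

# Approximate reconstruction from the truncated factors (illustrative sketch) 
>>> import numpy as np 
>>> X_approx = U @ np.diag(Sigma) @ VT 
>>> print ("Shape of Reconstructed Matrix:", X_approx.shape) 
>>> print ("Mean absolute reconstruction error: %0.3f" % np.mean(np.abs(X - X_approx))) 

Because only 15 of the 64 singular values are kept, the product is an approximation of X rather than an exact copy, so the reconstruction error is small but non-zero.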

>>> n_comps = 15 
>>> from sklearn.decomposition import TruncatedSVD 
>>> svd = TruncatedSVD(n_components=n_comps, n_iter=300, random_state=42) 
>>> reduced_X = svd.fit_transform(X) 
 
>>> print("
Total Variance explained for %d singular features are %0.3f"%(n_comps, svd.explained_variance_ratio_.sum()))  

The total variance explained for 15 singular value features is 83.4 percent, but the reader should try different values to decide on the optimum number.

The following code illustrates how the total variance explained changes with respect to the number of singular values:

# Choosing number of Singular Values 
>>> max_singfeat = 30 
>>> singfeats = [] 
>>> totexp_var = [] 
 
>>> for i in range(max_singfeat): 
...     svd = TruncatedSVD(n_components=i+1, n_iter=300, random_state=42) 
...     reduced_X = svd.fit_transform(X) 
...     tot_var = svd.explained_variance_ratio_.sum() 
...     singfeats.append(i+1) 
...     totexp_var.append(tot_var) 
 
>>> plt.plot(singfeats,totexp_var,'r') 
>>> plt.plot(singfeats,totexp_var,'bs') 
>>> plt.xlabel('No. of Features',fontsize = 13) 
>>> plt.ylabel('Total variance explained',fontsize = 13) 
 
>>> plt.xticks(pcs,fontsize=13) 
>>> plt.yticks(fontsize=13) 
>>> plt.show()

From the previous plot, we can choose either 8 or 15 singular vectors based on the requirement.
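If a numeric rule is preferred over reading the plot, a small sketch such as the following (our own addition, using an assumed 80 percent cut-off) picks the smallest number of singular values whose explained variance crosses that threshold:

# Sketch: pick the smallest number of singular values reaching an assumed 80% threshold 
>>> threshold = 0.80 
>>> n_opt = next(feat for feat, var in zip(singfeats, totexp_var) if var >= threshold) 
>>> print ("Smallest number of singular values explaining >= %0.0f%% variance: %d" % (threshold*100, n_opt)) 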

The R code for SVD applied on handwritten digits data is as follows:

#SVD 
library(svd) 

digits_data = read.csv("digitsdata.csv") 

remove_cols = c("target") 
x_data = digits_data[,!(names(digits_data) %in% remove_cols)] 
y_data = digits_data[,c("target")] 

sv2 <- svd(x_data,nu=15) 

# Computing the square of the singular values, which can be thought of as the vector of matrix energy 
# in order to pick top singular values which preserve at least 80% of variance explained 
energy <- sv2$d ^ 2 
tot_varexp = data.frame(cumsum(energy) / sum(energy)) 

names(tot_varexp) = "cum_var_explained" 
tot_varexp$K_value = 1:nrow(tot_varexp) 

plot(tot_varexp[,2],tot_varexp[,1],type = 'o',xlab = "K_Value",ylab = "Prop. of Var Explained") 
title("SVD - Prop. of Var explained with K-value") 