In R, we slice objects in the following three ways:
[: This always returns an object of the same type as the original and can be used to select more than one element.
[[: This is used to extract elements of a list or DataFrame. It can only extract a single element, and the type of the returned element will not necessarily be a list or DataFrame.
$: This is used to extract elements of a list or DataFrame by name and is similar to [[.
Here are some slicing examples in R and their equivalents in pandas:
Let's see matrix creation and selection in R:
> r_mat <- matrix(2:13,4,3)
> r_mat
     [,1] [,2] [,3]
[1,]    2    6   10
[2,]    3    7   11
[3,]    4    8   12
[4,]    5    9   13
To select the first row, we write:
> r_mat[1,]
[1]  2  6 10
To select the second column, we use the following command:
> r_mat[,2]
[1] 6 7 8 9
Let's now see NumPy array creation and selection (with NumPy imported as np):
In [60]: a=np.array(range(2,6))
         b=np.array(range(6,10))
         c=np.array(range(10,14))
In [66]: np_ar=np.column_stack([a,b,c])
         np_ar
Out[66]: array([[ 2,  6, 10],
                [ 3,  7, 11],
                [ 4,  8, 12],
                [ 5,  9, 13]])
To select the first row, write the following command:
In [79]: np_ar[0,]
Out[79]: array([ 2,  6, 10])
To select the second column, write the following command:
In [81]: np_ar[:,1]
Out[81]: array([6, 7, 8, 9])
Another option is to transpose the array first and then select the corresponding row, as follows:
In [80]: np_ar.T[1,]
Out[80]: array([6, 7, 8, 9])
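As a further sketch, the same 4x3 array can be built in a single call by filling it column by column, which mirrors how R's matrix() fills its result. The reshape call with order='F' is an alternative to the column_stack approach above, not the method used in this section, and the variable names first_row and second_col are illustrative.

```python
import numpy as np

# Fill a 4x3 array column by column (Fortran order), as R's matrix(2:13, 4, 3) does.
np_ar = np.arange(2, 14).reshape((4, 3), order='F')

first_row = np_ar[0, :]     # R equivalent: r_mat[1, ]
second_col = np_ar[:, 1]    # R equivalent: r_mat[, 2]

print(np_ar)
print(first_row)     # [ 2  6 10]
print(second_col)    # [6 7 8 9]
```

Note that NumPy positions are zero-based, so R's row 1 is NumPy's row 0.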
Here is an example of list creation and selection in R:
> cal_lst <- list(weekdays=1:8, mth='jan')
> cal_lst
$weekdays
[1] 1 2 3 4 5 6 7 8
$mth
[1] "jan"
> cal_lst[1]
$weekdays
[1] 1 2 3 4 5 6 7 8
> cal_lst[[1]]
[1] 1 2 3 4 5 6 7 8
> cal_lst[2]
$mth
[1] "jan"
Series creation and selection in pandas is done as follows:
In [92]: cal_df= pd.Series({'weekdays':range(1,8), 'mth':'jan'})
In [93]: cal_df
Out[93]: mth                           jan
         weekdays    [1, 2, 3, 4, 5, 6, 7]
         dtype: object
In [97]: cal_df[0]
Out[97]: 'jan'
In [95]: cal_df[1]
Out[95]: [1, 2, 3, 4, 5, 6, 7]
In [96]: cal_df[[1]]
Out[96]: weekdays    [1, 2, 3, 4, 5, 6, 7]
         dtype: object
Here, we see a difference between an R list and a pandas Series from the perspective of the [] and [[]] operators. We can see the difference by considering the mth item, which is a character string (the second item of the R list).
In the case of R, the [] operator produces a container type, that is, a list containing the string, while the [[]] operator produces an atomic type, in this case a character, as follows:
> typeof(cal_lst[2])
[1] "list"
> typeof(cal_lst[[2]])
[1] "character"
In the case of pandas, the opposite is true: [] produces the atomic type, while [[]] results in a complex type, that is, a Series, as follows:
In [99]: type(cal_df[0])
Out[99]: str
In [101]: type(cal_df[[0]])
Out[101]: pandas.core.series.Series
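The session above selects by integer position with plain []. Recent pandas versions deprecate that usage on a label-indexed Series, and on current Python versions the labels follow the dictionary's insertion order rather than being sorted, so positions may not match the output shown. A minimal sketch that sidesteps both points selects by label, or by position through .iloc. The name cal_ser is an assumption introduced here for illustration.

```python
import pandas as pd

# Hypothetical name; same data as cal_df, with the range made into an explicit list.
cal_ser = pd.Series({'weekdays': list(range(1, 8)), 'mth': 'jan'})

scalar = cal_ser['mth']          # single label -> the stored value itself
sub_series = cal_ser[['mth']]    # list of labels -> a one-element Series

print(type(scalar).__name__)        # str
print(type(sub_series).__name__)    # Series

# Position-based selection goes through .iloc
# (insertion order is kept here, so 'mth' sits at position 1).
print(cal_ser.iloc[1])                     # jan
print(type(cal_ser.iloc[[1]]).__name__)    # Series
```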
In both R and pandas, an element can be obtained by specifying its name.
In R, this is done with the name preceded by the $ operator, as follows:
> cal_lst$mth
[1] "jan"
> cal_lst$'mth'
[1] "jan"
In pandas, we subset elements in the usual way, with the name (the index label) in square brackets:
In [111]: cal_df['mth']
Out[111]: 'jan'
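pandas also offers a closer analogue of R's $: a label that is a valid Python identifier and does not clash with an existing Series attribute or method can be read as an attribute. The sketch below rebuilds the Series under the illustrative name cal_ser (an assumption, not a name from the text).

```python
import pandas as pd

# Hypothetical name; same data as the Series in the text.
cal_ser = pd.Series({'weekdays': list(range(1, 8)), 'mth': 'jan'})

by_label = cal_ser['mth']    # bracket access, works for any label
by_attr = cal_ser.mth        # attribute access, similar in spirit to cal_lst$mth

print(by_label, by_attr)     # jan jan
```

Bracket access remains the safer default, since attribute access fails for labels containing spaces or matching method names such as count or index.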
One area where R and pandas differ is in the subsetting of nested elements. For example, to obtain day 4 from weekdays, we have to use the [[]] operator in R:
> cal_lst[[1]][[4]]
[1] 4
> cal_lst[[c(1,4)]]
[1] 4
However, in the case of pandas, we can just use a double []:
In [132]: cal_df[1][3]
Out[132]: 4
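The double [] works because the first selection returns the stored Python list, which is then indexed like any other list (zero-based, so day 4 sits at index 3). A small sketch using a label rather than a position for the first step, with the illustrative name cal_ser (an assumption, not a name from the text):

```python
import pandas as pd

# Hypothetical name; same data as the Series in the text.
cal_ser = pd.Series({'weekdays': list(range(1, 8)), 'mth': 'jan'})

weekdays = cal_ser['weekdays']    # step 1: pull the list out of the Series
day_4 = weekdays[3]               # step 2: ordinary zero-based list indexing

print(type(weekdays).__name__)    # list
print(day_4)                      # 4
print(cal_ser['weekdays'][3])     # same result, chained: 4
```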
Selecting data in R DataFrames and pandas DataFrames follows a similar pattern. The following section explains how we perform multi-column selections in both.
In R, we specify the multiple columns to select by stating them in a vector within square brackets:
> stocks_table[c('Symbol','Price')]
  Symbol  Price
1   GOOG 518.70
2   AMZN 307.82
3     FB  74.90
4   AAPL 109.70
5   TWTR  37.10
6   NFLX 334.48
7  LINKD 219.90
> stocks_table[,c('Symbol','Price')]
  Symbol  Price
1   GOOG 518.70
2   AMZN 307.82
3     FB  74.90
4   AAPL 109.70
5   TWTR  37.10
6   NFLX 334.48
7  LINKD 219.90
In pandas, we pass a list of column names within square brackets, either directly or through .loc:
In [140]: stocks_df[['Symbol','Price']]
Out[140]:
  Symbol   Price
0   GOOG  518.70
1   AMZN  307.82
2     FB   74.90
3   AAPL  109.70
4   TWTR   37.10
5   NFLX  334.48
6   LNKD  219.90
In [145]: stocks_df.loc[:,['Symbol','Price']]
Out[145]:
  Symbol   Price
0   GOOG  518.70
1   AMZN  307.82
2     FB   74.90
3   AAPL  109.70
4   TWTR   37.10
5   NFLX  334.48
6   LNKD  219.90
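To make the pandas side reproducible, here is a minimal sketch that rebuilds only the Symbol and Price columns from the output above (the full stocks_df is not shown in this section, so any other columns are omitted) and contrasts a list of names with a single name. The variable names by_brackets, by_loc, and one_column are illustrative.

```python
import pandas as pd

# Only the two columns visible in the output above; the real stocks_df may hold more.
stocks_df = pd.DataFrame({
    'Symbol': ['GOOG', 'AMZN', 'FB', 'AAPL', 'TWTR', 'NFLX', 'LNKD'],
    'Price': [518.70, 307.82, 74.90, 109.70, 37.10, 334.48, 219.90],
})

by_brackets = stocks_df[['Symbol', 'Price']]      # list of names -> DataFrame
by_loc = stocks_df.loc[:, ['Symbol', 'Price']]    # all rows, listed columns
one_column = stocks_df['Price']                   # single name -> Series

print(by_brackets.equals(by_loc))     # True
print(type(one_column).__name__)      # Series
```

This parallels the Series case: a single name in [] returns the lower-dimensional object, while a list of names keeps the container type.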