Chapter 10. R and pandas Compared

This chapter compares pandas with R, the statistical computing environment on which much of pandas' functionality is modeled. It is intended as a guide for R users who wish to use pandas, and for users who wish to replicate in pandas functionality that they have seen in R code. It focuses on some key features available to R users and shows, through illustrative examples, how to achieve similar functionality in pandas. This chapter assumes that you have R installed. If not, it can be downloaded from http://www.r-project.org/.

By the end of the chapter, you should have a good grasp of how the data analysis capabilities of R compare with those of pandas, so that you can transition to or use pandas should you need to. This chapter addresses the following topics:

R data types and their pandas equivalents
Slicing and selection
Arithmetic operations on datatype columns
Aggregation and GroupBy
Matching
Split-apply-combine
Melting and reshaping
Factors and categorical data

R data types

R has five commonly used primitive, or atomic, types (a sixth, raw, holds raw bytes and is rarely needed):

Character
Numeric
Integer
Complex
Logical/Boolean
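As a rough orientation for the pandas side, the following sketch shows which dtype pandas infers for values corresponding to each of these R types. The pairing is our own illustration rather than an official correspondence, and the variable names are hypothetical.

```python
import pandas as pd

# One small Series per R atomic type; pandas infers the dtype from the values.
character_ser = pd.Series(['donkey', 'mule'])   # R character
numeric_ser = pd.Series([5.6, 2.0])             # R numeric (double)
integer_ser = pd.Series([23, 7])                # R integer
complex_ser = pd.Series([1+4j, 2-1j])           # R complex
logical_ser = pd.Series([True, False])          # R logical

for name, ser in [('character', character_ser),
                  ('numeric', numeric_ser),
                  ('integer', integer_ser),
                  ('complex', complex_ser),
                  ('logical', logical_ser)]:
    # The dtype reported for strings depends on the pandas version
    # (object, or a dedicated string dtype in newer releases).
    print(name, ser.dtype)
```

The numeric, integer, complex, and logical cases map to float64, int64, complex128, and bool respectively.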

It also has the following, more complex, container types:

Vector: This is similar to numpy.array. It can only contain objects of the same type.
List: This is a heterogeneous container. Its equivalent in pandas would be a Series.
DataFrame: This is a heterogeneous 2D container, equivalent to a pandas DataFrame.
Matrix: This is a homogeneous 2D version of a vector. It is similar to numpy.matrix.

For this chapter, we will focus on the list and the DataFrame, whose pandas equivalents are the Series and the DataFrame.
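The difference between a homogeneous and a heterogeneous container can be seen on the Python side with a short sketch. The variable names here are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# A NumPy array is homogeneous: mixing an int and a float upcasts both to float64.
vec = np.array([23, 5.6])

# A Series built from mixed values falls back to dtype object,
# so each element keeps its original Python type.
mixed = pd.Series([23, 'donkey', 5.6])

print(vec.dtype)
print(mixed.dtype)
print([type(v).__name__ for v in mixed])
```

An R vector behaves like the NumPy array here, in that it coerces its elements to a common type, whereas an R list keeps each element's type, as the object-dtype Series does.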

Note

For more information on R data types, refer to http://www.statmethods.net/input/datatypes.html.

For NumPy data types, refer to http://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html and http://docs.scipy.org/doc/numpy/reference/generated/numpy.matrix.html.

R lists

An R list can be created explicitly with the list() function, as shown here:

> h_lst <- list(23, 'donkey', 5.6, 1+4i, TRUE)
> h_lst
[[1]]
[1] 23

[[2]]
[1] "donkey"

[[3]]
[1] 5.6

[[4]]
[1] 1+4i

[[5]]
[1] TRUE

> typeof(h_lst)
[1] "list"

Here is its Series equivalent in pandas, built by creating a Python list and then creating a Series from it:

In [8]: h_list = [23, 'donkey', 5.6, 1+4j, True]
In [9]: import pandas as pd
        h_ser = pd.Series(h_list)
In [10]: h_ser
Out[10]: 0        23
         1    donkey
         2       5.6
         3    (1+4j)
         4      True
         dtype: object

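Array indexing starts from 0 in pandas, as opposed to R, where it starts at 1. This is visible in the index labels of the preceding output, which run from 0 to 4, whereas the R list elements are numbered [[1]] to [[5]]. We can also confirm the type of the object we created: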

In [11]: type(h_ser)
Out[11]: pandas.core.series.Series
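To make the indexing difference concrete, here is a minimal sketch that retrieves elements of the same Series by position. The R expressions in the comments are the corresponding list lookups; the variable names first, second, and last are hypothetical.

```python
import pandas as pd

h_ser = pd.Series([23, 'donkey', 5.6, 1+4j, True])

# Positional access with .iloc is zero-based.
first = h_ser.iloc[0]    # R equivalent: h_lst[[1]]
second = h_ser.iloc[1]   # R equivalent: h_lst[[2]]

# A negative position counts from the end in pandas; in R, a negative
# index instead drops the element at that position.
last = h_ser.iloc[-1]    # R equivalent: h_lst[[5]]

print(first, second, last)
```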

R DataFrames

We can construct an R DataFrame by calling the data.frame() constructor and then display it as follows:

> stocks_table <- data.frame(Symbol=c('GOOG','AMZN','FB','AAPL',
                                      'TWTR','NFLX','LNKD'),
                             Price=c(518.7,307.82,74.9,109.7,37.1,
                                     334.48,219.9),
                             MarketCap=c(352.8,142.29,216.98,643.55,
                                         23.54,20.15,27.31))

> stocks_table
  Symbol  Price MarketCap
1   GOOG 518.70    352.80
2   AMZN 307.82    142.29
3     FB  74.90    216.98
4   AAPL 109.70    643.55
5   TWTR  37.10     23.54
6   NFLX 334.48     20.15
7   LNKD 219.90     27.31

Here, we construct a pandas DataFrame and display it:

In [29]: stocks_df = pd.DataFrame({'Symbol': ['GOOG','AMZN','FB','AAPL',
                                              'TWTR','NFLX','LNKD'],
                                   'Price': [518.7,307.82,74.9,109.7,37.1,
                                             334.48,219.9],
                                   'MarketCap($B)': [352.8,142.29,216.98,643.55,
                                                     23.54,20.15,27.31]
                                  })
         # reindex_axis() is no longer available in current pandas;
         # reindex(columns=...) reorders the columns in the same way.
         stocks_df = stocks_df.reindex(columns=sorted(stocks_df.columns, reverse=True))
         stocks_df
Out[29]:
  Symbol   Price  MarketCap($B)
0   GOOG  518.70         352.80
1   AMZN  307.82         142.29
2     FB   74.90         216.98
3   AAPL  109.70         643.55
4   TWTR   37.10          23.54
5   NFLX  334.48          20.15
6   LNKD  219.90          27.31
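As an alternative to reverse-sorting the column names, the column order can be stated explicitly when the DataFrame is built. The following is a minimal sketch under that assumption; it also shows pandas counterparts of two common R idioms, with the R expressions given in comments. The names prices, n_rows, and n_cols are hypothetical.

```python
import pandas as pd

stocks_df = pd.DataFrame(
    {'Symbol': ['GOOG', 'AMZN', 'FB', 'AAPL', 'TWTR', 'NFLX', 'LNKD'],
     'Price': [518.7, 307.82, 74.9, 109.7, 37.1, 334.48, 219.9],
     'MarketCap($B)': [352.8, 142.29, 216.98, 643.55, 23.54, 20.15, 27.31]},
    # Explicit column order, so no reindexing step is needed afterwards.
    columns=['Symbol', 'Price', 'MarketCap($B)'],
)

prices = stocks_df['Price']        # R equivalent: stocks_table$Price
n_rows, n_cols = stocks_df.shape   # R equivalent: dim(stocks_table)

print(stocks_df.dtypes)            # per-column types, loosely comparable to R's str()
print(n_rows, n_cols)
```

Passing columns= keeps the intended order in one place, which may be easier to read than relying on the sort order of the column names.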
