Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Benefits of using pandas

The pandas forms a core component of the Python data analysis corpus. The distinguishing feature of pandas is the suite of data structures that it provides, which is naturally suited to data analysis, primarily the DataFrame and to a lesser extent Series (1-D vectors) and Panel (3D tables).

Simply put, pandas and statstools can be described as Python's answer to R, the data analysis and statistical programming language that provides both the data structures, such as R-data frames, and a rich statistical library for data analysis.

The benefits of pandas over using a language such as Java, C, or C++ for data analysis are manifold:

Data representation: It can easily represent data in a form naturally suited for data analysis via its DataFrame and Series data structures in a concise manner. Doing the equivalent in Java/C/C++ would require many lines of custom code, as these languages were not built for data analysis but rather networking and kernel development.
Data subsetting and filtering: It provides for easy subsetting and filtering of data, procedures that are a staple of doing data analysis.

Concise and clear code: Its concise and clear API allows the user to focus more on the core goal at hand, rather than have to write a lot of scaffolding code in order to perform routine tasks. For example, reading a CSV file into a DataFrame data structure in memory takes two lines of code, while doing the same task in Java/C/C++ would require many more lines of code or calls to non-standard libraries, as illustrated in the following table. Here, let's suppose that we had the following data:

Country	Year	CO2 Emissions	Power Consumption	Fertility Rate	Internet Usage Per 1000 People	Life Expectancy	Population
Belarus	2000	5.91	2988.71	1.29	18.69	68.01	1.00E+07
Belarus	2001	5.87	2996.81		43.15		9970260
Belarus	2002	6.03	2982.77	1.25	89.8	68.21	9925000
Belarus	2003	6.33	3039.1	1.25	162.76		9873968
Belarus	2004		3143.58	1.24	250.51	68.39	9824469
Belarus	2005			1.24	347.23	68.48	9775591

In a CSV file, this data that we wish to read would look like the following:

Country,Year,CO2Emissions,PowerConsumption,FertilityRate,InternetUsagePer1000, LifeExpectancy, PopulationBelarus,2000,5.91,2988.71,1.29,18.69,68.01,1.00E+07Belarus,2001,5.87,2996.81,,43.15,,9970260Belarus,2002,6.03,2982.77,1.25,89.8,68.21,9925000...Philippines,2000,1.03,514.02,,20.33,69.53,7.58E+07Philippines,2001,0.99,535.18,,25.89,,7.72E+07Philippines,2002,0.99,539.74,3.5,44.47,70.19,7.87E+07...Morocco,2000,1.2,489.04,2.62,7.03,68.81,2.85E+07Morocco,2001,1.32,508.1,2.5,13.87,,2.88E+07Morocco,2002,1.32,526.4,2.5,23.99,69.48,2.92E+07..

Note

The data here is taken from World Bank Economic data available at: http://data.worldbank.org.

In Java, we would have to write the following code:

public class CSVReader {
public static void main(String[] args) {
        String[] csvFile=args[1];CSVReader csvReader = new csvReader();List<Map>dataTable=csvReader.readCSV(csvFile);}public void readCSV(String[] csvFile){BufferedReader bReader=null;String line="";String delim=",";
  //Initialize List of maps, each representing a line of the csv fileList<Map> data=new ArrayList<Map>();
  try {bufferedReader = new BufferedReader(new   FileReader(csvFile));// Read the csv file, line by linewhile ((line = br.readLine()) != null){String[] row = line.split(delim);Map<String,String> csvRow=new HashMap<String,String>();
           csvRow.put('Country')=row[0]; csvRow.put('Year')=row[1];csvRow.put('CO2Emissions')=row[2]; csvRow.put('PowerConsumption')=row[3];csvRow.put('FertilityRate')=row[4];csvRow.put('InternetUsage')=row[1];csvRow.put('LifeExpectancy')=row[6];csvRow.put('Population')=row[7];data.add(csvRow);
        }
     } catch (FileNotFoundException e) {e.printStackTrace();
     } catch (IOException e) {e.printStackTrace();
    } 
 return data;}

But, using pandas, it would take just two lines of code:

import pandas as pdworldBankDF=pd.read_csv('worldbank.csv')

In addition, pandas is built upon the NumPy libraries and hence, inherits many of the performance benefits of this package, especially when it comes to numerical and scientific computing. One oft-touted drawback of using Python is that as a scripting language, its performance relative to languages like Java/C/C++ has been rather slow. However, this is not really the case for pandas.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Benefits of using pandas

Create new playlist

Sign In

Sign Up

Benefits of using pandas

Note

Table of Contents for
Benefits of using pandas