Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Chapter 1. Getting Started

Faster and efficient information retrieval is the primary objective of most computer programs. Data structures and algorithms help us in achieving the objective by processing and retrieving data faster. Information retrieval can easily be integrated with algorithms to answer inferential questions from data such as:

How are sales increasing over time?
What is customer's arrival distribution over time?
Out of all the customers who visit between 3:00 and 6:00 PM, how many order Asian versus Chinese?
Of all the customers visiting, how many are from the same city?

In the efficient processing of the preceding queries, especially in big data scenarios, data structures and algorithms utilized for data retrieval play a significant role. This book will introduce primary data structures such as lists, queues, and stacks, which are used for information retrieval and trade-off of different data structures. We will also introduce the data structure and algorithms evaluation approach for retrieval and processing of defined data structures.

Algorithms are evaluated based on complexity and efficiency, where complexity refers to an algorithm design which is easy to program and debug, and efficacy ensures that the algorithm is utilizing computer resources optimally. This book will focus on the efficiency part of algorithms using data structures, and the current chapter introduces the importance of data structure and the algorithms used for retrieval of data from data structures.

Introduction to data structure

Moore's law in 1965 observed that the number of transistors per square inch in a dense integrated circuit (IC) had doubled every year since its invention, thus enhancing computational power per computer. He revised his forecast in 1975, stating that the number of transistors would double every 2 years, instead of every year, due to saturation:

Figure 1.1: Moore's law (Ref: data credit - Transistor count, Wikipedia)

Also, although the computational power has been increasing, problem complexity and data sources have also been increasing exponentially over the decade, enforcing the need for efficient algorithms:

Figure 1.2: Increase in size of unstructured data (Ref: Enterprise strategy group 2010)

This explosion in data from 2008 to 2015 has led to a new field of data science where people put in a lot of effort to derive insights using all kinds of datasets such as structured, semi-structured, and unstructured. Thus, to efficiently deal with data scalability, it is important to efficiently store and retrieve datasets. For example, searching for a word in a dictionary would take a long time if the data is randomly organized; thus, sorted list data structures are utilized to ensure a faster search of words. Similarly, searching for an optimal path in a city based on the input location requires data about road network, position, and so on to be stored in the form of geometries. Ideally, even a variable stored as any built-in data type such as character, integer, or float can be considered as a data structure of scalar nature. However, data structure is formally defined as a scheme of organizing related information in a computer so that it can be used efficiently.

Similarly, for algorithms, given sufficient space and time, any dataset can be stored and processed to answer questions of interest. However, selecting the correct data structure will help you save a lot of memory and computational power. For example, assigning a float to integer data type, such as the number of customers coming in a day, will require double the memory. However, in the real-world scenario, computers are constrained by computational resources and space. Thus, a solution can be stated as efficient if it is able to achieve the desired goal in the given resources and time. Thus, this could be used as a cost function to compare the performance of different data structures while designing algorithms. There are two major design constraints to be considered while selecting the data structure:

Analyze the problem to decide on the basic operation that must be supported into the selected data structure, such as inserting an item, deleting an item, and searching for an item
Evaluate the resource constraint for each operation

The data structure is selected depending on the problem scenario, for example, a case where complete data is loaded at the beginning, and there is no change/insertion in data, requires similar data structure. However, similar including deletion into data structure will make data structure implementation more complex.

Tip

Detailed steps to download the code bundle are mentioned in the Preface of this book. Please have a look. The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/R-Data-Structures-and-Algorithms. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for 1. Getting Started

Create new playlist

Sign In

Sign Up

Chapter 1. Getting Started

Introduction to data structure

Tip

Table of Contents for
1. Getting Started