Chapter 3. Data Structures and Manipulation

Most of the time you spend programming goes into manipulating data. You process properties of data, derive conclusions from it, and change its nature. In this chapter, we will take an exhaustive look at various data structures and data manipulation techniques in JavaScript. Used correctly, these expressive constructs make your programs correct, concise, easy to read, and most probably faster. We will cover the following topics:

  • Regular expressions
  • Exact match
  • Match from a class of characters
  • Repeated occurrences
  • Beginning and end
  • Backreferences
  • Greedy and lazy quantifiers
  • Arrays
  • Maps
  • Sets
  • A matter of style

Regular expressions

If you are not familiar with regular expressions, I urge you to spend time learning them. Learning and using regular expressions effectively is one of the most rewarding skills that you will gain. In most code review sessions, the first thing that I comment on is how a piece of code could be converted to a single-line regular expression (or RegEx). If you study popular JavaScript libraries, you will be surprised to see how ubiquitous RegEx is. Most seasoned engineers rely on RegEx primarily because, once you know how to use them, they are concise and easy to test. However, learning RegEx takes a significant amount of effort and time.

A regular expression is a way to express a pattern to match strings of text. The expression itself consists of terms and operators that allow us to define these patterns. We'll see what these terms and operators consist of shortly.

In JavaScript, there are two ways to create a regular expression: with a regular expression literal, or by constructing an instance of the RegExp object.

For example, if we wanted to create a RegEx that matches the string test exactly, we could use the following RegEx literal:

var pattern = /test/;

RegEx literals are delimited using forward slashes. Alternatively, we could construct a RegExp instance, passing the RegEx as a string:

var pattern = new RegExp("test");
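One practical difference between the two forms is escaping. Because the constructor takes the pattern as a string, any backslash in the pattern has to be doubled, and the pattern can be assembled at runtime. The following is a minimal sketch; the variable names and the sample strings are illustrative only:

```javascript
// Literal form: the backslash is written once.
var digitsLiteral = /\d+/;

// Constructor form: the pattern is a string, so the backslash must be escaped.
var digitsConstructed = new RegExp("\\d+");

// Both forms hold the same pattern text.
console.log(digitsLiteral.source === digitsConstructed.source); // true
console.log(digitsConstructed.test("room 42")); // true

// The constructor is handy when the pattern is only known at runtime.
var word = "test"; // hypothetical value, for example, from user input
var dynamicPattern = new RegExp(word);
console.log(dynamicPattern.test("a unit test")); // true
```

As a rule of thumb, the literal is easier to read when the pattern is fixed, while the constructor is the option to reach for when the pattern is built from other values.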

Both the literal and the constructor form result in the same RegEx being created in the variable pattern. In addition to the expression itself, there are three commonly used flags that can be associated with a RegEx:

  • i: This makes the RegEx case-insensitive, so /test/i matches not only test, but also Test, TEST, tEsT, and so on.
  • g: This matches all the instances of the pattern, as opposed to the default behavior, which stops at the first occurrence. More on this later.
  • m: This makes ^ and $ match at the beginning and end of each line, rather than only at the beginning and end of the whole string. This is useful for multiline text, such as the value of a textarea element.

These flags are appended to the end of the literal (for example, /test/ig) or passed in a string as the second parameter to the RegExp constructor (new RegExp("test", "ig")).

The following example illustrates the various flags and how they affect the pattern match:

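// No flags: the match is case-sensitive
var pattern = /orange/;
console.log(pattern.test("orange")); // true

// i: the case of the letters is ignored
var patternIgnoreCase = /orange/i;
console.log(patternIgnoreCase.test("Orange")); // true

// i and g combined; g does not change the result of this first test() call
var patternGlobal = /orange/ig;
console.log(patternGlobal.test("Orange Juice")); // true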
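The effect of the g and m flags is easier to see with methods and strings other than a single test() call. The following is a small sketch under that assumption; the sample text and variable names are made up for illustration:

```javascript
var text = "orange\nOrange juice\nblood orange";

// With g, match() returns every occurrence instead of only the first one.
var allMatches = text.match(/orange/gi);
console.log(allMatches); // [ 'orange', 'Orange', 'orange' ]

// A RegExp with the g flag remembers where the last match ended (lastIndex),
// so calling test() repeatedly on the same object can give different results.
var reused = /orange/g;
var firstCall = reused.test("orange");  // true, lastIndex is now 6
var secondCall = reused.test("orange"); // false, search started at index 6
console.log(firstCall, secondCall); // true false

// Without m, ^ matches only at the very start of the string.
console.log(/^blood/.test(text)); // false

// With m, ^ also matches at the start of each line.
console.log(/^blood/m.test(text)); // true
```

The second test() call returning false is a common surprise; creating a fresh RegEx or omitting g when only a yes/no answer is needed avoids it.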

Merely testing whether a pattern matches a string isn't very exciting. Let's see how we can express more complex patterns.

