Chapter 3. Data Types and Types of Data

This chapter covers Objective 1.2 (Compare and contrast different data types) of the CompTIA Data+ exam and includes the following topics:

Introduction to data types
Different data types compared and contrasted
Discrete vs. continuous data
Categorical vs. dimension data
Types of data: audio, video, and images

For more information on the official CompTIA Data+ exam topics, see the Introduction.

This chapter covers topics related to data types and types of data. It is essential to understand what a data type is as well as the types of data that are available to you. This chapter introduces the basic data types, compares and contrasts them, and then covers discrete vs. continuous data and categorical and dimensional data types. Finally, this chapter looks at various types of data, including images, audio, and video.

Introduction to Data Types

CramSaver

If you can correctly answer these questions before going through this section, save time by completing the Cram Quiz at the end of the section.

1. What is the minimum length of any data type?

a. 1 byte

b. 2 bytes

c. 4 bytes

d. 8 bytes

2. Which of the following are defined as reserved locations for memory to store the data values?

a. Arrays

b. Variables

c. Strings

d. Pointers

3. Which of the following is the storage size of the double data type?

a. 12 bytes

b. 8 bytes

c. 4 bytes

d. 2 bytes

Answers

1. Answer: a. 1 byte. The minimum length of any data type is 1 byte (that is, 8 bits).

2. Answer: b. Variables. Variables are reserved locations in memory for storing values; for example, a user who creates a variable allocates certain memory space. Based on the variable type of data, the OS allots memory and determines what can be placed in reserved memory.

3. Answer: b. 8 bytes. The storage space required for a double data type is 64 bits (that is, 8 bytes). Such a data type can hold magnitudes from roughly 2.2e-308 (the smallest normal value) up to roughly 1.8e+308.

Think of the beautiful images that you see on your phone or computer or the audio files that contain your favorite music. How is such data interpreted and stored or retrieved by machines? In a computer, even complex data consists of some basic types of data or data types. There are several kinds of data types available for storing and retrieving data. They are the building blocks of modern computing systems and the foundation for data analysis.

Note

Don’t forget that the main aim of computer programs is to extract, process, and store data. Therefore, organized data can have a profound impact on the memory needs and running time of a program leveraging the data.

A data type can be defined as:

A number of values (or a single finite value) along with rules set for varied operations
A data classification that instructs the interpreter or compiler on the use of data
What and how the data is inserted into the programming language
A system for denoting functions and variables of varied types of data

Data types can be of various lengths. The minimum length of a data type is 1 byte (that is, 8 bits). Every type of data has a default value. The data type determines how you insert data into a database or how it is leveraged in a programming language.

These are the basic data types:

Character
Float
Integer
Double
String

Note

There are variations to these basic data types, such as long float and short integer.

Storage Sizes of Various Data Types

Different data types tend to have different sizes. The size of a data type depends on the compiler or system architecture (for example, 32-bit or 64-bit architecture). Table 3.1 lists the storage sizes of various data types.

TABLE 3.1 Storage Sizes of Various Data Types

Data Type	Storage Size
Character	1 byte
Integer	2 or 4 bytes
Float	4 bytes
Double	8 bytes
String	1 byte per character
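As a rough check of Table 3.1, the following sketch uses Python's standard struct module to report the byte size of each fixed-width type. The format codes and the SIZES name are choices made for this illustration; the "<" prefix requests struct's standard sizes, which match common C compilers but are not guaranteed on every platform.

```python
import struct

# Standard-size format codes ("<" turns off platform-specific sizes and padding).
SIZES = {
    "character": struct.calcsize("<c"),      # a single char
    "short integer": struct.calcsize("<h"),  # 2-byte integer
    "integer": struct.calcsize("<i"),        # 4-byte integer
    "float": struct.calcsize("<f"),          # single precision
    "double": struct.calcsize("<d"),         # double precision
}

# A string of single-byte (ASCII) characters needs 1 byte per character.
string_bytes = len("Data".encode("ascii"))

for name, size in SIZES.items():
    print(f"{name}: {size} byte(s)")
print(f"string 'Data': {string_bytes} bytes")
```

The two integer entries correspond to the "2 or 4 bytes" row of the table: which one a plain integer uses depends on the compiler or database in question.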

The following sections cover these data types in more detail.

Character

The character data type is used for an individual value. It can include numeric digits (that is, 0 through 9), upper- and lowercase letters (that is, a through z or A through Z), and special characters and symbols (such as . , ; and :).

Note

The values of characters are represented as ASCII.

You use the keyword char to denote the character data type. In MySQL, for instance, the character data type is defined as CHAR(value), so CHAR(5) would create a character data type with five characters (for example, ABCDE or 12345 or A1B3C).

There are two types of character data types: signed char and unsigned char. Whereas the signed char type can store zero, positive, and negative integer values, the unsigned char type can store only non-negative integer values.
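The ASCII representation and the signed and unsigned ranges can be illustrated with a short Python sketch. It assumes an 8-bit character stored in two's complement form, which is typical but is not stated in this chapter; the helper name fits_unsigned_char is hypothetical.

```python
import struct

# ASCII maps each character to a small integer code.
code_for_A = ord("A")    # 65
char_for_97 = chr(97)    # "a"

# One byte holds 256 distinct bit patterns; the signed/unsigned choice
# only changes which integers those patterns stand for.
SIGNED_RANGE = (-(2 ** 7), 2 ** 7 - 1)   # -128 .. 127
UNSIGNED_RANGE = (0, 2 ** 8 - 1)         # 0 .. 255

# struct's "b" is a signed char and "B" an unsigned char; both pack to 1 byte.
signed_byte = struct.pack("<b", -128)
unsigned_byte = struct.pack("<B", 255)


def fits_unsigned_char(value: int) -> bool:
    """Return True if value can be stored in an unsigned char."""
    return UNSIGNED_RANGE[0] <= value <= UNSIGNED_RANGE[1]


print(code_for_A, char_for_97, SIGNED_RANGE, UNSIGNED_RANGE)
print(fits_unsigned_char(-1), fits_unsigned_char(200))
```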

The next section covers the integer data type.

Integer

Integers are whole numbers that can be negative, zero, or positive. An integer cannot include decimal places or fractional parts. Examples of integers are -6, 0, 1, 9, and 99.

Note

In a computer program, an integer is stored as a fixed-size set of binary bits.

The size of an integer data type is usually 4 bytes.

The integer data type is created using the keyword int. For example, in MySQL, the integer data type is defined as INT(value).

The next section covers the float and double data types.

Float and Double

A number that includes a fractional and/or decimal portion is known as a floating point number (or a float). A floating point number may include a decimal portion (for example, 0.1, 3.15, 7.3, and 130.5) or may be a fraction (such as 1/10, 7/30, or 9/90).

Usually, the keyword float is used to indicate a floating point number. In MySQL, you create a float by using the syntax FLOAT(value).

Note

Floating point numbers are real numbers, but for real-life values, approximations may be used for floats. For example, in discussing distance in meters, whereas a computer will use a precise value such as 1.01 m, humans might round off to 1 m. For computers, every decimal value is important as these values can change outcomes greatly.

There’s an alternative to float: the double data type. While a float data type occupies 32 bits, a double data type occupies 64 bits. The major difference between the two is precision: a double has about 15 decimal digits of precision, whereas a float has only about 7. The digits of precision refer to the total number of significant digits, including those after the decimal point. For example, a float data type might show a value as 1.123456, whereas a double data type can show the same value as 1.12345671234567. As you can see, a double data type gives much more precise results when performing calculations.
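The precision gap between the two types can be observed with a small Python sketch. Python's own float is a 64-bit double, so the sketch uses the standard struct module to round a value to 32 bits and back; the helper names as_float32 and as_float64 are made up for this illustration.

```python
import struct


def as_float32(x: float) -> float:
    """Round x to the nearest 32-bit float, then widen it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def as_float64(x: float) -> float:
    """Round-trip x through a 64-bit double (Python's native float)."""
    return struct.unpack("<d", struct.pack("<d", x))[0]


value = 1.12345671234567
single = as_float32(value)
double = as_float64(value)

print(f"float : {single:.7g}")    # only about 7 significant digits are reliable
print(f"double: {double:.15g}")   # about 15 significant digits are reliable

# 16,777,217 (2**24 + 1) has 8 digits; a 32-bit float cannot hold it exactly.
big = 16_777_217.0
print(as_float32(big), as_float64(big))
```

The last line shows the practical consequence: the 32-bit value silently becomes 16777216.0, while the 64-bit value is stored exactly.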

With MySQL, a double data type would be created as follows:

DOUBLE(p,s)

where p is the total size, or precision, of the number, and s is the scale, or the number of digits shown after the decimal point. (You’ll learn more about precision and scale later in this chapter.)

Note

A double data type has twice the precision of a floating point number and uses double the space in memory.

The next section covers the array data type.

Array

An array is a linear data structure that comprises a set of data elements of the same data type. An array is stored in contiguous memory locations. For example, in C++ you can define an array as follows:

int array[10] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };

In this case, the array has been defined as type integer with 10 values, 0 to 9.

The next section covers the string data type.

String

A string data type contains alphanumeric data—that is, letters, numbers, spaces, and other symbols. A real-world example of a string is plaintext. For example, “Data Packet” and “1001 Data packets” are both string data types.

Note

Whereas floating point, double, and integer data types are used for numerals, the string data type is used for text. A number can be set as a string, but if it is, numeric calculations cannot be performed on it as it is treated as text.

Although a string can include numbers as well as letters and special characters, it is handled as text. In SQL, strings can be represented by CHAR, VARCHAR, and TEXT.
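A brief Python sketch shows what it means for a number stored as a string to be handled as text. The variable names are illustrative only.

```python
count_as_text = "1001"                 # digits stored as a string
count_as_number = int(count_as_text)   # the same digits converted to an integer

joined = count_as_text + "1"    # text is concatenated: "10011"
added = count_as_number + 1     # numbers are added: 1002

# Mixing the two is rejected until the string is converted.
try:
    count_as_text + 1
    mixing_allowed = True
except TypeError:
    mixing_allowed = False

print(joined, added, mixing_allowed)
```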

The next section contrasts and compares the various data types.

Cram Quiz

Answer these questions. If you cannot answer these questions correctly, consider reading this section again until you can.

1. The word “School” and the phrase “I go to college” are _____ data types.

a. string

b. character

c. integer

d. char

2. True or false: A number that includes a fractional or decimal portion is a floating point number.

a. True

b. False

3. What is the decimal place precision of the float data type?

a. 7

b. 10

c. 13

d. 14

Cram Quiz Answers

1. Answer: a. string. Both the word “School” and the phrase “I go to college” are string data types.

2. Answer: a. True. A number that includes a fractional or decimal portion is a floating point number. For example, 3.15 and 130.5 are floating point numbers.

3. Answer: a. 7. A float has about 7 digits of precision, whereas a double has about 15 digits of precision.

Comparing and Contrasting Different Data Types

CramSaver

If you can correctly answer these questions before going through this section, save time by skimming the Exam Alerts in this section and then completing the Cram Quiz at the end of the section.

1. True or false: VARCHAR and TEXT can both have a maximum of 65,535 characters.

a. True

b. False

2. The alphanumeric data type can contain which of the following? (Choose all that apply.)

a. Letters

b. Numeric values

c. Whitespace

d. Special symbols

3. The _____ data type is used for non-integer constants.

a. float

b. numeric

c. integer

d. constant

4. What is the result of applying money_used(5,1) to the number 56789, in terms of precision and scale?

a. 556789

b. 5678.9

c. 5.67890

d. .56789

Answers

1. Answer: a. True. Both VARCHAR and TEXT can store a maximum of 65,535 characters.

2. Answer: a. Letters, b. Numeric values, c. Whitespace, d. Special symbols. The alphanumeric data type can consist of letters, numeric values, whitespace, and special symbols.

3. Answer: b. numeric. The numeric data type is used for non-integer constants.

4. Answer: b. 5678.9. Precision is the total number of digits, whereas scale is the number of digits after the decimal place. Therefore, given the number 56789, if the precision value is 5 and the scale value is 1, the result is 5678.9.

ExamAlert

Expect the CompTIA Data+ exam to test you on date, numeric, alphanumeric, currency, and text data types in the context of the exam objectives and in relation to real-world applications.

This section focuses on different data formats and types and covers their features and how they are leveraged. These data formats and types are the basic elements that build a database and that are leveraged as inputs/outputs in computer programs.

We will start by covering the details of the date data type.

Date

As its name suggests, the date data type reflects date information. The type of data in this field is formatted in a particular way, and the format depends on the way the computer is set up in terms of region and user preference, as well as the software in use. A date field is used, for example, to capture initiation/termination dates, creation or order dates, and follow-up dates.

The following are some examples of date formats:

mm/dd/yyyy

dd/mm/yyyy

dd/mm/yy

m/d/yy

d/m/yy

As you can see in these examples, date data includes year, month, and day values. A date data type stores a date from a calendar as an integer value.

Figure 3.1 illustrates how date data can be formatted on a Windows PC.

Figure 3.1 Date Formatting on a Windows PC

Fun Fact

Try to set a date beyond 9999 A.D. on a PC.

Let’s consider an example. In the date 10/15/2007, 10 is the month value (which can range from 1 to 12), 15 is the day of the month (which can range from 1 to 31), and 2007 is the year (which can range from 0001 to 9999).

The values of a date data type have independent output and input formats. This means that a user can enter date data values in one style and handle them in a different style.

Note

The date data type manages years from 1 A.D. to 9999 A.D. according to the Gregorian calendar system.
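The independence of input and output formats can be sketched with Python's standard datetime module, used here only as a stand-in for a database date type. The sketch parses the example date in mm/dd/yyyy form, prints it in two other formats, and shows the single integer (a day count) that can represent the same date.

```python
from datetime import date, datetime

entered = "10/15/2007"   # mm/dd/yyyy, as in the example above

# Parse using the input format...
parsed = datetime.strptime(entered, "%m/%d/%Y").date()

# ...and present the same value in different output formats.
european = parsed.strftime("%d/%m/%Y")   # dd/mm/yyyy
short = parsed.strftime("%d/%m/%y")      # dd/mm/yy

# Internally a date can be kept as one integer: days counted from year 1.
day_number = parsed.toordinal()
restored = date.fromordinal(day_number)

print(european, short, day_number, restored)
print(date.min.year, date.max.year)      # supported year range in Python
```

Python's date type happens to share the 1 to 9999 year range described in this section, which is why the last line prints those two bounds.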

The next section covers the alphanumeric data type.

Alphanumeric

The alphanumeric data type is restricted to numeric and alphabetic characters—that is, letters, numbers, whitespace, and some common symbols. For a database that expects alphanumeric entries, a user can enter any name that includes both letters and numbers (for example, Happy2022, CompTIA Data Exam 2022).

Note

The alphanumeric character set includes punctuation marks in addition to upper- and lowercase letters.

The alphanumeric field type is helpful in describing the input that can be entered in a field (for example, an alphanumeric password).

What happens if non-alphanumeric characters are entered in an alphanumeric field (in the absence of data validation)? Well, in this case, the non-alphanumeric characters are considered symbols, which are handled as whitespace.

The next section covers the numeric data type.

Numeric

Numeric data consists of just numbers. It can be represented using small integer, integer, float, and double data types.

Numeric data is broadly classified into two types: approximate and exact. Approximate numeric data can be stored in the floating point data type, and exact numeric data can be stored in the decimal and integer data types.

ExamAlert

It is important to understand the key difference between exact and approximate. With exact, all the values in the data type range can be represented exactly by adjusting the precision and scale. With approximate, not all values in the data type range can be represented exactly.

In SQL, exact data types can be represented as NUMERIC(p,s) and DECIMAL(p,s), where p is precision and s is scale.
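The difference between approximate and exact numeric types can be seen in a small Python sketch. Python's float is a binary floating point type, and decimal.Decimal is used here as a stand-in for SQL's DECIMAL and NUMERIC; that mapping is an assumption made for illustration, not a description of how any particular database stores values.

```python
from decimal import Decimal

# Approximate: 0.1 and 0.2 have no exact binary representation,
# so their sum is very slightly off from 0.3.
approx_sum = 0.1 + 0.2
approx_matches = approx_sum == 0.3

# Exact: decimal digits are stored as written, so the sum is exactly 0.3.
exact_sum = Decimal("0.1") + Decimal("0.2")
exact_matches = exact_sum == Decimal("0.3")

print(approx_sum, approx_matches)
print(exact_sum, exact_matches)
```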

What are precision and scale? Essentially, precision is the total number of significant digits that the data type stores, counting the digits both before and after the decimal point. Scale is the number of decimal places to the right of the decimal point. The scale must be less than or equal to the precision.

In a real-world application, you would see numeric data leveraged for things like money, so that you could insert a value into a table with a certain precision. For example, say that you use the following notation to set the precision to 9 and the scale to 4:

money_balance numeric(9,4)

This allows numbers such as 11111.1111 and 99999.9999, which both have precision of 9 and scale of 4.
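To make the precision and scale rules concrete, here is a minimal Python sketch that checks whether a value fits a given numeric(p,s) definition without rounding. The function name fits_numeric is hypothetical, the check assumes finite values written as decimal strings, and real databases may round or reject out-of-range values in their own ways.

```python
from decimal import Decimal


def fits_numeric(value: str, precision: int, scale: int) -> bool:
    """Return True if value fits NUMERIC(precision, scale) without rounding."""
    _sign, digits, exponent = Decimal(value).as_tuple()
    frac_digits = max(-exponent, 0)              # digits after the decimal point
    int_digits = max(len(digits) + exponent, 0)  # digits before the decimal point
    # The scale limits the fractional digits; the remaining
    # (precision - scale) digits are available for the integer part.
    return frac_digits <= scale and int_digits <= precision - scale


print(fits_numeric("11111.1111", 9, 4))   # 5 integer digits + 4 decimals
print(fits_numeric("123456.1", 9, 4))     # 6 integer digits is one too many
print(fits_numeric("1.23456", 9, 4))      # 5 decimals exceeds the scale
```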

The next section covers the text data type.

Text

The text data type stores any text data. It can include both multibyte and single-byte characters that are defined by the locale. The text type of field is used for alphanumeric data (that is, letters, numerals, symbols, and whitespace). This type of field is the least restrictive type of field in a database.

Text data can be broadly categorized into three categories: TEXT, MEDIUMTEXT, and LONGTEXT. Whereas TEXT can support up to 65,535 characters, MEDIUMTEXT can store strings up to 16 MB, and LONGTEXT can store strings up to 4 GB.

In addition, you have another option for storing more characters than CHAR supports. For example, in MySQL you can use VARCHAR (which stands for variable CHAR). VARCHAR supports up to 65,535 characters, but whereas a TEXT field always has that fixed maximum, VARCHAR lets you define a field with a maximum length anywhere between 0 and 65,535 characters. You might, for example, use VARCHAR for storing a few strings and use TEXT for storing paragraphs.

The next section covers the currency data type.

Currency

Currency data is numeric monetary data that is formatted using a currency symbol (such as $ or €) and two decimal places. Currency variables are 64-bit numbers in integer format scaled by 10,000 to present a fixed-point number with 15 digits to the left of the decimal point and 4 digits to the right of the decimal point. A currency field permits users to enter the values in the currencies of various countries.

Figure 3.2 shows the currency settings in Windows that you can set for a particular region.

Microsoft Access has a field type called Currency that holds up to 15 digits before the decimal point and 4 digits after. SQL Server, on the other hand, has data types such as money (8 bytes) and smallmoney (4 bytes).
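The scaled-integer idea behind currency values can be sketched in Python as follows. The names SCALE, to_scaled, and from_scaled are invented for this example, the sketch assumes amounts with at most four decimal places, and it is not the implementation used by Access or any database.

```python
from decimal import Decimal

SCALE = 10_000   # four implied decimal places


def to_scaled(amount: str) -> int:
    """Store a currency amount as an integer count of 1/10,000 units."""
    return int(Decimal(amount) * SCALE)


def from_scaled(scaled: int) -> Decimal:
    """Recover the decimal amount from its scaled integer form."""
    return Decimal(scaled) / SCALE


balance = to_scaled("12.3456")                # stored as 123456
total = to_scaled("0.10") + to_scaled("0.20")  # integer addition is exact

# The largest signed 64-bit integer bounds the value that can be stored.
MAX_SCALED = 2 ** 63 - 1
max_whole_digits = len(str(MAX_SCALED // SCALE))

print(balance, from_scaled(balance), from_scaled(total), max_whole_digits)
```

Dividing the largest 64-bit value by 10,000 leaves a 15-digit whole part, which is where the "15 digits to the left of the decimal point" limit comes from.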

The next section gives insight into categorical vs. dimensional and discrete vs. continuous data types.

Cram Quiz

Answer these questions. If you cannot answer these questions correctly, consider reading this section again until you can.

1. True or false: Currency can only be shown with the symbol $ in a database.

a. True

b. False

2. What is the storage size of the LONGTEXT data type?

a. 1 KB

b. 1 MB

c. 16 MB

d. 4 GB

3. Numeric data can be classified into which of the following? (Choose two.)

a. Exact

b. Approximate

c. Continuous

d. Dimensional

Cram Quiz Answers

1. Answer: b. False. Currency can be shown in the local currency or in any other currency.

2. Answer: d. 4 GB. Text data can be broadly categorized into three categories: TEXT, MEDIUMTEXT, and LONGTEXT. Whereas TEXT can support up to 65,535 characters, MEDIUMTEXT can store strings up to 16 MB, and LONGTEXT can store strings up to 4 GB.

3. Answer: a. Exact, b. Approximate. Numeric data can be broadly classified into two types: approximate and exact.

Categorical vs. Dimension and Discrete vs. Continuous Data Types

CramSaver

If you can correctly answer these questions before going through this section, save time by completing the Cram Quiz at the end of the section.

1. Categorical data can be represented using which of the following? (Choose all that apply.)

a. Bar graph

b. Pie chart

c. Bar chart

d. Line graph

e. None of these options are correct.

2. Discrete data can be best represented using which of the following?

a. Bar graph

b. Plot graph

c. Line graph

d. Scatter plot graph

3. Continuous data is best represented using which of the following?

a. Line graph

b. Scatter plot graph

c. Bar graph

d. All of these answers are correct.

Answers

1. Answer: a. Bar graph, b. Pie chart. Categorical data can be represented using bar graphs and pie charts. Bar graphs show categorical data using bars, with gaps between the bars, and pie charts show categorical data in a pie fashion, with each category occupying a piece of pie.

2. Answer: d. Scatter plot graph. Discrete data can be represented using a scatter plot graph, which shows the relationship between two or more numeric variables.

3. Answer: a. Line graph. Continuous data can be best represented using a line graph, which is estimated on a scale with several possible values.

This section covers the categorical and dimension data types.

Categorical/Dimension Data Types

The categorical data type represents variables with two or more categories or classifications. At times it is also used for data that can be identified by groups of observations that share a similar trait. Figure 3.3 shows categories of sales across infrastructure, server, database, SaaS, and PaaS as a pie chart; the same data could instead be shown using a bar graph or histogram.

Another example of categorical data is survey data. Say that a survey is launched to capture responses about shopping experiences. This type of data would not have any numeric values; rather, it would be qualitative in nature (for example, what is working for the customers, what can be done better to create a good customer experience while people shop).

Categorical data can be classified into nominal and ordinal data. Nominal data implies named categories—like SaaS, PaaS, database, servers, and infrastructure in Figure 3.3. Ordinal data is ranked. For example, in the survey example, customers can be asked to provide feedback on a scale of 0 to 5, where 0 is worst service and 5 is best service; these ordinal values from the survey reflect areas of excellence and improvement, without absolute meaning assigned to each value (that is, each value is a label rather than a number that can be used in calculations).

Dimensions (such as names or group names or scale values) can be used with categorical data that contains qualitative values. Dimensions are typically leveraged to appropriately group/segment data as well as to understand details.
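The distinction between nominal and ordinal data affects which summaries make sense. The following Python sketch uses made-up sample values (they are not taken from Figure 3.3 or from any real survey): nominal categories are counted, while ordinal ratings are summarized by their middle rank instead of an arithmetic average.

```python
from collections import Counter
from statistics import median_low

# Nominal data: named categories with no inherent order (hypothetical sample).
sales_categories = ["SaaS", "PaaS", "database", "SaaS", "servers", "SaaS"]
category_counts = Counter(sales_categories)
most_common_category = category_counts.most_common(1)[0][0]

# Ordinal data: ranked labels on a 0-to-5 scale (hypothetical sample).
ratings = [5, 3, 4, 0, 5]
middle_rating = median_low(ratings)   # uses order only and returns an actual label

print(category_counts)
print(most_common_category, middle_rating)
```

Grouping by category in this way is the kind of segmentation that dimensions are typically used for.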

The next section gives an overview of discrete vs. continuous data.

Discrete vs. Continuous Data Types

The discrete data type is a numerical data type that includes numbers with fixed and specific values. Discrete data comprises values that cannot be subdivided any further and that are presented as a set of incremental values. Examples of discrete data are as follows:

The number of students in a class can only be an absolute whole value, such as 20, 25, 30, or 35, and can’t be 22.5, 30.5, or 35.1.
Standard shoe sizes can be 7, 7.5, 8, 8.5, 9, and so on but can’t be 7.3 or 7.7.
A software program can have 100 or 200 or more lines of code; however, it cannot have 100.85 or 200.90 lines of code.
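The examples above share one property: every valid value is a whole multiple of some fixed increment. A minimal Python sketch of that check follows; the function name is_valid_discrete and the choice of step sizes (1 for head counts and lines of code, 0.5 for shoe sizes) are assumptions made for illustration.

```python
from decimal import Decimal


def is_valid_discrete(value: str, step: str) -> bool:
    """Return True if value is a whole multiple of the fixed increment step."""
    return Decimal(value) % Decimal(step) == 0


print(is_valid_discrete("25", "1"))       # students in a class
print(is_valid_discrete("22.5", "1"))     # not a possible head count
print(is_valid_discrete("7.5", "0.5"))    # standard shoe size
print(is_valid_discrete("7.3", "0.5"))    # not a standard shoe size
```

Decimal strings are used instead of floats so that the remainder test is not disturbed by binary rounding.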

Figure 3.4 uses a scatter plot to illustrate discrete data regarding software lines of code.

As you have seen in earlier examples, discrete values are discontinuous and have definite boundaries. Continuous data, on the other hand, involves different data values that are estimated over a particular interval of time. Continuous data can include any values. These values can be abstract and represent divisible (fractional) values. For example:

The speed of wind measured over a week can be 30.4 km/hr, 50.8 km/hr, or 65 km/hr.
Daily temperature measured in degrees Celsius can be 22, 30, 28.2, 31.4, and so on.
Courier box dimensions can be 10×10×10 cm, 10×10.5×10.8 cm, or 10×10×20.5 cm.
The height of students in a university can be 4′10″, 5′10″, 5′11″, 6′7″, and so on.

Continuous data cannot be counted in absolute terms, but it can be measured over a period of time. Continuous data is best shown on a line graph or chart, which makes it possible to show how data values change in a given time frame. For example, Figure 3.5 shows a line graph that illustrates the relationship between age and height of children in a school. As you can see, this representation makes it much easier to interpret the data.

Cram Quiz

Answer these questions. If you cannot answer these questions correctly, consider reading this section again until you can.

1. What type of data does the following graph illustrate?

a. Continuous data

b. Discrete data

c. Noncontinuous data

d. Nominal data

2. The number of students in a class can only be an absolute whole value such as 20, 25, 30, or 35. What type of data is this?

a. Continuous data

b. Discrete data

c. Cumulative data

d. Nominal data

3. Categorical data can be classified into which two types of data? (Choose two.)

a. Exact

b. Approximate

c. Nominal

d. Ordinal

Cram Quiz Answers

1. Answer: a. Continuous data. This figure shows continuous data represented using a line chart.

2. Answer: b. Discrete data. The discrete data type is a numeric data type that includes numbers with fixed and specific values.

3. Answer: c. Nominal, d. Ordinal. Categorical data can be classified into nominal and ordinal data.

Types of Data: Audio, Video, and Images

CramSaver

If you can correctly answer these questions before going through this section, save time by completing the Cram Quiz at the end of the section.

1. Image, audio, and video signals are candidates for ______________.

a. data cleansing

b. data compression

c. data classification

d. All of these answers are correct.

2. Video data usually exists as what type of analog signals?

a. Discrete

b. Continuous

c. Logarithmic

d. None of these options are correct.

3. An image is explained in terms of which of the following?

a. Raster graphics

b. Vector graphics

c. Raster graphics and vector graphics

d. Raster graphics or vector graphics

4. Which of the following file formats is recommended for good-quality sound?

a. Lossless

b. Compressed

c. Lossy

d. All of these answers are correct.

5. Which of the following is an open source container format used for storing audio and video data?

a. MP3

b. OGG

c. WAV

d. All of these answers are correct.

Answers

1. Answer: b. Data compression. Video and audio signals can be compressed during transmission as well as during storage to save bandwidth.

2. Answer: b. Continuous. The video type of data usually exists as continuous analog signals, and video data is generally stored as a set of bits in computer memory or on a hard disk.

3. Answer: d. Raster graphics or vector graphics. An image is explained in terms of raster graphics or vector graphics, and it is referred to as a bitmap in raster form.

4. Answer: a. Lossless. The best format for good-quality sound is the lossless audio file format. Lossless files are regarded as high resolution.

5. Answer: b. OGG. OGG is an open source container format used to store audio and video data.

Audio, images, and video together are known as multimedia. There has been a great transformation in multimedia across the past few decades, from black-and-white TV programs to visually appealing standard definition (SD) graphics to modern ultra-high definition (UHD) and 4K graphics and videos. Let’s take a peek into the world of multimedia data, starting with audio.

Audio

The sounds that you hear using your ears as sensory organs occur in the form of analog signals. Early audio systems were analog; for example, conventional tape recorder and gramophone technologies captured sound waves and stored them in analog format on magnetic tapes and vinyl records. Audio data that is being recorded, read, retrieved, interpreted, or compressed has unique requirements.

Note

This section focuses specifically on digital media—that is, digital audio, digital video, and images. This is primarily because, when data is stored in computer systems or storage media, it is stored as digital data rather than as analog data.

Because computers are digital devices, it is necessary to convert analog sound data to a digitized format in order to store it on a computer. A digital recording system works by capturing audio waveforms at specific intervals (known as the sampling rate) and converting those samples (after quantization) to equivalent binary audio signals. Sampling is the process of observing and recording the values of a composite analog signal at regular intervals of time. Figure 3.6 illustrates this process.

As you can see, the digitized audio is a binary representation of the analog signal.
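The sampling and quantization steps can be sketched in a few lines of Python. This is a simplified illustration and not a real audio encoder: it assumes a pure sine wave as the analog input, and the function names and the tiny sampling rate and bit depth in the usage lines are chosen only to keep the output readable.

```python
import math


def sample_and_quantize(freq_hz, sample_rate_hz, n_samples, bits):
    """Sample a sine wave at regular intervals and map each sample to an integer level."""
    levels = 2 ** bits
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                            # time of the n-th sample
        amplitude = math.sin(2 * math.pi * freq_hz * t)   # analog value in [-1, 1]
        # Quantization: scale [-1, 1] onto the integer levels 0 .. levels - 1.
        samples.append(round((amplitude + 1) / 2 * (levels - 1)))
    return samples


def to_binary(samples, bits):
    """Render each quantized level as a fixed-width binary string."""
    return [format(level, f"0{bits}b") for level in samples]


# One cycle of a 1 Hz tone, sampled 8 times per second at 3 bits per sample.
quantized = sample_and_quantize(1.0, 8.0, 8, 3)
print(quantized)
print(to_binary(quantized, 3))
```

Raising the sampling rate or the number of bits makes the stored values follow the original waveform more closely, at the cost of more data.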

Note

There is a lot of theory involved in how sound is sampled and converted by leveraging amplitude (that is, the height of curves or crests and troughs), but the details are beyond the scope of this book. If you are keen to study more about this topic, you can start with the Nyquist theorem and encoding algorithms such as pulse code modulation (PCM), differential pulse code modulation (DPCM), and adaptive differential pulse code modulation (ADPCM).

Each captured waveform is converted to a binary integer value and is stored on computer storage media. The quality of an audio signal depends on how identical the sample is to the original sound. In other words, the higher the quality of the sample, the higher the quality of the digitized audio format.

The manner in which an audio signal is compressed and stored is called the codec (short for “coder-decoder”), and it determines the file size. For example, files with the .mp3 extension use the MPEG Layer 3 codec, and files with the .wav extension are typically encoded with PCM. Also, it is important to note that not all audio formats are lossless; some are lossy in nature, due to the compression and type of codec used. While lossy compression makes it possible to save audio files using a reduced amount of space, it can decrease the quality of the audio compared to the original file. On the other hand, an uncompressed format (such as WAV) preserves the original quality because it does not apply any compression algorithm, and a lossless compressed format (such as FLAC) reduces the file size without discarding any audio information.

For this discussion, we can divide audio file formats into two broad categories:

Open standard: As the name suggests, an open standard can be used by any vendor to store audio files. For example, Microsoft leverages the open standard WAV format for the Windows sound effects for startup, logon, and so on.
Proprietary: A proprietary format is created by an organization for its own use, and any other organization that leverages that format has to pay royalty rights or fees to the organization that developed it. For example, Windows Media Audio (WMA) format was created and licensed by Microsoft.

Note

This section looks at common open standard and proprietary file formats, but it does not provide exhaustive coverage of all file formats.

Now let’s look at the formats for audio files. The following standard audio formats are available for digitized media:

WAV: Waveform Audio File Format (WAV), the most common audio file format, is typically used for storing uncompressed and PCM-encoded sound files. WAV files tend to be much larger than other file formats; they may be as large as around 10 MB per minute of music at 16 bits and 44.1 kHz. The WAV format was developed by Microsoft and IBM in the early 1990s and continues to be used openly across systems today.

Figure 3.7 shows the various Microsoft Windows audio files that are in WAV format by default. You can find these files under C:\Windows\Media (though the drive letter and folder may be different, depending on where you installed Windows on your machine).

AIFF: Apple developed Audio Interchange File Format (AIFF) primarily for its Mac platform—much as Microsoft developed WAV format. AIFF is an open format with file extension .aiff and can be used across platforms.
MP3: Since the early 2000s, MPEG Layer 3 (MP3) has been a very popular file format for downloading and storing music.

Fun Fact

We have fond memories of one of the most revered pieces of MP3 software in the early 2000s: Nullsoft Winamp. It was perhaps the most popular program for playing MP3 files back in the day. It could be customized using skins, which was a pretty advanced concept at the time!

An MP3 file leverages MPEG Layer 3 for encoding. MP3 is a compressed file format and takes a fraction of the space required for uncompressed WAV files (which explains why it is a popular format for downloading and storing music). For example, the Windows Logon file is 375 KB in WAV format but only 65 KB in MP3 format (see Figure 3.8). That’s a huge savings in terms of space, and the reduction in quality is unnoticeable.

AAC: The Advanced Audio Coding (AAC) audio format is based on the MPEG-4 audio standard defined by AT&T Bell Labs, Dolby, Nokia, and Sony. AAC is an enhanced version of MP3 and is better in multiple aspects, such as support for enhanced compression with better quality and a wider range of sampling rates. Apple leverages the AAC format extensively and has implemented digital rights management (DRM) for music in the form of FairPlay.
FLAC: Free Lossless Audio Codec (FLAC) offers lossless compression, reducing the size of a music file to half the size of a WAV file—but with the same quality. If you compress a WAV (PCM) file to FLAC and then decompress it again, you end up with a file that is a perfect copy of the original.
GSM: This format, which is based on Graphic Description Language, is very closely related to Global System for Mobile Communications (GSM). This file format, which was created for Internet telephony in Europe, is used for recording mobile conversations. Files with the extension .gsm are encoded using constant bitrate (CBR) encoding and offer a compromise between sound quality and file size.
OGG: This open source container format is used to store audio and video data. The OGG file format leverages unpatented Ogg Vorbis audio compression. OGG audio files may have the extension .ogg or .oga.

Note

The OGG file format was developed by Xiph.Org. You can visit www.xiph.org to learn more about the projects and file formats created and supported by Xiph.Org.

OGG is often compared to MP3 in terms of quality and compression, and OGG Vorbis audio is generally considered to sound better than an MP3 file at the same bit rate. This is perhaps the reason that Spotify has chosen to use OGG for its streaming service.

RAW: As the name suggests, this file format is for raw uncompressed audio data. RAW audio files typically (though not always) contain audio in PCM encoding.
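
The size comparisons quoted in the list above (a 375 KB WAV file versus a 65 KB MP3 file, and FLAC at about half the size of WAV) can be expressed as a compression ratio and a percentage saved. The following is a minimal sketch of that arithmetic; the function name compression_summary is a hypothetical helper and does not come from any audio library.

```python
def compression_summary(original_kb, compressed_kb):
    """Return (compression ratio, percent of space saved) for two file sizes."""
    if original_kb <= 0 or compressed_kb <= 0:
        raise ValueError("file sizes must be positive")
    ratio = original_kb / compressed_kb
    percent_saved = (1 - compressed_kb / original_kb) * 100
    return ratio, percent_saved


# Sizes taken from the Windows Logon example in the text (WAV vs. MP3).
ratio, saved = compression_summary(375, 65)
print(f"MP3 is {ratio:.1f}x smaller and saves {saved:.1f}% of the space")

# A lossless codec that halves the file gives a ratio of 2 and saves 50%.
print(compression_summary(100, 50))
```

The ratio says nothing about quality: a lossy format such as MP3 reaches its ratio by discarding data, whereas a lossless format such as FLAC can always be decoded back to the original samples.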

Now let’s look at the most commonly used proprietary audio file formats:

WMA: Windows Media Audio (WMA) is a proprietary file format created by Microsoft. It is a compressed audio format, with the file extension .wma, and primarily works with Windows Media Player and Apple iTunes. WMA was designed with DRM incorporated for copy protection.

Fun Fact

WMA was commonly used with portable media players in the early 2000s, and it was difficult to get around the DRM protection. Users had to use a DRM-capable audio player to play protected WMA files.

The idea behind creating the WMA format was to compete with MP3 by offering better compression while keeping an equivalent level of sound quality at low bit rates. Both formats support bit rates of up to 320 Kbps, a setting that was commonly used with iPods and other MP3 players.

ATRAC: The Adaptive Transform Acoustic Coding (ATRAC) file format, which was developed by Sony primarily for Windows computers, leverages Sony’s SonicStage software. ATRAC was created as another attempt to compete against the popular MP3 format and lock in music to Sony’s ecosystem. Playing ATRAC format audio files required either a plug-in with a software application like Nullsoft Winamp or a Sony device such as PlayStation Portable (PSP) or Sony Walkman.
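
The 320 Kbps bit rate mentioned above translates directly into file size. The sketch below assumes constant bit rate encoding, treats 1 Kbps as 1,000 bits per second, and ignores headers and metadata; the function name cbr_size_bytes is hypothetical.

```python
def cbr_size_bytes(bitrate_kbps, duration_seconds):
    """Approximate size in bytes of a constant-bit-rate audio stream."""
    total_bits = bitrate_kbps * 1000 * duration_seconds
    return total_bits // 8  # 8 bits per byte


three_minutes = 3 * 60
for kbps in (128, 320):
    size_mb = cbr_size_bytes(kbps, three_minutes) / 1_000_000
    print(f"{kbps} Kbps for 3 minutes is about {size_mb:.2f} MB")
```

Under these assumptions a three-minute track needs about 7.2 MB at 320 Kbps and about 2.9 MB at 128 Kbps, which is why lower bit rates were attractive on early players with limited storage.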

The next section covers the topic of video data.

Video

You probably use video in everyday life, and while you consume it on mobile devices, via on-demand TV, on websites, in news articles, and in many other streams, you might not think about how video files are encoded and saved. Video data usually exists as continuous analog signals, and it must be stored digitally (that is, as sets of bits or in a binary form) in computer memory or on a hard disk.

Interestingly, the video we are used to seeing on big screens is typically recorded at 24 fps (frames per second), whereas modern displays refresh at 30, 60, or 120 Hz on a regular monitor and at 144 Hz or more on a gaming monitor. Many viewers can tell lower and higher frame rates apart. Frame rate describes the content and refresh rate describes the display, but the two line up directly: a display refreshing at 50 Hz can show up to 50 fps.
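
To make the frame rates above concrete, the following short sketch converts a frame rate into the time each frame stays on screen. The function name frame_time_ms is a hypothetical helper.

```python
def frame_time_ms(frames_per_second):
    """Return how long one frame is displayed, in milliseconds."""
    if frames_per_second <= 0:
        raise ValueError("frame rate must be positive")
    return 1000 / frames_per_second


for fps in (24, 30, 60, 120, 144):
    print(f"{fps:>3} fps -> {frame_time_ms(fps):.2f} ms per frame")
```

At 24 fps each frame is shown for roughly 42 ms, and at 144 fps for roughly 7 ms, which is the difference viewers perceive as smoother motion.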

Without getting into the intricacies of video displays, this section focuses on how video is stored in various formats. This section explores the most commonly used video file formats, which include the following:

AVI: Audio Video Interleave (AVI) is an extensively used video file format created by Microsoft in the early 1990s. AVI files, which have the extension .avi, tend to be larger than files in other video formats. Because AVI is commonly used with little or no compression, it is generally treated as a lossless video format and is widely used for recording, processing (editing), and storing videos. Strictly speaking, AVI is a container: AVI files can hold video encoded with different codecs, so the actual quality depends on the codec used. AVI files are natively supported on the Windows platform and are also playable on other platforms using video players like VLC by VideoLAN.
WMV: Windows Media Video (WMV) is a video file format that was created by Microsoft in the late 1990s. WMV files are usually stored in the Advanced Systems Format (ASF) container and are compressed heavily to produce small file sizes, at the cost of video quality. While this format can be played cross-platform by using VLC and other open source players, it is not very popular.
MPEG: MPEG (or MPG) is a video format that was developed by the Moving Picture Experts Group (MPEG) in the early 1990s. It is one of the most commonly used formats, and MPEG files have the file extension .mpg or .mpeg.

Note

Moving Picture Experts Group created a number of formats under the MPEG umbrella, including MPEG-1, MPEG-2, MPEG-3, MPEG-4, MPEG-7, and MPEG-21.

Both MPEG-1 and MPEG-2 support lossy compression of video and audio.

MP4: The MP4 format (formally MPEG-4 Part 14) was developed by the Moving Picture Experts Group in 2001. This format can contain audio, video, and subtitle data across multiple tracks.

Note

MP4 is based on the Apple MOV file type and is a container for audio and video content.

MP4 files are commonly used for streaming video via the Internet, for transferring video files over messaging apps, and for posting videos on social media. MP4 files offer good quality with a small storage footprint because of efficient compression.

MOV and QT: The MOV and QT (QuickTime) file formats developed by Apple are used specifically with Apple’s QuickTime player. MOV files, however, use MPEG-4 encoding and are compatible across Mac and Windows. QuickTime uses atoms and QT atoms for storage of the QT file type on the system. MOV and QT files are high quality but have a large storage size.
WebM: Google developed the WebM file format in 2010 specifically for online media exchange. WebM files, which have the .webm extension, are containers for audio and video content; video is encoded with codecs such as VP8 or VP9, and audio with codecs such as Vorbis (also used in OGG). WebM video files have a relatively small size, and they are not as high quality as MP4 files. You can find more details at https://www.webmproject.org/about/.

Fun Fact

The WebM video format is used for one of the largest video streaming services in the world: YouTube.

OGG: Yes, you’re reading this correctly: OGG format is not just for audio but works for video files as well. (This makes sense when you think about the fact that it is a container format rather than a type of encoding for a specific audio/video file.) OGG files are used mostly for streaming services and are high quality compared to WebM files. OGG video files have the .ogv extension; however, in HTML source code, the .ogg extension is used within the <video> tag.

Note

WebM, OGG, and MP4 are the three key file formats supported in HTML pages for video files (see https://www.w3schools.com/html/html5_video.asp).
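
As an illustration of the note above, the following sketch builds an HTML video element that offers the same clip in several formats, so a browser can pick the first one it supports. The MIME types are the standard ones for these containers; the function name video_tag and the file names are hypothetical.

```python
import os
from html import escape

# MIME types for the three HTML video formats named in the note.
VIDEO_MIME_TYPES = {
    ".mp4": "video/mp4",
    ".webm": "video/webm",
    ".ogg": "video/ogg",
    ".ogv": "video/ogg",
}


def video_tag(sources):
    """Return an HTML <video> element with one <source> per file."""
    lines = ["<video controls>"]
    for path in sources:
        extension = os.path.splitext(path)[1].lower()
        mime = VIDEO_MIME_TYPES.get(extension)
        if mime is None:
            raise ValueError(f"unsupported video extension: {path!r}")
        src = escape(path, quote=True)
        lines.append(f'  <source src="{src}" type="{mime}">')
    lines.append("  Your browser does not support the video tag.")
    lines.append("</video>")
    return "\n".join(lines)


print(video_tag(["movie.mp4", "movie.webm", "movie.ogg"]))
```

Listing more than one source is a common fallback pattern: the browser tries each source in order and plays the first format it can decode.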

The next section covers the topic of image data.

Images

When you browse the Internet or read a news article or a blog post, you typically come across multiple images. While images make pages interesting to look at (after all, a picture is worth a thousand words!), you aren’t likely to think about the details behind what makes an image bright or colorful but probably just consume the information as it is presented.

Graphics are broadly classified into two formats: raster and vector. Table 3.2 differentiates between them.

Table 3.2 Raster vs. Vector Graphics

Characteristic	Raster Graphics	Vector Graphics
Composition	Composed of pixels	Composed of paths based on mathematical calculations
Common uses	Commonly used across computer systems and the Internet for images	Uncommon outside of 3D animation, computer-aided design (CAD), and other engineering programs
Zoom quality	Becomes blurry (depending on pixels per inch [ppi]) when the image is zoomed; that is, does not scale optimally	Retains image quality when zoomed without any significant loss; that is, scalable
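File size	Size depends on pixel dimensions and compression; detailed images such as photographs are usually much smaller than a vector equivalent	Size depends on the number and complexity of paths; complex, detailed images tend to be much larger than equivalent raster files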
Conversion	Conversion from raster to vector is typically time-consuming	Conversion from vector to raster is relatively straightforward

For example, Figure 3.9 shows a comparison between raster and vector images, using the Pearson logo. The image on the left is a raster image (PNG file), and the one on the right is a vector image (SVG file). Both images are shown at their original size, without any zooming in, and they are visually similar.

Now, as you can see in Figure 3.10, the raster image starts losing clarity when scaled out (that is, zoomed in), whereas the vector image retains clarity and can be scaled without any issues. You can see that the pixels start appearing in the raster image on the left side, whereas there is no distortion in the vector image on the right side.
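
The difference described above can be imitated with a few lines of code. The sketch below is a simplified illustration, not how image viewers are actually implemented: a raster is modeled as a grid of 0/1 pixels that is enlarged by repeating pixels, while a "vector" circle is modeled as a formula that is redrawn from scratch at each size. Both function names are hypothetical.

```python
def upscale_nearest(pixels, factor):
    """Enlarge a raster by repeating every pixel factor x factor times."""
    enlarged = []
    for row in pixels:
        wide_row = [value for value in row for _ in range(factor)]
        enlarged.extend(list(wide_row) for _ in range(factor))
    return enlarged


def rasterize_circle(size):
    """Draw a filled circle from its mathematical description at any size."""
    center = (size - 1) / 2
    radius = size / 2
    return [
        [1 if (x - center) ** 2 + (y - center) ** 2 <= radius ** 2 else 0
         for x in range(size)]
        for y in range(size)
    ]


def show(pixels):
    for row in pixels:
        print("".join("#" if value else "." for value in row))
    print()


small = rasterize_circle(4)
show(upscale_nearest(small, 4))  # raster zoom: each pixel becomes a 4x4 block
show(rasterize_circle(16))       # vector zoom: the shape is recomputed
```

The enlarged raster shows visible square blocks because no new detail exists beyond the original pixels, whereas the recomputed circle has a smoother outline at the larger size.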

Common raster graphic formats include PNG, JPEG, and GIF. Common vector graphic formats include EPS, SVG, and PDF.

Next, we will look at raster images, and then we will look at vector images. The following list discusses the most common raster image formats:

BMP: Bitmap (BMP), or “map of bits,” is an older raster format that maps individual pixels with almost no compression, resulting in very large image files. A BMP file could be six to eight times larger than an equivalent JPEG file. BMP files are therefore not the best choice for online exchange as they take up a lot of precious storage space and introduce latency in file transfer. Figure 3.11 shows the same image files in BMP and JPEG formats for comparison.

JPEG: The Joint Photographic Experts Group (JPEG) helped create the JPEG (or JPG) standard. JPEG is a raster image file format with lossy compression that is suitable for sharing images. JPEGs are lossy in that they reduce file size—but at the cost of reduced image quality. JPEG is one of the image file types most commonly used on the Internet (such as in blogs and online articles).

Fun Fact

When you upload an image from a smartphone or PC to a social media platform like Facebook, the image is typically converted to JPEG file format automatically.

PNG: Portable Network Graphics (PNG) is a raster graphics format that is typically used to reproduce high-quality graphics while preserving details and the contrast between colors. PNG format supports lossless compression and is usually leveraged for screenshots and infographics. PNG files are larger than equivalent JPEG files.
GIF: Graphics Interchange Format (GIF) is a very well-known and commonly used file format for images and graphics on the Internet. Text messages that have moving graphics (animations) are typically GIF images. GIF, which is a raster format and uses lossless compression, constrains images to 8 bits per pixel and a limited palette of 256 colors. Hence, GIF provides a very basic image reproduction capability at a huge size reduction, which is required for the millions of images used on the Internet and in instant messaging.
WebP: Google developed this file format along the lines of WebM specifically for web graphics. It is a raster format that supports both lossless and lossy algorithms. WebP images are approximately 30% smaller than JPEG files of comparable quality.
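
Two of the figures in the list above follow from simple arithmetic. The sketch below estimates the pixel data size of an uncompressed BMP image and shows why 8 bits per pixel limits GIF to 256 colors. It assumes an uncompressed bitmap, ignores the file header, and pads each row to a multiple of 4 bytes as the BMP format does; the function names are hypothetical.

```python
def bmp_pixel_bytes(width, height, bits_per_pixel=24):
    """Approximate size of uncompressed BMP pixel data, in bytes."""
    # Each row of pixels is padded up to a multiple of 4 bytes (32 bits).
    row_bytes = ((width * bits_per_pixel + 31) // 32) * 4
    return row_bytes * height


def palette_size(bits_per_pixel):
    """Number of distinct colors that fit in the given bits per pixel."""
    return 2 ** bits_per_pixel


full_hd = bmp_pixel_bytes(1920, 1080)
print(f"1920x1080 at 24 bits per pixel: {full_hd / 1_000_000:.1f} MB")
print(f"GIF palette at 8 bits per pixel: {palette_size(8)} colors")
```

A single uncompressed full HD image already needs more than 6 MB, which illustrates why compressed formats such as JPEG, PNG, and WebP are preferred for online exchange.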

These are the most common vector image formats:

SVG: Scalable Vector Graphics (SVG) is a vector graphics file format that leverages XML text to outline shapes and lines using mathematical equations based on vectors (X, Y in 2D) to create graphics. SVG images can scale without any loss of quality. SVG is a great format for high-quality lossless graphics; however, it is not suitable for all graphics because complex, detailed images (such as photographs) produce very large files.

Fun Fact

SVG was developed by the World Wide Web Consortium (W3C) as a markup language to render 2D images.

EPS: Encapsulated PostScript (EPS) is a vector image file format used as a container for exchanging illustrations between applications such as CorelDRAW and Adobe Illustrator. EPS is used in text-based documents to outline shapes and lines with code (vectors), and it supports lossless scaling.
PDF: Portable Document Format (PDF) is a format that you have certainly come across for documents. However, you might not know that it leverages the same PostScript language as EPS. Just like EPS, it is lossless and can be used to store illustrations and graphics for later printing. PDF format offers much more than SVG and EPS as it provides searchable text fields. It is typically used for reporting and creating dashboards from analytics software.
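
Because SVG is XML text, as described in the SVG entry above, a vector image can be written and inspected like any other text file. The following minimal sketch generates a small SVG document at two display sizes and checks that it is well-formed XML by parsing it with Python's standard library. The circle, the color, and the function name circle_svg are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def circle_svg(size):
    """Return SVG markup for a circle displayed at size x size pixels."""
    return (
        f'<svg xmlns="{SVG_NS}" width="{size}" height="{size}" '
        'viewBox="0 0 100 100">'
        '<circle cx="50" cy="50" r="40" fill="teal"/>'
        "</svg>"
    )


small = circle_svg(100)
large = circle_svg(1000)

root = ET.fromstring(small)  # raises ParseError if the XML is malformed
print(root.tag)
print(len(small), len(large))
```

The shape is described once in viewBox coordinates, so displaying it ten times larger changes only the width and height attributes; the text barely grows, whereas a raster image at ten times the width and height would hold one hundred times as many pixels.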

Cram Quiz

Answer these questions. If you cannot answer these questions correctly, consider reading this section again until you can.

1. The PDF format is which of the following?

a. Lossy

b. Linearly lossy

c. Lossless

d. Log function lossy

2. Image, audio, and video signals are candidates for which of the following?

a. Data cleansing

b. Data compression

c. Data classification

d. All of these answers are correct.

3. Which of the following is a vector file format that acts as a storage container for text and graphics?

a. SVG

b. BMP

c. GIF

d. PDF

4. PCM is which of the following?

a. A codec

b. An image format

c. A video format

d. A graphics format

5. True or false: AVI is a lossless format.

a. True

b. False

Cram Quiz Answers

1. Answer: c. Lossless. Portable Document Format (PDF) is a format that you have certainly come across for documents. However, you might not know that it leverages the same PostScript language as EPS. Just like EPS, it is lossless and can be used to store illustrations and graphics for later printing.

2. Answer: b. Data compression. Image, audio, and video signals are candidates for data compression, which involves modifying data to reduce file size.

3. Answer: d. PDF. PDF is a vector file format that offers much more than SVG and EPS as it provides searchable text fields. It is typically used for reporting and creating dashboards from analytics software.

4. Answer: a. A codec. Pulse Code Modulation (PCM) is a codec that is used to convert audio from analog format to digital format.

5. Answer: a. True. Audio Video Interleave (AVI) is an extensively used video file format created by Microsoft in the early 1990s. It is a lossless video format and is used widely for recording, processing (editing), and storing videos.

What Next?

If you want more practice on this chapter’s exam objective before you move on, remember that you can access all of the Cram Quiz questions on the Pearson Test Prep software online. You can also create a custom exam by objective with the Online Practice Test. Note any objective you struggle with and go to that objective’s material in this chapter.
