Chapter 7 Retrieving, Restricting, and Sorting Data Using SQL

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Images

CHAPTER 7 Retrieving, Restricting, and Sorting Data Using SQL

Exam Objectives

In this chapter, you will learn to

• 061.1.1 Explain the Capabilities of SQL SELECT Statements

• 061.1.2 Execute a Basic SELECT Statement

• 061.2.1 Limit the Rows Retrieved by a Query

• 061.2.2 Sort the Rows Retrieved by a Query

• 061.2.3 Use Ampersand Substitution to Restrict and Sort Output at Runtime

This chapter discusses the relational paradigm, normalizing data into relational structures, and retrieving data stored in relational tables using the SELECT statement. The statement is introduced in its basic form and is progressively built on to extend its core functionality. This chapter also discusses the WHERE clause, which specifies one or more conditions that the Oracle server evaluates to restrict the rows returned by the statement. The chapter examines the ORDER BY clause, which provides data-sorting capabilities, and the new SQL row-limiting clause; the chapter closes by discussing ampersand substitution, which is a mechanism that provides a way to reuse the same statement to execute different queries by substituting query elements at run-time.

Explain the Capabilities of SQL SELECT Statements

Knowing how to retrieve data in a set format using a query language is the first step toward understanding the capabilities of SELECT statements. Describing the relations involved provides a tangible link between the theory of how data is stored in tables and the practical visualization of the structure of these tables. These topics form an important precursor to the discussion of the capabilities of the SELECT statement. The three primary areas explored are as follows:

• Introducing the SQL SELECT statement

• Introducing the DESCRIBE table command

• Exploring the capabilities of the SELECT statement

Introducing the SQL SELECT Statement

The SELECT statement from the Structured Query Language (SQL) has to be the single most powerful nonspoken language construct. It is an elegant, flexible, and highly extensible mechanism created to retrieve information from a database table. A database would serve little purpose if it could not be queried to answer all sorts of interesting questions. For example, you may have a database that contains personal financial records such as your bank statements, your utility bills, and your salary statements. You could easily ask the database for a date-ordered list of your electrical utility bills for the last six months or query your bank statement for a list of payments made to a certain account over the same period. The beauty of the SELECT statement is encapsulated in its simple, English-like format that allows questions to be asked of the database in a natural manner.

The DESCRIBE Table Command

To get the answers you seek, you must ask the correct questions. An understanding of the terms of reference, which in this case are relational tables, is essential for the formulation of the correct questions. A structural description of a table is useful to establish what questions can be asked of it. The Oracle server stores information about all tables in a special set of relational tables called the data dictionary in order to manage them. The data dictionary is quite similar to a regular language dictionary. It stores definitions of database objects in a centralized, ordered, and structured format. The data dictionary was described in Chapter 2.

A clear distinction must be drawn between storing the definition and the contents of a table. The definition of a table includes information such as the table name, table owner, details about the columns that compose the table, and its physical storage size on disk. This information is also referred to as metadata. The contents of a table are stored in rows and are referred to as data.

You can obtain the structural metadata of a table by querying the database for the list of columns that compose it using the DESCRIBE command. The general form of the syntax for this command is as follows:

This command will be systematically unpacked. The DESCRIBE keyword can be shortened to DESC. All tables belong to a schema or owner. If you are describing a table that belongs to the schema to which you have connected, you can omit the <SCHEMA> portion of the command. Figure 7-1 shows how the EMPLOYEES table is described from SQL*Plus after connecting to the database as the HR user with the DESCRIBE EMPLOYEES command and how the DEPARTMENTS table is described using the shorthand notation: DESC HR.DEPARTMENTS. The HR. notational prefix could be omitted since the DEPARTMENTS table belongs to the HR schema. The HR schema (and every other schema) has access to a special table called DUAL, which belongs to the SYS schema. This table can be structurally described with the command describe sys.dual.

Images

Figure 7-1 Describing the EMPLOYEES, DEPARTMENTS, and DUAL tables

Describing tables yields interesting and useful results. You know which columns of a table can be selected since their names are exposed. You also know the nature of the data contained in these columns since the column data type is exposed. Chapter 5 detailed column types.

Mandatory columns, which are forced to store data for each row, are exposed by the Null? column output produced by the DESCRIBE command having the value NOT NULL. You are guaranteed that any column restricted by the NOT NULL constraint contains some data. It is important to note that NULL has special meaning for the Oracle server. NULL refers to an absence of data. Blank spaces do not count as NULL since they are present in the row and have some length even though they are not visible.

Capabilities of the SELECT Statement

Relational database tables are built on a mathematical foundation called relational theory. In this theory, relations or tables are operated on by a formal language called relational algebra. Relational algebra uses some specialized terms: Relations store tuples, which have attributes. Or in Oracle-speak, tables store rows, which have columns. SQL is a commercial interpretation of the relational algebra constructs. Three concepts from relational theory encompass the capabilities of the SELECT statement: projection, selection, and joining.

Projection refers to the restriction of columns selected from a table. When requesting information from a table, you can ask to view all the columns. You can retrieve all data from the HR.DEPARTMENTS table with a simple SELECT statement. This query will return DEPARTMENT_ID, DEPARTMENT_NAME, MANAGER_ID, and LOCATION_ID information for every department record stored in the table. What if you wanted a list containing only the DEPARTMENT_NAME and MANAGER_ID columns? Well, you would request just those two columns from the table. This restriction of columns is called projection.

Selection refers to the restriction of the rows selected from a table. It is often not desirable to retrieve every row from a table. Tables may contain many rows, and instead of requesting all of them, selection provides a means to restrict the rows returned. Perhaps you have been asked to identify only the employees who belong to department 30. With selection it is possible to limit the results set to those rows of data with a DEPARTMENT_ID value of 30.

Joining, as a relational concept, refers to the interaction of tables with each other in a query. Third normal form presents the notion of separating different types of data into autonomous tables to avoid duplication and maintenance anomalies and to associate related data using primary and foreign key relationships. These relationships provide the mechanism to join tables with each other (discussed in Chapter 10).

Assume there is a need to retrieve the e-mail addresses for employees who work in the Sales department. The EMAIL column belongs to the EMPLOYEES table, while the DEPARTMENT_NAME column belongs to the DEPARTMENTS table. Projection and selection from the DEPARTMENTS table may be used to obtain the DEPARTMENT_ID value that corresponds to the Sales department. The matching rows in the EMPLOYEES table may be joined to the DEPARTMENTS table based on this common DEPARTMENT_ID value. The EMAIL column may then be projected from this set of results.

The SQL SELECT statement is mathematically governed by these three tenets. An unlimited combination of projections, selections, and joins provides the language to extract the relational data required.

Data Normalization

This section introduces several real-world data organization scenarios to discuss the relational paradigm and introduce some practical modeling techniques. Critical to an understanding of SQL is an understanding of the relational paradigm and the ability to normalize data into relational structures. Normalization is the work of systems analysts because they model business data into a form suitable for storing in relational tables. It is a science that can be studied for years, and there are many schools of thought that have developed their own methods and notations.

Real-World Scenarios

This guide uses several hypothetical scenarios, including the two canned scenarios called HR and OE provided by Oracle and frequently used as the context for exam questions to illustrate various SQL concepts. The following scenarios evolve further as new concepts are discussed.

Car Dealership Sid runs a car dealership and needs a system to keep track of the cars that she buys and sells. She has noticed business taking a dive and wants to move into the 21st century and create a web site that advertises available stock. She needs a system to keep records of the cars she has bought and sold and the details of these transactions.

Geological Cores Core samples of the earth have been collected by your local geological survey agency. To ensure scientific rigor, the developers at GeoCore have determined that the system must track the exact geographical location, the elemental content of the core samples, and the dates of collection.

Order Entry The order entry (OE) scenario provided as a sample by Oracle contains information for a fictitious commercial system that tracks the products, the customers, and the sales orders that have been placed.

Human Resources The human resources (HR) scenario provided as a sample by Oracle records employees, departments, office locations, and job-related information for a typical HR department.

Although the hypothetical scenarios described vary in complexity, they share several characteristics, including a potential data growth that may eventually overwhelm a paper-based or spreadsheet-based data organization solution, as well as a requirement for data to be manipulated (inserted, updated, and deleted) and retrieved in an efficient manner. The challenge of producing an efficient data organization design (also known as a data model) may be overcome with both an understanding of how the data being organized is likely to be utilized and a few basic data modeling techniques. The goal is to achieve an optimal balance between data storage and data access, which will provide long-term downstream cost-saving benefits.

Data Modeling

Various formalized data modeling approaches are available, such as the Zachman framework and the Rational Unified Process, that ultimately seek to provide a systematic, standards-based approach to representing objects in an enterprise. There are a multitude of notations available to model entities and their relationships. A popular notation adopted by Oracle in its computer-aided software engineering (CASE) tools and more recently in SQL Developer is the crow’s foot notation, which will be discussed in this chapter. Other notations, such as Relational Schema notation and Universal Markup Language (UML), are also popular, but you must choose a notation that is comfortable and sensible for you.

Logical modeling is based on conceptualizing objects of interest as entities and their interactions with each other as relationships. There are many approaches to entity-relationship diagrams, each with their benefits and limitations. A brief discussion of entity-relationship diagrams and their notation follows.

Entities and Relations

Many Oracle professionals have adopted a framework that consists of three modeling stages for relational database modeling. A logical model is conceived when high-level constructs called entities, comprising various attributes and their relationships, are typically represented together in a diagram. Entities in logical models are usually depicted as rectangles with rounded corners, which comprise attributes or identifiers, sometimes denoted by an “o” symbol. Attributes that uniquely identify an instance of an entity are designated as primary keys and are sometimes denoted by the “#*” characters. Data typing the attributes may be done at this stage, but it is generally not reflected in the design.

The logical model is then turned into a relational model by translating the entities into relations, commonly referred to as tables. The idea here is that sets of instances of the entities are collectively modeled as a table. The attributes are turned into table columns. Each instance of an entity is reflected as a tuple or row of data, each having values for its different attributes or columns. The number of “rows in the table” is the “cardinality of the tuples.” Usually the attributes that are unique for each row are called unique keys, and typically a unique key is chosen to be the primary key (which is discussed later). The relationships between the entities are often modeled as foreign keys, which will also be explored in this chapter.

Relations in relational models are usually depicted as rectangles. At this stage there is typically more detail in terms of data typing for the attributes, and primary and foreign key attributes are also reflected with a P and an F, respectively, in the relational model. Finally, the relational model is engineered into a physical model by implementing the design in a relational database.

Crow’s foot notation is often used to depict relationships in logical and relational data models. The relationships between the entities can be one of the following and will be explored in the context of the car dealership scenario:

• 1:N One-to-many

• N:1 Many-to-one

• 1:1 One-to-one

• M:N Many-to-many

Consider the scenario of Sid’s car dealership introduced earlier. You could model the likely data as an entity consisting of the following car-related attributes: Make, Model, Engine Capacity, and Color. Information regarding the buying and selling of cars is also required, so you could add Purchase Date, Sold Date, Sellers Name, Sellers SSN (for Social Security number), Sellers Company, the same details for the buyer, and finally Purchase Price and Selling Price, as in Figure 7-2.

Images

Figure 7-2 A single car dealership entity

Sample transactional data stored in a table based on this entity may look like Figure 7-3, which shows three rows of data in a table comprising 14 columns called CAR_DEALERSHIP. The commands to create tables and populate them with data will be discussed later in this book. For now, there are several more fundamental important things to notice. Tables store data in rows, also called records. Each data element is found at the intersection of a row, and a column, also called a cell. It is fairly intuitive and much like a spreadsheet.

Images

Figure 7-3 Sample data in the CAR_DEALERSHIP table

The first two records in the CAR_DEALERSHIP table include the following information:

• A silver Mercedes A160 with a 1600cc engine capacity that belonged to Coda, a private seller with SSN 12345, was bought by Sid with SSN 12346 from Sid’s Cars, for $10,000 on June 1, 2013.

• A silver Mercedes A160 with a 1600cc engine capacity that belonged to Sid, with SSN 12346, from Sid’s Cars, was bought by Wags, with SSN 12347, from Wags Auto, for $12,000 on August 1, 2013.

Notice the repetition of data. Each record contains duplicate information for the cars being bought or sold and for the customer doing the buying or selling. Unnecessary duplication of data usually indicates poor design since it is wasteful and often requires needless maintenance. If this maintenance is not carefully done, this design allows errors (sometimes referred to as insert update and deletion anomalies) to creep in and reduces the overall integrity of the data.

Database normalization refers to modeling data using multiple entities with relationships between them, which may reduce or entirely eliminate data redundancy. There are many types of normal forms that have been defined theoretically, but relational database design primarily focuses on the following three:

• First normal form (1NF) deals with the issue of eliminating unnecessary repeating groups of data. An example of a repeating group in Figure 7-3 would be the first four columns on the first two rows where descriptive information about the car is repeated. You could define a new Cars entity that uniquely identifies a specific car using the Car ID primary key attribute as well as the Make, Model, Engine Capacity, and Color attributes. The Car ID identifier is then used in the related Transactions entity to avoid repeating groups of data.

• Second normal form removes attributes from the entity (1NF) that are not dependent on the primary key. In the proposed Cars entity described earlier, the Color attribute is not dependent on a specific car. You could define a new Colors entity that uniquely identifies a specific color using the Color ID primary key attribute. The Color ID can then be referenced by the Cars entity.

• Third normal form removes all interdependent attributes from a 2NF entity. The buyers and sellers of cars each have a uniquely identifying Social Security number (SSN). Their names, however, are interdependent on the SSN attribute. You could define a new Customers entity that uniquely identifies a customer using the Customer ID primary key attribute where interdependent information such as the customer’s name and company are stored.

Note that in the context of performance tuning it is intentional and acceptable to duplicate data in entities. When data is normalized across multiple entities instantiated as multiple tables that must be joined together, the Oracle server processes need to physically fetch data from multiple tables and join them in memory buffers to produce the required results set. The extra input/output (IO) required to query or manipulate normalized data sometimes justifies denormalizing data models to reduce disk IO operations and hence increase performance. This is common in data warehouses (DWHs) and decision support systems (DSSs), but is an exception rather than the rule in online transaction processing (OLTP) systems.

Consider the logical data model in Figure 7-4. The car-related data has been modeled as the Cars entity. The customer (buyers and sellers) information is essentially the same, so customers have been modeled as the Customers entity with the Customer Type attribute to differentiate between Purchasers and Sellers. The sales and purchases are recorded in the Transactions entity, while a lookup entity called Colors keeps track of different colors.

Images

Figure 7-4 The car dealership entity-relationship diagram

There are several advantages to conceptualizing this design as four interrelated entities. First, the data has been normalized, and there is no duplication of data. A practical benefit of multiple entities, each tracking a single construct such as Cars, Customers, Colors, and even Transactions, is the ease of data maintenance. New colors can be added, each with a unique code, and as new cars are purchased, these colors, defined and maintained in one place, can be used to describe multiple cars with the same color. You could improve the sophistication of this model by defining entities for tires, security systems, tracking devices, or audiovisual add-ons. You could equally enhance the details collected for each car, such as the vehicle identification numbers (VINs) and engine numbers, or for each customer, such as address and bank details, but this hypothetical scenario serves to illustrate several concepts and obviously cannot be used in a production application scenario without further enhancements.

Primary Keys

Each entity in Figure 7-4 has a primary key attribute that uniquely identifies a tuple or row of data denoted by “#*” adjacent to the attribute name. Each value of the Car ID primary key is unique in the entity. Multiple rows cannot share the same primary key value. Similarly, Color ID uniquely identifies each row in the Colors entity, as do Customer ID and Transaction ID in the Customers and Transactions entities, respectively.

Relationships

The lines in Figure 7-4 linking the various entities are known as relations. The crow’s foot notation expresses the cardinality of the relationships between the entities—one-to-one, one-to-many, many-to-one, and many-to-many. The crow’s foot notation explicitly illustrates the entity with the many side of the relationship with multiple “feet,” while the entity on the one side has one foot. Attributes in a one-to-one relationship are identical, while many-to-many relationships indicate that multiple tuples in entity A have the same attribute values as many tuples in entity B. Both one-to-one and many-to-many relationships are not common and sometimes point to flaws in the relational model. One-to-many and many-to-one relationships occur frequently when modeling relational entities. They relate attributes in two entities in a master-detail relationship. From the point of view of the relationship between the Cars and Colors entities (the order is significant), for example, many records in the Cars entity will be one Color. Many cars could have the same single Color ID attribute, indicating they are the same color. The Colors entity is the master or lookup entity, while the Cars entity is the detail entity in this relationship. From the point of view of the relationship between Colors and Cars, one Color can be associated with many Cars. So, it is just a matter of perspective whether a relationship is one-to-many or many-to-one; it all depends on which direction of the relationship you consider. The other relationships indicated by the crow’s feet show that a single car can be bought and sold multiple times, which is the reason for the one-to-many relationship between the Cars and Transactions entities and that one customer can perform many transactions (such as buying and selling many cars).

Referential Integrity and Foreign Keys

These relationships introduce the concept of referential integrity, which ensures data consistency and integrity by guaranteeing that an attribute (say attribute A) belonging to the entity on the one side of the relationship must be unique, while the attribute (say attribute B) on the entity on the many side must have a value that is in the set of unique values described by attribute A. Attribute B is called a foreign key since it has a referential dependency on attribute A. Consider the Colors-Cars relationship based on the Color ID attribute. Referential integrity ensures that the Color ID attribute in each tuple in the Cars entity must have a value that is identical to exactly one instance of the Color ID attribute in the Colors entity. This guarantee is central to relational modeling since the joining of the Cars and Colors entities on the Color ID attribute allows the Colors. Color (this is dot notation) attribute to be matched with a related tuple in the Cars entity. The Color ID attribute in the Cars entity is the foreign key that is related to the unique key, which is the Color ID attribute in the Colors entity, where it also happens to be the primary key. It is common that foreign keys in an entity are based on primary keys in a related entity, but this is not the rule.

The logical model in Figure 7-4 would typically evolve into a relational model with more data typing details and clearer primary and foreign keys, as shown in Figure 7-5.

Images

Figure 7-5 Relational model of the car dealership

The relational model could be engineered into a physical model where actual tables and other database constructs (discussed later in this chapter) are created. The sample data in Figure 7-3 transferred into the physical model built from the relational model described earlier would produce four datasets, as in Figure 7-6.

Images

Figure 7-6 Sample data using the car dealership relational model

The first two rows of data in the Transactions dataset can be interpreted as follows:

• A transaction with TX ID 100 describes the purchase of a car with Car ID 1 by Sid’s dealership from a customer with Customer ID 2 for $10,000 on June 1, 2013. You look up Customer ID 2 and see it was a sale from Coda, a private seller with SSN 12345. You look up Car ID 1 and determine that it was a 2001-A160 Mercedes with Color ID 1, which you further resolve to be Silver.

• A transaction with TX ID 101 describes the sale of Car ID 1 to Customer ID 4, which you resolve to be a dealer called Wags from Wags Auto with SSN 12347, for $12,000 on August 1, 2013.

Based on the earlier descriptions offered in a single entity design, nothing has been lost by organizing the sample data across the four-entity design. However, much has been gained. There is no duplication of data. There is a clarity and elegance that will facilitate ease of maintenance of this data as new cars are bought and sold and as new customers transact with Sid’s dealership.

Rows and Tables

The relational paradigm models data as two-dimensional tables. A table consists of a number of rows, each consisting of a set of columns. Within a table, all the rows have the same column structure, though it is possible that in some rows some columns may have nothing in them. An example of a table would be a list of one’s employees, each employee being represented by one row. The columns might be an employee number, a name, and a code for the department in which the employee works. Any employees not currently assigned to a department would have that column blank. Another table could represent the departments, with one row per department and with columns for the department’s code and the department’s name.

Relational tables conform to certain rules that constrain and define the data. At the column level, each column must be of a certain data type, such as numeric, date-time, or character. The character data type is the most general, in that it can accept any type of data. At the row level, usually each row must have some uniquely identifying characteristic. This could be the value of one column, such as the employee number and department number in the preceding examples, which cannot be repeated in different rows. There may also be rules that define links between the tables, such as a rule that every employee must be assigned a department code that can be matched to a row in the departments table. Tables 7-1 through 7-4 are examples of the tabulated data definitions (a subset of data and structures from the sample schema known as SCOTT provided by Oracle).

Looking at the layout of the DEPT and EMP tables in Tables 7-1 and 7-2, the two-dimensional structure is clear. Each row is of fixed length, each column is of fixed length (padded with spaces when necessary), and the rows are delimited with a new line. Table 7-3 shows the rows in the DEPT table stored in DEPTNO order, but this is a matter of chance, not design: Relational tables do not impose any particular ordering on their rows. Table 7-4 shows that department number 10 has one employee, and department number 40 has none. Changes to data are usually very efficient with the relational model. New employees can be appended to the employees table, or they can be moved from one department to another simply by changing the DEPTNO value in their row.

Images

Table 7-1 The DEPT Table

Images

Table 7-2 The EMP Table

Images

Table 7-3 Row Data from the DEPT Table

Images

Table 7-4 Row Data from the EMP Table

Consider an alternative structure, where the data is stored according to the hierarchical paradigm. The hierarchical model was developed before the relational model, for technology reasons. In the early days of computing, storage devices lacked the capability for maintaining the many separate files that were needed for the many relational tables. Note that this problem is avoided in the Oracle database by abstracting the physical storage (files) from the logical storage (tables); there is no direct connection between tables and files and certainly not a one-to-one mapping. In effect, many tables can be stored in a few files.

A hierarchical structure stores all related data in one unit. For example, the record for a department would include all that department’s employees. The hierarchical paradigm can be very fast and very space efficient. One file access may be all that is needed to retrieve all the data needed to satisfy a query. The employees and departments listed previously could be stored hierarchically as follows:

Images

In this example layout, the rows and columns are of variable length. Columns are delimited with a comma, and rows are delimited with a new line. Data retrieval is typically very efficient if the query can navigate the hierarchy. If you know an employee’s department, you can find the employee quickly. If you don’t, the retrieval may be slow. Changes to data can be a problem if the change necessitates movement. For example, to move employee 7566, JONES from RESEARCH to SALES would involve considerable effort on the part of the database because the move has to be implemented as a removal from one line and an insertion into another. Note that in this example while it is possible to have a department with no employees (the OPERATIONS department), it is absolutely impossible to have an employee without a department. There is nowhere to put him or her. This is excellent if there is a business rule stating that all employees must be in a department, but not so good if that is not the case.

The relational paradigm is highly efficient in many respects for many types of data, but it is not appropriate for all applications. As a general rule, a relational analysis should be the first approach taken when modeling a system. Only if it proves inappropriate should you resort to nonrelational structures. Applications where the relational model has proven highly effective include virtually all OLTP and DSS systems. The relational paradigm can be demanding in its hardware requirements and in the skill needed to develop applications around it, but if the data fits, it has proved to be the most versatile model. There can be, for example, problems caused by the need to maintain the indexes that maintain the links between tables and the space requirements of maintaining multiple copies of the indexed data in the indexes themselves and in the tables in which the columns reside. Nonetheless, relational design is, in most circumstances, the optimal model.

A number of software publishers have produced database management systems that conform (with varying degrees of accuracy) to the relational paradigm; Oracle is only one. IBM was perhaps the first company to commit major resources to it, but its product (which later developed into DB2) was not ported to non-IBM platforms for many years. Microsoft SQL Server is another relational database that has been limited by the platforms on which it runs. Oracle databases, by contrast, have always been ported to every major platform from the first release. It may be this that gave Oracle the edge in the relational database management system (RDBMS) marketplace.

A note on terminology: Confusion can arise when discussing relational databases with people used to working with Microsoft products. SQL is a language, and SQL Server is a database, but in the Microsoft world, the term SQL is often used to refer to either.

Create the Demonstration Schemas

Throughout this book, there are examples of SQL code. For the most part, the examples use tables in two demonstration schemas provided by Oracle: the HR schema, which is sample data that simulates a simple human resources application, and the OE schema, which simulates a more complicated order entry application.

These schemas can be created when the database is created; it is an option presented by the Database Configuration Assistant (DBCA). If they do not exist, they can be created later by running some scripts that will exist in the database Oracle Home.

Users and Schemas

In Oracle parlance, a database user is a person who can log on to the database. A database schema is all the objects in the database owned by one user. The two terms can often be used interchangeably because there is a one-to-one relationship between users and schemas. Note that while there is, in fact, a CREATE SCHEMA command, this does not actually create a schema—it is only a quick way of creating objects in a schema. A schema is initially created empty when a user is created with the CREATE USER command.

Schemas are used for storing objects. These may be data objects such as tables or programmatic objects such as PL/SQL stored procedures. User logons are used to connect to the database and access these objects. By default, users have access to the objects in their own schema and to no others, but most applications change this. Typically, one schema may be used for storing data that is accessed by other users who have been given permission to use the objects, even though they do not own them. In practice, few users will ever have objects in their own schema or permission to create them. They will have access rights (which will be strictly controlled) only to objects in another schema. These objects will be used by all users who run the application whose data that schema stores. Conversely, the users who own the data-storing schemas may never, in fact, log on; the only purpose of their schemas is to contain data used by others.

It is impossible for a data object to exist independently of a schema. In other words, all tables must have an owner. The owner is the user in whose schema the table resides. The unique identifier for a table (or any other schema object) is the username, followed by the object name. It follows that it is not possible for two tables with the same name to exist in the same schema but that two tables with the same name (though possibly different structures or contents) can exist in different schemas. If an object does not exist in one’s own schema, to access it a user must qualify its name with the name of the schema in which it resides. For example, HR.EMPLOYEES is the table called EMPLOYEES in user HR’s schema. Unless synonyms are available, only a user connected as HR could get to the table by referring to EMPLOYEES without a schema name qualifier. A synonym is a construct that makes an object accessible to other users without requiring its schema name as a prefix.

The HR and OE Schemas

The HR demonstration schema consists of seven tables, linked by primary key–to–foreign key relationships. Figure 7-7 illustrates the relationships between the tables as an entity-relationship diagram.

Images

Figure 7-7 The HR entity-relationship diagram

Two of the relationships shown in Figure 7-7 may not be immediately comprehensible. First, there is a many-to-one relationship from EMPLOYEES to EMPLOYEES. This is what is known as a self-referencing foreign key. This means that many employees can be connected to one employee, and it’s based on the fact that many employees may have one manager, but the manager is also an employee. The relationship is implemented by the column manager_id being a foreign key to employee_id, which is the table’s primary key.

The second relationship that may require explanation is between DEPARTMENTS and EMPLOYEES, which is bidirectional. The one-department-to-many-employees relationship simply states that there may be many staff members in each department, based on the EMPLOYEES dept_id column being a foreign key to the DEPARTMENTS primary key dept_id column. The one-employee-to-many-departments relationship shows that one employee could be the manager of several departments and is implemented by the manager_id column in DEPARTMENTS being a foreign key to the primary key employee_id column in EMPLOYEES.

Table 7-5 shows the columns of each table in the HR schema, using the notation described in the earlier section “Data Normalization” to indicate primary keys (#), foreign keys (), and whether columns are optional (o) or mandatory (*).

Images

Table 7-5 The Tables and Columns on the HR Schema

The tables are as follows:

• REGIONS has rows for major geographical areas.

• COUNTRIES has rows for each country, which are optionally assigned to a region.

• LOCATIONS includes individual addresses, which are optionally assigned to a country.

• DEPARTMENTS has a row for each department, optionally assigned to a location and optionally with a manager (who must exist as an employee).

• EMPLOYEES has a row for every employee, each of whom must be assigned to a job and optionally to a department and to a manager. The managers must themselves be employees.

• JOBS lists all possible jobs in the organization. It is possible for many employees to have the same job.

• JOB_HISTORY lists previous jobs held by employees, uniquely identified by employee_id and start_date; it is not possible for an employee to hold two jobs concurrently. Each job history record will refer to one employee, who will have had one job at that time and may have been a member of one department.

This HR schema is used for most of the exercises and many of the examples embedded in the chapters of this book and does need to be available.

The OE schema is considerably more complex than the HR schema. The table structures are much more complicated; they include columns defined as nested tables, user-defined data types, and Extensible Markup Language (XML) data types. The objects referred to are described as they are used.

Demonstration Schema Creation

If the database you are using was created specifically for studying for the SQL examination, the demonstration schemas should have been created already. They are an option presented by the Database Configuration Assistant when it creates a database. After database creation, the schemas may have to be unlocked and their passwords set; by default, the accounts are locked, which means you cannot log on to them. These commands, which could be issued from SQL*Plus or SQL Developer, will make it possible to log on as users HR and OE using the passwords hr and oe:

These alter user commands can be issued only when connected to the database as a user with database administrator (DBA) privileges, such as the user SYSTEM.

If the schemas were not created at database creation time, they can be created later by running scripts installed into the Oracle Home of the database. If these scripts are not present, you may download and install the Oracle Database Examples software from Oracle. These scripts will need to be run from SQL*Plus or SQL Developer as a user with SYSDBA privileges. The script will prompt for certain values as it runs. For example, on Linux, first launch SQL*Plus from an operating system prompt.

There are various options for this connection, but the preceding syntax will usually work if the database is running on the same machine where you are running SQL*Plus. Then invoke the script from the SQL> prompt.

The ? character is a variable that SQL*Plus will expand into the path to the Oracle Home directory. The script will prompt for HR’s password, default tablespace, and temporary tablespace; the SYS password; and a destination for a log file of the script that is running. Typical values for the default tablespace and temporary tablespace are USERS and TEMP, but these will need to have been created already. After completion, you will be connected to the database as the new HR user. To verify this, run these statements:

You will see that you are currently connected as HR; then run the following:

You will see a list of the seven tables in the HR schema.

To create the OE schema, follow the same process, nominating the script.

The process for creating the schemas on Windows is identical, except for the path delimiters—where most operating systems use forward slashes, Windows uses backslashes. So, the path to the Windows HR creation script becomes the following:

Note that running these schema creation scripts will drop the schemas first if they already exist. Dropping a schema means removing every item in it and then removing the user. This should not be a problem, unless the schema has been used for some development work that needs to be kept.

Execute a Basic SELECT Statement

The practical capabilities of the SELECT statement are realized in its execution. The key to executing any query language statement is a thorough understanding of its syntax and the rules governing its usage. You will learn more about this topic first, then about the execution of a basic query, and finally about expressions and operators, which exponentially increase the utility of data stored in relational tables. Next, the concept of a null value is demystified, as its pitfalls are exposed. These topics are covered in the following four sections:

• The syntax of the primitive SELECT statement

• The rules meant to be followed

• SQL expressions and operators

• NULL, meaning nothing

Syntax of the Primitive SELECT Statement

In its most primitive form, the SELECT statement supports the projection of columns and the creation of arithmetic, character, and date expressions. It also facilitates the elimination of duplicate values from the results set. The basic SELECT statement syntax is as follows:

The special keywords or reserved words of the SELECT statement syntax appear in uppercase. When using the commands, however, the case of the reserved words in your query statement does not matter. Reserved words cannot be used as column names or other database object names. SELECT, DISTINCT, and FROM are three keywords. A SELECT statement always contains two or more clauses. The two mandatory clauses are the SELECT clause and the FROM clause. The pipe symbol (|) is used to denote OR. So, you can read the first form of the preceding SELECT statement as follows:

In this format, the asterisk (*) is used to denote all columns. SELECT * is a succinct way of asking Oracle to return all possible columns. It is used as a shorthand, time-saving symbol instead of typing in SELECT column1, column2, column3, column4, …, columnX, to select all the columns. The FROM clause specifies which table to query to fetch the columns requested in the SELECT clause.

You can issue the following SQL command to retrieve all the columns and all the rows from the REGIONS table in the HR schema:

When this command is executed, it returns all the rows of data and all the columns belonging to this table. Use of the asterisk in a SELECT statement is sometimes referred to as a blind query because the exact columns to be fetched are not specified.

The second form of the basic SELECT statement has the same FROM clause as the first form, but the SELECT clause is different.

This SELECT clause can be simplified into two formats. Here’s the first:

Here’s the second:

An alias is an alternative name for referencing a column or expression. Aliases are typically used for displaying output in a user-friendly manner. They also serve as shorthand when referring to columns or expressions to reduce typing. Aliases will be discussed in detail later in this chapter. By explicitly listing only the relevant columns in the SELECT clause, you, in effect, project the exact subset of the results you want to retrieve. The following statement will return just the REGION_NAME column of the REGIONS table:

You may be asked to obtain all the job roles in the organization that employees have historically fulfilled. For this you can issue the command SELECT * FROM JOB_HISTORY. However, in addition, the SELECT * construct returns the EMPLOYEE_ID, START_DATE, and END_DATE columns. The uncluttered results set containing only the JOB_ID and DEPARTMENT_ID columns can be obtained with the following statement:

Using the DISTINCT keyword allows duplicate rows to be eliminated from the results set. In numerous situations a unique set of rows is required. It is important to note that the criterion employed by the Oracle server in determining whether a row is unique or distinct depends entirely on what is specified after the DISTINCT keyword in the SELECT clause. Selecting distinct JOB_ID values from the JOB_HISTORY table with the following query will return the eight distinct job types:

An important feature of the DISTINCT keyword is the elimination of duplicate values from combinations of columns.

Rules Are Meant to Be Followed

SQL is a fairly strict language in terms of syntax rules, but it remains simple and flexible enough to support a variety of programming styles. This section discusses some of the basic rules governing SQL statements.

Uppercase or Lowercase

It is a matter of personal taste about the case in which SQL statements are submitted to the database. Many developers, including the authors of this book, prefer to write their SQL statements in lowercase. There is also a common misconception that SQL reserved words need to be specified in uppercase. Again, this is up to you. Adhering to a consistent and standardized format is advised.

There is one caveat regarding case sensitivity. When interacting with literal values, case does matter. Consider the JOB_ID column from the JOB_HISTORY table. This column contains rows of data that happen to be stored in the database in uppercase, as in SA_REP and ST_CLERK. When requesting that the results set be restricted by a literal column, the case is critical. The Oracle server treats the request for the rows in the JOB_HISTORY table that contain a value of St_Clerk in the JOB_ID column differently from the request for rows that have a value of ST_CLERK in the JOB_ID column.

Metadata about different database objects is stored by default in uppercase in the data dictionary. If you query a database dictionary table to return a list of tables owned by the HR schema, it is likely that the table names returned will be stored in uppercase. This does not mean a table cannot be created with a lowercase name; it can be. It is just more common and is the default behavior of the Oracle server to create and store tables, columns, and other database object metadata in uppercase in the database dictionary.

Statement Terminators

Semicolons are generally used as SQL statement terminators. SQL*Plus always requires a statement terminator, and usually a semicolon is used. A single SQL statement, or even groups of associated statements, are often saved as script files for future use. Individual statements in SQL scripts are commonly terminated by a line break (or carriage return) and a forward slash on the next line, instead of a semicolon. You can create a SELECT statement, terminate it with a line break, include a forward slash to execute the statement, and save it in a script file. The script file can then be called from within SQL*Plus. Note that SQL Developer does not require a statement terminator if only a single statement is present, but it will not object if one is used. It is good practice to always terminate your SQL statements with a semicolon. Several examples of SQL*Plus statements follow:

Images

The first example demonstrates two important rules. First, the statement is terminated by a semicolon. Second, the entire statement is written on one line. It is entirely acceptable for a SQL statement either to be written on one line or to span multiple lines as long as no words in the statement span multiple lines. The second sample of code demonstrates a statement that spans three lines that is terminated by a new line and executed with a forward slash.

Indentation, Readability, and Good Practice

Consider the following query:

Images

This example highlights the benefits of indenting your SQL statement to enhance the readability of your code. The Oracle server does not object if the entire statement is written on one line without indentation. However, it is good practice to separate different clauses of the SELECT statement onto different lines. When an expression in a clause is particularly complex, it often enhances readability to separate that term of the statement onto a new line. When developing SQL to meet your reporting needs, the process is often iterative. The SQL interpreter is far more useful during development if complex expressions are isolated on separate lines since errors are usually thrown in the format of “ERROR at line X:” This makes the debugging process much simpler.

Exercise 7-1: Answer Your First Questions with SQL In this step-by-step exercise, you make a connection using SQL*Plus as the HR user to answer two questions using the SELECT statement.

Question 1: How many unique departments have employees currently working in them?

1. Start SQL*Plus and connect to the HR schema.

2. You may initially be tempted to find the answer in the DEPARTMENTS table. A careful examination reveals that the question asks for information about employees. This information is contained in the EMPLOYEES table.

3. The word unique should guide you to use the DISTINCT keyword.

4. Combining steps 2 and 3, you can construct the following SQL statement:

5. This query returns 12 rows. Notice that the third row is empty. This is a null value in the DEPARTMENT_ID column.

6. The answer to the first question is therefore as follows: Eleven unique departments have employees working in them, but at least one employee has not been assigned to a department.

Question 2: How many countries are there in the Europe region?

1. This question comprises two parts. Consider the REGIONS table, which contains four regions, each uniquely identified by a REGION_ID value, and the COUNTRIES table, which has a REGION_ID column indicating which region a country belongs to.

2. The first query needs to identify the REGION_ID value of the Europe region. This is accomplished by the SQL statement, which shows that the Europe region has a REGION_ID value of 1.

3. To identify which countries have 1 as their REGION_ID value, you can execute the following SQL query:

4. Manually counting the country rows with a REGION_ID of 1 returned shows that there are eight countries in the Europe region as far as the HR data model is concerned.

SQL Expressions and Operators

The general form of the SELECT statement introduced the notion that columns and expressions may be projected. An expression usually consists of an operation being performed on one or more column values or expressions. The operators that can act upon values to form an expression depend on the underlying data type. They are the four cardinal arithmetic operators (addition, subtraction, multiplication, and division) for numeric columns, the concatenation operator for character or string columns, and the addition and subtraction operators for date and timestamp columns. As in regular arithmetic, there is a predefined order of evaluation (operator precedence) when more than one operator occurs in an expression. Round brackets have the highest precedence. Division and multiplication operations are next in the hierarchy and are evaluated before addition and subtraction, which have lowest precedence.

Operators with the same level of precedence are evaluated from left to right. Round brackets may therefore be used to enforce nondefault operator precedence. Using brackets generously when constructing complex expressions is good practice and is encouraged. It leads to readable code that is less prone to error. Expressions expose a large number of useful data manipulation possibilities.

Arithmetic Operators

Consider the JOB_HISTORY table, which stores the start date and end date of an employee’s term in a previous job role. It may be useful for tax or pension purposes to calculate how long an employee worked in that role. This information can be obtained using an arithmetic expression. Several elements of both the SQL statement and the results returned from Figure 7-8 warrant further discussion.

Images

Figure 7-8 Arithmetic expression to calculate number of days worked

The SELECT clause specifies five elements. The first four are regular columns of the JOB_HISTORY table, while the latter provides the source information required to calculate the number of days that an employee filled a particular position. Consider employee number 176 on the ninth row of output. This employee started as a sales manager on January 1, 2007, and ended employment on December 31, 2007. Therefore, this employee worked for exactly one year, which, in 2007, consisted of 365 days.

The number of days for which an employee was employed can be calculated by using the fifth element in the SELECT clause, which is an expression. This expression demonstrates that arithmetic performed on columns containing date information returns numeric values that represent a certain number of days.

To enforce operator precedence of the subtraction operation, the subexpression end_date-start_date is enclosed in round brackets. Adding 1 makes the result inclusive of the first day.

Expression and Column Aliasing

Figure 7-8 introduced a new concept called column aliasing. Notice that the expression has a meaningful heading named Days Employed. This heading is an alias. An alias is an alternative name for a column or an expression. If this expression did not make use of an alias, the column heading would be (END_DATE-START_DATE)+1, which is not very user friendly. Aliases are especially useful with expressions or calculations and may be implemented in several ways. There are a few rules governing the use of column aliases in SELECT statements. The alias “Days Employed” in Figure 7-8 was specified by leaving a space and entering the alias in double quotation marks. These quotation marks are necessary for two reasons. First, this alias consists of more than one word. Second, case preservation of an alias is possible only if the alias is double quoted. If a multiworded, space-separated alias is specified, an “ORA-00923: FROM keyword not found where expected” error is returned if it is not double quoted. SQL offers a more formalized way of inserting aliases by inserting the AS keyword between the column or expression and the alias, as shown in the first line of this query:

Images

Character and String Concatenation Operator

The double pipe symbols (||) represent the character concatenation operator. This operator is used to join character expressions or columns together to create a larger character expression. Columns of a table may be linked to each other or to strings of literal characters to create one resultant character expression.

The concatenation operator is flexible enough to be used multiple times and almost anywhere in a character expression. Consider the following query:

Here, the character literal The is concatenated to the contents of the REGION_NAME column. This new string of characters is further concatenated to the character literal region is on Planet Earth, and the entire expression is aliased with the friendly heading Planetary Location.

Literals and the DUAL Table

Literals are commonly used in expressions and refer to numeric, character, or date and time values found in SELECT clauses that do not originate from any database object. Concatenating character literals to existing column data can be useful, but what about processing literals that have nothing to do with existing column data? To ensure relational consistency, Oracle offers a clever solution to the problem of using the database to evaluate expressions that have nothing to do with any tables or columns. To get the database to evaluate an expression, a syntactically legal SELECT statement must be submitted. What if you wanted to know the sum of two numeric literals? Oracle solves the problem of relational interaction with the database operating on literal expressions by providing a special single-rowed, single-columned table called DUAL.

Recall that the DUAL table described earlier contains one column called DUMMY of the character data type. You can execute the query SELECT * FROM DUAL, and the data value “X” is returned as the contents of the DUMMY column. Testing complex expressions during development by querying the dual table is an effective method to evaluate whether these expressions are correct. Literal expressions can be queried from any table, but remember that the expression will be processed for every row in the table, while querying the DUAL table returns only one row.

Images

The first statement will return four lines in the results set since there are four rows of data in the REGIONS table, while the second returns only one row.

Two Single Quotes or the Alternative Quote Operator

The literal character strings concatenated so far have been singular words prepended and appended to column expressions. These character literals are specified using single quotation marks. Here’s an example:

What about character literals that contain single quotation marks? Plurals pose a particular problem for character literal processing. Consider the following statement:

Executing this statement causes an Oracle error to be generated. So, how are words that contain single quotation marks dealt with? There are essentially two mechanisms available. The most popular of these is to add an additional single quotation mark next to each naturally occurring single quotation mark in the character string. The following statement demonstrates how the previous error is avoided by replacing the character literal 'Plural's with the literal 'Plural''s:

Using two single quotes to handle each naturally occurring single quote in a character literal can become messy and error prone as the number of affected literals increases. Oracle offers a neat way to deal with this type of character literal in the form of the alternative quote (q) operator. The problem is that Oracle chose the single quote character as the special symbol with which to enclose or wrap other character literals. These character-enclosing symbols could have been anything other than single quotation marks.

Bearing this in mind, consider the alternative quote (q) operator. The q operator enables you to choose from a set of possible pairs of wrapping symbols for character literals as alternatives to the single quote symbols. The options are any single-byte or multibyte character or the four brackets: (round brackets), {curly braces}, [square brackets], or <angle brackets>. Using the q operator, the character delimiter can effectively be changed from a single quotation mark to any other character, as shown here:

Images

The syntax of the alternative quote operator is as follows, where delimiter can be any character or bracket:

The first and second examples show the use of angle and square brackets as character delimiters, while the third example demonstrates how an uppercase X has been used as the special character delimiter symbol through the alternative quote operator. Note that the X character can itself be included in the string—so long as it is not followed by a quotation mark.

NULL Is Nothing

Null refers to an absence of data. A row that contains a null value lacks data for that column. Null is formally defined as a value that is unavailable, unassigned, unknown, or inapplicable. Failure to heed the special treatment that null values require will almost certainly lead to an error or, worse, an inaccurate answer. This section focuses on interacting with null column data with the SELECT statement and its impact on expressions.

Not Null and Nullable Columns

Tables store rows of data that are divided into one or more columns. These columns have names and data types associated with them. Some of them are constrained by database rules to be mandatory columns. It is compulsory for some data to be stored in the NOT NULL columns in each row. When columns of a table, however, are not compelled by the database constraints to hold data for a row, these columns run the risk of being empty.

Oracle offers a mechanism for interacting arithmetically with NULL values using the general functions discussed in Chapter 8. Division by a null value results in null, unlike division by zero, which results in an error. When a null is encountered by the character concatenation operator, however, it is simply ignored. The character concatenation operators ignore null, while the arithmetic operations involving null values always result in null.

Foreign Keys and Nullable Columns

Data model design sometimes leads to problematic situations when tables are related to each other via a primary and foreign key relationship but the column that the foreign key is based on is nullable.

The DEPARTMENTS table has, as its primary key, the DEPARTMENT_ID column. The EMPLOYEES table has a DEPARTMENT_ID column that is constrained by its foreign key relationship to the DEPARTMENT_ID column in the DEPARTMENTS table. This means that no record in the EMPLOYEES table is allowed to have in its DEPARTMENT_ID column a value that is not in the DEPARTMENTS table. This referential integrity forms the basis for third normal form and is critical to overall database integrity.

But what about NULL values? Can the DEPARTMENT_ID column in the DEPARTMENTS table contain nulls? The answer is no. Oracle insists that any column that is a primary key be implicitly constrained to be mandatory. But what about implicit constraints on foreign key columns? This is a quandary for Oracle since in order to remain flexible and cater to the widest audience, it cannot insist that columns related through referential integrity constraints must be mandatory. Further, not all situations demand this functionality.

The DEPARTMENT_ID column in the EMPLOYEES table is actually nullable. Therefore, the risk exists that there are records with null DEPARTMENT_ID values present in this table. In fact, there are such records in the EMPLOYEES table. The HR data model allows employees, correctly or not, to belong to no department. When performing relational joins between tables, it is entirely possible to miss or exclude certain records that contain nulls in the join column. Chapter 10 discusses ways to deal with this challenge.

Exercise 7-2: Construct Expressions In this exercise, you will construct a query using expressions and aliases to display results from the HR schema in a more user-friendly manner.

1. Query the HR.JOBS table and return a single expression of the form The Job Id for the <job_title's> job is: <job_id>. Take note that the job_title should have an apostrophe and an s appended to it to read more naturally. A sample of this output for the organization president is as follows: The Job Id for the President’s job is: AD_PRES. Alias this column expression: Job Description using the AS keyword. There are multiple solutions to this problem. The approach chosen here is to handle the naturally occurring single quotation mark with an additional single quote. You could make use of the alternative quote operator to delimit the naturally occurring quote with another character.

2. A single expression aliased as Job Description is required; you may construct it by concatenating the literal 'The Job Id for the' to the JOB_TITLE column. This string is then concatenated to the literal '''s job is: ' , which is further concatenated to the JOB_ID column. An additional single quotation mark is added to yield the SELECT statement that follows:

Limit the Rows Retrieved by a Query

One of the cornerstone principles in relational theory is selection. Selection is actualized using the WHERE clause of the SELECT statement, sometimes referred to as the predicate. Conditions that restrict the dataset returned take many forms and operate on columns as well as expressions. Only rows that conform to these conditions are returned. Conditions restrict rows using comparison operators in conjunction with columns and literal values. Boolean operators provide a mechanism to specify multiple conditions to restrict the rows returned. Boolean, conditional, concatenation, and arithmetic operators are discussed to establish their order of precedence when they are encountered in a SELECT statement.

The WHERE Clause

The WHERE clause extends the SELECT statement by providing the ability to restrict rows returned based on one or more conditions. Querying a table with just the SELECT and FROM clauses results in every row of data stored in the table being returned. Using the DISTINCT keyword, duplicate values are excluded, and the resultant rows are restricted to some degree. What if very specific information is required from a table, for example, only the data where a column contains a specific value? How would you retrieve the countries that belong to the Europe region from the COUNTRIES table? What about retrieving just those employees who work as sales representatives? These questions are answered using the WHERE clause to specify exactly which rows must be returned. The format of the SQL SELECT statement that includes the WHERE clause is as follows:

Images

The WHERE clause always follows the FROM clause. The square brackets indicate that the WHERE clause is optional. One or more conditions may be simultaneously applied to restrict the results set. A condition is specified by comparing two terms using a conditional operator. These terms may be column values, literals, or expressions. The equality operator is most commonly used to restrict result sets. An example of using a WHERE clause is shown next:

Images

This example projects the COUNTRY_NAME column from the COUNTRIES table. Instead of selecting every row, the WHERE clause restricts the rows returned to only those containing a 3 in the REGION_ID column.

Numeric-Based Conditions

Conditions must be formulated appropriately for different column data types. The conditions restricting rows based on numeric columns can be specified in several different ways. Consider the SALARY column in the EMPLOYEES table. This column has a data type of NUMBER(8,2). The SALARY column can be restricted as follows:

The LAST_NAME and SALARY values of the employees who earn $10,000 are retrieved since the data types on either side of the operator match and are compatible.

A numeric column can be compared to another numeric column in the same row to construct a WHERE clause condition, as the following query demonstrates:

This WHERE clause is too restrictive and results in no rows being selected because the range of SALARY values is 2100 to 24000, and the range of DEPARTMENT_ID values is 10 to 110. Since there is no overlap in the range of DEPARTMENT_ID and SALARY values, there are no rows that satisfy this condition and therefore nothing is returned.

WHERE clause conditions can also be used to compare numeric columns and expressions or to compare expressions to other expressions.

Images

The first example compares the SALARY column with DEPARTMENT_ID*100 for each row. The second example compares two expressions. Notice that the conditions in both examples are algebraically identical, and the same dataset is retrieved when both are executed.

Character-Based Conditions

Conditions determining which rows are selected based on character data are specified by enclosing character literals in the conditional clause, within single quotes. The JOB_ID column in the EMPLOYEES table has a data type of VARCHAR2(10). Suppose you wanted a list of the LAST_NAME values of those employees currently employed as sales representatives. The JOB_ID value for a sales representative is SA_REP. The following statement produces such a list:

If you tried specifying the character literal without the quotes, an Oracle error would be raised. Remember that character literal data is case sensitive, so the following WHERE clauses are not equivalent:

Images

Clause 1 generates an “ORA-00904: ‘SA_REP’: invalid identifier” error since the literal SA_REP is not wrapped in single quotes. Clause 2 and Clause 3 are syntactically correct but not equivalent. Further, neither of these clauses yields any data since there are no rows in the EMPLOYEES table having JOB_ID column values that are either Sa_Rep or sa_rep.

Character-based conditions are not limited to comparing column values with literals. They may also be specified using other character columns and expressions. Character-based expressions may form either one or both parts of a condition separated by a conditional operator. These expressions can be formed by concatenating literal values with one or more character columns. The following four clauses demonstrate some of the options for character-based conditions:

Images

Clause 1 concatenates the string literal A to the LAST_NAME and FIRST_NAME columns. This expression is compared to the literal A King. Clause 2 demonstrates that character expressions may be placed on both sides of the conditional operator. Clause 3 illustrates that literal expressions may also be placed on the left of the conditional operator. It is logically equivalent to clause 4, which has swapped the operands in clause 3 around. Both clauses 3 and 4 identically restrict the results.

Date-Based Conditions

DATE columns are useful for storing date and time information. Date literals must be enclosed in single quotation marks just like character data. When used in conditional WHERE clauses, date columns may be compared to other date columns, literals, or expressions. The literals are automatically converted into DATE values based on the default date format, which is DD-MON-RR. If a literal occurs in an expression involving a DATE column, it is automatically converted into a date value using the default format mask. DD represents days, MON represents the first three letters of a month, and RR represents a Year 2000–compliant year (that is, if RR is between 50 and 99, then the Oracle server returns the previous century, or else it returns the current century). The full four-digit year, YYYY, can also be specified. Consider the following four WHERE clauses:

Images

The first clause tests equality between two DATE columns. Rows that contain the same values in their START_DATE and END_DATE columns will be returned. Note, however, that DATE values are equal to each other only if there is an exact match between all their components, including day, month, year, hours, minutes, and seconds. Chapter 8 discusses the details of storing DATE values. Until then, don’t worry about the hours, minutes, and seconds components. In the second WHERE clause, the START_DATE column is compared to this character literal: '01-JAN-2001'. The entire four-digit year component (YYYY) has been specified. This is acceptable to the Oracle server. The third condition is equivalent to the second, since the literal '01-JAN-01' is converted to the date value 01-JAN-2001. This is because of the RR component being less than 50, so the current (21st) century, 20, is prefixed to the year RR component to provide a century value. The century component for the literal '01-JAN-99' becomes the previous century (19) and is converted to a date value of 01-JAN-1999 for the fourth condition since the RR component, 99, is greater than 50.

Date arithmetic using the addition and subtraction operators is supported. An expression like END_DATE – START_DATE returns the number of days between START_DATE and END_DATE. START_DATE + 30 returns a date 30 days later than START_DATE.

Comparison Operators

The equality operator is generally used to illustrate the concept of restricting rows using a WHERE clause. There are several alternative operators that may also be used. The inequality operators like “less than” or “greater than or equal to” may be used to return rows conforming to inequality conditions. The BETWEEN operator facilitates range-based comparison to test whether a column value lies between two values. The IN operator tests set membership, so a row is returned if the column value tested in the condition is a member of a set of literals. The pattern matching comparison operator LIKE is extremely powerful, allowing components of character column data to be matched to literals conforming to a specific pattern. The last comparison operator discussed in this section is the IS NULL operator, which returns rows where the column value contains a null value. These operators may be used in any combination in the WHERE clause.

Equality and Inequality

Limiting the rows returned by a query involves specifying a suitable WHERE clause. If the clause is too restrictive, then few or no rows are returned. If the conditional clause is too broadly specified, then more rows than are required are returned. Exploring the different available operators should equip you with the language to request exactly those rows you are interested in. Testing for equality in a condition is both natural and intuitive. Such a condition is formed using the “is equal to” (=) operator. A row is returned if the equality condition is true for that row. Consider the following query:

The JOB_ID column of every row in the EMPLOYEES table is tested for equality with the character literal SA_REP. For character information to be equal, there must be an exact case-sensitive match. When such a match is encountered, the values for the projected columns, LAST_NAME and SALARY, are returned for that row. Note that although the conditional clause is based on the JOB_ID column, it is not necessary for this column to be projected by the query.

Inequality-based conditions enhance the WHERE clause specification. Range and pattern matching comparisons are possible using inequality and equality operators, but it is often preferable to use the BETWEEN and LIKE operators for these comparisons. Table 7-6 describes the inequality operators.

Images

Table 7-6 Inequality Operators

Inequality operators allow range-based queries to be fulfilled. You may be required to provide a set of results where a column value is greater than another value. The following query may be issued to obtain a list of LAST_NAME and SALARY values for employees who earn more than $5,000:

The composite inequality operators (made up of more than one symbol) are utilized in the following clauses:

Clause 1 returns those rows that contain a SALARY value that is less than or equal to 3000. Clause 2 demonstrates one of the two forms of the “not equal to” operators. Clause 2 returns the rows that have SALARY column values that are not equal to the DEPARTMENT_ID values.

Numeric inequality is naturally intuitive. The comparison of character and date terms, however, is more complex. Testing character inequality is interesting because the strings being compared on either side of the inequality operator are converted to a numeric representation of its characters. Based on the database character set and National Language Support (NLS) settings, each character string is assigned a numeric value. These numeric values form the basis for the evaluation of the inequality comparison. Consider the following statement:

The character literal 'King' is converted to a numeric representation. Assuming a US7ASCII database character set with AMERICAN NLS settings, the literal 'King' is converted into its ordinal character values: K (75), i (105), n (110), and g (103). For each row in the EMPLOYEES table, the LAST_NAME column data is similarly converted into numeric values for each character, which are then compared in turn with the numeric values of the characters in the literal 'King'. For example, the row with LAST_NAME='Kaufling' is compared as follows: The first character in both strings is 'K' with an equal ordinal value of 75. So, the second character (i=105) is compared with (a=97). Since (97 < 105) or (a < i), 'Kaufling' < 'King', and the row is selected. The same process for comparing numeric data using the inequality operators applies to character data. The only difference is that character data is converted implicitly by the Oracle server to a numeric value based on certain database settings.

Inequality comparisons operating on date values follow a similar process to character data. The Oracle server stores dates in an internal numeric format, and these values are compared within the conditions.

Range Comparison with the BETWEEN Operator

The BETWEEN operator tests whether a column or expression value falls within a range of two boundary values. The item must be at least the same as the lower boundary value or’ at most’ the same as the higher boundary value or fall within the range for the condition to be true.

Suppose you want the last names of employees who earn a salary in the range of $3,400 and $4,000. A possible solution using the BETWEEN operator is as follows:

Conditions specified with the BETWEEN operator can be equivalently denoted using two inequality-based conditions.

It is shorter and simpler to specify the range condition using the BETWEEN operator.

Set Comparison with the IN Operator

The IN operator tests whether an item is a member of a set of literal values. The set is specified by comma-separating the literals and enclosing them in round brackets. If the literals are character or date values, then these must be delimited using single quotes. You may include as many literals in the set as you want. Consider the following example:

The SALARY value in each row is compared for equality to the literals specified in the set. If the SALARY value equals 1,000; 4,000; or 6,000; the LAST_NAME value for that row is returned. The following two statements demonstrate the use of the IN operator with DATE and CHARACTER data:

Images

Pattern Comparison with the LIKE Operator

The LIKE operator is designed exclusively for character data and provides a powerful mechanism for searching for letters or words. LIKE is accompanied by two wildcard characters: the percentage symbol (%) and the underscore character (_). The percentage symbol is used to specify zero or more wildcard characters, while the underscore character specifies one wildcard character. A wildcard may represent any character.

You can use the following query to provide a list of employees whose first names begin with the letter A:

The character literal that the FIRST_NAME column is compared to is enclosed in single quotes like a regular character literal. In addition, it has a percentage symbol, which has a special meaning in the context of the LIKE operator. The percentage symbol substitutes zero or more characters appended to the letter A. The wildcard characters can appear at the beginning, the middle, or the end of the character literal. They can even appear alone, as follows:

In this case, every row containing a FIRST_NAME value that is not null will be returned. Wildcard symbols are not mandatory when using the LIKE operator. In such cases, LIKE behaves as an equality operator testing for exact character matches; so, the following two WHERE clauses are equivalent:

The underscore wildcard symbol substitutes exactly one other character in a literal. Consider searching for employees whose last names are four letters long, begin with a K, have an unknown second letter, and end with an ng. You may issue the following statement:

As Figure 7-9 shows, the two wildcard symbols can be used independently, together, or even multiple times in a single WHERE condition. The first query retrieves those records where COUNTRY_NAME begins with the letter I followed by one or more characters, one of which must be a lowercase a.

Images

Figure 7-9 The wildcard symbols of the LIKE operator

The second query retrieves those countries whose names contain the letter i as its fifth character. The length of the COUNTRY_NAME values and the letter they begin with are unimportant. The four underscore wildcard symbols preceding the lowercase i in the WHERE clause represent exactly four characters (which could be any characters). The fifth letter must be an i, and the percentage symbol specifies that the COUNTRY_NAME can have zero or more characters from the sixth character onward.

What about when you are searching for a literal that contains a percentage or underscore character? A naturally occurring underscore character may be escaped (or treated as a regular nonspecial symbol) using the ESCAPE identifier in conjunction with an ESCAPE character. In the following example, any JOB_ID values that begin with the three characters SA_ will be returned:

Traditionally, the ESCAPE character is the backslash symbol, but it does not have to be. The following statement is equivalent to the preceding one but uses a dollar symbol as the ESCAPE character instead:

The percentage symbol may be similarly escaped when it occurs naturally as character data.

Exercise 7-3: Use the LIKE Operator Construct a query to retrieve a list of department names that end with the letters ing from the DEPARTMENTS table.

1. Start SQL*Plus and connect to the HR schema.

2. The WHERE clause must perform a comparison between the DEPARTMENT_NAME column values and a pattern beginning with zero or more characters but ending with three specific characters, ing. The operator enabling character pattern matching is the LIKE operator. The pattern the DEPARTMENT_NAME column must conform to is '%ing'.

3. Thus, the correct query is as follows:

NULL Comparison with the IS NULL Operator

NULL values inevitably find their way into database tables. It is sometimes required that only those records that contain a NULL value in a specific column are sought. The IS NULL operator selects only the rows where a specific column value is NULL. Testing column values for equality to NULL is performed using the IS NULL operator instead of the “is equal to” operator (=).

Consider the following query, which fetches the LAST_NAME column from the EMPLOYEES table for those rows that have NULL values stored in the COMMISSION_PCT column:

This WHERE clause reads naturally and retrieves only the records that contain NULL COMMISSION_PCT values.

Boolean Operators

Boolean or logical operators enable multiple conditions to be specified in the WHERE clause of the SELECT statement. This facilitates a more refined data extraction capability. Consider isolating those employee records with FIRST_NAME values that begin with the letter J and that earn a COMMISSION_PCT greater than 10 percent. First, the data in the EMPLOYEES table must be restricted to FIRST_NAME values like J%, and second, the COMMISSION_PCT values for the records must be tested to ascertain if they are larger than 10 percent. These two separate conditions may be associated using the Boolean AND operator and are applied consecutively in a WHERE clause. A result set conforming to any or all conditions or to the negation of one or more conditions may be specified using the OR, AND, and NOT Boolean operators, respectively.

The AND Operator

The AND operator merges conditions into one large condition to which a row must conform to be included in the results set. If two conditions specified in a WHERE clause are joined with an AND operator, then a row is tested consecutively for conformance to both conditions before being retrieved. If it conforms to neither or only one of the conditions, the row is excluded. Employee records with FIRST_NAME values beginning with the letter J and COMMISSION_PCT greater than 10 percent can be retrieved using the following query:

Notice that the WHERE clause now has two conditions but only one WHERE keyword. The AND operator separates the two conditions. To specify further mandatory conditions, simply add them and ensure that they are separated by additional AND operators. You can specify as many conditions as you want. Remember, though, the more AND conditions specified, the more restrictive the query becomes.

The OR Operator

The OR operator separates multiple conditions, at least one of which must be satisfied by the row selected to warrant inclusion in the results set. If two conditions specified in a WHERE clause are joined with an OR operator, then a row is tested consecutively for conformance to either or both conditions before being retrieved. Conforming to just one of the OR conditions is sufficient for the record to be returned. If it conforms to none of the conditions, the row is excluded. Retrieving employee records having FIRST_NAME values beginning with the letter B or those with a COMMISSION_PCT greater than 35 percent can be written as follows:

Notice that the two conditions are separated by the OR keyword. All employee records with FIRST_NAME values beginning with an uppercase B will be returned regardless of their COMMISSION_PCT values, even if they are NULL. Records with COMMISSION_PCT values greater than 35 percent (regardless of what letter their FIRST_NAME begins with) are also returned.

Further, OR conditions may be specified by separating them with an OR operator. The more OR conditions you specify, the less restrictive your query becomes.

The NOT Operator

The NOT operator negates conditional operators. A selected row must conform to the logical opposite of the condition in order to be included in the results set. Conditional operators may be negated by the NOT operator, as shown by the WHERE clauses listed in Table 7-7.

Images

Table 7-7 Conditions Negated by the NOT Operator

The NOT operator negates the comparison operator in a condition, whether it’s an equality, inequality, range-based, pattern matching, set membership, or null testing operator.

Precedence Rules

Arithmetic, character, comparison, and Boolean expressions were examined in the context of the WHERE clause. But how do these operators interact with each other? Table 7-8 shows the precedence hierarchy for the previously mentioned operators.

Images

Table 7-8 Operator Precedence Hierarchy

Operators at the same level of precedence are evaluated from left to right if they are encountered together in an expression. When the NOT operator modifies the LIKE, IS NULL, and IN comparison operators, their precedence level remains the same as the positive form of these operators.

Consider the following SELECT statement that demonstrates the interaction of various different operators:

Images

The LAST_NAME, SALARY, DEPARTMENT_ID, JOB_ID, and COMMISSION_PCT columns are projected from the EMPLOYEES table based on two discrete conditions. The first condition retrieves the records containing the character a in the LAST_NAME field AND with a SALARY value greater than 200 times the DEPARTMENT_ID value. The product of DEPARTMENT_ID and 200 is processed before the inequality operator since the precedence of multiplication is higher than the inequality comparison.

The second condition fetches those rows with JOB_ID values of either MK_MAN or MK_REP in which COMMISSION_PCT values are not null. For a row to be returned by this query, either the first or second conditions need to be fulfilled. Changing the order of the conditions in the WHERE clause changes its meaning because of the different precedence of the operators. Consider the following query:

Images

There are two composite conditions in this query. The first condition retrieves the records with the character a in the LAST_NAME field AND a SALARY value greater than 100 times the DEPARTMENT_ID value AND where the COMMISSION_PCT value is not null. The second condition fetches those rows with JOB_ID values of MK_MAN. A row is returned by this query if it conforms to either condition 1 OR condition 2, but not necessarily to both.

Sort the Rows Retrieved by a Query

The usability of the retrieved datasets may be significantly enhanced with a mechanism to order or sort the information. Information may be sorted alphabetically, numerically, and chronologically in ascending or descending order. Further, the data may be sorted by one or more columns, including columns that are not listed in the SELECT clause. Sorting is usually performed once the results of a SELECT statement have been fetched. The sorting parameters do not influence the records returned by a query, just the presentation of the results. Exactly the same rows are returned by a statement including a sort clause as are returned by a statement excluding a sort clause. Only the ordering of the output may differ. Sorting the results of a query is accomplished using the ORDER BY clause.

The ORDER BY Clause

The ORDER BY clause is usually the last clause in a SELECT statement. The format of the ORDER BY clause in the context of the SQL SELECT statement is as follows:

Images

Ascending and Descending Sorting

Ascending sort order is natural for most types of data and is therefore the default sort order used whenever the ORDER BY clause is specified. An ascending sort order for numbers is lowest to highest, while it is earliest to latest for dates and alphabetically for characters. The first form of the ORDER BY clause shows that results of a query may be sorted by one or more columns or expressions.

Suppose that a report is requested that must contain an employee’s LAST_NAME, HIRE_DATE, and SALARY information, sorted alphabetically by the LAST_NAME column for all sales representatives and marketing managers. This report could be extracted with the following:

Images

The data selected may be ordered by any of the columns from the tables in the FROM clause, including those that do not appear in the SELECT list. By appending the keyword DESC to the ORDER BY clause, rows are returned sorted in descending order. The optional NULLS LAST keywords specify that if the sort column contains null values, then these rows are to be listed last after sorting the remaining NOT NULL values. To specify that rows with null values in the sort column should be displayed first, append the NULLS FIRST keywords to the ORDER BY clause. A dataset may be sorted based on an expression as follows:

The smallest TENURE value appears first in the output since the ORDER BY clause specifies that the results will be sorted by the expression alias. Note that the results could be sorted by the explicit expression and the alias could be omitted, but using aliases renders the query easier to read.

Several implicit default options are selected when you use the ORDER BY clause. The most important of these is that unless DESC is specified, the sort order is assumed to be ascending. If null values occur in the sort column, the default sort order is assumed to be NULLS LAST for ascending sorts and NULLS FIRST for descending sorts. If no ORDER BY clause is specified, the same query executed at different times may return the same set of results in different row order, so no assumptions should be made regarding the default row order.

Positional Sorting

Oracle offers an alternative, shorter way to specify the sort column or expression. Instead of specifying the column name, the position of the column as it occurs in the SELECT list is appended to the ORDER BY clause. Consider the following example:

The ORDER BY clause specifies the numeric literal 2. This is equivalent to specifying ORDER BY HIRE_DATE since that is the second column in the SELECT clause. Positional sorting applies only to columns in the SELECT list.

Composite Sorting

Results may be sorted by more than one column using composite sorting. Multiple columns may be specified (either literally or positionally) as the composite sort key by comma-separating them in the ORDER BY clause. To fetch the JOB_ID, LAST_NAME, SALARY, and HIRE_DATE values from the EMPLOYEES table such that the results must be sorted in reverse alphabetical order by JOB_ID first, then in ascending alphabetical order by LAST_NAME, and finally in numerically descending order based on the SALARY column, you can run the following query:

Exercise 7-4: Use the ORDER BY Clause The JOBS table contains descriptions of different types of jobs an employee in the organization may occupy. It contains the JOB_ID, JOB_TITLE, MIN_SALARY, and MAX_SALARY columns. You are required to write a query that extracts the JOB_TITLE, MIN_SALARY, and MAX_SALARY columns, as well as an expression called VARIANCE, which is the difference between the MAX_SALARY and MIN_SALARY values for each row. The results must include only JOB_TITLE values that contain either the word President or Manager. Sort the list in descending order based on the VARIANCE expression. If more than one row has the same VARIANCE value, then, in addition, sort these rows by JOB_TITLE in reverse alphabetic order.

1. Start SQL Developer and connect to the HR schema.

2. Sorting is accomplished with the ORDER BY clause. Composite sorting is required using both the VARIANCE expression and the JOB_TITLE column in descending order.

3. Executing this statement returns a set of results matching the request:

Images

The SQL Row-Limiting Clause

Many analytic queries require only a portion of a dataset, such as the top three salespeople in an organization or the four most recently employed staff members. The row-limiting clause of the SELECT statement allows a dataset to be segmented in various ways. The format of the row-limiting clause in the context of the SQL SELECT statement is as follows:

Images

The OFFSET clause specifies the number of rows to skip to get to the start position in a results set. OFFSET defaults to 0 in the absence of an OFFSET clause or if a NULL, negative, or offset that is larger than the total rows in a results set is provided. The ROW and ROWS keywords have identical meaning, and both singular and plural forms are provided to make your OFFSET clause read more naturally.

The FETCH clause specifies either an exact number_of_rows or a percentage_of_rows to return. In the absence of this clause, the entire result set is returned, starting from row number OFFSET+1 to the last row. The FIRST and NEXT keywords also serve to clarify semantics but have identical meaning. Fetching the FIRST 2 rows from OFFSET 100 in a dataset retrieves the same two rows as fetching the NEXT 2 rows from OFFSET 100.

The number_of_rows option of the FETCH clause defaults to 0 if a NULL or negative row count is specified. Mixed fractions provided as number_of_rows are truncated, and only the whole number portion is used as this parameter. If the number_of_rows value is larger than the number of rows remaining starting from position OFFSET + 1, all rows in a results set are returned. The percentage_of_rows option must be a number, and a NULL or negative value defaults to 0. This option specifies the percentage of the result set to retrieve.

The ROW or ROWS keywords again have identical meaning, and both singular and plural forms are provided to make your FETCH clause read more naturally.

The keywords ONLY or WITH TIES should be read together with the ROW/ROWS keywords. ROWS ONLY retrieves the exact number or percentage of rows specified. Retrieving ROWS WITH TIES fetches any additional rows with the same sort key as the last row retrieved but has any effect only if an ORDER BY clause is specified.

As Figure 7-10 shows, the first query fetches the 7 rows after it retrieves all 107 rows from the EMPLOYEES table sorted in default ascending order by salary, skips 100 rows because of the offset clause, and finally returns the last names and salaries of the top 7 highest-paid employees. The second query prunes this list by fetching the first 5 of these 7 rows starting at row 101 (OFFSET + 1). Note how the final query returns an additional row to the second query since employees De Haan and Kochhar both earn the same salary ($17,000) and are thus tied with the same sort key.

Images

Figure 7-10 The SQL row-limiting clause

Ampersand Substitution

As you develop and perfect SQL statements, you can save them for future use. It is sometimes desirable to have a generic form of a statement that has a variable or placeholder defined that can be substituted at run-time. SQL*Plus offers this functionality in the form of ampersand substitution. Every element of the SELECT statement may be substituted, and the reduction of queries to their core elements to facilitate reuse can save you hours of tedious and repetitive work. This section examines substitution variables and the DEFINE and VERIFY commands.

Substitution Variables

Substitution variables may be regarded as placeholders. A SQL query is composed of two or more clauses. Each clause can be divided into subclauses, which are in turn made up of character text. Any text, subclause, or clause element is a candidate for substitution.

Single Ampersand Substitution

The most basic and popular form of SQL element is single ampersand substitution. The ampersand character (&) is the symbol chosen to designate a substitution variable in a statement and precedes the variable name with no spaces between them. When the statement is executed, the SQL*Plus client processes the statement, notices a substitution variable, and attempts to resolve this variable’s value in one of two ways. First, it checks whether the variable is defined in the user session. (The DEFINE command is discussed later in this chapter.) If the variable is not defined, the user process prompts for a value that will be substituted in place of the variable. Once a value is submitted, the statement is complete and is executed by the Oracle server. The ampersand substitution variable is resolved at execution time and is sometimes known as run-time binding or run-time substitution.

You may be required to look up contact information like PHONE_NUMBER data given either LAST_NAME or EMPLOYEE_ID values. This generic query may be written as follows:

When running this query, SQL*Plus prompts you to input a value for the variable called LASTNAME. You enter an employee’s last name, if you know it, for example, King. If you don’t know the last name but know the employee ID number, you can type in any value and press the ENTER key to submit the value. Oracle then prompts you to enter a value for the EMPNO variable. After typing in a value, for example, 0, and hitting ENTER, there are no remaining substitution variables for Oracle to resolve, and the following statement is executed:

Variables can be assigned any alphanumeric name that is a valid identifier name. The literal you substitute when prompted for a variable must be an appropriate data type for that context; otherwise, an “ORA-00904: invalid identifier” error is returned. If the variable is meant to substitute a character or date value, the literal needs to be enclosed in single quotes. A useful technique is to enclose the ampersand substitution variable in single quotes when dealing with character and date values. In this way, the user is required to submit a literal value without worrying about enclosing it in quotes.

Double Ampersand Substitution

When a substitution variable is referenced multiple times in the same query, Oracle will prompt you to enter a value for every occurrence of the single ampersand substitution variable. For complex scripts, this can be inefficient and tedious. The following statement retrieves the FIRST_NAME and LAST_NAME data from the EMPLOYEES table for those rows that contain the same set of characters in both these fields:

The two conditions are identical but apply to different columns. When this statement is executed, you are first prompted to enter a substitution value for the SEARCH variable used in the comparison with the LAST_NAME column. Thereafter, you are prompted to enter a substitution value for the SEARCH variable used in the comparison with the FIRST_NAME column. This poses two problems. First, it is inefficient to enter the same value twice, but second and more important, typographical errors may confound the query since Oracle does not verify that the same literal value is entered each time substitution variables with the same name are used. In this example, the logical assumption is that the contents of the variables substituted should be the same, but the fact that the variables have the same name has no meaning to the Oracle server, and it makes no such assumption. The first example in Figure 7-11 shows the results of running the preceding query and submitting two distinct values for the SEARCH substitution variable. In this particular example, the results are incorrect since the requirement was to retrieve FIRST_NAME and LAST_NAME pairs that contained the identical string of characters.

Images

Figure 7-11 Double ampersand substitution

When a substitution variable is referenced multiple times in the same query and your intention is that the variable must have the same value at each occurrence in the statement, it is preferable to make use of double ampersand substitution. This involves prefixing the first occurrence of the substitution variable that occurs multiple times in a query with two ampersand symbols instead of one. When Oracle encounters a double ampersand substitution variable, a session value is defined for that variable, and you are not prompted to enter a value to be substituted for this variable in subsequent references.

The second example in Figure 7-11 demonstrates how the SEARCH variable is preceded by two ampersands in the condition with the LAST_NAME column and thereafter is prefixed by one ampersand in the condition with the FIRST_NAME column. When the statement is executed, you are prompted to enter a value to be substituted for the SEARCH variable only once for the condition with the LAST_NAME column. This value is then automatically resolved from the session value of the variable in subsequent references to it, as in the condition with the FIRST_NAME column. To undefine the SEARCH variable, you need to use the UNDEFINE command described later in this chapter.

Images

TIP Whether you work as a developer, database administrator, or business end user, all SQL queries you encounter may be broadly classified as either ad hoc or repeated queries. Ad hoc queries are usually one-off statements written during some data investigation exercise that are unlikely to be reused. The repeated queries are those that are run frequently or periodically, which are usually saved as script files and run with little to no modification whenever required. Reuse prevents costly redevelopment time and allows these consistent queries to potentially benefit from Oracle’s native automatic tuning features geared toward improving query performance.

Substituting Column Names

Literal elements of the WHERE clause have been the focus of the discussion on substitution thus far, but virtually any element of a SQL statement is a candidate for substitution. In the following statement, the FIRST_NAME and JOB_ID columns are static and will always be retrieved, but the third column selected is variable and specified as a substitution variable named COL. The result set is further sorted by this variable column in the ORDER BY clause.

Images

Unlike character and date literals, column name references do not require single quotes either when explicitly specified or when substituted via ampersand substitution.

Substituting Expressions and Text

Almost any element of a SQL statement may be substituted at run-time. The constraint is that Oracle requires at least the first word to be static. In the case of the SELECT statement, at a minimum, the SELECT keyword is required, and the remainder of the statement may be substituted as follows:

When the statement is executed, you are prompted to submit a value for the variable called REST_OF_STATEMENT, which, when appended to the SELECT keyword, is any legitimate query. Useful candidates for ampersand substitution are statements that are run multiple times and differ slightly from each other.

Define and Verify

Double ampersand substitution is used to avoid repetitive input when the same variable occurs multiple times in a statement. When a double ampersand substitution occurs, the variable is stored as a session variable. As the statement executes, all further occurrences of the variable are automatically resolved using the stored session variable. Any subsequent executions of the statement within the same session automatically resolve the substitution variables from stored session values. This is not always desirable and indeed limits the usefulness of substitution variables. Oracle does, however, provide a mechanism to undefine these session variables. The VERIFY command is specific to SQL*Plus and controls whether substituted elements are echoed on the user’s screen prior to executing a SQL statement that uses substitution variables.

The DEFINE and UNDEFINE Commands

Session-level variables are implicitly created when they are initially referenced in SQL statements using double ampersand substitution. They persist or remain available for the duration of the session or until they are explicitly undefined. A session ends when the user exits their client tool like SQL*Plus or when the user process is terminated.

The problem with persistent session variables is they tend to detract from the generic nature of statements that use ampersand substitution variables. Fortunately, these session variables can be removed with the UNDEFINE command. Within a script or at the command line of SQL*Plus or SQL Developer, the syntax to undefine session variables is as follows:

Consider a simple generic example that selects a static and variable column from the EMPLOYEES table and sorts the output based on the variable column.

The first time this statement executes, you are prompted to supply a value for the COLNAME variable. Assume you enter SALARY. This value is substituted, and the statement executes. A subsequent execution of this statement within the same session does not prompt for any COLNAME values since it is already defined as SALARY in the context of this session and can be undefined only with the UNDEFINE COLNAME command. Once the variable has been undefined, the next execution of the statement prompts the user for a value for the COLNAME variable.

The DEFINE command serves two purposes. It can be used to retrieve a list of all the variables currently defined in your SQL session; it can also be used to explicitly define a value for a variable referenced as a substitution variable by one or more statements during the lifetime of that session. The syntaxes for the two variants of the DEFINE command are as follows:

Support of session-persistent variables may be switched off and on as required using the SET DEFINE OFF command. The SET command is not a SQL language command, but rather a SQL environment control command. When you specify SET DEFINE OFF, the client tool (for example, SQL*Plus) does not save session variables or attach special meaning to the ampersand symbol. This allows the ampersand symbol to be used as an ordinary literal character if necessary. The SET DEFINE ON|OFF command therefore determines whether ampersand substitution is available in your session. The following query uses the ampersand symbol as a literal value. When it is executed, you are prompted to submit a value for the bind variable SID.

By turning off the ampersand substitution functionality, this query can be executed without prompts.

Once the statement executes, you can use the SET DEFINE ON command to switch the substitution functionality back on. If DEFINE is OFF and the context that an ampersand is used in a statement cannot be resolved literally, Oracle returns an error.

The VERIFY Command

Two categories of commands are available when dealing with the Oracle server: SQL language commands and the SQL client control commands. The SELECT statement is a language command, while the SET command controls the SQL client environment. There are many different language and control commands available, but the control commands relevant to substitution are DEFINE and VERIFY.

The VERIFY command controls whether the substitution variable submitted is displayed onscreen so that you can verify that the correct substitution has occurred. A message is displayed showing the old clause followed by the new clause containing the substituted value. The VERIFY command is switched ON and OFF with the command SET VERIFY ON|OFF. If VERIFY is first switched OFF and a query that uses ampersand substitution is executed, you are prompted to input a value. The value is then substituted, the statement runs, and its results are displayed. If VERIFY is then switched ON and the same query is executed, once you input a value but before the statement commences execution, Oracle displays the clause containing the reference to the substitution variable as the old clause with its line number, and, immediately below this, the new clause displays the statement containing the substituted value.

Exercise 7-5: Using Ampersand Substitution You are required to write a reusable query using the current tax rate and the EMPLOYEE_ID number as inputs and return the EMPLOYEE_ID, FIRST_NAME, SALARY, ANNUAL SALARY (SALARY * 12), TAX_RATE, and TAX (TAX_RATE * ANNUAL SALARY) information for use by the HR department clerks.

1. Start SQL*Plus and connect to the HR schema.

2. The select list must include the four specified columns as well as two expressions. The first expression, aliased as ANNUAL SALARY, is a simple calculation, while the second expression, aliased as TAX, depends on the TAX_RATE value. Since TAX RATE may vary, this value must be substituted at run-time.

3. A possible solution is as follows:

Images

4. The double ampersand preceding EMPLOYEE_ID and TAX_RATE in the SELECT clause stipulates to Oracle that when the statement is executed, the user must be prompted to submit a value for each substitution variable that will be used wherever they are subsequently referenced as &EMPLOYEE_ID and &TAX_RATE, respectively.

Two-Minute Drill

Explain the Capabilities of SQL SELECT Statements

• The three fundamental operations that SELECT statements are capable of are projection, selection, and joining.

• Projection refers to the restriction of columns selected from a table. Using projection, you retrieve only the columns of interest and not every possible column.

• Selection refers to the extraction of rows from a table. Selection includes the further restriction of the extracted rows based on various criteria or conditions. This allows you to retrieve only the rows that are of interest and not every row in the table.

• Joining involves linking two or more tables based on common attributes. Joining allows data to be stored in third normal form in discrete tables instead of in one large table.

• The DESCRIBE command lists the names, data types, and nullable status of all columns in a table.

Execute a Basic SELECT Statement

• The SELECT clause determines the projection of columns. In other words, the SELECT clause specifies which columns are included in the results returned.

• The DISTINCT keyword preceding items in the SELECT clause causes duplicate combinations of these items to be excluded from the returned results set.

• Expressions and regular columns may be aliased by using the AS keyword or by leaving a space between the column or expression and the alias.

• Naturally occurring single quotes in a character literal can be selected by making use of either an additional single quote per naturally occurring quote or the alternative quote operator.

Limit the Rows Retrieved by a Query

• One or more conditions constitute a WHERE clause. These conditions specify rules to which the data in a row must conform to be eligible for selection.

• For each row tested in a condition, there are terms on the left and right of a comparison operator. Terms in a condition can be column values, literals, or expressions.

• Comparison operators may test two terms in many ways. Equality and inequality tests are common, but range, set, and pattern comparisons are also available.

• Boolean operators include the AND, OR, and NOT operators. The AND and OR operators enable multiple conditional clauses to be specified. These are sometimes referred to as multiple WHERE clauses.

• The NOT operator negates the comparison operator involved in a condition.

Sort the Rows Retrieved by a Query

• Results are sorted using the ORDER BY clause. Rows retrieved may be ordered according to one or more columns by specifying either the column names or their numeric position in the SELECT clause.

• The sorted output may be arranged in descending or ascending order using the DESC or ASC modifier after the sort terms in the ORDER BY clause.

• The row-limiting clause allows a portion of a dataset to be fetched. An optional OFFSET may be provided to indicate the start position of the row-limited set. Either an absolute number or a percentage of rows may be requested. If the WITH TIES option is specified, all rows with the same sort keys as the last row of the row-limited set will additionally be fetched.

Ampersand Substitution

• Ampersand substitution facilitates SQL statement reuse by providing a means to substitute elements of a statement at run-time. The same SQL statement may therefore be run multiple times with different input parameters.

• Session-persistent variables may be set explicitly using the DEFINE command. The UNDEFINE command may be used to unset both implicitly (double ampersand substitution) and explicitly defined session variables.

• The VERIFY environmental setting controls whether SQL*Plus displays the old and new versions of statement lines that contain substitution variables.

Self Test

1. Which query will create a projection of the DEPARTMENT_NAME and LOCATION_ID columns from the DEPARTMENTS table? (Choose the best answer.)

A. SELECT DISTINCT DEPARTMENT_NAME, LOCATION_ID FROM DEPARTMENTS;

B. SELECT DEPARTMENT_NAME, LOCATION_ID FROM DEPARTMENTS;

C. SELECT DEPT_NAME, LOC_ID FROM DEPT;

D. SELECT DEPARTMENT_NAME AS “LOCATION_ID” FROM DEPARTMENTS;

2. After describing the EMPLOYEES table, you discover that the SALARY column has a data type of NUMBER(8,2). Which SALARY values will not be permitted in this column? (Choose the best answers.)

A. SALARY=12345678

B. SALARY=123456.78

C. SALARY=1234567.8

D. SALARY=123456

E. SALARY=12.34

3. After describing the JOB_HISTORY table, you discover that the START_DATE and END_DATE columns have a data type of DATE. Consider the expression END_DATE – START_DATE. Choose two correct statements regarding this expression.

A. A value of DATE data type is returned.

B. A value of type NUMBER is returned.

C. A value of type VARCHAR2 is returned.

D. The expression is invalid since arithmetic cannot be performed on columns with DATE data types.

E. The expression is valid since arithmetic can be performed on columns with DATE data types.

4. Which statement reports on unique JOB_ID values from the EMPLOYEES table? (Choose the best answer.)

A. SELECT JOB_ID FROM EMPLOYEES;

B. SELECT UNIQUE JOB_ID FROM EMPLOYEES;

C. SELECT DISTINCT JOB_ID, EMPLOYEE_ID FROM EMPLOYEES;

D. SELECT DISTINCT JOB_ID FROM EMPLOYEES;

5. Choose the two illegal statements. The two correct statements produce identical results. The two illegal statements will cause an error to be raised.

A. SELECT DEPARTMENT_ID|| ' represents the '|| DEPARTMENT_NAME||' Department' as "Department Info" FROM DEPARTMENTS;

B. SELECT DEPARTMENT_ID|| ' represents the || DEPARTMENT_NAME||' Department' as "Department Info" FROM DEPARTMENTS;

C. select department_id|| ' represents the '||department_name|| ' Department' "Department Info" from departments;

D. SELECT DEPARTMENT_ID represents the DEPARTMENT_NAME Department as "Department Info" FROM DEPARTMENTS;

6. Which two clauses of the SELECT statement facilitate selection and projection? (Choose the best answer.)

A. SELECT, FROM

B. ORDER BY, WHERE

C. SELECT, WHERE

D. SELECT, ORDER BY

7. Choose the WHERE clause that extracts the DEPARTMENT_NAME values containing the character literal er from the DEPARTMENTS table. The SELECT and FROM clauses are SELECT DEPARTMENT_NAME FROM DEPARTMENTS. (Choose the best answer.)

A. WHERE DEPARTMENT_NAME IN ('%e%r');

B. WHERE DEPARTMENT_NAME LIKE '%er%';

C. WHERE DEPARTMENT_NAME BETWEEN 'e' AND 'r';

D. WHERE DEPARTMENT_NAME CONTAINS 'e%r'

8. Which of the following conditions are equivalent to each other? (Choose all that apply.)

A. WHERE SALARY <=5000 AND SALARY >=2000

B. WHERE SALARY IN (2000,3000,4000,5000)

C. WHERE SALARY BETWEEN 2000 AND 5000

D. WHERE SALARY > 2000 AND SALARY < 5000

E. WHERE SALARY >=2000 AND <=5000

9. Choose two false statements about the ORDER BY clause. (Choose the best answer.)

A. When using the ORDER BY clause, it must always be accompanied by a row-limiting clause.

B. The ORDER BY clause may appear in a SELECT statement that does not contain a WHERE clause.

C. The ORDER BY clause specifies one or more terms by which the retrieved rows are sorted. These terms can only be column names.

D. Positional sorting is accomplished by specifying in the ORDER BY clause the numeric position of a column as it appears in the SELECT list.

10. When using ampersand substitution variables in the following query, how many times will you be prompted to input a value for the variable called JOB the first time this query is executed?

SELECT FIRST_NAME, '&JOB'
FROM EMPLOYEES
WHERE JOB_ID LIKE '%'||&JOB||'%'
AND '&&JOB' BETWEEN 'A' AND 'Z';

(Choose the best answer.)

A. 0

B. 1

C. 2

D. 3

11. When using a row-limiting clause in the following query, what employee information is returned?

SELECT employee_id, first_name
FROM employees
ORDER BY employee_id
OFFSET 5 ROWS FETCH NEXT 5 ROWS ONLY;

(Choose the best answer.)

A. The five employees with the highest employee_id

B. The five employees with the lowest employee_id

C. The five employees with the 6^th to 10^th lowest employee_id

D. The five employees with the 6^th to 10^th highest employee_id

E. The ten employees with the lowest employee_id

F. None of the above

Self Test Answers

1. B. A projection is an intentional restriction of the columns returned from a table.
A, C, and D are incorrect. A is eliminated since the question has nothing to do with duplicates, distinctiveness, or uniqueness of data. C incorrectly selects nonexistent columns called DEPT_NAME and LOC_ID from a nonexistent table called DEPT. D returns just one of the requested columns: DEPARTMENT_NAME. Instead of additionally projecting the LOCATION_ID column from the DEPARTMENTS table, it attempts to alias the DEPARTMENT_NAME column as LOCATION_ID.

2. A and C. Columns with the NUMBER(8,2) data type can store, at most, eight digits, of which, at most, two of those digits are to the right of the decimal point. Although A and C are the correct answers, note that since the question is phrased in the negative, these values are not allowed to be stored in such a column. A and C are not allowed because they contain eight and seven whole number digits, respectively, but the data type is constrained to store six whole number digits and two fractional digits.
B, D, and E are incorrect. B, D, and E can legitimately be stored in this data type and, therefore, are the incorrect answers to this question. D shows that numbers with no fractional part are legitimate values for this column, as long as the number of digits in the whole number portion does not exceed six digits.

3. B and E. The result of arithmetic between two date values represents a certain number of days.
A, C, and D are incorrect. It is a common mistake to expect the result of arithmetic between two date values to be a date as well, so A may seem plausible, but it is false.

4. D. Unique JOB_ID values are projected from the EMPLOYEES table by applying the DISTINCT keyword to just the JOB_ID column.
A, B, and C are incorrect. A returns an unrestricted list of JOB_ID values including duplicates, B makes use of the UNIQUE keyword in the incorrect context, and C selects the distinct combination of JOB_ID and EMPLOYEE_ID values. This has the effect of returning all the rows from the EMPLOYEES table, since the EMPLOYEE_ID column contains unique values for each employee record. Additionally, C returns two columns, which is not what was originally requested.

5. B and D. B and D represent the two illegal statements that will return syntax errors if they are executed. This is a tricky question because it asks for the illegal statements and not the legal statements. B is illegal because it is missing a single quote enclosing the character literal represents the. D is illegal because it does not make use of single quotes to enclose its character literals.
A and C are incorrect. A and C are the legal statements and, therefore, in the context of the question, are the incorrect answers. A and C appear to be different, since the SQL statements differ in case and A uses the alias keyword AS, whereas C just leaves a space between the expression and the alias. Yet both A and C produce identical results.

6. C. The SELECT clause facilitates projection by specifying the list of columns to be projected from a table, while the WHERE clause facilitates selection by limiting the rows retrieved based on its conditions.
A, B, and D are incorrect. A, B, and D are incorrect because the FROM clause specifies the source of the rows being projected and the ORDER BY clause is used for sorting the selected rows.

7. B. The LIKE operator tests the DEPARTMENT_NAME column of each row for values that contain the characters er. The percentage symbols before and after the character literal indicate that any characters enclosing the er literal are permissible.
A, C, and D are incorrect. A and C are syntactically correct. A uses the IN operator, which is used to test set membership. C tests whether the alphabetic value of the DEPARTMENT_NAME column is between the letter e and the letter r. Finally, D uses the word contains, which cannot be used in this context.

8. A and C. Each of these conditions tests for SALARY values in the range of $2,000 to $5,000.
B, D, and E are incorrect. B excludes values like $2,500 from its set. D excludes the boundary values of $2,000 and $5,000, and E is illegal since it is missing the SALARY column name reference after the AND operator.

9. A and C. A row-limiting clause is optional and the terms specified in an ORDER BY clause can include column names, positional sorting, numeric values, and expressions.
B and D are incorrect because they are true.

10. D. The first time this statement is executed, two single ampersand substitution variables are encountered before the third double ampersand substitution variable. If the first reference on the first line of the query contained a double ampersand substitution, you would be prompted to input a value only once.
A, B, and C are incorrect. These are incorrect since you are prompted three times to input a value for the JOB substitution variable. In subsequent executions of this statement in the same session, you will not be prompted to input a value for this variable.

11. C. The results are sorted from lowest to highest employee_id. The OFFSET 5 clause limits the rows returned to the second set of five employee records.
A, B, D and E are incorrect. The OFFSET and ORDER_BY clauses determine which segment of the results set is extracted and the sort order, while the FETCH clause determines how many rows are ultimately retrieved.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Chapter 7 Retrieving, Restricting, and Sorting Data Using SQL

Create new playlist

Sign In

Sign Up

CHAPTER 7

Retrieving, Restricting, and Sorting Data Using SQL

Exam Objectives

Explain the Capabilities of SQL SELECT Statements

Introducing the SQL SELECT Statement

The DESCRIBE Table Command

Capabilities of the SELECT Statement

Data Normalization

Real-World Scenarios

Data Modeling

Entities and Relations

Primary Keys

Relationships

Referential Integrity and Foreign Keys

Rows and Tables

Create the Demonstration Schemas

Users and Schemas

The HR and OE Schemas

Demonstration Schema Creation

Execute a Basic SELECT Statement

Syntax of the Primitive SELECT Statement

Rules Are Meant to Be Followed

Uppercase or Lowercase

Statement Terminators

Indentation, Readability, and Good Practice

SQL Expressions and Operators

Arithmetic Operators

Expression and Column Aliasing

Character and String Concatenation Operator

Literals and the DUAL Table

Two Single Quotes or the Alternative Quote Operator

NULL Is Nothing

Not Null and Nullable Columns

Foreign Keys and Nullable Columns

Limit the Rows Retrieved by a Query

The WHERE Clause

Numeric-Based Conditions

Character-Based Conditions

Date-Based Conditions

Comparison Operators

Equality and Inequality

Range Comparison with the BETWEEN Operator

Set Comparison with the IN Operator

Pattern Comparison with the LIKE Operator

NULL Comparison with the IS NULL Operator

Boolean Operators

The AND Operator

The OR Operator

The NOT Operator

Precedence Rules

Sort the Rows Retrieved by a Query

The ORDER BY Clause

Ascending and Descending Sorting

Positional Sorting

Composite Sorting

The SQL Row-Limiting Clause

Ampersand Substitution

Substitution Variables

Single Ampersand Substitution

Double Ampersand Substitution

Substituting Column Names

Substituting Expressions and Text

Define and Verify

The DEFINE and UNDEFINE Commands

The VERIFY Command

Two-Minute Drill

Explain the Capabilities of SQL SELECT Statements

Execute a Basic SELECT Statement

Limit the Rows Retrieved by a Query

Sort the Rows Retrieved by a Query

Ampersand Substitution

Self Test

Self Test Answers

Table of Contents for
Chapter 7 Retrieving, Restricting, and Sorting Data Using SQL