15.4 Content Management Systems

As sites evolved in size and complexity, so did the problem of keeping pages up-to-date and consistent. Sites based on HTML text files face an overwhelming task when updating the overall site appearance, because an update may require individual changes to every page file.

Instead, most modern sites use a content management system (CMS) to manage the site’s contents and construct web pages. A CMS-run site is a truly dynamic site; when a user requests a page, the CMS constructs it based on easy-to-change data and parameters.

Instead of storing the website’s content in individual HTML page files, the site stores its content in a database. The CMS runs scripts to build pages based on data retrieved from the database. As shown in FIGURE 15.20, a CMS-based web host contains five separate parts:

  1. Protocol stack software: appears in every operating system that hosts web pages.

  2. Web server software: appears in every computer that hosts web pages. This serves static HTML, implements SSL security, and can pass a script to a separate interpreter for execution.

  3. CMS software: receives GET, POST, and other HTTP requests and returns a web page in response.

  4. Database management system (DBMS) software: stores and retrieves data in a structured manner. This provides a much easier way to handle a variety of site data than a file system by itself.

  5. Database: a set of files on the hard drive that contain the site’s data, organized and managed by the DBMS.

FIGURE 15.20 Web content management system.

CMS packages often consist of a set of scripts written in an interpreted language. Many are written in PHP, including Drupal, Joomla, and WordPress. When the browser requests a page from a site running one of these packages, the link leads to a PHP script. The PHP interpreter runs the script, which constructs the page’s HTML text.

Although there are many commercial CMS systems available, numerous websites rely on open-source software. The underlying elements occur so often together that they have earned the acronym LAMP, which stands for Linux, Apache, MySQL, PHP. A web server based on LAMP uses the components as follows:

  • Linux for the server’s operating system

  • Apache for the web server software

  • MySQL for the DBMS

  • PHP for the web-scripting language

15.4.1 Database Management Systems

A database systematically stores data in a structured manner. Most people keep several databases, whether they are aware of it or not; an address book is a database, for example. A DBMS is a software package with an effective set of tools for collecting, storing, organizing, and analyzing data. The DBMS manages a set of files that contain the database’s data. All operations on the database files take place through the DBMS.

A typical database today organizes its data into tables. Each data table may reside in its own file. Within each table, data is organized into records and fields. FIGURE 15.21 shows two tables from a sample database that helps organize a blogging website.

FIGURE 15.21 Sample data tables from a sample database.

The left table in the figure (marked “Articles”) contains five records, each for an article written by one of the users. Each record contains three fields: the article number (“ArticleNo”), the date, and the author. The Users table lists users who may have written articles. Modern relational databases provide tools to link records in one table to records in another; this is why arrows link the articles to the authors’ entries in the Users table. The field that identifies one record so that others may link to it is often called a key field. The UserID field in the Users table is a key field.
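The two linked tables can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the database shown in the figure: the field names ArticleNo, Date, Author, and UserID follow the text, while the Password field name, the sample rows, and the choice of SQLite are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces links between tables only when this setting is on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Users (
    UserID   TEXT PRIMARY KEY,   -- the key field other tables link to
    Password TEXT NOT NULL       -- assumed field name
);
CREATE TABLE Articles (
    ArticleNo INTEGER PRIMARY KEY,
    Date      TEXT NOT NULL,
    Author    TEXT NOT NULL REFERENCES Users(UserID)
);
""")

# Hypothetical sample data: two users and five articles.
conn.executemany("INSERT INTO Users VALUES (?, ?)",
                 [("alice", "pw-a"), ("bob", "pw-b")])
conn.executemany("INSERT INTO Articles VALUES (?, ?, ?)", [
    (1, "2019-01-05", "alice"),
    (2, "2019-01-09", "bob"),
    (3, "2019-02-11", "alice"),
    (4, "2019-03-02", "alice"),
    (5, "2019-03-15", "bob"),
])

# Follow the link from each article to its author's record.
linked = conn.execute(
    "SELECT a.ArticleNo, u.UserID FROM Articles a "
    "JOIN Users u ON a.Author = u.UserID ORDER BY a.ArticleNo"
).fetchall()

# An article whose author has no Users record breaks the link and is refused.
try:
    conn.execute("INSERT INTO Articles VALUES (6, '2019-04-01', 'mallory')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Declaring Author as a reference to Users(UserID) is what lets the DBMS, rather than the application, guarantee that every arrow in such a diagram points at a real user record.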

History: Most of the practical problems solved by computers involve databases. Programmers have labored on database problems since the earliest days of electronic computers. In 1951, computer RAMs were so small that they couldn’t support modern DBMS functions. Instead, programmers constructed small programs to perform individual DBMS operations. Each program was handcrafted to search for specific fields and sort the data.

This problem led to the first computer program that itself wrote another computer program. The program, called the “Sort Merge Generator,” was written in 1951 by Frances “Betty” Holberton (1917–2001). Holberton’s program took a description of the key fields in one or more database files and produced a program to merge the files, sorted by their key fields.

Individual database tables are similar to spreadsheets: The rows correspond to records and columns identify fields. Many people routinely use spreadsheets to solve database problems.

Structured Query Language

When a program uses a DBMS, it transmits its requests through a specially designed API. Most use a text-oriented interface based on the Structured Query Language (SQL), a standard in the database community. SQL supports queries to search and extract data from the database, plus other operations to add or update data in the database.

The DBMS generally operates as a separate process. Other programs connect to the DBMS through a socket-like interface. The SQL commands arrive via a connection between the program and the DBMS. When the program establishes the connection, it provides a user ID and selects the database to use. The DBMS will check the access permissions for that user ID and establish the connection if allowed.

FIGURE 15.22 shows a typical SQL query using the SELECT operation. We choose the arguments to SELECT according to what data we want to extract from the database. In the figure, the operation takes place on the database from Figure 15.21. A query’s result usually produces a data table.

FIGURE 15.22 Example of a SELECT command in SQL.

The SELECT example contains four arguments. The first argument lists the fields returned in the result. There may be one or more fields returned. If the query indicates “*” instead of a list of fields, then the query returns all fields.

The second argument is optional. The argument, prefixed by INTO, names a table that receives the query’s results. The third argument, prefixed by FROM, indicates the table containing the data to be queried. A single database may contain several tables, so this argument always appears.

The final argument, prefixed by WHERE, provides the criteria for selecting records to include in the answer. If we omit WHERE, then the query returns all records but includes only the fields listed in the first argument. The WHERE clause usually compares the contents of a named field in each record against a given value and accepts that record if the comparison succeeds. A single WHERE clause may perform several comparisons, combined with the logical operators “AND” and “OR.”

If we perform the query in Figure 15.22 on the sample tables in Figure 15.21, the result yields a table named Temp. The table contains three records, each with a single ArticleNo field.
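The four arguments can be tried out in a short sketch. The exact query in the figure is not reproduced here; the query below and the sample rows are assumptions chosen so that the result is a table named Temp holding three ArticleNo values. The sketch uses SQLite through Python's sqlite3 module. SQLite does not accept SELECT ... INTO, so the equivalent CREATE TABLE ... AS SELECT form stands in for the INTO argument.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Articles (ArticleNo INTEGER PRIMARY KEY, Date TEXT, Author TEXT)"
)
# Hypothetical sample data.
conn.executemany("INSERT INTO Articles VALUES (?, ?, ?)", [
    (1, "2019-01-05", "alice"),
    (2, "2019-01-09", "bob"),
    (3, "2019-02-11", "alice"),
    (4, "2019-03-02", "alice"),
    (5, "2019-03-15", "bob"),
])

# Field list, destination table, FROM, and WHERE.
# In dialects with SELECT ... INTO this would read:
#   SELECT ArticleNo INTO Temp FROM Articles WHERE Author = 'alice'
conn.execute(
    """CREATE TABLE "Temp" AS SELECT ArticleNo FROM Articles WHERE Author = 'alice'"""
)
temp_rows = conn.execute('SELECT * FROM "Temp" ORDER BY ArticleNo').fetchall()

# "*" returns every field of the matching records.
all_fields = conn.execute("SELECT * FROM Articles WHERE ArticleNo = 2").fetchall()

# Several comparisons combined with AND and OR.
combined = conn.execute(
    "SELECT ArticleNo FROM Articles "
    "WHERE (Author = 'bob' AND Date >= '2019-02-01') OR ArticleNo = 1 "
    "ORDER BY ArticleNo"
).fetchall()
```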

Enterprise Databases

Many modern organizations maintain a data warehouse, which contains a copy of all of the organization’s transaction data for the purpose of analysis and reporting. The analysis process is sometimes called data mining, in which analysts seek insight into the organization’s operations by studying large quantities of detailed data.

Although not all organizations structure their databases into a data warehouse, all organizations rely heavily on collecting and analyzing enterprise data. To publish an accurate balance sheet for the corporate annual report, the enterprise must systematically collect and analyze the enterprise’s financial information.

Database Security

Database security relies on the same general concepts and mechanisms as file system security. Most databases require identification and authentication before they accept queries or other operations from remote sources. Sophisticated database systems also maintain “database user” tables that associate individual users with specific access rights. On a database, access rights are similar to those of files and directories described in Section 3.1: the right to create, modify, or delete items and collections thereof.

Databases may, however, enforce more detailed access restrictions. Even though access controls to files might be analogous to those for accessing tables, a database also may restrict a user’s right to see or modify particular fields in particular tables. This is similar to allowing a user to modify a spreadsheet, but restricting which columns may be modified.

Important challenges in database security are embodied in the terms aggregation and inference. If we forbid Eve from looking at certain fields, but still allow her to make queries including those fields, then she may be able to infer the values of hidden fields. For example, we might hide salaries from Eve, but she may still infer salary information by making clever queries.
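One way such an inference might work is sketched below. Everything in it is hypothetical: the Staff table, the salary figures, and the rule that Eve may only run a counting query whose WHERE clause mentions the hidden Salary field. Eve never sees a salary value, yet a binary search over the counts recovers it in about 20 queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (Name TEXT PRIMARY KEY, Salary INTEGER)")
conn.executemany("INSERT INTO Staff VALUES (?, ?)",
                 [("alice", 73500), ("bob", 58000)])

def count_matching(name, min_salary):
    """The only query Eve may run: it returns a count, never a Salary value."""
    row = conn.execute(
        "SELECT COUNT(*) FROM Staff WHERE Name = ? AND Salary >= ?",
        (name, min_salary),
    ).fetchone()
    return row[0]

def infer_salary(name, low=0, high=1_000_000):
    """Binary search, keeping the invariant low <= salary < high."""
    while high - low > 1:
        mid = (low + high) // 2
        if count_matching(name, mid):
            low = mid    # the record matched, so salary >= mid
        else:
            high = mid   # no match, so salary < mid
    return low

inferred = infer_salary("alice")
```

Hiding a field from query results is therefore not enough if the field may still appear in selection criteria.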

15.4.2 Password Checking: A CMS Example

In a CMS, the DBMS stores information about the site’s format and page layouts as well as the actual contents to appear on individual pages. One table may contain articles or blog posts while another may contain images or other media.

When the CMS receives a web page request, it uses the path name and any arguments provided as input to a database query. The DBMS returns the text of the selected page article to the CMS, which formats the text into an HTML page.
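The request-to-page flow can be sketched in a few lines. This is an illustration under stated assumptions rather than how any particular CMS works: the Pages table, its field names, and the render_page function are hypothetical, and Python with SQLite stands in for the PHP and MySQL that the packages named earlier use.

```python
import html
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Pages (Path TEXT PRIMARY KEY, Title TEXT, Body TEXT)")
conn.execute(
    "INSERT INTO Pages VALUES (?, ?, ?)",
    ("/about", "About Us", "We write about security & networks."),
)

def render_page(path):
    """Use the requested path as input to a query, then format the result as HTML."""
    row = conn.execute(
        "SELECT Title, Body FROM Pages WHERE Path = ?", (path,)
    ).fetchone()
    if row is None:
        return "<html><body><h1>404 Not Found</h1></body></html>"
    # Escape stored text so it is displayed as text, not interpreted as HTML.
    title, body = (html.escape(text) for text in row)
    return (
        f"<html><head><title>{title}</title></head>"
        f"<body><h1>{title}</h1><p>{body}</p></body></html>"
    )

about_page = render_page("/about")
missing_page = render_page("/no-such-page")
```

Changing the site's appearance then means changing the template inside render_page once, instead of editing every stored page.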

Logging in to a Website

Most modern websites provide different services and capabilities to different people. The general public may see a site’s general contents, but they aren’t allowed to add information or make other changes. Most sites require visitors to register and identify themselves before they are allowed to modify the site. (As of 2019, Wikipedia is probably the most famous of the few internet sites where anonymous visitors may make changes.) When revisiting most sites, registered users must authenticate themselves; this typically requires a password.

The password database, like other databases on a CMS, resides in a database table. The Users table shown earlier in Figure 15.21 contains the basic data we require: a user ID and a password. This example does not use hashed passwords. A properly designed website should hash its passwords.

An Example Login Process

In this example, Alice logs into a CMS-based website. The CMS performs the operation in two parts. The first part displays a web page to collect the user ID and performs a database query to collect the user’s database record. The second part displays a separate web page to collect the password and performs a separate database query to verify it.

FIGURE 15.23 follows the first part of the login process from Alice’s browser through the CMS to the database. The process starts in the upper left of the figure. After Alice navigates to the login screen the following steps occur:

  1. Alice types her user ID “alice” into the login screen and presses “Enter.” The browser has collected Alice’s keystrokes. When it receives the “Enter” key, it starts processing the user ID.

  2. The browser constructs a POST command to transmit Alice’s user ID to the server. The POST command executes the script “checkuserid.php” and passes the value “alice” for the argument userid.

    Although not shown here, the POST command also carries a cookie that ties this POST command to the password Alice will enter later.

  3. The server receives the POST command and runs the “checkuserid” script. This script takes the argument value assigned to userid and inserts it into a database SELECT command. The script sends the SELECT command to the DBMS.

  4. The DBMS executes the SELECT command. The WHERE clause locates Alice’s user record. The DBMS saves the result in a temporary table named Login12. If Login12 contains no records, then the login fails. This happens if the SELECT command doesn’t find any records with a matching user ID.

FIGURE 15.23 Alice logs into a CMS-based website.

If the Login12 table contains Alice’s record, then the CMS continues with the second part of the login: It sends Alice’s client the password page. The page provides a cookie that associates Alice’s browser with the Login12 temporary database table.

The CMS performs the same general steps a second time to check Alice’s password. The password travels via a POST command and the CMS embeds it in a SELECT command. The CMS uses the cookie from Alice’s POST command to retrieve the Login12 table and complete the login check.
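The two-part login can be sketched as follows. The function names, the sessions dictionary, the Password field name, and the sample password are assumptions for illustration, and SQLite stands in for the site's DBMS. Like the example in the text, the sketch stores the password unhashed. Unlike the scripts described above, it passes user text to the DBMS as bound parameters rather than pasting it into the SELECT command, and the temporary table name is generated by the server.

```python
import itertools
import secrets
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (UserID TEXT PRIMARY KEY, Password TEXT)")
conn.execute("INSERT INTO Users VALUES ('alice', 'correct horse')")

sessions = {}                        # cookie value -> temporary table name
table_numbers = itertools.count(12)  # starts at 12 to echo "Login12"

def check_userid(userid):
    """Part one: copy the user's record to a temporary table, return a cookie."""
    table = f"Login{next(table_numbers)}"  # server-generated, never user text
    conn.execute(f"CREATE TEMP TABLE {table} (UserID TEXT, Password TEXT)")
    cursor = conn.execute(
        f"INSERT INTO {table} SELECT UserID, Password FROM Users WHERE UserID = ?",
        (userid,),
    )
    if cursor.rowcount == 0:
        return None                  # no matching record: the login fails
    cookie = secrets.token_hex(16)
    sessions[cookie] = table
    return cookie

def check_password(cookie, password):
    """Part two: use the cookie to find the temporary table, then compare."""
    table = sessions.get(cookie)
    if table is None:
        return False
    row = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE Password = ?", (password,)
    ).fetchone()
    return row[0] == 1

cookie = check_userid("alice")
```

Here the cookie carries a random session identifier and the server keeps the table name, which is the indirect arrangement most systems use.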

15.4.3 Command Injection Attacks

Although a CMS is naturally vulnerable to the same attacks as any server or other software, it also faces a vulnerability inherent in the basic design. The CMS takes text from a URL and passes it as arguments to scripts. The script interpreter takes that information and interprets it according to the language’s rules. A CMS script often produces output in the form of SQL commands that are passed to the DBMS.

Most programs examine and rearrange data as little as possible. This yields smaller and faster programs. For example, a web form collects a user name. The CMS puts the user name into an SQL query to retrieve the user’s information. In a perfect world, we pass the user name from the URL to the SQL command with no extra muss or fuss.

In practice, however, this opens a serious vulnerability called a command injection attack. In such an attack, the attacker provides additional text in a web page field. This extra text is cleverly designed to look either like commands in the CMS’s own interpreted language (a PHP or Perl command, for example) or SQL statements to be interpreted by the DBMS.

A Password-Oriented Injection Attack

Eve will now masquerade as Alice by using a command injection attack. We use the same CMS login process described in the previous section. As in that example, the site does not hash its passwords.

When Eve performs the first step of the login process, she provides Alice’s user ID. The process unfolds exactly as shown in Figure 15.23. The server creates the temporary Login12 table and awaits Alice’s password. FIGURE 15.24 follows the command injection attack as it travels from the client to the DBMS:

  1. The website prompts for Alice’s password. Instead of typing in a password, Eve types in a specially crafted text string. The first letter “x” represents a guess at Alice’s password, but it won’t matter what password Eve types there.

  2. The browser takes Eve’s text string and embeds it in a POST command, assigned to the pw argument. The actual text looks like code because the browser encodes blanks and URL-like punctuation. The “%27” represents a single quote character and the “+” signs stand in for individual space characters. The POST command also carries a cookie to tell the server which table contains its login record. Most systems do this indirectly; they put a session identifier in the cookie and store the table identifier with other session information.

  3. The CMS converts the POST command into a SELECT command to check Alice’s password. Eve’s text string appears as the underlined text. The CMS sends the command to the DBMS.

  4. The DBMS runs the SELECT command. Eve’s specially crafted text string recasts the WHERE clause into an expression that is always true, regardless of Alice’s password.

FIGURE 15.24 Login masquerade using a command injection attack.
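The encoding in step 2 can be reproduced with Python's standard urllib.parse module. The string below is an assumed stand-in for Eve's text, not a copy of the one in the figure; it begins with “x” and contains the single quotes and blanks that the step describes.

```python
from urllib.parse import parse_qs, urlencode

typed = "x' OR 'a'='a"            # hypothetical injected "password"

# What the browser places in the body of the POST command.
body = urlencode({"pw": typed})

# What the server-side script recovers before building its SELECT command.
decoded = parse_qs(body)["pw"][0]

print(body)     # pw=x%27+OR+%27a%27%3D%27a
print(decoded)  # x' OR 'a'='a
```

The encoding protects the HTTP message format only. The server decodes the text back to exactly what Eve typed, single quotes included.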

This type of attack subverts the Chain of Control by tricking the DBMS into interpreting part of the password as an SQL expression. We saw how a buffer overflow redirects the CPU to execute the wrong instructions (Section 2.3.1); here, the command injection attack redirects the DBMS to perform the wrong password comparison.

Inside the Injection Attack

FIGURE 15.25 shows the format of the SELECT command used to verify Alice’s password. The FROM clause is filled in from cookie information. The WHERE is completed with the password text.

FIGURE 15.25 An SQL command injection vulnerability.

If the server hashes passwords, it performs the hash before inserting the value into the SELECT command. It is not unusual for a CMS to omit password hashing; some programmers believe it’s safest to handle passwords as little as possible. Imprudence leads to disaster here.

Eve’s injection attack exploits a feature of the SQL WHERE clause: The clause may perform a simple comparison between a single field and a data value, or it may make several comparisons and use the combined results. Eve’s attack transforms the WHERE clause from a simple password comparison into a two-part comparison.

On the left of FIGURE 15.26, we see the simple comparison that happens when Alice types her actual password. On the right, we see the two-part comparison that Eve injects with the typed password.

FIGURE 15.26 A password that always matches in SQL.

The two-part comparison uses the “OR” operation to combine the results of two different comparisons. The first part compares the password against an arbitrary value and will usually be logically false—but the second part of the two-part comparison is always logically true. The “OR” operation combines the two comparisons; if either is true, then the comparison yields a match.
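The difference between the two comparisons can be demonstrated with a deliberately unsafe sketch. The Login12 table name comes from the text; the field names, the stored password, and the injected string are assumptions, and SQLite stands in for the site's DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Login12 (UserID TEXT, Password TEXT)")
conn.execute("INSERT INTO Login12 VALUES ('alice', 'correct horse')")

def vulnerable_check(typed_password):
    """DELIBERATELY UNSAFE: pastes the typed text straight into the command."""
    query = "SELECT * FROM Login12 WHERE Password = '" + typed_password + "'"
    return query, len(conn.execute(query).fetchall()) > 0

# Simple comparison: only the real password matches.
_, real_ok = vulnerable_check("correct horse")
_, wrong_ok = vulnerable_check("wrong")

# Two-part comparison: the quotes in the typed text close the string early,
# and the trailing OR clause is true for every record.
injected_query, injected_ok = vulnerable_check("x' OR 'a'='a")

print(injected_query)
# SELECT * FROM Login12 WHERE Password = 'x' OR 'a'='a'
```

The DBMS has no way to tell that the OR came from the password field, because by the time the command arrives it is a single string of SQL text.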

This example uses a very simple technique to pass the temporary table name from the first part of the login to the second part: It places the table name in a cookie. The second step copies the table name from the cookie into the SELECT statement.

This simple approach yields another avenue for command injection. The semicolon character (“;”) marks the end of a command in SQL. Assume that Eve can intercept the HTTP text containing the password, modify it, and send it on its way. She might then attack successfully by appending a semicolon to the table name in the cookie. If the semicolon is copied into the SELECT command, SQL will treat it as the end of the statement and may signal success. The dangling WHERE clause may or may not produce an error, depending on the DBMS.

Resisting Website Command Injection

This particular example of command injection relied on plaintext passwords. If the CMS used password hashing, the command injection itself would have been scrambled, defeating the injection.
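The scrambling effect is easy to see in a sketch. The code below still builds the command by pasting text together, which remains poor practice, but it hashes the typed text first. The table layout, the stored password, and the injected string are assumptions. Unsalted SHA-256 is used only to keep the example short; a real site would choose a salted, deliberately slow password hash.

```python
import hashlib
import sqlite3

def hash_password(password):
    """Return a hex digest; the output contains only the characters 0-9 and a-f."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Login12 (UserID TEXT, PasswordHash TEXT)")
conn.execute(
    "INSERT INTO Login12 VALUES (?, ?)", ("alice", hash_password("correct horse"))
)

def hashed_check(typed_password):
    """Hash first, then paste the hash (not the typed text) into the command."""
    digest = hash_password(typed_password)
    query = "SELECT * FROM Login12 WHERE PasswordHash = '" + digest + "'"
    return len(conn.execute(query).fetchall()) > 0

injected = "x' OR 'a'='a"
injected_digest = hash_password(injected)

real_ok = hashed_check("correct horse")
injected_ok = hashed_check(injected)   # the quotes never reach the DBMS
```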

Large websites, however, may have hundreds or thousands of other text fields. Hashing passwords won’t prevent command injections in those other fields. A clever command injection may retrieve any information from the site’s database, posing serious privacy risks for customers as well as security risks for the site.

Input Validation

Most CMS software tries to eliminate command injections through careful software design. The CMS scripts try to carefully analyze all text typed in by a user to ensure that it carries no hidden threats. These strategies include the following:

  • Avoid embedding any typed-in text directly into a command or script statement. Store the text in a variable and refer to the variable instead.

  • If text must be used in a script, scan it to identify and remove any “escape” characters that might be interpreted by the script language. For example, the password in Figure 15.24 includes single-quote characters that alter the meaning of the SQL statement.

  • If users may post messages to the site (e.g., comments on a blog post), be sure to review the text for risky content. For example, a comment could contain or refer to a malicious script. A user’s browser will then execute the script when viewing the comment. This is a type of XSS attack. Some sites permit users to use a subset of HTML for formatting a comment and filter out all other HTML tags.

  • Be cautious about files uploaded by users. A user could upload a file containing a malicious program; it is much easier to trick the CMS into executing a local file than a remote one.
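Two of these strategies can be sketched briefly. The first function follows the advice to refer to a variable: the command contains a “?” placeholder and the typed text travels to the DBMS separately, so it is never interpreted as SQL. The second escapes a user's comment so that a browser displays any tags as text. The table, function names, and sample strings are assumptions, with SQLite and Python's html module standing in for whatever the CMS actually uses.

```python
import html
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Login12 (UserID TEXT, Password TEXT)")
conn.execute("INSERT INTO Login12 VALUES ('alice', 'correct horse')")

def safe_check(typed_password):
    """Pass the typed text as a bound parameter, not as part of the command."""
    rows = conn.execute(
        "SELECT * FROM Login12 WHERE Password = ?", (typed_password,)
    ).fetchall()
    return len(rows) > 0

def render_comment(text):
    """Escape <, >, &, and quotes so the comment cannot inject HTML or scripts."""
    return "<p>" + html.escape(text) + "</p>"

real_ok = safe_check("correct horse")
injected_ok = safe_check("x' OR 'a'='a")   # compared as an ordinary string
comment_html = render_comment("<script>alert(1)</script>")

print(comment_html)
# <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
```

Escaping everything is the strictest policy. A site that permits a subset of HTML in comments needs a filter that keeps the allowed tags and escapes or drops the rest.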

Site security also depends on the integrity of the CMS scripts. An attacker can penetrate the site easily if the operator uploads scripts from an untrustworthy source. Most CMS packages use add-on modules produced by third parties to provide special features. Be sure that the module authors are reputable and trustworthy.

15.4.4 “Top 10” Web Application Security Risks

The Open Web Application Security Project (OWASP) is a nonprofit organization that develops and shares information and tools about web security. Started in 2001, OWASP has produced tools for web testing, attack demonstration, and vulnerability management. OWASP also publishes guides, standards, and models for security testing and intrusion detection.

Since 2004, OWASP has published a list of the top 10 web application security risks. At the time of this edition, the most recent list was published in 2017. Here is the list of risks:

  1. Injection: This includes both SQL injection described earlier in this section, and other types of command injection. The “finger” vulnerability described in Section 2.3.1 took advantage of an injection risk.

  2. Broken authentication: Authentication and session management are hard to implement correctly. Errors can allow attackers to compromise authentication data and masquerade as a legitimate user.

  3. Sensitive data exposure: Errors in data handling could leak sensitive data. Applications handling financial, medical, or other personally identifiable information must take special care to handle the data securely.

  4. XML external entities (XXE): The application must not retrieve and process XML entities from users or other untrustworthy sources. An XML entity could cause the application to unintentionally access internal file shares, scan ports, or even execute code embedded in the entity. This can leak sensitive data, breach system integrity, or lead to denial of service attacks.

  5. Broken access control: In an ideal world, web applications implement Least Privilege to limit the capabilities of authorized users. In practice, users may have unnecessarily broad access rights. Attackers can exploit this to retrieve other users’ data or perform unauthorized operations.

  6. Security misconfiguration: This is a very common problem. The web application may rely on default configuration settings that provide inadequate protection. Ad hoc changes may grant more access than expected and open possible attack vectors. All components must be securely configured from the start, and kept patched and up-to-date in a timely fashion.

  7. XSS: This attack vector arises if the application constructs a web page from user-supplied data that might include HTML and JavaScript text. When a web browser visits the page, the browser executes the script, which could hijack the user’s browser session, deface websites, or redirect the user to malicious websites.

  8. Insecure deserialization: This is similar to the XXE risk. Serialization and deserialization are processes for encoding and decoding data objects for transmitting over a network. The application might not control the sources of serialized data, and improperly serialized data could yield remote code execution.

  9. Using components with known vulnerabilities: Software modules from libraries, frameworks, and packages run with the same privileges as the application. If any modules have vulnerabilities, the application inherits those vulnerabilities.

  10. Insufficient logging and monitoring: If a site doesn’t adequately monitor for security incidents, then attacks will go undetected. This gives attackers more time to more thoroughly penetrate the site and extract or tamper with more data. Breaches are often detected by outside parties because of inadequate monitoring.

The OWASP Top 10 document describes these risks in greater detail. It also provides links to strategies to address those risks. These documents are easily found on the web.
