Structure of the Data Directory

The MySQL data directory contains all of the databases and tables managed by the server. These are organized into a tree structure that is implemented in straightforward fashion by taking advantage of the hierarchical structure of the UNIX or Windows file systems:

  • Each database corresponds to a directory under the data directory.

  • Tables within a database correspond to files in the database

The data directory also contains several status files that are generated by the server, such as the log files. These files provide important information about the server's operation and are valuable for administrators, especially when something goes wrong and you're trying to determine the cause of the problem. If some particular query kills the server, for example, you can identify the offending query by examining the log files.

How the MySQL Server Provides Access to Data

Everything under the data directory is managed by a single entity, the MySQL server mysqld. Client programs never manipulate data directly. Instead, the server provides the sole point of contact though which databases are accessed, acting as the intermediary between client programs and the data they wish to use. (See Figure 10.1.)

Figure 10.1. How the MySQL server controls access to the data directory.


When the server starts up, it opens the log files if any were requested, then presents a network interface to the data directory by listening for network connections. To access data, client programs establish a connection to the server, then communicate requests as SQL queries to perform the desired operations (for example, creating a table, selecting records, updating records). The server performs each operation and sends back the result to the client. The server is multi-threaded and can service multiple simultaneous client connections. However, because update operations are performed one at a time, the practical effect is to serialize requests so that two clients can never change the same record at exactly the same time.

Under normal conditions, having the server act as the sole arbiter of database access provides assurance against the kinds of corruption that can result from multiple processes accessing the database tables at the same time. Nevertheless, administrators should be aware that there are times when the server does not have exclusive control of the data directory:

  • When you run multiple servers on a single data directory. Normally you run a single server to manage all databases on a host, but it's possible to run multiple servers. If this is done to provide access to multiple independent data directories, there is no problem of interaction. But it's possible to start multiple servers and point them at the same data directory. In general, this is not a good idea. If you try it, you'd better make sure your system provides good file locking or the servers will not cooperate properly. You also risk having your log files become a source of confusion (rather than a source of helpful information) if you have multiple servers writing to them at the same time.

  • When you run isamchk and myisamchk. The isamchk and myisamchk utilities are used for table maintenance, troubleshooting, and repair. As you might guess, because these utilities can change table contents, allowing them to operate on tables at the same time the server is doing so can cause table damage. It's important to understand how to limit this type of interaction so that you don't damage your tables. See Chapter 13, "Database Maintenance and Repair," for instructions on the proper use of these programs.

Database Representation

Each database managed by the MySQL server has its own database directory. This exists as a subdirectory of the data directory, with the same name as the database it represents. For example, a database my_db corresponds to the database directory DATADIR /my_db.

This representation allows several database-level statements to be almost trivial in their implementation. CREATE DATABASE db_name creates an empty directory db_name in the data directory, with an ownership and mode that allow access only to the MySQL server user (the UNIX user the server runs as). This is equivalent to creating the database manually by executing the following commands as the server user on the server host:

% mkdir DATADIR/db_name Create database directory
% chmod 700 DATADIR/db_name Make it accessible only to the MySQL server user

The minimal approach of representing a new database by an empty directory contrasts with some other database systems that create a number of control or system files even for an "empty" database.

The DROP DATABASE statement is implemented easily as well. DROP DATABASE db_name removes the db_name directory in the data directory, along with any table files contained within it. This is almost the same as the following command:

% rm -rfDATADIR/db_name

(The difference is that the server removes only files with extensions known to be used for tables. If you've created other files in the database directory, the server leaves them intact, and the directory itself is not removed.)

SHOW DATABASES is essentially nothing more than a list of the names of the directories located within the data directory. Some database systems keep a master table that lists all the databases maintained, but there is no such construct in MySQL. Given the simplicity of the data directory structure, the list of databases is implicit in the contents of the data directory and such a table would be unnecessary overhead.

Database Table Representation

Each table within a database exists as three files in the database directory: a form (description) file, a data file, and an index file. The basename of each file is the table name, and the filename extension indicates the file type. The extensions are shown in Table 10.1. The data and index file extensions indicate whether the table uses the older ISAM indexing or the newer MyISAM indexing.

Table 10.1. 10.1 MySQL File Types
File Type Filename Extension File Contents
Form file .frm Describes the structure of the table (its columns, column types, indexes, and so forth).
Data file .ISD (ISAM) or.MYD (MyISAM)Contains the table's data—that is, its rows.
Index file .ISM (ISAM) or.MYI (MyISAM)Contains the index trees for any indexes on the data file. The index file exists whether or not the table has any indexes.

When you issue a CREATE TABLE tbl_name statement defining a table's structure, the server creates a tbl_name .frm file containing the internal encoding of that structure. It also creates empty data and index files, initialized to contain information indicating no records and no indexes. (If the CREATE TABLE statement includes index specifications, the index file reflects those indexes.) The ownership and mode of the files representing the table are set to allow access only to the MySQL server user.

When you issue an ALTER TABLE statement, the server re-encodes tbl_name .frm and modifies the contents of the data and index files to reflect the structural change indicated by the statement. This is true for CREATE INDEX and DROP INDEX as well because they are treated by the server as equivalent ALTER TABLE statements. DROP TABLE is implemented by removing the three files representing the table.

You cannot create or alter a table manually, although you can remove a table by removing the three files in the database directory to which the table corresponds. For example, if my_db is the current database, DROP TABLE my_tbl is roughly equivalent to the following command:

% rm -f
							DATADIR/my_db/my_tbl.*

The output from SHOW TABLES my_db is just a listing of the basenames of the .frm files in the my_db database directory. Some database systems maintain a registry that lists all tables contained in a database. MySQL does not because it is unnecessary; the "registry" is implicit in the structure of the data directory.

Operating System Constraints on Database and Table Naming

MySQL has general rules for naming databases and tables:

  • Names may be constructed from alphanumeric characters in the current character set, as well as the underscore and dollar ('_' and '$').

  • Names may be up to 64 characters long.

However, because names of databases and tables correspond to names of directories and files, the operating system on which a server runs may impose additional constraints.

First, you are limited in database and table names to the characters that are legal in filenames. For example, '$' is allowed in a name by MySQL's rules, but if your OS doesn't allow it, you can't use it in directory or table names, either. In practice, this is not a concern for either UNIX or Windows. The greatest difficulty you might have is referring to names directly from the shell when performing database administration. For example, if you give a database a name such as $my_db that includes a dollar sign, any reference to that name from a shell command line may be interpreted by the shell as a variable reference:

% ls $my_db
my_db: Undefined variable.

If this happens, you must escape the '$' character or use quoting to suppress its special meaning:

% ls $my_db
% ls '$my_db'

If you use quotes, use single quotes. Double quotes do not suppress variable interpre-tation.

Second, although MySQL allows database and table names to be up to 64 characters long, the length of names is also bound by the length allowed by your operating system. Normally, this is not a problem, although under UNIX you may run into System V-ish systems that enforce an older 14-character limit. In that case, your effective limit on database names is 14 characters. The limit for table names is 10 characters because names of files representing tables end with a period and a three-character extension.

Third, case sensitivity of the underlying file system affects how you name and refer to databases and tables. If the file system is case sensitive (as in UNIX), the two names my_tbl and MY_TBL refer to different tables. If the file system is not case sensitive (as in Windows), my_tbl and MY_TBL refer to the same table. You should keep that in mind if you use a UNIX server to develop a database, and if there is a possibility you might move the database to a Windows server sometime.

Implications of Data Directory Structure for System Performance

The structure of the data directory is easy to understand because it uses the hierarchical structure of the file system in such a natural way. At the same time, this structure has certain performance implications, particularly regarding operations that open the files that represent database tables.

One consequence of the data directory structure is that because tables are represented by multiple files, each open table can require multiple file descriptors, not one. The server caches descriptors intelligently, but a busy server can easily use up lots of them servicing many simultaneous client connections or executing complex queries that reference several tables. File descriptors are a scarce resource on many systems, particularly systems that set the default per-process limit fairly low. Chapter 11, "General MySQL Administration," provides information on assessing the number of descriptors you'll need, and on reconfiguring your server or operating system if necessary.

Another effect of representing each table by its own files is that table-opening time increases with the number of tables. Operations that open tables map onto the file-opening operations provided by the operating system, and as such are bound by the efficiency of the system's directory-lookup routines. Normally this isn't an issue, but it is something to consider if you'll need large numbers of tables in a database.

For example, if you want to have 10,000 tables, your database directory will contain 30,000 files. With that many files, you may notice a slowdown due to the time taken by file-opening operations. (Linux ext2 and Solaris file systems are subject to this problem.) If this is cause for concern, you might be wise to reconsider the structure of your tables in relation to the needs of your applications and reorganize your tables accordingly. Ask whether or not you really require so many tables; sometimes applications multiply tables needlessly. An application that creates a separate table per user results in many tables, all of which have identical structures. If you wanted to combine the tables into a single table, you might be able to do so by adding another column identifying the user to which each row applies. If this significantly reduces the number of tables, the application's performance improves.

As always in database design, you must consider whether or not this particular strategy is worthwhile for a given application. Reasons not to combine tables in the manner just described are as follows:

  • Increased disk space requirements. Combining tables reduces the number of tables required (decreasing table-opening times), but adds another column (increasing disk space requirements). This is a typical time versus space tradeoff and you'd need to decide which factor is most important. If speed is paramount, you'd probably be willing to sacrifice a little extra disk space. If space is tight, it might be more acceptable to use multiple tables and live with a slight delay.

  • Security considerations. These may constrain your ability or desire to combine tables. One reason to use a separate table per user is to allow access to each table only to that user by means of table-level privileges. If you combined tables, all users' data would be in the same table.

    MySQL has no provision for restricting access to particular rows to a given user; thus, you might not be able to merge tables without compromising access control. On the other hand, if all access to the data is controlled by your application (users never connect directly to the database), you can merge the tables and use application logic to enforce row-level access to the combined result.

MySQL has its own internal limit on table sizes, but because it represents tables as files, MySQL is also bound by the maximum file size allowed by your operating system. The effective maximum table size is therefore the smaller of MySQL's internal limit and the system file size limit.

In general, the trend is for constraints on sizes to be relaxed over time. For example, IBM AIX 4.1 has a 2GB file size limit, but the limit in AIX 4.2 is approximately 64GB. The internal table size limit in MySQL increases with newer releases, too. Prior to the 3.23 series, the internal limit is 4GB. As of 3.23, the limit is approximately 9 million terabytes. Table 10.2 illustrates how the MySQL internal table size limit and the AIX file size limit interact to determine the effective maximum table size. Similar interactions may apply for other operating systems as well.

Table 10.2. 10.2 Interaction of MySQL and Operating System Size Limits
MySQL Version AIX Version Maximum Table Size Constraining Factor
MySQL 3.22.22AIX 4.12GBAIX 2GB maximum file size
MySQL 3.22.22AIX 4.24GBMySQL 4GB maximum table size
MySQL 3.23.0AIX 4.12GBAIX 2GB maximum file size
MySQL 3.23.0AIX 4.264GBAIX 64GB maximum file size

MySQL Status Files

In addition to database directories, the MySQL data directory contains a number of status files. These files are summarized in Table 10.3, then described in more detail. The default name for most of the files is generated from the server host name, denoted as HOSTNAME in the table.

The server writes its process ID (PID) into the PID file when it starts up and removes the file when it shuts down. The PID file is themeans by which the server allows itself to be found by other processes. For example, if you run the mysql.server script at system shutdown time to shut down the MySQL server, that script examines the PID file to determine which process it needs to send a termination signal to.

The error log is created by safe_mysqld as a redirection of the server's standard error output; it contains any messages the server writes to stderr. This means the error log exists only if you start the server by invoking safe_mysqld. (That is the preferred method for starting the server anyway because safe_mysqld restarts the server if it exits due to an error.)

The general log and update log are optional; you can turn on just the logging types you need, using the --log and --log-update server options.

Table 10.3. 10.3 MySQL Status Files
File Type Default Name File Contents
Process ID HOSTNAME .pid The server process ID
Error log HOSTNAME .err Startup and shutdown events and error conditions
General log HOSTNAME .log Connect/disconnect events and query information
Update log HOSTNAME .nnn Text of all queries that modify table contents or structure

The general log provides general information about server operation: who is connecting from where and what queries they are issuing. The update log provides query information, too, but only for queries that modify database contents. The contents of the update log are written as SQL statements that can be executed by providing them as input to the mysql client. Update logs are useful if you have a crash and must revert to backup files because you can repeat the updates performed since the time of the crash by feeding update logs to the server. This allows you to bring your databases up to the state they were in when the crash occurred.

Here is a sample of the kind of information that appears in the general log as the result of a short client session that creates a table in the test database, inserts a row into the table, and then drops the table:

990509  7:34:09     492 Connect    paul@localhost on test
                    492 Query      show databases
                    492 Query      show tables
                    492 Field List tbl_1
                    492 Field List tbl_2
                    …
990509  7:34:22     492 Query      CREATE TABLE my_tbl (val INT)
990509  7:34:34     492 Query      INSERT INTO my_tbl VALUES(1)
990509  7:34:38     492 Query      DROP TABLE my_tbl
990509  7:34:40     492 Quit

The general log contains columns for date and time, server thread ID, event type, and event-specific information.

The same session appears in the update log like this:

use test;
CREATE TABLE my_tbl (val INT);
INSERT INTO my_tbl VALUES(1);
DROP TABLE my_tbl;

For the update log, an extended form of logging is available using the --log-long-format option. Extended logging provides information about who issued each query and when. This uses more disk space, of course, but may be useful if you want to know who is doing what without trying to correlate update log contents with the connection events in the general log.

For the session just shown, extended update logging produces this information:

# Time: 990509  7:43:42
# User@Host: paul [paul] @ localhost []
use test;
CREATE TABLE my_tbl (val INT);
# User@Host: paul [paul] @ localhost []
INSERT INTO my_tbl VALUES(1);
# Time: 990509  7:43:43
# User@Host: paul [paul] @ localhost []
DROP TABLE my_tbl;

It's a good idea to make sure your log files are secure and not readable by arbitrary users. The general and update logs both may contain sensitive information such as passwords because they contain the text of queries. Here's the type of log entry you don't want just anyone to be able to read because it displays the password for the root user:

990509  7:47:24       4 Query      UPDATE user SET Password=PASSWORD("secret")
                                   WHERE user="root"

For information on checking and setting your data directory permissions, see Chapter 12, "Security." The short instructions for securing the data directory consist of the following command:

%chmod 700DATADIR

Run this command as the UNIX user who owns the data directory. Make sure the server runs as that user, too, or the command not only will keep other people out of the data directory (which you want), but it will prevent the server from accessing your databases (which you don't want!).

Status files appear at the top level of the data directory, just like database directories, so you may wonder whether names of those files can conflict with or be mistaken for database names (for example, when the server is executing a SHOW DATABASES statement). The answer is no. Status and log information is stored in files, and databases are directories, so executable programs can distinguish them with a simple stat() call. (That's how the server tells them apart.) If you're looking through the data directory yourself, you can distinguish status files from database directories by using ls -l and examining the first character of the mode information to see whether it's a '-' or a 'd':

% ls -lDATADIR
total 31
drwxrwx---  1 mysqladm     mysqlgrp      1024 May  8 13:22 bigdb
drwxrwx---  2 mysqladm     mysqlgrp      1024 Dec 15 22:34 mysql
-rw-rw----  1 mysqladm     mysqlgrp        69 May  9 20:11 pit-viper.001
-rw-rw-r--  1 mysqladm     mysqlgrp     24168 May  9 20:11 pit-viper.err
-rw-rw----  1 mysqladm     mysqlgrp      4376 May  9 20:11 pit-viper.log
-rw-r--r--  1 mysqladm     mysqlgrp         5 May  9 20:11 pit-viper.pid
drwxrwx---  7 mysqladm     mysqlgrp       512 Sep 10  1998 sql-bench
drwxrwx---  2 mysqladm     mysqlgrp       512 May  9 07:44 test

You can also tell simply by looking at names: All status file names contain a period, whereas no database directory name does (period is not a legal character in a database name).

For information on log file maintenance and rotation techniques, see Chapter 11.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset