Data Operations Management is the fourth Data Management Function in the data management framework shown in Figures 1.3 and 1.4. It is the third data management function that interacts with and is influenced by the Data Governance function. Chapter 6 defines the data operations management function and explains the concepts and activities involved in data operations management.
6.1 Introduction
Data operations management is the development, maintenance, and support of structured data to maximize the value of the data resources to the enterprise. Data operations management includes two sub-functions: database support and data technology management.
The goals of data operations management include:
The context diagram for data operations management is shown in Figure 6.1.
6.2 Concepts and Activities
Chapter 1 stated that data operations management is the function of providing support from data acquisition to data purging. Database administrators (DBAs) play a key role in this critical function. The concepts and activities related to data operations management and the roles of database administrators are presented in this section.
6.2.1 Database Support
Database support is at the heart of data management, and is provided by DBAs. The role of DBA is the most established and most widely adopted data professional role, and database administration practices are perhaps the most mature of all data management practices. DBAs play the dominant role in data operations management, as well as in Data Security Management (see Chapter 7). As discussed in Chapter 5, DBAs also play critical roles in Data Development, particularly in physical data modeling and database design, as well as support for development and test database environments.
In fact, many DBAs specialize as Development DBAs or Production DBAs. Development DBAs focus on data development activities, while Production DBAs perform data operations management activities. In some organizations, each specialized role reports to different organizations within IT. Production DBAs may be part of a production infrastructure and operations support group. Development DBAs and / or production DBAs are sometimes integrated into application development organizations.
Figure 6.1 Data Operations Management Context Diagram
Production DBAs take primary responsibility for data operations management, including:
The Production DBA is responsible for the following primary deliverables:
DBAs do not perform all the activities of data operations management exclusively. Data stewards, data architects, and data analysts participate in planning for recovery, retention, and performance. Data stewards, data architects, and data analysts may also participate in obtaining and processing data from external sources
6.2.1.1 Implement and Control Database Environments
Database systems administration includes the following tasks:
Prepare checklists to ensure these tasks are performed at a high level of quality. These checklists lay out the steps involved. The work of one DBA should be audited by another DBA before the changes go into production.
The DBA is the custodian of all database changes. While many parties may request changes, the DBA defines the precise changes to make to the database, implements the changes, and controls the changes. DBAs should use a controlled, documented, and auditable process for moving application database changes to the Quality Assurance or Certification (QA) and Production environments, in part due to Sarbanes-Oxley and other regulatory requirements. A manager-approved service request or change request usually initiates the process. In most cases, the DBA should have a back out plan to reverse changes in case of problems.
Test all changes to the QA environment in the development / test environment, first, and test all changes to production, except for emergency changes, in the QA environment. While Development DBAs control changes to development / test environments, Production DBAs control changes to production environments, as well as usually controlling QA environments.
6.2.1.2 Obtain Externally Sourced Data
Most organizations obtain some data from external third-party sources, such as lists of potential customers purchased from an information broker, or product data provided by a supplier. The data is either licensed or provided free of charge; is provided in a number of different formats (CD, DVD, EDI, XML, RSS feeds, text files); and is a one-time-only or regularly updated via a subscription service. Some acquisitions require legal agreements.
A managed approach to data acquisition centralizes responsibility for data subscription services with data analysts. The data analyst will need to document the external data source in the logical data model and data dictionary. A developer may design and create scripts or programs to read the data and load it into a database. The DBA will be responsible for implementing the necessary processes to load the data into the database and / or make it available to the application.
6.2.1.3 Plan for Data Recovery
Data governance councils should establish service level agreements (SLAs) with IT data management services organizations for data availability and recovery. SLAs set availability expectations, allowing time for database maintenance and backup, and set recovery time expectations for different recovery scenarios, including potential disasters.
DBAs must make sure a recovery plan exists for all databases and database servers, covering all possible scenarios that could result in loss or corruption of data. This includes, but is not limited to:
Management and the organization’s business continuity group, if one exists, should review and approve the data recovery plan. The DBA group must have easy access to all data recovery plans.
Keep a copy of the plan, along with all necessary software, such as the software needed to install and configure the DBMS, instructions, and security codes, such as the administrator password, in a secure, off-site location in the event of a disaster. Backups of all databases should be kept in a secure, off-site location.
6.2.1.4 Backup and Recover Data
Make regular backups of databases and, for OLTP databases, the database transaction logs. The SLA for the database should include an agreement with the data owners as to how frequently to make these backups. Balance the importance of the data against the cost of protecting it. For large databases, frequent backups can consume large amounts of disk storage and server resources. At least once a day, make a complete backup of each database.
Furthermore, databases should reside on some sort of managed storage area, ideally a RAID array on a storage area network or SAN, with daily back up to tape. For OLTP databases, the frequency of transaction log backups will depend on the frequency of updating, and the amount of data involved. For frequently updated databases, more frequent log dumps will not only provide greater protection, but will also reduce the impact of the backups on server resources and applications. Backup files should be kept on a separate file system from the databases, and should be backed up to tape, or some separate storage medium, daily. Store copies of the daily backups in a secure off-site facility.
For extremely critical data, the DBA will need to implement some sort of replication scheme in which data moves to another database on a remote server. In the event of database failure, applications can then “fail over” to the remote database and continue processing. Several different replication schemes exist, including mirroring and log shipping. In mirroring, updates to the primary database are replicated immediately (relatively speaking) to the secondary database, as part of a two-phase commit process. In log shipping, a secondary server receives and loads copies of the primary database’s transaction logs at regular intervals. The choice of replication method depends on how critical the data is, and how important it is that failover to the secondary server be immediate. Mirroring is usually a more expensive option than log shipping. For one secondary server, use mirroring; use log shipping to update additional secondary servers.
Other data protection options include server clustering, in which databases on a shared disk array can failover from one physical server to another, and server virtualization, where the failover occurs between virtual server instances residing on two or more physical machines.
Most DBMSs support hot backups of the database - backups taken while applications are running. When some updates occur in transit, they will roll either forward to completion, or roll back when the backup reloads. The alternative is a cold backup taken when the database is off-line. However, this may not be a viable option if applications need to be continuously available.
The DBA will also, when necessary, recover lost or damaged databases by reloading them from the necessary database and transaction log backups to recover as much of the data as possible.
6.2.1.5 Set Database Performance Service Levels
Database performance has two facets - availability and performance. Performance cannot be measured without availability. An unavailable database has a performance measure of zero.
SLAs between data management services organizations and data owners define expectations for database performance. Typically, the agreement will identify an expected timeframe of database availability, and a select few application transactions (a mix of complex queries and updates), each with a specified maximum allowable execution time during identified availability periods. If process execution times consistently exceed the SLA, or database availability is not consistently compliant with the SLA, the data owners will ask the DBA to identify the source of the problem and take appropriate remedial action.
Availability is the percentage of time that a system or database can be used for productive work. Availability requirements are constantly increasing, raising the business risks and costs of unavailable data. Activities to ensure availability are increasingly performed in shrinking maintenance windows.
Four related factors affect availability:
Many things may cause a loss of database availability, including:
DBAs are responsible for doing everything possible to ensure databases stay online and operational, including:
6.2.1.6 Monitor and Tune Database Performance
DBAs optimize database performance both proactively and reactively, by monitoring performance and by responding to problems quickly and competently. Most DBMSs provide the capability of monitoring performance, allowing DBAs to generate analysis reports. Most server operating systems have similar monitoring and reporting capabilities. DBAs should run activity and performance reports against both the DBMS and the server on a regular basis, including during periods of heavy activity. They should compare these reports to previous reports to identify any negative trends and save them to help analyze problems over time.
Data movement may occur in real time through online transactions. However, many data movement and transformation activities are performed through batch programs, which may be Extract-Transform-Load (ETL) programs or limited to one system internally. These batch jobs must complete within specified windows in the operating schedule. DBAs and data integration specialists monitor the performance of batch data jobs, noting exceptional completion times and errors, determining the root cause of errors, and resolving these issues.
When performance problems occur, the DBA should use the monitoring and administration tools of the DBMS to help identify the source of the problem. A few of the most common possible reasons for poor database performance are:
After the cause of the problem is identified, the DBA will take whatever action is needed to resolve the problem, including working with application developers to improve and optimize the database code, and archiving or deleting data that is no longer actively needed by application processes.
In exceptional cases, the DBA may consider working with the data modeler to de-normalize the affected portion of the database. Do this only after other measures, such as the creation of views and indexes, and the rewriting of SQL code, have been tried; and only after careful consideration of the possible consequences, such as loss of data integrity and the increase in complexity of SQL queries against de-normalized tables. This caveat applies only to OLTP databases. For read-only reporting and analytical databases, de-normalization for performance and ease of access is the rule rather than the exception, and poses no threat or risk.
6.2.1.7 Plan for Data Retention
One important part of the physical database design is the data retention plan. Discuss data retention with the data owners at design time, and reach agreement on how to treat data over its useful life. It is incorrect to assume that all data will reside forever in primary storage. Data that is not actively needed to support application processes should be archived to some sort of secondary storage on less-expensive disk, or tape, or a CD / DVD jukebox, perhaps on a separate server. Purge data that is obsolete and unnecessary, even for regulatory purposes. Some data may become a liability if kept longer than necessary. Remember that one of the principal goals of data management is that the cost of maintaining data should not exceed its value to the organization.
6.2.1.8 Archive, Retain, and Purge Data
The DBAs will work with application developers and other operations staff, including server and storage administrators, to implement the approved data retention plan. This may require creating a secondary storage area, building a secondary database server, replicating less-needed data to a separate database, partitioning existing database tables, arranging for tape or disk backups, and creating database jobs which periodically purge unneeded data.
6.2.1.9 Support Specialized Databases
Do not assume that a single type of database architecture or DBMS works for every need. Some specialized situations require specialized types of databases. Manage these specialized databases differently from traditional relational databases. For example, most Computer Assisted Design and Manufacturing (CAD / CAM) applications will require an Object database, as will most embedded real-time applications. Geospatial applications, such as MapQuest, make use of specialized geospatial databases. Other applications, such as the shopping-cart applications found on most online retail web sites, make use of XML databases to initially store the customer order data. This data is then copied into one or more traditional OLTP databases or data warehouses. In addition, many off-the-shelf vendor applications may use their own proprietary databases. At the very least, their schemas will be proprietary and mostly concealed, even if they sit on top of traditional relational DBMSs.
Administration of databases used only to support a particular application should not present any great difficulty. The DBA will mostly be responsible for ensuring regular backups of the databases and performing recovery tests. However, if data from these databases needs to be merged with other existing data, say in one or more relational databases, it may present a data integration challenge. These considerations should be discussed and resolved whenever such databases are proposed or brought into the organization.
6.2.2 Data Technology Management
DBAs and other data professionals manage the technology related to their field. Managing data technology should follow the same principles and standards for managing any technology.
The leading reference model for technology management is the Information Technology Infrastructure Library (ITIL), a technology management process model developed in the United Kingdom. ITIL principles apply to managing data technology. For more information, refer to the ITIL website, http://www.itil-officialsite.com.
6.2.2.1 Understand Data Technology Requirements
It is important to understand not only how technology works, but also how it can provide value in the context of a particular business. The DBA, along with the rest of the data services organization, should work closely with business users and managers to understand the data and information needs of the business. This will enable them to suggest the best possible applications of technology to solve business problems and take advantage of new business opportunities.
Data professionals must first understand the requirements of a data technology before determining what technical solution to choose for a particular situation. These questions are a starting point for understanding suitability of a data technology and are not all-inclusive.
6.2.2.2 Define the Data Technology Architecture
Data technology is part of the enterprise’s overall technology architecture, but it is also often considered part of its data architecture.
Data technology architecture addresses three basic questions:
Data technologies to be included in the technology architecture include:
Technology architecture components are sometimes referred to as “bricks”. Several categories or views representing facets of data technology bricks are:
The technology road map for the organization consists of these reviewed, approved, and published bricks, and this helps govern future technology decisions.
It is important to understand several things about technology:
After the necessary discussions with the business users and managers, the data services group can summarize the data technology objectives for the business in the form of a strategic roadmap that can be used to inform and direct future data technology research and project work.
6.2.2.3 Evaluate Data Technology
Selecting appropriate data related technology, particularly the appropriate database management technology, is an important data management responsibility. Management selects data technology to meet business needs, including total cost, reliability, and integration.
Selecting data technology involves business data stewards, DBAs, data architects, data analysts, other data management professionals, and other IT professionals. Data technologies to be researched and evaluated include:
In addition, data professionals may have unique requirements for tools used in other fields, including:
Make selection decisions using a standard technology evaluation process and applying the decision analysis concepts defined by Kepner and Tregoe in The Rational Manager. List alternatives and compare them against a defined set of weighted decision criteria, including feature requirements and functional objectives. The basic method includes the following steps:
Selecting strategic DBMS software is particularly important. DBMS software has a major impact on data integration, application performance, and DBA productivity. Some of the factors to consider when selecting DBMS software include:
The DBA will need to assist in evaluating technology alternatives. A number of factors come into play here:
The DBA will need to carefully test each candidate product to determine its strengths and weaknesses, ease of implementation and use, applicability to current and future business needs and problems, and whether it lives up to the vendor’s hype.
6.2.2.4 Install and Administer Data Technology
The DBAs face the work of deploying new technology products in development / test, QA / certification, and production environments. They will need to create and document processes and procedures for administering the product with the least amount of effort and expense. Remember that the expense of the product, including administration, licensing, and support must not exceed the product’s value to the business. Remember also that the purchase of new products, and the implementation of new technology, will probably not be accompanied by an increase in staffing, so the technology will need to be, as much as possible, self-monitoring and self-administering.
Also, remember that the cost and complexity of implementing new technology is usually under-estimated, and the features and benefits are usually over-estimated. It is a good idea to start with small pilot projects and proof-of-concept (POC) implementations, to get a good idea of the true costs and benefits before proceeding with a full-blown production implementation.
6.2.2.5 Inventory and Track Data Technology Licenses
Organizations must comply with all licensing agreements and regulatory requirements. Carefully track and conduct yearly audits of software license and annual support costs, as well as server lease agreements and other fixed costs. Being out-of-compliance with licensing agreements poses serious financial and legal risks for an organization.
This data can also determine the total cost-of-ownership (TCO) for each type of technology and technology product. Regularly evaluate technologies and products that are becoming obsolete, unsupported, less useful, or too expensive.
6.2.2.6 Support Data Technology Usage and Issues
When a business need requires new technology, the DBAs will work with business users and application developers to ensure the most effective use of the technology, to explore new applications of the technology, and to address any problems or issues that surface from its use.
DBAs and other data professionals serve as Level 2 technical support, working with help desks and technology vendor support to understand, analyze, and resolve user problems.
The key to effective understanding and use of any technology is training. Organizations should make sure they have an effective training plan and budget in place for everyone involved in implementing, supporting, and using data and database technology. Training plans should include appropriate levels of cross training to better support application development, especially Agile development. DBAs should have, and take the opportunity to learn, application development skills such as class modeling, use-case analysis, and application data access. Developers should learn some database skills, especially SQL coding!
6.3 Summary
The guiding principles for implementing data operations management into an organization, a summary table of the roles for each data operations management activity, and organization and cultural issues that may arise during data operations management are summarized below.
6.3.1 Guiding Principles
In his book Database Administration, Craig Mullins offers DBAs the following rules of thumb for data operations management:
6.3.2 Process Summary
The process summary for the data operations management function is shown in Table 6.1. The deliverables, responsible roles, approving roles, and contributing roles are shown for each activity in the data operations management function. The Table is also shown in Appendix A9.
Activities |
Deliverables |
Responsible Roles |
Approving Roles |
Contributing Roles |
4.1.1 Implement and Control Database Environments |
Production database environment maintenance, managed changes to production databases, releases |
DBAs |
DM Executive |
System programmers, data stewards, data analysts, software developers, project managers |
4.1.2 Acquire Externally Sourced Data (O) |
Externally sourced data |
DBAs, data analysts, data stewards |
Data Governance Council |
Data stewards, data analysts |
4.1.3 Plan for Data Recovery (P) |
Data availability SLAs, data recovery plans |
DBAs |
DM Executive, Data Governance Council |
|
4.1.4 Backup and Recover Data (O) |
Database backups and logs,restored databases,business continuity |
DBAs |
DM Executive |
|
4.1.5 Set Database Performance Service Levels (P) |
Database performance SLAs |
DBAs |
DM Executive, Data Governance Council |
|
4.1.6 Monitor and Tune Database Performance (O) |
Database performance reporting, Database performance |
DBAs |
||
4.1.7 Plan for Data Retention (P) |
Data retention plan, storage management procedures |
DBAs |
DM Executive |
Storage management specialists |
4.1.8 Archive, Retrieve and Purge Data (O) |
Archived data, retrieved data, purged data |
DBAs |
DM Executive |
|
4.1.9 Manage Specialized Databases (O) |
Geospatial databases, CAD / CAM databases, XML databases, object databases |
DBAs |
DM Executive |
Data stewards, Subject matter experts |
4.2.1 Understand Data Technology Requirements (P) |
Data technology requirements |
Data architect, DBAs |
DM Executive |
Data stewards, other IT professionals |
4.2.2 Define the Database Architecture (P) (same as 2.3) |
Data technology architecture |
Data architect |
DM Executive, Data Governance Council |
DBAs, data analysts, data stewards |
4.2.3 Evaluate Data Technology (P) |
Tool evaluation findings, tool selection decisions |
Data analysts, DBAs |
DM Executive, Data Governance Council |
Data stewards, other IT professionals |
4.2.4 Install and Administer Data Technology (O) |
Installed technology |
DBAs |
DM Executive |
Data analysts, other data professionals |
4.2.5 Inventory and Track Data Technology Licenses (C) |
License inventory |
DBAs |
DM Executive |
Other data professionals |
4.2.6 Support Data Technology Usage and Issues (O) |
Identified and resolved technology issues |
DBAs |
DM Executive |
Other data professionals |
Table 6.1 Data Operations Management Process Summary
6.3.3 Organizational and Cultural Issues
Q1: What are common organizational and cultural obstacles to database administration?
A1: DBAs often do not effectively promote the value of their work to the organization. They need to recognize the legitimate concerns of data owners and data consumers, balance short-term and long-term data needs, educate others in the organization about the importance of good data management practices, and optimize data development practices to ensure maximum benefit to the organization and minimal impact on data consumers. By regarding data work as an abstract set of principles and practices, and disregarding the human elements involved, DBAs risk propagating an “us versus them” mentality, and being regarded as dogmatic, impractical, unhelpful, and obstructionist.
Many disconnects, mostly clashes in frames of reference, contribute to this problem. Organizations generally regard information technology in terms of specific applications, not data, and usually see data from an application-centric point of view. The long-term value to organizations of secure, reusable, high-quality data, such as data as a corporate resource, is not as easily recognized or appreciated.
Application development often sees data management as an impediment to application development, as something that makes development projects take longer and cost more without providing additional benefit. DBAs have been slow to adapt to changes in technology, such as XML, objects, and service-oriented architectures, and new methods of application development, such as Agile Development, XP, and Scrum. Developers, on the other hand, often fail to recognize how good data management practices can help them achieve their long-term goals of object and application reuse, and true service-oriented application architecture.
There are several things that DBAs and other data-management practitioners can do to help overcome these organizational and cultural obstacles, and promote a more helpful and collaborative approach to meeting the organization’s data and information needs:
To sum up, we need to understand who our stakeholders are, and what their needs and concerns are. We need to develop a set of clear, concise, practical, business-focused standards for doing the best possible work in the best possible way. Moreover, we need to teach and implement those standards in a way that provides maximum value to our stakeholders, and earns their respect for us as facilitators, contributors, and solution providers.
Q2: How many DBAs does an organization need?
A2: The answer to this question varies by organization. There is no standard staffing rule of thumb. However, there may be a significant business cost to understaffing. An overworked DBA staff can make mistakes that cost much more in downtime and operational problems than might be saved in salary cost avoidance by minimizing the DBA staff. Many factors need to be considered when determining the optimal number of DBAs for the organization. These factors include:
Q3: What is an application DBA?
A3: An application DBA is responsible for one or more databases in all environments (development / test, QA, and production), as opposed to database systems administration for any of these environments. Sometimes, application DBAs report to the organizational units responsible for development and maintenance of the applications supported by their databases. There are pros and cons to staffing application DBAs. Application DBAs are viewed as integral members of an application support team, and by focusing on a specific database, they can provide better service to application developers. However, application DBAs can easily become isolated and lose sight of the organization’s overall data needs and common DBA practices. Constant collaboration between DBAs and data analysts, modelers, and architects is necessary to prevent DBA isolation and disengagement.
Q4: What is a procedural DBA?
A4: A procedural DBA specializes in development and support of procedural logic controlled and execute by the DBMS: stored procedures, triggers, and user defined functions (UDFs). The procedural DBA ensures this procedural logic is planned, implemented, tested, and shared (reused). Procedural DBAs lead the review and administration of procedural database objects.
6.4 Recommended Reading
The references listed below provide additional reading that support the material presented in Chapter 6. These recommended readings are also included in the Bibliography at the end of the Guide.
Dunham, Jeff. Database Performance Tuning Handbook. McGraw-Hill, 1998. ISBN 0-07-018244-2.
Hackathorn, Richard D. Enterprise Database Connectivity. Wiley Professional Computing, 1993. ISBN 0-4761-57802-9. 352 pages.
Hoffer, Jeffrey, Mary Prescott, and Fred McFadden. Modern Database Management, 7th Edition. Prentice Hall, 2004. ISBN 0-131-45320-3. 736 pages.
Kepner, Charles H. and Benjamin B. Tregoe. The New Rational Manager. Princeton Research Press, 1981. 224 pages.
Kroenke, D. M. Database Processing: Fundamentals, Design, and Implementation, 10th Edition. Pearson Prentice Hall, 2005. ISBN 0-131-67626-3. 696 pages.
Martin, James. Information Engineering Book II: Planning and Analysis. Prentice-Hall, Inc., 1990. Englewoood Cliffs, New Jersey.
Mattison, Rob. Understanding Database Management Systems, 2nd Edition. McGraw-Hill, 1998. ISBN 0-07-049999-3. 665 pages.
Mullins, Craig S. Database Administration: The Complete Guide to Practices and Procedures. Addison-Wesley, 2002. ISBN 0-201-74129-6. 736 pages.
Parsaye, Kamran and Mark Chignell. Intelligent Database Tools and Applications: Hyperinformation Access, Data Quality, Visualization, Automatic Discovery. John Wiley & Sons, 1993. ISBN 0-471-57066-4. 560 pages.
Pascal, Fabian, Practical Issues In Database Management: A Reference For The Thinking Practitioner. Addison-Wesley, 2000. ISBN 0-201-48555-9. 288 pages.
Piedad, Floyd, and Michael Hawkins. High Availability: Design, Techniques and Processes. Prentice Hall, 2001. ISBN 0-13-096288-0.
Rob, Peter, and Carlos Coronel. Database Systems: Design, Implementation, and Management, 7th Edition. Course Technology, 2006. ISBN 1-418-83593-5. 688 pages.