Certain maintenance procedures require more attention than others. The procedures that require the most attention are categorized as daily procedures. A SharePoint administrator should perform these procedures each day to ensure system reliability, availability, performance, and security. These procedures are examined in the following three sections.
Although checking the overall server health and functionality may seem redundant or elementary, this procedure is critical to keeping the system environment and users working productively.
Some questions that should be addressed during the checking and verification process are the following:
• Can users access data in SharePoint document libraries?
• Can remote users access SharePoint via SSL if configured?
• Is there an exceptionally long wait to access the portal (that is, longer than normal)?
• Do SMTP alerts function properly?
• Are searches properly locating newly created or modified content?
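Several of these checks lend themselves to a small script that runs each morning. The following Python sketch is one possible approach, not a prescribed tool: it requests a portal page once and reports whether it answered and whether the response took longer than a chosen threshold. The function names, the five-second threshold, and the sample URL in the closing comment are assumptions for illustration. Because the request is anonymous, a site that requires Windows authentication answers with a 401 status and is reported as unreachable, so credential handling would need to be added for such sites.

```python
import time
import urllib.request


def default_fetch(url, timeout):
    """Return the HTTP status code for url (raises on network or HTTP errors)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def check_portal(url, max_seconds=5.0, fetch=default_fetch, clock=time.monotonic):
    """Request url once; report whether it answered and how long it took.

    max_seconds is an assumed threshold for "longer than normal"; tune it
    to what is normal for the portal being checked.
    """
    start = clock()
    try:
        # Allow twice the threshold so a slow response is measured, not cut off.
        status = fetch(url, max_seconds * 2)
        error = None
    except Exception as exc:  # any failure counts as "not reachable"
        status, error = None, str(exc)
    elapsed = clock() - start
    return {
        "url": url,
        "reachable": status == 200,
        "slow": elapsed > max_seconds,
        "seconds": elapsed,
        "error": error,
    }


# Example call (hypothetical URL, not taken from any real environment):
#   result = check_portal("https://portal.example.com/")
#   print(result["reachable"], result["slow"], round(result["seconds"], 2))
```

A check like this covers only reachability and response time. Whether alerts arrive and whether search finds new content still has to be verified by other means.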
To keep the environment secure and fault tolerant, it is imperative that a successful backup be performed every night. If a server failure occurs, the administrator may be required to perform a restore from tape. Without a nightly backup, the IT organization would be forced to rebuild the SharePoint server without its data. Therefore, the administrator should always back up servers so that the IT organization can restore them with minimum downtime if a disaster occurs. Because these tape backups are so important, the administrator’s first priority each day should be verifying and maintaining the backup sets.
If disaster ever strikes, the administrators want to be confident that a system or entire farm can be recovered as quickly as possible. Successful backup mechanisms are imperative to the recovery operation; recoveries are only as good as the most recent backups.
Although the native Windows Server and SharePoint backup programs do not offer alerting mechanisms that bring attention to unsuccessful backups, many third-party programs do. Many of these third-party backup programs can also send emails or pages reporting whether a backup succeeded or failed. For more information on backing up and restoring SharePoint, reference Chapter 10, “Backing Up and Restoring a SharePoint Environment.”
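Where no alerting tool is available, a basic freshness check can at least flag a night on which nothing was written to the backup location. The Python sketch below is a minimal illustration under stated assumptions: backups land as files under a single directory, and a 26-hour limit (a nightly schedule plus some slack) is acceptable. The function names and the limit are hypothetical. It confirms only that files were recently modified, not that the backup completed successfully, so it does not replace reading the backup log or performing a test restore.

```python
import time
from pathlib import Path


def newest_backup_age_hours(backup_dir, now=None):
    """Return the age in hours of the most recently modified file under
    backup_dir, or None if the directory contains no files."""
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in Path(backup_dir).rglob("*") if p.is_file()]
    if not mtimes:
        return None
    return (now - max(mtimes)) / 3600.0


def backup_is_current(backup_dir, max_age_hours=26.0, now=None):
    """True if at least one file under backup_dir was modified within
    max_age_hours (an assumed limit for a nightly schedule)."""
    age = newest_backup_age_hours(backup_dir, now)
    return age is not None and age <= max_age_hours
```

Run from a scheduled task each morning, a False result from backup_is_current would be the cue to investigate before anything else.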
The Windows Event Viewer is used to check the system, security, application, and other logs on a local or remote system. These logs are an invaluable source of information regarding the system. The following event logs are present for SharePoint servers running on Windows Server:
• Security— Captures all security-related events being audited on a system. Auditing is turned on by default to record success and failure of security events.
• Application— Stores specific application information. This information includes services and any applications running on the server.
• System— Stores Windows Server–specific information.
All Event Viewer events are categorized as informational, warning, or error.
Checking these logs often makes them easier to understand. Some events appear constantly but aren’t significant. Over time, events begin to look familiar, so it becomes noticeable when something in the event logs is new or amiss. For this reason, an intelligent log filter such as System Center Operations Manager (SCOM) 2007 R2 is a welcome addition to a SharePoint environment.
Some best practices for monitoring event logs include the following:
• Understanding the events being reported
• Setting up a database for archived event logs
• Archiving event logs frequently
• Using an automatic log parsing and alerting tool, such as System Center Operations Manager
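The idea of learning which events are routine can be approximated with a simple baseline comparison. The following Python sketch assumes that events have already been exported into dictionaries with hypothetical "source" and "event_id" keys. It is not the method used by System Center Operations Manager, only an illustration of the principle: count what appears today and list anything that never appeared in an archived baseline.

```python
from collections import Counter


def summarize(events):
    """Count events per (source, event_id) pair."""
    return Counter((e["source"], e["event_id"]) for e in events)


def unfamiliar(today, baseline):
    """Return (pair, count) tuples for pairs seen today that never appear
    in the baseline, most frequent first."""
    counts = summarize(today)
    known = set(summarize(baseline))
    return [(key, n) for key, n in counts.most_common() if key not in known]
```

An empty result means every event seen today has been seen before. Anything returned is a candidate for closer review.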
To simplify monitoring hundreds or thousands of generated events each day, the administrator should use the filtering mechanism provided in the Event Viewer. Although warnings and errors should take priority, the informational events should be reviewed to track what was happening before the problem occurred. After the administrator reviews the informational events, she can filter out the informational events and view only the warnings and errors.
To filter events, do the following:
Some warnings and errors are normal because of bandwidth constraints or other environmental issues. The more often the logs are monitored, the more familiar the administrator becomes with the messages, and the more likely she is to spot a problem before it affects the user community.
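The same review order can be applied to exported log data outside the Event Viewer. The Python sketch below is an illustration only. It assumes each event is a dictionary with hypothetical "time" (a datetime) and "level" keys, and that informational events carry the level label "Information". One function keeps only warnings and errors. The other returns the informational events recorded shortly before a given problem, with the ten-minute window being an arbitrary choice.

```python
from datetime import datetime, timedelta


def filter_by_level(events, levels=("Warning", "Error")):
    """Keep only events whose level is in levels (case-insensitive)."""
    wanted = {level.lower() for level in levels}
    return [e for e in events if e["level"].lower() in wanted]


def informational_before(events, problem, minutes=10):
    """Return informational events logged in the window just before problem."""
    start = problem["time"] - timedelta(minutes=minutes)
    return [
        e
        for e in events
        if e["level"].lower() == "information" and start <= e["time"] < problem["time"]
    ]
```

For each event returned by filter_by_level, a call to informational_before shows what the server was doing immediately beforehand.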