© Nikolas Charlebois-Laprade et al. 2017

Nikolas Charlebois-Laprade, Evgueni Zabourdaev, Daniel Brunet, Bruce Wilson, Mike Farran, Kip Ng, Andrew Stobart, Roger Cormier, Colin Hughes-Jones, Rhoderick Milne and Shawn Cathcart, Expert Office 365, https://doi.org/10.1007/978-1-4842-2991-0_6

6. SharePoint Recovery

Nikolas Charlebois-Laprade (1), Evgueni Zabourdaev (2), Daniel Brunet (3), Bruce Wilson (4), Mike Farran (5), Kip Ng (6), Andrew Stobart (4), Roger Cormier (6), Colin Hughes-Jones (6), Rhoderick Milne (6) and Shawn Cathcart (7)

(1) Gatineau, Québec, Canada

(2) Ottawa, Ontario, Canada

(3) Laval, Québec, Canada

(4) Winnipeg, Manitoba, Canada

(5) Strathmore, Alberta, Canada

(6) Mississauga, Ontario, Canada

(7) Edmonton, Alberta, Canada

BY DANIEL BRUNET

In this chapter, you will learn everything that you need to be aware of concerning recovery for SharePoint. This book has a lot of great technical content, produced by some of my excellent colleagues, and while this chapter will probably be the least technical, its subject is one of the most difficult to implement. The challenge with recovery rarely lies in the technical aspect; it lies mainly in processes and expectations.

In all my years at Microsoft, working on support, knowledge transfer, or program assessment, data recovery has consistently been an area of SharePoint with room for improvement. While this book focuses on Office 365 (O365), in this chapter, I will also cover SharePoint infrastructure recovery on-premises.

I will cover many aspects of recovery that you may have to discuss with business/application owners and backup and/or storage groups and that you may have to implement. I will also discuss the different capabilities (or lack thereof) between recovery on-premises and on a Software as a Service (SaaS) offering such as SharePoint Online.

As most of you know, SharePoint is a multilayered platform, and like many Microsoft products, there are multiple ways to achieve the same goal. Depending on implementation and complexity, many methods can be employed. I will provide the right approach, in general, but will focus on large enterprise scenarios. I will give you the other options available and their pros and cons.

Note that some references will be made to disaster recovery (DR), but this chapter is not about that. The only DR scenario will be that of a failure of the farm, and not the underlying infrastructure. I will not cover high availability or data center issues.

Infrastructure

One of the key statements that I make in workshops, and that you can also find in TechNet articles, is that the best infrastructure recovery process is a strong automated build process. Owing to the stateless nature of most SharePoint servers, there are many scenarios wherein build automation can offer clear advantages over other forms of troubleshooting and/or recovery. With that said, the most critical and most complex recovery scenario is, of course, the SharePoint farm. Fortunately, in O365, this is something you don’t have to worry about. The patching process at Microsoft is very mature and has improved with the lessons learned from deployments at this scale. This is one of the benefits that you see in SharePoint 2016, with which you can now have zero downtime patching and a much more robust update process (PSConfig/Wizard). This comes from our own experience upgrading SharePoint at a scale as large as O365.

Many factors may also impact the degree of difficulty you will face when recovering other SharePoint components, for example, recovering a faulty SharePoint server. I have seen many issues with customers trying to restore a server from a virtual machine snapshot, and while this should be very simple, it is unsupported and can lead to more problems.

We then have the services. While they can all be restored using the SharePoint backup method, it is not necessarily the preferred approach for many of them. We will look at this in detail later in this chapter. As you can see in Figure 6-1, you have a single point of failure, which is the configuration database that is the core of a SharePoint farm. It is well known that restoring the configuration database is unsupported, and I will explain this in greater detail later. Every infrastructure layer that will be discussed in this chapter is represented in this diagram.

  • The farm

  • The servers

  • The services and their databases (including any server dependencies)

  • The web application and its databases

Figure 6-1. SharePoint logical architecture: infrastructure points of recovery

Content

While you can consider many services as actual content, such as the Managed Metadata Service, “content,” in this chapter, refers to everything authored within a site collection and stored in a content database. This is where processes can differ, based on business Service Level Agreements (SLAs). Recovery Point Objective, Recovery Level Objective, and Recovery Time Objective may all have a financial impact on your recovery solution. Figure 6-2 represents the content hierarchy in SharePoint.

Figure 6-2. Content hierarchy

As you can see, every layer is part of a larger container, and what’s important to understand is that they do not all offer the same recovery capabilities. One of the most important things to understand in SharePoint on-premises and Online is that the smallest native full-fidelity restore capability is the site collection (see Figure 6-3). I will discuss this further, later in the chapter.

Figure 6-3. Full-fidelity recoverable containers in SharePoint infrastructure with out-of-the-box functionalities

Content Recovery

Let me start with this topic, as it covers both SharePoint on-premises and Online. SharePoint can be a one-stop shop for all your content, from your personal corporate documents to something like a critical business process form. As I mentioned previously, not all content may have the same restoration needs. Some organizations will apply the same point-in-time recovery (PTR) retention to every item in SharePoint. In practice, while this simplifies the backup rules, it can have a much higher storage and manageability cost. It may also have a higher restoration overhead, depending on the backup approach. This is where it is very important to differentiate what you are accustomed to in SharePoint on-premises and what you can expect in SharePoint Online.

Point-in-Time Recovery

This is the concept of recovering content across time. While you might expect to be able to restore a piece of content from any time in the past, the reality is very different. Here are some points to consider in this regard:

  • How long in the past do you really need to recover a piece of content, and from where?

  • What type of content requires PTR?

  • What type of data loss must you recover from: a deletion or an alteration?

  • What is the PTR coverage you currently offer on your existing infrastructure?

  • What are the current tool(s) and methods you use to achieve the different PTR objectives?

  • What does O365 offer in terms of PTR?

As I mentioned in the introduction, in SharePoint, you must consider these specifications for every layer of content. On many occasions, I have seen backups duplicated for the sole purpose of recovering many layers. For example, to recover a document vs. a site collection vs. a content database, I would see three different backup processes implemented to achieve these goals. While this makes the recovery process very efficient, it has a very large backup footprint, in both storage requirements and backup window time.

This is a very important aspect that you must discuss with your enterprise backup vendor. While they will tell you that you are covered for every layer, ask how the backup process is done. Basically, if you back up a database, can you easily restore a site collection or a document?

It is important to understand that when you back up a SharePoint content database, no matter which method you use, you are actually backing up every piece of content contained in it: site collections, sites, lists/libraries, and items/documents.

But a backup is only as good as the restoration process that comes with it. While all content is part of a content database, your expectations may not be met, as your recovery may require more granularity than can be achieved through content database recovery.

As I stated previously, out of the box, the smallest full-fidelity restoration in SharePoint is a site collection. It is also bound to a supported limit of 100GB that may impact your backup and restore scenarios. Let me detail the capabilities, based on the level and service, and the proper approach for large site collections (Table 6-1).

Table 6-1. Capabilities

Restoration Point: Database

  • Possible Backup Method(s): SQL only; SQL Agent with enterprise software; SharePoint backup

  • Limitation: None, but consider storage space

  • Scenarios: Database corruption; isolated site collection restoration (1 in database)

  • Point-in-Time On-Premises: Based on your database retention policy

  • Point-in-Time on O365: 14-day overwrite, only with support request at Microsoft

Restoration Point: Site Collection

  • Possible Backup Method(s): SQL database restore if alone in database; SharePoint Backup-SPSite/Restore-SPSite operation

  • Limitation: 100GB with SharePoint backup operation; consider disk space requirement where backup is executed

  • Scenarios: Site corruption; site duplication (test/dev); granular recovery on copy of site

  • Point-in-Time On-Premises: Based on your database retention policy or site collection retention, if applicable

  • Point-in-Time on O365: 14-day overwrite, only with support request at Microsoft

Restoration Point: Site or subsite

  • Possible Backup Method(s): Site without the site collection using content export (Export-SPWeb/Import-SPWeb)

  • Limitation: Not full-fidelity

  • Scenarios: Content reorganization; faulty site

  • Point-in-Time On-Premises: Based on database retention and/or tool limitation

  • Point-in-Time on O365: Not available

Restoration Point: List or library

  • Possible Backup Method(s): Complete list or library using content export (Export-SPWeb/Import-SPWeb)

  • Limitation: Not full-fidelity

  • Scenarios: Content reorganization; faulty library; granular recovery

  • Point-in-Time On-Premises: Based on database retention and/or tool limitation

  • Point-in-Time on O365: Not available

Restoration Point: Item or document

  • Possible Backup Method(s): Granular document recovery point in time

  • Limitation: Not possible out of the box outside of recycle bin and versioning

  • Scenarios: Lost document; document change tracking

  • Point-in-Time On-Premises: Based on database retention and/or tool limitation

  • Point-in-Time on O365: Recycle bin (90-day default); versioning based on library setting

14-Day SharePoint Online

This is an important aspect that differentiates the usual on-premises expectations from SharePoint Online. If you have to recover a faulty artifact in SharePoint Online, it can be restored to any point in time in the preceding 14 days, as long as it is a database or a site collection. Outside of that scope, no official support is offered.
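The rule above can be written down as a small check. This is only a sketch of the policy as described in this chapter, not a Microsoft API: the function name, the scope labels, and the sample dates are assumptions made for illustration.

```python
from datetime import date, timedelta

# Per the text: only these two containers can be restored by Microsoft support.
SUPPORTED_SCOPES = {"database", "site collection"}
# Per the text: a restore can target any point in the preceding 14 days.
RESTORE_WINDOW = timedelta(days=14)


def is_supported_restore(scope: str, restore_point: date, today: date) -> bool:
    """Return True when a restore request fits the 14-day SharePoint Online rule."""
    age = today - restore_point
    in_window = timedelta(0) <= age <= RESTORE_WINDOW
    return scope.lower() in SUPPORTED_SCOPES and in_window


if __name__ == "__main__":
    today = date(2017, 6, 30)  # hypothetical "today"
    # A site collection as it was ten days ago: inside the window.
    print(is_supported_restore("site collection", date(2017, 6, 20), today))
    # A site collection as it was a month ago: outside the window.
    print(is_supported_restore("site collection", date(2017, 5, 30), today))
    # A single library: not a supported restore scope, whatever the date.
    print(is_supported_restore("library", date(2017, 6, 29), today))
```

A request has to pass both tests, scope and age, which is exactly where user expectations carried over from on-premises tend to break.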

Let’s look at some hypothetical scenarios in which the expectations of a user may not be met after transitioning to SharePoint Online.

  1. I am looking for a document that I created three months ago, and it is not present in either

    1. My recycle bin

    2. My previous versions

  2. I need to restore my site collection to the state it was in a month ago.

  3. I want to restore a specific library on SharePoint Online.

  4. I have a compliance requirement to keep deleted and altered content for a specific period.

While this is not the subject of this chapter, it is important to differentiate between a restore requirement and a compliance requirement. There are many features in SharePoint Online and in the Security & Compliance Center of Office 365 that are designed to address specific concerns about retention outside of a backup. Although activation and configuration of these features may result in content being retained indefinitely, this is not a one-size-fits-all solution.

If your compliance requires you to keep everything, altered or deleted, for, let’s say, seven years, this is achievable with a retention rule in Office 365. Note that this will impact the size of your site and your tenant quota.

Alternatives on the Cloud

The 14-day PTR in the cloud is very similar to a well-balanced PTR on-premises. The difference is only in long-term recovery. On-premises, you will not have a full PTR. Instead, you will probably have a monthly PTR, and it is very possible that a piece of content will not exist on an archived tape, for example. If a document or a version was created and deleted within the same month, chances are that it will not be captured. Because SharePoint Online has no equivalent of something like tape retention, what other options are available, if required?

The most popular are cloud-based enterprise tools. As you would probably do on-premises to an archiving area, the vendor will store your PTR content on the cloud. There are a few advantages to this approach.

  • Much less storage requirement on-premises

  • No archiving management

  • No traffic from O365 to your network and much faster transfer time, especially if you are storing your PTR content on Azure, which can be in the same data center as O365

You can still back up the content to your data center, but that would defeat the benefits of using SharePoint Online, and you must manage bandwidth and storage requirements.

Limitations

You have to understand that these cloud backups are only for granular content. They will not perform a database or a site collection backup. This is a limitation but, at the same time, a benefit. Let me explain.

At Microsoft, we will cover the recovery of a site collection or a database within 14 days. This is sufficient in any corruption scenario in which your site is defective. The missing piece is granular PTR, and this is where we rely on great partners to provide a solution, if you require it.

Capabilities may differ from vendor to vendor, but you can at least expect the capability of recovering a document at a point in time based on your retention rules. But another benefit is that vendors can also provide more services, such as site reconstruction. Let me explain where site reconstruction/reorganization fits into a recovery requirement.

Site Reconstruction/Reorganization

Many times, when organizations start to use SharePoint, they do so without a proper governance and taxonomy (site and content structure). On top of this, they mix or aggregate too much content within a site collection, which becomes unmanageable over time. While this chapter does not cover site creation governance, you may have to reorganize content at some point, and this will require you to have a recovery strategy that is in line with our discussion.

Another scenario is when a site becomes defective owing to heavy customization or other issues, and you have to extract its content to a new site collection, so that you do not lose content but regain stability. On-premises, you can use the export feature of SharePoint (Export-SPWeb), but it has limitations. It is bound to the version you are running and the site definition the site was created with. But you can use an export to, well, export content to a new site collection. If you have a granular backup solution on O365, you could achieve the same goal of exporting content to a new site.

Just be aware that an export is not the same as a full-fidelity backup, and you may have to do some manual work to make your new site behave exactly as its predecessor. Typical limitations in exporting content are

  • Last edited author and date replaced

  • Workflow state lost and to be rerun

  • List lookup relation lost

  • Recycle bin content

  • Alerts

Content Categorization

Content categorization is very important to help you define your recovery strategy, especially now that we know that, with SharePoint, we must deal with the content and/or the container. A simple strategy, such as what you may be used to with backing up a file share or a database, may not be sufficient. Let me show you the types of content and the high-level categorization that exist in SharePoint.

Whenever I work with information architecture (IA), no matter the size or how mature it is, I provide the basic pyramid that is the foundation of understanding content categorization in SharePoint. No matter how your IA is defined, you will probably find your content in one of these categories and can apply a basic SLA to it (see Figure 6-4).

Figure 6-4. My own extended version of the SharePoint content pyramid

The top layer is the realm of any business-critical application. It may be identified as a portal, but in SharePoint, it defines something that is highly managed. It can be very small or mildly large, but it has a high impact and requires a specific recovery SLA. In any case, it should be isolated in a site collection and, many times, an isolated database.

The second layer is the most complex area to handle. Document management and record management are usually subject to well-defined retention policies, which vary from geography to geography, from industry to industry, and even among customers. It should have, by far, the largest containers and site collections. This is usually where you must decide if you need granular recovery capabilities.

The third layer, while it is the most puzzling category lately, should be the easiest to handle and has the most capabilities. This is the “team,” “social,” and “modern collaboration” category. The technologies used in this category of content extend beyond SharePoint! We now have Outlook groups that use the best from both SharePoint and Exchange, Yammer, and, more recently, Microsoft Teams. You may wonder which one to use, and this is a very hot topic right now, but not within the scope of this chapter. I will limit this layer to the typical SharePoint team site that is not meant for departmental usage. That means that security is not centralized in AD or Azure and that sites are typically small and have a limited lifespan.

Talking about lifespan, a departmental site is not meant to have an expiration. We will manage document retention instead. Collaborative sites, on the other hand, may have expiration and retention rules and can be ad hoc in nature.

Last, but not least, is personal storage: what used to be your personal drive on a network server or local My Documents. Many individuals have moved or are planning to move to OneDrive for Business, either on-premises or on O365. This storage can be small or very large and comes with many functionalities, such as versioning, sharing, compliance, and access from anywhere and on different devices. With a capacity that can go up to 5TB per user, and even more on demand, compliance may be required, but what about recovery?

If we recapitulate, the following is what categorization and recovery can look like:

  • A critical business application site collection has a very rapid SLA covering the site collection.

  • Departmental or business unit site collections have a granular document SLA and a database SLA.

  • An ad hoc project/team has a site collection SLA without granular level.

  • A personal drive is an isolated storage without any SLA besides the Sync Client and/or manager’s retention.

You can now translate this to a basic service definition, as in Table 6-2.

Table 6-2. Basic Service

Type of Site: Critical business application xyz, portal, etc.

  • Requirement: Fast recovery; usually small in size

  • Type of Backup On-Premises: Database; duplicated site collection backup, if desired

  • Type of Backup on O365: Normal 14-day PTR

  • Process On-Premises: Recovery copy of database to extract site collection or overwrite database; site collection restore operation, if used

  • Process on O365: Call support to recover faulty site

  • PTR: Not required, granular recovery covered by recycle bin and versioning/approval process

Type of Site: Collaboration, small teams, ad hoc project-based

  • Requirement: Site only recovery, no granular SLA

  • Type of Backup On-Premises: Database only; recycle bin; versioning

  • Type of Backup on O365: Database only; recycle bin; versioning

  • Process On-Premises: Recover only if site defective

  • Process on O365: Recover only if site defective

  • PTR: No granular PTR, users are responsible and can use local synced copies

Type of Site: My Site

  • Requirement: Site only recovery, no granular SLA

  • Type of Backup On-Premises: Database only; recycle bin; versioning

  • Type of Backup on O365: Database only; recycle bin; versioning

  • Process On-Premises: Recover only if site defective

  • Process on O365: Recover only if site defective

  • PTR: No PTR, but may require management review and content retention if employee leaves company

Type of Site: Departmental, official corporate document management

  • Requirement: Database and/or site collection; document PTR

  • Type of Backup On-Premises: Database; granular enterprise tool, if desired

  • Type of Backup on O365: Database; cloud granular enterprise tool, if desired

  • Process On-Premises: Recovery of site collection or database, if faulty; PTR document based on policy

  • Process on O365: Recovery of site collection or database, if faulty; PTR document based on policy

  • PTR: Yes, based on policy and type of content

Now that we have a minimal content categorization, we will be able to define a backup retention strategy. I will provide some examples, but this will depend on your policies. Be sure to validate the typical statement whereby an organization thinks it has to keep everything forever. This is where technicalities are important to understand, as they greatly affect the restoration process.

Database and Site Collection Storage Strategy

As I mentioned before, you may at some point have to restore a site collection that exceeds the 100GB threshold for supportability. A site collection backup and restore operation is run from a SharePoint server. A backup consists of a massive number of read statements in SQL, downloading the site content to the SharePoint server and saving it as a binary file. This operation is very costly in bandwidth and time, especially when it comes to large sites.

The restore operation is even worse, as you are now taking the contents of this binary file and uploading it to SQL, using many update and/or insert statements. You can imagine that for a 100GB site collection, this will take a very long time, be sensitive to any network disruption, and potentially fill your SQL transaction log file or drive.

Your storage strategy is therefore bound to your recovery strategy. When you know that a site collection is expected to be large, such as a departmental site or a record management or archiving site collection, you want to isolate it and store it alone in its database. SharePoint content databases can contain more than one site collection, but in the case of large ones, you will want to have a one-to-one ratio. That way, if you have to recover a large site collection, you will use an SQL restore operation.

When dealing with large site collections and databases, you may be tempted to look at Remote BLOB Storage (RBS). I will not cover this subject in this book, but it is rarely recommended to customers, unless they have a very strong requirement for it and the proper maturity level to handle its recovery complexity, to prevent any mismatch between the database and external storage. When using RBS with a third-party vendor, the recovery process will also be tightly bound to this vendor, and you must ensure that you will have the proper support from the vendor, depending on the recovery issue. RBS also may not be suitable or may have limitations, based on what you expect to achieve.

Fortunately, in SharePoint Online, we deal with how we store large site collections, so there is no need to worry or think about it. It is also one of the reasons Microsoft supports site collections up to 1TB in SharePoint Online, as opposed to the previous 100GB limit.

The GUID Story

Other important factors to understand when dealing with site restoration are globally unique identifiers (GUIDs). GUIDs are unique IDs that keep the SharePoint structure consistent, but they can lead to disconnected services and various additional issues. They are the relational keys in SQL, and faulty operations on them can lead to conflicting artifacts that you often see referred to as orphans. In dealing with recovery of databases and site collections, there are two important GUIDs: that of the site collection and that of the content database.

One operation that will lead to issues is restoring a copy of a database and mounting it to SharePoint. Mount-SPContentDatabase has an optional parameter, -AssignNewDatabaseId, that allows you to have a new database GUID. Unfortunately, this command will not change the GUIDs or the URLs of the site collections in that database. So, if you do this, you will end up with orphaned site collections. A copy of a content database that contains site collections can only be restored in a different farm.

The other operation that may lead to some issues is Restore-SPSite. By default, if you restore and overwrite a site collection using the -Force parameter, the site will get a new GUID. Great! That should prevent orphans, but it will cause a disconnection with your Managed Metadata Service (MMS), if you have local terms in that site collection. Local terms are stored in the MMS database and reference the site collection GUID. Change your GUID, and you will still have the metadata available in your site collection, as the content is also stored there, but you cannot edit it, as the MMS is now referring to the wrong GUID. If you plan to overwrite a site collection, you should delete it first and use the parameter introduced in SharePoint 2013: -PreserveSiteId.

And last, one common restore request is to have a copy of a site collection, sometimes for parallel improvement or granular recovery of some content. Ideally, you want to restore a copy in a different farm. This provides better protection from any mistake by users or administrators. But if you must, you can restore a copy of a site collection, as long as it is in a different content database, has a different URL, and, of course, a different GUID.
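The three conditions for keeping a copy in the same farm can be expressed as a simple pre-flight check. The sketch below is illustrative only: the `SiteRef` record, the `copy_conflicts` function, and the sample URLs and database names are hypothetical and do not correspond to any SharePoint API.

```python
from dataclasses import dataclass
from typing import List
from uuid import UUID


@dataclass(frozen=True)
class SiteRef:
    """Hypothetical description of a site collection and where it lives."""
    site_id: UUID
    url: str
    content_database: str


def copy_conflicts(original: SiteRef, copy: SiteRef) -> List[str]:
    """List the reasons why `copy` cannot coexist with `original` in one farm."""
    problems = []
    if copy.site_id == original.site_id:
        problems.append("same site collection GUID")
    if copy.url.rstrip("/").lower() == original.url.rstrip("/").lower():
        problems.append("same URL")
    if copy.content_database.lower() == original.content_database.lower():
        problems.append("same content database")
    return problems


if __name__ == "__main__":
    original = SiteRef(UUID(int=1), "https://intranet/sites/hr", "WSS_Content_HR")
    good = SiteRef(UUID(int=2), "https://intranet/sites/hr-copy", "WSS_Content_Recovery")
    bad = SiteRef(UUID(int=1), "https://intranet/sites/hr-copy", "WSS_Content_HR")
    print(copy_conflicts(original, good))  # no conflicts expected
    print(copy_conflicts(original, bad))   # GUID and database clash
```

An empty list means the copy satisfies all three conditions; any entry in the list is a reason to restore to a different farm instead.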

Figure 6-5 shows a typical site collection overwrite restoration process when the database contains more than one site.

Figure 6-5. Site collection operation

Figure 6-6 shows a typical restore operation when dealing with a large and isolated site collection.

Figure 6-6. Large site collection

While Figure 6-6 shows a site larger than 100GB, I suggest that you proactively identify which sites are or will grow larger, isolate them right away, and use the database recovery process.
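The choice between the two restore paths can be summarized as a small decision function. This is a sketch of the reasoning in this section under my own simplifying assumptions; the class, the function, and the returned labels are hypothetical names, and only the 100GB limit comes from the text.

```python
from dataclasses import dataclass

# Supported limit for a SharePoint site collection backup, per the text.
SITE_BACKUP_LIMIT_GB = 100


@dataclass
class SiteCollection:
    url: str
    size_gb: float
    alone_in_database: bool


def restore_method(site: SiteCollection) -> str:
    """Pick a restore path following the reasoning in this section."""
    if site.alone_in_database:
        # Isolated site collection: overwrite the whole content database in SQL.
        return "sql_database_restore"
    if site.size_gb > SITE_BACKUP_LIMIT_GB:
        # Too large for a supported site collection backup and not yet isolated.
        return "isolate_then_sql_restore"
    # Small site sharing a database: site collection backup and restore.
    return "restore_spsite"


if __name__ == "__main__":
    for site in (
        SiteCollection("https://intranet/sites/records", 350, True),
        SiteCollection("https://intranet/sites/team-a", 12, False),
        SiteCollection("https://intranet/sites/dept", 180, False),
    ):
        print(site.url, "->", restore_method(site))
```

The third case is the one to avoid: a large site that still shares its database is the reason to isolate sites proactively rather than at recovery time.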

Another important point concerns the moment when the SQL team overwrites the database: if any connections are still open, do not forcibly close them, or the site may not be functional after restoration. It takes about five minutes for SharePoint to complete an unmount operation, so allow for that delay. This is the clean way of approaching a large site restoration and will ensure a successful process. And if the site does not respond after restoration (404 error or blank page), you may have to clear the Timer Job Cache and reset IIS.

As for database size, you have probably seen the recommended 200GB. This is not a hard limit but a recommendation, owing to these backup/recovery processes. As you just read, in the case of an isolated site collection, all you must do is restore and overwrite a database. But imagine a scenario in which you must restore a copy for a more granular request. In this case, it would not be very practical to have a 2TB content database. The different database size recommendations, based on usage such as archiving and record management while dealing with capacity for IOPS (input/output operations per second), are well explained on TechNet. But, as I explained, they are bound by how content is managed in them for operational processes and SLAs.

For example, if you store all your archives in one site collection in a very large database, let’s say, 4TB, you will not require the same access performance as for active sites, nor will you require much higher IOPS. You will also have a different SLA and backup schedule, as the content is, in theory, inactive and mainly for read access. And if granular recovery is necessary, you will require an enterprise backup tool that captures document changes, as you will not realistically restore a copy of that database to retrieve one document.

Recycle Bin and Versioning

These two very important features of SharePoint are often overlooked and not managed. They are also not well understood.

The Recycle Bin Myth

This myth is always the first thing I have the pleasure of highlighting when I visit a new customer. Still today, after so many years since SharePoint 2007, I see the surprise when I explain the reality of the recycle bin. YOUR DELETED ITEM DOES NOT GO TO THE SECOND STAGE AFTER THE RETENTION PERIOD IS MET! IT IS PERMANENTLY DELETED!

I am sure many of you reading this will be surprised too. Here’s how it works in only two sentences:

  • It will go to the second stage only if you delete it a second time, meaning that you empty your own recycle bin.

  • It will stay in the second stage only for the time remaining from the first deletion date, meaning that, with the default 30-day retention, if you delete a document and empty your recycle bin after ten days, that document has 20 days left in the second stage.
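The arithmetic behind these two points is easy to get wrong, so here is a minimal sketch of it. The function names are mine, and the 30-day default is the on-premises default retention assumed in the example above; this models the rule as described, not SharePoint code.

```python
from datetime import date, timedelta


def permanent_deletion_date(deleted_on: date, retention_days: int = 30) -> date:
    """The clock starts at the first deletion and is never reset."""
    return deleted_on + timedelta(days=retention_days)


def days_left_in_second_stage(deleted_on: date, emptied_on: date,
                              retention_days: int = 30) -> int:
    """Days an item survives in the second stage after the user empties the bin."""
    elapsed = (emptied_on - deleted_on).days
    return max(retention_days - elapsed, 0)


if __name__ == "__main__":
    deleted = date(2017, 1, 1)
    emptied = date(2017, 1, 11)  # the user empties the recycle bin ten days later
    print(days_left_in_second_stage(deleted, emptied))
    print(permanent_deletion_date(deleted))
```

Whether or not the item ever reaches the second stage, the permanent deletion date is the same, because it depends only on the first deletion date and the retention period.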

So, of course, the next question is, Can I change this? The answer is no for the process, but you can improve it.

Your first option is to change the retention period. You can extend the recycle bin life cycle to, let’s say, 90 days. Now, while this will improve your PTR for accidental deletion, it will require more space. Also, note that the recycle bin handles deleted lists/libraries, sites/subsites, and even site collections. So, the longer the retention, the more space is required.

What about SharePoint Online? The retention period cannot be changed and is set at 93 days. For the folks who must manage either storage capabilities on-premises or quotas on O365, following is a little more technical information that may help.

The first-stage recycle bin counts against the site collection quota. Unfortunately, in many cases on-premises, I see people not applying quotas and letting site collections grow large, without the knowledge of the administrator responsible for capacity planning. On O365, a site collection quota is mandatory and forces you to think about the goal of that site collection. For example, a project/team site would have a small quota, so that it is not used for something other than its original purpose, and a departmental document management site collection will have a large quota.

No matter the quota, the first-stage recycle bin content counts toward it, which means that it has an impact on site collection, database, and storage size. The second-stage recycle bin does not count toward a site collection quota. So, let’s say you are running low on site collection space. You could empty the first recycle bin and make some room.

While the second stage doesn’t count toward the site collection quota, its content is still in the database, meaning that it counts toward your capacity planning and tenant storage allocation. Also, the second stage is not unlimited. By default on-premises, it can grow up to 50% of the site collection storage quota. On SharePoint Online, the limit is set at 200% of the quota.

Very Important Note

What happens if the second stage reaches that 50% on-premises? All content in it will be purged and permanently deleted. You can increase it up to 100%.
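For capacity planning, these percentages translate into storage as follows. This is a back-of-the-envelope sketch; the function names and the 100GB sample quota are assumptions, and only the 50%, 100%, and 200% figures come from the text.

```python
ON_PREM_DEFAULT_PERCENT = 50.0   # default second-stage limit on-premises
ON_PREM_MAX_PERCENT = 100.0      # maximum you can configure on-premises
SPO_PERCENT = 200.0              # fixed value on SharePoint Online


def second_stage_limit_gb(site_quota_gb: float, percent_of_quota: float) -> float:
    """Maximum size of the second-stage recycle bin for a given site quota."""
    if site_quota_gb < 0 or percent_of_quota < 0:
        raise ValueError("quota and percentage must be non-negative")
    return site_quota_gb * percent_of_quota / 100.0


def worst_case_footprint_gb(site_quota_gb: float, percent_of_quota: float) -> float:
    """Quota plus a full second stage: what the database may have to hold."""
    return site_quota_gb + second_stage_limit_gb(site_quota_gb, percent_of_quota)


if __name__ == "__main__":
    quota = 100.0  # hypothetical site collection quota in GB
    print(second_stage_limit_gb(quota, ON_PREM_DEFAULT_PERCENT))
    print(worst_case_footprint_gb(quota, ON_PREM_DEFAULT_PERCENT))
    print(second_stage_limit_gb(quota, SPO_PERCENT))
```

The second function is the one that matters for database sizing: a site collection with a quota can still occupy more than that quota in its content database.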

New in SharePoint Online

In modern collaborations such as Outlook groups, a new trend in support calls began to appear concerning the first-stage recycle bin. Currently, in every version of SharePoint, when a user deletes a document, it is only visible in the recycle bin of the person who deleted the document. Only a site collection administrator has the capability of viewing every deleted document in the first stage and the second stage.

This can lead to a lack of functionality in small collaborations in which you may have to recover a document deleted by a colleague, if you are also an editor. In SharePoint Online, users with editing permission will now be able to see a document deleted by a colleague.

Versioning

One of the first reasons people move from a file share to SharePoint is, of course, versioning. While the recycle bin protects you from accidental deletion, versioning is your first line of defense against undesired changes, as well as your history tracking. Good recovery planning will include a versioning strategy. Why a strategy? Because versioning has a certain impact on storage requirements, and not every type of content requires the same versioning setting. SharePoint provides not only versioning but also content-approval processes.

When you enable versioning, the default settings will probably not suit all your content. Depending on the type of site you are working in, versioning may or may not be enabled by default in a specific library (Figure 6-7). As well, when versioning is enabled, the version limits are optional and inactive by default, so it will consume more storage than if you impose limits. Fortunately, shredded storage was introduced in SharePoint 2013: only the deltas (differences) are saved in the SharePoint database when saving multiple versions of a document, which reduces the storage requirement quite dramatically, compared to SharePoint 2010 and earlier versions.

Figure 6-7. Versioning settings in a SharePoint library

As with the recycle bin second-stage myth I described previously, versioning also has a setting that is often misinterpreted. If you read the previous screen capture rapidly, you will assume that you can define the number of major and minor versions, basically, how many minor versions you can save between two major ones and how many majors. But go ahead and read again. The reality, and it is well explained in the setting page, is very different.

You should read the settings as how many major versions can be stored and how many of them will keep their minor versions. If I choose to keep drafts for five major versions, then only the last five major versions will have minor versions. The previous majors will not. But for each of these five majors, you can have up to 511 minor versions.
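The following minimal sketch simulates that reading of the two limits. It is not a SharePoint API: the function Get-VersionRetention and its parameter names are hypothetical, and the eight published majors are an assumed example.

```powershell
# Hypothetical simulation: given the published major version numbers of a
# document, list the majors that are retained and whether each one still
# keeps its drafts (minor versions).
function Get-VersionRetention {
    param(
        [int[]]$MajorVersions,        # published majors, e.g. 1..8
        [int]$KeepMajors,             # limit on the number of major versions
        [int]$KeepDraftsForMajors     # number of most recent majors that keep drafts
    )
    $sorted = @($MajorVersions | Sort-Object -Descending)
    for ($i = 0; $i -lt $sorted.Count; $i++) {
        if ($i -ge $KeepMajors) { break }   # older majors are trimmed entirely
        [pscustomobject]@{
            Major      = $sorted[$i]
            KeepDrafts = ($i -lt $KeepDraftsForMajors)
        }
    }
}

# Assumed example: 8 majors published, keep 10 majors, keep drafts for 5.
$result = @(Get-VersionRetention -MajorVersions (1..8) -KeepMajors 10 -KeepDraftsForMajors 5)
$withDrafts = @($result | Where-Object { $_.KeepDrafts } | ForEach-Object { $_.Major })
$result | Format-Table -AutoSize
```

In this example, all eight majors are retained, but only majors 8 down to 4 still carry their drafts. In the server object model, these two limits correspond, as far as I know, to the MajorVersionLimit and MajorWithMinorVersionsLimit properties of a list.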

This way of thinking assumes that once a major version is released, the drafts that led up to it do not have to be kept permanently. Basically, you only have to keep track of changes between two major versions. It also assumes that when working with versions, you will at some point publish that document and that you don’t really have to put any limitation on the drafts between majors.

The reality is that, sometimes, many people work with minor versions, without ever publishing a major version. And some customers will even have high expectations of how SharePoint deals with minors, such as expecting them to be available in search results. This is not a good working plan, and it should be better defined, depending, again, on the type of content authored.

Versioning requirements will probably be very different in a publishing web site than in a collaborative team site. They will also be very different in official document management sites. For example, in a collaborative site, there may not be much requirement to enable minor versions. In a publishing web site or with a legal department’s document management, where content approval is required, you will probably want to have draft versions enabled. But once a document is approved and final, the need for keeping the draft versions may be less important, at least for a certain time. If change tracking is required, it may be sufficient to keep drafts between two previous major versions.

Web Applications

A web application in SharePoint is, at a very high level, an IIS site that exposes your site content from the database. While your content stored in site collections is backed up by the processes covered before, the web application is not.

Sometimes, I see people back up the web application, using SharePoint. To do so, you are, in fact, using the farm backup tool (or backup-spfarm) and selecting the web application. While this works, it is not very efficient, for many reasons that I will explain.

  • Backing up the web application will also back up the content database.

  • The SQL server will send a backup of the database to a Share on the SharePoint server.

  • This tool is not really in line with your enterprise database backup strategy, process, and tools.

A web application is not that much different from the service applications we will look at next. Also, the creation of a web application and its customization, including its SharePoint settings and its IIS counterparts, such as web.config, certificates, and network settings, should be part of the build process and very well documented and, ideally, automated. If your process is in place, it is more efficient to rebuild a web application and reuse or restore the content database.

It is very rare that a web application has to be restored. It is usually the content. We usually “restore” a web application in another environment, and this should be done with your build process. Some vendors will offer restoring web application and IIS settings or even transfer them to a different farm. These are probably very good secondary options but should not replace your build process.

Service Applications

Like web applications, service applications are IIS web services that can also have Windows services dependencies and, in most cases, databases. Like web applications, they can be backed up using the SharePoint backup function (Backup-SPFarm). But in many cases, as with web applications, this is redundant with your SQL database backup and build process. There are exceptions, however. In the following section, I will cover the different categories of services, based on their recovery process. Some of them are common, and others are more unusual.

Services That Fit a Database Strategy

These services, like web applications, can rely on your overall build and SQL backup processes. Basically, if a service must be restored, rebuild the service, using a restored database. Following is a list of services that can use this model:

  • Managed Metadata Service

  • BCS Service **

  • User Profile Service *

  • Secure Store *

  • Subscription Settings

  • Apps *

  • Machine Translation Service Application

  • PerformancePoint **

    * Requires additional steps to complete the process and may depend on other services

    ** May use Secure Store

As explained in TechNet, you can simply overwrite a service database, but you need, at a minimum, to stop the timer services. You should also never forcibly close the connections when restoring in SQL. In some cases, this is insufficient, and deleting the service application and database(s) is the safest method to recover. In any case, if after a restore the service is not running as expected, you will probably have to reset the IIS service and, in some cases, clear the timer service cache.

Also, restoring only the content of a service application is probably very rare, unless you are copying the service to another farm. Usual restoration requests for a service would probably be due to instability or misconfiguration. In this case, it is preferable to rebuild the service application, using the restored database and reprovisioning the end points (Services on server Stop/Start). Note that deleting the database by removing the service application may not be possible, if it is part of a SQL availability group. Deleting a database may also create overhead on the SQL team.

Basic Steps

The following steps must be followed to restore your service applications:

  1. Delete the service application and its databases.

     Note: If the databases are part of an SQL availability group, you won’t be able to delete them. You can decide to keep them, but you must then restore them to all SQL instances.

  2. Request an SQL restore operation from the SQL team, overwriting the existing databases if you did not delete them in the previous step.

  3. Create your service application using your known build and/or script process, using the same database name.

  4. Your script should also include starting any service instances on the proper servers, unless you use the MinRole feature of SharePoint 2016.
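As an illustration of these steps, here is a minimal sketch for the Managed Metadata Service. It is not a replacement for your own build script: the function name Restore-MetadataServiceFromDatabase and all of its parameters are hypothetical, and the SQL restore itself is assumed to be done by the SQL team outside the script.

```powershell
# Sketch only: rebuilds a Managed Metadata Service on a restored database.
# Every name passed in is a placeholder for your own build values.
function Restore-MetadataServiceFromDatabase {
    param(
        [Parameter(Mandatory = $true)][string]$ServiceName,
        [Parameter(Mandatory = $true)][string]$ProxyName,
        [Parameter(Mandatory = $true)][string]$DatabaseName,
        [Parameter(Mandatory = $true)][string]$AppPoolName,
        [string[]]$Servers = @(),   # servers that must run the service instance
        [switch]$DeleteDatabase     # omit when the database is in an availability group
    )
    Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

    # Delete the proxy and the service application (and optionally the database).
    Get-SPServiceApplicationProxy | Where-Object { $_.DisplayName -eq $ProxyName } |
        ForEach-Object { Remove-SPServiceApplicationProxy -Identity $_ -Confirm:$false }
    Get-SPServiceApplication | Where-Object { $_.DisplayName -eq $ServiceName } |
        ForEach-Object { Remove-SPServiceApplication -Identity $_ -RemoveData:$DeleteDatabase -Confirm:$false }

    # The SQL restore happens outside this script; wait for the SQL team.
    Read-Host "Press Enter once '$DatabaseName' has been restored" | Out-Null

    # Re-create the service application with the same database name.
    $pool = Get-SPServiceApplicationPool -Identity $AppPoolName
    $app  = New-SPMetadataServiceApplication -Name $ServiceName -ApplicationPool $pool -DatabaseName $DatabaseName
    New-SPMetadataServiceApplicationProxy -Name $ProxyName -ServiceApplication $app -DefaultProxyGroup | Out-Null

    # Start the service instance on the listed servers (not needed with MinRole).
    foreach ($server in $Servers) {
        Get-SPServiceInstance -Server $server |
            Where-Object { $_.TypeName -eq 'Managed Metadata Web Service' -and $_.Status -ne 'Online' } |
            ForEach-Object { Start-SPServiceInstance -Identity $_ | Out-Null }
    }
}
```

The function only defines the process; nothing runs until you call it from a SharePoint server with your own service, proxy, database, and application pool names.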

Dependencies (Restore in Different Farm)

A common problem when restoring a service database to be used in another farm, such as when refreshing your pre-production environment, is the GUID mismatch. In services such as Managed Metadata, there are references to GUIDs present in other databases, such as site collections. If the site collections did not come from the same environment and were also copied using the database process, their GUIDs will be different. When you restore your service in that environment, your site collection local MMS columns will be grayed out, as the restored MMS database will point to a different site collection GUID. Similar issues can apply to the App Service application.

This chapter will not cover how to fix these scenarios, as it covers only restoration within the same farm, bypassing this issue, but it is important to understand how it affects your content and service staging and refreshing.

Additional Steps for Some Services

User Profile Service

In the case of the User Profile Service, you must also restore the Social database. If you are using the FIM engine to synchronize, you are also using the profile Sync database. This database contains encryption, and the key must be restored, if you plan to restore it.

The easiest method to restore the User Profile Service with FIM is to not restore that database.

  1. Delete the service and the databases.

  2. Restore the Profile and Social databases.

  3. Re-create the service. A new Sync database will be created.
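The re-creation in step 3 can be sketched as follows, assuming the Profile and Social databases have already been restored by the SQL team. The function name and all parameter values are hypothetical; only the name given for the Sync database is new, so SharePoint creates a fresh one.

```powershell
# Sketch only: re-creates the User Profile Service on restored Profile and
# Social databases, letting SharePoint create a brand-new Sync database.
function New-ProfileServiceFromRestoredDatabases {
    param(
        [Parameter(Mandatory = $true)][string]$ServiceName,
        [Parameter(Mandatory = $true)][string]$AppPoolName,
        [Parameter(Mandatory = $true)][string]$ProfileDbName,   # restored database
        [Parameter(Mandatory = $true)][string]$SocialDbName,    # restored database
        [Parameter(Mandatory = $true)][string]$NewSyncDbName    # new, not restored
    )
    Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

    $pool = Get-SPServiceApplicationPool -Identity $AppPoolName
    $ups  = New-SPProfileServiceApplication -Name $ServiceName -ApplicationPool $pool `
                -ProfileDBName $ProfileDbName -SocialDBName $SocialDbName `
                -ProfileSyncDBName $NewSyncDbName
    New-SPProfileServiceApplicationProxy -Name "$ServiceName Proxy" -ServiceApplication $ups -DefaultProxyGroup | Out-Null
}
```

After the service is re-created this way, the synchronization connections still have to be configured again, because they lived in the Sync database that was not restored.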

Secure Store

On TechNet, the documentation only offers the SharePoint backup as a recovery method, but you can still use the database method. There is one extra element that is very important to consider with this service. This is the Master key that you created originally by using a pass phrase at the service creation. You must store this pass phrase, as it will be required if you re-create your service application using the restored database (Figure 6-8). Note that if you did not delete the service application and only overwrote the database, you do not have to refresh the pass phrase (Figure 6-9).
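If you prefer to refresh the key from PowerShell rather than from Central Administration, a minimal sketch could look like the following. The function name and the proxy display name are hypothetical, and the pass phrase is the one you stored when the service was first created.

```powershell
# Sketch only: refreshes the Secure Store master key on a re-created service
# application, using the pass phrase saved at the original creation.
function Update-SecureStoreKeyFromPassphrase {
    param(
        [Parameter(Mandatory = $true)][string]$ProxyName,
        [Parameter(Mandatory = $true)][string]$Passphrase
    )
    Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

    $proxy = Get-SPServiceApplicationProxy | Where-Object { $_.DisplayName -eq $ProxyName }
    if (-not $proxy) { throw "Secure Store proxy '$ProxyName' was not found." }

    Update-SPSecureStoreApplicationServerKey -ServiceApplicationProxy $proxy -Passphrase $Passphrase
}
```

Avoid hard-coding the pass phrase in a saved script; prompt for it or read it from wherever your organization keeps its secrets.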

Figure 6-8. Master key error after restore

Figure 6-9. Master key refresh

App Management Service

The App Management Service has multiple dependencies. If you are restoring within the same environment, consider which component you want to restore.

  • App Management Service

  • Secure Store service application

  • Subscription service application

  • Apps Catalog Site Collection

  • Any site collection with Apps/Add-in deployed

If you plan to restore the App Service in the same environment, you need only restore the service database. If, on the other hand, you want to restore apps in the catalog, the site collection will also have to be restored.

It is also important to note that if you are restoring in another farm, every dependent artifact must also be restored. Otherwise, you may have to reinstall or configure some components, such as trusts, manifests, etc.

For this scenario, because we are only restoring the service application in our current environment, the same process as with the other services applies.

  1. Delete the service application.

  2. Restore the database.

  3. Re-create the service application using the same database name.

Access Service (2013)

This service application relies on the App Service Application and Secure Store. While Access Service can be backed up using the Backup-SPFarm process, restoring it may still lead to issues and errors. Also, re-creating the Access Service application causes the existing Access apps to stop working. Apps created with the new Access Service application will be OK.

The problem when re-creating the Access Service application is that it creates a new GUID entry for the hosting SQL server. Any new Access apps will use the new GUID, and the previous ones will try to connect to the one that does not exist anymore. In the case of a DR farm or recovery of the service within the farm, you will have to reconfigure the service to use the previous GUID. If you are not able to point to the previous GUID, you will have to export the Access data using the Access client, by creating an App Package.

To identify your SQL server reference ID, before any recovery attempt, execute the following script. It may be a good idea to keep this GUID documented in your build documentation.

$ASapp = Get-SPAccessServicesApplication
$app = $Null
if ($ASapp.length -ne $Null) { $app = $ASapp[0] } else { $app = $ASapp }
$context = [Microsoft.SharePoint.SPServiceContext]::GetContext($app.ServiceApplicationProxyGroup, [Microsoft.SharePoint.SPSiteSubscriptionIdentifier]::Default)
Get-SPAccessServicesDatabaseServer -ServiceContext $context

Otherwise, you will probably find it in the ULS log, where, after a restore, you may get the following error: “AccessServicesDatabaseServerGroupCollection.GetDatabaseServer: Could not find a server matching reference id…”

To add the previous server entry, you can execute the following script, after inserting the previous GUID in the $ServerRefID variable and your SQL Server host name in the $SqlHost variable.

$ServerRefID = "Your previous GUID here"
$SqlHost = "Your SQL Host Name"
########
$serverGroupName = 'DEFAULT'
$ASapp = Get-SPAccessServicesApplication
$app = $Null
if ($ASapp.length -ne $Null) { $app = $ASapp[0] } else { $app = $ASapp }
$context = [Microsoft.SharePoint.SPServiceContext]::GetContext($app.ServiceApplicationProxyGroup, [Microsoft.SharePoint.SPSiteSubscriptionIdentifier]::Default)
$newdbserver = New-SPAccessServicesDatabaseServer -ServiceContext $context -DatabaseServerName $SqlHost -DatabaseServerGroup $serverGroupName -ServerReferenceId $ServerRefID -AvailableForCreate $true

Services That Do Not Have a Database

These services only have to be re-created if there are issues and do not have any recovery steps.

  • Excel Services

  • Visio Services

  • Word Automation

  • Work Management

Search

Search is very different and has its own complexity. It contains not only databases but local indexes that need to be in sync. Therefore, there are no complete search database recovery processes. If your SLA requires that you back up search completely, in a scenario in which recovery would be faster than rebuilding the indexes and crawling the content, you will be required to use the SharePoint backup process (backup-spfarm).

Backup-SPFarm -Directory <BackupFolder> -BackupMethod Full -Item "Farm\Shared Services\Shared Services Applications\<SearchServiceApplicationName>"

If you desire only to keep the search configuration, your build process and documentation should have the automation and documentation for any configuration, such as custom result sources, content sources, etc. If automation/documentation is not present or not up to date, it is possible to reuse the Admin database to re-create your new Search Service Application, using the same process to create the service via the existing/restored Admin database. You must also ensure that you have a copy of the thesaurus files that you used to import into search, as they are not part of the SharePoint backup.

Server

One potential mistake I often see is the expectation to reuse a nightly snapshot of a VM as a restore method for a server. The only artifacts that should be backed up from a server are specific manual deployments that are not done by SharePoint.

  • Web.Config entries

  • Networking configuration

  • IIS custom settings

  • Certificates

  • Any custom artifacts (Dll, resources, etc., not deployed by a SharePoint packaged solution)

If a server becomes unstable for unknown reasons, or is compromised by a disk corruption or bad update, it is very tempting to simply bring back a server from the previous snapshot.

This is a very risky operation, and while some people may achieve the expected result by clearing the timer cache, it may introduce unexpected results and even affect the integrity of the farm. Server recovery should only be used in a full rollback scenario that includes every other component, restored from point-in-time backups taken within minutes of each other.

The proper method is to remove the server from the farm and rebuild and reintegrate it. If you have high availability implemented, this can be achieved during production hours. The only impact would be a performance reduction on the services provided by the server.

Since SharePoint 2013, a new complexity appeared when removing a server from a farm, which caused some issues when not done properly. If a server is either part of a distributed cache cluster or a search topology, it must be removed from these before being disconnected from the farm. Skipping these steps will cause orphan entries, even if the server is reinserted with the same name.

The following script will remove the server from any cluster and topology before disconnecting it from the farm. If the server is available and running, always attempt to execute the script from it. If it is not healthy, or simply not available anymore, run the script from another server. It will force the removal of the entries.

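# Ask for the server to remove and keep its short name for the SharePoint lookups
$FqdnServerName = Read-Host "What is the FQDN server name to remove?"
$servername = $FqdnServerName.split(".")[0]
# Load the SharePoint cmdlets (ignored if the snap-in is already loaded)
Add-PSSnapin microsoft.sharepoint.powershell -ErrorAction SilentlyContinue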


#####Cleaning up distributed cache###############
$instanceName ="SPDistributedCacheService Name=AppFabricCachingService"
$serviceInstance = Get-SPServiceInstance | ? {($_.Service.Tostring()) -eq $instanceName -and ($_.Server.Name) -eq $ServerName}
if($serviceInstance)
{
    if ($servername -ne $env:COMPUTERNAME)
    {
        $notGraceful = $true
    }


    if(!$notGraceful)
    {
        $startTime = Get-Date
        $currentTime = $startTime
        $elapsedTime = $currentTime - $startTime
        $timeOut = 900


        try
        {
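            Write-Host "Gracefully shutting down the distributed cache host. This can take a few minutes while cached data is transferred to other DC servers"
            $hostInfo = Stop-CacheHost -Graceful -CachePort 22233 -HostName $FqdnServerName


            # Poll the host status until it is down or the timeout is reached
            while($elapsedTime.TotalSeconds -le $timeOut -and $hostInfo.Status -ne 'Down')
            {
                Write-Host "Host Status : [$($hostInfo.Status)]"
                Start-Sleep -Seconds 5
                $currentTime = Get-Date
                $elapsedTime = $currentTime - $startTime
                $hostInfo = Get-CacheHost -HostName $FqdnServerName -CachePort 22233
            }
            # On timeout, fall through to the catch block and the non-graceful removal
            if ($hostInfo.Status -ne 'Down') { throw "Timed out waiting for the cache host to stop." }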


            Write-Host "Stopping distributed cache host was successful. Stopping Service in SharePoint."
            Stop-SPDistributedCacheServiceInstance
            Write-Host "Removing from cluster."
            Remove-SPDistributedCacheServiceInstance
        }
        catch [System.Exception]
        {
            Write-Host "Unable to stop cache host within 15 minutes."
            $NotGraceful = $true
        }
    }
    if ($NotGraceful)
    {
        if($env:COMPUTERNAME -EQ $servername)
        {
            try
            {
                write-host "Removing server without graceful"
                Remove-SPDistributedCacheServiceInstance
            }
            catch [System.Exception]
            {
                write-host "Unable to remove Server with remove-SPDistributedCacheServiceInstance, forcing deletion"
                $forceDelete = $true
            }
        }
        else
        {
            $forceDelete = $true
        }
    }


    if ($forceDelete)
    {
        $serviceInstance.Delete()
    }
}


##############CLEANING UP SEARCH TOPOLOGY######################
write-host ("Removing server from search topology")
$ssa = Get-SPEnterpriseSearchServiceApplication
$active = Get-SPEnterpriseSearchTopology -SearchApplication $ssa -Active
$clone = New-SPEnterpriseSearchTopology -SearchApplication $ssa -Clone -SearchTopology $active
$componenttoRemove = Get-SPEnterpriseSearchComponent -SearchTopology $clone | where-object {$_.servername -like $ServerName}
if ($componenttoRemove)
{
    Foreach ($component in $componenttoRemove)
    {    $CId = $component.ComponentId
         write-host "Removing $cid"
         Remove-SPEnterpriseSearchComponent -searchtopology $clone -Identity $component -confirm:$false
    }


    Write-host ("Applying New Clone Search Topology. This can take a few minutes")
    Set-SPEnterpriseSearchTopology -Identity $clone
}


#########Disconnecting from the farm################
if($env:COMPUTERNAME -EQ $servername)
{


    write-host ("Disconnecting server from the farm")
    try
    {
        disconnect-spconfigurationdatabase -Confirm:$false
    }
    catch
    {
        Write-host ("Unable to disconnect, remove server from Central Admin or try with Configuration Wizard")
    }


}
else
{
    try
    {
        $serverToDelete = get-spserver $servername
        $serverToDelete.delete()
    }
    catch
    {
        Write-host ("Unable to delete server, remove server from Central Admin")
    }
}

Farm

Configuration Database

By now, you should know that restoration of the configuration database is not supported. The only method by which you can reuse the configuration database is by some replication scenario in which every bit of SharePoint is guaranteed to be backed up at the same time, including VM and databases.

The other typical scenario in which a configuration database will be restored is when a complete backup is taken before a major update, while the farm is offline, and where, in the case of a major issue, it is part of a rollback plan whereby every component, server, and database is restored at the same point in time. This is an all-or-nothing scenario. You cannot restore a configuration database alone from a point in time different from that of any other of its components.

I always tell my customers that the best farm recovery process is a great build process. It is much easier to rebuild than fix. If it is not the case in your organization, you should prioritize testing your build process.

SharePoint Farm Backup

The SharePoint farm backup can be practical and useful in certain situations, but it may not be sufficient in many scenarios. It is also redundant with your actual SQL backup and build strategy. If your farm becomes unstable, unusable, or even if one of the servers is problematic, chances are that using the Restore-SPFarm operation will not fix your issue. I like to think of this backup as a setting backup that does not replace a good enterprise strategy. Remember that the configuration database is not really restored, and if it was damaged, chances are that the farm backup would not fix it.

The farm backup can be used to reapply settings to a new farm, if you don't have any automation in place. Also, the configuration-only mode of Backup-SPFarm can be useful to recuperate settings only, without backing up other components, such as databases that are already covered by SQL backups.
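A configuration-only backup can be sketched as follows. The wrapper function name and the backup directory are hypothetical; the directory must be a location (typically a UNC share) that the farm can write to.

```powershell
# Sketch only: takes a configuration-only farm backup, leaving content and
# service databases to the regular SQL backup process.
function Backup-FarmConfigurationOnly {
    param(
        [Parameter(Mandatory = $true)][string]$BackupDirectory
    )
    Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

    Backup-SPFarm -Directory $BackupDirectory -BackupMethod Full -ConfigurationOnly
}
```

Keep in mind that this captures settings only; it is a complement to, not a substitute for, your build automation and SQL backups.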

Summary

As you may have noted by now, I am a firm believer in a great build automation, coupled with a good database backup strategy. The most common issue I faced is a lack of familiarity with these operations, with outdated build and deployment documentation and a lack of practice or priority regarding them.

As a general recommendation, to ensure that your organization’s backup process is healthy, you should

  • Practice your farm build process.

  • Test your services recovery until you have the right recipe in your recovery cookbook.

  • Ensure that you have all the customization sources and packages (WSP) up to date in a source controlled environment.

  • Ensure that the custom application deployment documentation is up to date, with separated initial blank deployment vs. one that already has site artifacts implemented.

  • Ensure that you have all the third-party installation and required keys available, including SharePoint binaries.

  • Ensure that you have any certificate or other server manual artifacts available.
