Super SQL Server Clusters for SharePoint - Part 2

Article
12/05/2016

This is part two of one of how to setup the mother of all database clustering solutions for SharePoint. In part one we setup the failover cluster, installed SQL Server and configured the failover clusters.

Previously we’d setup our super-cluster & installed SQL Server, this time with two failover clusters; one for each farm.

Now we need to setup or create our SharePoint databases for both the primary & secondary farms.

Create SharePoint Databases for Each Farm

At this point we should have all of our SQL instances ready to go. Next we need to add the databases so we can setup AlwaysOn for those databases.

Using client-side SQL aliases, this is what we want to build first:

Both farms have separate configuration & some service apps where it makes sense (seriously, this is important). For anything we’re going to copy between farms like content-databases for example, we’ll just create/move the databases onto the primary farm.

First create/edit a SQL alias on each SharePoint server:

Create new farms; one on each failover cluster for each farm, using said alias:

…or use PowerShell if you prefer.

You could just as easily move databases if you already have SharePoint farm(s) setup already; the principal is the same – we want two farms working, each one on a separate failover cluster.

Create Service-Apps

Once we have two farms setup & ready we need to create the service-apps next.

Create & configure your service-applications as needed on the primary farm. Leave the secondary farm as we’ll use copies of everything from the primary farm, with one exception…

Each farm should have its own separate search application + databases.

Search isn’t something that you want to copy between farms as we want farm redundancy on this huge chunk of SharePoint, so we’ll just fail it over to the local farm SQL instances.

Create Content Databases & Web-Applications

Also as before, create a web-app on the primary farm only with either new or restored content databases.

Create AlwaysOn Availability Groups for SharePoint Farms

At this point we should have all of our SQL instances & databases ready to go for both SharePoint farms.

Now we have to create the availability-groups for them – we want the following SQL Server AlwaysOn availability groups:

Farm-local failover groups.
- All the databases that won’t ever move between farms, one group per farm.
Service application availability group.
- All service-app data that both farms will share, copied between the two farms.
Content database availability group.
- All the content that both farms will share, copied between the two farms.

Now, it should be pointed out that creating AlwaysOn availability groups is much (so much) easier when the database file paths are exactly the same across all instances. If this is the case, you can perform an initial “full” sync via a shared network path to backup & restore the databases + transaction-logs.

Otherwise you need to do the DB + transaction-log restores manually & just have SQL Server join the databases. This my friends, is an epic amount of work if you have to do it by hand.

Farm + Search Availability Groups

We need two farm AlwaysOn availability groups; one per farm so we have farm redundancy. That means we’ll have two copies of each farm database & two SQL instances we can use, with at least one instance with two member servers.

We want the config DB, central admin content + the search databases in this group. All need to be in “full recovery” mode, so just make a backup of each database and they should be ready to add. Add the state service DB too if you have one.

We don’t care too much about the usage database because it can be very heavy on updates, and isn’t critical.

Here we’re including the state service in the local farm group, meaning it won’t be synced between farms. This is because I don’t have any SharePoint 2010 style workflows so isn’t needed to be synced, but your requirements might differ (read: if you have workflows, don’t include the State Service DB in the local farm group – sync with service-apps DBs).

Next up; add the secondary replica for our local farm failover AG, which in farm 1s’ case is “SQL-N1”.

Now we need to join the databases. In my specific setup we can’t do a full sync here as both SQL-N1 & SQL-N2 use different paths for the database files (something you can technically avoid if you know ahead of time, but I forgot).

Anyway, as mentioned earlier, the alternative is the somewhat-more-hassle restoring each database with no recovery (important) on the secondary so the wizard can just join each DB to the availability group instead of creating each one too.

Join only a pain to get DBs in sync, but nobody said this will be easy. Assuming all the databases to add are ready on the secondary instance (SQL-N1 in this example), in “restoring” mode then this will work no problem.

Create availability group “SPFarm1” between instances SQL-N1 & SQL-N2, and another availability group “SPFarm2” between SQL-N3 & SQL-N4.

Service-Apps Availability Group

Next up, we want another separate availability group for service applications. On farm 1 create/mount the following service applications:

Managed Metadata
User profiles
App Management
Secure Store
…
Whatever service-apps you use that have databases that need syncing.

Once created & working, create a new availability group between SQL-N2 & SQL-N3 to synchronise the databases in those applications, except the sync DB for user profiles.

Once mounted, primary to SQL-N3 & in farm 2 add the service-applications using the existing databases setup in AlwaysOn.

More on sharing service-apps between farms @ https://blogs.msdn.microsoft.com/sambetts/2016/03/21/running-sharepoint-service-applications-in-read-only-mode-for-disaster-recovery-farms

Content Database Availability Group

Finally do the same again for content databases as you did for service-apps in a 2^nd group between SQL-N2 & SQL-N3, just with the content databases. One new availability group between nodes SQL-N2 & SQL-N3.

Create Listeners & Update SharePoint Aliases

Now we just need to add listeners for the SPFarm availability groups so SharePoint will failover between the farm instances transparently.

I’ll add mine here; for some reason SQL doesn’t let me create one unless all possible subnets are covered, even though for the farm-specific connections were never going to cross a subnet.

Having a listener for each farm means either farm 1, for example should be able to use either SQL-N1 or SQL-N2 instances; the change-over will happen transparently.

Even better, as SQL-N2 is a failover cluster of two members, SQL-N2 itself could even failover to either SQL-N2-S1 or SQL-N2-S2 and again, nothing would change. Two levels of failover transparency! My goodness.

Anyway, to seal the deal, update each SharePoint farm server to use either “SPFarm1Listener” or “SPFarm2Listener” as we did before.

Everything Finished – Let’s Review the Setup

At this point your SQL Server cluster setup should look something like this:

SQL-N1 (single member failover cluster)
- SPFarm 1 availability group + listener.
SQL-N2 (2-server failover cluster)
- SPFarm 1 availability group + listener.
- Service Apps availability group.
- Content database availability group.
SQL-N3 (2-server failover cluster)
- SPFarm 2 availability group + listener.
- Service Apps availability group.
- Content database availability group.
SQL-N4 (single member failover cluster)
- SPFarm 2 availability group + listener.

You cluster host should now have no less than 8 roles in it:

Four failover SQL clusters & four availability groups. Clustering on steroids!

Testing Failovers & Outages

So this is the key goal here then; which SQL Servers can die without SharePoint suffering any outage? How many before we need to use the other farm?

No Farm Failover Needed

First let’s see what we can kill without needing to failover to the other farm.

We can technically keep working with up-to 4 SQL Servers offline & just 2 left, which is quite unlikely to ever happen. In this scenario we’ve lost two clustered roles too as there’s no servers left to run them. Assuming the two offline servers weren’t in the same failover cluster, we’d still have data-synchronisation between farms too.

Any one SQL Server can go offline without the slightest interruption of service, albeit we’d possibly have to failover the AlwaysOn AG manually, as automatic failover only works for standalone instances.

Farm Failover Needed

So what could go offline which would need an entire farm failover then?

In this case we have just one SQL Server running of the six setup, and even then we could still use the 2^nd farm. Synchronisation would be stopped but should in theory resume anyway.

Wrap-Up

Setting this kind of architecture up is very complicated, but as you can see it does give the benefit of having a zombie-like SQL backend for SharePoint – one that just won’t ever die.

In all honesty I’d only recommend this over SQL Server AlwaysOn for SharePoint DR if SQL/SharePoint uptime is absolutely critical, and so is guaranteeing data movement between SharePoint farms. In reality, it’s the guaranteed data movement that this model really gives over the standalone AlwaysOn for SP-DR model, but yet it’s quite a lot more complicated to setup.

Another negative is the lack of automatic failover between AlwaysOn instances, but as each instance is a failover instance itself that shouldn’t be an issue.

Anyway, in short, this is your solution if costs aren’t an issue & uptime is critical.

If you need an uptime solution for more reasonable costs, get AlwaysOn with just standalone instances.

Cheers,

Sam Betts