Combining Disaster Recovery Farms with SQL Redundancy for SharePoint with SQL Server AlwaysOn

Alternative title: “how do I get a redundant SQL backend combined with a disaster recovery site, using AlwaysOn”?

This is a question that has come up a bit just recently so I though I'd clarify how this can work. Arranging the SQL Server AlwaysOn setup for SharePoint, especially when a contingency/disaster-recovery farm is involved alongside using AlwaysOn for local SQL redundancy for the primary SharePoint farm isn’t an easy thing to visualise. So let me help with that!

SQL Server AlwaysOn can be used in two distinct ways when it comes to giving SharePoint its' databases; both for maintaining synchronisation for disaster-recovery/secondary SharePoint farms, or just providing failover options for SQL Server on a single farm. Here's how both of those work on their own.

 

Scenario 1: AlwaysOn for SharePoint Disaster-Recovery/Contingency Farm

Here’s how AlwaysOn is used for contingency/disaster recovery sites. We have two distinct SharePoint farms which share the same content so we can switch users between each farm:

image

The only thing in an availability group are the content databases; everything else is just locally mounted. If there’s a problem on one side, we switch to the other side both SQL and SharePoint together. Also, aside from content, we don’t care about syncing much else – each farm basically has its own unique DBs (although this has caveats you need to be clear about).

Notice that there’s no listener here, very deliberately; we don’t want the primary farm reading the database if it’s in use on the DR side and visa-versa. This is very important because a great way of murdering your content is if two farms read/write from the same content-database simultaneously. Really, Microsoft will drop you like a hot potato if have enabled multiple farm writes to the same content DBs, so definitely no AlwaysOn listener for this scenario.

 

Scenario 2: AlwaysOn for SharePoint Local Failover

This is the classic “my SQL Server can die and nobody will notice” scenario, but with just a single farm. Here’s how AlwaysOn is used for single-farm redundancy:

image

Every database is in one availability group & we can switch between SQL nodes, even take each one offline without causing much drama; SharePoint keeps working because the listener transparently redirects SharePoint to the active node for all the databases in the availability group (which is all of them).

Combining Content-Sync with Local Redundancy in AlwaysOn

So the challenge is, how do we combine these two AlwaysOn uses into one single setup? It’s not such an easy question to answer because of how different the uses of AlwaysOn are – one is pure data-sync without a listener by design, and the other is entirely dependent on listeners to make sure failovers don’t take out the single farm. Combining these models takes some thought.

One possibility is having x2 AlwaysOn groups first for failover, one for each farm. Like so:

image

This works; we can failover any availability group and because of the listeners we get no serious interruption. The obvious problem is we have no content databases involved at this point, because content DBs are shared between farms rather than a single farm.

So we add a third availability group (AG) for just syncing content databases between the primary site:

image

Again, by design we can’t use a listener for the content databases availability group. This is because when we set SQL-N3 as the group primary server, because we’re running off the DR farm, we have to actively avoid the primary farm from being able to read & write to the content databases if we want to not kill said databases.

This does mean of course we don’t get local redundancy for these databases, given they’re nothing to do with the primary AG. For now, there’s no particularly clean workarounds for this (that I can think of anyway); replicating the content to SQL-N1 could work & just using local aliases to redirect on the “client” (SharePoint servers) could work, but it’s not as clean certainly.

Another workaround would be to make SQL-N2 & SQL-N3 (in this example) failover cluster SQL instances. Yes, it’s perfectly possible to make an AlwaysOn cluster out of classic failover clusters for the high-availability architecture even nuclear war would struggle to take out.

 

An Alternative – Replicate All DBs to DR Site

Another option I’ve seen is to replicate all databases to a DR site via asynchronous replication but also to a local SQL node for failover. This isn’t the same thing though as basically if you used the configuration database on the DR side, you’re basically in the same farm at that point.

That means you don’t have SPFarm independence and basically your DR & normal production SPServers will be lumped together on both sides. Not recommended, in my opinion.

 

Wrap-up

So hopefully that helped explain a bit more how these scenarios can work together. Let me know if anything’s not clear!

Cheers,

// Sam Betts