Service Fabric Backups and CN certs – bug discovery

I encountered some strange behavior while working with the Backup and Restore service on a Service Fabric 8.0.514.9590 cluster. I assume the behavior exists in prior releases as well.

The Scenario

This cluster had initially been deployed referencing the certificate thumbprint rather than the Common Name attribute, and I switched it over following the MSFT documentation for doing so. Switching over wasn’t difficult, but it took forever: the process went through a total of four cluster-wide upgrades. On this cluster, I would have been better off blowing the cluster away and re-deploying it fresh. After all was said and done, the cluster had successfully switched to Common Name certificates, a Microsoft best practice that you can read more about here.

As I was wrapping up the deployment, I realized that SF Explorer was throwing errors about backups every time the page refreshed. The behavior is described by a user on GitHub in this issue. I also connected via the Connect-SFCluster cmdlet and attempted to interact with the ‘BackupRestoreService’, with the same result: authorization errors.

It turns out that when the cluster was re-deployed fresh using the Common Name certificate method, ‘BackupRestoreService’ worked just fine. So there must be an undocumented behavior or a bug of some kind that is triggered when an existing cluster is switched from a thumbprint-based deployment to a Common Name-based deployment.

Work-arounds

If you run into the issue, a fresh re-deploy with a Common Name tied to the cluster certificate rather than a thumbprint works. The ‘BackupRestoreService’ still has the thumbprint defined in the deployment, and I imagine it will need to be changed manually when the next certificate is issued and deployed, but we’ll see if MSFT provides some feedback and guidance. I wonder if you could just leave the thumbprint value out entirely and have it still use the Common Name certificate to encrypt the connection string. Probably not…

Here’s the template section:

{
  "name": "BackupRestoreService",
  "parameters": [
    {
      "name": "SecretEncryptionCertThumbprint",
      "value": "[parameters('certificateThumbprint')]"
    }
  ]
},

Update: MSFT has confirmed two things. 1) Switching a cluster from thumbprint-based certificate deployment over to Common Name deployment should not break BackupRestoreService. 2) BackupRestoreService still requires a thumbprint in its configuration. The behavior is a bug.

I’m still waiting on MSFT to provide more information regarding why the switch to Common Name broke BackupRestoreService (BRS). They’re attempting to reproduce it in a lab.

For the BRS configuration itself, just remember to update the BRS thumbprint whenever a new certificate is deployed to your cluster, before the old certificate expires. You can do this either via the ARM template or directly in the portal. I recommend using the template.
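
Before a certificate rotation, it helps to know where the BRS thumbprint in your template actually comes from. What follows is only a rough sketch in Python, not an official tool. It assumes the template has already been parsed into a dictionary and that the cluster resource carries the section shown earlier under properties.fabricSettings. The function names and the sample template are hypothetical and exist only for illustration.

```python
import re

# Matches an ARM expression of the form [parameters('someName')].
PARAM_REF = re.compile(r"^\[parameters\('([^']+)'\)\]$")


def brs_thumbprint_values(template):
    """Collect every SecretEncryptionCertThumbprint value set for BackupRestoreService."""
    values = []
    for resource in template.get("resources", []):
        if resource.get("type") != "Microsoft.ServiceFabric/clusters":
            continue
        settings = resource.get("properties", {}).get("fabricSettings", [])
        for section in settings:
            if section.get("name") != "BackupRestoreService":
                continue
            for param in section.get("parameters", []):
                if param.get("name") == "SecretEncryptionCertThumbprint":
                    values.append(param.get("value"))
    return values


def referenced_parameter(value):
    """Return the template parameter name if value is a parameter reference, else None."""
    match = PARAM_REF.match(value.strip())
    return match.group(1) if match else None


# Hypothetical, heavily trimmed template used only to exercise the functions.
SAMPLE_TEMPLATE = {
    "resources": [
        {
            "type": "Microsoft.ServiceFabric/clusters",
            "properties": {
                "fabricSettings": [
                    {
                        "name": "BackupRestoreService",
                        "parameters": [
                            {
                                "name": "SecretEncryptionCertThumbprint",
                                "value": "[parameters('certificateThumbprint')]",
                            }
                        ],
                    }
                ]
            },
        }
    ]
}

if __name__ == "__main__":
    for value in brs_thumbprint_values(SAMPLE_TEMPLATE):
        name = referenced_parameter(value)
        if name:
            print(f"BRS thumbprint comes from template parameter '{name}'")
        else:
            print("BRS thumbprint is hard-coded in the template")
```

If the value turns out to be a parameter reference, as it is in my template, the thing to update at rotation time is that parameter’s value in the parameters file rather than the BRS section itself.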
