Automatic rollback for Azure deployments with a pipeline

Recently I gave a talk about several tips and tricks for your Azure DevOps pipelines. One of the tricks in this talk was to implement an automatic rollback for your Azure deployment. In this blog post I will explain how to automatically roll back to the last known good configuration once an Azure deployment fails.

In this blog I will be using ARM templates for the deployment, but this can just as easily be done with Bicep or Terraform.

The files used in the talk can be found on my GitHub page. In this blog we will only look at folder 4 in the repository. The pipeline in that folder deploys an Azure VM and, if the deployment fails, rolls back to an earlier version.

The start of the pipeline

Let’s look at the first part of the pipeline:

trigger:
- none

parameters:
- name: rgName
  displayName: Name of resourcegroup?
  type: string
  default: "test-rg-pipeline"
- name: useLastKnownGoodConfiguration
  displayName: Use last known stable configuration (if false the most recent commit will be used)?
  type: boolean
  default: false

variables:
  - group: "Deployment Variables"

pool:
  vmImage: ubuntu-latest

The trigger of the pipeline is set to none here. In a production environment you probably want to trigger this pipeline every time something changes in your IaC files.
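For a production setup, the trigger could look roughly like the sketch below. The folder names CreateRG and CreateVM come from the deploy stage further down; the branch name main and the location of those folders at the repository root are assumptions to adjust to your own repository.

```yaml
# Sketch only: branch name and folder locations are assumptions.
trigger:
  branches:
    include:
      - main            # assumed production branch
  paths:
    include:
      - CreateRG        # resource group template and parameters
      - CreateVM        # VM template and parameters
```

With a path filter like this, commits that only touch other files do not start a deployment.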
Next, two parameters are defined. The resource group name is not that important; it was added so the pipeline can run several times at once against different resource groups, which makes testing quicker. The second parameter is the important one. This boolean determines whether the pipeline runs with the commit that was selected when the run was queued or with the last known good configuration. Both parameters have default values, so the pipeline can be run without entering anything.

A variable group is referenced as well. It contains two variables:

  • LastKnownGoodConfiguration : this holds the commit ID of the last known good deployment.

  • Password : this holds the password to use in the deployment of the Azure VM.

The variable group lives in the pipeline Library and is linked to an Azure Key Vault, so both values are stored there as secrets.

The build stage

Next up is the first stage of the pipeline which is called Build.

stages:
- stage: Build
  jobs:
  - job: Build
    displayName: Create Artifact
    steps:
      - checkout: self
        displayName: Clone the repository
        fetchDepth: 100
        persistCredentials: true
      - task: PowerShell@2
        displayName: Switch to last known good configuration
        inputs:
          targetType: 'inline'
          script: |
            git reset --hard $(LastKnownGoodConfiguration)
        condition: ${{ parameters.useLastKnownGoodConfiguration }}
      
      - task: PublishBuildArtifacts@1
        inputs:
          PathtoPublish: "$(Build.SourcesDirectory)"
          ArtifactName: "drop"
          publishLocation: "Container"
      - task: PowerShell@2
        name: StoreCommit
        displayName: Store commit
        inputs:
          targetType: 'inline'
          script: |
            Write-Host "##vso[task.setvariable variable=commit;isOutput=true]$(git log -n 1 --pretty=format:%H)"

In this stage a package is created that is used in the rest of the pipeline. This is done because there can be a delay between the moment the pipeline starts and the moment it actually deploys. With multiple deployments there can be time between the individual steps too, especially when approvals or manual interventions are involved. The code in the production branch of your source control could change in the meantime. To make sure the pipeline only deploys what was approved for this run, it starts with a checkout that gets all the files from the branch. The checkout uses a fetchDepth of 100 so that earlier commits are retrieved as well; the last known good commit has to be within that history for the next step to work.

The next step has a condition so it only runs when the parameter at the top is set to true. In that case it uses PowerShell to move the HEAD of the branch to a different commit. The commit is the one stored in the Key Vault, which is available as a variable through the variable group.

In the third step a package of the files is created. When the useLastKnownGoodConfiguration parameter is false, this is simply the branch/commit selected when running the pipeline. When the parameter is true, the package contains the files of the last known good configuration.

In the last step a git command gets the ID of the current HEAD commit. For now this value is kept in an output variable so it can be stored later if the pipeline is successful.
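One thing to watch with the shallow clone: git reset --hard fails when the stored commit is further back than the fetched history. A hedged variant of the first two steps is sketched below. It fetches the full history with fetchDepth: 0 and checks that the commit exists before resetting. The error message and the eq(...) form of the condition are my own choices for this sketch, not part of the pipeline from the talk.

```yaml
steps:
  - checkout: self
    displayName: Clone the repository
    fetchDepth: 0              # 0 turns off shallow fetch, so older commits are present
    persistCredentials: true
  - task: PowerShell@2
    displayName: Switch to last known good configuration
    # Run only when earlier steps succeeded and the parameter is true.
    condition: and(succeeded(), eq('${{ parameters.useLastKnownGoodConfiguration }}', 'true'))
    inputs:
      targetType: 'inline'
      script: |
        # cat-file -e exits non-zero when the commit is not in the local clone.
        git cat-file -e "$(LastKnownGoodConfiguration)^{commit}"
        if ($LASTEXITCODE -ne 0) {
          Write-Error "Commit $(LastKnownGoodConfiguration) was not found in the clone."
          exit 1
        }
        git reset --hard "$(LastKnownGoodConfiguration)"
```

Fetching everything is slower on large repositories, so keeping a fixed fetchDepth and relying on the check alone is a reasonable trade-off too.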

Deploying the resources

Next up in the pipeline it’s time to deploy the resources.

- stage: Deploy
  displayName: Deploy
  jobs:
  - deployment: Deploy
    displayName: Deploy resources
    environment: Test
    variables:
      - name: commit
        value: $[ stageDependencies.Build.Build.outputs['StoreCommit.commit'] ]
    strategy:                  
      runOnce:
        preDeploy:
          steps:
          - download: current
            artifact: drop
          - task: AzureResourceManagerTemplateDeployment@3
            displayName: Deploy Resource Group
            inputs:
              deploymentScope: 'Subscription'
              azureResourceManagerConnection: '<YOUR SERVICE CONNECTION>'
              subscriptionId: '<YOUR SUBSCRIPTIONID>'
              location: 'West Europe'
              templateLocation: 'Linked artifact'
              csmFile: '$(Pipeline.Workspace)/drop/CreateRG/template.json'
              csmParametersFile: '$(Pipeline.Workspace)/drop/CreateRG/parameters.json'
              overrideParameters: '-rgName ${{ parameters.rgName }}'
              deploymentMode: 'Incremental'
        deploy:
          steps:
          - task: AzureResourceManagerTemplateDeployment@3
            inputs:
              deploymentScope: 'Resource Group'
              azureResourceManagerConnection: '<YOUR SERVICE CONNECTION>'
              subscriptionId: '<YOUR SUBSCRIPTIONID>'
              action: 'Create Or Update Resource Group'
              resourceGroupName: '${{ parameters.rgName }}'
              location: 'West Europe'
              templateLocation: 'Linked artifact'
              csmFile: '$(Pipeline.Workspace)/drop/CreateVM/template.json'
              csmParametersFile: '$(Pipeline.Workspace)/drop/CreateVM/parameters.json'
              overrideParameters: '-adminPassword $(Password)'
              deploymentMode: 'Incremental'

In this pipeline we use deployment jobs to get better control over the deployment steps. To illustrate the preDeploy and deploy phases of such a job, the deployment is split into two steps: first a resource group is created, and afterwards a VM is created in that resource group.

You’ll notice that the preDeploy phase has a step to download the artifact, while the deploy phase doesn’t. This is because the deploy phase automatically downloads the artifacts associated with the pipeline, so the step doesn’t need to be added there.

In this example the tasks for ARM templates are used to deploy the files but this could be changed to Bicep or Terraform steps without changing the rest of the functionality.

On success or failure

The interesting part comes next. The deployment job has a special phase called “on” with two possible outcomes, success and failure. It is shown in the last part of the pipeline.

        on:
          success:
            steps:
            - task: AzureCLI@2
              inputs:
                azureSubscription: '<YOUR SERVICE CONNECTION>'
                scriptType: 'bash'
                scriptLocation: 'inlineScript'
                inlineScript: 'az keyvault secret set --vault-name ''<YOUR KEYVAULT NAME>'' --name ''LastKnownGoodConfiguration'' --value ''$(commit)'''
          failure:
            steps:
            - task: TriggerBuild@4
              inputs:
                definitionIsInCurrentTeamProject: true
                buildDefinition: '<YOUR PIPELINEID>'
                queueBuildForUserThatTriggeredBuild: false
                ignoreSslCertificateErrors: false
                useSameSourceVersion: false
                useCustomSourceVersion: false
                useSameBranch: true
                waitForQueuedBuildsToFinish: false
                storeInEnvironmentVariable: false
                templateParameters: 'useLastKnownGoodConfiguration: true'
                authenticationMethod: 'OAuth Token'
                enableBuildInQueueCondition: false
                dependentOnSuccessfulBuildCondition: false
                dependentOnFailedBuildCondition: false
                checkbuildsoncurrentbranch: false
                failTaskIfConditionsAreNotFulfilled: false

Let’s start with the on.success part. Here the commit ID we stored earlier is written to the Azure Key Vault, so once the whole deployment is complete this commit ID becomes the new LastKnownGoodConfiguration.

In the on.failure part I use the Trigger Build task from the Azure DevOps Marketplace. If you don’t want to or can’t use the Marketplace extension, the same can be done with the REST API. The task triggers a new run of the pipeline, this time with the parameter set to true. If your pipeline fails for some reason, the rollback shows up as a new run in the run overview of the pipeline.

You’ll notice the new run was initiated by a different account. This could be changed, but personally I prefer it this way because it makes it more visible that this was an automated rollback run.
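If the Marketplace extension is not an option, the failure hook could call the Runs API of Azure Pipelines instead. The sketch below is not what I used in the talk and rests on a few assumptions: it authenticates with the job access token, so the build service identity needs permission to queue builds for this pipeline, and the api-version and the boolean passed as the string "true" may need adjusting for your organization.

```yaml
failure:
  steps:
  - bash: |
      # Queue a new run of this same pipeline with the rollback parameter switched on.
      curl --fail --silent --show-error \
        --request POST \
        --header "Authorization: Bearer $SYSTEM_ACCESSTOKEN" \
        --header "Content-Type: application/json" \
        --data '{"templateParameters":{"useLastKnownGoodConfiguration":"true"},"resources":{"repositories":{"self":{"refName":"$(Build.SourceBranch)"}}}}' \
        "$(System.CollectionUri)$(System.TeamProjectId)/_apis/pipelines/$(System.DefinitionId)/runs?api-version=7.1"
    displayName: Queue rollback run via REST API
    env:
      SYSTEM_ACCESSTOKEN: $(System.AccessToken)   # the token must be mapped in explicitly
```

The refName entry keeps the rollback run on the same branch, which mirrors the useSameBranch setting of the Trigger Build task.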

Possible improvements

This pipeline was written purely to demonstrate the possibilities so there is a lot of room for improvement. Here are some things to keep in mind when writing this for yourself.

  • If an earlier version of the code is deployed, the last known good configuration is moved back to that version too. This might be the expected behavior, but you do need to keep it in mind.

  • You probably want to turn this pipeline into a template (or multiple ones) that can be used in your actual deployment pipelines.

  • You probably want to deploy the resources in Complete mode instead of Incremental mode. As it stands, if new resources are added and something fails, these new resources persist after the rollback.

  • A failsafe should be included in case the last known good configuration fails as well, because right now the pipeline will loop indefinitely and keep queuing new runs.

  • It would be useful to add something to the name of the pipeline run when a rollback is triggered, so this is instantly clear in the run overview.
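The last two points could be sketched as follows. This is one possible approach under my own assumptions, not a tested part of the demo pipeline: the run name prefix "rollback-", the display names and the error text are made up for illustration. The first fragment renames rollback runs; the second only queues a rollback when the failed run was a normal run, which breaks the loop.

```yaml
# Fragment 1 (assumed location: a step in the Build job).
# Renames rollback runs so they stand out in the run overview.
- ${{ if eq(parameters.useLastKnownGoodConfiguration, true) }}:
  - script: echo "##vso[build.updatebuildnumber]rollback-$(Build.BuildId)"
    displayName: Mark run as rollback
---
# Fragment 2 (replaces the failure hook under "on:" in the deployment job).
failure:
  steps:
  # A normal run failed: queue a single rollback run, same inputs as before.
  - ${{ if eq(parameters.useLastKnownGoodConfiguration, false) }}:
    - task: TriggerBuild@4
      inputs:
        definitionIsInCurrentTeamProject: true
        buildDefinition: '<YOUR PIPELINEID>'
        queueBuildForUserThatTriggeredBuild: false
        ignoreSslCertificateErrors: false
        useSameSourceVersion: false
        useCustomSourceVersion: false
        useSameBranch: true
        waitForQueuedBuildsToFinish: false
        storeInEnvironmentVariable: false
        templateParameters: 'useLastKnownGoodConfiguration: true'
        authenticationMethod: 'OAuth Token'
        enableBuildInQueueCondition: false
        dependentOnSuccessfulBuildCondition: false
        dependentOnFailedBuildCondition: false
        checkbuildsoncurrentbranch: false
        failTaskIfConditionsAreNotFulfilled: false
  # The rollback run itself failed: log an error and do not queue another run.
  - ${{ if eq(parameters.useLastKnownGoodConfiguration, true) }}:
    - script: echo "##vso[task.logissue type=error]Rollback failed, no new run was queued."
      displayName: Report failed rollback
```

Because the ${{ if }} expressions are evaluated when the run is compiled, a rollback run never contains the trigger step at all.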

Conclusion

With this technique you can create an automatic rollback scenario without too much extra work. Do keep in mind that this could still break your environment, so it would be wise to add some approvals or extra tests. With this as a basis, I wish you good luck implementing these rollbacks!