Lessons learned from migrating a 10+ year old on-prem monolithic SaaS product to the cloud

Synopsis: “Lift + Shift + Re-engineering” a successful SaaS product from on-prem to the cloud is never easy, and the challenge becomes even more complex when the migration is turn-key rather than incremental. How do you plan & execute such a large-scale cloud migration? What are the lessons learned?

Background: This migration was done as part of cloud-modernizing a custom-built CMMS (Computerized Maintenance Management System) platform. This SaaS platform provides facilities management and procurement solutions across sectors, namely retail, financial services, logistics, and healthcare. It comprises an ecosystem of 20+ interconnected applications catering to facility management needs. The platform is high-volume, serving 3 million requests per day, and the migration was turn-key, i.e., all 20+ applications needed to be migrated at once.


Fig. 1. SaaS ecosystem prior to the cloud migration

 

Why the need to migrate from on-premises to the cloud?

The main reasons for migrating away from on-prem were:

 

  • Onboarding a new customer took a minimum of 4 to 6 weeks, and with such long onboarding times it was becoming increasingly difficult to cater to new business needs.
  • Inability to dynamically scale compute to handle seasonal spikes.
  • The on-prem infrastructure was provisioned for peak load, resulting in high costs.
  • Other pain points
    • The Non-Prod ecosystem (Dev, QA, UAT & Regression) didn’t mirror the Prod ecosystem, resulting in surprises during releases.
    • A rudimentary DevOps pipeline/infrastructure created a dependence on the “Release Management” team to promote code even to the non-prod environments.
    • New version releases were events in themselves, with the entire engineering team joining the bridge to validate their deliverables on release night.
    • Lack of support for ultra-high availability.

Approach

When we started the cloud migration journey, we intended to do a “Lift + Shift” since it was a 10+ year old product ecosystem. However, the initial analysis showed an opportunity to do cloud-specific reengineering with minimal investment that would give the client a better ROI. We identified two major themes/sub-projects that needed to be addressed as part of the reengineering work:

  • Cloud Reengineering & Migration
  • DevSecOps Rollout

 

Cloud Reengineering & Migration – Key Tasks

  • System study, Design & Proof of Concept (PoC)
    • Prepare a detailed blueprint of the current product ecosystem
    • Identify components that are to be re-engineered for the cloud and that require PoC
    • Execute PoC and validate results
    • Certify components that are ready for reengineering after a successful PoC
    • Prepare cloud architectural blueprint
    • Review the architectural blueprint with Architecture Council & get their approval
  • Refine Cloud Architecture (Cost Optimization)
    • Prepare Bill of Materials for the approved target cloud infrastructure
    • Prepare ROI (compare anticipated cloud costs vs. existing infrastructure costs and compute the ROI)
    • Refine architectural choices to contain cost & update the architectural blueprint
  • Set up Cloud Non-Prod Environment
    • Build CI/CD infrastructure (depends on DevSecOps rollout)
    • Script Non-Prod environment setup (Infrastructure as Code)
    • Script application deployment
    • Set up Non-Prod infrastructure using automated scripts
  • Prepare code base for Cloud
    • Reengineer the code to utilize Cloud PaaS offerings
    • Validate the application (End to end testing, Performance campaign)
    • Certify that the code is ready for Cloud
  • Deploy code in Non-Prod Environment
    • Prepare Build Scripts
    • Execute Build Scripts
    • Automate deployment using CI/CD Infrastructure
    • Execute smoke tests
    • Ideate Build Propagation Concept
    • Evangelize, get buy-in for Build Propagation
    • Validate Build Propagation
    • Propagate builds from Dev to QA to UAT to Regression
    • Validate results
    • Certify Build Propagation
  • Set up Cloud Prod Environment
    • Set up Prod
    • Execute Mock Production Cutover
      • DB backups & Restoration
      • Estimate & validate the time for restoring differentials
    • Prepare Production Cutover Sequence
    • Review & Obtain approval for Production Cutover Sequence
    • Prepare Rollback Strategy
    • Review & Obtain approval on Rollback strategy
    • Dry run Rollback plan
  • Go-Live
    • Production cutover
    • Post Production Rollout Support
      • 4-6 weeks to stabilize
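The “Prepare ROI” step in the task list above comes down to simple arithmetic. The sketch below is a minimal, hypothetical illustration assuming a flat monthly run-rate on each side, a one-time migration investment, and a 36-month evaluation window; the class, function, and figures are made up for illustration and are not this project's numbers. It uses one common definition: ROI = (cumulative savings - investment) / investment.

```python
from dataclasses import dataclass


@dataclass
class CostComparison:
    """Hypothetical inputs for an on-prem vs. cloud ROI estimate (same currency)."""
    onprem_monthly: float       # current on-prem infrastructure run-rate
    cloud_monthly: float        # anticipated run-rate from the cloud Bill of Materials
    migration_one_time: float   # one-time reengineering + migration investment
    horizon_months: int = 36    # assumed evaluation window


def roi_summary(c: CostComparison) -> dict:
    """Return monthly savings, payback period (months), and ROI over the horizon."""
    monthly_savings = c.onprem_monthly - c.cloud_monthly
    total_savings = monthly_savings * c.horizon_months
    # ROI = (cumulative savings - investment) / investment
    roi = (total_savings - c.migration_one_time) / c.migration_one_time
    # Payback never happens if the cloud run-rate is not cheaper
    payback = (c.migration_one_time / monthly_savings
               if monthly_savings > 0 else float("inf"))
    return {"monthly_savings": monthly_savings, "payback_months": payback, "roi": roi}


if __name__ == "__main__":
    example = CostComparison(onprem_monthly=50_000, cloud_monthly=35_000,
                             migration_one_time=300_000)
    print(roi_summary(example))
    # -> {'monthly_savings': 15000, 'payback_months': 20.0, 'roi': 0.8}
```

Re-running this after each “refine architectural choices” iteration makes it easy to see how a change to the Bill of Materials moves the payback period.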

 

DevSecOps Rollout – Key Tasks

  • Design & PoC
    • Prepare detailed DevSecOps target state
    • Review the DevSecOps target state with key stakeholders (Dev Leads, Architects, Release Management Team) and obtain their buy-in
    • Identify DevSecOps steps, and tools
    • PoC the DevSecOps toolchain
    • Evangelize DevSecOps tools & obtain feedback
  • Execute
    • Build DevSecOps pipeline
    • Deploy DevSecOps pipeline in Non-Prod
    • Optimize DevSecOps pipeline based on stakeholder feedback
    • Refine DevSecOps pipeline
    • Enable pipeline for Prod deployment
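The “Build Propagation” concept from the migration task list, which this pipeline enables, can be pictured as promoting one immutable build from Dev to QA to UAT to Regression, gated by smoke tests at each stage. The sketch below is a hypothetical model, not the project's Azure DevOps/Jenkins configuration: `Build`, `propagate`, and the injected `deploy`/`smoke_test` callbacks are illustrative names, and only the stage order comes from the article.

```python
from dataclasses import dataclass
from typing import Callable, List

# Promotion order taken from the article's Build Propagation step
STAGES = ("Dev", "QA", "UAT", "Regression")


@dataclass(frozen=True)
class Build:
    """A build produced once by CI; frozen so no stage can alter it (hypothetical shape)."""
    version: str
    artifact_sha256: str


def propagate(build: Build,
              deploy: Callable[[str, Build], None],
              smoke_test: Callable[[str, Build], bool]) -> List[str]:
    """Promote the same build through each stage in order.

    The build is deployed to a stage and smoke-tested there; only a passing
    stage makes it eligible for the next one. Promotion halts at the first
    failure. Returns the stages the build passed.
    """
    passed: List[str] = []
    for stage in STAGES:
        deploy(stage, build)  # redeploy the identical artifact; never rebuild per environment
        if not smoke_test(stage, build):
            print(f"{build.version}: smoke tests failed in {stage}; halting propagation")
            break
        passed.append(stage)
    return passed


if __name__ == "__main__":
    deployments = []

    def fake_deploy(stage: str, build: Build) -> None:
        # Stand-in for the real CI/CD deployment job
        deployments.append((stage, build.artifact_sha256))

    def fake_smoke_test(stage: str, build: Build) -> bool:
        # Simulate a failing smoke test in UAT
        return stage != "UAT"

    result = propagate(Build("1.4.2", "abc123"), fake_deploy, fake_smoke_test)
    print(result)  # ['Dev', 'QA']
```

Deploying the same artifact everywhere is what makes a pass in QA or UAT meaningful for Prod: the binaries that were validated are the binaries that ship.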

 

Scope of Work

Our scope of work included the following:

  • Project Management & Governance
  • Deliver Architectural Blueprint
  • Bill of Materials creation
  • Building Proof of Concepts
  • Preparing Migration & Rollback Strategy
  • Application Reengineering
  • Non-production environment setup & migration
  • Running performance campaign & optimizing the code, infrastructure
  • Change management support
  • Production Application Setup
  • Post go-live Support



Current State

The salient features of the re-engineered cloud architecture are:

  • The code base(s) are deployed onto a VM image that is spawned into VM scale sets. Traffic to the VMs within a scale set is distributed by the Azure Load Balancer.
  • Azure Front Door sits in front of the Azure Load Balancer and enforces Web Application Firewall (WAF) policies that shield the SaaS ecosystem from external attacks.
  • Azure Key Vault is used to store and retrieve secrets for all the web and Windows-based applications of the SaaS ecosystem.
  • A dedicated VM hosts various Windows services that are triggered by a CRON-based scheduler to perform asynchronous activities.
  • Files treated as attachments within the SaaS ecosystem are stored in Azure Blob Storage.
  • Automated build and deployment pipelines, built with a combination of Azure DevOps and a Jenkins server, provision the infrastructure of the entire SaaS ecosystem and deploy the code binaries into environments such as QA, Staging, Regression, and Production.


Fig. 2. SaaS Ecosystem – Deployed Cloud Architecture

(Key Services: VM Scale Sets, Azure DevOps, Azure Front Door, Azure Load Balancer, Azure Key Vault, Blob Storage & SendGrid)
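Dynamic scaling for seasonal spikes, one of the main reasons for the migration, is what the VM scale sets provide. In Azure this is configured declaratively as autoscale rules (metric, threshold, scale action, instance limits, cooldown) rather than written as code. The function below is only a simplified model of a CPU-based scale-out/scale-in rule that ignores cooldown and metric time aggregation; the thresholds and instance bounds are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class AutoscaleRule:
    """Illustrative CPU-based rule; all thresholds and bounds are assumptions."""
    min_instances: int = 2
    max_instances: int = 10
    scale_out_cpu: float = 70.0   # avg CPU % above which capacity is added
    scale_in_cpu: float = 30.0    # avg CPU % below which capacity is removed
    step: int = 1                 # instances added/removed per evaluation


def next_instance_count(rule: AutoscaleRule, current: int, avg_cpu: float) -> int:
    """Return the desired scale-set capacity for one evaluation window."""
    if avg_cpu > rule.scale_out_cpu:
        desired = current + rule.step
    elif avg_cpu < rule.scale_in_cpu:
        desired = current - rule.step
    else:
        desired = current  # inside the band: no change, which avoids flapping
    # Always stay within the configured instance limits
    return max(rule.min_instances, min(rule.max_instances, desired))


if __name__ == "__main__":
    rule = AutoscaleRule()
    count = 2
    # Simulated seasonal spike followed by a quiet period
    for cpu in [85, 90, 88, 60, 20, 15]:
        count = next_instance_count(rule, count, cpu)
        print(f"avg CPU {cpu}% -> {count} instances")
    # instance counts: 3, 4, 5, 5, 4, 3
```

The gap between the scale-out and scale-in thresholds is the design choice that matters: without it, a load hovering around a single threshold would add and remove VMs on every evaluation.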

 

Learnings

  • Process Related
    • Program Management: Involve stakeholders right from the get-go; otherwise there will be multiple false starts and unnecessary delays.
      • Stakeholder Identification & Buy-in: Prepare a list of all the stakeholders (internal and external) and actively engage them throughout the project life cycle (from project kick-off to closure)
      • Prepare a comprehensive program plan with clear owners
        • Obtain buy-in from the owners on their tasks
      • Over-communicate (design decisions, project delays, risks, issues, etc.)
      • Celebrate successes
      • Hold weekly review meetings with key stakeholders
        • Publish overall progress with RAG status and meeting minutes in a shared location (Confluence, SharePoint, etc.)
      • Engage End Users/Customers during UAT
        • Prepare, in advance, the list of end users/customers who need to be actively involved during UAT
        • Use the end users to test client-specific scenarios that are outside your control, e.g., third-party SSO integration and API integration touch points
      • Prepare a robust Post Go-Live support plan
        • Assemble a dedicated team of experts (Architects, Devs, QAs, DevOps & Release Engineers) to handle post-go-live issues
        • Plan for at least 4-6 weeks of dedicated support after go-live
  • Technical
    • IP Whitelisting: Prepare the list of new IPs and share it with both internal and external stakeholders so they can whitelist them. Best practice is to use DNS names instead of raw IPs wherever possible.
    • Email Issues/SendGrid: Update DMARC rules/policies (if any) to prevent email delivery failures.
    • Environment right-sizing: It is OK to initially oversize the cloud infrastructure to match the on-prem compute. Iteratively right-size the cloud infrastructure after go-live based on actual need.
    • Rollback strategy: Have a robust rollback strategy for the application, data/database, and file storage, and dry run it.
      • Ensure the rollback plan can be executed a couple of days after go-live, not just on the production cutover date
      • Ensure the database is backed up and that the differentials taken after the initial backups are ready in case you need to roll back
    • If possible, stay clear of a hybrid setup: Avoid, as much as possible, a hybrid setup that involves massive data transfers between on-prem and cloud infrastructure. Data transfer speed varies during the day, so don't plan transfer times based on peak transfer rates. Also be ready to allocate more compute (CPU, memory, IO) if you need faster data transfer rates (this will put a dent in the planned cloud cost).
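The advice not to plan data transfers around peak rates is easy to quantify. A minimal sketch, assuming you have sampled the achievable on-prem-to-cloud throughput (in MB/s) at different times of day; the function names, sample values, and data size below are hypothetical.

```python
import statistics
from typing import Dict, Sequence


def transfer_hours(size_gb: float, throughput_mb_s: float) -> float:
    """Hours needed to move size_gb at a sustained rate in MB/s (1 GB = 1024 MB)."""
    return size_gb * 1024 / throughput_mb_s / 3600


def plan_transfer(size_gb: float, samples_mb_s: Sequence[float]) -> Dict[str, float]:
    """Estimate transfer time from throughput samples taken across the day.

    Planning on the peak sample is the optimistic trap; the mean and the
    slowest sample give more realistic bounds.
    """
    return {
        "peak_based": transfer_hours(size_gb, max(samples_mb_s)),
        "mean_based": transfer_hours(size_gb, statistics.mean(samples_mb_s)),
        "worst_case": transfer_hours(size_gb, min(samples_mb_s)),
    }


if __name__ == "__main__":
    # Hypothetical throughput samples (MB/s) taken at different hours
    samples = [160, 80, 40, 40, 60, 100]
    print(plan_transfer(1125, samples))
    # -> {'peak_based': 2.0, 'mean_based': 4.0, 'worst_case': 8.0}
```

In this made-up example the worst-case estimate is four times the peak-based one, which is the kind of gap that can overrun a cutover window.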

 

About Encora

Fast-growing tech companies partner with Encora to outsource product development and drive growth. Contact us to learn more about our software engineering capabilities.

 
