**Synopsis:** “Lift + Shift + Re-engineering” of a successful SaaS product from on-prem to cloud is never easy, and the challenge grows when the migration is turn-key rather than incremental. How do you plan and execute such a large-scale cloud migration? What are the learnings?
**Background:** This migration was done as part of cloud-modernizing a custom-built CMMS (Computerized Maintenance Management System) platform. It is a SaaS platform providing facilities management and procurement solutions across sectors, namely retail, financial services, logistics, and healthcare. The platform comprises an ecosystem of 20+ interconnected applications catering to facility management needs. It is a high-volume platform serving 3 million requests per day, and the entire migration was turn-key, i.e., all 20+ applications needed to be migrated at once.

Fig. 1. SaaS ecosystem prior to the cloud migration
Below were the main reasons to migrate from on-prem.
Onboarding a new customer took a minimum of 4 to 6 weeks, and with such a long onboarding time it was becoming increasingly difficult to cater to new business needs.
Inability to dynamically scale the compute to handle seasonal spikes.
The on-prem infrastructure was provisioned for the peak load resulting in high costs.
Other pain points
The Non-Prod ecosystem (Dev, QA, UAT & Regression) didn’t mirror the Prod ecosystem, resulting in surprises during release.
A rudimentary DevOps pipeline/infrastructure resulted in dependence on the “Release Management” team to promote code even to the non-prod ecosystems.
New version releases were like an event, with the entire engineering team joining the bridge and validating their deliverables on release night.
Lack of support for ultra-high availability.
When we started the cloud migration journey, we wanted to do a “Lift + Shift” since it was a 10+ year old product ecosystem. After the initial analysis, however, we realized that there was an opportunity to do cloud-specific reengineering with minimal investment that would provide the client with better ROI. We identified two major themes/sub-projects (given below) that needed to be addressed as part of the reengineering work.
Cloud Reengineering & Migration
DevSecOps Rollout
System study, Design & Proof of Concept (PoC)
Prepare a detailed blueprint of the current product ecosystem
Identify components that are to be re-engineered for the cloud and that require PoC
Execute PoC and validate results
Certify components that are ready for Reengineering post the successful PoC
Prepare cloud architectural blueprint
Review the architectural blueprint with Architecture Council & get their approval
Refine Cloud Architecture (Cost Optimization)
Prepare Bill of Materials for the approved target cloud infrastructure
Prepare ROI (Compare anticipated cloud costs vs existing infrastructure cost, compute ROI)
Refine architectural choices to contain cost & update the architectural blueprint
Setup Cloud Non-Prod Environment
Build CI/CD Infrastructure (Depends on DevSecOps rollout)
Script Non-Prod environment setup (Infrastructure as Code)
Write application deployment scripts
Setup Non-Prod infrastructure using automated scripts
Prepare code base for Cloud
Reengineer the code to utilize Cloud PaaS offerings
Validate the application (End to end testing, Performance campaign)
Certify that the code is ready for Cloud
Deploy code in Non-Prod Environment
Prepare Build Scripts
Execute Build Scripts
Automate deployment using CI/CD Infrastructure
Execute smoke tests
Ideate Build Propagation Concept
Evangelize, get buy-in for Build Propagation
Validate Build Propagation
Propagate builds from Dev to QA to UAT to Regression
Validate results
Certify Build Propagation
Setup Cloud Prod Environment
Setup Prod
Execute Mock Production Cutover
DB backups & Restoration
Estimate & validate the time for restoring differentials
Prepare Production Cutover Sequence
Review & Obtain approval for Production Cutover Sequence
Prepare Rollback Strategy
Review & Obtain approval on Rollback strategy
Dry run Rollback plan
Go-Live
Production Cutover
Post Production Rollout Support
4-6 weeks to stabilize
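The ROI step in the plan above (comparing anticipated cloud cost against the existing infrastructure cost) comes down to simple arithmetic. Below is a minimal sketch in Python. The `CostModel` fields and every figure are hypothetical placeholders chosen for illustration, not numbers from this migration.

```python
from dataclasses import dataclass


@dataclass
class CostModel:
    """All amounts share one currency; the values used below are illustrative only."""
    on_prem_annual: float      # current yearly infrastructure cost
    cloud_annual: float        # anticipated yearly cloud cost (from the Bill of Materials)
    migration_one_time: float  # one-time reengineering and migration cost


def annual_savings(m: CostModel) -> float:
    return m.on_prem_annual - m.cloud_annual


def roi_percent(m: CostModel, years: int) -> float:
    """Net gain over `years`, expressed relative to the one-time investment."""
    gain = annual_savings(m) * years - m.migration_one_time
    return 100.0 * gain / m.migration_one_time


def payback_years(m: CostModel) -> float:
    """Years until cumulative savings cover the one-time cost."""
    savings = annual_savings(m)
    if savings <= 0:
        return float("inf")  # cloud is not cheaper, so the investment never pays back
    return m.migration_one_time / savings


# Hypothetical inputs, not figures from the migration described here.
model = CostModel(on_prem_annual=1_000_000, cloud_annual=700_000, migration_one_time=450_000)
print(payback_years(model))   # 1.5
print(roi_percent(model, 3))  # 100.0
```

Re-running the same calculation after each cost-optimization pass on the architecture makes it easy to see whether a design change actually moves the payback period.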
Design & PoC
Prepare detailed DevSecOps target state
Review the DevSecOps target state with key stakeholders (Dev Leads, Architects, Release Management Team) and obtain their buy-in
Identify DevSecOps steps, and tools
PoC the DevSecOps toolchain
Evangelize DevSecOps tools & obtain feedback
Execute
Build DevSecOps pipeline
Deploy DevSecOps pipeline in Non-Prod
Optimize DevSecOps pipeline based on stakeholder feedback
Refine DevSecOps pipeline
Enable pipeline for Prod deployment
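The build propagation idea from the migration plan (Dev to QA to UAT to Regression) can be sketched as a small model in which one build moves through the stages in order and is never rebuilt along the way. This is an illustrative Python sketch, not the actual pipeline: `Build`, `promote`, `propagate`, and the `validate` callback are hypothetical names, and in practice the validation would be the smoke tests run by the CI/CD tooling.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Stage order taken from the plan; Prod is assumed to be gated separately.
STAGES = ["Dev", "QA", "UAT", "Regression"]


@dataclass
class Build:
    version: str
    checksum: str  # identifies the exact binaries, so every stage gets the same artifact
    validated: List[str] = field(default_factory=list)  # stages already passed


def next_stage(build: Build) -> Optional[str]:
    """Return the next stage this build may enter, or None when all are done."""
    done = len(build.validated)
    return STAGES[done] if done < len(STAGES) else None


def promote(build: Build, stage: str, validate: Callable[[Build, str], bool]) -> bool:
    """Deploy the same build to `stage`; record the stage only if validation passes."""
    if stage != next_stage(build):
        raise ValueError(f"build {build.version} cannot skip to {stage}")
    if not validate(build, stage):
        return False
    build.validated.append(stage)
    return True


def propagate(build: Build, validate: Callable[[Build, str], bool]) -> List[str]:
    """Promote through the stages in order, stopping at the first failure."""
    while (stage := next_stage(build)) is not None:
        if not promote(build, stage, validate):
            break
    return build.validated


# Hypothetical run in which validation fails in UAT.
build = Build(version="1.4.2", checksum="sha256:placeholder")
print(propagate(build, lambda b, s: s != "UAT"))  # ['Dev', 'QA']
```

The point of the design is that a failure stops the build where it is, and a later stage can never receive binaries that an earlier stage has not validated.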
Our scope of work included the following:
Project Management & Governance
Deliver Architectural Blueprint
Bill of Materials creation
Building Proof of Concepts
Preparing Migration & Rollback Strategy
Application Reengineering
Non-production environment setup & migration
Running performance campaign & optimizing the code, infrastructure
Change management support
Production Application Setup
Post go-live Support
Salient features of the cloud re-engineered architecture (given below) are:
The code base(s) are deployed onto a VM image that is spawned into VM scale sets. VMs spawned within a scale set are fronted by the Azure Load Balancer.
Azure Front Door is wired to the Azure Load Balancer and enforces Web Application Firewall (WAF) policies that shield the SaaS ecosystem from external attacks.
Azure Key Vault is used to store and retrieve secrets for all the web and Windows-based applications of the SaaS ecosystem.
A dedicated VM hosts various Windows services that are triggered by a cron-based scheduler to perform asynchronous activities.
Files that are treated as attachments within the SaaS ecosystem are stored in Azure Blob Storage.
Automated build and deployment pipelines, set up using a combination of Azure DevOps and a Jenkins server, build the infrastructure of the entire SaaS ecosystem and deploy the code binaries into environments such as QA, Staging, Regression and Production.
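As an illustration of the Key Vault point above, the sketch below shows one way an application could read its secrets at startup. It is written in Python for brevity and is not the project's actual code. `SecretClient` and `DefaultAzureCredential` come from the `azure-keyvault-secrets` and `azure-identity` packages; the vault URL, the secret name, and the in-memory stand-in client are assumptions made so the example runs without an Azure subscription.

```python
from typing import Dict, Iterable, Protocol


class SecretReader(Protocol):
    """Anything exposing get_secret(name) that returns an object with a .value."""
    def get_secret(self, name: str): ...


def make_key_vault_client(vault_url: str) -> SecretReader:
    """Build a real client; needs azure-identity and azure-keyvault-secrets installed."""
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient
    return SecretClient(vault_url=vault_url, credential=DefaultAzureCredential())


def load_secrets(client: SecretReader, names: Iterable[str]) -> Dict[str, str]:
    """Fetch each named secret once and return a name-to-value mapping."""
    return {name: client.get_secret(name).value for name in names}


class _StubSecret:
    def __init__(self, value: str) -> None:
        self.value = value


class InMemorySecrets:
    """Stand-in for local runs and tests; not part of any Azure SDK."""
    def __init__(self, data: Dict[str, str]) -> None:
        self._data = data

    def get_secret(self, name: str) -> _StubSecret:
        return _StubSecret(self._data[name])


# Hypothetical secret name; a deployed app would instead call
# make_key_vault_client("https://<your-vault>.vault.azure.net").
client = InMemorySecrets({"sql-connection-string": "placeholder"})
print(load_secrets(client, ["sql-connection-string"]))
```

Keeping the lookup behind a small interface means the same application code works against Key Vault in the cloud and against a stub on a developer machine.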

Fig. 2. SaaS Ecosystem – Deployed Cloud Architecture
(Key Services: VM Scale Sets, Azure DevOps, Azure Front Door, Azure Load Balancer, Azure Key Vault, Blob Storage & SendGrid)
Process Related
Program Management: Ensure stakeholders are involved right from the get-go; otherwise there will be multiple false starts and unnecessary delays.
Stakeholder Identification & Buy-in: Prepare a list of all the stakeholders (internal, external) and actively engage them during the entire project life cycle (Project kick-off till closure)
Prepare a comprehensive program plan with clear owners
Obtain buy-in from the owners on their tasks
Over-communicate (design decisions, project delays, risks, issues, etc.)
Celebrate successes
Weekly review meeting with key stakeholders
Publish overall progress with RAG status and minutes in a shared location (Confluence, SharePoint, etc.)
Engage End Users/Customers during UAT
Prepare the list of end users/customers in advance who need to be actively involved during UAT
Utilize the end users to test any client-specific scenarios that are outside your control, e.g., 3rd-party SSO integration, API integration touch points, etc.
Prepare a robust Post Go-Live support plan
Assemble a dedicated team of experts (Architects, Devs, QAs, DevOps & Release Engineers) to handle post go-live issues
Plan for at least 4-6 weeks of dedicated support post go-live
Technical
IP Whitelisting: Prepare the list of new IPs and share it with both internal and external stakeholders so they can whitelist them. A best practice is to use DNS names wherever possible.
Email Issues/SendGrid: Update DMARC rules/policies (if any) to handle email failures
Environment right-sizing: It is OK to size the initial cloud infrastructure like the on-prem compute, even if that means it is oversized. Iteratively right-size the cloud infrastructure post go-live based on need.
Rollback strategy: Have a robust rollback strategy for application, data/database & file storage. Dry run the rollback strategy.
Ensure the rollback plan can be executed a couple of days after go-live, not just on the prod cutover date
Ensure the database is backed up and the differentials taken after the initial backups are ready in case you need to roll back
If possible, stay clear of a hybrid setup: Avoid, as much as possible, a hybrid setup that involves massive data transfers between on-prem and cloud infrastructure. Data transfer speed varies during the day, so don’t plan transfer time based on peak transfer rates. Also be ready to allocate higher compute (CPU, memory, IO) if you need faster data transfer rates (this will put a dent in the planned cloud cost).
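The advice about not planning around peak transfer rates can be made concrete with a small estimate, which applies equally to timing the restore of database differentials. The Python sketch below plans with a low percentile of measured throughput instead of the best observed value. The sample throughputs, the 500 GB payload, and the choice of the 20th percentile are assumptions for illustration only.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of the samples (pct between 0 and 100)."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def transfer_hours(size_gb: float, rate_mbps: float) -> float:
    """Hours needed to move size_gb at a sustained rate given in megabits per second."""
    megabits = size_gb * 8 * 1000  # decimal units: 1 GB = 8000 megabits
    return megabits / rate_mbps / 3600


# Hypothetical throughput samples (Mbps) taken at different times of day.
samples = [900, 850, 400, 300, 750, 200, 600, 500, 350, 800]

planning_rate = percentile(samples, 20)  # conservative rate for planning
peak_rate = max(samples)

print(planning_rate)                                  # 300
print(round(transfer_hours(500, planning_rate), 1))   # 3.7
print(round(transfer_hours(500, peak_rate), 1))       # 1.2
```

With these made-up numbers, a plan based on the peak rate would underestimate the transfer window by roughly a factor of three, which is the kind of gap that surfaces during a mock cutover or rollback dry run.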