Centuries-Old Bank Consolidates Data from Legacy Systems with Managed Capacity Service Model
Our client needed to build an ingestion pipeline that extracts data from source systems and delivers it to GPDW, enabling the company to pursue its Data-as-a-Service approach within the group. The client’s end users would benefit from easy access to consolidated data of better quality and availability, which would help our client provide new financial services and improve existing ones. Big data solutions are a core focus of our client’s strategy, and integrated data flows provide end customers with high-quality online financial services.
SoftServe implemented a managed capacity service model from scratch to roll out a data processing and ingestion platform that migrates and consolidates data held on numerous legacy systems onto the client’s big data platform. Streams from heterogeneous data sources were integrated with a centralized Hadoop cluster. Around 1,000 legacy systems were involved.
All development was done within an established DevOps framework and followed an established build pipeline. Working within this DevOps framework, SoftServe utilized the following technologies:
- Source Control: Git & Bitbucket
- Programming Language: Scala
- Build server: Bamboo using Maven
- Artefact storage: Nexus
- Deployment: Puppet
- Unit Tests: ScalaTest
Our team organized delivery around six operational processes:
Source Onboarding Process – this was the preparation of infrastructure configurations for the source systems. Due to security and access management requirements, each source system had to be located in a different data zone on the Hadoop platform, with dedicated folders, databases and access groups in Active Directory. Our team had to raise multiple infrastructure requests with the Hadoop, Unix and Access Management teams and coordinate with them to verify completion.
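The one-zone-per-source rule can be illustrated with a minimal sketch. It is written in Python for brevity (the project itself used Scala), and every name in it is a hypothetical assumption rather than the client's actual convention: the `DataZone` fields, the `/data/zones` root, and the `_db` and `hdp_..._rw` naming patterns. It derives the folder, database and access group for each source and flags any resource that two sources would share.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DataZone:
    """Resources dedicated to a single source system (hypothetical layout)."""
    source: str
    hdfs_folder: str
    database: str
    ad_group: str


def plan_zone(source: str, root: str = "/data/zones") -> DataZone:
    # Normalise the source name into a key used for all resource names.
    key = source.strip().lower().replace(" ", "_")
    return DataZone(
        source=source,
        hdfs_folder=f"{root}/{key}",
        database=f"{key}_db",
        ad_group=f"hdp_{key}_rw",
    )


def find_collisions(zones: List[DataZone]) -> List[Tuple[str, str, str]]:
    """Return (resource, first_source, second_source) for every shared resource."""
    owner = {}
    collisions = []
    for zone in zones:
        for resource in (zone.hdfs_folder, zone.database, zone.ad_group):
            if resource in owner and owner[resource] != zone.source:
                collisions.append((resource, owner[resource], zone.source))
            owner.setdefault(resource, zone.source)
    return collisions
```

A plan like this could be reviewed before the infrastructure requests are raised, so that isolation problems surface before any team starts provisioning.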
Development and QA Process – this involved preparing configurations for new data feeds and testing ingestions in a QA environment. All configurations are stored in the source repository and follow a gitflow process through development. The QA process requires 100% test coverage with full traceability in Jira.
Production Release Process – using guidelines from the client for the change management process, based on the ITIL framework and tools, SoftServe leveraged a continuous delivery approach and automation tools on the test and production environments to release new configurations. Each release involved two automated checks: one before the release to verify the correctness of the infrastructure and ingest configurations, and one after to verify the correctness of the deployment.
Production Maintenance Process – investigating and resolving issues in production. After ingesting data to production, our team regularly reviews ingestion status using Cloudera Manager tools and custom dashboards configured on a big data analytics platform. The team investigates errors and issues and coordinates efforts across multiple teams to fix them.
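The check-before, check-after pattern can be sketched as a small release gate. This is an illustrative Python sketch, not the client's tooling: the `release` function, the example checks and the configuration keys (`feed`, `target_database`, `version`) are all assumptions made for the example.

```python
from typing import Callable, Dict, List

# A check inspects a dictionary and returns a list of problems (empty if OK).
Check = Callable[[Dict], List[str]]


def run_checks(checks: List[Check], context: Dict) -> List[str]:
    problems: List[str] = []
    for check in checks:
        problems.extend(check(context))
    return problems


def release(config: Dict, pre_checks: List[Check],
            deploy: Callable[[Dict], Dict], post_checks: List[Check]) -> Dict:
    # Pre-release gate: do not deploy if the configuration is incorrect.
    problems = run_checks(pre_checks, config)
    if problems:
        return {"deployed": False, "problems": problems}
    # Deploy, then verify the resulting state.
    state = deploy(config)
    return {"deployed": True, "problems": run_checks(post_checks, state)}


def has_required_keys(config: Dict) -> List[str]:
    # Hypothetical pre-release check on an ingest configuration.
    required = ("feed", "target_database", "version")
    return [f"missing key: {k}" for k in required if k not in config]


def version_matches(state: Dict) -> List[str]:
    # Hypothetical post-release check on the deployed state.
    if state.get("deployed_version") == state.get("expected_version"):
        return []
    return ["deployed version does not match expected version"]
```

Returning a list of problems instead of raising on the first failure lets a release report show every issue found at each stage.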
Change Management and Reconciliation Process – our team ingests data into Hadoop in parallel with the existing legacy data warehouse solution, which will be decommissioned in the future. The legacy warehouse is still live in production, so we analyze its releases to identify new data feeds or changes to existing data feeds that require corresponding changes on our platform.
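The reconciliation step amounts to comparing two feed inventories. The Python sketch below assumes each inventory is a mapping from feed name to some comparable definition, such as a schema version. The `reconcile` function and that representation are hypothetical and do not describe the actual tooling.

```python
from typing import Dict, List


def reconcile(legacy_feeds: Dict[str, str],
              platform_feeds: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare feed definitions in a legacy release with those on the platform."""
    legacy, platform = set(legacy_feeds), set(platform_feeds)
    return {
        # Feeds introduced in the legacy warehouse but not yet ingested.
        "missing_on_platform": sorted(legacy - platform),
        # Feeds present in both whose definitions differ.
        "changed": sorted(
            name for name in legacy & platform
            if legacy_feeds[name] != platform_feeds[name]
        ),
        # Feeds that exist only on the new platform.
        "only_on_platform": sorted(platform - legacy),
    }
```

For example, running it against each legacy release would yield the list of feeds to onboard or update on the new platform.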
Continuous Improvement Process – our team performs biweekly retrospectives to improve the above processes. Each process has a designated process owner who is responsible for keeping the process documentation up to date and introducing new changes.
SoftServe implemented KPI-driven delivery with fully transparent, agreed service levels.
SoftServe established a capacity-managed process based on ingestion KPIs that could be correlated with data supply. The process measures throughput as sources processed per period of time and delivery units per period of time, which keeps ingestion efficient and supports data quality metrics.
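A throughput KPI of this kind, sources processed per period of time, can be computed from completion dates. The Python sketch below groups completions by ISO week. The function name and the input format (pairs of source name and completion date) are assumptions for illustration only.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Tuple


def sources_per_week(
    completions: Iterable[Tuple[str, date]]
) -> Dict[Tuple[int, int], int]:
    """Count onboarded sources per (ISO year, ISO week)."""
    counts: Counter = Counter()
    for _source, done_on in completions:
        iso = done_on.isocalendar()
        counts[(iso[0], iso[1])] += 1
    return dict(counts)
```

Comparing these weekly counts against an agreed target is one way a service level could be made transparent to both parties.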
SoftServe met the goals of our client's IT strategy to migrate and consolidate data held on numerous legacy systems onto the firm’s big data platform, which uses the Cloudera distribution of Hadoop.
Specific business goals within this group centered on the following primary drivers:
- Economics – to optimize costs, our client’s data engineering team leveraged different locations within Eastern and Central Europe
- Competence – working with a partner that is highly competent in the Hadoop environment and has a proven record of providing similar services to similar customers in the financial sector
- Scalability – accommodating incoming demand by gradually increasing volumes until a stable state was achieved