Big Data Platform for Market Risk Management and Business Scenarios Processing for Investment Bank’s Portfolio Trading
As an investment bank engaged in investment portfolio analysis, the company processes billions of deals worth billions of dollars daily. One of the central issues for the organization is Big Data analytics of investment portfolio scenarios. In previous years, the client had made three unsuccessful attempts to build a system that would enhance market risk management in portfolio analysis activities.
SoftServe had conducted successful projects with the client before and was chosen to build a new iteration of this solution.
The client has more than 500 professional traders, each of whom manages from one to a few hundred portfolios. More than five million open trades are active in the bank, including shares and derivatives (futures, options, etc.).
Each day, the client’s Market Risk Department analysts conduct the following activities:
- Analyze market risks
- Run simulations on stress testing according to uncovered risks (~20K scenarios)
- Map these scenarios on investment portfolios
- Process/analyze billions of variants resulting from the mapping
- Define the probability of negative scenarios and detect risky portfolios
- Develop recommendations on potential actions for portfolio strategy enhancement and risk mitigation
The teams involved in these activities are:
The Portfolio Calculators (PC) – generate scenarios of potential portfolio behavior with respect to obtained information on market risks.
The Portfolio Analyzers (PA) – analyze portfolio data inputs (generated variants) to support financial traders with potential decisions and actions aligned with the optimal portfolio strategy.
The teams are separated by design to reduce any subjective influence they might have on one another. They have to process huge volumes of data, as billions of variants are generated by applying 20,000 scenarios to five million active trades. The high-performance solution supporting this process has to be scalable, flexible, stable, and secure.
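The scale of the variant space follows from simple multiplication. A minimal sketch, assuming every scenario is applied to every active trade (the text does not say whether some combinations are skipped); the function name is illustrative only:

```python
# Figures quoted in the text.
SCENARIOS = 20_000
ACTIVE_TRADES = 5_000_000


def variant_count(scenarios: int, trades: int) -> int:
    """Number of scenario/trade pairs, assuming a full cross product."""
    return scenarios * trades


variants = variant_count(SCENARIOS, ACTIVE_TRADES)
print(f"{variants:,} variants")  # 100,000,000,000 variants
```

Under that assumption a single full run yields on the order of one hundred billion variants, which is consistent with the "billions of variants" described here.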
SoftServe developed a solution that covers the whole flow of Big Data analysis activities conducted by the Market Risk Department and integrates it into a single Big Data platform.
The system includes two main modules (subproducts):
Risk Finder Data Services (RFDS), for business scenario processing, is the first part of the solution. The platform collects large volumes of streaming data from the Portfolio Calculators and other departments within the bank. External predefined data sources may also be included in the processing flow and contribute data to the data storage. The RFDS hides big data complexity behind clear, business-oriented functions that other teams use, so any subjective influences on the data are excluded.
Streaming Framework is the second part of the developed solution. It retrieves data from the Kafka messaging solution according to the Portfolio Analyzer specification, processes the data (maps it to scenarios, defines risks and deviations, etc.), and provides the transformed and enriched data to the RFDS. On a custom request from the Portfolio Analyzer, the Streaming Framework can also enrich the input from the Portfolio Calculator with data that other departments have provided to the data lake. A major benefit for the Portfolio Analyzer is that the system can process not only data provided by the Portfolio Calculator, but all types of data (both streaming and batch) delivered to its storage by all predefined sources, including external ones. The Portfolio Analyzer thus gets access to comprehensive, up-to-date, and accurate data inputs for market risk analysis.
Interaction with the full system is as follows:
- The Portfolio Calculator, on a daily basis, generates streaming data on calculated portfolio scenarios and sends it to data storage (data lake)
- The Streaming Framework retrieves and maps this data to trades, defines risks and deviations, and defines future potential variants for portfolios
- When the data is processed, the Streaming Framework stores it in the RFDS and sends a notification to the Portfolio Calculator that the activity has completed successfully
- The Portfolio Calculator team sends a request through the API for report generation (data is provided as a Parquet file, but may also be provided in another format, such as CSV, on demand from the Portfolio Analyzer team)
- The Portfolio Analyzer team retrieves this data report from RFDS, processes it in a standalone analyzing system (developed separately), and provides data in a format applicable for a trader to process it for decision making.
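The mapping and risk-detection steps above can be illustrated with a toy, in-memory sketch. It is not the client's risk model or the Streaming Framework's code: the `Trade` and `Scenario` types, the single linear `shock` per scenario, and the `loss_limit` threshold are all hypothetical simplifications chosen to show the shape of the computation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trade:
    trade_id: str
    portfolio: str
    market_value: float


@dataclass(frozen=True)
class Scenario:
    name: str
    shock: float  # hypothetical relative price move, e.g. -0.25 for a 25% drop


def map_scenarios(trades, scenarios):
    """Apply every scenario to every trade, yielding one variant per pair."""
    for scenario in scenarios:
        for trade in trades:
            pnl = trade.market_value * scenario.shock
            yield scenario.name, trade.portfolio, pnl


def portfolio_pnl(variants):
    """Sum variant P&L per (scenario, portfolio)."""
    totals = defaultdict(float)
    for scenario_name, portfolio, pnl in variants:
        totals[(scenario_name, portfolio)] += pnl
    return dict(totals)


def risky_portfolios(totals, loss_limit):
    """Portfolios losing more than loss_limit in at least one scenario."""
    return sorted(
        {portfolio for (_, portfolio), pnl in totals.items() if pnl < -loss_limit}
    )


trades = [
    Trade("T1", "P-A", 1_000_000.0),
    Trade("T2", "P-A", 500_000.0),
    Trade("T3", "P-B", 200_000.0),
]
scenarios = [Scenario("equity_crash", -0.25), Scenario("mild_rally", 0.05)]

totals = portfolio_pnl(map_scenarios(trades, scenarios))
print(risky_portfolios(totals, loss_limit=100_000.0))  # ['P-A']
```

In the described system the same pattern would run as distributed streaming jobs over millions of trades rather than as Python loops; the sketch only shows the logical sequence of map, aggregate, and flag.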
The integrated solution was developed in tight timeframes with a team of only a few dedicated tech experts:
- Risk Finder Data Services (RFDS) – 1 year 8 months
- Streaming Framework – 1 year
The system of two interconnected products developed by SoftServe has outstanding performance and scalability.
The processing capacity of the system (number of trades and scenarios, required server and core quantity) is as follows:
Current capacity (solution already developed and tested):
- 5-9 million trades (depending on the business program)
- 16,000 risk scenarios
- 138 server nodes
- 2,760 cores on the cluster
- 37 TB memory on the cluster
- 2.6 PB of disk space on the cluster (260 TB currently used)
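For orientation, the cluster totals above can be converted into per-node averages. This is a rough sketch that assumes homogeneous nodes and decimal units (1 PB = 1,000 TB, 1 TB = 1,000 GB); the source does not state the actual node specification.

```python
# Cluster totals quoted in the list above.
NODES = 138
CORES = 2_760
MEMORY_TB = 37
DISK_TB = 2_600       # 2.6 PB, assuming decimal units
DISK_USED_TB = 260

cores_per_node = CORES / NODES
memory_gb_per_node = MEMORY_TB * 1_000 / NODES
disk_tb_per_node = DISK_TB / NODES
disk_used_fraction = DISK_USED_TB / DISK_TB

print(f"cores per node:  {cores_per_node:.0f}")       # 20
print(f"memory per node: {memory_gb_per_node:.1f} GB")  # 268.1 GB
print(f"disk per node:   {disk_tb_per_node:.1f} TB")    # 18.8 TB
print(f"disk in use:     {disk_used_fraction:.0%}")     # 10%
```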
Testing capacity: a cluster of the same size as production was used for testing while the system was being developed. During development, loads of 10-100% of the real production load were applied.
Assumed future capacity (the maximum projected scalability, assuming all other conditions are favorable, e.g. the client has resources to invest in hardware): server utilization is currently about 30%, so system capacity could be increased roughly 3x with no hardware upgrade required. Further capacity increases are possible and require additional nodes to be allocated. The scalability is nearly linear.
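The 3x figure can be sanity-checked from the utilization number. A minimal sketch, assuming load grows in direct proportion to utilization and that the cluster could be driven close to 100% (in practice some margin would be kept, which fits the more conservative 3x quoted here):

```python
def headroom_factor(utilization: float) -> float:
    """Upper bound on load growth before utilization reaches 100%."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return 1 / utilization


print(f"{headroom_factor(0.30):.1f}x")  # 3.3x
```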
260 TB of data is currently held in the department involved in this project.
Streaming processing rates:
- 135 million rows/minute
- 42,000 trades/minute
- 120 GB/minute is landing to HDFS (uncompressed)
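The per-minute rates can be restated per second, along with an implied average row size. This sketch assumes the row rate and the HDFS landing rate describe the same stream and that GB means 10^9 bytes; neither is confirmed by the source.

```python
# Rates quoted in the list above.
ROWS_PER_MINUTE = 135_000_000
GB_PER_MINUTE = 120  # uncompressed, landing to HDFS

rows_per_second = ROWS_PER_MINUTE / 60
gb_per_second = GB_PER_MINUTE / 60
# Average uncompressed row size, if both rates refer to the same data.
bytes_per_row = GB_PER_MINUTE * 1e9 / ROWS_PER_MINUTE

print(f"{rows_per_second:,.0f} rows/s")   # 2,250,000 rows/s
print(f"{gb_per_second:.1f} GB/s")        # 2.0 GB/s
print(f"{bytes_per_row:.0f} bytes/row")   # 889 bytes/row
```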
The system was developed on an environment of 10 cores and 10 servers, using 200,000 trades and 20,000 scenarios for performance testing.
Now that the system is in production, five million trades and 20,000 scenarios are supported by 100 cores distributed across 100 servers. Because scalability is linear, processing 10 million trades would require double the investment in servers. No other constraints have been identified at this point.
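The linear-scaling claim can be expressed as a simple sizing rule. A minimal sketch, assuming strictly linear scaling from the production baseline quoted above, with the scenario count held at 20,000; the function name is hypothetical:

```python
import math

# Production baseline quoted in the text.
BASE_TRADES = 5_000_000
BASE_SERVERS = 100


def servers_needed(trades: int) -> int:
    """Servers required for a given trade count under linear scaling."""
    return math.ceil(BASE_SERVERS * trades / BASE_TRADES)


print(servers_needed(10_000_000))  # 200
print(servers_needed(7_500_000))   # 150
```

Real clusters rarely scale perfectly linearly, so a figure produced this way is best read as a first estimate rather than a procurement number.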
The developed system is a one-of-a-kind solution that leverages Big Data and SoftServe’s technical competencies to enhance market risk management activities. The solution brought a new level of efficiency to the operational routine of market risk analysts and took market risk assessments to a new standard of accuracy.
Additionally, the system is not limited to the described use case but can support many other business cases. The client, in partnership with SoftServe, is already developing a few additional business cases based on the Big Data Platform and Streaming Framework that address challenges with Big Data processing at the global enterprise level for the organization.
- Hadoop distribution: CDH
- Big Data storage: HDFS
- Metadata storage: Oracle
- Scheduling: Oozie
- Messaging: ProtoBuf over Solace JMS bus and Kafka
- Data Collection: Apache Flume
- Data Processing: Spark, Spark Streaming, Hive, Impala
- REST API Layer: Scala, Akka
- Middleware: Tomcat, Java
- Test Automation: TeamCity, SonarQube, ScalaTest, Scalastyle, Scoverage, Linter, WartRemover