Company overview

This US-based FinTech provider offers custom financial indexes for the investment and insurance sectors, enabling direct indexing strategies with data from CME Group, Bloomberg, and Reuters. The firm employs PhD-level experts and hedge fund leaders to develop high-speed algorithms that align with trader needs and market demands, where seconds of processing time determine billions in trading value. Through secure, scalable pipelines, it ensures data quality for asset management and end-customer strategies.

Data engineering | DevOps | Project management | QA/QC | System redesign
USA

Business context


Our expertise in data engineering appealed to the client. The Geniusee team built a custom data-collection pipeline that helps create financial indexes, powering end customers’ investment strategies, and we ensured data quality and preparation for further analysis. The solution extracts and converts data from subscription feeds provided by financial institutions to support custom financial index investing.

Challenges


Ensure extremely fast data processing speeds

Build a secure and trustworthy data transfer process

Ensure system stability and periodic platform updates

Solutions we implemented


Our client required specialized data engineering services to collect data from CME*, Bloomberg, Reuters, and other sources and ensure its quality. We also had to guarantee fast processing, within a few seconds, as speed is crucial in the trading world. As a team, we developed a secure and stable service for a direct indexing strategy, collaborating with third parties and data vendors, and it has earned the client's trust.

*CME Group comprises the Chicago Mercantile Exchange, the Chicago Board of Trade, the New York Mercantile Exchange, and the Commodity Exchange (COMEX).

PoC

We decided to start with a proof of concept (PoC) to examine the feasibility and relevance of the pipeline, and to verify that the vendor's and the client's product visions matched. During PoC development, we focused primarily on testing the typical use cases of data science and engineering projects.

R&D

Our client required storing data in a specific format, so we created a custom data processor that modifies information immediately after transfer; off-the-shelf libraries turned out to be a poor fit. Initially, a full cycle of data processing, from collection to index creation, took approximately one minute. That was too slow for trading, so we devised a solution that completes the cycle in just 8 seconds.
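The case study does not describe the custom processor itself; as a purely illustrative sketch of the kind of optimization involved, fixed-size binary records can be unpacked in a single pass with Python's `struct` module instead of per-record text parsing. The record layout below (symbol, price, volume) is hypothetical:

```python
import struct

# Hypothetical fixed-size record: 8-byte symbol, float64 price, uint32 volume.
RECORD = struct.Struct("<8sdI")

def parse_records(blob: bytes):
    """Unpack a byte buffer of fixed-size records in one pass."""
    for sym, price, vol in RECORD.iter_unpack(blob):
        yield sym.rstrip(b"\x00").decode(), price, vol
```

`Struct.iter_unpack` walks the buffer without re-parsing the format string for every record, which is one common way such conversion loops are sped up.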

Batch processing

We processed data in batches of varying size: a batch can be as small as a couple of samples (mini-batching) or as large as several days’ worth of data. By contrast, stream processing handles data sample by sample as it arrives; instead of periodically working through a backlog, the system processes everything in real time.
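The mini-batching described above can be sketched in a few lines of Python (an illustrative helper, not the client's actual code):

```python
from itertools import islice

def mini_batches(stream, size):
    """Group an iterable of samples into fixed-size mini-batches."""
    it = iter(stream)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# Ten samples grouped into mini-batches of up to 4:
print(list(mini_batches(range(10), 4)))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

The same helper covers both extremes: a `size` of 2 gives tiny mini-batches, while a large `size` accumulates days' worth of samples into one batch.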

System design

At this stage, the Geniusee team defined the elements of the system to build: architecture, infrastructure, components, and data. All aspects of the system are based on the requirements specified by our client. We created a workflow that receives data from CME Group databases, then stores, processes, and analyzes it. We chose Google Cloud to ensure security, with most data transmitted through Google Pub/Sub.
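The case study does not include the integration code, so here is a minimal, hypothetical sketch of the receiving side: a pure decoding function, plus (in comments) how it could be wired to a Google Pub/Sub subscription. The project and subscription names, and the message schema, are placeholders:

```python
import json

def decode_tick(payload: bytes) -> dict:
    """Decode a market-data message payload into a normalized record."""
    tick = json.loads(payload.decode("utf-8"))
    return {"symbol": tick["symbol"], "price": float(tick["price"]), "ts": tick["ts"]}

# Wiring to Google Pub/Sub (requires google-cloud-pubsub and credentials):
#
# from google.cloud import pubsub_v1
# subscriber = pubsub_v1.SubscriberClient()
# path = subscriber.subscription_path("my-project", "market-data-sub")
#
# def callback(msg):
#     record = decode_tick(msg.data)
#     # ...store / process the record...
#     msg.ack()
#
# subscriber.subscribe(path, callback=callback).result()
```

Keeping the decoding logic separate from the Pub/Sub callback makes it testable without cloud credentials.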

Pipeline development

Our team developed the pipeline that processes data and creates custom indexes for our client. The most important part was building a trustworthy backup to ensure a working recovery plan in case of shutdowns or system failures. We used Python: its simple syntax is a good fit for data engineering projects and made it easy to develop a prototype quickly, which mattered because speed was crucial for our client.
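The recovery idea can be illustrated with a minimal checkpointing sketch (a hypothetical helper, not the delivered pipeline): after each batch, the pipeline persists the index of the last completed batch, so a restart after a failure resumes where it left off instead of reprocessing everything:

```python
import json
import os

def run_pipeline(batches, process, checkpoint_path):
    """Process batches in order, persisting progress so a restart
    after a shutdown resumes from the last completed batch."""
    last_done = -1
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            last_done = json.load(f)["last_done"]
    results = []
    for i, batch in enumerate(batches):
        if i <= last_done:
            continue  # completed before the previous shutdown
        results.append(process(batch))
        with open(checkpoint_path, "w") as f:
            json.dump({"last_done": i}, f)
    return results
```

A production system would additionally back up the data itself; this sketch only shows the resume-on-restart mechanic.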

Quality assurance

Data quality was our focus during this stage. It was vital to check data before ingestion, so we based our checks on automated comparisons of information from different sources over specific periods. We used batch processing for historical data and eventually set up automated checks that run daily. The Geniusee team also created unit tests for the delivered pipeline to ensure correct operation, a smooth workflow, and uninterrupted functionality.
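As an illustration of such cross-source comparisons (the actual checks are not shown in the case study), a minimal sketch might compare vendor quotes keyed by symbol within a relative tolerance and report the symbols that disagree:

```python
def cross_check(primary: dict, secondary: dict, tol: float = 1e-4) -> list:
    """Compare quotes keyed by symbol from two vendors.

    Returns the symbols that are missing from the secondary source or
    whose values differ by more than the relative tolerance.
    """
    mismatches = []
    for symbol, p in primary.items():
        s = secondary.get(symbol)
        if s is None or abs(p - s) > tol * max(abs(p), 1.0):
            mismatches.append(symbol)
    return mismatches
```

A daily job would run a check like this over each day's batch and alert on any non-empty result before the data reaches index creation.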

Infrastructure setup

At this stage, our team worked on the DevOps side of the pipeline, documenting its architecture in an architecture diagram. We decided to rely on Infrastructure as Code (IaC), using machine-readable definition files instead of manual hardware configuration or interactive configuration tools. IaC allows data center infrastructure to be configured and managed efficiently and reproducibly.

Features


High speed of data processing

We developed a solution that is 7.5 times faster than existing services on the market. This makes it possible to create cost-effective custom indexes, based on individual stocks and large funds, that suit asset management needs and traders’ financial goals.

Daily-based data quality check

We ensure data quality with automated tests based on batch processing and comparisons of historical data from external vendors, cross-checking information from different sources to confirm its correctness.

System scalability

To ensure scalability, we based our platform on cloud services and secured their stability with custom-tailored backups. The system is replicated in two regions, so it stays available 24/7, even during Google Cloud maintenance windows.

Solution flexibility

Trading is a highly challenging and rapidly changing industry. To keep our system in step with it, we built in flexibility through the modular structure of the custom indexing platform’s pipeline.

Security measures

We use built-in solutions from cloud providers together with VPN services to protect the system and prevent leaks.

Results


Business-aligned datasets

Custom algorithms that transform data into trade indexes

Delivered data pipeline architecture

New data validation methods and data analysis tools