ML Software Forecasting Unicorn Companies

Business Overview

The client is a venture capital investor that aims to generate statistically sound returns in early-stage deals by leveraging AI and deep learning. The goal was to develop machine learning and deep learning models that use company input data, financial data, and reports to predict the emergence of a potential unicorn company.

Challenges

Untagged data

The data came without labels, so we had to determine how the records were correlated based on company descriptions, business areas, financial trends, and so on.

Lack of success criteria

There were no criteria defining companies as promising or successful, so we had to create a data-driven scoring system from scratch.

Solution

To understand which metrics matter when evaluating a company’s potential, we started by analyzing and scoring a large amount of publicly available information on Crunchbase: descriptions and statistics for tens of thousands of companies, the number of investment rounds, potential IPO candidates, and so on.

The goal was to identify markers that indicate whether a business is likely to attract investment and prosper in the future.

The NIX team iteratively applied self-supervised learning, where the model trains itself to learn one part of the input from another part of the input. Investment experts on the client side adjusted the process.
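As a rough illustration of the idea of learning one part of the input from another, the sketch below hides one field of each record and fits a simple line to predict it from a second field. The records, field names, and the choice of least squares are assumptions made for this example only; they are not the models or data used in the project.

```python
# Hypothetical sketch of a self-supervised pretext task: hide one field of each
# record and learn to predict it from another field. No human labels are used.

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x

# Made-up records: (number of funding rounds, total raised in $M).
records = [(1, 2.0), (2, 4.5), (3, 6.0), (4, 8.5), (5, 10.0)]

# The training target is simply the masked part of the input itself.
visible = [rounds for rounds, _ in records]
masked = [raised for _, raised in records]

slope, intercept = fit_line(visible, masked)

def predict_masked(rounds):
    """Reconstruct the hidden field from the visible one."""
    return slope * rounds + intercept
```

The point of the sketch is that the supervision signal comes from the data itself, which is what makes the approach workable on untagged records.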

Company analysis and a derived score between 0 and 1 gave us multilayered statistics, labels, clusters, text, and numbers that highlight each company’s success potential. We used BERT and spaCy to transform text fields into embedding features for further text data processing.
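To show what turning a text field into a numeric feature vector looks like in practice, here is a minimal sketch. It deliberately swaps in a hashed bag-of-words vector as a stand-in, because real BERT or spaCy embeddings require downloaded models; the function names, the vector size, and the sample descriptions are all hypothetical.

```python
import hashlib
import math

DIM = 64  # hypothetical vector size, far smaller than a real BERT embedding

def embed(text, dim=DIM):
    """Hashed bag-of-words vector, L2-normalised. A stand-in, not BERT or spaCy."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # Stable hash so the same token always lands in the same slot.
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

def cosine(a, b):
    """Cosine similarity of two already-normalised vectors."""
    return sum(x * y for x, y in zip(a, b))

# Made-up company descriptions.
desc_a = embed("fintech payments")
desc_b = embed("fintech payments platform")
similarity = cosine(desc_a, desc_b)
```

Unlike this stand-in, contextual embeddings also capture meaning beyond shared words, which is why they suit free-text company descriptions.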

This allowed us to find patterns of context and establish relationships between samples. After that, we could apply a range of classification models and decomposition methods, such as decision tree classifiers (DTC), support vector machines (SVM/SVC), CNNs, RNNs, and LSTMs.
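For readers unfamiliar with the model families listed, the sketch below shows the simplest possible decision tree classifier: a one-split tree (a "stump") fitted on numeric features. The data, labels, and feature meanings are invented for illustration, and a production model would be far deeper and trained with an established library.

```python
def fit_stump(rows, labels):
    """Find the (feature, threshold) split with the fewest training errors.

    The stump predicts 1 when row[feature] > threshold, else 0.
    """
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            errors = sum(
                int(row[feature] > threshold) != label
                for row, label in zip(rows, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, feature, threshold)
    _, feature, threshold = best
    return feature, threshold

def predict(stump, row):
    """Apply the single learned split to one row."""
    feature, threshold = stump
    return int(row[feature] > threshold)

# Made-up rows: [funding rounds, text-derived score]; label 1 = promising.
rows = [[1, 0.2], [2, 0.3], [2, 0.8], [4, 0.7], [5, 0.9]]
labels = [0, 0, 1, 1, 1]
stump = fit_stump(rows, labels)
```

On this toy data the best split falls on the text-derived feature rather than the round count, a small example of how embedding features can carry signal that raw statistics miss.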

We ran the data through a neural network, and the pipeline combined textual dependencies with statistical characteristics such as investors’ names and the number of investment rounds. The pipeline covered training, validation, and deployment of machine learning models in production, including but not limited to classification (predicting categories), regression (predicting values), and clustering (grouping similar samples with unsupervised techniques).

The pipeline then converged into a final ML model able to forecast the likelihood of a company’s success.
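A likelihood forecast of this kind can be pictured as a model that maps combined text-derived and statistical features to a value between 0 and 1. The sketch below uses plain logistic regression trained by gradient descent as an assumed stand-in; the features, labels, learning rate, and epoch count are hypothetical, and the actual production model is not described in this case study.

```python
import math

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Batch gradient descent on the log-loss; returns (weights, bias)."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, label in zip(rows, labels):
            z = sum(w * x for w, x in zip(weights, row)) + bias
            error = sigmoid(z) - label
            for i, x in enumerate(row):
                grad_w[i] += error * x
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def success_likelihood(model, row):
    """Return the model's estimated probability of success for one company."""
    weights, bias = model
    return sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)

# Made-up rows: [text-derived score, funding rounds scaled to 0-1]; label 1 = success.
rows = [[0.1, 0.0], [0.2, 0.2], [0.4, 0.2], [0.7, 0.6], [0.8, 0.8], [0.9, 1.0]]
labels = [0, 0, 0, 1, 1, 1]
model = train_logistic(rows, labels)
```

Calling `success_likelihood(model, [0.9, 0.9])` yields a value above 0.5 on this toy data, while `success_likelihood(model, [0.1, 0.1])` falls below it.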

Outcome

Betting on big data and deep analysis, we created an advanced tool that supports stronger long-term investment decisions. The client received comprehensive diagnostic software that looks at a multitude of data-driven criteria to inform growth plans, strategic positioning, competitive outlook, and disruptive strategies.

The algorithm can provide a criteria-based diagnostic within minutes, assisting portfolio companies with strategic advice to enhance valuation and key performance business indicators.