Data-driven transformation is a key differentiator for industries that handle large amounts of data.
It creates countless optimisation opportunities to reduce costs, improves the customer experience through personalisation, and automates repetitive tasks and those that process large amounts of data. More than that, when coupled with human expertise, machine learning leads to better results.
Machine Learning is a multi-disciplinary field standing at the confluence of business and technology.
We deliberately choose to be business-driven and technology-supported on our quest to generate innovative and successful solutions for our clients.
We pay special attention to this stage, as the business function of a project determines the main goals and outcomes of the design and implementation process.
More specifically, the business stage consists of:
identifying key business targets and variables, along with the data that may be relevant to your project's goals;
identifying what type of data we need and how we can acquire it;
determining the success metrics;
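For a classification project, the success metrics we agree on often include accuracy, precision, and recall. As a minimal illustration (the labels below are invented for the example), these can be computed directly from predicted and actual outcomes:

```python
def classification_metrics(actual, predicted):
    """Return accuracy, precision, and recall for binary labels (0/1)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    accuracy = (tp + tn) / len(actual)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}

# Illustrative labels only: 6 actual outcomes vs. 6 model predictions.
metrics = classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
```

Which metric matters most depends on the business target identified above: a fraud detector, for instance, may prioritise recall over precision.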
At the end of this process, we create charters and reports that highlight the scope of the project, the project plan, the success criteria, and the main communication mechanisms.
We will also provide a data dictionary that will help you obtain a detailed overview of the data features and target variables that outline your business concepts.
You know the saying: garbage in, garbage out! No matter how capable a machine learning pipeline is, the quality of the data that goes into it is the most crucial component.
For this reason, we set aside time to identify and connect critical data elements in the most efficient way for your machine learning solution. Depending on the data, this might be the longest stage of the project, usually without any directly visible results.
A basic overview of the steps we usually go through is:
Explore basic data health measurements (missing data, inconsistent data, noisy data);
Perform basic data descriptive statistics;
Execute data cleansing and transformation (data binning, data imputation, data discretisation, etc.);
Determine the relevant data correlations in order to identify which data can help us achieve the desired goals;
Create an automated mechanism to refresh and load the data;
Verify that the data is automatically cleansed and transformed;
Confirm the automated data quality verification mechanism;
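A minimal sketch of the health checks and cleansing steps above, in plain Python; the column values, imputation strategy (mean), and bin edges are illustrative assumptions, not a fixed recipe:

```python
def missing_rate(values):
    """Basic health measurement: fraction of missing (None) entries."""
    return sum(v is None for v in values) / len(values)

def impute_mean(values):
    """Data imputation: replace missing entries with the observed mean."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def bin_value(value, edges):
    """Data binning/discretisation: map a number to its bin index."""
    for i, edge in enumerate(edges):
        if value < edge:
            return i
    return len(edges)

# Illustrative "age" column with two missing entries.
ages = [23, None, 41, 35, None, 58]
cleaned = impute_mean(ages)
bins = [bin_value(v, [30, 45, 60]) for v in cleaned]
```

In practice these checks run automatically on every data refresh, which is what the verification steps above confirm.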
This stage will lead to a data quality summary report that will include the current value of data in terms of relevance, accuracy, accessibility, compatibility and coherence.
Depending on the conclusions of the data analysis, this phase provides the OK/NOT OK decision to proceed to the modelling stage.
Capable, robust, and lightweight: these are the attributes of a good machine learning algorithm. Our purpose is to find how your data can best be used to model and fulfil your objectives.
The key challenge is avoiding models that overfit the data and, consequently, create a false perception of reliable predictability.
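A standard guard against overfitting is k-fold cross-validation: every data point is scored by a model that never saw it during training, so a model that merely memorises the training set cannot hide. A minimal index-splitting sketch (the fold count and dataset size are illustrative):

```python
def k_fold_splits(n, k):
    """Return (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n))
    fold_size, remainder = divmod(n, k)
    start = 0
    splits = []
    for fold in range(k):
        size = fold_size + (1 if fold < remainder else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, test))
        start += size
    return splits

splits = k_fold_splits(10, 3)
# For each pair: fit on `train`, score on the held-out `test`,
# then average the scores to estimate real-world performance.
```

A large gap between training scores and the averaged held-out scores is the classic symptom of overfitting.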
The concrete activities that take place in this stage are:
Feature modelling: comprises feature engineering, to increase the predictive power of the model, and feature selection, to reduce dimensionality;
Model creation: builds and/or adapts state-of-the-art machine learning architectures, creates training and testing datasets, trains models, cross-validates and scores them, changes model algorithms or parameters as needed, and fine-tunes the final model;
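One simple form of the feature selection mentioned above is dropping near-constant columns: a feature whose variance falls below a threshold carries little predictive signal. The threshold and sample rows below are illustrative assumptions:

```python
def variance(column):
    """Population variance of a numeric column."""
    mean = sum(column) / len(column)
    return sum((v - mean) ** 2 for v in column) / len(column)

def select_features(rows, threshold=0.01):
    """Keep the indices of columns whose variance exceeds `threshold`."""
    columns = list(zip(*rows))
    return [i for i, col in enumerate(columns) if variance(col) > threshold]

# Illustrative dataset: column 1 is near-constant and gets dropped.
rows = [
    [1.0, 0.50, 3.2],
    [2.0, 0.50, 1.1],
    [3.0, 0.51, 2.7],
]
kept = select_features(rows)
```

Real projects combine such filters with model-based selection, but the principle is the same: fewer, more informative features make models lighter and less prone to overfitting.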
The resulting model(s) will be delivered together with model reports describing the algorithms, the training process, validation statistics, and more.
The model(s) are then extensively tested to check whether they achieve the accuracy and speed required to handle real-time conditions. Based on this report, we decide whether to GO/NOT GO to the deployment stage.
We're not big fans of over-engineering or downplaying the importance of data. We stand by these principles in our process of making the model operational for your business.
We assure good model operationalisation by developing data pipelines and connecting them to your machine learning models. Our key activities include:
Training the production version of the model on the most complex data set possible;
Deploying the support data pipelines;
Deploying API layers and support batch integration;
Deploying dashboards for monitoring the model's behavior;
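The API layer can be pictured as a thin function that parses a JSON request, runs the model, and returns a JSON response. This is a hedged sketch only: the payload shape, the `features` field, and the linear stand-in model are illustrative assumptions, not our production design:

```python
import json

def predict(features):
    """Stand-in for a trained model: a simple linear scorer with made-up weights."""
    weights = [0.4, 0.6]  # illustrative weights, not a real trained model
    return sum(w * f for w, f in zip(weights, features))

def handle_request(body):
    """Parse a JSON request body and return a JSON response string."""
    payload = json.loads(body)
    score = predict(payload["features"])
    return json.dumps({"score": round(score, 4)})

response = handle_request('{"features": [1.0, 2.0]}')
```

In production this handler sits behind a web framework, while batch integration feeds the same `predict` function from scheduled pipelines, so online and offline scoring stay consistent.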
And if all of these seem a bit too complicated, we also offer short-term support. We go over the architecture documentation and model specifications to make sure that the development team will be up and running in no time.
Moreover, we provide instructions and train the developers to perform ad-hoc incident and service management activities whenever applicable.
Those who do not learn from history are doomed to repeat it. With that in mind, we use our expertise to take your project to the next level.
This stage refers to the continual improvement of our services: the data scientist(s) involved in the project's implementation stay engaged to identify new data trends and opportunities.
Our main activities include reviewing your current product, identifying and evaluating key performance indicators, as well as identifying, implementing, and monitoring improvement opportunities.
We certify our work through periodically produced service review reports that capture key information about our support services. They contain a summary of relevant events in the review cycle and a summary of the identified environment and expansion opportunities.
Progress can be checked at any time by consulting the improvements backlog, which is reviewed and/or prioritised periodically. The backlog contains details about the types of improvement opportunities and the implementation decisions taken. It will also serve as a baseline for the next development phase.