Building an ML list

Our platform enables you to generate your own Machine Learning (ML) list, underpinned by our proprietary classification system. Find out how to build a list.

Table of contents

  1. Overview
  2. Getting started
  3. Training your list
  4. Finalising your list

Overview

ML lists allow you to build a list of companies and classify them in a matter of minutes, using our AI technology.

To build a machine learning list you will need to train the algorithm to capture companies you are interested in and exclude the rest.

This is an iterative process: you will build your list in stages until you get the desired output.

To start building your list you will need:

  • A set of companies that you know are good representatives of the industry

AND/OR

  • A set of keywords that define the activity of the companies in the industry. 

The algorithm will analyse the companies that you provide as training data and create a model to classify the rest of the companies available in our database.

The model assigns each company a score value based on how similar it is to the companies that you provided as good examples of the sector. The output list contains all companies that have a score value above 0.
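The scoring idea can be pictured with a toy sketch. This is not our classification system, which is proprietary; it simply assumes that similarity is measured as the cosine similarity between word counts, and every company number, website text and stopword below is made up for illustration.

```python
import math
from collections import Counter

# Assumed, minimal stopword list so that filler words do not count as similarity.
STOPWORDS = {"we", "and", "of"}

def vector(text):
    """Turn a website text into a sparse term-count vector."""
    return Counter(w for w in text.lower().split() if w not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical website texts keyed by made-up company numbers.
websites = {
    "00000001": "we build offshore wind turbines and turbine blades",
    "00000002": "wind farm maintenance and turbine servicing",
    "00000003": "artisan bakery selling bread and cakes",
    "00000004": "installers of wind and solar energy systems",
}

positive_set = ["00000001", "00000002"]  # good examples of the sector

# Sector profile: summed term counts of the positive training set.
profile = Counter()
for number in positive_set:
    profile.update(vector(websites[number]))

scores = {n: cosine(vector(text), profile) for n, text in websites.items()}

# Output list: companies with a score above 0, highest score first.
output = sorted((n for n, s in scores.items() if s > 0), key=scores.get, reverse=True)
print(output)
```

In this sketch the bakery shares no terms with the training examples, so it scores 0 and drops out, while the installer shares only "wind" and lands at the bottom of the list.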

ML lists are available to all Data City users and can be set up directly from MY LISTS in the platform.

Step 1 - Getting started

To get started, head to MY LISTS and click 'Create a New List'. 

Add the first example companies using one or both of the methods shown in the video: insert company numbers, use the keyword search engine to find good examples, or combine the two.

The output list ranks all companies against the ones you selected (the positive training set). The score value indicates how similar a company's website text is to that of the companies in the training set.

Click "CLASSIFIER TERMS" to see the language that is leading the classification. Check this at different stages to make sure the terms remain relevant.
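As a rough picture of what a classifier terms view might surface, the sketch below counts how many positive examples each term appears in and lists the most common ones. The texts, the stopword list and the ranking rule are all assumptions for illustration; the terms shown in the platform are produced by our own system and may be derived differently.

```python
from collections import Counter

# Assumed, minimal stopword list.
STOPWORDS = {"we", "and", "of", "the", "for"}

def terms(text):
    """Distinct, non-stopword terms in one website text."""
    return {w for w in text.lower().split() if w not in STOPWORDS}

# Hypothetical website texts of the positive training set.
positive_texts = [
    "we build offshore wind turbines",
    "wind farm maintenance and turbine servicing",
    "recruitment for the offshore wind industry",
]

# Count in how many positive examples each term appears.
document_frequency = Counter()
for text in positive_texts:
    document_frequency.update(terms(text))

top_terms = [term for term, _ in document_frequency.most_common(2)]
print(top_terms)
```

If an off-topic term such as "recruitment" started to climb this kind of ranking, that would be a hint that one of the training examples is pulling the list away from the sector.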

Step 2 - Training your list

Start training your list by excluding the companies that you don't want: click the "x" button next to a company to add it to the negative training set.

You can use the keyword filter or the score value to identify these companies. Companies at the end of the list will have lower score values.
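To see why negative examples help, consider the toy sketch below. It assumes a simple rule that is not taken from our platform: a company's score is its similarity to the positive training set minus its similarity to the negative training set. All company numbers and website texts are made up.

```python
import math
from collections import Counter

# Assumed, minimal stopword list.
STOPWORDS = {"we", "and", "of", "for", "the"}

def vector(text):
    """Turn a website text into a sparse term-count vector."""
    return Counter(w for w in text.lower().split() if w not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity; 0.0 when either vector is empty."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroid(numbers, websites):
    """Summed term counts of a training set."""
    total = Counter()
    for number in numbers:
        total.update(vector(websites[number]))
    return total

def score_all(websites, positive, negative):
    """Assumed rule: similarity to positives minus similarity to negatives."""
    pos = centroid(positive, websites)
    neg = centroid(negative, websites)
    return {
        n: cosine(vector(text), pos) - cosine(vector(text), neg)
        for n, text in websites.items()
    }

# Hypothetical data.
websites = {
    "00000001": "we build offshore wind turbines",
    "00000002": "wind turbine maintenance and servicing",
    "00000005": "recruitment agency for wind industry jobs",
    "00000006": "recruitment agency for energy industry jobs",
}
positive = ["00000001", "00000002"]

before = score_all(websites, positive, negative=[])
after = score_all(websites, positive, negative=["00000005"])

for number in websites:
    print(number, round(before[number], 3), round(after[number], 3))
```

In this sketch the wind recruitment agency initially scores above 0 because it mentions "wind". Once it is added to the negative training set, its score falls below 0, and the similar energy recruitment agency is pushed down with it, while the two turbine companies stay above 0.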

Step 3 - Finalising your list

Repeat this process until you get a representative list. You can add more companies to your positive training set by clicking the "+" button. 

Important considerations:

  • The companies that you add to your training sets lead the classification, not the keywords that you use to filter the list. 
  • The keywords that you use to filter the list are helpful to find companies for the training sets. 
  • Once the algorithm has generated a list, you can manually add or remove companies. This is useful when you want to add a company without impacting the training sets or the classifier terms, or when a company does not have an available website. 

  • Check the big companies and manually exclude them if they are not relevant. Big multinational companies may use relevant language on their websites without being good representatives of the sector in question. If left in, they will skew the results for the overall performance of the sector in subsequent analysis.
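The last two considerations can be illustrated with a short sketch. It assumes that manual changes are applied on top of the model output, leaving the training sets untouched, and it uses invented company numbers and turnover figures to show how a single multinational can dominate a sector total.

```python
# Hypothetical model output and manual adjustments (made-up company numbers).
model_output = {"00000001", "00000002", "00000004", "00000009"}
manual_include = {"00000007"}  # for example, a relevant company with no website
manual_exclude = {"00000009"}  # for example, a multinational that is not representative

# Manual changes sit on top of the model output; training sets are not modified.
final_list = (model_output | manual_include) - manual_exclude

# Invented turnover figures in GBP millions, for illustration only.
turnover_gbp_m = {
    "00000001": 4,
    "00000002": 6,
    "00000004": 2,
    "00000007": 3,
    "00000009": 985,
}

with_multinational = sum(turnover_gbp_m[n] for n in model_output | manual_include)
without_multinational = sum(turnover_gbp_m[n] for n in final_list)

print(sorted(final_list))
print(with_multinational, without_multinational)
```

With these made-up figures, the sector total drops from 1000 to 15 once the multinational is excluded, which is why it is worth reviewing the largest companies before running any analysis.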