Find out more about ML list building and how to know when your list is complete.
To know if your Machine Learning (ML) list is done, you will need to answer the following two questions:
1. Is my ML list representative of the sector?
This refers to the content of the list itself and requires checking the type of companies captured.
You need to make sure that there are no false positives on the list (companies outside the sector of interest) and that the size and value of the sector are consistent with previous research.
A) Checking for false positives:
Keyword searches are a quick and easy way to do this. Add all the keywords you have used to the 'Does not contain' keyword filter as in the image below.
The search will return all companies on your list that do not contain the relevant language. If the number of results is large, you need to continue training the algorithm to remove those companies.
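If you want to reproduce this check outside the platform, the same logic can be sketched in Python. This is a minimal illustration, not the platform's implementation: it assumes you have your list available as records with a `name` and a `description` field, and both field names and the sample companies are hypothetical.

```python
def lacks_keywords(description, keywords):
    """Return True if none of the keywords appear in the description."""
    text = description.lower()
    return not any(kw.lower() in text for kw in keywords)


def possible_false_positives(companies, keywords):
    """Mimic a 'Does not contain' filter over every keyword used so far."""
    return [c for c in companies if lacks_keywords(c["description"], keywords)]


# Hypothetical sample data, for illustration only.
companies = [
    {"name": "Alpha Robotics", "description": "Industrial robotics and automation"},
    {"name": "Beta Bakery", "description": "Artisan bread and pastries"},
]
keywords = ["robotics", "automation"]

flagged = possible_false_positives(companies, keywords)
share = len(flagged) / len(companies)  # a large share suggests more training is needed
```

A company flagged this way is only a candidate false positive: it may describe the sector in language you have not used as a keyword, so review the flagged companies before excluding them.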
B) Desk research and using ANALYSE:
You can use the ANALYSE function to verify the results. Hit 'Analyse Results' and the platform will give you an insight into how the companies perform as a group.
Compare your results with existing information: is your sector much larger or smaller in terms of number of companies, employees, or turnover? Are you including big companies whose main activity is not representative of the sector?
Remember that, if you are mapping an emergent sector, you may find more companies on our platform than in existing sources. That is expected and is the purpose of our technology. Treat this step as a sense check that your results are consistent with economic trends.
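The comparison with existing information can be made explicit by dividing each of your totals by the corresponding benchmark figure. The sketch below is only an illustration: the metric names and all numbers are made up, and the benchmark would come from your own desk research.

```python
def compare_to_benchmark(totals, benchmark):
    """Return the ratio of your list's totals to the benchmark, per metric."""
    return {
        metric: totals[metric] / benchmark[metric]
        for metric in benchmark
        if metric in totals and benchmark[metric]
    }


# Made-up figures, for illustration only.
totals = {"companies": 1200, "employees": 30000}
benchmark = {"companies": 1000, "employees": 60000}

ratios = compare_to_benchmark(totals, benchmark)
# A ratio well above 1 means your sector looks larger than the benchmark,
# a ratio well below 1 means it looks smaller.
```

There is no fixed threshold for an acceptable ratio. A large gap is a prompt to look at which companies drive it, for example a few big companies whose main activity is not representative of the sector.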
C) Random sampling:
Open several pages of your list at random and check the types of companies captured.
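To avoid always checking the same first few pages, you can let a random number generator pick the pages for you. This is a small sketch using Python's standard library; the page count and sample size are placeholders that you would replace with your own values.

```python
import random


def pages_to_review(total_pages, sample_size, seed=None):
    """Pick distinct page numbers (1-based) to review, in ascending order."""
    rng = random.Random(seed)
    sample_size = min(sample_size, total_pages)  # cannot sample more pages than exist
    return sorted(rng.sample(range(1, total_pages + 1), sample_size))


# Placeholder values: a 40-page list, reviewing 5 pages.
pages = pages_to_review(total_pages=40, sample_size=5)
```

Passing a `seed` makes the selection repeatable, which helps if a colleague needs to review the same pages.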
2. Have I missed part of the sector while building my list?
Check for false negatives:
False negatives are companies that belong to the sector of interest but have been left out of the classification. There are two easy checks you can do to find them.
A) Change the score cutoff value:
By default, the list shows all companies with a score above 0 and excludes the rest. Lower the score cutoff to -0.5 or -1, as in the video below.
Check the companies in that range:
If you see companies with a negative score that are relevant to the sector, you have trained the algorithm too strictly.
Add some of those to your positive training set and repeat until you find only a few of them (these can be added manually to the list at the end of the process).
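The score check amounts to looking at the band of companies that sit just below the default cutoff. A minimal sketch, assuming each record carries a numeric `score` field (the field name and the sample scores are hypothetical):

```python
def near_misses(companies, lower=-1.0, upper=0.0):
    """Return companies excluded by the default cutoff but scoring at least `lower`,
    highest score first."""
    band = [c for c in companies if lower <= c["score"] <= upper]
    return sorted(band, key=lambda c: c["score"], reverse=True)


# Hypothetical scores, for illustration only.
scored = [
    {"name": "A", "score": 0.8},   # already on the list (score above 0)
    {"name": "B", "score": -0.2},
    {"name": "C", "score": -0.7},
    {"name": "D", "score": -1.5},  # below both suggested cutoffs
]

to_review_narrow = near_misses(scored, lower=-0.5)
to_review_wide = near_misses(scored, lower=-1.0)
```

Reviewing the band from the highest score downwards shows the closest misses first, which are the most likely candidates for your positive training set.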
B) Check keywords in EXPLORE
Run some keyword searches on EXPLORE with your keywords and check whether any of the companies returned are missing from your list. You can check whether a company is on your list by using the search box at the top right corner.
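If you have many search results to check, the comparison is a set difference between the companies returned by the keyword search and the companies on your list. The sketch below assumes you have both as plain lists of company names; matching on names is a simplification, and a unique identifier such as a company number would be more reliable where available.

```python
def normalise(name):
    """Lower-case a company name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def missing_from_list(explore_results, ml_list):
    """Return the keyword-search results that do not appear on the ML list."""
    on_list = {normalise(n) for n in ml_list}
    return [n for n in explore_results if normalise(n) not in on_list]


# Hypothetical names, for illustration only.
explore_results = ["Alpha Robotics", "Gamma  Drones", "Delta AI"]
ml_list = ["alpha robotics", "Delta AI"]

missing = missing_from_list(explore_results, ml_list)
```

Companies that turn up here are candidates for your positive training set, or for manual addition if only a few remain.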
Building an ML list: You can find out more about our Machine Learning List Builder and how to use it effectively in our Building an ML List step-by-step guide.