What does the Innovation indicator represent, and how is it calculated?
The short version:
Our Innovation indicator uses a proprietary Machine Learning model to estimate how innovative each company in our database is, based on its website where one is available. In the absence of any better method, we use R&D intensity as a proxy for innovation.
The model is trained on 980 companies with known R&D intensities (R&D expenditure in £ per employee) and applied to 1.6 million companies in the UK to estimate whether each one is innovative. A 3-star rating system is used to indicate our confidence in the estimation.
The long version:
Defining and measuring "innovation" is difficult.
Data which indicates whether a company is innovative or not does not exist for all UK companies, and predicting unknown innovation is also tricky.
The Central Bureau of Statistics of the Netherlands (CBS) have shown that the website text of a company can accurately predict its innovativeness score. But obtaining the training data to replicate this in the UK is not straightforward. Where proxy data that could be used to estimate unknown company innovation does exist, it is either private (see the ONS UK Innovation Survey) or difficult to capture.
R&D spending, a proxy for innovation, does not have to be disclosed in annually reported accounts, and is rarely reported voluntarily. Where it is reported, it is often marked up incorrectly, which makes the field unreadable in the machine-readable XBRL accounts.
After parsing 1.5TB of machine-readable accounts in XBRL format and experimenting with OCR at scale, we have managed to capture R&D spending data for 980 UK registered companies. These companies operate across all regions of the UK and all industrial sectors, and cover a wide range of R&D spending and business sizes. From this we have calculated R&D intensity (R&D spending £ per employee).
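As a rough illustration of the R&D intensity calculation described above, the sketch below divides annual R&D spending by headcount. The function name and the example figures are hypothetical and are not taken from our dataset.

```python
def rd_intensity(rd_spend_gbp: float, employees: int) -> float:
    """Return R&D intensity: R&D spending in £ per employee.

    Illustrative helper only; the name and signature are assumptions.
    """
    if employees <= 0:
        raise ValueError("employees must be a positive number")
    return rd_spend_gbp / employees


# Hypothetical company: £1,200,000 R&D spend, 40 employees
print(rd_intensity(1_200_000, 40))  # 30000.0
```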
Combined with company website text, this provides a solid source of training data from which we have developed a Machine Learning method to estimate whether a company is significantly more likely to be highly innovative, based only on the content of its website. A rating of 0-3 stars is then applied based on our confidence in the predicted innovation likelihood (0 stars - not innovative; 1 star - innovative, low confidence; 2 stars - innovative, medium confidence; 3 stars - innovative, high confidence).
Although we do not recommend using the Innovation indicator to build lists, the star rating system allows users to filter for innovative companies across the entire company database, or within specific lists/RTICs.
Filtering companies by innovation score is included in both our EXPLORE and ANALYSE platforms.
The Innovation Score appears as a number and not a category. What is this?
If you're using the API, you will see the raw innovation score. To convert the raw score into the confidence rating mentioned above, use the following logic:
3 star = score > 3
2 star = 1.5 < score < 3
1 star = 0 < score < 1.5
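The thresholds above can be sketched in Python as follows. This is a minimal example, not part of the API: the function name is hypothetical, and two behaviours are assumptions because the rules above do not cover them. A score of exactly 3 or exactly 1.5 is placed in the lower band, and a score of 0 or below is treated as 0 stars (not innovative).

```python
def stars_from_score(score: float) -> int:
    """Convert a raw innovation score into a 0-3 star rating.

    Hypothetical helper. Boundary handling is an assumption:
    exactly 3 -> 2 stars, exactly 1.5 -> 1 star, <= 0 -> 0 stars.
    """
    if score > 3:
        return 3
    if score > 1.5:
        return 2
    if score > 0:
        return 1
    return 0  # assumed: non-positive scores mean "not innovative"


# Example raw scores (made up for illustration)
for raw in (4.2, 2.0, 0.7, -0.3):
    print(raw, stars_from_score(raw))
```

If your use case needs the boundary values handled differently, adjust the comparisons accordingly.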