The Data City has identified companies with accounts that are likely misrepresented. We will answer: Why have we added this feature? What are some examples of anomalies in accounts? What is the basis for the predicted anomalies?
Why have we added this feature?
Our data covers all active companies registered at Companies House. As well as breadth of data, we have depth: we can drill down to company-level financials.
This depth comes from the financial accounts that each company is required to submit to Companies House.
However, self-declared (and especially unaudited) company accounts can contain mistakes. Companies House are not responsible for verifying company accounts:
"We carry out basic checks on documents received to make sure that they have been fully completed and signed, but we do not have the statutory power or capability to verify the accuracy of the information that companies send to us."
Outliers affect less than 0.05% of our companies but their impact, by nature, can be large. Identifying possible outliers will allow our subscribers to review and remove the very small number of companies that have bad data.
What are some examples of anomalies in accounts?
Below is a non-exhaustive list of anomalies that can occur in financial accounts.
Companies can report another financial variable as their employees.
When filling in their accounts, companies sometimes copy a value from another field into the number of employees.
For example, GILLARDS FARMS LIMITED in their 2021 accounts report assets as their number of employees. You can see this in the two images below:
Companies can report the year as their number of employees.
For example, ABBOTT & ABBOTT LIMITED, in their 2017 accounts, restated their 2016 number of employees as '2016'.
Companies can report their wage costs as the number of employees.
For example, AR CARS CLUB LIMITED in their 2017 accounts report director remuneration as the number of employees.
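The three patterns above can be sketched as simple rule-based checks. This is an illustration only, not the trained model described later in this article. The record layout, the field names, the one-year tolerance and the minimum-value threshold are all assumptions made for the example, and the figures used are invented rather than taken from the companies named above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AccountsRecord:
    """Hypothetical, simplified view of one set of filed accounts."""
    accounts_year: int
    employees: Optional[int]
    total_assets: Optional[float] = None
    wage_costs: Optional[float] = None


# Assumed cut-off: small headcounts can equal a monetary value by coincidence.
MIN_VALUE_FOR_MATCH = 100


def suspicious_employee_reasons(record: AccountsRecord) -> List[str]:
    """Return reasons why the employee count looks copied from another field."""
    reasons: List[str] = []
    employees = record.employees
    if employees is None:
        return reasons

    # Pattern: the year (or the prior/following year) entered as the headcount.
    if abs(employees - record.accounts_year) <= 1:
        reasons.append("employees matches the accounts year")

    # Pattern: a monetary field entered as the headcount.
    if employees >= MIN_VALUE_FOR_MATCH:
        if record.total_assets is not None and employees == record.total_assets:
            reasons.append("employees equals total assets")
        if record.wage_costs is not None and employees == record.wage_costs:
            reasons.append("employees equals wage costs")

    return reasons


# Invented example: 2017 accounts that report 2016 as the number of employees.
print(suspicious_employee_reasons(AccountsRecord(accounts_year=2017, employees=2016)))
```

Exact-match rules like these only catch the most obvious copies, which is one reason a trained model is useful for the less clear-cut cases.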
In a small number of cases, anomalies are introduced when company accounts are parsed. Such cases are rare: overall parsing accuracy is over 99.9%.
The model has been trained to identify these outliers too. In addition, we are working with our data provider to improve the parsing process, which will reduce the frequency of the outliers further in the future.
The examples above are cases where the financial accounts are misleading. In training the model to identify unusual financials, we have also identified companies whose unusual financials are correct.
In particular, the model will also identify companies that have high levels of employment with low levels of resources. Specific examples include recruitment and healthcare agencies.
In these companies, employees are on the company's books, but another company funds their salaries through its sales. The model will identify companies that are likely using significant temporary or part-time employment if the company's financials do not appear sufficient to support that level of employment.
Removing agencies, or companies with temporary workers, will be beneficial for any analysis using Gross Value Added (GVA) or turnover. Our GVA calculations explicitly assume that the number of employees is the number of full-time employees. Our turnover estimates rely implicitly on the same assumption.
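Because estimates are built on employee counts, a single anomalous headcount can dominate an aggregate. The short sketch below shows the effect of excluding flagged companies before summing. The records, the field names and the "flagged" marker are invented for illustration and do not reflect the format of any actual export.

```python
# Illustrative records only; names, values and field names are made up.
companies = [
    {"name": "A", "employees": 12, "flagged": False},
    {"name": "B", "employees": 40, "flagged": False},
    {"name": "C", "employees": 8, "flagged": False},
    {"name": "D", "employees": 2016, "flagged": True},  # year entered as headcount
]


def total_employees(rows, exclude_flagged):
    """Sum employee counts, optionally skipping rows marked as anomalous."""
    return sum(
        row["employees"]
        for row in rows
        if not (exclude_flagged and row["flagged"])
    )


raw_total = total_employees(companies, exclude_flagged=False)
clean_total = total_employees(companies, exclude_flagged=True)

print(raw_total)    # 2076
print(clean_total)  # 60
```

Here one bad record out of four accounts for almost all of the raw total, so any figure derived from headcount would be inflated in the same way until that record is removed.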
What is the basis for the predicted anomalies?
We trained a model to predict whether a company has anomalous financials. To do this, we started with known examples of anomalies. We then used the model to predict anomalies, validated the predictions, and incorporated them into the training set. We repeated this cycle more than 20 times.
The validation process was manual. For thousands of companies, this involved inspecting their accounts on Companies House and understanding where the reported values had come from.
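The cycle of predicting, validating and retraining can be sketched as a loop. This is a minimal illustration under stated assumptions, not our actual model: the single numeric score per company, the midpoint-threshold "model" and all of the data are hypothetical stand-ins, and the review function stands in for the manual inspection of accounts on Companies House.

```python
from typing import Callable, Dict


def train_threshold(scores: Dict[str, float], labels: Dict[str, bool]) -> float:
    """Toy stand-in for model training: midpoint between the two class means.

    Assumes the labelled set holds at least one anomalous and one normal company.
    """
    anomalous = [scores[c] for c, is_anomaly in labels.items() if is_anomaly]
    normal = [scores[c] for c, is_anomaly in labels.items() if not is_anomaly]
    return (sum(anomalous) / len(anomalous) + sum(normal) / len(normal)) / 2


def grow_training_set(
    scores: Dict[str, float],
    seed_labels: Dict[str, bool],
    review: Callable[[str], bool],
    rounds: int = 20,
    batch_size: int = 2,
) -> Dict[str, bool]:
    """Repeat: train, predict on unlabelled companies, validate, add to the set."""
    labels = dict(seed_labels)  # copy so the seed set is left untouched
    for _ in range(rounds):
        threshold = train_threshold(scores, labels)
        # Predict anomalies among companies that have not been reviewed yet.
        predicted = [c for c in scores if c not in labels and scores[c] > threshold]
        predicted.sort(key=lambda c: scores[c], reverse=True)
        batch = predicted[:batch_size]
        if not batch:
            break  # nothing new to review
        for company in batch:
            # Validation step: a reviewer confirms or rejects the prediction.
            labels[company] = review(company)
    return labels


# Invented data: a hypothetical score per company and the reviewer's verdicts.
scores = {"c1": 0.01, "c2": 0.02, "c3": 5.0, "c4": 0.03, "c5": 4.0, "c6": 0.5, "c7": 3.0}
verdicts = {"c1": False, "c2": False, "c3": True, "c4": False, "c5": True, "c6": False, "c7": True}
seed = {"c1": False, "c3": True}  # the known examples we start from

final_labels = grow_training_set(scores, seed, review=lambda c: verdicts[c])
print(sorted(final_labels.items()))
```

Each pass adds only reviewed companies to the training set, so the labels stay trustworthy while the set grows from a handful of known examples.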
We now have a training set of over 10,000 companies and we will continue to review predictions, incorporating new kinds of outliers, if and when we find them. If you are aware of outliers that we are missing, please get in touch.