What are RSICs and how do we ensure data quality?

RTICs are great for the emerging economy. For the foundational economy, where an appropriate SIC code is more likely to exist, an issue remains: a company can select the wrong SIC code, or a code that does not reflect what it actually does.

We fixed this issue. Real-Time Standard Industrial Classifications (RSICs) use machine learning and a company's website text to better classify that company's activities.

For example, Veolia is a large waste management and recycling company. Because it is large, it reports its activity under the SIC code for "activities of head offices", which says little about what it does. RSICs (and RTICs) give a better description of its activities: collection of non-hazardous waste, and treatment and disposal of non-hazardous waste.

RSICs follow the same structure as SICs.

As a reminder, our RSICs:

Fill gaps where SIC codes are missing
Correct inaccuracies in existing SIC codes
Add granularity where SIC codes are vague

You can read more about RSICs here.

Data Quality

To ensure quality we focus on methodological integrity:

Trust in the methodology

First and foremost, we’ve built trust into the RSIC methodology itself. We do this through three distinct layers:

Evidence, not prediction: We treat classification as an evidence problem, not a prediction problem. RSICs are not arbitrary predictions. Instead, we evaluate the empirical likelihood of a classification based on the company’s website text, and what we uniquely understand about companies in each sector. If the data doesn't support the code, we don't assign it.
Coherence Filtering: This validation layer rejects codes that lack alignment with the company's specific niche. It allows us to distinguish between a company merely mentioning a topic and actually doing it, and to classify accordingly.
Specificity: We also penalise generic classifications. Broad, "catch-all" codes are rarely useful for decision-making, so we deprioritise them in favour of precise definitions.
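
As a rough illustration of how the three layers could fit together, the sketch below filters and ranks candidate codes in Python. It is not the actual RSIC implementation: the `Candidate` fields, the `classify` function, the thresholds and the penalty value are all hypothetical, and the scores are assumed to come from some upstream analysis of the company's website text.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    """A candidate code for one company (all fields are illustrative assumptions)."""
    code: str                       # SIC-style code
    evidence: float                 # 0-1: how strongly the website text supports the code
    coherence: float                # 0-1: how well the code fits the company's niche
    is_generic: bool                # True for broad, catch-all codes
    snippets: Tuple[str, ...] = ()  # supporting text, kept so the result stays traceable


def classify(
    candidates: List[Candidate],
    min_evidence: float = 0.6,      # hypothetical threshold
    min_coherence: float = 0.5,     # hypothetical threshold
    generic_penalty: float = 0.3,   # hypothetical penalty
) -> List[Candidate]:
    """Return the accepted candidates, most specific and best supported first."""
    scored = []
    for c in candidates:
        # Layer 1, evidence: if the data doesn't support the code, don't assign it.
        if c.evidence < min_evidence:
            continue
        # Layer 2, coherence: reject topics that are mentioned but are not the company's niche.
        if c.coherence < min_coherence:
            continue
        # Layer 3, specificity: deprioritise catch-all codes rather than removing them.
        score = c.evidence - (generic_penalty if c.is_generic else 0.0)
        scored.append((score, c))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored]


# Made-up scores for a waste management company, for illustration only.
example = [
    Candidate("70100", 0.70, 0.60, True, ("group head office",)),
    Candidate("38110", 0.90, 0.80, False, ("we collect household waste",)),
    Candidate("62012", 0.80, 0.20, False, ("our software partners",)),
    Candidate("01110", 0.10, 0.10, False),
]
accepted_codes = [c.code for c in classify(example)]
```

In this made-up example the specific waste collection code ranks ahead of the generic head office code, while the software code (mentioned but not coherent with the niche) and the unsupported code are rejected. Keeping `snippets` on each candidate is one simple way to keep every accepted code traceable to its evidence.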

Trust in transparency

Unlike black box AI models where the logic is hidden, our RSIC system is built on transparency. Every classification is traceable back to the specific evidence that supports it. The framework is auditable, and we remain in control.

Quality in everything

Quality RSICs rely on quality inputs. By prioritising quality in everything, beginning with high-fidelity website matching and cutting-edge website text analysis, we build trust at every step of the pipeline.
