Informative outliers

Although the predictive power of the algorithm is well above chance, a rate of 60% is not particularly high. Figure 4 shows the distribution of nearly 20,000 companies according to the algorithm's confidence in its prediction. Notably, most of the companies fall near zero, meaning the algorithm is undecided about whether they are failing or successful. Our approach is nevertheless useful for filtering a large number of companies, since it readily identifies a few pronounced failures (as well as a number of highly successful companies). Note that this qualitative judgment reflects only the confidence of the algorithm, since we have not used any information beyond the provided data. It may be worth examining the companies assigned to the failing class with high confidence and, using additional information not available to the algorithm, determining what sets them apart. This is in fact one of the intended uses of the approach: quickly screen a large number of companies to draw attention to the few most likely candidates.
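The screening step described above can be illustrated with a minimal sketch. It assumes the classifier outputs one signed confidence score per company/year pair, with negative values leaning toward "failing", positive values toward "successful", and values near zero meaning the algorithm is undecided. The sign convention, the function name `screen`, the `min_abs` cut-off and the example scores are all hypothetical and are not taken from the original method.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, int]  # hypothetical (company, year) key
# ASSUMPTION: negative score = leaning "failing", positive = leaning "successful",
# and scores near zero mean the classifier is undecided.
Scores = Dict[Pair, float]


def screen(scores: Scores, k: int = 3, min_abs: float = 0.0) -> Tuple[List[Pair], List[Pair]]:
    """Return up to k most confidently failing and k most confidently successful pairs.

    Pairs with |score| below min_abs are treated as undecided and skipped.
    """
    decided = [(pair, s) for pair, s in scores.items() if abs(s) >= min_abs]
    # Most negative scores first: the strongest "failing" candidates.
    failing = sorted((item for item in decided if item[1] < 0), key=lambda item: item[1])
    # Most positive scores first: the strongest "successful" candidates.
    successful = sorted((item for item in decided if item[1] > 0), key=lambda item: -item[1])
    return [pair for pair, _ in failing[:k]], [pair for pair, _ in successful[:k]]


# Hypothetical scores: most pairs sit near zero, a few lie far from it.
example: Scores = {
    ("A", 2010): -0.05,
    ("B", 2010): 0.02,
    ("C", 2011): -0.91,
    ("D", 2011): 0.78,
    ("E", 2012): -0.40,
    ("F", 2012): 0.01,
}

fail, succ = screen(example, k=2, min_abs=0.3)
print(fail)  # [('C', 2011), ('E', 2012)]
print(succ)  # [('D', 2011)]
```

Because most pairs cluster near zero, the `min_abs` cut-off discards the undecided mass, and only the few pairs far from zero are passed on for closer inspection by an expert.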

Figure 4
Figure 4: A summary of all company/year pairs that we considered. The y-axis denotes the algorithm's confidence in classifying each pair. Values near zero mean that the algorithm is undecided about whether the company is failing; the further a value is from zero, the higher the confidence. Note that the majority of the companies lie near zero, which explains the low predictive power of 60%. Nevertheless, a handful of companies are classified as failing or successful with high confidence. These are notable examples of their class, and it may be worthwhile for an expert to examine them in more detail to understand why.