Here, dear friends, is a prototype of Virgil’s hypothetical historical cruise ship death forecasting interface. The output is based on a series of logistic regressions incorporating as many features as have been given to the widget thus far.
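For anyone unfamiliar with the setup, here is a minimal sketch of how a widget like this might choose and apply a model. It assumes one logistic regression has been fitted per combination of entered features. The feature names, the 0/1 encoding, the coefficients and the thresholds are all invented for illustration and are not the project's actual models. Each model turns a weighted sum of the features into a score between 0 and 1 using the logistic (sigmoid) function, and the "survive"/"die" call compares that score with a per-model threshold.

```python
import math

# Hypothetical models: one logistic regression per combination of features the
# widget might have received. Feature names, encodings, coefficients and
# thresholds are invented for illustration and are NOT the project's values.
# Encoding assumed here: sex = 1.0 for female, 0.0 for male; pclass = 1, 2 or 3.
MODELS = {
    frozenset(): {"intercept": -0.5, "weights": {}, "threshold": 0.38},
    frozenset({"sex"}): {"intercept": -1.4, "weights": {"sex": 2.5}, "threshold": 0.40},
    frozenset({"sex", "pclass"}): {
        "intercept": 0.6,
        "weights": {"sex": 2.6, "pclass": -0.9},
        "threshold": 0.42,
    },
}


def sigmoid(z: float) -> float:
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def score(features: dict[str, float]) -> float:
    """Raw score from the model fitted on exactly the features supplied so far."""
    model = MODELS[frozenset(features)]  # frozenset of the dict's keys
    z = model["intercept"] + sum(
        weight * features[name] for name, weight in model["weights"].items()
    )
    return sigmoid(z)


def classify(features: dict[str, float]) -> str:
    """Compare the raw score with that model's (sub-0.5) decision threshold."""
    threshold = MODELS[frozenset(features)]["threshold"]
    return "survive" if score(features) >= threshold else "die"


if __name__ == "__main__":
    entered = {"sex": 0.0, "pclass": 1.0}
    print(round(score(entered), 3), classify(entered))  # prints: 0.426 survive
```

With these made-up numbers, the usage example shows the quirk discussed next: a raw score of about 0.43 still counts as "survive", because the threshold sits below it.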
This widget, rough prototype that it is, has a number of immediately obvious issues and idiosyncrasies. Chief among them is that the percentages tend to hover between 0.3 and 0.5, however favorable or unfavorable the conditions may be. This is because the decision threshold for these logistic regression models is, in every case, below 0.5, so a score of 0.45 can be enough to count as a “survive” classification. A displayed 45% can therefore mean a predicted survivor.
I’m not sure of the best way to get actual probabilities (whatever that could mean in this case) out of these models. The current plan is to collect the model’s scores for every item in the training set and build a distribution from them: the threshold becomes the 50% point, the highest score roughly 90%, and the lowest roughly 10%, with the shape in between determined by how the scores are distributed. New results would then be rescaled against that distribution. This might occasionally produce survival chances above 100% or below 0%, but I’m sure there are ways to mitigate that.
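The rescaling plan described above might look something like the following minimal sketch. It assumes access to the raw scores a model produced on its own training set. `make_rescaler`, `low` and `high` are hypothetical names, and ranking by position in the empirical distribution is one possible reading of "gauging the shape based on how the values are distributed". Scores at or above the threshold are placed by their rank among the training scores on that side and mapped onto 50% to 90%. Scores below it are mapped onto 10% to 50%. Because a rank can't leave the 0 to 1 range, anything beyond the training extremes is clamped to 10% or 90%, which is one way around the over-100%/under-0% problem. The second function swaps in a different technique for comparison and is not part of the plan: it shifts the log-odds so the threshold lands exactly on 0.5, which keeps every output strictly between 0 and 1.

```python
import bisect
import math


def make_rescaler(train_scores, threshold, low=0.1, high=0.9):
    """Build a function mapping raw scores to rescaled survival chances.

    Sketch of the plan: the threshold maps to 0.5, the lowest training score
    to `low`, the highest to `high`; scores in between are placed by their
    rank among the training scores on the same side of the threshold.
    """
    below = sorted(s for s in train_scores if s < threshold)
    above = sorted(s for s in train_scores if s >= threshold)

    def rescale(s: float) -> float:
        if s >= threshold:
            if not above:
                return 0.5
            # Share of above-threshold training scores that are <= s (0..1).
            frac = bisect.bisect_right(above, s) / len(above)
            return 0.5 + (high - 0.5) * frac
        if not below:
            return 0.5
        # Share of below-threshold training scores that are >= s (0..1).
        frac = (len(below) - bisect.bisect_left(below, s)) / len(below)
        return 0.5 - (0.5 - low) * frac

    # frac never leaves [0, 1], so outputs stay within [low, high]; scores
    # beyond the training extremes are effectively clamped.
    return rescale


def shift_to_threshold(s: float, threshold: float) -> float:
    """Alternative technique (not the plan above): move the log-odds so that
    `threshold` maps to exactly 0.5. Output stays strictly inside (0, 1)
    for inputs strictly inside (0, 1)."""
    def logit(p: float) -> float:
        return math.log(p / (1.0 - p))

    return 1.0 / (1.0 + math.exp(-(logit(s) - logit(threshold))))


if __name__ == "__main__":
    # Invented training scores, for illustration only.
    rescale = make_rescaler([0.31, 0.35, 0.38, 0.41, 0.44, 0.47], threshold=0.4)
    for raw in (0.31, 0.39, 0.44, 0.47):
        print(raw, round(rescale(raw), 3), round(shift_to_threshold(raw, 0.4), 3))
```

The rank-based version follows the training data's shape but produces flat steps between training scores. The log-odds shift is smooth but ignores how the scores are distributed.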
Alternatively, a Naive Bayes classifier would give probabilities directly, without such hackish postprocessing, so that’s also an option. It requires further research and further trials.
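For comparison, here is a minimal categorical Naive Bayes sketch in plain Python. It multiplies each class's prior by smoothed per-feature likelihoods and normalizes, so it yields a probability for each outcome directly. The training rows, labels and feature names are invented toy data. The add-one (Laplace) smoothing is an assumption, not something the project specifies. One convenient property for a widget that receives features one at a time: under the naive independence assumption, leaving out a feature that hasn't been entered yet is the same as marginalizing over it, so a single model covers every subset of features. Naive Bayes probabilities do tend to be poorly calibrated in practice, though, so they may still need checking.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels, alpha=1.0):
    """Fit a categorical Naive Bayes model with add-alpha (Laplace) smoothing."""
    class_counts = Counter(labels)
    # value_counts[label][feature][value]: times `value` was seen for `feature` in class `label`
    value_counts = defaultdict(lambda: defaultdict(Counter))
    feature_values = defaultdict(set)  # distinct values seen per feature
    for row, label in zip(rows, labels):
        for feature, value in row.items():
            value_counts[label][feature][value] += 1
            feature_values[feature].add(value)
    return class_counts, value_counts, feature_values, alpha


def predict_proba(model, row):
    """Posterior P(label | entered features), normalized to sum to 1.

    Features missing from `row` are simply skipped, which under the naive
    independence assumption is the same as marginalizing them out.
    """
    class_counts, value_counts, feature_values, alpha = model
    total = sum(class_counts.values())
    joint = {}
    for label, n in class_counts.items():
        p = n / total  # prior P(label)
        for feature, value in row.items():
            if feature not in feature_values:
                continue  # feature never seen in training: ignore it
            k = len(feature_values[feature])
            p *= (value_counts[label][feature][value] + alpha) / (n + alpha * k)
        joint[label] = p
    evidence = sum(joint.values())
    return {label: p / evidence for label, p in joint.items()}


# Invented toy data, for illustration only (not real passenger records).
ROWS = [
    {"sex": "female", "pclass": 1},
    {"sex": "female", "pclass": 3},
    {"sex": "female", "pclass": 1},
    {"sex": "male", "pclass": 1},
    {"sex": "male", "pclass": 3},
    {"sex": "male", "pclass": 3},
]
LABELS = ["survived", "survived", "survived", "died", "died", "died"]

if __name__ == "__main__":
    model = train_naive_bayes(ROWS, LABELS)
    print(predict_proba(model, {"sex": "female"}))
    print(predict_proba(model, {"sex": "male", "pclass": 1}))
```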
Source code for the project (wrapped up in a hapi-based server) can be found here.