Table 2 Performance of different sets of features in predicting the crime per capita for 2017

From: Dynamics of crime activities in the network of city community areas

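| Historical data depth | Coefficients | P-value | Coefficients | P-value | R-squared metric |
|---|---|---|---|---|---|
| 1 | F11=−0.0017 | 0.6358 | F51=1.1786 | <10^−55 | 0.93 |
|   | F21=−0.0016 | 0.6087 |  |  |  |
|   | F31=−0.0037 | 0.5905 |  |  |  |
|   | F41=−0.0888 | 0.0631 |  |  |  |
|   | F51=1.2198 | <10^−47 |  |  |  |
| 2 | F41=0.7661 | 0.0002 | F41=0.8484 | <10^−5 | 0.96 |
|   | F42=−0.8118 | <10^−5 | F42=−0.8955 | <10^−6 |  |
|   | F51=1.2432 | <10^−20 | F51=1.1181 | <10^−89 |  |
|   | F52=−0.1234 | 0.2738 |  |  |  |
| 3 | F41=1.1177 | <10^−9 |  |  |  |
|   | F42=−1.0378 | <10^−6 |  |  |  |
|   | F43=−0.0702 | 0.6889 |  |  |  |
|   | F51=1.1967 | <10^−28 |  |  |  |
|   | F52=−0.2443 | 0.0334 |  |  |  |
|   | F53=0.0953 | 0.2526 |  |  |  |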
  1. When deciding which features to use for prediction at the current length of historical data, we disregard features, printed in italics, that fail the null-hypothesis test requiring a P-value less than or equal to 0.05; this avoids over-fitting. Feature sets containing such features are therefore not assigned an R-squared value. Before increasing the length of historical data, we retain features that pass the less stringent test of a P-value below 0.1, so as not to lose a valid feature because of autocorrelation. Those features are printed in bold. We stop when increasing the length of historical data no longer improves the R-squared metric of the model. To assess the influence of each of the three features at the optimal historical data length of 2 years, we compute the normalized values of their coefficients: F41=0.0149, F42=−0.0139 and F51=0.0327. Each F4 feature has about 40% of the influence of the F5 feature, but the two F4 coefficients have opposite signs, which significantly weakens their combined influence.
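
The screening rule and the influence comparison in the note can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function name `screen_features` and the constants `SELECT_ALPHA` and `RETAIN_ALPHA` are hypothetical, the P-values are copied from the depth-1 rows of the table, and the entry `<10^−47` is represented by its upper bound `1e-47`.

```python
# Thresholds quoted in the table note (constant names are hypothetical).
SELECT_ALPHA = 0.05  # feature is used for prediction if P-value <= 0.05
RETAIN_ALPHA = 0.10  # feature is carried to the next depth if P-value < 0.1


def screen_features(p_values):
    """Split features into those used now and those kept for the next depth."""
    selected = [name for name, p in p_values.items() if p <= SELECT_ALPHA]
    retained = [name for name, p in p_values.items() if p < RETAIN_ALPHA]
    return selected, retained


# Depth-1 rows of Table 2; "<10^-47" is replaced by its upper bound.
depth_1 = {
    "F11": 0.6358,
    "F21": 0.6087,
    "F31": 0.5905,
    "F41": 0.0631,
    "F51": 1e-47,
}
selected, retained = screen_features(depth_1)
print("used at depth 1:", selected)
print("carried to depth 2:", retained)

# Normalized coefficients quoted in the note for the 2-year model.
normalized = {"F41": 0.0149, "F42": -0.0139, "F51": 0.0327}

# Magnitude of each coefficient relative to the F5 feature.
relative = {
    name: abs(value) / abs(normalized["F51"])
    for name, value in normalized.items()
}

# The two F4 coefficients have opposite signs, so their sum nearly cancels.
net_f4 = normalized["F41"] + normalized["F42"]
print("relative influence:", relative)
print("net F4 coefficient:", net_f4)
```

With these inputs only F51 passes the 0.05 test at depth 1, while F41 and F51 pass the 0.1 test. This is consistent with the depth-2 rows of the table, which contain only lags of F4 and F5. The ratio of each F4 coefficient to F51 comes out between 0.4 and 0.5, and the sum of the two F4 coefficients is about 0.001.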