From free text to clusters of content in health records: an unsupervised graph partitioning approach

Table 1 Benchmarking of text corpora used for Doc2Vec training

Doc2Vec models were trained on three corpora of NRLS records of different sizes and on a corpus of Wikipedia articles, using a variety of hyper-parameters. Each score measures the quality of the vectors inferred by the corresponding model, expressed as the number of correct assignments out of 1500. Boldface identifies the best result.