
Table 1 Hypergraph-of-entity nodes and hyperedges for the base model and the extensions

From: Characterizing the hypergraph-of-entity and the structural impact of its extensions

| Type | Description | Observation |
| --- | --- | --- |
| Nodes | | |
| term | Represents a single word from the original document | In this work, the preprocessing pipeline includes: sentence segmentation; lower case filtering; replacement of URL, time, money and number expressions, each with a common placeholder; stemming via the Porter stemmer |
| entity | Represents an entity from the list of extracted entities and/or provided triples | For the INEX collection, each mention of an entity is modeled through this type of node (we consider disambiguation to be a part of the ranking) |
| Hyperedges (base model) | | |
| document | Represents a document through the set of all its terms and entities | Undirected hyperedge |
| related_to | Represents a semantic relation between multiple entities | Undirected hyperedge. In this implementation, the relation is derived from all triples in the collection, by grouping by subject |
| contained_in | Represents a relation between a set of terms and an entity | Directed hyperedge. In this implementation, this relation exists between terms that are a part of an entity name or mention and the corresponding entity node |
| Hyperedges (extensions) | | |
| synonym | Represents a relation of synonymy between a set of terms | Undirected hyperedge. Present in the Synonyms model. The first synset from WordNet 3.0 is obtained for each noun term, missing terms are added to the model and the hyperedge is created |
| context | Represents a relation of contextual similarity between a set of terms | Undirected hyperedge. Present in the Contextual similarity model. This is computed based on the top similar terms according to word2vec embeddings |
| tf_bin | Represents a set of terms within the same term frequency interval, for a given document | Undirected hyperedge. Present in the TF-bins model. The number of TF-bins per document is a parameter that can be set during indexing |
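
The node and hyperedge types in Table 1 can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Node`, `Hyperedge`, `document_edge`, `related_to_edges`, `contained_in_edge` and `tf_bin_edges` are hypothetical, and several details are assumptions made only for illustration. In particular, the sketch assumes that a `related_to` hyperedge contains the subject together with its objects, that entity names are tokenized by lowercasing and splitting on whitespace (no stemming), and that TF-bins are equal-width intervals between the minimum and maximum term frequency of a document. The table does not specify any of these.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    kind: str  # "term" or "entity"
    name: str


@dataclass(frozen=True)
class Hyperedge:
    kind: str                      # "document", "related_to", "contained_in", "tf_bin", ...
    tail: frozenset                # all nodes (undirected) or source nodes (directed)
    head: frozenset = frozenset()  # target nodes; left empty for undirected hyperedges

    @property
    def directed(self):
        return bool(self.head)


def document_edge(terms, entities):
    """Undirected hyperedge over all terms and entities of one document."""
    nodes = {Node("term", t) for t in terms} | {Node("entity", e) for e in entities}
    return Hyperedge("document", frozenset(nodes))


def related_to_edges(triples):
    """Group (subject, predicate, object) triples by subject.

    Assumption: each hyperedge holds the subject plus all of its objects.
    """
    groups = defaultdict(set)
    for subj, _pred, obj in triples:
        groups[subj].update({subj, obj})
    return [
        Hyperedge("related_to", frozenset(Node("entity", e) for e in members))
        for members in groups.values()
    ]


def contained_in_edge(entity_name):
    """Directed hyperedge from the terms of an entity name to the entity node.

    Assumption: naive tokenization (lowercase, whitespace split, no stemming).
    """
    terms = frozenset(Node("term", t) for t in entity_name.lower().split())
    return Hyperedge("contained_in", terms, frozenset({Node("entity", entity_name)}))


def tf_bin_edges(term_freqs, n_bins):
    """One undirected hyperedge per non-empty TF interval of a document.

    Assumption: equal-width intervals between the min and max term frequency.
    """
    lo, hi = min(term_freqs.values()), max(term_freqs.values())
    width = (hi - lo) / n_bins or 1  # fall back to 1 when all frequencies are equal
    bins = defaultdict(set)
    for term, tf in term_freqs.items():
        idx = min(int((tf - lo) / width), n_bins - 1)  # clamp the max TF into the last bin
        bins[idx].add(Node("term", term))
    return [Hyperedge("tf_bin", frozenset(nodes)) for _, nodes in sorted(bins.items())]


if __name__ == "__main__":
    triples = [("Porto", "country", "Portugal"), ("Porto", "river", "Douro")]
    for edge in related_to_edges(triples):
        print(edge.kind, sorted(n.name for n in edge.tail))
    print(contained_in_edge("New York").directed)
```

Representing every hyperedge as a tail set plus an optional head set keeps the undirected types (`document`, `related_to`, `tf_bin`) and the directed `contained_in` type in one structure; the `synonym` and `context` extensions would fit the same shape as undirected hyperedges over term nodes.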