Characterizing the hypergraph-of-entity and the structural impact of its extensions

Devezas, José; Nunes, Sérgio

doi:10.1007/s41109-020-00320-z

Applied Network Science

Table 1 Hypergraph-of-entity nodes and hyperedges for the base model and the extensions


Type	Description	Observation
Nodes
term	Represents a single word from the original document	In this work, the preprocessing pipeline includes: sentence segmentation; lowercasing; replacement of URL, time, money and number expressions, each with its own common placeholder; stemming with the Porter stemmer
entity	Represents an entity from the list of extracted entities and/or provided triples	For the INEX collection, each mention of an entity is modeled through this type of node (we consider disambiguation to be a part of the ranking)
Hyperedges (base model)
document	Represents a document through the set of all its terms and entities	Undirected hyperedge
related_to	Represents a semantic relation between multiple entities	Undirected hyperedge. In this implementation, the relation is derived from all triples in the collection, by grouping by subject
contained_in	Represents a relation between a set of terms and an entity	Directed hyperedge. In this implementation, this relation exists between terms that are a part of an entity name or mention and the corresponding entity node
Hyperedges (extensions)
synonym	Represents a relation of synonymy between a set of terms	Undirected hyperedge. Present in the Synonyms model. The first synset from WordNet 3.0 is obtained for each noun term; terms missing from the model are added to it, and the hyperedge is created
context	Represents a relation of contextual similarity between a set of terms	Undirected hyperedge. Present in the Contextual similarity model. This is computed based on the top similar terms according to word2vec embeddings
tf_bin	Represents a set of terms within the same term frequency interval, for a given document	Undirected hyperedge. Present in the TF-bins model. The number of TF-bins per document is a parameter that can be set during indexing
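
The node and hyperedge types in Table 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Hyperedge` class, the `tf_bins` and `build_hyperedges` helpers, and the `num_bins` parameter name are all hypothetical. It also assumes equal-width term frequency intervals for the TF-bins (the table does not say how the intervals are chosen), skips stemming and placeholder replacement, and derives `contained_in` terms by simply lowercasing and splitting the entity name.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperedge:
    """Hypothetical container for one hyperedge; nodes are (type, label) pairs."""
    kind: str                          # "document", "contained_in", "tf_bin", ...
    nodes: frozenset                   # all members, or the source set if directed
    targets: frozenset = frozenset()   # non-empty only for directed hyperedges

    @property
    def directed(self) -> bool:
        return bool(self.targets)


def tf_bins(terms, num_bins=2):
    """Group a document's distinct terms by term frequency interval.

    Assumption: equal-width intervals between the lowest and highest TF.
    Returns the non-empty bins, ordered from lowest to highest frequency.
    """
    tf = Counter(terms)
    if not tf:
        return []
    lo, hi = min(tf.values()), max(tf.values())
    width = (hi - lo) / num_bins or 1  # fall back to 1 when all TFs are equal
    bins = {}
    for term, freq in tf.items():
        # The maximum TF would land one past the last bin, so clamp it.
        idx = min(int((freq - lo) / width), num_bins - 1)
        bins.setdefault(idx, set()).add(term)
    return [frozenset(members) for _, members in sorted(bins.items())]


def build_hyperedges(doc_terms, doc_entities, num_bins=2):
    """Build document, contained_in and tf_bin hyperedges for one document."""
    term_nodes = frozenset(("term", t) for t in doc_terms)
    entity_nodes = frozenset(("entity", e) for e in doc_entities)

    # document: undirected, joins all terms and entities of the document.
    edges = [Hyperedge("document", term_nodes | entity_nodes)]

    # contained_in: directed, from the terms of an entity name to the entity.
    for entity in doc_entities:
        name_terms = frozenset(("term", t) for t in entity.lower().split())
        edges.append(
            Hyperedge("contained_in", name_terms, frozenset({("entity", entity)}))
        )

    # tf_bin: undirected, one hyperedge per non-empty frequency interval.
    for members in tf_bins(doc_terms, num_bins):
        edges.append(Hyperedge("tf_bin", frozenset(("term", t) for t in members)))
    return edges


if __name__ == "__main__":
    terms = ["web", "web", "web", "web", "search", "search", "graph"]
    for edge in build_hyperedges(terms, ["Web Search"]):
        print(edge.kind, "directed" if edge.directed else "undirected",
              sorted(edge.nodes), sorted(edge.targets))
```

The `related_to`, `synonym` and `context` hyperedges are left out because they depend on external resources (the collection's triples, WordNet and word2vec embeddings), but they would fit the same undirected `Hyperedge` shape.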
