Table 1 Hypergraph-of-entity nodes and hyperedges for the base model and the extensions

From: Characterizing the hypergraph-of-entity and the structural impact of its extensions

| Type | Description | Observation |
| --- | --- | --- |
| Nodes | | |
| term | Represents a single word from the original document | In this work, the preprocessing pipeline includes: sentence segmentation; lowercasing; replacement of URL, time, money and number expressions, each with a common placeholder; stemming via the Porter stemmer |
| entity | Represents an entity from the list of extracted entities and/or provided triples | For the INEX collection, each mention of an entity is modeled through this type of node (we consider disambiguation to be a part of the ranking) |
| Hyperedges (base model) | | |
| document | Represents a document through the set of all its terms and entities | Undirected hyperedge |
| related_to | Represents a semantic relation between multiple entities | Undirected hyperedge. In this implementation, the relation is derived from all triples in the collection, by grouping by subject |
| contained_in | Represents a relation between a set of terms and an entity | Directed hyperedge. In this implementation, this relation exists between terms that are a part of an entity name or mention and the corresponding entity node |
| Hyperedges (extensions) | | |
| synonym | Represents a relation of synonymy between a set of terms | Undirected hyperedge. Present in the Synonyms model. The first synset from WordNet 3.0 is obtained for each noun term; missing terms are added to the model and the hyperedge is created |
| context | Represents a relation of contextual similarity between a set of terms | Undirected hyperedge. Present in the Contextual similarity model. This is computed based on the top similar terms according to word2vec embeddings |
| tf_bin | Represents a set of terms within the same term frequency interval, for a given document | Undirected hyperedge. Present in the TF-bins model. The number of TF-bins per document is a parameter that can be set during indexing |
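
To make the node and hyperedge types in Table 1 more concrete, the following is a minimal Python sketch of how they might be represented and built. It is not the authors' implementation. The names `Node`, `Hyperedge`, `document_edge`, `related_to_edges`, `contained_in_edge` and `tf_bin_edges` are hypothetical, and several details are assumptions: terms are taken to be already preprocessed, a `related_to` hyperedge is assumed to hold the subject together with its objects, entity names are split on whitespace to obtain their terms, and TF-bins are assumed to be equal-width intervals over the term frequency range of the document.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    kind: str  # "term" or "entity"
    name: str


@dataclass(frozen=True)
class Hyperedge:
    kind: str                      # "document", "related_to", "contained_in", "tf_bin"
    tail: frozenset                # undirected hyperedges keep all their nodes here
    head: frozenset = frozenset()  # only populated for directed hyperedges
    directed: bool = False


def document_edge(terms, entities):
    """Undirected hyperedge linking all terms and entities of one document."""
    nodes = {Node("term", t) for t in terms} | {Node("entity", e) for e in entities}
    return Hyperedge("document", frozenset(nodes))


def related_to_edges(triples):
    """One undirected hyperedge per subject, obtained by grouping triples by subject.

    Assumption: the hyperedge contains the subject and all of its objects.
    """
    by_subject = defaultdict(set)
    for subj, _pred, obj in triples:
        by_subject[subj].add(obj)
    return [
        Hyperedge("related_to", frozenset(Node("entity", e) for e in {subj} | objs))
        for subj, objs in by_subject.items()
    ]


def contained_in_edge(entity):
    """Directed hyperedge from the terms of an entity name to the entity node.

    Assumption: terms are obtained by lowercasing and splitting on whitespace.
    """
    terms = frozenset(Node("term", t) for t in entity.lower().split())
    head = frozenset({Node("entity", entity)})
    return Hyperedge("contained_in", terms, head, directed=True)


def tf_bin_edges(terms, n_bins):
    """Undirected hyperedges grouping the terms of one document by TF interval.

    Assumption: equal-width bins over [min TF, max TF]; empty bins are skipped.
    """
    tf = Counter(terms)
    if not tf:
        return []
    lo, hi = min(tf.values()), max(tf.values())
    width = (hi - lo) / n_bins or 1  # avoid a zero width when all TFs are equal
    bins = defaultdict(set)
    for term, freq in tf.items():
        idx = min(int((freq - lo) / width), n_bins - 1)
        bins[idx].add(Node("term", term))
    return [Hyperedge("tf_bin", frozenset(nodes)) for _, nodes in sorted(bins.items())]
```

For example, `tf_bin_edges(["a", "a", "a", "a", "b", "b", "c"], n_bins=2)` groups the low-frequency terms `b` and `c` in one hyperedge and the high-frequency term `a` in another. Keeping a separate `head` set only for directed hyperedges lets the same structure represent both the undirected and the directed rows of the table.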