KGE.models.translating_based.TransH

Classes

LpDistancePow

An implementation of negative squared Lp-distance.

PairwiseHingeLoss

An implementation of Pairwise Hinge Loss / Margin Ranking Loss.

TransH

An implementation of TransH from [wang 2014].

TranslatingModel

A base module for translating-based embedding models.

UniformStrategy

An implementation of uniform negative sampling.

Class Inheritance Diagram

Inheritance diagram of KGE.models.translating_based.TransH.TransH

An implementation of TransH

class KGE.models.translating_based.TransH.TransH[source]

Bases: KGE.models.base_model.TranslatingModel.TranslatingModel

An implementation of TransH from [wang 2014].

TransH overcomes the problems of TransE in modeling reflexive/one-to-many/many-to-one/many-to-many relations by enabling an entity to have distributed representations when involved in different relations. TransH represents each relation \(r\) by a relation-specific translation vector \(\textbf{r}_r\) lying in a relation-specific hyperplane with normal vector \(\textbf{w}_r\), and projects the head and tail embeddings onto this hyperplane, expecting that the projected embeddings can be connected by the relation translation vector \(\textbf{r}_r\):

\[ \begin{align}\begin{aligned}{\textbf{e}_h}_{\perp} + \textbf{r}_r \approx {\textbf{e}_t}_{\perp}\\{\textbf{e}_h}_{\perp} = \textbf{e}_h - \textbf{w}_r^T \textbf{e}_h\textbf{w}_r\\{\textbf{e}_t}_{\perp} = \textbf{e}_t - \textbf{w}_r^T \textbf{e}_t\textbf{w}_r\end{aligned}\end{align} \]

where \(\textbf{e}_i \in \mathbb{R}^k\) are vector representations of the entities, \(\textbf{r}_i \in \mathbb{R}^k\) are relation translation vectors, and \(\textbf{w}_i \in \mathbb{R}^k\) are the normal vectors of the relation hyperplanes.
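The projection step can be illustrated with a minimal, dependency-free sketch. The helper names `dot` and `project` are hypothetical and are not part of the KGE package; the sketch also assumes that \(\textbf{w}_r\) already has unit length.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project(e, w):
    """Project e onto the hyperplane with unit normal w: e - (w^T e) w."""
    scale = dot(w, e)
    return [e_i - scale * w_i for e_i, w_i in zip(e, w)]


w_r = [0.0, 0.0, 1.0]         # unit normal of the relation hyperplane
e_h = [1.0, 2.0, 3.0]         # head embedding
e_h_perp = project(e_h, w_r)  # the component along w_r is removed

print(e_h_perp)               # [1.0, 2.0, 0.0]
print(dot(w_r, e_h_perp))     # 0.0
```

The projected vector is orthogonal to \(\textbf{w}_r\), which is what places it in the relation hyperplane.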

The score of \((h,r,t)\) is:

\[f(h,r,t) = s({\textbf{e}_h}_{\perp} + \textbf{r}_r, {\textbf{e}_t}_{\perp})\]

where \(s\) is a scoring function (KGE.score) that scores how plausibly the translated head \({\textbf{e}_h}_{\perp} + \textbf{r}_r\) matches the projected tail \({\textbf{e}_t}_{\perp}\).

By default, KGE.score.LpDistancePow is used, which is the negative squared L2-distance:

\[s({\textbf{e}_h}_{\perp} + \textbf{r}_r, {\textbf{e}_t}_{\perp}) = - \left\| {\textbf{e}_h}_{\perp} + \textbf{r}_r - {\textbf{e}_t}_{\perp} \right\|_2^2\]

You can switch to the L1-distance by passing score_fn=LpDistancePow(p=1) to __init__(), or use any other score function by specifying score_fn in __init__().
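As a rough illustration of the default score, the sketch below computes \(f(h,r,t)\) for one triplet in plain Python. It assumes that the general form behind LpDistancePow is \(-\sum_i |x_i - y_i|^p\), which matches the formula above for p=2 and reduces to the negative L1-distance for p=1; the function names are hypothetical and this is not the library's implementation.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project(e, w):
    """Project e onto the hyperplane with unit normal w."""
    scale = dot(w, e)
    return [e_i - scale * w_i for e_i, w_i in zip(e, w)]


def neg_lp_distance_pow(x, y, p=2):
    """Return -sum_i |x_i - y_i|^p (assumed form of the score function)."""
    return -sum(abs(a - b) ** p for a, b in zip(x, y))


def transh_score(e_h, r_r, w_r, e_t, p=2):
    """Score one triplet: project h and t, translate h, compare with t."""
    h_perp = project(e_h, w_r)
    t_perp = project(e_t, w_r)
    translation = [a + b for a, b in zip(h_perp, r_r)]
    return neg_lp_distance_pow(translation, t_perp, p)


w_r = [0.0, 0.0, 1.0]   # unit normal of the hyperplane
e_h = [1.0, 2.0, 3.0]
r_r = [1.0, 0.0, 0.0]   # translation vector lying in the hyperplane
e_t = [2.0, 4.0, -5.0]

print(transh_score(e_h, r_r, w_r, e_t, p=2))  # -4.0
print(transh_score(e_h, r_r, w_r, e_t, p=1))  # -2.0
```

Scores are never positive, and a score closer to zero indicates a more plausible triplet.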

If constraint=True is passed to __init__(), the following constraints are applied:

  1. \(\textbf{w}_i\) is renormalized every iteration so that it has unit length, \(\left\| \textbf{w}_i \right\|_2 = 1\)

  2. \(\left\| \textbf{e}_i \right\|_2 \leq 1\)

  3. \(\left| \mathbf{w}_{r}^T \mathbf{r}_{r} \right| /\left\|\mathbf{r}_{r}\right\|_2 \leq \epsilon\), which guarantees that the translation vector \(\textbf{r}_r\) lies in the hyperplane

Constraints 2 and 3 are realized as soft constraints, as described in the original TransH paper:

\[\text{regularization term} = \lambda \left\{ \sum_i \left[\| \textbf{e}_i \|_{2}^{2}-1 \right]_+ + \sum_i \left[ \frac{\left( \textbf{w}_{i}^T \textbf{r}_{i} \right)^2}{\left\| \textbf{r}_{i} \right\|_2^2}-\epsilon^2 \right]_{+} \right\}\]
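The constraints can be sketched in a few lines of plain Python. This is an illustration of the formulas above and not the library's code: the function names are hypothetical, and the value of \(\epsilon\) used here is an arbitrary choice for the example.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def renormalize(w):
    """Constraint 1: rescale w to unit L2 length."""
    norm = math.sqrt(dot(w, w))
    return [w_i / norm for w_i in w]


def soft_constraint_penalty(entities, normals, translations, eps, lam=1.0):
    """Constraints 2 and 3 as a hinge-style penalty, [x]_+ = max(x, 0)."""
    entity_term = sum(max(dot(e, e) - 1.0, 0.0) for e in entities)
    relation_term = sum(
        max(dot(w, r) ** 2 / dot(r, r) - eps ** 2, 0.0)
        for w, r in zip(normals, translations)
    )
    return lam * (entity_term + relation_term)


entities = [[0.5, 0.5], [1.0, 2.0]]   # only the second one exceeds norm 1
normals = [renormalize([2.0, 0.0])]   # becomes [1.0, 0.0]
translations = [[1.0, 1.0]]           # not orthogonal to the normal

print(soft_constraint_penalty(entities, normals, translations, eps=0.5))  # 4.25
```

Here the second entity contributes \(5 - 1 = 4\) and the relation contributes \(0.5 - 0.25 = 0.25\); the first entity already satisfies its constraint and adds nothing.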

Methods

evaluate(eval_X, corrupt_side[, positive_X])

Evaluate triplets.

get_rank(x, positive_X, corrupt_side)

Get the rank of a single triplet.

restore_model_weights(model_weights)

Restore the model weights.

score_hrt(h, r, t)

Score the triplets \((h,r,t)\).

train(train_X, val_X, metadata, epochs, ...)

Train the Knowledge Graph Embedding Model.

__init__(embedding_params, negative_ratio, corrupt_side, score_fn=<KGE.score.LpDistancePow object>, loss_fn=<KGE.loss.PairwiseHingeLoss object>, ns_strategy=<class 'KGE.ns_strategy.UniformStrategy'>, constraint=True, constraint_weight=1.0, n_workers=1)[source]

Initialize TransH.

Parameters
  • embedding_params (dict) – embedding dimension parameters, should have key 'embedding_size' for embedding dimension \(k\)

  • negative_ratio (int) – number of negative samples

  • corrupt_side (str) – which side to corrupt while training, can be 'h', 't', or 'h+t'

  • score_fn (function, optional) – scoring function, by default KGE.score.LpDistancePow

  • loss_fn (class, optional) – loss function class KGE.loss.Loss, by default KGE.loss.PairwiseHingeLoss

  • ns_strategy (class, optional) – negative sampling strategy, by default KGE.ns_strategy.UniformStrategy

  • constraint (bool, optional) – conduct constraint or not, by default True

  • constraint_weight (float, optional) – regularization weight \(\lambda\), by default 1.0

  • n_workers (int, optional) – number of workers for negative sampling, by default 1
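To make the roles of negative_ratio, corrupt_side and loss_fn more concrete, here is a rough standalone sketch of uniform negative sampling followed by a pairwise hinge loss. It is not the code of KGE.ns_strategy.UniformStrategy or KGE.loss.PairwiseHingeLoss. The function names, the per-sample random choice of side for 'h+t', the margin of 1.0 and the averaging over negatives are all assumptions made for illustration.

```python
import random


def corrupt_uniform(triplet, n_entities, negative_ratio, corrupt_side, rng):
    """Draw negative_ratio corrupted copies of one (h, r, t) index triplet."""
    if corrupt_side not in ("h", "t", "h+t"):
        raise ValueError("corrupt_side must be 'h', 't', or 'h+t'")
    h, r, t = triplet
    negatives = []
    for _ in range(negative_ratio):
        # Assumption: 'h+t' picks the corrupted side at random per sample.
        side = rng.choice(["h", "t"]) if corrupt_side == "h+t" else corrupt_side
        replacement = rng.randrange(n_entities)  # uniform over all entities
        negatives.append((replacement, r, t) if side == "h" else (h, r, replacement))
    return negatives


def pairwise_hinge_loss(pos_score, neg_scores, margin=1.0):
    """Mean of max(0, margin - pos + neg); a higher score is more plausible."""
    losses = [max(0.0, margin - pos_score + s) for s in neg_scores]
    return sum(losses) / len(losses)


rng = random.Random(0)
negatives = corrupt_uniform((0, 1, 2), n_entities=10, negative_ratio=4,
                            corrupt_side="t", rng=rng)
print(len(negatives))                            # 4
print(pairwise_hinge_loss(-0.5, [-3.0, -1.0]))   # 0.25
```

Uniform sampling of this kind can occasionally reproduce a true triplet; the sketch does not filter such cases.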

evaluate(eval_X, corrupt_side, positive_X=None)

Evaluate triplets.

Parameters
  • eval_X (tf.Tensor or np.array) – triplets to be evaluated

  • corrupt_side (str) – which side of the triplets to corrupt, can be 'h' or 't'

  • positive_X (tf.Tensor or np.array, optional) – positive triplets that should be filtered while generating corrupted triplets, by default None (no filter applied)

Returns

evaluation result

Return type

dict

get_rank(x, positive_X, corrupt_side)

Get the rank of a single triplet.

Parameters
  • x (tf.Tensor or np.array) – rank this triplet

  • positive_X (tf.Tensor or np.array, optional) – positive triplets that should be filtered while generating corrupted triplets; if None, no filter is applied

  • corrupt_side (str) – which side of the triplets to corrupt, can be 'h' or 't'

Returns

ranking result

Return type

int
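The idea of a filtered rank can be sketched as follows. This is an illustration only: it assumes that a higher score is better, that ranks start at 1, and that ties do not push the true triplet down. The function name and its arguments are hypothetical and do not mirror the internals of get_rank.

```python
def filtered_rank(scores, true_index, known_positive_indices=()):
    """Rank of the true entity among all candidates.

    scores[i] is the score of the triplet with entity i in the corrupted slot.
    Candidates that form other known positive triplets are skipped.
    """
    true_score = scores[true_index]
    skip = set(known_positive_indices) - {true_index}
    better = sum(
        1 for i, s in enumerate(scores) if i not in skip and s > true_score
    )
    return better + 1


scores = [-0.5, -2.0, -1.0, -0.1]   # one score per candidate entity
print(filtered_rank(scores, true_index=2))                              # 3
print(filtered_rank(scores, true_index=2, known_positive_indices=[0]))  # 2
```

Filtering entity 0 improves the rank from 3 to 2, because a candidate that is itself a known positive should not count against the triplet being ranked.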

restore_model_weights(model_weights)

Restore the model weights.

Parameters

model_weights (dict) – dictionary of model weights to be restored

score_hrt(h, r, t)[source]

Score the triplets \((h,r,t)\).

If h is None, score all entities: \((h_i, r, t)\).

If t is None, score all entities: \((h, r, t_i)\).

h and t should not be None simultaneously.

Parameters
  • h (tf.Tensor or np.ndarray or None) – index of heads with shape (n,)

  • r (tf.Tensor or np.ndarray) – index of relations with shape (n,)

  • t (tf.Tensor or np.ndarray or None) – index of tails with shape (n,)

Returns

triplets scores with shape (n,)

Return type

tf.Tensor
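The handling of None can be illustrated with a simplified sketch for a single index triplet. Unlike the method documented here, it works on plain Python lists instead of tensors of shape (n,) and hard-codes the negative squared L2-distance; the embedding tables passed as arguments are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project(e, w):
    scale = dot(w, e)
    return [e_i - scale * w_i for e_i, w_i in zip(e, w)]


def score_hrt(h, r, t, ent_emb, rel_emb, normals):
    """Score (h, r, t); a None head or tail means 'try every entity there'."""
    if h is None and t is None:
        raise ValueError("h and t should not be None simultaneously")
    heads = range(len(ent_emb)) if h is None else [h]
    tails = range(len(ent_emb)) if t is None else [t]
    w, r_vec = normals[r], rel_emb[r]
    scores = []
    for i in heads:
        h_perp = project(ent_emb[i], w)
        for j in tails:
            t_perp = project(ent_emb[j], w)
            scores.append(
                -sum((a + b - c) ** 2 for a, b, c in zip(h_perp, r_vec, t_perp))
            )
    return scores


ent_emb = [[0.0, 5.0], [1.0, -2.0], [3.0, 0.0]]  # 3 entities, k = 2
rel_emb = [[1.0, 0.0]]                           # 1 relation
normals = [[0.0, 1.0]]                           # unit normal per relation

tail_scores = score_hrt(0, 0, None, ent_emb, rel_emb, normals)
best_tail = max(range(len(tail_scores)), key=tail_scores.__getitem__)
print(best_tail)  # 1
```

Entity 1 is the best tail for head 0 because, after projection, adding the translation vector to the head lands exactly on it.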

train(train_X, val_X, metadata, epochs, batch_size, early_stopping_rounds=None, model_weights_initial=None, restore_best_weight=True, optimizer='Adam', seed=None, log_path='./logs', log_projector=False)

Train the Knowledge Graph Embedding Model.

Parameters
  • train_X (np.ndarray or str) –

    training triplets.

    If np.ndarray, shape should be (n,3) for \((h,r,t)\) respectively.

    If str, it is the path of a folder containing the training triplets as csv files; every csv file should have 3 columns without a header, for \((h,r,t)\) respectively.

  • val_X (np.ndarray or str) –

    validation triplets.

    If np.ndarray, shape should be (n,3) for \((h,r,t)\) respectively.

    If str, it is the path of a folder containing the validation triplets as csv files; every csv file should have 3 columns without a header, for \((h,r,t)\) respectively.

  • metadata (dict) –

    metadata for the knowledge graph data. It should have the following keys:

    'ent2ind': dict, dictionary that maps each entity to its index.

    'ind2ent': list, list that maps each index to its entity.

    'rel2ind': dict, dictionary that maps each relation to its index.

    'ind2rel': list, list that maps each index to its relation.

    You can use KGE.data_utils.index_kg to index the data and obtain the metadata.

  • epochs (int) – number of epochs

  • batch_size (int) – batch size

  • early_stopping_rounds (int, optional) – number of rounds that trigger early stopping, by default None (no early stopping)

  • model_weights_initial (dict, optional) – initial model weights with specific values, by default None

  • restore_best_weight (bool, optional) – restore the weights from the best iteration if early_stopping_rounds is not None, by default True

  • optimizer (str or tensorflow.keras.optimizers, optional) – optimizer applied in training, by default 'Adam', which uses the default settings of tf.keras.optimizers.Adam

  • seed (int, optional) – random seed for shuffling data & embedding initialization, by default None

  • log_path (str, optional) – path for tensorboard logging, by default './logs'

  • log_projector (bool, optional) – project the embeddings in the tensorboard projector tab; setting this to True writes the metadata and embedding tsv files to log_path and projects this data in the tensorboard projector tab, by default False
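The layout expected for metadata and for indexed triplets can be sketched as follows. The helper functions below are hypothetical stand-ins written for illustration; they are not KGE.data_utils.index_kg, whose actual signature and index ordering are not shown in this page, and the sorted ordering used here is an arbitrary choice.

```python
def build_metadata(triplets):
    """Build the four lookup structures from raw (h, r, t) string triplets."""
    ind2ent = sorted({e for h, _, t in triplets for e in (h, t)})
    ind2rel = sorted({r for _, r, _ in triplets})
    return {
        "ent2ind": {e: i for i, e in enumerate(ind2ent)},
        "ind2ent": ind2ent,
        "rel2ind": {r: i for i, r in enumerate(ind2rel)},
        "ind2rel": ind2rel,
    }


def index_triplets(triplets, metadata):
    """Convert raw triplets into rows of (h, r, t) integer indices."""
    ent2ind, rel2ind = metadata["ent2ind"], metadata["rel2ind"]
    return [[ent2ind[h], rel2ind[r], ent2ind[t]] for h, r, t in triplets]


raw = [("alice", "knows", "bob"), ("bob", "works_at", "acme")]
metadata = build_metadata(raw)
indexed = index_triplets(raw, metadata)

print(metadata["ind2ent"])  # ['acme', 'alice', 'bob']
print(indexed)              # [[1, 0, 2], [2, 1, 0]]
```

The resulting rows have the (n,3) layout described for train_X and val_X; wrapping them in np.array would give the np.ndarray form.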