KGE.models.translating_based.TransD

Classes

LpDistancePow

An implementation of negative squared Lp-distance.

PairwiseHingeLoss

An implementation of Pairwise Hinge Loss / Margin Ranking Loss.

TransD

An implementation of TransD from [ji 2015].

TranslatingModel

A base module for translating-based embedding models.

UniformStrategy

An implementation of uniform negative sampling.

Class Inheritance Diagram

Inheritance diagram of KGE.models.translating_based.TransD.TransD

An implementation of TransD

class KGE.models.translating_based.TransD.TransD

Bases: KGE.models.base_model.TranslatingModel.TranslatingModel

An implementation of TransD from [ji 2015].

Like TransR, TransD models entities and relations in distinct embedding spaces. However, whereas TransR projects entity embeddings into the relation space using a single projection matrix \(\textbf{M}_i\) for each relation, TransD constructs two projection matrices dynamically. These two matrices are determined by both the entities and the relation, hence the name TransD.

In TransD, each entity and each relation is represented by two vectors: \(\textbf{e}_i \in \mathbb{R}^k\) and \(\textbf{r}_i \in \mathbb{R}^d\) capture the meaning of the entity and the relation, while \(\tilde{\textbf{e}}_i \in \mathbb{R}^k\) and \(\tilde{\textbf{r}}_i \in \mathbb{R}^d\) are used to construct the projection matrices:

\[ \begin{align}\begin{aligned}\mathbf{M}_{rh} = \tilde{\textbf{r}}_r \tilde{\textbf{e}}_h^T + \mathbf{I}^{d \times k}\\\mathbf{M}_{rt} = \tilde{\textbf{r}}_r \tilde{\textbf{e}}_t^T + \mathbf{I}^{d \times k}\end{aligned}\end{align} \]

These two projection matrices are used to project the entity embeddings into the relation space, as in TransR:

\[ \begin{align}\begin{aligned}{\textbf{e}_h}_{\perp} = \textbf{M}_{rh} \textbf{e}_h\\{\textbf{e}_t}_{\perp} = \textbf{M}_{rt} \textbf{e}_t\end{aligned}\end{align} \]

The projected entity embeddings are then expected to be connected by the relation embedding in the relation space:

\[{\textbf{e}_h}_{\perp} + \textbf{r}_r \approx {\textbf{e}_t}_{\perp}\]
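The projection step can be illustrated with a small NumPy sketch. It is not the library's implementation: the variable names, the toy dimensions, and the helper functions `projection_matrix` and `project` are assumptions made for illustration. It also shows that the projection can be computed without materializing the matrix, since \(\textbf{M}_{rh} \textbf{e}_h = \tilde{\textbf{r}}_r (\tilde{\textbf{e}}_h^T \textbf{e}_h) + \mathbf{I}^{d \times k} \textbf{e}_h\).

```python
import numpy as np

rng = np.random.default_rng(0)
k, d = 4, 3  # toy entity and relation embedding sizes (arbitrary values)

# Meaning vectors (e_h, e_t, r) and projection vectors (*_tilde) for one triplet.
e_h, e_t = rng.normal(size=k), rng.normal(size=k)
e_h_tilde, e_t_tilde = rng.normal(size=k), rng.normal(size=k)
r, r_tilde = rng.normal(size=d), rng.normal(size=d)


def projection_matrix(rel_tilde, ent_tilde):
    """Hypothetical helper: M = rel_tilde ent_tilde^T + I^{d x k}."""
    rows, cols = rel_tilde.shape[0], ent_tilde.shape[0]
    return np.outer(rel_tilde, ent_tilde) + np.eye(rows, cols)


def project(ent, ent_tilde, rel_tilde):
    """Hypothetical helper: same projection without forming the d x k matrix."""
    rows, cols = rel_tilde.shape[0], ent.shape[0]
    return rel_tilde * np.dot(ent_tilde, ent) + np.eye(rows, cols) @ ent


# Project head and tail into the relation space with their own matrices.
h_perp = projection_matrix(r_tilde, e_h_tilde) @ e_h
t_perp = projection_matrix(r_tilde, e_t_tilde) @ e_t

# A plausible triplet should make this residual small.
residual = h_perp + r - t_perp
```

The matrix-free form is usually preferable for batches of triplets, because it avoids building one \(d \times k\) matrix per triplet.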

The score of \((h,r,t)\) is:

\[f(h,r,t) = s({\textbf{e}_h}_{\perp} + \textbf{r}_r, {\textbf{e}_t}_{\perp})\]

where \(s\) is a scoring function (KGE.score) that scores the plausibility of the match between \((translation, predicate)\).

By default, \(s\) is KGE.score.LpDistancePow(), the negative squared L2-distance:

\[s({\textbf{e}_h}_{\perp} + \textbf{r}_r, {\textbf{e}_t}_{\perp}) = - \left\| {\textbf{e}_h}_{\perp} + \textbf{r}_r - {\textbf{e}_t}_{\perp} \right\|_2^2\]

To use the L1-distance instead, pass score_fn=LpDistancePow(p=1) to __init__(). Any other scoring function can be used by passing it as score_fn to __init__().

If constraint=True is given in __init__(), the following constraints are applied:
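As a rough illustration of what such a scoring function computes, here is a NumPy sketch. The function `neg_lp_distance_pow` is hypothetical and is not KGE.score.LpDistancePow itself; it assumes that "Pow" means the Lp-distance raised to the power \(p\), which gives the negative squared L2-distance for \(p=2\) and the negative L1-distance for \(p=1\).

```python
import numpy as np


def neg_lp_distance_pow(x, y, p=2):
    """Hypothetical stand-in: -(||x - y||_p)^p, reduced over the last axis."""
    return -np.sum(np.abs(x - y) ** p, axis=-1)


# Two (translation, predicate) pairs, one per row.
translation = np.array([[1.0, 2.0], [0.0, 0.0]])
predicate = np.array([[0.0, 0.0], [3.0, 4.0]])

l2_scores = neg_lp_distance_pow(translation, predicate, p=2)  # [-5., -25.]
l1_scores = neg_lp_distance_pow(translation, predicate, p=1)  # [-3., -7.]
```

Scores are never positive under this definition, and a score closer to zero indicates a more plausible triplet.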

  1. \(\left\| \textbf{e}_h \right\|_2 \leq 1\) and \(\left\| \textbf{r}_r \right\|_2 \leq 1\) and \(\left\| \textbf{e}_t \right\|_2 \leq 1\)

  2. \(\left\| {\textbf{e}_h}_{\perp} \right\|_2 \leq 1\) and \(\left\| {\textbf{e}_t}_{\perp} \right\|_2 \leq 1\)

Since the original TransD paper does not specify how these constraints are enforced, this implementation uses KGE.constraint.clip_constraint(), which keeps the norm of the tensor from exceeding a threshold: if the norm exceeds the threshold, the tensor is clipped so that its norm equals the threshold.
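The clipping idea can be sketched in NumPy as follows. This is an assumption about the general technique (rescaling rows whose L2 norm exceeds a threshold), not the source of KGE.constraint.clip_constraint(); the function name `clip_to_max_norm` and its parameters are hypothetical.

```python
import numpy as np


def clip_to_max_norm(x, max_norm=1.0, axis=-1):
    """Rescale vectors along `axis` so that their L2 norm is at most max_norm."""
    norms = np.linalg.norm(x, axis=axis, keepdims=True)
    # Vectors already inside the ball get scale 1; the others are shrunk.
    # The small floor avoids division by zero for all-zero vectors.
    scale = np.minimum(1.0, max_norm / np.maximum(norms, 1e-12))
    return x * scale


embeddings = np.array([[3.0, 4.0],    # norm 5.0, will be clipped
                       [0.3, 0.4]])   # norm 0.5, left unchanged
clipped = clip_to_max_norm(embeddings)  # [[0.6, 0.8], [0.3, 0.4]]
```

Clipping preserves the direction of each vector and only changes its length, which is why it is a common way to enforce norm constraints after a gradient step.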

Methods

evaluate(eval_X, corrupt_side[, positive_X])

Evaluate triplets.

get_rank(x, positive_X, corrupt_side)

Get the rank of one specific triplet.

restore_model_weights(model_weights)

Restore the model weights.

score_hrt(h, r, t)

Score the triplets \((h,r,t)\).

train(train_X, val_X, metadata, epochs, ...)

Train the Knowledge Graph Embedding Model.

__init__(embedding_params, negative_ratio, corrupt_side, score_fn=<KGE.score.LpDistancePow object>, loss_fn=<KGE.loss.PairwiseHingeLoss object>, ns_strategy=<class 'KGE.ns_strategy.UniformStrategy'>, constraint=True, n_workers=1)

Initialize TransD.

Parameters
  • embedding_params (dict) –

    embedding dimension parameters; should have the following keys:

    'ent_embedding_size' for entity embedding dimension \(k\)

    'rel_embedding_size' for relation embedding dimension \(d\)

  • negative_ratio (int) – number of negative samples generated for each positive triplet

  • corrupt_side (str) – which side to corrupt during training, can be 'h', 't', or 'h+t'

  • score_fn (function, optional) – scoring function, by default KGE.score.LpDistancePow

  • loss_fn (class, optional) – loss function, an object of class KGE.loss.Loss, by default KGE.loss.PairwiseHingeLoss

  • ns_strategy (function, optional) – negative sampling strategy, by default KGE.ns_strategy.UniformStrategy

  • constraint (bool, optional) – whether to apply the norm constraints, by default True

  • n_workers (int, optional) – number of workers for negative sampling, by default 1

evaluate(eval_X, corrupt_side, positive_X=None)

Evaluate triplets.

Parameters
  • eval_X (tf.Tensor or np.array) – triplets to be evaluated

  • corrupt_side (str) – which side of the triplets to corrupt, can be 'h' or 't'

  • positive_X (tf.Tensor or np.array, optional) – positive triplets that should be filtered while generating corrupted triplets, by default None (no filter applied)

Returns

evaluation result

Return type

dict

get_rank(x, positive_X, corrupt_side)

Get the rank of one specific triplet.

Parameters
  • x (tf.Tensor or np.array) – the triplet to rank

  • positive_X (tf.Tensor or np.array, optional) – positive triplets that should be filtered while generating corrupted triplets; if None, no filter is applied

  • corrupt_side (str) – which side of the triplets to corrupt, can be 'h' or 't'

Returns

ranking result

Return type

int
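To make the role of positive_X concrete, here is a minimal NumPy sketch of a filtered rank. It is not the code behind get_rank(): the function `filtered_rank` is hypothetical, and it assumes that a higher score means a more plausible triplet (consistent with a negative-distance score) and that ties are resolved in favor of the true triplet.

```python
import numpy as np


def filtered_rank(scores, true_index, known_positive_indices=()):
    """Rank (1 = best) of the true candidate among all corrupted candidates.

    scores[i] is the score of the triplet obtained by substituting entity i
    on the corrupted side; known positives other than the true one are
    filtered out so they cannot push the true triplet down the ranking.
    """
    scores = np.asarray(scores, dtype=float).copy()
    true_score = scores[true_index]
    to_filter = np.array(
        [i for i in known_positive_indices if i != true_index], dtype=int
    )
    scores[to_filter] = -np.inf
    return int(np.sum(scores > true_score)) + 1


candidate_scores = [-0.5, -2.0, -0.1, -1.0, -0.3]

raw = filtered_rank(candidate_scores, true_index=3)               # 4
filtered = filtered_rank(candidate_scores, true_index=3,
                         known_positive_indices=[0, 2])           # 2
```

Without filtering, three candidates outscore the true entity. Once entities 0 and 2 are recognized as other valid answers, only one corrupted candidate remains ahead of it.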

restore_model_weights(model_weights)

Restore the model weights.

Parameters

model_weights (dict) – dictionary of model weights to be restored

score_hrt(h, r, t)

Score the triplets \((h,r,t)\).

If h is None, score all entities: \((h_i, r, t)\).

If t is None, score all entities: \((h, r, t_i)\).

h and t should not be None simultaneously.

Parameters
  • h (tf.Tensor or np.ndarray or None) – index of heads with shape (n,)

  • r (tf.Tensor or np.ndarray) – index of relations with shape (n,)

  • t (tf.Tensor or np.ndarray or None) – index of tails with shape (n,)

Returns

triplets scores with shape (n,)

Return type

tf.Tensor

train(train_X, val_X, metadata, epochs, batch_size, early_stopping_rounds=None, model_weights_initial=None, restore_best_weight=True, optimizer='Adam', seed=None, log_path='./logs', log_projector=False)

Train the Knowledge Graph Embedding Model.

Parameters
  • train_X (np.ndarray or str) –

    training triplets.

    If np.ndarray, shape should be (n,3) for \((h,r,t)\) respectively.

    If str, training triplets should be saved under this folder path in csv format; every csv file should have 3 columns without a header, for \((h,r,t)\) respectively.

  • val_X (np.ndarray or str) –

    validation triplets.

    If np.ndarray, shape should be (n,3) for \((h,r,t)\) respectively.

    If str, validation triplets should be saved under this folder path in csv format; every csv file should have 3 columns without a header, for \((h,r,t)\) respectively.

  • metadata (dict) –

    metadata for the KG data; should have the following keys:

    'ent2ind': dict, dictionary that maps entity to index.

    'ind2ent': list, list that maps index to entity.

    'rel2ind': dict, dictionary that maps relation to index.

    'ind2rel': list, list that maps index to relation.

    KGE.data_utils.index_kg can be used to index the data and obtain this metadata.

  • epochs (int) – number of epochs

  • batch_size (int) – number of triplets per batch

  • early_stopping_rounds (int, optional) – number of rounds that trigger early stopping, by default None (no early stopping)

  • model_weights_initial (dict, optional) – initial model weights with specific values, by default None

  • restore_best_weight (bool, optional) – restore the weights from the best iteration if early_stopping_rounds is not None, by default True

  • optimizer (str or tensorflow.keras.optimizers, optional) – optimizer applied in training, by default 'Adam', which uses the default settings of tf.keras.optimizers.Adam

  • seed (int, optional) – random seed for shuffling data and for embedding initialization, by default None

  • log_path (str, optional) – path for tensorboard logging, by default './logs'

  • log_projector (bool, optional) – project the embeddings in the tensorboard projector tab; setting this to True writes the metadata and embedding tsv files to log_path and shows this data in the tensorboard projector tab, by default False
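The shape of the metadata dictionary and of the indexed triplets expected by train() can be illustrated with a plain-Python sketch. The helpers `build_metadata` and `index_triplets` and the toy triplets are hypothetical; they only mimic the structure described above and are not the implementation of KGE.data_utils.index_kg.

```python
import numpy as np

# Toy knowledge graph given as (head, relation, tail) strings.
raw_triplets = [
    ("alice", "knows", "bob"),
    ("bob", "works_at", "acme"),
    ("alice", "works_at", "acme"),
]


def build_metadata(triplets):
    """Hypothetical helper: build the four index mappings from raw triplets."""
    ind2ent = sorted({e for h, _, t in triplets for e in (h, t)})
    ind2rel = sorted({r for _, r, _ in triplets})
    return {
        "ent2ind": {e: i for i, e in enumerate(ind2ent)},
        "ind2ent": ind2ent,
        "rel2ind": {r: i for i, r in enumerate(ind2rel)},
        "ind2rel": ind2rel,
    }


def index_triplets(triplets, metadata):
    """Hypothetical helper: convert string triplets to an (n, 3) index array."""
    ent2ind, rel2ind = metadata["ent2ind"], metadata["rel2ind"]
    return np.array([[ent2ind[h], rel2ind[r], ent2ind[t]] for h, r, t in triplets])


metadata = build_metadata(raw_triplets)
train_X = index_triplets(raw_triplets, metadata)  # shape (3, 3), columns (h, r, t)
```

Sorting the entity and relation names makes the index assignment deterministic, which helps when the same metadata must be reused to index validation triplets.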