KGE.models.translating_based.TransH
Classes
An implementation of negative squared Lp-distance.
An implementation of Pairwise Hinge Loss / Margin Ranking Loss.
An implementation of TransH from [wang 2014].
A base module for Semantic Based Embedding Model.
An implementation of uniform negative sampling.
Class Inheritance Diagram

An implementation of TransH
- class KGE.models.translating_based.TransH.TransH
Bases: KGE.models.base_model.TranslatingModel.TranslatingModel
An implementation of TransH from [wang 2014].
TransH overcomes the problems of TransE in modeling reflexive, one-to-many, many-to-one, and many-to-many relations by enabling an entity to have distributed representations when involved in different relations. TransH represents each relation \(r\) by a relation-specific translation vector \(\textbf{r}_r\) lying in a relation-specific hyperplane with normal vector \(\textbf{w}_r\). Head and tail embeddings are projected onto this hyperplane, and the projected embeddings are expected to be connected by the relation translation vector \(\textbf{r}_r\):
\[ \begin{align}\begin{aligned}{\textbf{e}_h}_{\perp} + \textbf{r}_r \approx {\textbf{e}_t}_{\perp}\\{\textbf{e}_h}_{\perp} = \textbf{e}_h - \textbf{w}_r^T \textbf{e}_h\textbf{w}_r\\{\textbf{e}_t}_{\perp} = \textbf{e}_t - \textbf{w}_r^T \textbf{e}_t\textbf{w}_r\end{aligned}\end{align} \]
where \(\textbf{e}_i \in \mathbb{R}^k\) are vector representations of the entities, \(\textbf{r}_i \in \mathbb{R}^k\) are relation translation vectors, and \(\textbf{w}_i \in \mathbb{R}^k\) are the normal vectors of the relation hyperplanes.
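The projection and the resulting score can be sketched in plain NumPy. This is only an illustration under stated assumptions, not the library's TensorFlow implementation: the function names `project_to_hyperplane` and `transh_score` are hypothetical, \(\textbf{w}_r\) is assumed to be already normalized to unit length, and the score is taken to be the negative squared L2 distance.

```python
import numpy as np

def project_to_hyperplane(e, w):
    """Return e - (w^T e) w for a unit normal vector w (also works on batches)."""
    return e - np.sum(e * w, axis=-1, keepdims=True) * w

def transh_score(e_h, r, e_t, w):
    """Negative squared L2 distance between the translated head and the tail,
    both projected onto the hyperplane with unit normal w."""
    diff = project_to_hyperplane(e_h, w) + r - project_to_hyperplane(e_t, w)
    return -np.sum(diff ** 2, axis=-1)

rng = np.random.default_rng(0)
k = 4                                     # embedding size (illustrative value)
e_h, e_t, r_r, w_r = rng.normal(size=(4, k))
w_r /= np.linalg.norm(w_r)                # unit-length hyperplane normal
r_r = project_to_hyperplane(r_r, w_r)     # keep the translation in the hyperplane

# Two different tails that differ only along the normal direction share the
# same projection, so they receive the same score for this relation.
e_t_other = e_t + 2.5 * w_r
same = np.isclose(transh_score(e_h, r_r, e_t, w_r),
                  transh_score(e_h, r_r, e_t_other, w_r))
print(same)  # True
```

The last check shows why the projection helps with one-to-many relations: distinct entities can coincide after projection, so one head can be connected to several tails by the same translation vector.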
The score of \((h,r,t)\) is:
\[f(h,r,t) = s({\textbf{e}_h}_{\perp} + \textbf{r}_r, {\textbf{e}_t}_{\perp})\]
where \(s\) is a scoring function (KGE.score) that scores the plausibility of matching between \((translation, predicate)\).
By default, KGE.score.LpDistancePow is used, which gives the negative squared L2-distance:
\[s({\textbf{e}_h}_{\perp} + \textbf{r}_r, {\textbf{e}_t}_{\perp}) = - \left\| {\textbf{e}_h}_{\perp} + \textbf{r}_r - {\textbf{e}_t}_{\perp} \right\|_2^2\]
You can switch to the L1-distance by passing score_fn=LpDistancePow(p=1) to __init__(), or use any other score function by specifying score_fn in __init__().
If constraint=True is given in __init__(), the following constraints are applied:
1. \(\left\| \textbf{w}_i \right\|_2 = 1\): the hyperplane normal vectors are renormalized to unit length every iteration
2. \(\left\| \textbf{e}_i \right\|_2 \leq 1\)
3. \(\left| \mathbf{w}_{r}^T \mathbf{r}_{r} \right| /\left\|\mathbf{r}_{r}\right\|_2 \leq \epsilon\), to guarantee that the translation vector \(\textbf{r}_r\) lies in the hyperplane
Constraints 2 and 3 are realized by the soft constraint described in the original TransH paper:
\[regularization~term = \lambda \left\{ \sum_i \left[\| \textbf{e}_i \|_{2}^{2}-1 \right]_+ + \sum_i \left[ \frac{\left( \textbf{w}_{i}^T \textbf{r}_{i} \right)^2}{\left\| \textbf{r}_{i} \right\|_2^2}-\epsilon^2 \right]_{+} \right\}\]
Methods
evaluate(eval_X, corrupt_side[, positive_X]) – Evaluate triplets.
get_rank(x, positive_X, corrupt_side) – Get the rank of one specific triplet.
restore_model_weights(model_weights) – Restore the model weights.
score_hrt(h, r, t) – Score the triplets \((h,r,t)\).
train(train_X, val_X, metadata, epochs, ...) – Train the Knowledge Graph Embedding Model.
- __init__(embedding_params, negative_ratio, corrupt_side, score_fn=<KGE.score.LpDistancePow object>, loss_fn=<KGE.loss.PairwiseHingeLoss object>, ns_strategy=<class 'KGE.ns_strategy.UniformStrategy'>, constraint=True, constraint_weight=1.0, n_workers=1)
Initialize TransH.
- Parameters
embedding_params (dict) – embedding dimension parameters; must contain the key 'embedding_size' for the embedding dimension \(k\)
negative_ratio (int) – number of negative samples
corrupt_side (str) – which side to corrupt during training; can be 'h', 't', or 'h+t'
score_fn (function, optional) – scoring function, by default KGE.score.LpDistancePow
loss_fn (class, optional) – loss function class KGE.loss.Loss, by default KGE.loss.PairwiseHingeLoss
ns_strategy (function, optional) – negative sampling strategy, by default KGE.ns_strategy.UniformStrategy
constraint (bool, optional) – whether to apply the constraints, by default True
constraint_weight (float, optional) – regularization weight \(\lambda\), by default 1.0
n_workers (int, optional) – number of workers for negative sampling, by default 1
- evaluate(eval_X, corrupt_side, positive_X=None)
Evaluate triplets.
- Parameters
eval_X (tf.Tensor or np.array) – triplets to be evaluated
corrupt_side (str) – which side of the triplets to corrupt, can be 'h' or 't'
positive_X (tf.Tensor or np.array, optional) – positive triplets that should be filtered out while generating corrupted triplets, by default None (no filter applied)
- Returns
evaluation result
- Return type
dict
- get_rank(x, positive_X, corrupt_side)
Get the rank of one specific triplet.
- Parameters
x (tf.Tensor or np.array) – the triplet to rank
positive_X (tf.Tensor or np.array, optional) – positive triplets that should be filtered out while generating corrupted triplets; if None, no filter is applied
corrupt_side (str) – which side of the triplet to corrupt, can be 'h' or 't'
- Returns
ranking result
- Return type
int
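The idea of ranking a triplet against its corruptions, with known positives filtered out, can be sketched as below. This is a minimal sketch and not the library's implementation: `filtered_rank` is a hypothetical helper, it assumes that a higher score means a more plausible triplet, and it resolves ties optimistically, which may differ from what get_rank() does.

```python
import numpy as np

def filtered_rank(scores, true_idx, known_idx=()):
    """Rank (1 = best) of the candidate at true_idx.

    scores:    scores of all candidate entities on the corrupted side
    true_idx:  index of the entity that forms the triplet being ranked
    known_idx: indices of other entities that also form positive triplets
               and are therefore removed from the comparison
    """
    scores = np.asarray(scores, dtype=float)
    keep = np.ones(scores.shape[0], dtype=bool)
    keep[np.asarray(list(known_idx), dtype=int)] = False  # filter known positives
    keep[true_idx] = True                                 # never drop the target
    better = np.sum(scores[keep] > scores[true_idx])
    return int(better) + 1

candidate_scores = [-0.5, -0.1, -0.3, -2.0]   # one score per candidate entity
raw = filtered_rank(candidate_scores, true_idx=2)
filtered = filtered_rank(candidate_scores, true_idx=2, known_idx=[1])
print(raw, filtered)  # 2 1
```

In the example, entity 1 scores higher than the target, but once it is declared a known positive it no longer counts against the target's rank.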
- restore_model_weights(model_weights)
Restore the model weights.
- Parameters
model_weights (dict) – dictionary of model weights to be restored
- score_hrt(h, r, t)
Score the triplets \((h,r,t)\).
If h is None, all entities are scored as heads: \((h_i, r, t)\).
If t is None, all entities are scored as tails: \((h, r, t_i)\).
h and t should not be None simultaneously.
- Parameters
h (tf.Tensor or np.ndarray or None) – index of heads with shape (n,)
r (tf.Tensor or np.ndarray) – index of relations with shape (n,)
t (tf.Tensor or np.ndarray or None) – index of tails with shape (n,)
- Returns
triplets scores with shape (n,)
- Return type
tf.Tensor
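The behavior when h or t is None can be pictured with a small NumPy broadcasting sketch. It is not the library's TensorFlow code: `score_hrt_sketch` and its explicit embedding arguments are hypothetical, the hyperplane normals are assumed to be unit length, and the (n, n_entities) output shape in the None case is an assumption, since the documentation only states the (n,) shape.

```python
import numpy as np

def project(e, w):
    """Project e onto the hyperplane with unit normal w (broadcasts over batches)."""
    return e - np.sum(e * w, axis=-1, keepdims=True) * w

def score_hrt_sketch(ent, rel, hyp, h, r, t):
    """Negative squared L2 TransH score; None on one side scores all entities."""
    if h is None and t is None:
        raise ValueError("h and t should not be None simultaneously")
    r = np.asarray(r)
    r_vec = rel[r][:, None, :]                 # (n, 1, k)
    w = hyp[r][:, None, :]                     # (n, 1, k)
    # A None side becomes every entity, shape (1, n_entities, k).
    e_h = ent[None, :, :] if h is None else ent[np.asarray(h)][:, None, :]
    e_t = ent[None, :, :] if t is None else ent[np.asarray(t)][:, None, :]
    diff = project(e_h, w) + r_vec - project(e_t, w)
    scores = -np.sum(diff ** 2, axis=-1)       # (n, 1) or (n, n_entities)
    return scores[:, 0] if (h is not None and t is not None) else scores

rng = np.random.default_rng(1)
n_ent, n_rel, k = 6, 2, 3                      # illustrative sizes
ent = rng.normal(size=(n_ent, k))
rel = rng.normal(size=(n_rel, k))
hyp = rng.normal(size=(n_rel, k))
hyp /= np.linalg.norm(hyp, axis=1, keepdims=True)

pair_scores = score_hrt_sketch(ent, rel, hyp, h=[1, 4], r=[0, 1], t=[2, 3])
all_heads = score_hrt_sketch(ent, rel, hyp, h=None, r=[0, 1], t=[2, 3])
print(pair_scores.shape, all_heads.shape)  # (2,) (2, 6)
```

Column j of `all_heads` holds the score obtained when entity j is substituted as the head, so the entries at the original head indices agree with `pair_scores`.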
- train(train_X, val_X, metadata, epochs, batch_size, early_stopping_rounds=None, model_weights_initial=None, restore_best_weight=True, optimizer='Adam', seed=None, log_path='./logs', log_projector=False)
Train the Knowledge Graph Embedding Model.
- Parameters
train_X (np.ndarray or str) – training triplets.
If np.ndarray, the shape should be (n,3) for \((h,r,t)\) respectively.
If str, the training triplets should be saved under this folder path in csv format; every csv file should have 3 columns without a header for \((h,r,t)\) respectively.
val_X (np.ndarray or str) – validation triplets.
If np.ndarray, the shape should be (n,3) for \((h,r,t)\) respectively.
If str, the validation triplets should be saved under this folder path in csv format; every csv file should have 3 columns without a header for \((h,r,t)\) respectively.
metadata (dict) – metadata for the KG data. It should have the following keys:
'ent2ind': dict, dictionary that maps entity to index.
'ind2ent': list, list that maps index to entity.
'rel2ind': dict, dictionary that maps relation to index.
'ind2rel': list, list that maps index to relation.
KGE.data_utils.index_kg can be used to index the data and obtain the metadata.
epochs (int) – number of epochs
batch_size (int) – batch size
early_stopping_rounds (int, optional) – number of rounds that trigger early stopping, by default None (no early stopping)
model_weights_initial (dict, optional) – initial model weights with specific values, by default None
restore_best_weight (bool, optional) – restore the weights of the best iteration if early_stopping_rounds is not None, by default True
optimizer (str or tensorflow.keras.optimizers, optional) – optimizer applied in training, by default 'Adam', which uses the default settings of tf.keras.optimizers.Adam
seed (int, optional) – random seed for shuffling data and embedding initialization, by default None
log_path (str, optional) – path for tensorboard logging, by default './logs'
log_projector (bool, optional) – project the embeddings in the tensorboard projector tab; setting this to True writes the metadata and embedding tsv files to log_path and projects this data on the tensorboard projector tab, by default False
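The metadata dictionary and the (n,3) index array expected by train() can be illustrated with a small stand-alone sketch. The helper `index_triplets` is hypothetical and is not KGE.data_utils.index_kg; in particular, the sorted index order used here is an assumption, since the documentation does not describe how indices are assigned.

```python
import numpy as np

def index_triplets(triplets):
    """Build the four metadata mappings and an (n, 3) array of indices.

    triplets: iterable of (head, relation, tail) labels.
    Index order (sorted labels) is an illustrative choice.
    """
    ind2ent = sorted({x for h, _, t in triplets for x in (h, t)})
    ind2rel = sorted({r for _, r, _ in triplets})
    ent2ind = {e: i for i, e in enumerate(ind2ent)}
    rel2ind = {r: i for i, r in enumerate(ind2rel)}
    metadata = {
        "ent2ind": ent2ind,   # dict: entity -> index
        "ind2ent": ind2ent,   # list: index -> entity
        "rel2ind": rel2ind,   # dict: relation -> index
        "ind2rel": ind2rel,   # list: index -> relation
    }
    indexed = np.array([[ent2ind[h], rel2ind[r], ent2ind[t]]
                        for h, r, t in triplets])
    return metadata, indexed

raw_triplets = [("alice", "knows", "bob"), ("bob", "works_at", "acme")]
metadata, train_X = index_triplets(raw_triplets)
print(train_X.shape)  # (2, 3)
```

An array shaped like `train_X` matches the np.ndarray form described for train_X and val_X, with columns holding the head, relation, and tail indices.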