Dynamic Link Property Prediction
The task is to predict properties of edges (pairs of nodes) at a future time.
Summary
- Datasets
Scale | Name | Package | #Nodes | #Edges* | #Steps | Surprise | Metric |
---|---|---|---|---|---|---|---|
tgbl-wiki-v2 | 0.7.5 | 9,227 | 157,474 | 152,757 | 0.108 | MRR | |
tgbl-review-v2 | 0.7.5 | 352,637 | 4,873,540 | 6,865 | 0.987 | MRR | |
medium | tgbl-coin | 0.9.0 | 638,486 | 22,809,486 | 1,295,720 | 0.120 | MRR |
tgbl-comment | 0.7.5 | 994,790 | 44,314,507 | 30,998,030 | 0.823 | MRR | |
tgbl-flight | 0.9.0 | 18,143 | 67,169,570 | 1,385 | 0.024 | MRR |
- Module
Datasets are available in Numpy
arrays, Pytorch
tensors and PyG TemporalData
objects.
We also provide the evaluator.
Dataset tgbl-wiki-v2
(Leaderboard):
Temporal Graph: The tgbl-wiki
dataset stores the co-editing network on Wikipedia pages over one month. The network is a bipartite interaction network where editors and wiki pages are nodes, while one edge represents a given user edits a page at a specific timestamp. Each edge has text features from the page edits.
Prediction task: The task for this dataset is to predict with which wiki page a user will interact at a given time.
References
[1] S. Kumar, X. Zhang, and J. Leskovec. Predicting dynamic embedding trajectory in temporal interaction networks. In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pages 1269–1278, 2019.
License: MIT license
Dataset tgbl-review-v2
(Leaderboard):
Temporal Graph: The tgbl-review
dataset is an Amazon product review network from 1997 to 2018 where users rate different products in the electronics category from a scale from one to five. Therefore, the network is a bipartite weighted network where both users and products are nodes and each edge represents a particular review from a user to a product at a given time. Only users with a minimum of 10 reviews within the aforementioned time interval are kept in the network.
Prediction task: The considered task for this dataset is to predict which product a user will review at a given time.
References
[2] J. Ni, J. Li, and J. McAuley. Justifying recommendations using distantly-labeled reviews and fine-grained aspects. In Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP), pages 188–197, 2019.
License: Amazon license
Dataset tgbl-coin
(Leaderboard):
Temporal Graph: This is a cryptocurrency transaction dataset based on the Stablecoin ERC20 transactions dataset. Each node is an address and each edge represents the transfer of funds from one address to another at a time. The network starts from April 1st, 2022, and ends on November 1st, 2022, and contains transaction data of 5 stablecoins and 1 wrapped token. This duration includes the Terra Luna crash where the token lost its fixed price of 1 USD.
Prediction task: The considered task for this dataset is to predict with which destination a given address will interact at a given time.
References
[3] K. Shamsi, Y. R. Gel, M. Kantarcioglu, and C. G. Akcora. Chartalist: Labeled graph datasets for utxo and account-based blockchains. In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, November 29-December 1, 2022, New Orleans, LA, USA, pages 1–14, 2022.
License: CC BY-NC license (Attribution-NonCommercial)
Dataset tgbl-comment
(Leaderboard):
Temporal Graph: This dataset is a directed reply network of Reddit where users reply to each other’s threads. Each node is a user and each interaction is a reply from one user to another. The network starts from 2005 and ends at 2010.
Prediction task: The considered task for this dataset is to predict if a given user will reply to another one at a given time.
References
[4] A. Nadiri and F. W. Takes. A large-scale temporal analysis of user lifespan durability on the reddit social media platform. In Companion Proceedings of the Web Conference 2022, pages 677–685, 2022.
License: CC BY-NC license (Attribution-NonCommercial)
Dataset tgbl-flight
(Leaderboard):
Temporal Graph: This dataset is a crowd sourced international flight network from 2019 to 2022. The airports are modeled as nodes, while the edges are flights between airports at a given day. The node features include the type of the airport, the continent where the airport is located, the ISO region code of the airport as well as its longitude and latitude. The edge feature is the associated flight number.
Prediction task: The considered task for this dataset is to predict if a given flight will exist between a source and destination airport at a specified day.
References
[5] M. Strohmeier, X. Olive, J. Lübbe, M. Schäfer, and V. Lenders. Crowdsourced air traffic data from the opensky network 2019–2020. Earth System Science Data, 13(2):357–366, 2021.
License: Opensky license
Numpy Arrays
To load a dataset replace the name variable with a dataset name. See more from the TGB tutorial
from tgb.linkproppred.dataset import LinkPropPredDataset
name = "tgbl-wiki"
dataset = LinkPropPredDataset(name=name, root="datasets", preprocess=True)
data = dataset.full_data
type(data['sources']) #all source nodes of edges
Pytorch Tensors
To load a dataset replace the name variable with a dataset name.
from tgb.linkproppred.dataset_pyg import PyGLinkPropPredDataset
name = "tgbl-wiki"
dataset = PyGLinkPropPredDataset(name=name, root="datasets")
dataset.src #all source nodes of edges
Pytorch Geometric TemporalData
from tgb.linkproppred.dataset_pyg import PyGLinkPropPredDataset
name = "tgbl-wiki"
dataset = PyGLinkPropPredDataset(name=name, root="datasets")
data = dataset.get_TemporalData()
type(data) #TemporalData object
Performance Evaluator
Please see detailed example of linkproppred
here
from tgb.linkproppred.evaluate import Evaluator
name = "tgbl-wiki"
evaluator = Evaluator(name=name)
metric = dataset.eval_metric