Dynamic Node Property Prediction

The task is to predict properties of single nodes at a future time.

Summary

- Datasets

Scale Name Package #Nodes #Edges* #Steps Surprise Metric
Small tgbn-trade 0.8.0 255 468,245 32 0.023 NDCG@10
Medium tgbn-genre 0.8.0 1,505 17,858,395 133,758 0.005 NDCG@10
Large tgbn-reddit 0.8.0 11,766 27,174,118 21,889,537 0.013 NDCG@10
Large tgbn-token 0.9.0 61,756 72,936,998 2,036,524 0.014 NDCG@10

- Module

Datasets are available in Numpy arrays, Pytorch tensors and PyG TemporalData objects. We also provide the evaluator.


Dataset tgbn-trade (Leaderboard):

Temporal Graph: The tgbn-trade dataset is the international agriculture trading network between nations of the United Nations (UN) from 1986 to 2016. Each node is a nation and an edge represents the sum trade value of all agriculture products from one nation to another one. As the data is reported annually, the time granularity of the dataset is yearly.

Prediction task: The considered task for this dataset is to predict the proportion of agriculture trade values from one nation to other nations during the next year.

References

[1] G. K. MacDonald, K. A. Brauman, S. Sun, K. M. Carlson, E. S. Cassidy, J. S. Gerber, and P. C. West. Rethinking agricultural trade relationships in an era of globalization. BioScience,65(3):275–289, 2015.

License: MIT license


Dataset tgbn-genre (Leaderboard):

Temporal Graph: The tgbn-genre dataset is a bipartite and weighted interaction network between users and the music genres of songs they listen to. Both users and music genres are represented as nodes while an interaction specifies a user listens to a music genre at a given time. The edge weights denote the percentage of which a song belongs to a certain genre. The dataset is constructed by cross referencing the songs in the LastFM-song-listens dataset with that of music genres in the million-song dataset. The LastFM-song-listens dataset has one month of who-listens-to-which-song information for 1000 users and the million-song dataset provides genre weights for all songs in the LastFM-song-listens dataset. We only retain genres with at least 10% weights for each song that are repeated at least a thousand times in the dataset. In addition, the genre names are further cross references to remove genre names with typos.

Prediction task: The considered task for this dataset is to rank with which set of music genres a user will interact the most over the course of the next week.

References

[2] S. Kumar, X. Zhang, and J. Leskovec. Predicting dynamic embedding trajectory in temporal interaction networks. In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pages 1269–1278, 2019.
[3] B. Hidasi and D. Tikk. Fast als-based tensor factorization for context-aware recommendation from implicit feedback. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2012, Bristol, UK, September 24-28, 2012. Proceedings, Part II 23, pages 67–82. Springer, 2012.
[4] T. Bertin-Mahieux, D. P. Ellis, B. Whitman, and P. Lamere. The million song dataset. 2011.

License: MIT license


Dataset tgbn-reddit (Leaderboard):

Temporal Graph: The tgbn-reddit dataset is a users and subreddits interaction network. Both users and subreddits are nodes and each edge indicates that a user posted on a subreddit at a given time. The dataset spans from 2005 to 2019.

Prediction task: The task considered for this dataset is to rank with which subreddits a user will interact the most over the next week.

References

[5] A. Nadiri and F. W. Takes. A large-scale temporal analysis of user lifespan durability on the reddit social media platform. In Companion Proceedings of the Web Conference 2022, pages 677–685, 2022.

License: CC BY-NC license (Attribution-NonCommercial)


Dataset tgbn-token (Leaderboard):

Temporal Graph: The tgbn-reddit dataset is a user and cryptocurrency token transaction network. Both users and tokens are nodes and each edge indicates the transaction from a user to a token. The edge weights indicate the amount of token transferred and considering the disparity between weights, we normalized the edge weights using logarithm.

Prediction task: The task considered for this dataset is to predict how frequently a user will interact with various types of tokens over the next week.

References

[6] K. Shamsi, Y. R. Gel, M. Kantarcioglu, and C. G. Akcora. Chartalist: Labeled graph datasets for utxo and account-based blockchains. In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, November 29-December 1, 2022, New Orleans, LA, USA, pages 1–14, 2022.

License: CC BY-NC license (Attribution-NonCommercial)


Numpy Arrays

To load a dataset replace the name variable with a dataset name. See more from the TGB tutorial

from tgb.nodeproppred.dataset import NodePropPredDataset

name = "tgbn-trade"

dataset = NodePropPredDataset(name=name, root="datasets", preprocess=True)

data = dataset.full_data

type(data['sources']) #all source nodes of edges

Pytorch Tensors

To load a dataset replace the name variable with a dataset name.

from tgb.nodeproppred.dataset_pyg import PyGNodePropPredDataset

name = "tgbn-trade"

dataset = PyGNodePropPredDataset(name=name, root="datasets")

dataset.src #all source nodes of edges

Pytorch Geometric TemporalData

from tgb.nodeproppred.dataset_pyg import PyGNodePropPredDataset

name = "tgbn-trade"

dataset = PyGNodePropPredDataset(name=name, root="datasets")

data = dataset.get_TemporalData()

type(data) #TemporalData object


Performance Evaluator

Please see detailed example of nodeproppred here

from tgb.nodeproppred.evaluate import Evaluator

name = "tgbn-trade"

evaluator = Evaluator(name=name)

metric = dataset.eval_metric