Datasets, Codes & Testbed

Datasets, codes, and testbeds used in our papers. You are welcome to use them freely; please cite our publications when you do.


blockEmulator

We have open-sourced our experimental tool for blockchain research, named blockEmulator.

A. Visit the blockEmulator websites

B. Introduction to blockEmulator

  • Initiated by HuangLab (a research group in the School of Software Engineering, Sun Yat-sen University, China), BlockEmulator is a blockchain testbed that enables researchers to verify their proposed protocols and mechanisms. It supports multiple consensus protocols and, in particular, cross-shard mechanisms.
  • The main purpose of this testbed is to help users (researchers, students, etc.) quickly verify their own blockchain consensus protocols and blockchain-sharding mechanisms. BlockEmulator is designed as an experimental platform with a lightweight system architecture.
  • It simplifies the implementation of industrial-class blockchains since BlockEmulator only implements the core functions of a blockchain, including the transaction pool, block packaging, consensus protocols, and on-chain transaction storage. It also supports common consensus protocols, such as Practical Byzantine Fault Tolerance (PBFT) and Proof of Work (PoW).
  • In particular, BlockEmulator offers a system-level design and implementation of blockchain sharding mechanisms. For example, the cross-shard transaction mechanisms implemented by BlockEmulator include two representative solutions: i) the relay transaction mechanism proposed by Monoxide (NSDI’2019), and ii) the BrokerChain protocol (INFOCOM’2022).
  • BlockEmulator is oriented toward blockchain researchers: it provides an experimental platform for quickly implementing their own algorithms, protocols, and mechanisms. It also offers functions for collecting experimental data, which makes it easier to plot experimental figures.
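To make the idea of a relay-style cross-shard transaction more concrete, the following is a minimal Python sketch. It is not BlockEmulator's code or API: the names `Shard`, `Tx`, `shard_of`, `package_block`, and `run_round`, the two-shard setup, and the account-to-shard rule are all assumptions made for illustration. Consensus, block structure, and the inclusion proofs that a real relay mechanism relies on are omitted. The sketch only shows the two-step flow: the sender's shard debits the sender and forwards a relay transaction, and the receiver's shard credits the receiver in a later round.

```python
from dataclasses import dataclass, field
from typing import Dict, List

NUM_SHARDS = 2  # assumed shard count for this illustration


def shard_of(account: str) -> int:
    """Map an account to a shard (assumed rule: byte sum modulo shard count)."""
    return sum(account.encode()) % NUM_SHARDS


@dataclass
class Tx:
    sender: str
    receiver: str
    amount: int


@dataclass
class Shard:
    shard_id: int
    balances: Dict[str, int] = field(default_factory=dict)
    pool: List[Tx] = field(default_factory=list)        # pending original transactions
    relay_pool: List[Tx] = field(default_factory=list)  # relay transactions from other shards

    def package_block(self) -> List[Tx]:
        """Process pending transactions; return relay transactions for other shards."""
        # Step 2 of a cross-shard transfer: credit receivers of relayed transactions.
        for tx in self.relay_pool:
            self.balances[tx.receiver] = self.balances.get(tx.receiver, 0) + tx.amount
        self.relay_pool.clear()

        outgoing: List[Tx] = []
        for tx in self.pool:
            if self.balances.get(tx.sender, 0) < tx.amount:
                continue  # skip transactions the sender cannot afford
            # Step 1: debit the sender in its own shard.
            self.balances[tx.sender] -= tx.amount
            if shard_of(tx.receiver) == self.shard_id:
                # Intra-shard transaction: credit immediately.
                self.balances[tx.receiver] = self.balances.get(tx.receiver, 0) + tx.amount
            else:
                # Cross-shard transaction: hand it over as a relay transaction.
                outgoing.append(tx)
        self.pool.clear()
        return outgoing


def run_round(shards: List[Shard]) -> None:
    """Let every shard package one block, then deliver the relay transactions."""
    relays: List[Tx] = []
    for shard in shards:
        relays.extend(shard.package_block())
    for tx in relays:
        shards[shard_of(tx.receiver)].relay_pool.append(tx)
```

With two shards, an account named "alice" maps to shard 0 and "bob" to shard 1 under the assumed rule. If `Tx("alice", "bob", 4)` is placed in shard 0's pool, the first call to `run_round` debits alice and queues a relay transaction at shard 1, and the second call credits bob. The one-round delay between debit and credit is the point of the sketch: the two halves of a cross-shard transfer are confirmed in different blocks on different shards.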

C. Papers related to blockEmulator

The following papers from HuangLab’s publications have adopted BlockEmulator as an experimental tool.

  • BrokerChain: A Cross-Shard Blockchain Protocol for Account/Balance-based State Sharding (published at INFOCOM 2022) PDF
  • Achieving Scalability and Load Balance across Blockchain Shards for State Sharding (published at SRDS 2022) PDF
  • tMPT: Reconfiguration across Blockchain Shards via Trimmed Merkle Patricia Trie (published at IWQoS 2023) PDF
  • MVCom: Scheduling Most Valuable Committees for the Large-Scale Sharded Blockchain (published at ICDCS 2021) PDF
  • Scheduling Most Valuable Committees for the Sharded Blockchain (published at IEEE/ACM ToN/TNet 2023) PDF

Datasets & Codes

#1. Dataset & Codes for Predicting Machine Failures

Background: This dataset supports failure prediction with machine learning and AI approaches such as SVM, random forest, or deep learning algorithms. Besides the original dataset, I also provide two reports written by two visiting students during a visiting study in my lab in July 2019.
Huawei Huang, and Song Guo, “Proactive Failure Recovery for NFV in Distributed Edge Computing”, IEEE Communications Magazine, vol. 57, no. 5, pp. 131-137, March 2019
  • The dataset after preprocessing:

  • The related technical reports and codes from the two visiting students:

#2. Dataset & Codes for Predicting Server Failures

Background: This dataset is used to predict the failures of server machines that occurred in a datacenter. The related published papers are as follows.

Huakun Huang, Lingjun Zhao, Huawei Huang, Song Guo, "Machine Fault Detection for Intelligent Self-Driving Networks", IEEE Communications Magazine, vol. 58, no. 1, pp. 40-46, January 2020 [RG-Page]
Huakun Huang, Shuxue Ding, Lingjun Zhao, Huawei Huang, et al., "Real-Time Fault-Detection for IIoT Facilities using GBRBM-based DNN", IEEE Internet of Things Journal, Oct. 21, 2019. DOI: 10.1109/JIOT.2019.2948396 [RG-Page]
  • Original dataset and cleaned dataset:
  • Processing codes:

#3. Dataset & Codes of MVCom (published at ICDCS 2021)

Background: This code and dataset package shows how we implemented the algorithms used in our paper, including the proposed SE algorithm and three baselines (SA, DP, WOA).
Huawei Huang, Zhenyi Huang, Xiaowen Peng, Zibin Zheng, Song Guo, “MVCom: Scheduling Most Valuable Committees for the Large-Scale Sharded Blockchain”, ICDCS, July 2021 [RG-Page & PDF]

At the request of some readers, I provide the code of all algorithms used in this paper: the SE (stochastic exploration) algorithm, which is a Markov-based algorithm (MA), and the SA, DP, and WOA algorithms. Some figure-plotting code and partial data are also included. (Updated on Nov. 7, 2022, by Huawei HUANG)

#4. Datasets of ContextFL (published at ICDCS 2022)

Background: This dataset is used to predict the CPU, network connection, and app usage of mobile users while they are using their smartphones. The related published paper is as follows.
Huawei Huang, Ruixin Li, Jialiang Liu, Sicong Zhou, Kangying Lin, and Zibin Zheng, “ContextFL: Context-aware Federated Learning by Estimating the Training and Reporting Phases of Mobile Clients”, in Proc. of IEEE International Conference on Distributed Computing Systems (ICDCS), 2022. [RG-Page], [WeChat official account introduction article]