BenchCouncil Transactions on Benchmarks, Standards and Evaluations

Volume 2, Issue 4, In progress (October 2022)


TBench (BenchCouncil Transactions on Benchmarks, Standards and Evaluations) Calls for Papers


Original Articles


HPC AI500 V3.0: A scalable HPC AI benchmarking framework

Zihan Jiang, Chunjie Luo, Wanling Gao, Lei Wang, Jianfeng Zhan


Abstract

In recent years, the convergence of High Performance Computing (HPC) and artificial intelligence (AI) has created an urgent need for a benchmark to guide the design of next-generation scalable HPC AI systems. The success of the HPL benchmark and the affiliated TOP500 ranking indicates that scalability is the fundamental requirement for evaluating HPC systems. However, being scalable in terms of emerging AI workloads such as deep learning (DL) raises nontrivial challenges. This paper formally and systematically analyzes the factors that limit scalability in DL workloads and presents HPC AI500 V3.0, a scalable HPC AI benchmarking framework. The HPC AI500 V3.0 methodology is inspired by bagging, which utilizes the collective wisdom of an ensemble of base models and enables the benchmarks to be adaptively scalable to different scales of HPC systems. We implement HPC AI500 V3.0 in a highly customizable manner, preserving room for various optimizations at both the system and algorithm levels. By reusing the representative workloads of HPC AI500 V2.0, we evaluate HPC AI500 V3.0 on typical HPC systems, and the results show near-linear scalability. Furthermore, based on the customizable design, we present a case study that trades off AI model quality against training speed. The source code of HPC AI500 V3.0 is publicly available from the HPC AI500 project homepage https://www.benchcouncil.org/aibench/hpcai500/ .
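
The abstract cites bagging as its methodological inspiration. As background only, the following minimal sketch (not HPC AI500 V3.0 code) shows the bagging idea: each base model trains on a bootstrap resample and a majority vote aggregates predictions; in the HPC setting, each member would run on its own partition of the machine. All names, and the toy nearest-centroid "model" standing in for a DL model, are illustrative assumptions.

    # Minimal bagging sketch (illustrative, not HPC AI500 V3.0 code): each
    # ensemble member trains on a bootstrap resample; a majority vote combines
    # their predictions. A nearest-centroid classifier stands in for a DL model.
    import numpy as np

    def train_member(X, y, rng):
        idx = rng.integers(0, len(X), size=len(X))   # bootstrap resample
        Xb, yb = X[idx], y[idx]
        classes = np.unique(yb)
        centroids = np.stack([Xb[yb == c].mean(axis=0) for c in classes])
        return classes, centroids

    def predict(member, X):
        classes, centroids = member
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        return classes[d.argmin(axis=1)]             # nearest centroid wins

    def ensemble_predict(members, X):
        votes = np.stack([predict(m, X) for m in members])
        return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 8)); y = (X[:, 0] > 0).astype(int)
    members = [train_member(X, y, rng) for _ in range(5)]
    print((ensemble_predict(members, X) == y).mean())  # ensemble accuracy

Because members train independently and only the final aggregation requires communication, scaling the member count with the machine size is what makes such an ensemble adaptively scalable.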


CpsMark+: A scenario-oriented benchmark system for office desktop performance evaluation in centralized procurement via simulating user experience

Yue Zhang, Tong Wu


Abstract

The rapid business expansion of many companies has placed growing demand on office desktops in recent decades. However, improper evaluation of system performance and an unclear picture of practical usage conditions often hamper efforts to make a sound selection among multiple alternatives. To optimize the evaluation of desktop performance in centralized procurement from the perspective of end users, we present CpsMark+, a coherent benchmark system that evaluates office desktop performance based on simulated user experience. Specifically, CpsMark+ includes scenario-oriented workloads portraying representative user behaviors modeled on the cooperative workflow of modern office routines, and flexibly adapted metrics that properly reflect end-user experience according to different task types. A comparison experiment against state-of-the-art benchmarks demonstrates CpsMark+'s high sensitivity to various hardware components, e.g., the CPU, and high repeatability, with a Coefficient of Variation below 3%. In a practical case study, we also demonstrate the effectiveness of CpsMark+ in simulating the user experience of tested computer systems under modern office-oriented scenarios, improving the quality of office desktop performance evaluation in centralized procurement.
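
The repeatability claim above is stated as a Coefficient of Variation (CoV) below 3%. As a quick illustration of that metric (not CpsMark+ code, and with made-up scores), CoV is simply the standard deviation of repeated benchmark scores divided by their mean:

    # Coefficient of Variation over repeated benchmark runs: std / mean.
    # The scores are hypothetical placeholders, not CpsMark+ measurements.
    import statistics

    scores = [1042.0, 1055.3, 1038.7, 1049.1, 1044.6]  # five repeated runs
    cov = statistics.stdev(scores) / statistics.mean(scores)
    print(f"CoV = {cov:.2%}, repeatable under 3% threshold: {cov < 0.03}")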


Optimizing the sparse approximate inverse preconditioning algorithm on GPU

Xinyue Chu, Yizhou Wang, Qi Chen, Jiaquan Gao


Abstract

In this study, we present an optimized sparse approximate inverse (SPAI) preconditioning algorithm on GPU, called GSPAI-Opt. GSPAI-Opt fuses the advantages of two popular SPAI preconditioning algorithms and has the following novelties: (1) an optimization strategy is proposed to choose whether to use a constant or non-constant thread group for any sparsity pattern of the preconditioner; (2) a parallel framework for optimizing the SPAI preconditioner on GPU is proposed; and (3) for each component of the preconditioner, a decision tree is established to choose the optimal kernel for computing it. Experimental results validate the effectiveness of GSPAI-Opt.
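
For readers unfamiliar with SPAI, here is a minimal CPU sketch of the generic technique that GSPAI-Opt accelerates (not the paper's GPU algorithm): each column m_k of the preconditioner M minimizes ||A m_k - e_k||_2 over a prescribed sparsity pattern, which reduces to one small least-squares problem per column. Taking M's pattern from A is a common simple choice assumed here.

    # Generic SPAI sketch (background, not GSPAI-Opt): column k of M solves
    # min ||A[:, J] m - e_k||_2 over its allowed nonzero rows J. Dense NumPy
    # is used for clarity; the GPU kernel and thread-group choices that
    # GSPAI-Opt optimizes are beyond this illustration.
    import numpy as np

    def spai(A, pattern):
        n = A.shape[0]
        M = np.zeros_like(A, dtype=float)
        for k in range(n):
            J = np.nonzero(pattern[:, k])[0]        # allowed nonzeros of column k
            e_k = np.zeros(n); e_k[k] = 1.0
            m, *_ = np.linalg.lstsq(A[:, J], e_k, rcond=None)
            M[J, k] = m
        return M

    A = np.eye(6) * 4 + np.diag(np.ones(5), 1) + np.diag(np.ones(5), -1)
    M = spai(A, pattern=(A != 0))                   # reuse A's sparsity pattern
    print(np.linalg.norm(A @ M - np.eye(6)))        # Frobenius residual ||AM - I||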


Performance characterization and optimization of pruning patterns for sparse DNN inference

Yunjie Liu, Jingwei Sun, Jiaqiang Liu, Guangzhong Sun


Abstract

Deep neural networks suffer from over-parameterization, which leads to high storage and computation costs. Pruning can effectively reduce these costs by eliminating redundant parameters. Among existing pruning methods, filter pruning achieves more efficient inference, while element-wise pruning maintains better accuracy. To trade off between these two endpoints, a variety of pruning patterns have been proposed. This study analyzes the performance characteristics of sparse DNNs pruned by different patterns, including element-wise, vector-wise, block-wise, and group-wise. Based on the analysis, we propose an efficient implementation of group-wise sparse DNN inference that makes better use of GPUs. Experimental results on VGG, ResNet, BERT, and ViT show that our optimized group-wise pruning pattern achieves much lower inference latency on GPU than other sparse patterns and the existing group-wise pattern implementation.
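
The granularities compared above differ only in the unit that a magnitude criterion keeps or prunes. A rough sketch under assumed group shapes and a 50% sparsity target follows (not the paper's implementation; group-wise pruning follows the same recipe with its own grouping of weights):

    # Illustrative magnitude-based masks at three of the granularities the
    # paper analyzes; shapes and thresholds are assumptions for illustration.
    import numpy as np

    def elementwise_mask(W, sparsity):
        return np.abs(W) > np.quantile(np.abs(W), sparsity)  # prune smallest |w|

    def vectorwise_mask(W, sparsity, v=4):
        # Keep the largest entries within each run of v consecutive elements.
        keep = max(1, round(v * (1 - sparsity)))
        M = np.zeros_like(W, dtype=bool)
        for i in range(W.shape[0]):
            for j in range(0, W.shape[1], v):
                blk = np.abs(W[i, j:j + v])
                M[i, j:j + v][np.argsort(blk)[-keep:]] = True
        return M

    def blockwise_mask(W, sparsity, b=2):
        # Score whole b-by-b blocks by L1 norm; prune the weakest blocks.
        scores = np.abs(W).reshape(W.shape[0] // b, b, W.shape[1] // b, b).sum(axis=(1, 3))
        keep = scores > np.quantile(scores, sparsity)
        return np.repeat(np.repeat(keep, b, axis=0), b, axis=1)

    W = np.random.default_rng(0).normal(size=(8, 8))
    for f in (elementwise_mask, vectorwise_mask, blockwise_mask):
        print(f.__name__, float((~f(W, 0.5)).mean()))        # achieved sparsity

Coarser units map better onto GPU tiles, which is why block-wise and group-wise patterns tend to trade some accuracy for much lower latency, consistent with the abstract's findings.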


IoTBench: A data-centric and configurable IoT benchmark suite

Simin Chen, Chunjie Luo, Wanling Gao, Lei Wang


Abstract

As the Internet of Things (IoT) industry expands, demand for the microprocessors and microcontrollers used in IoT systems has increased steadily. Benchmarks provide a valuable reference for processor evaluation. Different IoT application scenarios involve different data scales, dimensions, and types. However, the current popular benchmarks only evaluate processor performance under fixed data formats and cannot adapt to the fragmented scenarios these processors face. This paper proposes a new benchmark, IoTBench. The IoTBench workloads cover three types of algorithms commonly used in IoT applications: matrix processing, list operation, and convolution. Moreover, IoTBench divides the data space into different evaluation subspaces according to data scale, data type, and data dimension. We analyze the impact of different data types, dimensions, and scales on processor performance, and compare ARM with RISC-V and MinorCPU with O3CPU using IoTBench. We also explore the performance of processors with different architecture configurations in different evaluation subspaces and identify the optimal architecture for each subspace. The specifications, source code, and results are publicly available from https://www.benchcouncil.org/iotbench/ .
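
The subspace idea above can be pictured as the cross-product of the three data axes, each combination defining one evaluation subspace in which every workload is run. The axis values below are illustrative placeholders, not IoTBench's actual parameter sets.

    # Sketch of the evaluation-subspace enumeration; values are hypothetical.
    from itertools import product

    scales = ["small", "medium", "large"]            # data scale
    dtypes = ["int8", "int32", "float32"]            # data type
    dims = [1, 2]                                    # data dimension
    workloads = ["matrix", "list", "convolution"]    # IoTBench's three kernels

    subspaces = list(product(scales, dtypes, dims))
    print(len(subspaces), "subspaces x", len(workloads), "workloads")
    for scale, dtype, dim in subspaces[:2]:          # show the first two
        for w in workloads:
            print(f"run {w} on {dim}-D {dtype} data at {scale} scale")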


Diagnosis of COVID-19 from X-rays using combined CNN-RNN architecture with transfer learning

Md. Milon Islam, Md. Zabirul Islam, Amanullah Asraf, Mabrook S. Al-Rakhami, ... Ali Hassan Sodhro


Abstract

Combating the COVID-19 pandemic has emerged as one of the most pressing issues in global healthcare. Accurate and fast diagnosis of COVID-19 cases is required to deliver the right medical treatment and control the pandemic. Chest radiography imaging techniques are more effective than the reverse-transcription polymerase chain reaction (RT-PCR) method in detecting coronavirus. Because of the limited availability of medical images, transfer learning is well suited to classifying patterns in them. This paper presents a combined architecture of a convolutional neural network (CNN) and a recurrent neural network (RNN) to diagnose COVID-19 patients from chest X-rays. The deep transfer-learning models used in this experiment are VGG19, DenseNet121, InceptionV3, and Inception-ResNetV2; the CNN extracts complex features from the samples, and the RNN classifies them. In our experiments, the VGG19-RNN architecture outperformed all other networks in terms of accuracy. Finally, the decision-making regions of the images were visualized using gradient-weighted class activation mapping (Grad-CAM). The system achieved promising results compared to other existing systems and might be further validated as more samples become available. The experiment demonstrated a good alternative method for COVID-19 diagnosis by medical staff.

All the data used during the study are openly available from the Mendeley data repository at https://data.mendeley.com/datasets/mxc6vb7svm . For further research, we have made the source code publicly available at https://github.com/Asraf047/COVID19-CNN-RNN .
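
A minimal PyTorch sketch of the CNN-RNN pattern described above: a VGG19 backbone extracts feature maps, the spatial positions form a sequence fed to an LSTM, and the final hidden state is classified. The layer sizes and the three-class head are assumptions for illustration; the paper's exact configuration may differ.

    # CNN-RNN sketch (illustrative, not the paper's exact model): VGG19
    # features -> spatial sequence -> LSTM -> classifier.
    import torch
    import torch.nn as nn
    from torchvision import models

    class CnnRnn(nn.Module):
        def __init__(self, num_classes=3):
            super().__init__()
            # weights="IMAGENET1K_V1" gives the transfer-learning setting;
            # None avoids a download for this standalone demo.
            self.cnn = models.vgg19(weights=None).features
            self.rnn = nn.LSTM(input_size=512, hidden_size=128, batch_first=True)
            self.fc = nn.Linear(128, num_classes)

        def forward(self, x):                    # x: (B, 3, 224, 224)
            f = self.cnn(x)                      # (B, 512, 7, 7) feature maps
            seq = f.flatten(2).transpose(1, 2)   # (B, 49, 512): 49 spatial steps
            _, (h, _) = self.rnn(seq)            # final hidden state (1, B, 128)
            return self.fc(h[-1])                # (B, num_classes) logits

    print(CnnRnn()(torch.randn(2, 3, 224, 224)).shape)  # torch.Size([2, 3])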


Enabling Reduced Simpoint Size Through LiveCache and Detail Warmup

Jose Renau, Fangping Liu, Hongzhang Shan, Sang Wook Stephen Do


Abstract

Simpoint technology (Sherwood et al., 2002) has been widely used by the micro-architecture research community to significantly speed up simulation. However, the typical Simpoint size remains tens to hundreds of millions of instructions. At such sizes, cycle-accurate simulators still need tens of hours or even days to finish a simulation, depending on the architecture complexity and workload characteristics. In this paper, we develop a new simulation framework by integrating LiveCache and Detail-warmups with Dromajo ( https://chipyard.readthedocs.io/en/latest/Tools/Dromajo.html) and Kabylkas et al. (2005), enabling us to use a much smaller Simpoint size (2 million instructions) without loss of accuracy. Our evaluation results show that the average simulation time is accelerated by 9.56 times over the 50M size, and most workload simulations finish in tens of minutes instead of hours.
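
As context outside this paper's contribution: Simpoint estimates a whole-program metric as a weighted average over a few representative intervals, with weights proportional to cluster sizes; smaller Simpoints only pay off when warmup (here, LiveCache and Detail-warmup) preserves accuracy. A worked sketch with made-up numbers:

    # Generic Simpoint aggregation (background, not this paper's framework):
    # whole-program CPI estimated as a cluster-weighted average.
    weights = [0.45, 0.30, 0.25]   # fraction of intervals in each cluster
    cpi = [1.8, 0.9, 1.2]          # CPI measured on each 2M-instruction Simpoint

    estimated_cpi = sum(w * c for w, c in zip(weights, cpi))
    print(f"estimated whole-program CPI = {estimated_cpi:.3f}")  # 1.380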


Edge AIBench 2.0: A scalable autonomous vehicle benchmark for IoT–Edge–Cloud systems

Tianshu Hao, Wanling Gao, Chuanxin Lan, Fei Tang, ... Jianfeng Zhan


Abstract

Many emerging IoT–Edge–Cloud computing systems are not yet implemented, are too confidential for their code to be shared, or have execution environments that are tricky to replicate, which makes benchmarking them very challenging. This paper uses autonomous vehicles as a typical scenario to build the first benchmark for IoT–Edge–Cloud systems. We propose a set of distilling rules for replicating autonomous vehicle scenarios to extract critical tasks with intertwined interactions. The essential system-level and component-level characteristics are captured while the system complexity is reduced significantly, so that users can quickly evaluate and pinpoint system and component bottlenecks. We also implement a scalable architecture through which users can assess systems with different workload sizes.

We conduct several experiments to measure performance. After testing two thousand autonomous vehicle task requests, we identify the bottleneck modules in autonomous vehicle scenarios and analyze their hotspot functions. The experimental results show that the lane-keeping task is the slowest module, with a 99th-percentile tail latency of 77.49 ms. We hope this scenario benchmark will be helpful for autonomous vehicle and even IoT–Edge–Cloud research. The open-source code is available from the official website https://www.benchcouncil.org/scenariobench/edgeaibench.html .
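
The 77.49 ms figure above is a 99th-percentile statistic; given raw per-request latencies, it can be computed directly. A sketch with synthetic latencies (placeholders, not Edge AIBench measurements):

    # 99th-percentile tail latency over per-request latencies; synthetic data.
    import numpy as np

    rng = np.random.default_rng(0)
    latencies_ms = rng.lognormal(mean=3.0, sigma=0.5, size=2000)  # 2000 requests
    print(f"p99 tail latency = {np.percentile(latencies_ms, 99):.2f} ms")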


Review Articles


An era of ChatGPT as a significant futuristic support tool: A study on features, abilities, and challenges

Mohd Javaid, Abid Haleem, Ravi Pratap Singh, Shahbaz Khan, Rajiv Suman


Abstract

The Internet of Behaviour (IoB) is an effort to dissect the behavioural patterns revealed by data collection. IoB is an extension of the Internet of Things (IoT), and both are therefore anticipated to grow exponentially in the coming years. Healthcare firms have many opportunities to employ IoB to provide individualised services and anticipate patients’ behaviour. As behaviour and its analysis are closely related to psychology, many techniques exist to collect the relevant data. IoB improves both the doctor’s and the patient’s experience. Because IoT and IoB are interconnected, IoB technology collects and analyses data based on user activity, offering a practical technique for building real-time remote health monitoring systems. This technology also aids in optimising auto insurance premiums in the healthcare sector and seeks to alter patient behaviour to improve the treatment process. IoB has applications in various areas, including retail and entertainment, and has the potential to change the marketing sector significantly. It supports the proper analysis and comprehension of the behavioural data used to create valuable treatment services. The primary purpose of this paper is to study IoB and its need in healthcare. The working process, structure, and features of IoB for the healthcare domain are studied, and the significant applications of IoB for healthcare are identified and analysed. In the future, IoB technologies will offer a higher quality of life and well-being. IoB is the ideal fusion of technology, data analytics, and behavioural science, helping healthcare professionals collect data and analyse patients’ behaviour for an efficient treatment process. The IoB will be the digital ecosystem’s intelligence within a few years.


Errata


Erratum regarding Previously Published Articles