Publications

2026
Where Did My NCCL Calls Go? A Profiler Comparison
R. Laso, M. Salimi Beni, I. Vardas, S. Benkner, and S. Hunold
The 27th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC), 2026.
To ncclsee, or Not to ncclsee: That is the Profiling Question
R. Laso, M. Salimi Beni, I. Vardas, S. Benkner, and S. Hunold
Austrian-Slovenian HPC Meeting (ASHPC26), Abstract, 2026.
Simulating MPI Collectives on Tofino Smart Switches in SimGrid
A. M. S. Belbeisi, M. Salimi Beni, T. Erbesdobler, E. Saleh, M. Tovey, A. Raoofy, and J. Weidendorfer
23rd ACM International Conference on Computing Frontiers (CF'26), 2026.
2025
Phase-Based Frequency Scaling for Energy-Efficient Heterogeneous Computing
L. Carpentieri, A. De Caro, M. Salimi Beni, K. Fan, and B. Cosenza
IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2025. Acc rate = 24%
Exploring NCCL tuning strategies for distributed deep learning
M. Salimi Beni, R. Laso, B. Cosenza, S. Benkner, and S. Hunold
IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPS-W), 2025.
Optimizing Distributed Deep Learning Training by Tuning NCCL
M. Salimi Beni, R. Laso, B. Cosenza, S. Benkner, and S. Hunold
Proc. Austrian-Slovenian HPC Meeting (ASHPC25), Abstract, 2025.
ncclsee: A Lightweight Profiling Tool for NCCL
I. Vardas, R. Laso Rodriguez, and M. Salimi Beni
Proc. Austrian-Slovenian HPC Meeting (ASHPC25), Abstract, 2025.
2024
MPI Collective Algorithm Selection in the Presence of Process Arrival Patterns
M. Salimi Beni, B. Cosenza, and S. Hunold
IEEE International Conference on Cluster Computing (CLUSTER), 2024. Acc rate = 26%
Analysis and prediction of performance variability in large-scale computing systems
M. Salimi Beni, S. Hunold, and B. Cosenza
The Journal of Supercomputing, 2024.
2023
Algorithm Selection of MPI Collectives Considering System Utilization
M. Salimi Beni, S. Hunold, and B. Cosenza
Euro-Par 2023: Parallel Processing Workshops, Springer, 2023.
EMPI: Enhanced Message Passing Interface in Modern C++
M. Salimi Beni, L. Crisci, and B. Cosenza
IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid), IEEE, 2023. Acc rate = 20%
2022
🏆 BEST PAPER AWARD
An analysis of long-tailed network latency distribution and background traffic on dragonfly+
M. Salimi Beni and B. Cosenza
The 14th BenchCouncil International Symposium on Benchmarking, Measuring and Optimizing (Bench), LNCS, 2022.
Towards a Portable Drug Discovery Pipeline with SYCL 2020
L. Crisci, M. Salimi Beni, B. Cosenza, N. Scipione, D. Gadioli, E. Vitali, G. Palermo, and A. Beccari
International Workshop on OpenCL, 2022.
An analysis of performance variability on dragonfly+ topology
M. Salimi Beni and B. Cosenza
IEEE International Conference on Cluster Computing (CLUSTER), IEEE, 2022.
2021
Ignite-GPU: A GPU-enabled in-memory computing architecture on clusters
A. H. Sojoodi, M. Salimi Beni, and F. Khunjush
The Journal of Supercomputing, 2021.
2020
A GPU-enabled extension for Apache Ignite to facilitate running genetic algorithms
M. Salimi Beni, A. H. Sojoodi, and F. Khunjush
20th International Symposium on Computer Architecture and Digital Systems (CADS), IEEE, 2020.