Mustafa Arif

HPC | Cloud | DevOps | AI

European Bioinformatics Institute

Hello!

I am Mustafa, an HPC and Cloud expert focused on architecting and managing next-generation HPC systems for Artificial Intelligence and scientific computing applications. My core expertise includes consulting for institutions and research centers on implementing on-prem and cloud-based High Performance Computing systems.

Focus Areas

Architecting and managing Research Computing Infrastructure.
Implementing on-prem and cloud infrastructure for Big Data and AI applications.

Tech Stack

Linux-based HPC systems
- Cray, HPE, Dell
Cloud HPC
- ParallelCluster
DevOps
- Ansible, Terraform
NVIDIA GPU clusters
Parallel file systems
- Lustre, Ceph
Containerized Infrastructure
- Docker, Kubernetes, Singularity
Network Management
- InfiniBand, Layer 3
Datacenter Operations
- Planning and Monitoring

Interests

High Performance Computing
Public/Private Cloud
DevOps
Data Science
Internet of Things
3D Printing

Education

MS in Computer Engineering, 2016

National University of Science and Technology

BS in Computer Engineering, 2011

Comsats University

Experience

HPC Team Lead

European Bioinformatics Institute

Jun 2023 – Present Cambridge, UK

Responsibilities include:

Leading the HPC team to deliver HPC compute services to over 800 research users.
Managing multiple HPC (Dell) and GPU (A100) clusters with the Slurm job scheduler, InfiniBand networking and VAST storage for scratch space.
Implementing in-band and out-of-band infrastructure monitoring solutions (Checkmk) to ensure the reliability, performance, and security of EMBL-EBI HPC systems.
Deploying reporting interfaces (XDMoD) for the Slurm job scheduler to gain visibility into infrastructure usage and resource utilisation.
Optimising provisioning methods by introducing stateless provisioning using Warewulf.
Enabling remote visualization so that users can view results close to the computing and storage resources, minimizing data transfers.

Systems Manager

KTH Royal Institute of Technology

Jan 2022 – Jun 2023 Sweden

Responsibilities include:

Facilitating HPC (Storage and Compute) Infrastructure Operations.
Working in close collaboration with various stakeholders to facilitate end users' computation and data pipelines.
Budgeting and procuring hardware and software for the HPC center's needs.
Planning, procuring and deploying an on-prem OpenStack cluster for HPC users to facilitate data simulation and ML pipelines.
Managing projects related to HPC infrastructure expansion.
Consulting users on scientific code development.

Senior IT Consultant

Texas A&M University at Qatar

Mar 2014 – Jan 2022 Qatar

Responsibilities include:

Leading multiple projects related to HPC systems acquisition, operations and data platform development.
Proactively planning and defining strategy for future scientific computing growth in the organization.
Administering Linux-based servers, including a Cray XC40 HPC system, a Bull HPC cluster, a GPU cluster, a Hadoop cluster and bare-metal servers.
Installing and compiling scientific packages on the system.
Managing multiple high-performance storage systems (Lustre, Panasas), keeping OSTs healthy and user disk quotas appropriate. Also implementing a backup strategy for disaster recovery, with on-site backups and encrypted online backups in the cloud.
Consulting scientific computing users on code development across domains such as data science, image processing, astronomy, fluid codes, multi-engineering domains and bioinformatics.
Promoting the use of HPC and GPU computing among the user community by organizing training and identifying use cases.
Consulting users on their research projects and assisting them with parallel programming to port their workloads to the HPC system.
Containerizing multiple HPC applications for research users to give them more control over the software stack, allowing them to deploy scientific applications quickly and migrate across systems with ease.
Delivering training on scientific software and packages, and organizing external training where required to meet the scientific community's training needs. This has had a strong positive impact on onboarding new users and optimizing the workloads of existing users.
Preparing technical guides for the HPC system and making them available on an internal wiki page for easy access: https://rc-docs.qatar.tamu.edu/

Certifications

Red Hat Certified Engineer

Red Hat Feb 2022 – Feb 2025

A Red Hat® Certified Engineer (RHCE®) is a Red Hat Certified System Administrator (RHCSA) who is ready to automate Red Hat® Enterprise Linux® tasks, integrate Red Hat emerging technologies, and apply automation for efficiency and innovation.

See certificate

Red Hat Certified System Administrator

Red Hat Feb 2022 – Feb 2025

See certificate

Introduction to AI in the Data Center

NVIDIA Aug 2020

The “Introduction to AI in the Data Center” course covers an introduction to AI, GPU computing topics, the NVIDIA AI software architecture, and server, rack and data center level considerations for deploying accelerated computing in the data center.

See certificate