Mustafa Arif

HPC | Cloud | DevOps | AI

European Bioinformatics Institute

Hello!

I am Mustafa, an HPC and cloud expert focused on architecting and managing next-generation HPC systems for artificial intelligence and scientific computing applications. My core expertise includes consulting for institutions and research centers on implementing on-premises and cloud-based high performance computing systems.

Focus Areas

  • Architecting and managing Research Computing Infrastructure.
  • Implementing on-premises and cloud infrastructure for Big Data and AI applications.

Tech Stack

  • Linux-based HPC systems
    • Cray, HPE, Dell
  • Cloud HPC
    • ParallelCluster
  • DevOps
    • Ansible, Terraform
  • NVIDIA GPU Clusters
  • Parallel file systems
    • Lustre, Ceph
  • Containerized Infrastructure
    • Docker, Kubernetes, Singularity
  • Network Management
    • InfiniBand, Layer 3
  • Datacenter Operations
    • Planning and Monitoring

Interests

  • High Performance Computing
  • Public/Private Cloud
  • DevOps
  • Data Science
  • Internet of Things
  • 3D Printing

Education

  • MS in Computer Engineering, 2016

    National University of Science and Technology

  • BS in Computer Engineering, 2011

    Comsats University

Experience

European Bioinformatics Institute
HPC Team Lead
Jun 2023 – Present | Cambridge, UK

Responsibilities include:

  • Leading the HPC team to deliver HPC compute services to over 800 research users.
  • Managing multiple HPC (Dell) and GPU (NVIDIA A100) clusters with the Slurm job scheduler, an InfiniBand network, and VAST storage for scratch space.
  • Implementing in-band and out-of-band infrastructure monitoring solutions (Checkmk) to ensure the reliability, performance, and security of EMBL-EBI HPC systems.
  • Deploying reporting interfaces (XDMoD) for the Slurm job scheduler to gain visibility into infrastructure usage and resource utilisation.
  • Optimising provisioning methods by introducing stateless provisioning using Warewulf.
  • Enabling remote visualisation to minimise data transfers and let users view results close to the compute and storage resources.
 
KTH Royal Institute of Technology
Systems Manager
Jan 2022 – Jun 2023 | Sweden

Responsibilities include:

  • Facilitating HPC (Storage and Compute) Infrastructure Operations.
  • Working in close collaboration with various stakeholders to facilitate end users' computation and data pipelines.
  • Budgeting and procuring hardware and software for HPC Center needs.
  • Planning, procuring, and deploying an on-premises OpenStack cluster for HPC users to support data simulation and ML pipelines.
  • Managing projects related to HPC infrastructure expansion.
  • Consulting users on scientific code development.
 
Texas A&M University at Qatar
Senior IT Consultant
Mar 2014 – Jan 2022 | Qatar

Responsibilities include:

  • Leading multiple projects related to HPC system acquisition, operations, and data platform development.
  • Proactively planning and defining strategy for future scientific computing growth in the organization.
  • Administering Linux-based servers, including a Cray XC40 HPC system, a Bull HPC cluster, a GPU cluster, a Hadoop cluster, and bare-metal servers.
  • Installing and compiling scientific packages on the system.
  • Managing multiple high performance storage systems (Lustre, Panasas), keeping OSTs healthy and user disk quotas appropriate. Also implementing a backup strategy for disaster recovery using on-site backups and encrypted online backups in the cloud.
  • Consulting scientific computing users on code development across domains such as data science, image processing, astronomy, fluid codes, multiple engineering domains, and bioinformatics.
  • Promoting the use of HPC and GPU computing among the user community by organizing training and identifying use cases.
  • Consulting users on their research projects and assisting them with parallel programming to port their workloads to the HPC system.
  • Containerizing multiple HPC applications for research users to give them more control over the software stack, allowing them to deploy their scientific applications quickly and migrate across systems with ease.
  • Delivering training on scientific software and packages, and organizing external training where required so that the scientific community's training requirements are met. This has had a strong positive impact on onboarding new users and optimizing the workloads of existing users.
  • Preparing technical guides for the HPC system and publishing them on the internal wiki for easy access: https://rc-docs.qatar.tamu.edu/

Certifications

Red Hat Certified Engineer
A Red Hat® Certified Engineer (RHCE®) is a Red Hat Certified System Administrator (RHCSA) who is ready to automate Red Hat® Enterprise Linux® tasks, integrate Red Hat emerging technologies, and apply automation for efficiency and innovation.
Introduction to AI in the Data Center
A course covering an introduction to AI, GPU computing topics, the NVIDIA AI software architecture, and server, rack, and data center level considerations for deploying accelerated computing in the data center.

Articles

Courses and Conferences

Architecting on AWS Accelerator (2020)
A comprehensive course on making architectural decisions based on AWS architectural principles and best practices.

Contact