HPC Workload Profiling on AWS: Best Practices

published on 30 September 2024

Want to supercharge your HPC workloads on AWS? Here's what you need to know:

  • HPC workload profiling analyzes resource usage to optimize performance
  • AWS offers powerful tools for HPC: EC2, EFA, ParallelCluster, FSx for Lustre
  • Key profiling methods: user-mode sampling, hardware event-based sampling, remote profiling
  • Best practices include choosing the right instances, improving network performance, and optimizing storage

Quick comparison of AWS HPC services:

Service         | Purpose            | Key Feature
EC2             | Compute power      | 400+ instance types
EFA             | Network speed      | Fast node communication
ParallelCluster | Cluster management | Automatic resource setup
FSx for Lustre  | File storage       | High-speed data access
AWS Batch       | Job scheduling     | Resource optimization

By profiling your HPC workloads on AWS, you can spot bottlenecks, fine-tune performance, and get more bang for your buck. Let's dive into how to set up, profile, and optimize your HPC tasks on the cloud.

HPC workloads on AWS

HPC workloads aren't your average computing tasks. They're the heavy lifters of the digital world, demanding massive parallel computing, lightning-fast networking, and storage that can keep up.

AWS has stepped up to the plate with a suite of services tailored for these power-hungry workloads:

  1. Amazon EC2: The muscle behind HPC on AWS. With over 400 instance types, you'll find the right fit for your compute-intensive tasks.

  2. Elastic Fabric Adapter (EFA): Think of it as a superhighway for data. It lets thousands of CPUs or GPUs chat at breakneck speeds.

  3. AWS ParallelCluster: Your personal HPC butler. It sets up and scales clusters so you can focus on the work that matters.

  4. Amazon FSx for Lustre: The speed demon of storage. It handles massive datasets like a pro, perfect for tasks that need quick access to tons of data.

  5. AWS Batch: The smart scheduler. It makes sure your jobs get the resources they need, when they need them.

Let's break it down:

EC2 instances like C5n and P3dn are built for speed, delivering up to 100 Gbps of network throughput. That's music to the ears of anyone running fluid dynamics simulations or weather models.

EFA takes networking to the next level. It's the secret sauce that lets deep learning models in TensorFlow or PyTorch communicate at lightning speed.

"EFA is used in fluid dynamics computations, large-scale weather modeling, and deep learning models built with frameworks like TensorFlow and PyTorch." - AWS Documentation

ParallelCluster is your HPC wingman, handling the nitty-gritty of cluster management so you can focus on your work.

FSx for Lustre is like a data firehose, processing massive datasets at hundreds of GB/s. It's a game-changer for tasks like genome analysis or oil and gas simulations.

Finally, AWS Batch keeps everything running smoothly, making sure your jobs get the resources they need without breaking the bank.

With this powerhouse lineup, AWS has positioned itself as a serious player in the HPC arena. Whether you're modeling weather patterns or training the next big AI model, AWS has the tools to get the job done.

Setting up for HPC workload profiling

Let's get your AWS environment ready for HPC workload profiling. It's like prepping a race car - you need the right tools and setup.

AWS setup for profiling

First, configure your AWS services. Set up EC2 instances, storage, and networking for HPC workloads.

For Amazon EC2, pick the right instance type. AWS offers these HPC-optimized families:

Instance Family | Best For                     | Available Types
Hpc6a           | AMD-based compute            | hpc6a.48xlarge
Hpc6id          | Intel-based with high memory | hpc6id.32xlarge
Hpc7a           | Latest AMD processors        | hpc7a.12xlarge, hpc7a.24xlarge, hpc7a.48xlarge, hpc7a.96xlarge
Hpc7g           | ARM-based with AWS Graviton  | hpc7g.4xlarge, hpc7g.8xlarge, hpc7g.16xlarge
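Once you've picked a family, launching an instance is straightforward. Here's a minimal AWS CLI sketch; the AMI, key name, subnet, and placement group are placeholders you'd swap for your own:

# Launch a single Hpc7a instance into an existing cluster placement group
# (all IDs below are placeholders - substitute your own)
aws ec2 run-instances \
  --instance-type hpc7a.96xlarge \
  --image-id ami-0123456789abcdef0 \
  --key-name my-key \
  --subnet-id subnet-0123456789abcdef0 \
  --placement GroupName=my-hpc-group \
  --count 1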

Setting up instances for profiling

Now, adjust a couple of kernel parameters so profiling tools can access performance data:

For User-Mode sampling:

echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope

For Hardware Event-Based sampling:

echo 0 | sudo tee /proc/sys/kernel/perf_event_paranoid

These commands lower security restrictions that might block profiling tools.
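Note that these changes only last until the next reboot. To make them persistent, you can drop them into a sysctl config file (a minimal sketch; the file name is arbitrary):

# Persist the profiling-friendly settings across reboots
echo "kernel.yama.ptrace_scope = 0" | sudo tee /etc/sysctl.d/99-profiling.conf
echo "kernel.perf_event_paranoid = 0" | sudo tee -a /etc/sysctl.d/99-profiling.conf
sudo sysctl --system   # reload all sysctl settings now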

Key profiling tools

Time for the profiling tools. Let's use Intel® VTune™ Profiler:

  1. Install VTune Profiler on your instance.
  2. Launch the GUI:

sudo <vtune_install_dir>/vtune_profiler/bin64/vtune-gui

  3. Create a new project and configure your analysis settings.

VTune Profiler will collect data and show where your HPC workload spends its time in the "Hotspots by CPU Utilization" view.
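On a headless instance you may prefer the command line over the GUI. Here's a minimal sketch of collecting the same Hotspots data with VTune's CLI; the application name is a placeholder:

# Collect Hotspots data from the command line, then print a summary report
vtune -collect hotspots -result-dir ./r000hs -- ./my_hpc_app
vtune -report summary -result-dir ./r000hs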

Best practices for HPC profiling

Picking the right instance types

Choosing the best EC2 instance for HPC is crucial. Here's what you need to know:

  • C5 instances: Great for compute-heavy tasks like scientific simulations
  • R5 instances: Ideal for memory-intensive jobs like big data analysis
  • I3 instances: Perfect for tasks needing fast storage, thanks to NVMe SSDs

Top HPC instance types:

Instance Type | vCPUs | Memory (GiB) | Network Performance | Best For
c5n.18xlarge  | 72    | 192          | 100 Gbps            | Compute-intensive tasks
r5.24xlarge   | 96    | 768          | 25 Gbps             | Memory-intensive jobs
i3.16xlarge   | 64    | 488          | 25 Gbps             | High I/O operations

Don't oversize. Pick instances that fit your job without wasting resources.

Improving network performance

Fast networking is key for HPC. Here's how to speed things up:

  1. Use cluster placement groups to reduce latency (see the sketch after this list)
  2. Turn off Hyper-Threading for floating-point-heavy jobs
  3. Choose instances with high network throughput (C5n and P3dn offer up to 100 Gbps)
  4. Try Elastic Fabric Adapter (EFA) for apps that need heavy inter-instance communication
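Here's a minimal sketch combining items 1 and 2: create a cluster placement group, then launch instances into it with Hyper-Threading disabled via CPU options (the AMI ID is a placeholder; c5n.18xlarge has 36 physical cores):

# 1. Create a cluster placement group for low-latency networking
aws ec2 create-placement-group --group-name hpc-group --strategy cluster

# 2. Launch into the group with one thread per core (Hyper-Threading off)
aws ec2 run-instances \
  --instance-type c5n.18xlarge \
  --image-id ami-0123456789abcdef0 \
  --placement GroupName=hpc-group \
  --cpu-options CoreCount=36,ThreadsPerCore=1 \
  --count 2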

Storage best practices

Good storage can make or break your HPC job:

  1. Use Amazon FSx for Lustre for big datasets (hundreds of GB/s of throughput; see the sketch after the table below)
  2. Pick EBS-optimized instances for fast, dedicated EBS connections
  3. Use instance store volumes for temporary, high-speed storage

Storage options compared:

Storage Type          | Best For                            | Performance
Amazon FSx for Lustre | Large datasets, high throughput     | Up to hundreds of GB/s
EBS volumes           | Persistent storage                  | Varies by volume type
Instance store        | Temporary, high-performance storage | Very high (directly attached)
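As an example of item 1 above, here's a hedged sketch of creating a scratch FSx for Lustre file system with the AWS CLI; the subnet ID is a placeholder, and 1,200 GiB is the minimum scratch capacity:

# Create a 1.2 TiB scratch FSx for Lustre file system
# (scratch deployments trade durability for throughput - good for temporary HPC data)
aws fsx create-file-system \
  --file-system-type LUSTRE \
  --storage-capacity 1200 \
  --subnet-ids subnet-0123456789abcdef0 \
  --lustre-configuration DeploymentType=SCRATCH_2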

Profiling methods

HPC workload profiling on AWS isn't one-size-fits-all. Here are three key methods:

User-mode sampling

This method interrupts processes to collect data. It's like taking snapshots of your app's performance.

  • Gives you a peek into how your app is performing
  • Only slows things down by about 5%
  • Great for finding hotspots and analyzing threading

Intel® VTune™ Profiler uses this to see how apps use processor resources. It rates utilization as Idle, Poor, Ok, or Ideal.

Hardware event-based sampling

This one uses special hardware units to measure how software and hardware play together.

  • Gives detailed info without slowing things down much
  • Comes in two flavors:
    1. End-to-end measurements
    2. Sampling-based measurements

The Likwid tool uses end-to-end measurements. You need to pin your app to specific cores, but it's lightweight and supports all events.
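For example, a typical likwid-perfctr run pins the app to a core range and counts events from a predefined group. The group name and core range below are assumptions; available groups vary by CPU:

# Pin the app to cores 0-7 and count double-precision FLOP events
likwid-perfctr -C 0-7 -g FLOPS_DP ./my_hpc_app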

Remote profiling techniques

For HPC on AWS, remote profiling is key. It lets you run the app and the profiler UI on different machines.

You've got options:

  1. Direct connect
  2. SSH access
  3. YourKit Connection Broker

For CUDA apps, try these:

  • Nsight Systems CLI: nsys profile -o timeline ./myapplication
  • Nsight Compute CLI: nv-nsight-cu-cli -o profile ./bin/Release/atomicIncTest

These create files you can analyze on your local machine.
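A typical remote workflow is to copy the result file back and open it in the local GUI. This sketch assumes the nsys command above; recent Nsight Systems versions write a .nsys-rep file:

# Pull the profile off the remote instance, then open it locally
scp ec2-user@<instance-ip>:timeline.nsys-rep .
# Open timeline.nsys-rep in the Nsight Systems GUI on your workstation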

When picking a method, think about what you need and how much slowdown you can handle. Each method gives you different insights into your HPC workload on AWS.

Understanding profiling results

After collecting profiling data for your HPC workload on AWS, it's time to make sense of it. Let's dive into how to read the data and find bottlenecks.

Reading the data

Profiling tools spit out a ton of info. Here's how to tackle it:

  1. Key metrics: Focus on execution time, CPU usage, memory, and I/O (see the sketch after this list).
  2. Visualize: Use tools like Intel VTune Profiler to see the data graphically.
  3. Compare runs: Look at different instance types or input sizes.
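For the key metrics in step 1, CloudWatch is a quick source. Here's a minimal sketch pulling average CPU utilization for one instance over a run; the instance ID and time window are placeholders:

# Average CPU utilization in 5-minute buckets for one instance
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --start-time 2024-09-30T00:00:00Z \
  --end-time 2024-09-30T06:00:00Z \
  --period 300 \
  --statistics Average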

Check out this finding from MAQAO ONE View on GROMACS:

"Comparing GROMACS on AWS Graviton3 vs Graviton3E showed a 13% speedup."

This kind of insight helps you pick the right instance for your workload.

Finding bottlenecks

Now, let's hunt for performance issues:

  1. Hotspots: Find resource-hungry code sections.
  2. Resource use: Low CPU or memory utilization means you're paying for capacity your job isn't using.
  3. Communication: For distributed workloads, check network usage.
  4. I/O operations: Slow storage access can kill HPC performance.

MAQAO's GROMACS analysis found:

"The largest assembly loop was 42% faster on Graviton3E than Graviton3."

This points to where the newer instance shines.

Here's a quick way to organize your findings:

Metric             | Current | Target | Potential Boost
Execution Time     | 100 s   | 80 s   | 20%
Memory Usage       | 80%     | 60%    | 25%
Network Throughput | 5 GB/s  | 8 GB/s | 60%

Improving HPC workloads with profiling insights

Now that you've analyzed your profiling results, it's time to put that knowledge to work. Here's how to fine-tune your HPC workloads on AWS for better performance.

Adjusting instance settings

Tweaking your instance configurations can lead to big gains:

  1. Right-size your instances: Match instance types and sizes to your workload needs. This cuts costs by eliminating idle instances.

  2. Set up auto-scaling: Use AWS ParallelCluster to scale resources based on demand. This keeps performance high and costs low. (See the config sketch after this list.)

  3. Customize without AMIs: Use custom bootstrap actions to tweak instances. It's faster and easier than creating new AMIs.
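For item 2, ParallelCluster v3 scales compute nodes between MinCount and MaxCount per queue. Here's a hedged sketch of a minimal cluster config and the command to create it; all IDs, names, and counts are placeholders:

# Write a minimal ParallelCluster v3 config that scales 0-16 nodes on demand
cat > cluster-config.yaml <<'EOF'
Region: us-east-1
Image:
  Os: alinux2
HeadNode:
  InstanceType: c5n.large
  Networking:
    SubnetId: subnet-0123456789abcdef0
  Ssh:
    KeyName: my-key
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: compute
      ComputeResources:
        - Name: hpc
          InstanceType: hpc7a.96xlarge
          MinCount: 0    # scale to zero when idle
          MaxCount: 16   # cap the fleet size
      Networking:
        SubnetIds:
          - subnet-0123456789abcdef0
EOF

pcluster create-cluster --cluster-name my-hpc --cluster-configuration cluster-config.yaml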

"These optimizations can result in significant cost savings when you apply them to large fleets." - Isaac Jordan, Amazon CodeGuru Profiler team

Making code more efficient

Optimizing your code is crucial for getting the most out of AWS infrastructure:

  1. Use CodeGuru Profiler: This tool automatically suggests optimizations. It's saved Amazon millions through its recommendations.

  2. Focus on hotspots: Target the most resource-hungry parts of your code first.

  3. Consider GPU acceleration: For suitable workloads, NVIDIA GPUs can speed up processing while using the same energy.

"After implementing recommendations from CodeGuru Profiler, one application reduced its CPU time spent creating new AWS SDK clients from 18.57% to less than 1%, resulting in improved latency and reduced costs."

Optimization      | Benefit
Right-sizing      | Lower costs, better resource use
Auto-scaling      | Better performance, lower cost
CodeGuru Profiler | Less latency, lower costs
GPU acceleration  | Faster processing, same energy

Wrap-up

HPC workload profiling on AWS can supercharge your cloud resources. Here's what you need to know:

1. Pick the right tools

AWS has a toolbox for HPC:

Service         | What it's for
EC2             | Heavy-duty computing
EFA             | Fast networking
ParallelCluster | Easy cluster setup
FSx for Lustre  | Speedy storage

2. Optimize your setup

  • Use Placement Groups for network boost
  • Skip Hyper-Threading for float-heavy jobs
  • Go big with c5n.18xlarge for CPU work

3. Keep an eye on things

Watch your workloads and tweak as needed:

  • CloudWatch for metrics
  • Switch instance types based on data
  • Fix code bottlenecks you find

4. Learn from the pros

Others have cracked the HPC code on AWS:

"Researchers want a cloud 'maker-shop' with top-notch tools. We need to get them in quickly and safely." - Luc Betbeder-Matibet, UNSW Research Tech Services Director
