HPC Workload Profiling on AWS: Best Practices

published on 30 September 2024

Want to supercharge your HPC workloads on AWS? Here's what you need to know:

  • HPC workload profiling analyzes resource usage to optimize performance
  • AWS offers powerful tools for HPC: EC2, EFA, ParallelCluster, FSx for Lustre
  • Key profiling methods: user-mode sampling, hardware event-based sampling, remote profiling
  • Best practices include choosing the right instances, improving network performance, and optimizing storage

Quick comparison of AWS HPC services:

Service         | Purpose            | Key Feature
EC2             | Compute power      | 400+ instance types
EFA             | Network speed      | Fast node communication
ParallelCluster | Cluster management | Automatic resource setup
FSx for Lustre  | File storage       | High-speed data access
AWS Batch       | Job scheduling     | Resource optimization

By profiling your HPC workloads on AWS, you can spot bottlenecks, fine-tune performance, and get more bang for your buck. Let's dive into how to set up, profile, and optimize your HPC tasks on the cloud.

HPC workloads on AWS

HPC workloads aren't your average computing tasks. They're the heavy lifters of the digital world, demanding massive parallel computing, lightning-fast networking, and storage that can keep up.

AWS has stepped up to the plate with a suite of services tailored for these power-hungry workloads:

  1. Amazon EC2: The muscle behind HPC on AWS. With over 400 instance types, you'll find the right fit for your compute-intensive tasks.

  2. Elastic Fabric Adapter (EFA): Think of it as a superhighway for data. It lets thousands of CPUs or GPUs chat at breakneck speeds.

  3. AWS ParallelCluster: Your personal HPC butler. It sets up and scales clusters so you can focus on the work that matters.

  4. Amazon FSx for Lustre: The speed demon of storage. It handles massive datasets like a pro, perfect for tasks that need quick access to tons of data.

  5. AWS Batch: The smart scheduler. It makes sure your jobs get the resources they need, when they need them.

Let's break it down:

EC2 instances like C5n and P3dn are built for speed, delivering up to 100 Gbps of network throughput. That's music to the ears of anyone running fluid dynamics simulations or weather models.

EFA takes networking to the next level. It's the secret sauce that lets deep learning models in TensorFlow or PyTorch communicate at lightning speed.

"EFA is used in fluid dynamics computations, large-scale weather modeling, and deep learning models built with frameworks like TensorFlow and PyTorch." - AWS Documentation

ParallelCluster is your HPC wingman, handling the nitty-gritty of cluster management so you can focus on your work.

FSx for Lustre is like a data firehose, processing massive datasets at hundreds of GB/s. It's a game-changer for tasks like genome analysis or oil and gas simulations.

Finally, AWS Batch keeps everything running smoothly, making sure your jobs get the resources they need without breaking the bank.

With this powerhouse lineup, AWS has positioned itself as a serious player in the HPC arena. Whether you're modeling weather patterns or training the next big AI model, AWS has the tools to get the job done.

Setting up for HPC workload profiling

Let's get your AWS environment ready for HPC workload profiling. It's like prepping a race car - you need the right tools and setup.

AWS setup for profiling

First, configure your AWS services. Set up EC2 instances, storage, and networking for HPC workloads.

For Amazon EC2, pick the right instance type. AWS offers these HPC-optimized families:

Instance Family | Best For                     | Available Types
Hpc6a           | AMD-based compute            | hpc6a.48xlarge
Hpc6id          | Intel-based with high memory | hpc6id.32xlarge
Hpc7a           | Latest AMD processors        | hpc7a.12xlarge, hpc7a.24xlarge, hpc7a.48xlarge, hpc7a.96xlarge
Hpc7g           | ARM-based with AWS Graviton  | hpc7g.4xlarge, hpc7g.8xlarge, hpc7g.16xlarge
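Once you've picked a family, launching an instance is straightforward. Here's a minimal AWS CLI sketch; the AMI, key name, subnet, and placement group are placeholders you'd swap for your own:

# Launch a single Hpc7a instance into an existing cluster placement group
# (all IDs below are placeholders - substitute your own)
aws ec2 run-instances \
  --instance-type hpc7a.96xlarge \
  --image-id ami-0123456789abcdef0 \
  --key-name my-key \
  --subnet-id subnet-0123456789abcdef0 \
  --placement GroupName=my-hpc-group \
  --count 1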

Setting up instances for profiling

Now, adjust a couple of kernel parameters so profiling tools can access performance data:

For User-Mode sampling:

echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope

For Hardware Event-Based sampling:

echo 0 | sudo tee /proc/sys/kernel/perf_event_paranoid

These commands lower security restrictions that might block profiling tools.
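Note that these changes only last until the next reboot. To make them persistent, you can drop them into a sysctl config file (a minimal sketch; the file name is arbitrary):

# Persist the profiling-friendly settings across reboots
echo "kernel.yama.ptrace_scope = 0" | sudo tee /etc/sysctl.d/99-profiling.conf
echo "kernel.perf_event_paranoid = 0" | sudo tee -a /etc/sysctl.d/99-profiling.conf
sudo sysctl --system   # reload all sysctl settings now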

Key profiling tools

Time for the profiling tools. Let's use Intel® VTune™ Profiler:

  1. Install VTune Profiler on your instance.
  2. Launch the GUI:

sudo <vtune_install_dir>/vtune_profiler/bin64/vtune-gui

  3. Create a new project and configure your analysis settings.

VTune Profiler will collect data and show where your HPC workload spends its time in the "Hotspots by CPU Utilization" view.
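On a headless instance you may prefer the command line over the GUI. Here's a minimal sketch of collecting the same Hotspots data with VTune's CLI; the application name is a placeholder:

# Collect Hotspots data from the command line, then print a summary report
vtune -collect hotspots -result-dir ./r000hs -- ./my_hpc_app
vtune -report summary -result-dir ./r000hs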

Best practices for HPC profiling

Picking the right instance types

Choosing the best EC2 instance for HPC is crucial. Here's what you need to know:

  • C5 instances: Great for compute-heavy tasks like scientific simulations
  • R5 instances: Ideal for memory-intensive jobs like big data analysis
  • I3 instances: Perfect for tasks needing fast storage, thanks to NVMe SSDs

Top HPC instance types:

Instance Type | vCPUs | Memory (GiB) | Network Performance | Best For
c5n.18xlarge  | 72    | 192          | 100 Gbps            | Compute-intensive tasks
r5.24xlarge   | 96    | 768          | 25 Gbps             | Memory-intensive jobs
i3.16xlarge   | 64    | 488          | 25 Gbps             | High I/O operations

Don't oversize. Pick instances that fit your job without wasting resources.

Improving network performance

Fast networking is key for HPC. Here's how to speed things up:

  1. Use cluster placement groups to reduce latency (see the sketch after this list)
  2. Turn off Hyper-Threading for floating-point-heavy jobs
  3. Choose instances with high network throughput (C5n and P3dn offer up to 100 Gbps)
  4. Try Elastic Fabric Adapter (EFA) for apps that need heavy inter-instance communication
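Here's a minimal sketch combining items 1 and 2: create a cluster placement group, then launch instances into it with Hyper-Threading disabled via CPU options (the AMI ID is a placeholder; c5n.18xlarge has 36 physical cores):

# 1. Create a cluster placement group for low-latency networking
aws ec2 create-placement-group --group-name hpc-group --strategy cluster

# 2. Launch into the group with one thread per core (Hyper-Threading off)
aws ec2 run-instances \
  --instance-type c5n.18xlarge \
  --image-id ami-0123456789abcdef0 \
  --placement GroupName=hpc-group \
  --cpu-options CoreCount=36,ThreadsPerCore=1 \
  --count 2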

Storage best practices

Good storage can make or break your HPC job:

  1. Use Amazon FSx for Lustre for big datasets (hundreds of GB/s of throughput; see the sketch after the table below)
  2. Pick EBS-optimized instances for fast, dedicated EBS connections
  3. Use instance store volumes for temporary, high-speed storage

Storage options compared:

Storage Type          | Best For                            | Performance
Amazon FSx for Lustre | Large datasets, high throughput     | Up to hundreds of GB/s
EBS volumes           | Persistent storage                  | Varies by volume type
Instance store        | Temporary, high-performance storage | Very high (directly attached)
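As an example of item 1 above, here's a hedged sketch of creating a scratch FSx for Lustre file system with the AWS CLI; the subnet ID is a placeholder, and 1,200 GiB is the minimum scratch capacity:

# Create a 1.2 TiB scratch FSx for Lustre file system
# (scratch deployments trade durability for throughput - good for temporary HPC data)
aws fsx create-file-system \
  --file-system-type LUSTRE \
  --storage-capacity 1200 \
  --subnet-ids subnet-0123456789abcdef0 \
  --lustre-configuration DeploymentType=SCRATCH_2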

Profiling methods

HPC workload profiling on AWS isn't one-size-fits-all. Here are three key methods:

User-mode sampling

This method interrupts processes to collect data. It's like taking snapshots of your app's performance.

  • Gives you a peek into how your app is performing
  • Only slows things down by about 5%
  • Great for finding hotspots and analyzing threading

Intel® VTune™ Profiler uses this to see how apps use processor resources. It rates utilization as Idle, Poor, Ok, or Ideal.

Hardware event-based sampling

This one uses special hardware units to measure how software and hardware play together.

  • Gives detailed info without slowing things down much
  • Comes in two flavors:
    1. End-to-end measurements
    2. Sampling-based measurements

The Likwid tool uses end-to-end measurements. You need to pin your app to specific cores, but it's lightweight and supports all events.
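For example, a typical likwid-perfctr run pins the app to a core range and counts events from a predefined group. The group name and core range below are assumptions; available groups vary by CPU:

# Pin the app to cores 0-7 and count double-precision FLOP events
likwid-perfctr -C 0-7 -g FLOPS_DP ./my_hpc_app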

Remote profiling techniques

For HPC on AWS, remote profiling is key. It lets you run the app and the profiler UI on different machines.

You've got options:

  1. Direct connect
  2. SSH access
  3. YourKit Connection Broker

For CUDA apps, try these:

  • Nsight Systems CLI: nsys profile -o timeline ./myapplication
  • Nsight Compute CLI: nv-nsight-cu-cli -o profile ./bin/Release/atomicIncTest

These create files you can analyze on your local machine.
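A typical remote workflow is to copy the result file back and open it in the local GUI. This sketch assumes the nsys command above; recent Nsight Systems versions write a .nsys-rep file:

# Pull the profile off the remote instance, then open it locally
scp ec2-user@<instance-ip>:timeline.nsys-rep .
# Open timeline.nsys-rep in the Nsight Systems GUI on your workstation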

When picking a method, think about what you need and how much slowdown you can handle. Each method gives you different insights into your HPC workload on AWS.

Understanding profiling results

After collecting profiling data for your HPC workload on AWS, it's time to make sense of it. Let's dive into how to read the data and find bottlenecks.

Reading the data

Profiling tools spit out a ton of info. Here's how to tackle it:

  1. Key metrics: Focus on execution time, CPU usage, memory, and I/O (see the sketch after this list).
  2. Visualize: Use tools like Intel VTune Profiler to see the data graphically.
  3. Compare runs: Look at different instance types or input sizes.
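For the key metrics in step 1, CloudWatch is a quick source. Here's a minimal sketch pulling average CPU utilization for one instance over a run; the instance ID and time window are placeholders:

# Average CPU utilization in 5-minute buckets for one instance
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --start-time 2024-09-30T00:00:00Z \
  --end-time 2024-09-30T06:00:00Z \
  --period 300 \
  --statistics Average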

Check out this finding from MAQAO ONE View on GROMACS:

"Comparing GROMACS on AWS Graviton3 vs Graviton3E showed a 13% speedup."

This kind of insight helps you pick the right instance for your workload.

Finding bottlenecks

Now, let's hunt for performance issues:

  1. Hotspots: Find resource-hungry code sections.
  2. Resource use: Low CPU or memory utilization means you're paying for capacity your job isn't using.
  3. Communication: For distributed workloads, check network usage.
  4. I/O operations: Slow storage access can kill HPC performance.

MAQAO's GROMACS analysis found:

"The largest assembly loop was 42% faster on Graviton3E than Graviton3."

This points to where the newer instance shines.

Here's a quick way to organize your findings:

Metric             | Current | Target | Potential Boost
Execution Time     | 100 s   | 80 s   | 20%
Memory Usage       | 80%     | 60%    | 25%
Network Throughput | 5 GB/s  | 8 GB/s | 60%

Improving HPC workloads with profiling insights

Now that you've analyzed your profiling results, it's time to put that knowledge to work. Here's how to fine-tune your HPC workloads on AWS for better performance.

Adjusting instance settings

Tweaking your instance configurations can lead to big gains:

  1. Right-size your instances: Match instance types and sizes to your workload needs. This cuts costs by eliminating idle instances.

  2. Set up auto-scaling: Use AWS ParallelCluster to scale resources based on demand. This keeps performance high and costs low. (See the config sketch after this list.)

  3. Customize without AMIs: Use custom bootstrap actions to tweak instances. It's faster and easier than creating new AMIs.
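For item 2, ParallelCluster v3 scales compute nodes between MinCount and MaxCount per queue. Here's a hedged sketch of a minimal cluster config and the command to create it; all IDs, names, and counts are placeholders:

# Write a minimal ParallelCluster v3 config that scales 0-16 nodes on demand
cat > cluster-config.yaml <<'EOF'
Region: us-east-1
Image:
  Os: alinux2
HeadNode:
  InstanceType: c5n.large
  Networking:
    SubnetId: subnet-0123456789abcdef0
  Ssh:
    KeyName: my-key
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: compute
      ComputeResources:
        - Name: hpc
          InstanceType: hpc7a.96xlarge
          MinCount: 0    # scale to zero when idle
          MaxCount: 16   # cap the fleet size
      Networking:
        SubnetIds:
          - subnet-0123456789abcdef0
EOF

pcluster create-cluster --cluster-name my-hpc --cluster-configuration cluster-config.yaml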

"These optimizations can result in significant cost savings when you apply them to large fleets." - Isaac Jordan, Amazon CodeGuru Profiler team

Making code more efficient

Optimizing your code is crucial for getting the most out of AWS infrastructure:

  1. Use CodeGuru Profiler: This tool automatically suggests optimizations. It's saved Amazon millions through its recommendations.

  2. Focus on hotspots: Target the most resource-hungry parts of your code first.

  3. Consider GPU acceleration: For suitable workloads, NVIDIA GPUs can speed up processing while using the same energy.

"After implementing recommendations from CodeGuru Profiler, one application reduced its CPU time spent creating new AWS SDK clients from 18.57% to less than 1%, resulting in improved latency and reduced costs."

Optimization      | Benefit
Right-sizing      | Lower costs, better resource use
Auto-scaling      | Better performance, lower cost
CodeGuru Profiler | Less latency, lower costs
GPU acceleration  | Faster processing, same energy

Wrap-up

HPC workload profiling on AWS can supercharge your cloud resources. Here's what you need to know:

1. Pick the right tools

AWS has a toolbox for HPC:

Service         | What it's for
EC2             | Heavy-duty computing
EFA             | Fast networking
ParallelCluster | Easy cluster setup
FSx for Lustre  | Speedy storage

2. Optimize your setup

  • Use Placement Groups for network boost
  • Skip Hyper-Threading for float-heavy jobs
  • Go big with c5n.18xlarge for CPU work

3. Keep an eye on things

Watch your workloads and tweak as needed:

  • CloudWatch for metrics
  • Switch instance types based on data
  • Fix code bottlenecks you find

4. Learn from the pros

Others have cracked the HPC code on AWS:

"Researchers want a cloud 'maker-shop' with top-notch tools. We need to get them in quickly and safely." - Luc Betbeder-Matibet, UNSW Research Tech Services Director
