Want to supercharge your HPC workloads on AWS? Here's what you need to know:
- HPC workload profiling analyzes resource usage to optimize performance
- AWS offers powerful tools for HPC: EC2, EFA, ParallelCluster, FSx for Lustre
- Key profiling methods: user-mode sampling, hardware event-based sampling, remote profiling
- Best practices include choosing the right instances, improving network performance, and optimizing storage
Quick comparison of AWS HPC services:
Service | Purpose | Key Feature |
---|---|---|
EC2 | Compute power | 400+ instance types |
EFA | Network speed | Fast node communication |
ParallelCluster | Cluster management | Automatic resource setup |
FSx for Lustre | File storage | High-speed data access |
AWS Batch | Job scheduling | Resource optimization |
By profiling your HPC workloads on AWS, you can spot bottlenecks, fine-tune performance, and get more bang for your buck. Let's dive into how to set up, profile, and optimize your HPC tasks on the cloud.
HPC workloads on AWS
HPC workloads aren't your average computing tasks. They're the heavy lifters of the digital world, demanding massive parallel computing, lightning-fast networking, and storage that can keep up.
AWS has stepped up to the plate with a suite of services tailored for these power-hungry workloads:
- Amazon EC2: The muscle behind HPC on AWS. With over 400 instance types, you'll find the right fit for your compute-intensive tasks.
- Elastic Fabric Adapter (EFA): Think of it as a superhighway for data. It lets thousands of CPUs or GPUs chat at breakneck speeds.
- AWS ParallelCluster: Your personal HPC butler. It sets up and scales clusters so you can focus on the work that matters.
- Amazon FSx for Lustre: The speed demon of storage. It handles massive datasets like a pro, perfect for tasks that need quick access to tons of data.
- AWS Batch: The smart scheduler. It makes sure your jobs get the resources they need, when they need them.
Let's break it down:
EC2 instances like C5n and P3dn are built for speed, delivering up to 100 Gbps of network throughput. That's music to the ears of anyone running fluid dynamics simulations or weather models.
EFA takes networking to the next level. It's the secret sauce that lets distributed deep learning jobs in TensorFlow or PyTorch communicate across nodes at lightning speed.
"EFA is used in fluid dynamics computations, large-scale weather modeling, and deep learning models built with frameworks like TensorFlow and PyTorch." - AWS Documentation
ParallelCluster is your HPC wingman, handling the nitty-gritty of cluster management so you can focus on your work.
FSx for Lustre is like a data firehose, processing massive datasets at hundreds of GB/s. It's a game-changer for tasks like genome analysis or oil and gas simulations.
Finally, AWS Batch keeps everything running smoothly, making sure your jobs get the resources they need without breaking the bank.
With this powerhouse lineup, AWS has positioned itself as a serious player in the HPC arena. Whether you're modeling weather patterns or training the next big AI model, AWS has the tools to get the job done.
Setting up for HPC workload profiling
Let's get your AWS environment ready for HPC workload profiling. It's like prepping a race car - you need the right tools and setup.
AWS setup for profiling
First, configure your AWS services: set up EC2 instances, storage, and networking for HPC workloads.
For Amazon EC2, pick the right instance type. AWS offers these HPC-optimized families:
Instance Family | Best For | Available Types |
---|---|---|
Hpc6a | AMD-based compute | hpc6a.48xlarge |
Hpc6id | Intel-based with high memory | hpc6id.32xlarge |
Hpc7a | Latest AMD processors | hpc7a.12xlarge, hpc7a.24xlarge, hpc7a.48xlarge, hpc7a.96xlarge |
Hpc7g | ARM-based with AWS Graviton | hpc7g.4xlarge, hpc7g.8xlarge, hpc7g.16xlarge |
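Not sure what launching one of these looks like in practice? Here's a minimal AWS CLI sketch; the AMI ID, subnet ID, and key name are placeholders you'd swap for your own:

```bash
# Launch a single hpc7a.48xlarge for profiling runs.
# The image ID, subnet ID, and key name below are placeholders.
aws ec2 run-instances \
  --instance-type hpc7a.48xlarge \
  --image-id ami-0123456789abcdef0 \
  --subnet-id subnet-0123456789abcdef0 \
  --key-name my-key \
  --count 1
```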
Setting up instances for profiling
Now, tweak a couple of kernel parameters so profiling tools can access the data they need:
For User-Mode sampling:
echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
For Hardware Event-Based sampling:
echo 0 | sudo tee /proc/sys/kernel/perf_event_paranoid
These commands relax kernel security restrictions (Yama ptrace scoping and perf event access controls) that would otherwise block profiling tools.
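Both settings reset on reboot. If you want them to stick, the standard Linux move is a sysctl drop-in file (the file name here is arbitrary):

```bash
# Persist the profiling-friendly settings across reboots.
echo 'kernel.yama.ptrace_scope = 0' | sudo tee /etc/sysctl.d/99-profiling.conf
echo 'kernel.perf_event_paranoid = 0' | sudo tee -a /etc/sysctl.d/99-profiling.conf
sudo sysctl --system   # reload all sysctl config files
```

Remember to dial these back once you're done profiling.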
Key profiling tools
Time for the profiling tools. Let's use Intel® VTune™ Profiler:
- Install VTune Profiler on your instance.
- Launch it:
sudo <vtune_install_dir>/vtune_profiler/bin64/vtune-gui
- Create a new project and set your analysis settings.
VTune Profiler will collect data and show where your HPC workload spends its time in the "Hotspots by CPU Utilization" view.
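Running headless? VTune also ships a command-line collector, so you can skip the GUI entirely. A minimal sketch, assuming your binary is ./my_hpc_app and your install path matches (adjust to your version):

```bash
# Set up VTune's environment (path varies by install/version).
source /opt/intel/oneapi/vtune/latest/env/vars.sh

# Collect a Hotspots profile, then print a summary in the terminal.
vtune -collect hotspots -result-dir r_hotspots -- ./my_hpc_app
vtune -report summary -result-dir r_hotspots
```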
Best practices for HPC profiling
Picking the right instance types
Choosing the best EC2 instance for HPC is crucial. Here's what you need to know:
- C5 instances: Great for compute-heavy tasks like scientific simulations
- R5 instances: Ideal for memory-intensive jobs like big data analysis
- I3 instances: Perfect for tasks needing fast storage, thanks to NVMe SSDs
Top HPC instance types:
Instance Type | vCPUs | Memory (GiB) | Network Performance | Best For |
---|---|---|---|---|
c5n.18xlarge | 72 | 192 | 100 Gbps | Compute-intensive tasks |
r5.24xlarge | 96 | 768 | 25 Gbps | Memory-intensive jobs |
i3.16xlarge | 64 | 488 | 25 Gbps | High I/O operations |
Don't oversize. Pick instances that fit your job without wasting resources.
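Before you commit, you can pull these specs straight from the EC2 API instead of trusting a blog table. A quick sketch:

```bash
# Compare vCPUs, memory, and network performance for candidate types.
aws ec2 describe-instance-types \
  --instance-types c5n.18xlarge r5.24xlarge i3.16xlarge \
  --query 'InstanceTypes[].[InstanceType, VCpuInfo.DefaultVCpus, MemoryInfo.SizeInMiB, NetworkInfo.NetworkPerformance]' \
  --output table
```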
Improving network performance
Fast networking is key for HPC. Here's how to speed things up (a CLI sketch follows the list):
- Use cluster placement groups to reduce latency
- Turn off Hyper-Threading for floating-point heavy jobs
- Choose high network throughput instances (C5n and P3dn offer up to 100 Gbps)
- Try Elastic Fabric Adapter (EFA) for apps needing lots of inter-instance communication
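Here's a hedged sketch of the first and last items: create a cluster placement group, then launch an EFA-enabled instance into it (the AMI, subnet, and key name are placeholders):

```bash
# Create a cluster placement group for low-latency node-to-node traffic.
aws ec2 create-placement-group \
  --group-name hpc-cluster-pg \
  --strategy cluster

# Launch an EFA-enabled instance into that placement group.
aws ec2 run-instances \
  --instance-type c5n.18xlarge \
  --image-id ami-0123456789abcdef0 \
  --key-name my-key \
  --placement GroupName=hpc-cluster-pg \
  --network-interfaces 'DeviceIndex=0,SubnetId=subnet-0123456789abcdef0,InterfaceType=efa'
```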
Storage best practices
Good storage can make or break your HPC job:
- Use Amazon FSx for Lustre for big data sets (hundreds of GB/s throughput; creation sketch below)
- Pick EBS-optimized instances for fast, dedicated EBS connections
- Use instance store volumes for temporary, high-speed storage
Storage options compared:
Storage Type | Best For | Performance |
---|---|---|
Amazon FSx for Lustre | Large datasets, high throughput | Up to hundreds of GB/s |
EBS volumes | Persistent storage | Varies by volume type |
Instance store | Temporary, high-performance storage | Very high (directly attached) |
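Spinning up that FSx for Lustre option is close to a one-liner. A minimal sketch for a scratch file system (the subnet ID is a placeholder; 1,200 GiB is the smallest scratch size):

```bash
# Create a scratch FSx for Lustre file system for a short-lived HPC job.
aws fsx create-file-system \
  --file-system-type LUSTRE \
  --storage-capacity 1200 \
  --subnet-ids subnet-0123456789abcdef0 \
  --lustre-configuration DeploymentType=SCRATCH_2
```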
Profiling methods
HPC workload profiling on AWS isn't one-size-fits-all. Here are three key methods:
User-mode sampling
This method interrupts processes to collect data. It's like taking snapshots of your app's performance.
- Gives you a peek into how your app is performing
- Only slows things down by about 5%
- Great for finding hotspots and analyzing threading
Intel® VTune™ Profiler uses this to see how apps use processor resources. It rates utilization as Idle, Poor, Ok, or Ideal.
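In VTune's CLI, user-mode sampling is the software sampling mode of the Hotspots analysis. A minimal sketch, assuming your binary is ./my_hpc_app:

```bash
# Hotspots analysis with user-mode (software) sampling.
vtune -collect hotspots -knob sampling-mode=sw -- ./my_hpc_app
```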
Hardware event-based sampling
This one uses special hardware units to measure how software and hardware play together.
- Gives detailed info without slowing things down much
- Comes in two flavors:
  - End-to-end measurements
  - Sampling-based measurements
The Likwid tool uses end-to-end measurements. You need to pin your app to specific cores, but it's lightweight and supports all events.
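To get a feel for the end-to-end flavor, here's a hedged likwid-perfctr sketch; the core list and the FLOPS_DP event group are illustrative, and the available groups depend on your CPU:

```bash
# Pin the app to cores 0-3 and count double-precision FLOP events end to end.
likwid-perfctr -C 0-3 -g FLOPS_DP ./my_hpc_app
```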
Remote profiling techniques
For HPC on AWS, remote profiling is key. It lets you run the app and the profiler UI on different machines.
You've got options:
- Direct connect
- SSH access
- YourKit Connection Broker
For CUDA apps, try these:
- Nsight Systems CLI:
nsys profile -o timeline ./myapplication
- Nsight CUDA CLI:
nv-nsight-cu-cli -o profile ./bin/Release/atomicIncTest
These create files you can analyze on your local machine.
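One simple way to pull those result files down is plain scp; the host and file names below are placeholders, and the exact extensions (.qdrep vs .nsys-rep, for example) vary by Nsight version:

```bash
# Copy Nsight result files from the EC2 instance to your workstation.
scp ec2-user@203.0.113.10:"~/timeline.*" .
scp ec2-user@203.0.113.10:"~/profile.*" .
```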
When picking a method, think about what you need and how much slowdown you can handle. Each method gives you different insights into your HPC workload on AWS.
Understanding profiling results
After collecting profiling data for your HPC workload on AWS, it's time to make sense of it. Let's dive into how to read the data and find bottlenecks.
Reading the data
Profiling tools spit out a ton of info. Here's how to tackle it:
- Key metrics: Focus on execution time, CPU usage, memory, and I/O.
- Visualize: Use tools like Intel VTune Profiler to see data graphically.
- Compare runs: Look at different instance types or input sizes.
Check out this finding from MAQAO ONE View on GROMACS:
"Comparing GROMACS on AWS Graviton3 vs Graviton3E showed a 13% speedup."
This kind of insight helps you pick the right instance for your workload.
Finding bottlenecks
Now, let's hunt for performance issues:
- Hotspots: Find resource-hungry code sections.
- Resource use: Low CPU or memory utilization means you're paying for capacity you don't use.
- Communication: For distributed workloads, check network usage.
- I/O operations: Slow storage access can kill HPC performance.
MAQAO's GROMACS analysis found:
"The largest assembly loop was 42% faster on Graviton3E than Graviton3."
This points to where the newer instance shines.
Here's a quick way to organize your findings:
Metric | Current | Target | Potential Boost |
---|---|---|---|
Execution Time | 100s | 80s | 20% |
Memory Usage | 80% | 60% | 25% |
Network Throughput | 5 GB/s | 8 GB/s | 60% |
Improving HPC workloads with profiling insights
Now that you've analyzed your profiling results, it's time to put that knowledge to work. Here's how to fine-tune your HPC workloads on AWS for better performance.
Adjusting instance settings
Tweaking your instance configurations can lead to big gains:
- Right-size your instances: Match instance types and sizes to your workload needs. This cuts costs by eliminating idle instances.
- Set up auto-scaling: Use AWS ParallelCluster to scale resources based on demand. This keeps performance high and costs low (config sketch below).
- Customize without AMIs: Use custom bootstrap actions to tweak instances. It's faster and easier than creating new AMIs.
"These optimizations can result in significant cost savings when you apply them to large fleets." - Isaac Jordan, Amazon CodeGuru Profiler team
Making code more efficient
Optimizing your code is crucial for getting the most out of AWS infrastructure:
- Use CodeGuru Profiler: This tool automatically suggests optimizations. It's saved Amazon millions through its recommendations (setup sketch below).
- Focus on hotspots: Target the most resource-hungry parts of your code first.
- Consider GPU acceleration: For suitable workloads, NVIDIA GPUs can speed up processing for the same energy budget.
"After implementing recommendations from CodeGuru Profiler, one application reduced its CPU time spent creating new AWS SDK clients from 18.57% to less than 1%, resulting in improved latency and reduced costs."
Optimization | Benefit |
---|---|
Right-sizing | Lower costs, better resource use |
Auto-scaling | Better performance, lower cost |
CodeGuru Profiler | Less latency, lower costs |
GPU acceleration | Faster processing, same energy |
Wrap-up
HPC workload profiling on AWS can supercharge your cloud resources. Here's what you need to know:
1. Pick the right tools
AWS has a toolbox for HPC:
Service | What it's for |
---|---|
EC2 | Heavy-duty computing |
EFA | Fast networking |
ParallelCluster | Easy cluster setup |
FSx for Lustre | Speedy storage |
2. Optimize your setup
- Use cluster placement groups for a network boost
- Skip Hyper-Threading for float-heavy jobs
- Go big with c5n.18xlarge for CPU work
3. Keep an eye on things
Watch your workloads and tweak as needed:
- CloudWatch for metrics (sketch below)
- Switch instance types based on data
- Fix code bottlenecks you find
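For that first item, here's a quick sketch pulling average CPU utilization for one instance over the last hour (the instance ID is a placeholder; the date syntax assumes GNU date, as on Amazon Linux):

```bash
# Average CPUUtilization for one instance, 5-minute periods, last hour.
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --statistics Average \
  --period 300 \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
```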
4. Learn from the pros
Others have cracked the HPC code on AWS:
"Researchers want a cloud 'maker-shop' with top-notch tools. We need to get them in quickly and safely." - Luc Betbeder-Matibet, UNSW Research Tech Services Director