19.12.2020 AWS article about NICE DCV with ParallelCluster (walk-through)

December 19, 2020 AWS, Cloud, DCV, HPC

AWS has created an overview of NICE DCV features and application areas in different markets: Remote visualization in HPC using NICE DCV with ParallelCluster.

The article includes an example image from a molecular dynamics (MD) DCV session, as well as a related video.

The article also shows an AWS ParallelCluster config file, which we can use to demonstrate creating a ParallelCluster and testing both DCV running in the browser and the SLURM cluster with automatic cloud compute node allocation that ParallelCluster sets up. The steps below assume that you have already installed the aws-cli and configured your AWS credentials:

##################################################################
# Test DCV and SLURM in an AWS ParallelCluster setup
##################################################################
pip3 install aws-parallelcluster --upgrade --user
pcluster version
> 2.10.0
# Check a few parameters we need for the ParallelCluster settings below:
# the default VPC ID, subnets, and key name, which you can copy into the ParallelCluster configuration
aws ec2 describe-vpcs | grep VpcId
aws ec2 describe-subnets | egrep "AvailabilityZone.:|SubnetId"
aws ec2 describe-key-pairs | grep KeyName
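Instead of piping through grep, the same values can be pulled out with the AWS CLI's built-in JMESPath filter (`--query`). A small sketch; the sed-based extraction at the end runs offline on made-up sample JSON, so the pattern can be verified without AWS credentials:

```shell
# Equivalent lookups using the AWS CLI's --query (JMESPath) filter:
#   aws ec2 describe-vpcs      --query 'Vpcs[?IsDefault].VpcId' --output text
#   aws ec2 describe-subnets   --query 'Subnets[].[SubnetId,AvailabilityZone]' --output text
#   aws ec2 describe-key-pairs --query 'KeyPairs[].KeyName' --output text

# Offline demonstration of a grep/sed-style extraction on sample JSON
# (the VPC id here is made up):
sample='{"Vpcs":[{"VpcId":"vpc-0abc1234","IsDefault":true}]}'
vpc_id=$(printf '%s' "$sample" | sed -n 's/.*"VpcId": *"\([^"]*\)".*/\1/p')
echo "vpc_id=$vpc_id"
```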
# create config file - replace with your values listed above
cat > pc.config << EOF
[global]
cluster_template = hpc
update_check = true
sanity_check = true

[aws]
aws_region_name = us-east-1

[aliases]
ssh = ssh {CFN_USER}@{MASTER_IP} {ARGS}

[vpc public-private]
vpc_id = vpc-xxxx
master_subnet_id = subnet-xxxx
compute_subnet_id = subnet-xxxx

[cluster hpc]
key_name = your-keyname
base_os = ubuntu1804
scheduler = slurm
master_instance_type = g4dn.xlarge
vpc_settings = public-private
# fsx_settings = fsx-scratch2  # we don't use FSX in this test
dcv_settings = dcv
queue_settings = compute

[queue compute]
enable_efa = true
placement_group = DYNAMIC
compute_resource_settings = default

[compute_resource default]
instance_type = c5n.18xlarge
max_count = 64

[fsx fsx-scratch2]
shared_dir = /lustre
# fsx_fs_id = fs-xxxxxxx   # we don't use FSX in this test

[dcv dcv]
enable = master
port = 8443
access_from = 0.0.0.0/0
EOF
# end of config file
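Before creating the cluster it is worth checking that all placeholder values in pc.config have actually been replaced; a minimal sketch:

```shell
# Warn if pc.config still contains the placeholder values
# from the template above (vpc-xxxx, subnet-xxxx, your-keyname):
if grep -qE 'vpc-xxxx|subnet-xxxx|your-keyname' pc.config; then
  echo "WARNING: pc.config still contains placeholder values" >&2
else
  echo "pc.config: no placeholders found"
fi
```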

#################################################################
# Create the ParallelCluster infrastructure
#################################################################
pcluster create DCV-Cluster -c pc.config
# We can check the cloudformation stack creation with 
aws cloudformation describe-stacks --stack-name parallelcluster-DCV-Cluster
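While CloudFormation is working, you can poll the stack status in a loop until it reaches CREATE_COMPLETE. A small helper sketch (wait_for_stack is our own name, not a pcluster command); it takes the status command as a string, so it can be tried out with a stub before pointing it at the real stack:

```shell
# Poll a status command until it prints CREATE_COMPLETE.
# Real usage (assumes configured AWS credentials):
#   wait_for_stack "aws cloudformation describe-stacks \
#     --stack-name parallelcluster-DCV-Cluster \
#     --query 'Stacks[0].StackStatus' --output text" 30
wait_for_stack() {
  cmd="$1"; interval="${2:-30}"
  while true; do
    status=$(eval "$cmd")
    echo "stack status: $status"
    [ "$status" = "CREATE_COMPLETE" ] && break
    sleep "$interval"
  done
}
```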
# When finished we should see our new cluster in the output of 
pcluster list
> DCV-Cluster  CREATE_COMPLETE  2.10.0
# Display the cluster status
pcluster status DCV-Cluster
> Status: CREATE_COMPLETE
> MasterServer: RUNNING
> MasterPublicIP: 99.81.73.140
> ClusterUser: ubuntu
> MasterPrivateIP: 172.31.10.65
> ComputeFleetStatus: RUNNING
# List cluster instances with:
pcluster instances DCV-Cluster
> MasterServer         i-0562a34d623877c9e
# Login to the head node via pcluster ssh command
pcluster ssh DCV-Cluster -i ~/.ssh/YOUR_KEY.pem

#################################################################
# Connect to DCV via your browser. We use "-s" (--show-url) to print the
# URL, which we copy and paste into the browser:
#################################################################
pcluster dcv connect -k ~/.ssh/YOUR_KEY.pem -s DCV-Cluster
> Please use the following one-time URL in your browser within 30 seconds:
https://99.81.73.137:8443?authToken=NqZXqNan__PPa9RYptPZxIKeFGpsWvTh9I9iNh22_OR3aVaVrkfZE5tIupoqHRnqO699MShswudLXV2VwTQqyAhjPPazdRvGOEh8dercL89qTLu5L6HawXaHzJoN7apst7XA6KJSpSqJFwmyFGW6Tq8FZehuzP9pOuOpkxXOzZYK1yrObgn5bML0BB4aso5cbOQP5VBqdEXUaOMPoVTqmwXKgW5Oit7HHUPxUGNBTzPRqlnoJidpRmo754wICoO-#zBjEUS4jb2ORjadzVBje
# Your browser should connect to the DCV session on your ParallelCluster master node 

#################################################################
# Let's test SLURM inside ParallelCluster 
#################################################################
# Inside the DCV session open a terminal or connect via ssh 
sinfo
> PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
> compute*     up   infinite     64  idle~ compute-dy-c5n18xlarge-[1-64]
#
# running a job automatically allocates a new compute node
srun  --msg-timeout=45 --pty hostname
> compute-dy-c5n18xlarge-1
sinfo
> PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
> compute*     up   infinite     63  idle~ compute-dy-c5n18xlarge-[2-64]
> compute*     up   infinite      1    mix compute-dy-c5n18xlarge-1
#
# Let's have a look at the cloud type SLURM configuration for ParallelCluster
cat /opt/slurm/etc/pcluster/slurm_parallelcluster_compute_partition.conf
> NodeName=compute-dy-c5n18xlarge-[1-64] CPUs=72 State=CLOUD Feature=dynamic,c5n.18xlarge,default,efa
> NodeSet=compute_nodes Nodes=compute-dy-c5n18xlarge-[1-64]
> PartitionName=compute Nodes=compute_nodes MaxTime=INFINITE State=UP Default=YES
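Besides interactive srun, batch jobs trigger the same automatic node allocation. A minimal sketch of a batch script (the file name job.sh and the resource values are our own choices, not from the article):

```shell
# Create a minimal SLURM batch script; submitting it on the head node
# makes ParallelCluster allocate the requested compute nodes on demand.
cat > job.sh << 'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:05:00
srun hostname
EOF
# Submit and watch the queue (run on the cluster head node):
#   sbatch job.sh
#   squeue
```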
#
# Shut down the ParallelCluster infrastructure
pcluster delete DCV-Cluster
> Deleting: DCV-Cluster
> Status: EBSCfnStack - DELETE_COMPLETE
> Status: RootRole - DELETE_COMPLETE
> Cluster deleted successfully.
> Checking if there are running compute nodes that require termination...
> Compute fleet cleaned up.

With the configuration file above and a single command, AWS ParallelCluster lets you start a full HPC cluster, including interactive, full-performance remote 3D desktop access via NICE DCV and a dynamic SLURM cluster.

Read our overview of the Pros and Cons of HPC in the Cloud or find more Technical Guides related to Remote 3D and HPC. If you have any questions, just let us know.