IaaS GPU Cluster - It's Alive!!

Dec 31, 2022

Yes, we're finally there! After a number of years of potential and plenty of interest but no commitment, we're finally able to bring a GPU offering to our IaaS platform 🎉! It's certainly been a long time coming, and while we've been through a couple of different iterations and proof-of-concept implementations, it's finally at a point where we can officially introduce it as a production offering!

In the immortal words of Victor Frankenstein (or Gene Wilder if you prefer 😁): "It's alive, it's ALIVE…!"

Ok, so what is it that we have here? Broadly speaking, our current build looks like this: a pair of compute hosts, each with 2 x Nvidia A40 GPUs, running VMware vSphere 7 for the virtualisation layer (you probably guessed this bit already 😜).

That gives us 48GB of GDDR6 memory with ECC, 696GB/s of memory bandwidth, and 10,752 CUDA cores per GPU. These hosts are starting with 2 x A40 GPUs each and can accommodate another two apiece should demand require it. I may have ranted a little about the specs in an earlier blog, which you can find here if you missed it. This offering is initially only available on our Christchurch IaaS platform, but hopefully (with sufficient interest, of course) we'll be able to bring it northwards to our presence in the CDC Silverdale datacentre facility in Auckland. From there, the plan is simply to add nodes, validate new workloads, and grow capacity as required.
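
If you want to sanity-check those numbers for yourself, the nvidia-smi tool that ships with the NVIDIA driver will report them per GPU. Here's a minimal sketch wrapping that query in Python (assuming Python 3 and nvidia-smi on the PATH; the query fields are standard nvidia-smi ones, but treat the exact output as indicative):

```python
import subprocess

# Ask nvidia-smi for each visible GPU's name, total framebuffer and ECC mode.
fields = "name,memory.total,ecc.mode.current"
out = subprocess.run(
    ["nvidia-smi", f"--query-gpu={fields}", "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
).stdout

for line in out.strip().splitlines():
    name, mem, ecc = (col.strip() for col in line.split(","))
    # On these hosts each A40 should show ~48GB of memory with ECC enabled.
    print(f"{name}: {mem}, ECC {ecc}")
```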

The thing is, this particular GPU is good at accelerating a range of workloads in the DC, which makes it a good choice for enterprise businesses and IaaS providers alike! Now, given it's topical at the moment, and because everyone loves a bit of openai.com, I decided to ask ChatGPT what the best use case is for the Nvidia A40 GPU. Here's the response:

Not a particularly creative or cool use of ChatGPT (there are a heap out there!!!), but interesting… And while that's reasonably broad, it's actually not too far from the mark when you consider our initial workloads:

  1. We'll be hosting a number of shared RDSH (that's Remote Desktop Session Host) virtual machines for "knowledge worker" type workloads. In this scenario, productivity apps like Microsoft Office (Outlook, Word, Excel, Teams), most browsers, and the general Windows desktop will all utilise a GPU if one is detected, improving responsiveness and the end-user experience overall. Video performance in particular is considerably improved, with the additional benefit of reduced impact on other users sharing the RDSH VM (there's a quick way to confirm the GPU is actually being used sketched just after this list).
  2. We'll also host (technically still TBC at the time of writing this 🤞🏼) virtual machines running ESRI's ArcGIS and MIKE Powered by DHI. Again, by leveraging the virtual machine's GPU, tasks like modelling, rendering maps, performing spatial analysis and visualising data can all be accelerated. With the appropriate configuration and tuning, the performance of a GPU-backed virtual machine can match and even exceed levels previously reserved for expensive high-end workstations. As an added benefit, by centralising the GPUs in the DC we extend the effective user base who can leverage them while managing costs.
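
As promised above, here's a quick way to confirm a session host really is offloading to the GPU: sample utilisation while users are active. A rough sketch only; it assumes the NVIDIA guest driver is installed in the VM and that nvidia-smi is on the PATH:

```python
import subprocess
import time

# Sample overall GPU utilisation and used framebuffer every 5 seconds.
# Run this inside the RDSH VM while users are active; non-zero numbers
# mean the desktop and apps really are offloading work to the vGPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader",
]

for _ in range(10):
    sample = subprocess.run(
        QUERY, capture_output=True, text=True, check=True
    ).stdout.strip()
    print(sample)  # e.g. "23 %, 1536 MiB"
    time.sleep(5)
```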

These use cases require different licenses from Nvidia as they utilise different features, but both benefit from an end-user performance perspective. We combine these licenses with a memory allotment sized to what the workload requires from the GPU (i.e. no one size fits all!!). This configuration comes in the form of a profile. For example, nvidia_a40_12q allocates 12GB of video memory to the virtual machine for "professional graphics" type applications and enables full functionality of the GPU, including all CUDA features, whereas nvidia_a40_8a allocates 8GB of memory and enables only a limited feature set (i.e. no CUDA). The same type of profile can be used with VDI to allocate 1-2GB of video memory to single-user VMs. We don't have a specific VDI offering today, but it's certainly something we could look at down the track… 🤔

Gone are the days of only a select few users enjoying the performance benefits of GPUs; with them now tucked away safely in the DC, they can be leveraged by everyone.

(insert Oprah meme here - "You get a GPU, and you get a GPU...... everyone gets a GPU!!!")🤣
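
Back on the profiles for a second: for the curious, the host side of this is visible through nvidia-smi's vgpu subcommand, which comes with the NVIDIA vGPU host driver. A small sketch along those lines (in the driver releases I've seen, -s lists supported profile types and -c the currently creatable ones; output format varies by release, so treat this as illustrative):

```python
import subprocess

# List the vGPU profile types each physical GPU supports (-s) and,
# given what's already been handed out, which can still be created (-c).
for flag, label in [("-s", "Supported vGPU types"),
                    ("-c", "Creatable vGPU types")]:
    out = subprocess.run(["nvidia-smi", "vgpu", flag],
                         capture_output=True, text=True, check=True).stdout
    print(f"== {label} ==")
    print(out)
```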

Let's finish up by asking ChatGPT what the benefits of using GPUs in IaaS are (yes, teeing it up a little here 😁):

Hopefully there's something of interest for you in there somewhere; certainly don't hesitate to reach out if you have any questions.