
I am a scientist working on a high-performance computing project (the computational programming aspect), and I do not have the networking or networking-hardware background required to answer my question. A professor friend of mine who works in HPC suggested the following configuration: buy two AMD Ryzen 16-core processors and connect them through an InfiniBand interconnect. We are planning to buy two desktops to host those AMD Ryzen 16-core processors, which we hope will give us roughly the same speed as a single 32-core processor. If the question is raised as to why we do not simply buy a single 32-core processor, the answer is that the prevailing market situation has rendered 32- and 64-core processors unavailable (the current worldwide chip shortage).

My question is whether the above architecture allows one to add a 64-core processor in the future, or whether all the nodes in this cluster have to have the same number of cores?

We are planning to run high-performance weather-forecast simulations, as shown here: HPC for weather forecasting. The software is designed for use across multiple nodes and has been parallelized.
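
For context on "parallelized across multiple nodes": software of this kind typically spreads its work over the machines with MPI. Below is a minimal sketch (assuming an MPI implementation such as Open MPI or MPICH is installed on both desktops; the file and program names are placeholders) that only reports which host each rank runs on, a common first check that a two-node InfiniBand setup really behaves as one cluster:

```c
/* Minimal MPI check: each rank reports the host it is running on.
 * Build with: mpicc hello.c -o hello   (assumes Open MPI or MPICH) */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, name_len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id      */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of ranks  */
    MPI_Get_processor_name(host, &name_len);

    printf("rank %d of %d on %s\n", rank, size, host);

    MPI_Finalize();
    return 0;
}
```

With Open MPI, a run across both machines would look something like `mpirun --hostfile hosts -np 32 ./hello`, where the hostfile lists each node with its own `slots=` count (e.g. `slots=16` today, `slots=64` for a future node), so MPI itself does not require every node to have the same number of cores.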

  • As far as I know, InfiniBand is mainly used for networking *between* systems (much like Ethernet) and doesn't automatically create a single dual-CPU machine; you still have two separate 16-core machines that just exchange data very fast. Has your friend mentioned exactly what kind of cluster technology they plan on using that would make use of the interconnect, and what requirements it has? (And unrelated to that, are there any _existing_ HPC clusters that you can borrow time on, instead of setting up your own?) – u1686_grawity Mar 28 '22 at 14:56
  • @user1686 No. This department is new and has no existing HPC clusters whatsoever. My professor friend lives on a different continent. I would be OK not having a single dual-CPU machine as long as they exchange data very fast over InfiniBand. –  Mar 28 '22 at 15:01
  • If the software is designed to be spread across multiple nodes, e.g., using something like MPICH instead of OpenMP, and if the processes depend on low-latency data transfers between nodes, then InfiniBand may make sense. The architecture is expandable if you purchase an IB switch, but you are looking at pricey hardware at that point. – doneal24 Mar 28 '22 at 15:27
  • @doneal24 Yes WRF is designed for use spread across multiple nodes. Not sure about MPICH though. I have read about people using OpenMPI. –  Mar 28 '22 at 15:35
  • Do not expect the speed to scale linearly with the number of systems. IB is low latency but still much, much slower than in-node memory. In many cases, two 16-core systems may wind up somewhere around 1.5x faster than a single 16-core system, depending on the interconnect (IB vs. TCP) and on the application's design (see the rough speedup sketch after these comments). – doneal24 Mar 28 '22 at 15:41
  • @doneal24 Yes, thanks. The ideal solution would be to use the 64-core processor, but the earliest availability for that is June 2023 and beyond. I have to make an investment that allows me to add that processor in the future; I do not want the current investment to be rendered useless when the bigger processor becomes available. –  Mar 28 '22 at 15:48
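
To illustrate the non-linear scaling mentioned in the comments, here is a rough back-of-the-envelope sketch based on Amdahl's law; the parallel fractions used below are made-up example values, not measurements of WRF:

```c
/* Rough Amdahl's-law sketch: speedup(n) = 1 / ((1 - p) + p / n), where
 * p is the fraction of the run that scales across nodes and n is the
 * number of nodes. The p values below are illustrative only. */
#include <stdio.h>

static double speedup(double p, double n)
{
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void)
{
    const double fractions[] = { 0.95, 0.80, 0.67 };
    const double nodes = 2.0;  /* two 16-core desktops over InfiniBand */

    for (int i = 0; i < 3; i++)
        printf("p = %.2f -> speedup over one node: %.2fx\n",
               fractions[i], speedup(fractions[i], nodes));

    /* Prints roughly 1.90x, 1.67x and 1.50x: even a modest serial or
     * communication fraction keeps two nodes well short of a true 2x
     * ("32-core") speedup. */
    return 0;
}
```

Whether the two desktops behave like a single 32-core machine therefore depends largely on how much of the workload communicates over the interconnect, not just on the interconnect's raw speed.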

0 Answers