Explained: What is a supercomputer and where is it used?

Facebook’s parent company Meta today announced that it has built what it believes is among the fastest supercomputers in the world right now. Furthermore, the company says that its supercomputer, dubbed the AI Research SuperCluster (RSC), will be ‘the fastest in the world’ once fully built out in mid-2022.

At its core, RSC has a total of 6,080 NVIDIA A100 GPUs, which make up a total of 760 NVIDIA DGX A100 systems. Meta says that these GPUs are more powerful than the V100 GPUs that it uses in its existing systems. Each of these DGX systems (DGX is NVIDIA’s line of servers and workstations that use GPUs for deep learning applications) communicates via an NVIDIA Quantum 1600 Gb/s InfiniBand two-level Clos fabric. For storage, RSC has 175 petabytes of Pure Storage FlashArray, 46 petabytes of cache storage in Penguin Computing Altus systems, and 10 petabytes of Pure Storage FlashBlade.

When RSC is complete, the InfiniBand network fabric will connect 16,000 GPUs as endpoints (compared with the 6,080 GPUs deployed right now), which the company says will make it one of the largest such networks deployed to date. Additionally, the company says that it has designed a caching and storage system that can serve 16 TB/s of training data, and that it plans to scale this system up to one exabyte.
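To put those figures in perspective, the arithmetic can be sketched in a few lines of Python. This is only an illustration using the numbers quoted above; the function and variable names are made up for this example and do not come from Meta or NVIDIA.

```python
# Figures quoted in the article; names are illustrative, not from Meta.
GPUS_NOW = 6_080        # A100 GPUs deployed today
DGX_SYSTEMS = 760       # DGX A100 systems deployed today
GPUS_PLANNED = 16_000   # GPUs planned once RSC is fully built out


def gpus_per_system(total_gpus: int, systems: int) -> float:
    """Average number of GPUs in each system."""
    return total_gpus / systems


def scale_factor(planned: int, current: int) -> float:
    """How many times larger the planned deployment is than the current one."""
    return planned / current


if __name__ == "__main__":
    print(f"GPUs per DGX A100 system: {gpus_per_system(GPUS_NOW, DGX_SYSTEMS):.0f}")
    print(f"Planned growth in GPU count: {scale_factor(GPUS_PLANNED, GPUS_NOW):.2f}x")
```

In other words, each DGX A100 system holds eight GPUs, and the finished machine is expected to have a little over two and a half times as many GPUs as it has today.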

But why build such a machine?

Bragging rights aside, Meta has built RSC for two specific purposes. The first, of course, is building the metaverse. “Our long-term investments in self-supervised learning and in building next-generation AI infrastructure with RSC are helping us create the foundational technologies that will power the metaverse and advance the broader AI community as well,” Meta wrote in a blog post.

In addition to helping develop future technologies, RSC will help the company tackle some of its existing issues – misinformation and harmful content.

As of now, Meta uses a combination of human supervision and AI models to identify misinformation and harmful content on its platforms, which include Facebook, Instagram and WhatsApp among others. But clearly that’s not enough. Misleading content containing harmful information keeps slipping onto its platforms despite the oversight. To tackle this, Meta wants to develop AI models that can take into account millions of parameters to analyse content, which includes a combination of text, images, graphics, video and voice, and flag or remove any discourse that it deems dangerous. Right now, Meta’s existing systems can’t run such complex models quickly enough to deliver results in time. That is where RSC comes into the picture.

RSC will help Meta’s AI researchers build better AI models that can learn from trillions of examples that use multimodal signals to determine whether an action, sound or image is harmful or benign. “This research will…help keep people safe on our services today,” the company added.

Which brings us to the most important question – what is a supercomputer?

In the simplest terms, a supercomputer is a computer that can perform millions of tasks within seconds. It can not only handle millions of queries at a time but can also solve, in record time, complex problems that would otherwise take a general-purpose computer years to solve (depending on the complexity of the problem).

IBM describes supercomputers as the fastest computers in the world, made up of interconnects, I/O systems, memory and processor cores. The company explains that unlike traditional computers, supercomputers use more than one central processing unit (CPU). These CPUs are grouped into compute nodes, each comprising a processor or a group of processors and a memory block. A supercomputer can contain tens of thousands of nodes that communicate and collaborate with one another to solve specific problems.

But what about GPUs in Meta’s RSC?

Nowadays, companies are using GPUs, or graphics processing units, instead of CPUs for building supercomputers. GPUs are already used in smartphones, monitors, smart TVs and PCs for…well…handling graphics. Now, highly powerful versions of GPUs, also called general-purpose GPUs or GPGPUs, are being used for making supercomputers.

There’s a reason for this change.

GPUs rely on parallel processing. This means that a GPU can work on many tasks at the same time, without the success or failure of one process affecting the others. This parallel nature makes GPUs extremely efficient at handling workloads and faster at performing calculations and assigned tasks. Using GPUs instead of CPUs can reduce the time taken for calculations that can be split up in this way, which is why they are being used in supercomputers these days.
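The idea of independent tasks running side by side can be sketched in Python. This is only an analogy that runs on an ordinary CPU using threads from the standard library; it is not GPU code, and Python threads will not actually speed up this kind of number crunching. The `brighten` task and the tiny four-pixel "image" are invented for this example. What it shows is the structure GPUs exploit: each element is processed without needing the result of any other.

```python
from concurrent.futures import ThreadPoolExecutor


def brighten(pixel: int) -> int:
    """One independent task: brighten a single 8-bit pixel value, capped at 255."""
    return min(pixel + 40, 255)


def brighten_serial(pixels):
    """Process the pixels one after another."""
    return [brighten(p) for p in pixels]


def brighten_parallel(pixels, workers=4):
    """Hand the same independent tasks to a pool of workers.

    Executor.map returns results in input order, so the output matches
    the serial version even though tasks may run concurrently.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(brighten, pixels))


if __name__ == "__main__":
    image = [0, 100, 200, 250]          # hypothetical pixel values
    print(brighten_serial(image))       # [40, 140, 240, 255]
    print(brighten_parallel(image))     # [40, 140, 240, 255]
```

Because no pixel depends on another, the work can be divided among as many workers as are available. A GPU takes the same approach with thousands of small cores instead of a handful of threads.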

For comparison, a CPU-based supercomputer called Jaguar was upgraded with NVIDIA’s Tesla K20 GPUs back in 2012. After the upgrade, the supercomputer was renamed Titan. Its performance went from 2.3 petaflops to over 20 petaflops (a petaflop, or one quadrillion floating-point operations per second, is a unit in which supercomputer performance is measured). Titan became roughly ten times faster and five times more energy efficient than Jaguar while fitting inside the same 200 cabinets as its predecessor.
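The "roughly ten times" figure can be checked against the petaflop numbers quoted above. The short sketch below assumes exactly 20 petaflops for Titan, which is a lower bound since the article says "over 20"; the variable and function names are illustrative.

```python
JAGUAR_PFLOPS = 2.3    # performance before the GPU upgrade, as quoted above
TITAN_PFLOPS = 20.0    # assumption: lower bound, since the article says "over 20"


def speedup(new_pflops: float, old_pflops: float) -> float:
    """Ratio of new performance to old performance."""
    return new_pflops / old_pflops


if __name__ == "__main__":
    print(f"Titan vs Jaguar: at least {speedup(TITAN_PFLOPS, JAGUAR_PFLOPS):.1f}x faster")
```

With these inputs the ratio comes to about 8.7, so "roughly ten times" is a rounded figure that fits if Titan's actual performance was somewhat above 20 petaflops.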

What operating systems do supercomputers run?

Unlike general-purpose computers that run Apple’s macOS, Windows or even Google’s ChromeOS, supercomputers mostly run the Linux operating system owing to its open-source nature. Supercomputers are designed for specific purposes and differ vastly in configuration. So, designing and maintaining a proprietary OS for such machines is time-consuming and expensive.

Linux, on the other hand, is free to use and easy to customise, which is why it is the usual choice for supercomputers.

What are supercomputers used for?

Supercomputers are mostly used for research and development in a variety of areas, including weather forecasting, space research, testing the strength of encryption and even developing drugs for various diseases. IBM, for instance, is part of an effort that gives researchers access to 16 systems, including its Summit supercomputer, with a combined total of more than 330 petaflops, 775,000 CPU cores and 34,000 GPUs, to help them understand COVID-19 and find treatments and potential cures.

Summit, which is one of the most energy-efficient supercomputers in the world, will help medical researchers understand the US cancer population and develop anti-cancer drugs. In addition, researchers are using IBM’s Sierra and Summit supercomputers to identify patterns in human proteins and cellular systems in order to develop drugs for conditions such as Alzheimer’s, heart disease and addiction. Sierra is also being used for assessing the performance of nuclear weapon systems.

The Piz Daint supercomputer, which was developed by Cray and is deployed at the Swiss National Supercomputing Centre in Switzerland, is used for a variety of purposes, including analysing data collected from the Large Hadron Collider.

The post Explained: What is a supercomputer and where is it used? appeared first on BGR India.