Meta’s Grand Teton brings NVIDIA Hopper to its data centers


Meta today announced its next-generation artificial intelligence platform, Grand Teton, designed in collaboration with NVIDIA.

Compared with the company’s previous-generation Zion EX platform, the Grand Teton system packs more memory, network bandwidth and compute capacity, said Alexis Bjorlin, vice president of Meta Infrastructure Hardware, at the 2022 OCP Global Summit, an Open Compute Project conference.

AI models are widely used on Facebook for services such as News Feed, content recommendations, and hate speech identification, among many other applications.

“We are thrilled to introduce this new member of the family here at the summit,” said Bjorlin, adding her thanks to NVIDIA for its deep collaboration on Grand Teton’s design and its continued support of OCP.

Designed for data center scale

Named after the 13,000-foot mountain that crowns one of Wyoming’s two national parks, Grand Teton uses NVIDIA H100 Tensor Core GPUs to train and run AI models that are rapidly growing in size and capability, demanding ever more computation.

The NVIDIA Hopper architecture, on which the H100 is based, includes a Transformer Engine to speed up work on these neural networks, which are often referred to as foundation models because they can address a growing set of applications ranging from natural language processing to healthcare, robotics and more.

The NVIDIA H100 is designed for performance as well as energy efficiency. H100-accelerated servers, when connected with NVIDIA networking across thousands of servers in hyperscale data centers, can be 300x more energy efficient than CPU-only servers.

“NVIDIA Hopper GPUs are built to solve the world’s tough challenges, delivering accelerated computing with greater energy efficiency and improved performance, while adding scale and lowering costs,” said Ian Buck, vice president of hyperscale and high performance computing at NVIDIA. “With Meta sharing the H100-powered Grand Teton platform, system builders around the world will soon have access to an open design for large-scale data center compute infrastructure to supercharge AI across all sectors.”

A mountain of a machine

Grand Teton sports 2x the network bandwidth and 4x the bandwidth between host processors and GPU accelerators of Meta’s previous Zion system, the company said.

The added network bandwidth allows Meta to create larger clusters of systems to train AI models, Bjorlin said. It also packs more memory than Zion to store and run larger AI models.

Simplified deployment, increased reliability

Consolidating all of this functionality into a single integrated server “dramatically simplifies systems deployment, allowing us to install and provision our fleet much faster and increasing reliability,” Bjorlin said.
