Surging demand for high bandwidth memory (HBM) is impacting the cost and availability of DRAM memory chips and NAND storage drives, which is driving up costs for everything from notebook PCs to AI data center deployments. The situation, which stems from a confluence of events that could take years to overcome, is also changing how HPC and AI teams architect their storage clusters.
If you’re in the market for a new laptop computer this year, be prepared to shell out hundreds of dollars extra, as the prices for standard DDR4 and DDR5 RAM kits have increased 200% or more over the past six months. Prices for NVMe drives, which are manufactured in the same facilities as DRAM and HBM, have also increased by more than 100% in some cases. The cost of HBM, which is used in AI clusters built around GPUs from Nvidia and AMD, has also gone up.
There are several underlying causes of the price hikes and shortages. The primary cause is the prioritization of HBM production capacity by Samsung Electronics, SK Hynix, and Micron Technology to satisfy surging demand from AI data centers. Manufacturing HBM requires about 3x the wafer capacity of standard DRAM per gigabyte, according to Micron. The switch to HBM is leaving less cleanroom capacity available for lower-margin products, like standard DRAM and the NAND used in NVMe drives.
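Micron's 3x figure implies a simple trade-off: every gigabyte of HBM shipped consumes wafer capacity that could otherwise have produced roughly three gigabytes of conventional DRAM. A back-of-the-envelope sketch (the per-wafer yield below is a made-up round number, not a Micron figure):

```python
# Back-of-the-envelope only: the ~3x wafer ratio is Micron's figure as
# quoted above; GB_PER_DRAM_WAFER is a hypothetical round number.

WAFER_RATIO = 3.0              # HBM needs ~3x the wafers of DRAM per GB
GB_PER_DRAM_WAFER = 1000.0     # hypothetical DRAM output per wafer

def forgone_dram_gb(hbm_gb: float) -> float:
    """Conventional DRAM gigabytes given up to ship `hbm_gb` of HBM."""
    wafers_diverted = hbm_gb * WAFER_RATIO / GB_PER_DRAM_WAFER
    return wafers_diverted * GB_PER_DRAM_WAFER

# Every gigabyte of HBM displaces roughly three gigabytes of DRAM.
print(forgone_dram_gb(100.0))
```

The point of the sketch is that the per-wafer yield cancels out: whatever a fab's output, shifting wafers to HBM shrinks conventional DRAM supply by the wafer ratio.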
A related cause is constrained memory and NVMe production capacity stemming from earlier oversupply and a manufacturing pullback in 2022 and 2023. Manufacturers had expanded capacity following the surge in demand for small electronics during the COVID pandemic, but the demand didn’t last, leading vendors to curtail output to keep prices from collapsing.
Price for 64GB DDR5 DIMMs (Source: Counterpoint Research)
Another factor is Nvidia’s decision to switch to LPDDR (low-power double data rate) memory, which has traditionally been used primarily in smartphones and other handheld devices. That switch, which adds demand equivalent to that of a major smartphone maker, represents “a seismic shift for the supply chain which can’t easily absorb this scale of demand,” Counterpoint Research Director MS Hwang said last November.
That pullback in memory and NVMe manufacturing capacity, combined with the AI-driven demand boom for HBM and the shift to LPDDR memory, is now coming back to haunt the chip manufacturers. While building a memory or NVMe factory is not as expensive or time-consuming as, say, building a chip fabrication plant using the latest process, it nevertheless requires months of planning and years to build.
The shortage has been building for months, and manufacturers are taking steps to alleviate it. Micron, for instance, broke ground on its new ID1 facility in Idaho three years ago, but the plant is not slated to come online until the middle of 2027. And it won’t be until 2028 that the ID1 facility meaningfully impacts the NAND supply crunch, according to Micron VP of Marketing, Mobile and Client Business Unit Christopher Moore.
“The entire industry is short,” Moore told tech news outlet Wccftech in a recent article. “This is not a Micron issue, it’s an industry issue, where us and our peers or our competitors are all rushing to service these segments as much as we can, and there’s just not enough supply to go around.”
While the NAND makers are scrambling to build new production lines, manufacturing process constraints are working against them, Moore said. The result is that the DRAM shortage could persist until after the AI boom starts fading. “It’s a really unfortunate situation,” he added. “But I think it’s really important for people to understand we are still servicing the consumer market.”
The shortages are starting to impact HPC and AI customers, said VAST Data VP of GTM Execution Phil Manez.
“The tone of the customer has gone very quickly from super opinionated [on what type of NVMe storage they want] to ‘What can you get me that will work in a decent lead time?’” he said. “We’re seeing aggressive buying and customers calling around asking for massive quantities of capacity.”
Sales of NVMe drives, such as this SSD from Micron, are projected to grow from $115 billion globally in 2025 to $405 billion in 2030
The shortage is not causing customers to pull back on their decision to build large AI and HPC clusters, as spending on storage historically has been a fraction of spending on other components, notably processors and GPUs, not to mention skilled people and their salaries, Manez said. However, a shortage of the NAND wafers that become the storage media in NVMe drives is leading customers to consider changing how they’re architecting their storage clusters.
“There are certain drive sizes that are more in demand, less in demand,” Manez told HPCwire. “We’re having conversations with our customers around ‘Originally you were potentially looking at this drive capacity. How do you feel about reformatting your clusters or re-architecting systems for what we can get our hands on?’”
VAST Data’s software-defined storage architecture is designed to run on NVMe flash storage, specifically QLC (Quad-Level Cell), the densest and most cost-efficient type of flash storage, although it can run on other types of NVMe drives. In some respects, it’s being impacted more than parallel storage vendors who have the capability to run entirely, or in part, on traditional spinning disk, although HDDs have their own supply issues, Manez pointed out.
However, VAST is touting its overall storage efficiency as a competitive advantage during the NAND shortage.
“If you look at our efficiency capabilities, we have the most efficient erasure coding on the planet,” Manez said. “So when you build out a VAST cluster, our erasure coding overhead is under 3% at scale, so that means I’m getting a lot more of my capacity that’s turning from raw to usable. We also have the most advanced data reduction on the planet. So we use a combination of deduplication, compression and something called similarity-based data reduction.”
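Manez’s raw-to-usable claim can be sanity-checked with simple arithmetic. The sketch below uses hypothetical round numbers alongside the ~3% erasure-coding overhead and the roughly 2:1 (50%) overall data reduction quoted in this article, and shows how the two factors compound:

```python
# Hypothetical arithmetic only: the ~3% erasure-coding overhead and
# 2:1 overall data reduction mirror the figures quoted in the article;
# the 1 PB cluster size is a made-up round number.

def usable_capacity(raw_tb: float, ec_overhead: float) -> float:
    """Capacity left after erasure-coding parity is subtracted."""
    return raw_tb * (1.0 - ec_overhead)

def effective_capacity(usable_tb: float, reduction_ratio: float) -> float:
    """Logical data that fits after data reduction (2.0 means 2:1)."""
    return usable_tb * reduction_ratio

raw = 1000.0                                  # 1 PB of raw flash
usable = usable_capacity(raw, 0.03)           # ~970 TB after parity
effective = effective_capacity(usable, 2.0)   # ~1.94 PB of logical data

print(f"usable: {usable:.0f} TB, effective: {effective:.0f} TB")
```

Under those assumptions, a cluster holds nearly twice its raw flash capacity in logical data, which is why vendors lean on efficiency claims when drives are scarce.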
Similarity-based data reduction uses hashing algorithms to find pieces of data that are similar, but not identical, to pieces already stored. Instead of storing both pieces in their entirety, VAST computes the difference between them and stores that instead. According to Manez, this technique alone can compress data by up to 25%. When combined with other forms of data reduction, including compression and classic data deduplication, VAST touts a 50% overall reduction in raw data sizes.
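The idea can be illustrated with a toy sketch: find a previously stored block similar to an incoming one, then store only the compressed difference against it. Every name and technique choice below is hypothetical; production systems use far more sophisticated similarity hashes and delta encoders than this crude signature and XOR.

```python
# Toy sketch of similarity-based data reduction (all names hypothetical).
import zlib

store: dict[int, bytes] = {}   # similarity signature -> reference block

def signature(block: bytes) -> int:
    # Crude similarity signature: hash a coarse "shape" of the block
    # (sample every 16th byte, keep only the high nibble). Production
    # systems use locality-sensitive hashing instead.
    return hash(bytes(b & 0xF0 for b in block[::16]))

def delta(ref: bytes, block: bytes) -> bytes:
    # Byte-wise XOR against the reference; similar blocks yield
    # mostly-zero deltas, which compress extremely well.
    return bytes(a ^ b for a, b in zip(ref, block))

def store_block(block: bytes) -> bytes:
    sig = signature(block)
    if sig in store:                   # a similar block is already stored
        return zlib.compress(delta(store[sig], block))
    store[sig] = block                 # first of its kind: keep it whole
    return zlib.compress(block)

base = bytes(range(256)) * 16          # a 4 KB block
near = bytearray(base); near[100] ^= 1 # an almost-identical block
full = store_block(base)               # stored as a whole (compressed)
tiny = store_block(bytes(near))        # stored as a near-zero delta
print(len(full), len(tiny))            # the delta form is far smaller
```

The payoff is that the second, nearly identical block costs only a few bytes on disk, since its delta against the reference is almost all zeros.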
HPC and AI storage provider DDN is also adapting to the NAND shortage. The Chatsworth, California-based company points to its ability to use a combination of NVMe drives, HDDs, and older SATA-based SSDs, which are slower and offer less capacity than NVMe.
“AI depends on increasingly fast, scalable, and cost-efficient data infrastructure, and the global NAND shortage makes traditional flash-dependent architectures unsustainable,” Alex Bouzari, CEO and Co-Founder of DDN, said in a December press release. “DDN’s EXAScaler and Infinia platforms allow customers to achieve full GPU performance using any storage media, protecting their budgets, supply chains, and AI roadmaps for years to come.”
Demand for super-dense HBM stacks is outstripping supply (Source: Rambus)
The company says this flexibility to use a variety of drive types helps to reduce risk compared to storage vendors that are dependent on NVMe drives. “With these innovations, organizations can achieve the same or better AI outcomes while reducing high-end SSD requirements by 35–65%, cutting total storage CAPEX by 40–70%, and lowering OPEX by 30–60%,” the company said.
Pure Storage recommends that customers and prospects stay in close communication with their storage providers to determine how the NAND shortage will impact their projects.
“Nobody has a crystal ball, but at the moment, it doesn’t look like NAND pricing is going to be dropping anytime soon,” the company wrote in a recent blog post. “Vendors with disciplined supply chain practices can buffer some volatility, but no one is fully insulated from global trends.”
Pure says customers should map out their expected capacity growth and engage their storage vendors early and often. Buying early will help customers lock in their orders and secure the capacity they need, Pure says. It also recommends considering other consumption models, such as storage-as-a-service.
“The best approach for IT leaders isn’t urgency or panic—it’s clarity, planning, and partnering with vendors who have the supply chain resilience to deliver consistently even in volatile environments,” it says.
One vendor that sees a silver lining in the memory and NAND shortage cloud is Hailo, which develops AI accelerators for edge use cases. The company says its Hailo-8 and Hailo-8L chips remove the dependency on DRAM entirely.
“By keeping the full inference pipeline on-chip, Hailo-8/8L eliminate the most expensive and supply-constrained component in the system,” Hailo CTO Avi Baum tells HPCwire via email. “In practical terms, avoiding DRAM can reduce bill of materials by up to $100 per device, while also improving power efficiency and system reliability.”
This article first appeared on HPCwire.
The post How the Memory Shortage Is Impacting AI and HPC Projects appeared first on AIwire.

