
3D NAND Can't Change the Laws of Physics

By Gary Hilson

With Optane mothballed and emerging memories still emerging, the gap between 3D NAND flash and DRAM persists. New architectures enabled by Compute Express Link (CXL) may negate the need to fill it, but can flash be optimized to make the gap smaller?

There’s only so much that can be done with the NAND flash itself, but interfaces like Non-Volatile Memory Express (NVMe) and the solid-state drive (SSD) that encloses the flash can deliver the extra performance and efficiency that might allow NAND to make gains.

Kioxia is one company that is looking to advance 3D NAND flash so it can gain ground on DRAM. The company’s XL-flash is an extremely low-latency, high-performance flash memory based on Kioxia’s BiCS technology, specifically aimed at addressing the performance gap between existing volatile memories and flash memory.

In an exclusive interview with EE Times, Kioxia America executive VP and CMO Scott Nelson said XL-flash falls into the category of storage-class memory or persistent memory. Emerging memories like magnetoresistive random-access memory (MRAM), resistive random-access memory (ReRAM) and phase-change memory (PCM/PCRAM) are all seen as falling under this umbrella, with the latter being the basis for 3D Xpoint/Intel Optane. The challenge has been that they have not been able to cost-effectively fill the gap, let alone catch up to DRAM.

Nelson said the candidates for filling this storage-layer gap, including Optane, have been too expensive. “Optane wasn’t very scalable,” he said, noting that scalability and price need to intersect to bridge the performance gap between TLC 3D NAND and DRAM.

In the meantime, there has been innovation around 3D NAND, not the least of which is the increasing number of layers.

In late 2020, Micron Technology announced it had leapfrogged others in the industry with its 176-layer 3D NAND flash memory. The company abandoned the floating gate in favor of a charge-trap approach and combined it with its CMOS-under-array (CMA) architecture, which enables Micron to improve performance and density. By spring 2022, Micron had announced that its 232-layer 3D NAND flash would be available in 2023.

Micron’s proprietary CMA technique constructs the multilayered stack over the chip’s logic, packing more memory into a tighter space and shrinking die size, yielding more gigabytes per wafer. (Source: Micron Technology)

Samsung, meanwhile, recently announced it was readying its 300-layer NAND for production—its ninth-generation 3D NAND—employing a double-stack architecture with a projected release sometime next year. The company implemented this technique in 2020 for its seventh-generation, 176-layer 3D NAND chip. Meanwhile, SK Hynix is believed to be using a triple-stack design for its forthcoming 321-layer 3D NAND devices set for mass production in early 2025.

NAND advances go beyond taller layer stacks

“Everyone is fixated on the number of layers because it’s an easy way to define generations,” Nelson said. But lateral density is just as important because more layers add cost, he said. “For Kioxia, the number of layers is not as important as the lateral scaling to minimize the cost.”

That’s where architecture comes into play, Nelson said. Kioxia’s CMOS directly Bonded to Array (CBA) architecture involves the production of a 3D NAND cell array and I/O CMOS on separate wafers, using optimal production nodes. He said this approach maximizes the bit density of the memory array and I/O performance because the CMOS circuitry is separated from the NAND array—each can be optimized on its own merit.

Kioxia’s CBA architecture involves the production of a 3D NAND cell array and I/O CMOS on separate wafers using optimal production nodes, which maximizes the bit density of the memory array and I/O performance because the CMOS circuitry is separated from the NAND array. (Source: Kioxia)

XL-flash represents an attempt to make NAND higher-performance than standard NAND, and its latency is on the order of 10× faster than that of regular NAND, Nelson said. “We’re talking 5 to 10 µs as a read latency, compared with 50 µs for standard TLC.”
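A quick back-of-the-envelope check of those quoted figures (taking the low end of XL-flash’s range) shows the order-of-magnitude difference in reads a single sequential stream can complete:

```python
# Back-of-the-envelope check of the read latencies quoted in the article
# (assumed values: 5 µs low end for XL-flash, 50 µs for standard TLC).
XL_FLASH_READ_US = 5.0
TLC_READ_US = 50.0

speedup = TLC_READ_US / XL_FLASH_READ_US      # latency ratio
xl_reads_per_ms = 1_000 / XL_FLASH_READ_US    # sequential reads per millisecond
tlc_reads_per_ms = 1_000 / TLC_READ_US

print(speedup, xl_reads_per_ms, tlc_reads_per_ms)  # 10.0 200.0 20.0
```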

Kioxia, along with Western Digital, announced its eighth-generation BiCS 3D NAND memory with 218 active layers, which employs the CBA architecture and lateral shrink technology to increase bit density. Nelson said Kioxia has been focused on lateral scalability since its sixth-generation 3D NAND to differentiate its design approach and bring a more cost-effective solution to market.

NAND has always been optimized for cost and has evolved from floating-gate technology because it couldn’t be shrunk any further, Nelson said. And as 3D NAND has matured, SLC and MLC are going by the wayside, with TLC now dominating. “There are multiple versions of TLC cells today,” Nelson said.

There is a lot of work being done with QLC, which is denser than TLC, he added, and even on penta-level cell (PLC), which sits at the low-cost end. But although QLC SSDs offer higher density at lower cost, they don’t perform as well, are more error-prone and don’t last as long as SSDs that use more expensive TLC NAND.
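The density/robustness trade-off follows directly from how many voltage states each cell must distinguish: every extra bit per cell doubles the state count, which narrows the margins between states. A minimal sketch:

```python
# Each additional bit per cell doubles the number of voltage states the
# sense circuitry must distinguish, which narrows the margins between
# states -- hence denser cells are slower, more error-prone and less durable.
for name, bits in [("SLC", 1), ("MLC", 2), ("TLC", 3), ("QLC", 4), ("PLC", 5)]:
    states = 2 ** bits
    print(f"{name}: {bits} bit(s)/cell -> {states} voltage states")
```

This is why QLC’s 16 states cost endurance and error rate relative to TLC’s 8, even though QLC stores a third more data in the same cell.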

Samsung’s version of a low-latency NAND was dubbed Z-NAND, which the company hasn’t been all that vocal about recently, Nelson said.

In an email interview, a Samsung representative said that one of the solutions to extending and advancing its V-NAND would be to create stacks exceeding 1,000 layers, and the company envisions stacking 1,000-plus layers by 2030. “In order to do so, however, we must overcome a number of technological challenges, including etching limitations that come from higher channel holes and cell current control,” the representative said.

The company is also working to enhance structural stability through innovative process technologies, as well as height control, while it adds on more cell layers. “In addition to various efforts at the hardware level, we’re looking into improving our software solutions as well, including I/O control, to maximize overall V-NAND performance,” the representative said.

Samsung’s eighth-generation V-NAND achieved high bit density through a Cell-on-Peri (COP) structure, which the company introduced with the previous generation. (Source: Samsung)

Samsung said that the industry is approaching an inflection point that requires disruptive innovation on multiple fronts if NAND is to meet the needs of future storage solutions.

Different vendors have taken different approaches with 3D NAND, Jim Handy, principal analyst with Objective Analysis, told EE Times in an exclusive interview.

“There are certain things that Micron has done better than Samsung,” he said. “There are a lot of things that Samsung has done better than anybody else, too. Everybody seems to be going off in their own area of specialty.”

Micron’s big advancements use CMA and string stacking to put 32 layers on top of 32 to get 64, Handy said. “Samsung has been trying very hard not to do either one of those technologies. Since that was the direction they chose to take, they got really, really good at making the layers thinner than anybody else knew how.”

AI could circumvent limitations

Ultimately, despite doing different things, the major NAND makers are coalescing around the same process, Handy said.

But despite all the advancements, NAND’s write speed will hold it back from significantly closing the gap with DRAM or reaching Optane performance. It boils down to quantum mechanics, which keeps flash write speeds at tens of milliseconds while DRAM writes in tens of nanoseconds, Handy said. “It’s a million-to-one ratio difference in speed.”
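Taking “tens of milliseconds” and “tens of nanoseconds” at representative values of 10 ms and 10 ns (assumed round numbers, not measured figures), the ratio Handy cites falls out directly:

```python
# Representative round numbers for the rough figures Handy cites.
flash_write_s = 10e-3   # ~10 ms for the flash program/erase path
dram_write_s = 10e-9    # ~10 ns for a DRAM write

ratio = flash_write_s / dram_write_s
print(f"{ratio:,.0f} : 1")  # 1,000,000 : 1
```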

That limitation will keep NAND flash from filling the gap, he said, but for high-read workloads, there are all sorts of tricks that can be done.

One trick might be to use AI to better manage the NAND when it’s in an SSD. Microchip Technology’s flash controllers are embedded with a machine-learning engine to help extend the life of the NAND and improve the bit error rate.

In an exclusive interview, Ranya Daas of Microchip’s data center solutions business unit said running algorithms in the background adds overhead because it requires processing power. However, she said, machine learning allows the behavior of the NAND cells to be learned so the controller can optimize the read reference voltage and reduce the number of reads and retries. “You know exactly which reference voltage to go and read right from the first time.”

Daas said there are opportunities to extend the life of the NAND flash and reduce latency without adding background processing that must be done in real time.
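As an illustration of the idea — a hypothetical sketch, not Microchip’s actual engine; the wear/voltage sample data and the linear model are invented for the example — a controller could learn how the optimal read reference voltage drifts with block wear, so the first read attempt lands on the right voltage instead of stepping through retries:

```python
# Hypothetical sketch: learn the read-reference-voltage offset as a function
# of block wear (program/erase cycles), so reads start at the right voltage.
# The training pairs below are invented for illustration.
pe_cycles    = [100, 1000, 3000, 5000]        # wear level of sampled blocks
best_vref_mv = [0.0, -40.0, -110.0, -190.0]   # offset that minimized bit errors

# Closed-form least-squares fit of a straight line (no libraries needed).
n = len(pe_cycles)
mean_x = sum(pe_cycles) / n
mean_y = sum(best_vref_mv) / n
slope = (sum((x - mean_x) * (y - mean_y)
             for x, y in zip(pe_cycles, best_vref_mv))
         / sum((x - mean_x) ** 2 for x in pe_cycles))
intercept = mean_y - slope * mean_x

def predict_vref_mv(pe: int) -> float:
    """First-attempt read reference offset for a block at this wear level."""
    return slope * pe + intercept
```

A real controller would track more features (retention time, temperature, read disturb) with a richer model, but the payoff is the one Daas describes: the first read uses the right reference voltage, cutting retry loops.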

SSD maker Phison Electronics is also exploiting AI to improve how the flash performs inside a drive.

“One thing that you cannot do is overcome the intrinsic latency of flash,” Phison CTO Sebastien Jean said in an exclusive interview with EE Times. “It has the latency structure that it has. In any realistic workload with any realistic amount of data, you can’t possibly cache enough of it to statistically make a difference.”

Phison’s Sebastien Jean

In addition to its fourth-generation LDPC ECC engine, Phison is focused on the pain points that can be improved with AI, Jean said. Its Imagin+ customization and design service includes AI computational models and AI service solutions to help the company’s customers design and engineer custom flash deployments.

Imagin+ works with Phison products optimized for aiDAPTIV+ AI/ML workloads. aiDAPTIV+ integrates SSDs into the AI computing framework to improve overall operational performance and efficiency for the AI hardware architecture. It structurally divides large-scale AI models and runs the model parameters with SSD offload support.

One of the challenges raised by AI workloads is that current AI models run primarily on GPUs and DRAM, but the growth rate of models will far exceed the capacity that GPUs and DRAM can provide. Phison’s approach is designed to maximize the executable AI models within the limited GPU and DRAM resources.
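The capacity pressure is easy to see with hypothetical numbers — the model size and memory capacities below are assumptions for illustration, not Phison figures:

```python
# Hypothetical capacity math: how much of a large model's weights spill to
# SSD once GPU memory and DRAM are exhausted. All figures are assumed.
BYTES_PER_PARAM = 2          # fp16 weights
model_params = 70e9          # e.g. a 70B-parameter model
gpu_gb, dram_gb = 24.0, 64.0 # assumed available GPU memory and DRAM

model_gb = model_params * BYTES_PER_PARAM / 1e9   # total weight footprint
ssd_gb = max(0.0, model_gb - gpu_gb - dram_gb)    # remainder offloaded to SSD

print(model_gb, ssd_gb)  # 140.0 52.0
```

Under these assumptions, more than a third of the model’s weights have nowhere to live but the SSD, which is the niche SSD-offload schemes target.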

In a sense, AI is enabling flash to better handle AI.

Jean said one way it can be used is for hot/cold mapping. In the early days of flash storage array adoption, companies had to decide what data was important enough to be stored on faster flash rather than a slower spinning disk. He said that by improving hot/cold detection mapping, the life of the drive can be extended, latency reduced and tighter performance maintained throughout the entire read/write cycle.

But doing this mapping has limitations algorithmically in a shared tenancy environment, Jean said. “Algorithmic functions don’t work because the patterns are much too chaotic to be detectable, but machine learning works well there.”
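A frequency-counting heuristic of the kind Jean contrasts with machine learning might look like the sketch below (the block names and threshold are invented). It works when access patterns are stable; under the chaotic patterns of a shared-tenancy environment, such simple counters break down, which is where ML comes in:

```python
# Minimal hot/cold classification by access frequency (illustrative only).
from collections import Counter

# Invented access trace over logical blocks in some observation window.
accesses = ["blk3", "blk1", "blk3", "blk7", "blk3", "blk1", "blk9", "blk3"]
counts = Counter(accesses)

HOT_THRESHOLD = 2  # assumed cutoff: more than 2 touches in the window = hot
hot = {blk for blk, n in counts.items() if n > HOT_THRESHOLD}
cold = set(counts) - hot

print(sorted(hot), sorted(cold))  # ['blk3'] ['blk1', 'blk7', 'blk9']
```

Hot blocks would then be placed in faster or younger flash, while cold blocks migrate to denser, cheaper storage.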

Smarter drives improve flash performance

Another way to improve flash, Jean said, is to support SSD functions beyond storage, whether computational storage or offload, which can take on tasks from applications rather than just letting them interact with passive drives. “It’s not sexy or exciting, but it’s actually extremely useful.”

In some cases, he said, it makes sense for applications to reside on the SSDs and run intelligent algorithms directly on the drive.

Jean said none of these SSD improvements reaches Optane-level performance. “But it allows SSDs to participate more in this growing ecosystem.”

Running simple computations on the drives frees CPUs and GPUs for smarter tasks, he said, noting that the approach also reduces the I/O time needed to move data back and forth.

Getting higher performance out of NAND, it appears, comes down to the drives.

In an exclusive interview with EE Times, Macronix’s Jim Yastic said most of the topics at the 2022 Flash Memory Summit revolved around SSDs and architectural changes that reduce packaging and back-end costs. Even as more layers are added, he said, that is where most of the fundamental advantages lie. “A lot of the memory management has been moved into these SSDs.”

Other approaches that don’t involve modifying the NAND itself include architectural changes to computing environments, such as CXL, which improves memory and storage availability. CXL creates opportunities to improve NAND flash utilization because it aims to better allocate compute resources where they are needed and to reduce data movement.

“This is a multidimensional problem that needs to be addressed,” Yastic said.

Because 3D NAND is a mature technology, as are SSDs, Yastic said, it comes down to price and profit: increasing the value proposition while keeping up with changing architectural trends.

One trend, he said, is the push to improve performance while cutting costs, and the obvious way to achieve that is to reduce the amount of memory required. By filling the gap with persistent memory below DRAM, Yastic said, Intel’s Optane had the potential to do exactly that, because it could serve both needs at once.

Optane was a great concept, he added, and it represents the industry’s long-term direction, “but in the short term, economic forces still drive architects’ choices.”

Optane (and the underlying 3D XPoint technology) was ultimately abandoned because no amount of performance can change the economics; it simply was not a cost-effective way to fill the gap.

Handy of Objective Analysis said that even as 3D NAND adds all these layers, its cost structure has to keep coming down so the technology remains profitable. Whether the answer is less DRAM and more NAND, or the reverse, or a storage-class memory between the two that reduces both DRAM and flash, whichever wins on cost-performance will achieve the goal, he said. “If faster NAND is sold at a low enough price, it will be very welcome.”