Electronics Components World

Featured Article - Improving Memory Management In Multi-function Embedded Devices By Stephen Harris and David Cross, Cypress Semiconductor

Publication date: 09 December 2008


Consumer embedded devices are becoming increasingly loaded with a range of diverse software applications and device drivers due to end user multi-tasking demands.

The most apparent example of this is the mobile handset, which consolidates the functionality of a portable media player, digital still camera, portable navigation system, and web terminal. Despite the load this already imposes, designers of consumer embedded devices are being tasked with adding even more functionality in response to consumer demand for more multimedia.

The accumulating requirements and resultant software overhead are manageable in many fixed-function devices, but become a serious bottleneck when legacy memory architectures from these devices are used as a blueprint for their multi-function descendants.

Many key players in the mobile device industry have opted to use MLC NAND Flash (with current costs at roughly $2.00 USD/GByte) in their products, instead of less cost-effective flash technologies such as NOR and SLC NAND. While this is the most economical high-density storage solution, MLC NAND is difficult to manage effectively while maintaining high throughput.

The most basic MLC NAND controller must at least encapsulate the following functionality:

1) Error Correction

2) Bad Block Management, and

3) Wear Leveling.

Error correction requires either the use of a software algorithm that reads all incoming and outgoing data, placing the processor in the data path, or a dedicated hardware ECC engine. When a soft algorithm is used, all error correction work must also be done in software. In the case where a dedicated ECC engine is implemented, there are at least two approaches that are possible.

The first is to implement an ECC engine which does both error correction and error detection. This implementation is less common on embedded processors due to the complexity and inflexibility of such a design.

As MLC NAND technology continues to develop, the number of bit errors caused by geometry-related issues increases, which in turn calls for higher orders of bit error correction. Due to this and other factors, inflexible hardware ECC designs for embedded processors, which evolve more slowly than NAND technology, quickly become obsolete and often do not justify their up-front development costs.

The second, more common, approach is to implement the ECC engine as an ECC calculation or error detection mechanism. This approach relies on software to actually perform the error correction for read data pages and retrieve the ECC from the engine to write it to NAND for written pages. As is the case with a soft implementation, this results in the embedded processor becoming a part of the data path for all incoming and outgoing data between the NAND and destination peripherals.
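The division of labor in this second approach can be sketched in a few lines of Python. This is a minimal illustration only: the function names are hypothetical, the code shown is a single-bit-correcting Hamming-style scheme that is far weaker than what MLC NAND actually requires, and the stored ECC is assumed to be error free. compute_ecc stands in for the calculation a hardware engine would perform, while correct_page is the work left to software on every page read.

```python
def compute_ecc(page: bytes) -> int:
    """Stand-in for the ECC engine: XOR the 1-based positions of all set bits."""
    ecc = 0
    for byte_index, byte in enumerate(page):
        for bit in range(8):
            if byte & (1 << bit):
                ecc ^= byte_index * 8 + bit + 1
    return ecc


def correct_page(page: bytes, stored_ecc: int) -> bytes:
    """Software correction step: repair at most one flipped data bit.

    Assumes the stored ECC itself is intact. Two or more flipped bits are
    NOT detected reliably by this toy code and may be miscorrected.
    """
    syndrome = compute_ecc(page) ^ stored_ecc
    if syndrome == 0:
        return page  # no error detected
    position = syndrome - 1  # 0-based index of the flipped bit
    if position >= len(page) * 8:
        raise ValueError("uncorrectable error")
    fixed = bytearray(page)
    fixed[position // 8] ^= 1 << (position % 8)
    return bytes(fixed)


# Write path: the ECC is calculated and stored alongside the page.
page = b"NAND page"
ecc = compute_ecc(page)

# Read path: one bit has flipped in the array; software repairs it.
corrupted = bytearray(page)
corrupted[3] ^= 0x10
recovered = correct_page(bytes(corrupted), ecc)
```

Even in this toy form, every byte of every page passes through the processor on the read path, which is the data path cost described above.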

A basic MLC NAND controller is also required to manage bad blocks. This involves correctly interpreting blocks marked by the manufacturer as bad, as well as identifying blocks that have gone bad through repeated usage. This management is almost always implemented in software, which creates extra load for a multi-purpose applications processor.
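As a rough sketch of what this software bookkeeping involves, the following Python class tracks both kinds of bad block. The class and method names are hypothetical, and the marker convention (any spare-area marker byte other than 0xFF flags a factory-bad block) is an assumption for illustration; the actual convention is defined by each NAND manufacturer.

```python
FACTORY_GOOD_MARKER = 0xFF  # assumption: any other marker value means "bad"


class BadBlockTable:
    """Tracks factory-marked bad blocks plus blocks that fail in the field."""

    def __init__(self, factory_markers):
        # factory_markers[i] is the marker byte read from block i's spare area.
        self.total = len(factory_markers)
        self.bad = {
            block
            for block, marker in enumerate(factory_markers)
            if marker != FACTORY_GOOD_MARKER
        }

    def mark_bad(self, block):
        """Record a block that failed at run time (e.g. a failed erase or program)."""
        self.bad.add(block)

    def is_usable(self, block):
        return 0 <= block < self.total and block not in self.bad

    def usable_blocks(self):
        return [block for block in range(self.total) if block not in self.bad]


# Block 1 was marked bad at the factory; block 2 later wears out in use.
table = BadBlockTable([0xFF, 0x00, 0xFF, 0xFF])
initially_usable = table.usable_blocks()
table.mark_bad(2)
```

Every read, write and erase has to consult a structure like this, which is part of the extra load the applications processor carries.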

As an additional part of bad block management, NAND controllers need to implement a wear-leveling algorithm in order to reduce the number of blocks that become bad from over-usage. Depending on the methods used to implement this algorithm, this requires that large data structures be maintained in volatile memory or read from the nonvolatile storage in order to track usage statistics.
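One simple way to picture such an algorithm is an allocator that always hands out the free block with the fewest erase cycles. The Python sketch below is an assumption-laden illustration, not a description of any particular controller: the names are hypothetical, and real implementations also deal with static data, power loss and persistence of the counters.

```python
class WearLeveler:
    """Allocates the least-erased free block to spread wear across the NAND."""

    def __init__(self, num_blocks):
        # One counter per block: this is the usage data structure that must
        # live in volatile memory or be reloaded from non-volatile storage.
        self.erase_counts = [0] * num_blocks
        self.free = set(range(num_blocks))

    def allocate(self):
        """Return the free block with the fewest erases (ties: lowest number)."""
        if not self.free:
            raise RuntimeError("no free blocks")
        block = min(self.free, key=lambda b: (self.erase_counts[b], b))
        self.free.remove(block)
        return block

    def release(self, block):
        """Erase a block and return it to the free pool."""
        self.erase_counts[block] += 1
        self.free.add(block)


leveler = WearLeveler(3)
first = leveler.allocate()   # block 0: all counts equal, lowest number wins
leveler.release(first)       # block 0 has now been erased once
second = leveler.allocate()  # block 1: block 0 is skipped as more worn
```

Note that the erase_counts list grows with the number of blocks, so the memory cost of the statistics rises with NAND capacity.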

The requirement for NAND controllers to manage bad blocks and spread writes throughout the NAND also necessitates that the physical block locations where data is actually written to the NAND be different from the logical block locations seen by a file system.

This logical to physical mapping must be maintained by the software and imposes additional memory requirements for data structures, as well as processor cycles for their maintenance. These requirements scale proportionally to the size of the attached NAND device or devices used.
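A minimal sketch of such a mapping, together with a back-of-the-envelope estimate of its size, is given below in Python. All names are hypothetical, and the 512 KByte block size and 4-byte table entry are assumptions chosen only to make the arithmetic concrete.

```python
class BlockMap:
    """Maps the logical blocks a file system sees to physical NAND blocks."""

    def __init__(self, num_physical):
        self.l2p = {}                          # logical -> physical
        self.free = list(range(num_physical))  # physical blocks not in use

    def write(self, logical):
        """Redirect a logical block to a fresh physical block.

        The previously mapped physical block, if any, is recycled.
        Raises IndexError when no free physical block remains.
        """
        new_physical = self.free.pop(0)
        old_physical = self.l2p.get(logical)
        self.l2p[logical] = new_physical
        if old_physical is not None:
            self.free.append(old_physical)
        return new_physical

    def lookup(self, logical):
        return self.l2p[logical]


def table_bytes(capacity_bytes, block_bytes, entry_bytes=4):
    """Estimate mapping table size: one entry per physical block (assumed)."""
    return (capacity_bytes // block_bytes) * entry_bytes


block_map = BlockMap(3)
block_map.write(7)             # logical block 7 lands in physical block 0
rewritten = block_map.write(7)  # a rewrite moves it to physical block 1

GIB = 2 ** 30
size_4gb = table_bytes(4 * GIB, 512 * 1024)
size_8gb = table_bytes(8 * GIB, 512 * 1024)
```

Under these assumptions, doubling the NAND capacity doubles the table size, which is the proportional scaling noted above. A finer-grained, page-level mapping would multiply the figures further.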

The combined impact of these software requirements for memory management can be a significant reduction in performance. However, this is not always the case for fixed-function devices. In the latest iPod Nano (3rd Generation), for example, the Apple engineers were able to achieve above-average performance during data transfer between a PC host and their device (~10 MBps), an industry usage model commonly referred to as “sideloading”.

Teardown analysis demonstrates that there are essentially two ways that their engineering team was able to attain this performance. The first is by dedicating much of their application processor’s bandwidth to the sideloading activity. The iPod is unusable as a music player while sideloading is ongoing. The second is to use large file system caches in the 256Mb DDR SDRAM and report back to the PC host that data is written long before it actually hits non-volatile memory.
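The second technique is a form of write-back caching, which can be sketched as follows. This Python fragment is an illustration of the general idea only, with hypothetical names and a plain dictionary standing in for the NAND; it is not a description of Apple's implementation.

```python
class WriteBackCache:
    """Acknowledges writes once they are in RAM; NAND is updated later."""

    def __init__(self, backing_store):
        self.backing_store = backing_store  # dict standing in for the NAND
        self.dirty = {}                     # sectors buffered in RAM only

    def write(self, sector, data):
        """Buffer the data and report success to the host immediately."""
        self.dirty[sector] = data
        return True

    def read(self, sector):
        """Serve the newest copy: RAM first, then the backing store."""
        if sector in self.dirty:
            return self.dirty[sector]
        return self.backing_store.get(sector)

    def flush(self):
        """Commit all buffered sectors to the backing store."""
        self.backing_store.update(self.dirty)
        self.dirty.clear()


nand = {}
cache = WriteBackCache(nand)
acknowledged = cache.write(0, b"song")  # host is told the write succeeded
nand_before_flush = dict(nand)          # nothing has reached the NAND yet
cache.flush()                           # the slow NAND write happens here
```

The trade-off is that buffered data is lost if power is removed before the flush, and the larger the cache, the more RAM is tied up that other tasks cannot use.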

The latest iPod is therefore a good example of a fixed function device that can achieve high performance despite memory management overhead, at least so long as it is not being used for other purposes in parallel. It also demonstrates the level of expertise that the Apple engineers have with respect to MLC NAND management.

Given this internal Apple capability, one would expect that the lessons about memory management learned with the iPod Nano, a product whose sideloading performance has increased with every subsequent generation, would have been applied to their groundbreaking iPhone. However, benchmarking the sideloading performance of the iPhone shows a greater than 70% reduction in throughput compared with the iPod Nano.

A closer look at the architecture of these devices reveals several similarities in their design. Both use high-performance Samsung processors. In the case of the iPhone, Engadget indicates that the processor can be clocked up to 700 MHz and contains an eight-stage pipeline.

Although the exact specifications of the Samsung processor in the 3rd Generation iPod Nano are unclear, it is reasonable to conjecture that Apple would not use a less powerful processor for a device that contains many additional features and greater functionality. Indeed, Apple is using four times as much DDR SDRAM in the iPhone (1Gb), and this memory is stacked in a single package with the applications processor. Both designs also have large MLC NAND chips attached, of comparable capacity.

The capacity similarities indicate similar memory requirements for NAND management data structures. However, the fact that multiple software tasks are ongoing likely means that less volatile memory is available for file system caching. This may be a factor, despite the fact that the iPhone has approximately four times as much DDR SDRAM as the 4GB Nano. The performance loss therefore cannot be directly attributed to reduced processor capacity or smaller volatile memory sizes.

While it cannot be determined for certain what the root cause of the loss in performance is without thorough implementation details of the respective designs, a high level look at the differences between the two devices seems to strongly indicate where the primary bottleneck is: the processor in the multi-functional iPhone is tasked to handle more operations in parallel.

The phone functionality and the user interface remain active as data is transferred to the device. Unlike the iPod, which sideloads data without distraction by design, the iPhone has no such luxury. Rather, it is required to handle more complex applications, multi-task, and devote its attention to multiple interrupt sources. It therefore has fewer cycles to burn on memory management.

This example seemingly demonstrates the inherent limitations of throwing large, power-hungry processors at performance problems, at least with respect to battery-dependent mobile devices. This approach delivers less performance than other solutions that exist on the market. It also requires device manufacturers to have a great deal of NAND expertise, expertise that took an innovative giant such as Apple several device iterations to perfect.

These types of inherent limitations can be remedied by using a fundamentally different mobile architecture called West Bridge, an architecture that was developed for the embedded world and patterned after PC manufacturers’ use of Northbridge and Southbridge chips.

Diagram 1 (Figures 1a and 1b): Embedded Architecture Patterning After PC

This bridge-type architecture enables high-performance peripherals to be integrated into existing mobile platforms quickly and effectively. New technology is integrated into these peripheral controller chipsets at a rate much faster than is possible for embedded processors.

For the memory management scenario discussed previously, a device such as Cypress Semiconductor’s West Bridge Astoria product integrates peripheral control with all of the functionality that is required for MLC NAND management.

Diagram 2 (Figures 2a and 2b): Improving Performance - West Bridge Type Architecture

The many CPU cycles that were previously burned on the central processor for this memory management can now be eliminated. This new architecture also enables MLC NAND performance to be deterministic rather than a function of the number of features on a given device.

The overall industry trend is toward greater complexity in effective memory management. Using the fundamental architectures of today to solve the increasingly complex problems of tomorrow is likely to limit the potential performance of future multi-function embedded devices.

David Cross (odc@cypress.com) is a staff applications engineer in the Data Communications Division of Cypress Semiconductor. He holds a bachelor's degree in Electrical Engineering from Marquette and is a graduate student at Stanford.

Stephen Harris (esh@cypress.com) is a senior product-marketing engineer in the Data Communications Division of Cypress Semiconductor, where he works on market research, product definition and business development. He holds a bachelor's degree in Electrical Engineering from the University of Colorado.

For Further Information, Please Visit http://www.cypress.com
