Stream Processor Architecture



Media processing applications, such as three-dimensional graphics, video compression, and image processing, currently demand 10 to 100 billion operations per second of sustained computation. Fortunately, hundreds of arithmetic units can easily fit on a modestly sized 1 cm² chip in modern VLSI. The challenge is to supply these arithmetic units with enough data to meet the computation demands of media processing applications. Conventional storage hierarchies, which frequently include caches, are unable to bridge the data bandwidth gap between modern DRAM and tens to hundreds of arithmetic units. A data bandwidth hierarchy, however, can bridge this gap by scaling the provided bandwidth across the levels of the storage hierarchy.
The stream programming model enables media processing applications to exploit a data bandwidth hierarchy effectively. Media processing applications can naturally be expressed as a sequence of computation kernels that operate on data streams. This programming model exposes the locality and concurrency inherent in these applications and enables them to be mapped efficiently to the data bandwidth hierarchy. Stream programs are able to utilize inexpensive local data bandwidth when possible and consume expensive global data bandwidth only when necessary.
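As a rough illustration of "a sequence of computation kernels that operate on data streams", the following minimal Python sketch chains two kernels with generators. It is not Imagine code and not taken from the book: the kernel names (`kernel_scale`, `kernel_clamp`), the `run_pipeline` helper, and the sample values are all hypothetical, chosen only to show the shape of the model.

```python
from typing import Iterable, Iterator, List


def kernel_scale(stream: Iterable[float], gain: float) -> Iterator[float]:
    """Hypothetical kernel: multiply each stream element by a constant."""
    for x in stream:
        # 'x' and the product are kernel-local temporaries; they never
        # need to be written back to a shared store.
        yield x * gain


def kernel_clamp(stream: Iterable[float], lo: float, hi: float) -> Iterator[float]:
    """Hypothetical kernel: clamp each stream element into [lo, hi]."""
    for x in stream:
        yield min(max(x, lo), hi)


def run_pipeline(source: Iterable[float]) -> List[float]:
    """Chain the kernels so each element flows from one to the next.

    The intermediate stream 'scaled' is consumed element by element and
    is never materialized as a full list, loosely mirroring the idea of
    passing a stream between kernels without a round trip to memory.
    """
    scaled = kernel_scale(source, 2.0)
    clamped = kernel_clamp(scaled, 0.0, 5.0)
    return list(clamped)  # only the final stream is stored


if __name__ == "__main__":
    print(run_pipeline([0.5, 1.0, 3.0, -1.0]))
```

In this sketch the per-element temporaries stand in for local data, and the generator handed from one kernel to the next stands in for an intermediate stream. Only the final result is collected into a list.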
Stream Processor Architecture presents the architecture of the Imagine streaming media processor, which delivers a peak performance of 20 billion floating-point operations per second. Imagine efficiently supports 48 arithmetic units with a three-tiered data bandwidth hierarchy. At the base of the hierarchy, the streaming memory system employs memory access scheduling to maximize the sustained bandwidth of external DRAM. At the center of the hierarchy, the global stream register file enables streams of data to be recirculated directly from one computation kernel to the next without returning data to memory. Finally, local distributed register files that directly feed the arithmetic units enable temporary data to be stored locally so that it does not need to consume costly global register bandwidth. The bandwidth hierarchy enables Imagine to achieve up to 96% of the performance of a stream processor with infinite bandwidth from memory and the global register file.

Product details

  • Hardback | 120 pages
  • 155 x 235 x 9.65mm | 830g
  • Dordrecht, Netherlands
  • English
  • 2001 ed.
  • XIV, 120 p.
  • 0792375459
  • 9780792375456

Table of contents

Foreword. Acknowledgements.
1: Introduction. 1.1. Stream Architecture. 1.2. The Imagine Media Processor. 1.3. Contributions. 1.4. Overview.
2: Background. 2.1. Special-purpose Media Processors. 2.2. Programmable Media Processors. 2.3. Vector Processors. 2.4. Stream Processors. 2.5. Storage Hierarchy. 2.6. DRAM Access Scheduling. 2.7. Summary.
3: Media Processing Applications. 3.1. Media Processing. 3.2. Sample Applications. 3.3. Application Characteristics.
4: The Imagine Stream Processor. 4.1. Stream Processing. 4.2. Architecture. 4.3. Programming Model. 4.4. Implementation. 4.5. Scalability and Extensibility.
5: Data Bandwidth Hierarchy. 5.1. Overview. 5.2. Communication Bottlenecks. 5.3. Register Organization. 5.4. Evaluation. 5.5. Summary.
6: Memory Access Scheduling. 6.1. Overview. 6.2. Modern DRAM. 6.3. Memory Access Scheduling. 6.4. Evaluation. 6.5. Summary.
7: Conclusions. 7.1. Imagine Summary. 7.2. Future Architectures.
References. Index.

Review quote

'I can recommend Stream Processor Architecture to every engineer and researcher interested in this subject. The book should be interesting to anyone working on processing of data, where latency and accuracy are less important than speed.'
IEEE Communications December 2003