A Systolic Array Parallelizing Compiler

By (author) Ping-Sheng Tseng

Description

Widespread use of parallel processing will become a reality only if the process of porting applications to parallel computers can be largely automated. Usually it is straightforward for a user to determine how an application can be mapped onto a parallel machine; however, the actual development of parallel code, if done by hand, is typically difficult and time consuming. Parallelizing compilers, which can generate parallel code automatically, are therefore a key technology for parallel processing. In this book, Ping-Sheng Tseng describes a parallelizing compiler for systolic arrays, called AL. Although parallelizing compilers are quite common for shared-memory parallel machines, the AL compiler is one of the first working parallelizing compilers for distributed-memory machines, of which systolic arrays are a special case. The AL compiler takes advantage of the fine-grain, high-bandwidth interprocessor communication capabilities of a systolic architecture to generate efficient parallel code. While capable of handling an important class of applications, AL is not intended to be a general-purpose parallelizing compiler.
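The core technique the book develops, distributing loop iterations over the cells of a systolic array with neighbor-to-neighbor communication (chapter 4 of the contents below), can be made concrete with a short sketch. The following C program is a hypothetical illustration only, not AL syntax or the book's actual scheme: it block-distributes a one-dimensional relaxation loop across a simulated linear array of cells, and reads across block boundaries stand in for intercell communication. All names and constants (NCELLS, CHUNK, the sweep count) are invented for the example.

    /* Hypothetical sketch (not AL code): block loop distribution over a
     * linear array of cells.  Each cell owns a contiguous slice of the
     * array and, per sweep, needs one value from each neighbor; those
     * neighbor values appear here as reads of a[lo-1] and a[hi].  Cells
     * are simulated sequentially so the example runs anywhere. */
    #include <stdio.h>

    #define N      16           /* global array length (assumed)      */
    #define NCELLS 4            /* cells in the systolic array        */
    #define CHUNK  (N / NCELLS) /* iterations assigned to each cell   */

    int main(void) {
        double a[N], b[N];
        /* Fixed boundary values of 1.0, interior initialized to 0.0. */
        for (int i = 0; i < N; i++) a[i] = (i == 0 || i == N - 1) ? 1.0 : 0.0;

        for (int sweep = 0; sweep < 100; sweep++) {
            /* Each "cell" c updates only its own block [lo, hi). */
            for (int c = 0; c < NCELLS; c++) {
                int lo = c * CHUNK, hi = lo + CHUNK;
                for (int i = lo; i < hi; i++) {
                    if (i == 0 || i == N - 1) { b[i] = a[i]; continue; }
                    b[i] = 0.5 * (a[i - 1] + a[i + 1]);
                }
            }
            for (int i = 0; i < N; i++) a[i] = b[i];
        }
        for (int i = 0; i < N; i++) printf("%.3f ", a[i]);
        printf("\n");
        return 0;
    }

On a real systolic machine such as Warp, the cells would run their blocks concurrently and the boundary reads would become explicit sends and receives over the array's communication links; the fine-grain, high-bandwidth links are what make such per-sweep exchanges cheap.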

Product details

  • Format: Hardback | 130 pages
  • Dimensions: 162.6 x 236.2 x 17.8 mm | 362.88 g
  • Publication place: Dordrecht, Netherlands
  • Language: English
  • Edition: 1990 ed.
  • Pagination: XVIII, 130 p.
  • ISBN-10: 0792391225
  • ISBN-13: 9780792391227

Table of contents

1 Introduction
2 Systolic array programming
  2.1 The Warp machine
  2.2 The W2 programming language
  2.3 The AL programming language
  2.4 Related work
3 Data relations
  3.1 Linear data relations
  3.2 Joint data compatibility classes
  3.3 Scope of data compatibility classes
  3.4 Summary
4 Loop Distribution
  4.1 Intercell communication
  4.2 The basic loop distribution scheme
  4.3 Distributed loop parallelism
    4.3.1 Intraloop parallelism
    4.3.2 Interloop parallelism
  4.4 Optimization
    4.4.1 Load balancing
    4.4.2 Communication scheduling
  4.5 Related work
5 Implementation
  5.1 External interface
  5.2 Compiling DO* loops
  5.3 The ALIGN* statement
  5.4 Parallel accumulation
  5.5 Program debugging
6 Evaluation
  6.1 Matrix computations
    6.1.1 LU decomposition
    6.1.2 QR decomposition
    6.1.3 Singular value decomposition
  6.2 2D Fast Fourier Transform
  6.3 Partial differential equation solvers
    6.3.1 SOR
    6.3.2 Line SOR
    6.3.3 Two-color SOR
  6.4 Summary
7 Conclusions
A Linear data relations in Livermore Loops
B Benchmark programs
  B.1 LU decomposition
    B.1.1 Single cell
    B.1.2 Multiple cells
  B.2 QR decomposition
    B.2.1 Single cell
    B.2.2 Multiple cells
  B.3 Singular value decomposition
    B.3.1 Single cell
    B.3.2 Multiple cell
  B.4 2D Fast Fourier Transform
    B.4.1 Single cell
    B.4.2 Multiple cell
  B.5 Partial differential equation solvers
    B.5.1 SOR
    B.5.2 Line SOR
    B.5.3 Two-color SOR