18.4. Case Studies 863
When the unified shaders have executed a pixel shader program for a
64-entry vector, they “export” the pixel shader outputs, which typically
consist of color. Two quads (2 × 2 × 2 pixels) can be handled per cycle. Depth can
also be exported (but this is seldom done), which costs an extra cycle. The
backend central groups pixel quads and reorders them to best utilize the
eDRAM bandwidth. Note that at this point, we have “looped through”
the entire architecture. The execution starts when triangles are read from
memory, and the unified shader pipes execute a vertex shader program for
the vertices. Then, the resulting information is rerouted back to the setup
units and then injected into the unified shader pipes, in order to execute a
pixel shader program. Finally, pixel output (color + depth) is ready to be
sent to the frame buffer.
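
The export rate quoted above implies a simple cycle count. The following is a minimal sketch, assuming that each entry of the 64-entry vector corresponds to one pixel (the passage does not say so explicitly); the function and constant names are illustrative only. The extra cycle for depth export is left out, since the text does not specify its granularity.

```python
# Rough cycle count for exporting the colors of one shader vector.
# ASSUMPTION: one vector entry corresponds to one pixel.
PIXELS_PER_QUAD = 2 * 2      # a quad is a 2 x 2 block of pixels
QUADS_PER_CYCLE = 2          # "two quads ... per cycle" from the text


def color_export_cycles(vector_entries=64):
    """Return the number of cycles needed to export `vector_entries` colors."""
    pixels_per_cycle = PIXELS_PER_QUAD * QUADS_PER_CYCLE
    # Ceiling division: a partially filled cycle still costs a full cycle.
    return -(-vector_entries // pixels_per_cycle)
```

Under these assumptions, a full 64-entry vector exports its colors at eight pixels per cycle.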
The daughter chip of the Xbox 360 performs merging operations, and
the available bandwidth between the GPU and this chip is 32 GB/s. Eight
pixels, each with four samples,
can be sent per clock cycle. A sample stores a representation of a 32-bit
color and uses lossless Z-compression for depth. Alternatively, 16 pixels can
be handled if only depth is used. This can be useful for shadow volumes
and when rendering out depth only, e.g., for shadow mapping or as a pre-
pass for deferred shading. In the daughter chip, there is extra logic (called
“AZ” in Figure 18.15) for handling alpha blending, stencil testing, and
depth testing. Once this has been done, 256 GB/s is available
directly to the DRAM in the daughter chip.
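
The two bandwidth figures can be cross-checked with a little arithmetic. The sketch below is one plausible accounting, not a statement of how the hardware is specified: it assumes a 500 MHz clock, 4 bytes of color plus 4 bytes of uncompressed depth per sample, and one read and one write per sample, none of which is stated in this passage. All names are illustrative.

```python
# Back-of-the-envelope check of the bandwidth figures quoted in the text.
# ASSUMPTIONS (not from this passage): 500 MHz clock, 4 bytes color and
# 4 bytes depth per sample, and a read plus a write for every sample.
CLOCK_HZ = 500_000_000
PIXELS_PER_CLOCK = 8         # from the text
SAMPLES_PER_PIXEL = 4        # from the text
BYTES_PER_SAMPLE = 4 + 4     # color + depth (assumed, uncompressed)
ACCESSES_PER_SAMPLE = 2      # read-modify-write (assumed)


def edram_bandwidth_gb_per_s():
    """Internal bandwidth needed to merge 8 pixels x 4 samples every clock."""
    bytes_per_clock = (PIXELS_PER_CLOCK * SAMPLES_PER_PIXEL
                       * BYTES_PER_SAMPLE * ACCESSES_PER_SAMPLE)
    return bytes_per_clock * CLOCK_HZ / 1e9


def link_bytes_per_clock(link_gb_per_s=32):
    """Bytes that a 32 GB/s link can carry per clock at the assumed rate."""
    return link_gb_per_s * 1e9 / CLOCK_HZ
```

With these assumptions, the merge traffic comes to 512 bytes per clock, which matches the 256 GB/s figure, while the 32 GB/s link carries 64 bytes per clock.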
As an example of the great advantage of this design, we consider alpha
blending. Usually, one first needs to read the colors from the color buffer,
then blend the current color (generated by a pixel shader), and then send
it back to the color buffer. With the Xbox 360 architecture, the generated
color is sent to the daughter chip, and the rest of the communication is
handled inside that chip. Since depth compression is also performed
in the daughter chip, similar advantages accrue for depth.
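
The saving described above can be illustrated with a toy traffic model. This is a sketch only: it counts bytes crossing the link between the GPU and the frame buffer for 32-bit colors, ignores depth and compression, and uses the common "over" blend as a stand-in for whatever blend mode is active. The function names are hypothetical.

```python
# Toy model of link traffic for alpha blending.
# Only bytes crossing the GPU <-> frame-buffer link are counted.
BYTES_PER_COLOR = 4  # 32-bit color, as in the text


def blend_over(src, dst, alpha):
    """'Over' blend of a single channel value (illustrative blend mode)."""
    return alpha * src + (1.0 - alpha) * dst


def conventional_traffic(num_pixels):
    """Read the destination color, blend on the GPU, write the result back."""
    return num_pixels * (BYTES_PER_COLOR + BYTES_PER_COLOR)


def daughter_chip_traffic(num_pixels):
    """Send only the shaded color; read-blend-write stays inside the chip."""
    return num_pixels * BYTES_PER_COLOR
```

In this simplified model, blending inside the daughter chip halves the color traffic over the link, since the read of the destination color never leaves the chip.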
When a frame has been rendered, all the samples of the pixels reside
in the eDRAM in the daughter chip. The back buffer needs to be sent to
main memory, so it can be displayed onscreen. For multi-sampled buffers,
the downsampling is done by the AZ unit in the daughter chip, before the
result is sent over the bus to main memory.
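
The downsampling step can be sketched as follows. The text does not say which filter the AZ unit applies; this example assumes a plain box filter (an unweighted average of the samples), and the buffer layout and function names are illustrative.

```python
# Minimal sketch of resolving a multi-sampled color buffer.
# ASSUMPTION: box filter, i.e., each pixel is the mean of its samples.


def resolve_pixel(samples):
    """Average a pixel's samples; each sample is an (r, g, b) tuple."""
    n = len(samples)
    return tuple(sum(s[c] for s in samples) / n for c in range(3))


def resolve_buffer(msaa_buffer):
    """Resolve a 2D buffer whose entries are lists of per-pixel samples."""
    return [[resolve_pixel(pixel) for pixel in row] for row in msaa_buffer]
```

Resolving before the transfer means that only one color per pixel, rather than four samples, has to cross the bus to main memory.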
18.4.2 Case Study: The PLAYSTATION® 3 System
The PLAYSTATION 3 system⁸ is a game system built by Sony Computer
Entertainment. The architecture of the PLAYSTATION 3 system can be
seen in Figure 18.17. The Cell Broadband Engine™ was developed by

⁸ “PlayStation,” “PLAYSTATION,” and the “PS” Family logo are registered
trademarks and “Cell Broadband Engine” is a trademark of Sony Computer
Entertainment Inc. The “Blu-ray Disc” name and logo are trademarks.
