Ray tracing is the reason animated films look so good these days and why games still have a way to go when it comes to realism, even with powerful graphics processing units (GPUs). The problem for gamers is that GPUs are rasterization engines, and most games are tuned for that. Cinema-quality animated movies tend to be created using large “render farms” that are network clusters.
Caustic Graphics is looking to change that landscape by replacing the render farms with workstations running its CausticTwo boards. The faster CausticTwo boards, which are replacing the CausticOne boards, work in conjunction with a GPU to optimize the rendering of ray-traced images (Fig. 1). Essentially, they convert a 3D environment created with a 3D content creation tool like Blender into a 2D rendition. They are designed to accelerate the processing of effects such as soft shadows, glossy reflections, and, of course, caustics (reflected and refracted light).
Typically, high-resolution rendering took hours or days depending upon the size of the video files and the level of quality. Caustic Graphics looks to turn this into a real-time process, significantly changing how artists approach the problem. This is akin to the changes in 3D CAD when workstation GPU boards made real-time 3D manipulation a reality. Accelerated ray-traced rendering will help in this environment as well since realistic presentations of objects or architectural designs are common requirements.
The ray-trace optimization initially targeted higher-performance floating point since that was a major issue in ray-trace computations. Multicore solutions like GPUs can bring massive floating-point processing to bear, but there is a problem. Most GPUs and even CPUs like to work using single-instruction multiple-data (SIMD) operations. GPU systems like Nvidia’s Tesla are built on blocks of eight and scale up from there (see “SIMT Architecture Delivers Double-Precision Teraflops” at www.electronicdesign.com, ED Online 19280). The Nvidia GPU has 240 cores.
This basic SIMD approach works well for rasterizing 3D models where the kinds of computations are going to be the same, and this holds true for the initial step from a light source in a ray-tracing algorithm. The problem occurs when the light hits an object and is dispersed in different directions. This changes the calculations since each vector is processed differently. Caustic Graphics finds similar calculations and groups them together so GPU engines can operate on them in parallel. It allows tens of thousands of rays to be processed efficiently in parallel.
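The grouping idea can be illustrated with a minimal sketch. It is not Caustic Graphics' algorithm, which the company has not detailed here. It simply assumes that rays pointing in roughly the same direction need similar calculations, and buckets them by the sign of each direction component (the "octant"). The names `octant` and `group_rays` and the sample rays are hypothetical.

```python
from collections import defaultdict

def octant(direction):
    """Return a 3-bit key: one bit per axis, set when that component is negative."""
    dx, dy, dz = direction
    return int(dx < 0) | (int(dy < 0) << 1) | (int(dz < 0) << 2)

def group_rays(rays):
    """Bucket (origin, direction) rays by direction octant.

    Assumption for illustration only: rays in the same octant are
    'similar enough' to be processed together as one SIMD batch.
    """
    buckets = defaultdict(list)
    for ray in rays:
        buckets[octant(ray[1])].append(ray)
    return dict(buckets)

# Hypothetical secondary rays scattered after hitting a surface.
rays = [
    ((0, 0, 0), (1.0, 0.2, 0.3)),
    ((0, 0, 0), (-0.5, 0.1, 0.9)),
    ((1, 0, 0), (0.7, 0.4, 0.1)),
    ((0, 1, 0), (-0.2, -0.3, 0.8)),
]
batches = group_rays(rays)
for key, batch in sorted(batches.items()):
    print(f"octant {key}: {len(batch)} ray(s)")
```

Each bucket could then be handed to the parallel hardware as one batch. A production scheduler would likely also consider where the rays are in space, not only where they point.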
THE DATABASE ENGINE
Caustic Graphics has developed extensions to OpenGL ES 2.0 to address its approach to ray tracing. A typical configuration incorporates a CPU, a GPU, and a CausticTwo board (Fig. 2). An application interfaces with a Caustic Graphics OpenGL implementation called CausticGL, which in turn manages the CausticTwo board and the GPU through the GPU’s standard OpenGL driver. The coordination occurs in the CPU. The CausticGL driver can fall back to a software implementation if the CausticTwo hardware is not available, but it will be slower.
The CausticTwo board handles the ray intersection tests and schedules the rays that have locality of reference in 3D space to enable the efficient shading of a ray’s color information by the GPU. The GPU is shading while the CausticTwo performs ray intersection tests, other database queries, and scheduling.
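For readers unfamiliar with what a ray intersection test involves, the textbook ray-sphere case is sketched below. This is a generic illustration, not the CausticTwo's internal method. The function name `ray_sphere_hit` is hypothetical, and the direction vector is assumed to be unit length.

```python
import math

def ray_sphere_hit(origin, direction, center, radius):
    """Return the distance along the ray to the nearest hit, or None on a miss.

    Assumes `direction` is normalized, so the quadratic's leading
    coefficient is 1 and the half-b form of the solution applies.
    """
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * x for d, x in zip(direction, oc))
    c = sum(x * x for x in oc) - radius * radius
    disc = b * b - c
    if disc < 0:
        return None              # the ray's line never touches the sphere
    root = math.sqrt(disc)
    t = -b - root                # nearer of the two roots
    if t < 0:
        t = -b + root            # origin is inside the sphere
    return t if t >= 0 else None

hit = ray_sphere_hit((0, 0, 0), (0, 0, 1), (0, 0, 5), 1.0)
miss = ray_sphere_hit((0, 0, 0), (0, 1, 0), (0, 0, 5), 1.0)
print(hit, miss)  # 4.0 None
```

A full renderer runs tests like this against every candidate object for every ray, which is why offloading them to dedicated hardware while the GPU shades is attractive.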
The GPU is used for the final rendering and can handle multiple frames. This is typically done for real-time rendering to provide double buffering. For offline rendering, only a single frame needs to be in flight at a time, which potentially frees more computational resources for that frame, speeding up the process or allowing a higher-quality rendition.
The GPU includes environment-related information to assist in the rendering. Likewise, some of this information is replicated in the database processing unit (DBPU). This lets the driver query the database and determine what calculations can be grouped together and performed on the GPU. In turn, this permits the GPU to perform ray-tracing support operations with rasterization efficiency that would otherwise be impossible.
Essentially, Caustic Graphics has created a sophisticated content-addressable memory. It can handle multiple queries in parallel and support multiple databases. In this case, it would have one database per frame. It also can handle ray-tracing-style queries such as ray queries and photon kNN (k nearest neighbors).
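A photon kNN query asks for the k stored photons closest to a given point. The brute-force version below is only a sketch of what the query means. Dedicated hardware or a spatial index would answer it far more efficiently. The function name and the sample photon positions are hypothetical.

```python
import heapq

def k_nearest_photons(photons, point, k):
    """Return the k photon positions closest to `point` (brute force)."""
    def dist2(p):
        # Squared distance preserves ordering, so the square root is skipped.
        return sum((a - b) ** 2 for a, b in zip(p, point))
    return heapq.nsmallest(k, photons, key=dist2)

# Hypothetical photon positions stored for one frame.
photons = [(0, 0, 0), (1, 0, 0), (5, 5, 5), (0, 2, 0)]
nearest = k_nearest_photons(photons, (0, 0, 0), 2)
print(nearest)  # [(0, 0, 0), (1, 0, 0)]
```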
The system has been optimized for its target application, where the data defining a frame is fixed and many queries run against it. The database does not change after it is set up, and it is discarded when the process is done. The drivers and hardware can handle more than one database being active at a time, so double buffering is part of the equation making real-time rendering possible.
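The build-once, query-many, then-discard pattern with two active databases can be sketched in software as follows. This is an illustration of the general double-buffering idea under assumed names (`DoubleBufferedScenes`, `build_next`, `swap`, `query`), not a description of the CausticGL driver.

```python
from types import MappingProxyType

class DoubleBufferedScenes:
    """Two slots: one read-only database being queried, one being built."""

    def __init__(self):
        self._slots = [None, None]
        self._active = 0

    def build_next(self, frame_id, objects):
        # Freeze the frame data so queries cannot modify it.
        self._slots[1 - self._active] = MappingProxyType(
            {"frame": frame_id, "objects": tuple(objects)}
        )

    def swap(self):
        # Discard the finished database and promote the newly built one.
        self._slots[self._active] = None
        self._active = 1 - self._active

    def query(self, predicate):
        # Read-only query against the active frame's database.
        db = self._slots[self._active]
        if db is None:
            return []
        return [obj for obj in db["objects"] if predicate(obj)]

buf = DoubleBufferedScenes()
buf.build_next(1, ["sphere", "plane"])
buf.swap()                              # frame 1 becomes queryable
buf.build_next(2, ["cube"])             # frame 2 is built in the background
first = buf.query(lambda o: o.startswith("s"))
buf.swap()                              # frame 1 discarded, frame 2 active
second = buf.query(lambda o: True)
print(first, second)  # ['sphere'] ['cube']
```

Because a database never changes once built, queries against it need no locking, which is part of what makes running many of them in parallel practical.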
The CausticTwo board has an x16 PCI Express interface. Multiple boards in a system are supported. Different boards could handle queries on the same database if the database is replicated on each board. Another approach would have different boards handle different frames. The bottom line is to keep all processing units active. Idle units do not accelerate anything.
MORE THAN A RAY OF TRUTH
Caustic Graphics is targeting the high-end graphics-rendering market, which will keep the company’s hands full. So, it’s unlikely that other applications will be able to take advantage of the database engine in the near term. Still, some academic research will occur, and some determined developers may be able to take advantage of the CausticTwo’s performance.
Applications that could take advantage of the CausticTwo include those where database queries do not change the database itself. This parallels the opening of the GPU in recent years where GPUs changed from black boxes behind a driver into more generic supercomputer platforms. In the meantime, a horde of media moguls is lusting after the CausticTwo.