Pixomatic 2.0 is now available, with a full DX7-class feature set!
Pixomatic 2.0 is the just-released major revision of our industry-best software
rasterizer, written by Michael Abrash and Mike Sartain. It's directly
comparable to a high-end DX7-class graphics accelerator in terms of features, and
performs high-quality rendering on any x86-compatible Windows or Linux machine that
supports MMX. It does two-texture multitexture, Gouraud, specular, fog, alpha
testing, alpha blending, 16- and 24-bit z, bilinear filtering, texture transforms,
and projected textures. It handles transformation, clipping, and projection of
trilists, tristrips, trifans, quadlists, polygons, pointsprites, linelists, and
linestrips, drawn through begin/end primitives or indexed or non-indexed streams.
It performs perspective-correct rasterization, with per-triangle mipmapping,
subpixel and subtexel accuracy, and 32-bit color depth. It clears, fills, copies,
stretches, antialiases, and dithers. And while it's nowhere near as fast as the
latest consoles and 3D cards, it's fast enough to run Quake II at 28 fps on a
PIII/733 MHz, 67 fps on a P4/2.2 GHz, and 108 fps on a P4/3.3 GHz.
Pixomatic does not support cubemaps, programmable pixel shaders, per-pixel
mipmapping, or trilinear filtering, because those features don't lend themselves well
to efficient implementation in software. Pixomatic also does not provide APIs for
lighting or programmable vertex shaders, as these can be done just as efficiently -
and considerably more flexibly - by the game itself rather than by Pixomatic.
(Pixomatic provides a per-vertex user callback to make it easy for the game to implement both features.)
You can find a more complete list on the Pixomatic features page, but the above should be
enough to give you an idea of what Pixomatic can and can't do.
Given that 3D hardware is commonplace nowadays, what value does Pixomatic
deliver? It can add value in two main ways. The first is by providing
reliable and consistent technology for games that need or could use 3D, but
don't require complex pixel shaders, huge numbers of triangles, or massive fill rate,
and want to avoid the complications and costs of dealing with PC hardware. While
that may sound modest, in fact the benefits in terms of time and money saved,
resources freed, and potential increase in market size can be substantial, as
discussed in detail on the "Why Pixomatic?" page. The second way in which
Pixomatic can add value is by serving as a fallback renderer
for games that require hardware to look their best, but would still like to be able to run
on every machine regardless of hardware acceleration, in order to broaden their market
and reduce technical support costs and returns.
Let's take a closer look at how Pixomatic works and why we designed it that
way.
We knew from the start that it had to be easy to write games on top of
Pixomatic, or to port games to it; thus, Pixomatic's feature set maps closely to
that of hardware that's in the same performance ballpark. For efficiency,
Pixomatic uses a custom API, but one that is designed for easy porting from
industry standards. Pixomatic has none of the quirks you might expect from a software
rasterizer, and places no unusual demands on your game design; there's no
retaining of world information, no BSP trees or span lists or edge lists or
any of the many clever tricks software rasterizers have relied on in the past.
Pixomatic is a very straightforward implementation of the classic 3D
pipeline: polygons in any of several forms go in one end, and, after z and
stencil testing, the rasterized, pixel-shaded result comes out the other end
and is written to the frame buffer. All the rasterization features work together
in almost any combination (the only exception is that no frame buffer or z buffer
drawing can occur when the stencil buffer is being written to). If you're
familiar with either OpenGL or DX, you'll have no problem learning to use
Pixomatic.
Our second requirement was to produce the highest-quality results possible,
within the constraints of ease of use and the performance limitations
of software rasterization. We did this by implementing our core feature set
with a minimum of 8 bits for each color component throughout the pipeline,
doing subpixel rasterization and homogeneous clipping, and doing
perspective-correct, subtexel-accurate texture mapping. Then we added all
the rendering features we could get to run fast enough, still with the same
color depth and accuracy: dot3 per-pixel lighting; antialiasing; stencil
shadows; optional 24-bit z; and bilinear filtering, plus two faster filters
that work well for light maps.
Our final objective was, of course, performance. It's not all that hard to
write a software rasterizer that produces high-quality output; the hard part
is writing one that does it fast. We knew from the start that performance
would be our biggest challenge with Pixomatic, so we designed Pixomatic from
scratch for the best possible performance on PIII- and P4-class machines,
using every optimization technique we could think of that wouldn't degrade
rendering quality.
At the heart of Pixomatic's performance and quality is what we call the
welder, the software that compiles the pixel pipeline on the fly whenever
the rasterization state changes, producing code equivalent to hand-tuned
assembly language. (In fact, it effectively is hand-tuned assembly code; the
compilation involves intelligent stitching together and fixing up of
hand-optimized code fragments.) The welded pixel pipeline uses all 8 MMX
registers and all 8 general-purpose registers to keep dynamic variables
in registers at almost all times (there are a few save/restore cases when
Gouraud, specular, and both textures are in use together with bilinear
filtering). The texel lookup itself requires a mere 5 instructions, thanks
to careful use of MMX. Only one branch - the loop branch - is performed per
pixel, apart from the z, stencil, and alpha tests, if they're enabled. The
pipeline early-outs on z or stencil failure.
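To make the idea of stitching together and fixing up code fragments more concrete, here is a toy sketch in C. It is not Pixomatic's welder: the `Fragment` type, the `STATE_ADD_ENABLED` flag, and the `weld` function are hypothetical names invented for illustration. The three fragments are trivial x86 instructions (`mov eax, imm32`, `add eax, imm32`, `ret`) standing in for hand-optimized pipeline stages, and the welded buffer is only assembled here, never executed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A pre-assembled code fragment, plus the location of one 32-bit
   immediate that must be patched ("fixed up") at weld time. */
typedef struct {
    const uint8_t *bytes;
    size_t len;
    int fixup_off;   /* byte offset of the immediate, or -1 if none */
} Fragment;

/* Toy stand-ins for hand-optimized pipeline stages (hypothetical). */
static const uint8_t load_bytes[] = {0xB8, 0, 0, 0, 0};  /* mov eax, imm32 */
static const uint8_t add_bytes[]  = {0x05, 0, 0, 0, 0};  /* add eax, imm32 */
static const uint8_t ret_bytes[]  = {0xC3};              /* ret */

static const Fragment FRAG_LOAD = {load_bytes, sizeof load_bytes, 1};
static const Fragment FRAG_ADD  = {add_bytes,  sizeof add_bytes,  1};
static const Fragment FRAG_RET  = {ret_bytes,  sizeof ret_bytes, -1};

/* Hypothetical state bit: is the optional "add" stage enabled? */
enum { STATE_ADD_ENABLED = 1 };

/* Copy one fragment to the end of the output and patch its immediate
   (little-endian). Returns the new length, or 0 if it does not fit. */
static size_t append(uint8_t *out, size_t used, size_t cap,
                     const Fragment *f, uint32_t imm)
{
    if (used + f->len > cap)
        return 0;
    memcpy(out + used, f->bytes, f->len);
    if (f->fixup_off >= 0) {
        uint8_t *p = out + used + f->fixup_off;
        p[0] = (uint8_t)(imm);
        p[1] = (uint8_t)(imm >> 8);
        p[2] = (uint8_t)(imm >> 16);
        p[3] = (uint8_t)(imm >> 24);
    }
    return used + f->len;
}

/* Build a routine for the given state: only the fragments the state
   needs are emitted, so disabled stages cost nothing at run time.
   Returns the number of bytes written, or 0 on overflow. */
size_t weld(unsigned state, uint32_t base, uint32_t addend,
            uint8_t *out, size_t cap)
{
    assert(out != NULL);
    size_t n = append(out, 0, cap, &FRAG_LOAD, base);
    if (n && (state & STATE_ADD_ENABLED))
        n = append(out, n, cap, &FRAG_ADD, addend);
    if (n)
        n = append(out, n, cap, &FRAG_RET, 0);
    return n;
}
```

A real implementation would also need to emit into executable memory and re-weld only when the rasterization state changes; both are omitted from this sketch.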
Moving up a level, the triangle pipeline, which drives the pixel pipeline,
works by generating a list of spans to be drawn for each triangle; each span
is no longer than 16 pixels, with floating-point perspective-correct texture
calculations performed at each end. The pixel pipeline then draws each span
in turn, interpolating linearly from one end to the other, producing results
indistinguishable from performing the perspective divide for every pixel,
but with much better performance. The span generator automatically uses SSE
or 3DNow if either is present, and the SSE version is written entirely in
hand-tuned assembly language, with 7 general-purpose registers, 8 MMX
registers, and 6 XMM registers in use simultaneously. Z prefetching is used
to reduce effective memory latency when prefetch instructions are available.
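The span scheme described above can be sketched in plain C as follows. This is an illustration of the general technique rather than Pixomatic's code: the `EdgeAttr` type and the `interpolate_scanline` function are hypothetical, and the sketch assumes that u/w, v/w, and 1/w are already known at both ends of a scanline. The perspective divide is done only at span ends, and texture coordinates are interpolated linearly within each span of at most 16 pixels.

```c
#include <assert.h>
#include <stddef.h>

#define SPAN_LEN 16   /* maximum span length in pixels, as in the text */

/* Texture attributes at one end of a scanline, pre-divided by w.
   These quantities vary linearly in screen space. */
typedef struct {
    float u_over_w;
    float v_over_w;
    float one_over_w;
} EdgeAttr;

/* Compute u,v for pixels 0..count-1, where `left` is sampled at pixel 0
   and `right` at pixel `count`. One perspective divide per span end. */
void interpolate_scanline(EdgeAttr left, EdgeAttr right, int count,
                          float *out_u, float *out_v)
{
    assert(count > 0 && out_u != NULL && out_v != NULL);

    /* Per-pixel steps of the linearly varying quantities. */
    float inv = 1.0f / (float)count;
    float du = (right.u_over_w - left.u_over_w) * inv;
    float dv = (right.v_over_w - left.v_over_w) * inv;
    float dw = (right.one_over_w - left.one_over_w) * inv;

    /* Exact (perspective-correct) u,v at the start of the first span. */
    float w0 = 1.0f / left.one_over_w;
    float u0 = left.u_over_w * w0;
    float v0 = left.v_over_w * w0;

    for (int x = 0; x < count; x += SPAN_LEN) {
        int len = (count - x < SPAN_LEN) ? count - x : SPAN_LEN;

        /* Exact u,v at the end of this span: the only divide per span. */
        float end = (float)(x + len);
        float w1 = 1.0f / (left.one_over_w + dw * end);
        float u1 = (left.u_over_w + du * end) * w1;
        float v1 = (left.v_over_w + dv * end) * w1;

        /* Linear interpolation between the two exact end points. */
        float su = (u1 - u0) / (float)len;
        float sv = (v1 - v0) / (float)len;
        for (int i = 0; i < len; i++) {
            out_u[x + i] = u0 + su * (float)i;
            out_v[x + i] = v0 + sv * (float)i;
        }

        /* The end of this span is the start of the next one. */
        u0 = u1;
        v0 = v1;
    }
}
```

In this sketch, pixels on span boundaries receive the exact perspective-correct coordinates, and the pixels in between are linear approximations, so the number of divides per scanline drops from one per pixel to roughly one per 16 pixels.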
In short, the rasterization pipeline is designed to provide the best
performance we know how to wring out of an x86 processor with MMX and
optionally SSE or 3DNow, while still maintaining 32-bit, perspective-correct,
subpixel- and subtexel-precise rendering quality.
The geometry pipeline is similarly constructed for maximum performance. A
combination of C, hand-tuned assembly code, and compiled-on-the-fly code is
used to handle the many combinations of interface types, primitive types,
and stream configurations as efficiently as possible. We even developed a
custom preprocessor so we could reuse tuned code across the full range of
configurations. Again, SSE and 3DNow are used if available.
A variety of MMX-optimized clears and fills are provided, along with blts to
32-, 24-, and 16-bit targets, the latter with dithering. These too are
assembly code in some places and compiled-on-the-fly in others, as
appropriate. Prefetching is used to accelerate the clears, fills, and
blts whenever it's available.
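As an illustration of what a dithered blt to a 16-bit target involves, here is a minimal C sketch. The source does not say which dithering method or 16-bit pixel layout Pixomatic uses, so this sketch assumes ordered dithering with a 4x4 Bayer matrix, an XRGB8888 source, and an RGB565 destination. The function names are hypothetical, and the code is plain C rather than the MMX-optimized or compiled-on-the-fly code the text describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 4x4 Bayer threshold matrix, values 0..15 (assumed dither pattern). */
static const uint8_t bayer4[4][4] = {
    { 0,  8,  2, 10},
    {12,  4, 14,  6},
    { 3, 11,  1,  9},
    {15,  7, 13,  5},
};

/* Reduce an 8-bit channel to `bits` bits. A position-dependent bias,
   scaled to the range of the discarded low bits, is added before
   truncation so that neighboring pixels round in different directions. */
static unsigned quantize(unsigned c8, unsigned bits, unsigned threshold)
{
    unsigned drop = 8 - bits;                  /* low bits discarded */
    unsigned bias = (threshold << drop) >> 4;  /* 0 .. 2^drop - 1 */
    unsigned v = c8 + bias;
    if (v > 255)
        v = 255;                               /* clamp to 8 bits */
    return v >> drop;
}

/* Convert one XRGB8888 pixel at screen position (x, y) to RGB565. */
uint16_t dither_pixel_565(uint32_t xrgb, int x, int y)
{
    unsigned t = bayer4[y & 3][x & 3];
    unsigned r = quantize((xrgb >> 16) & 0xFF, 5, t);
    unsigned g = quantize((xrgb >> 8) & 0xFF, 6, t);
    unsigned b = quantize(xrgb & 0xFF, 5, t);
    return (uint16_t)((r << 11) | (g << 5) | b);
}

/* Dithered copy of a tightly packed width x height image. */
void blt_8888_to_565(const uint32_t *src, uint16_t *dst,
                     int width, int height)
{
    assert(src != NULL && dst != NULL && width > 0 && height > 0);
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            dst[(size_t)y * width + x] =
                dither_pixel_565(src[(size_t)y * width + x], x, y);
}
```

With this pattern, a source value that falls between two 5-bit levels is rendered as a mix of the two neighboring levels across each 4x4 tile, which reduces the visible banding that plain truncation would produce.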
Finally, Pixomatic offers buffer management and screen-update functions,
and, if you'd like, can automatically figure out the fastest way to update
the front buffer, choosing between GDI blts and its own MMX-optimized blts.
How did our drive for performance work out? We're pleased with the results,
which we feel are reasonably close to the best that can be achieved on
current x86 CPUs with a general-purpose 3D API. Pixomatic's performance is
more than adequate for Quake II, and would have been fine for most of the
bestselling games of 2001, as discussed here.
In short, Pixomatic implements a standard, core set of 3D functionality
that's well suited to implementation on a CPU, with a familiar, easy-to-use
API and the best performance we could produce without sacrificing rendering
quality or ease of use. It doesn't try to do things that can't be done well
given the limitations of CPU performance, but what it does, it does efficiently,
accurately, and reliably.
You can download the Pixomatic demos,
or call 425.893.4300 or email us to try the Pixomatic SDK!