[Update (2004/11/12): This blog entry has transitioned into a full-fledged java.net article. The content in the article is very similar to that below, but contains a few clarifications and slightly better formatting. Therefore, I suggest you visit the new article page instead.]
Ever since the new OpenGL-based Java 2D pipeline became available in J2SE 5.0, developers have been asking the same question: "Which rendering operations are accelerated by OpenGL?"... While I've tried my best to answer these questions clearly, I know that my answers never tell the whole story. There is just no simple way to answer that question with just a few sentences or a "matrix of supported operations" or anything like that. Even my colleagues will tell you that I usually resort to wild handwaving and whiteboard diagramming (that verges on interpretive dance at times) when I try to explain this stuff in the office.
Therefore, I compiled this document to help answer the hot question and explain all the caveats that developers might encounter when they run their application with the OpenGL-based pipeline enabled. Even this one (long!) document is probably not sufficient. There are at least two more topics that I would like to cover in the near future: a performance comparison of the OpenGL-based pipeline, and a roadmap describing some of the features and performance improvements we would like to implement for it in the future.
[This document describes the current state of the OpenGL-based pipeline as of J2SE 5.0. Keep in mind that this story may change a bit in future releases as we find ways to accelerate more operations using OpenGL.]
2) General Comments
- This document does not discuss how to enable the OpenGL-based pipeline for your application. For more information on that topic, as well as detailed platform-specific release notes, click here.
- In this document, the term "OpenGL surface" refers to hardware surfaces, such as an AWT Frame (the screen), the Swing backbuffer, or a VolatileImage (which is backed by an OpenGL "pbuffer"). Rendering operations to an OpenGL surface are accelerated in hardware, as described below (e.g. fillRect(), drawString(), drawImage(), copyArea(), etc). In some cases, operations from an OpenGL surface are also accelerated in hardware (e.g. copying a VolatileImage to the screen will result in a fast VRAM->VRAM operation).
- The term "OpenGL texture" is used to differentiate OpenGL texture objects from the surfaces described above, since one cannot render to a texture as one would, say, a pbuffer. (This all gets a little muddied if you start talking about the "render-to-texture" extension, which is a bit complex and outside the scope of this document.) Operations from an OpenGL texture are accelerated in hardware (e.g. transforming a managed image to, say, a BufferStrategy backbuffer will result in a fast VRAM->VRAM operation).
- Remember that VRAM (memory located on the graphics hardware) is a finite resource. Even if the OpenGL-based pipeline can be enabled on a given graphics device, there may not be enough VRAM available to hold all your images. As mentioned above, we attempt to back VolatileImages using OpenGL pbuffers, but pbuffers can be very resource hungry objects since they often contain 8-bit stencil buffers, depth buffers, accumulation buffers, etc, in addition to the 24- or 32-bit color buffer that one would expect. We try to choose the least resource-demanding pbuffer format, but even so, some drivers return a pbuffer that requires 20 bytes (or more) per pixel! (Just one 1024x768 VolatileImage could require more than 15 MB of VRAM!) If we are unable to fit a VolatileImage in VRAM, we will always fall back and create the image in system memory so your application will still work properly, albeit more slowly than the ideal.
- OpenGL textures only store color information, so they do not require as much VRAM as pbuffers. For example, a 1024x512 INT_ARGB managed image that is cached in an OpenGL texture will require only about 2 MB of VRAM. However, OpenGL requires that textures have power-of-two dimensions. This means that if your managed image does not have power-of-two dimensions, we will create an OpenGL texture with power-of-two dimensions so that your image will fit. The downside of this approach is that it is potentially wasteful of VRAM. Consider a 129x257 managed image: we will cache the image in a 256x512 texture, which requires about four times as much VRAM as one would expect. (Graphics hardware manufacturers are beginning to support non-power-of-two sized textures in their latest products, and our pipeline is already prepared for this extension, so that we are not required to create power-of-two sized textures. Sadly, this extension is only supported on the very latest hardware, so the above caveats still apply for current hardware.) As with pbuffers, if we are unable to cache a managed image in an OpenGL texture (due to limited VRAM, or the image dimensions exceed the maximum texture size allowed by OpenGL), your application will still work properly, but copying from that image will likely be slower than if it was cached in texture memory.
- For most rendering operations described below, clipping is fully hardware accelerated by OpenGL. For rectangular clip regions, we use glScissor() which provides extremely fast rectangular clipping in hardware. For complex (shape) clip regions we use the OpenGL stencil buffer, which is also a very fast way to clip out non-rectangular regions.
- To determine which operations are being accelerated by OpenGL in your application, you can enable tracing with the following system property:
    -Dsun.java2d.trace=log
For more information on tracing (and other system properties), click here.
- The Java 2D API can be divided into three general categories of rendering operations: shapes, text, and images. Whether a particular rendering operation can be accelerated by OpenGL depends on the type of operation and the current Graphics2D state. (Read on for more than you ever wanted to know about the OpenGL-based pipeline...)
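The VRAM arithmetic in the comments above is easy to reproduce. Here is a minimal sketch that recomputes the figures quoted for pbuffers and padded textures. The class and method names (VramEstimate, nextPowerOfTwo, and so on) are made up for illustration and are not part of Java 2D, and the per-pixel costs are simply the ones mentioned above; real drivers may report different numbers.

```java
/** Rough VRAM estimates using the figures quoted above (illustrative only). */
final class VramEstimate {

    /** Smallest power of two that is >= n, for 1 <= n <= 2^30. */
    static int nextPowerOfTwo(int n) {
        if (n < 1 || n > (1 << 30)) {
            throw new IllegalArgumentException("n out of range: " + n);
        }
        int p = 1;
        while (p < n) {
            p <<= 1;
        }
        return p;
    }

    /** Bytes needed for a texture whose dimensions are padded up to powers of two. */
    static long paddedTextureBytes(int width, int height, int bytesPerPixel) {
        return (long) nextPowerOfTwo(width) * nextPowerOfTwo(height) * bytesPerPixel;
    }

    /** Bytes needed for a pbuffer, given a (driver-dependent) per-pixel cost. */
    static long pbufferBytes(int width, int height, int bytesPerPixel) {
        return (long) width * height * bytesPerPixel;
    }

    public static void main(String[] args) {
        // 1024x768 pbuffer at 20 bytes per pixel
        System.out.println(pbufferBytes(1024, 768, 20));       // 15728640
        // 1024x512 INT_ARGB image (4 bytes per pixel), already power-of-two
        System.out.println(paddedTextureBytes(1024, 512, 4));  // 2097152
        // 129x257 image is padded to 256x512
        System.out.println(paddedTextureBytes(129, 257, 4));   // 524288
        // what the same image would need without padding
        System.out.println(129L * 257 * 4);                    // 132612
    }
}
```

The last two numbers show where the "about four times" figure comes from: 524288 / 132612 is roughly 3.95.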
3) Shape Rendering
Operations in this category include drawLine(), fillRect(), draw(Shape), etc. The way that each operation is handled largely depends on whether the ANTIALIASING RenderingHint is turned on, in addition to the other relevant Graphics2D state.
3.1) Non-antialiased Rendering (ANTIALIAS_DEFAULT/OFF)
Some basic operations can be rendered directly by OpenGL simply by passing down the coordinates of the operation. Specifically, these basic operations include drawLine(), drawRect(), drawPolygon(), drawPolyline(), and fillRect(). More complex operations, such as drawArc() and fill(Shape) are converted to easily digestible spans, which are then rendered by OpenGL. The Graphics2D state determines how the operation is handled by OpenGL:
- If the current Paint is a simple Color (either opaque or translucent), then we set the current OpenGL color state using the value from the Color object. Geometry that is rendered subsequently will be drawn with this solid color value, according to the current Composite state (see below).
- If the current Paint is a GradientPaint, we can use OpenGL's texture coordinate generation mechanism to dynamically apply a GradientPaint to the geometry being rendered. The process used here is fairly complex and outside the scope of this document, but it is safe to say that the technique is very fast even on old graphics hardware. This GradientPaint technique works equally well for all AlphaComposite rules, but we have to punt to software loops in the case of XOR mode.
- If the current Paint is a TexturePaint, the approach is very similar to that described for GradientPaint above. However, there are two caveats to be aware of. First, the BufferedImage used for the TexturePaint must be cached in texture memory (see "Image Rendering" section below). Second, the BufferedImage used for the TexturePaint must have power-of-two dimensions (unless the new GL_ARB_texture_non_power_of_two is available, as discussed in the "General Comments" section). The texture coordinate generation mechanism that we use will only tile the texture image properly if it has power-of-two dimensions. If either of these two restrictions is not met, we will just fall back on the existing software-based TexturePaint implementation.
- For custom Paint implementations, we will simply fall back on our software pipelines to complete the operation.
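The Paint rules above can be restated as a small decision function. This is only a sketch of the rules as described in this document, not the pipeline's actual code: PaintPathSketch, Path, and classify() are hypothetical names, and the npotTexturesSupported flag stands in for the presence of GL_ARB_texture_non_power_of_two. The sketch cannot check whether the TexturePaint image is actually cached in texture memory, so it only looks at the dimensions.

```java
import java.awt.Color;
import java.awt.GradientPaint;
import java.awt.Paint;
import java.awt.TexturePaint;
import java.awt.image.BufferedImage;

/** Hypothetical restatement of the non-antialiased Paint rules described above. */
final class PaintPathSketch {

    enum Path { OPENGL, SOFTWARE }

    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    static Path classify(Paint paint, boolean xorMode, boolean npotTexturesSupported) {
        if (paint instanceof Color) {
            // Solid colors (opaque or translucent) map to the OpenGL color state.
            return Path.OPENGL;
        }
        if (paint instanceof GradientPaint) {
            // Accelerated for all AlphaComposite rules, but not in XOR mode.
            return xorMode ? Path.SOFTWARE : Path.OPENGL;
        }
        if (paint instanceof TexturePaint) {
            BufferedImage img = ((TexturePaint) paint).getImage();
            boolean sizeOk = npotTexturesSupported
                    || (isPowerOfTwo(img.getWidth()) && isPowerOfTwo(img.getHeight()));
            // Assumption: the image is already cached in texture memory;
            // that condition is not visible from here.
            return sizeOk ? Path.OPENGL : Path.SOFTWARE;
        }
        // Custom Paint implementations always use the software pipelines.
        return Path.SOFTWARE;
    }
}
```

For example, a TexturePaint built from a 64x64 image would be classified as OPENGL, while one built from a 129x257 image would be classified as SOFTWARE unless non-power-of-two textures are supported.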
All 12 Porter-Duff rules defined by the AlphaComposite class can be accelerated by OpenGL. Likewise, if XOR mode is set, then we will use OpenGL's XOR logic operation to accelerate XOR rendering. For custom Composite implementations, we will fall back on our software pipelines to complete the operation.
For simple draw operations (such as drawLine()), the geometry can be sent directly to OpenGL only when there is a thin stroke (i.e. a default BasicStroke with width=1.0) installed on the Graphics2D object. If the stroke state is any more complex, then the shape will be sent to the software rasterizer and converted into spans, which will then be rendered by OpenGL as a list of simple quads. (The composite and paint operations will still be accelerated by OpenGL as described above when rendering the spans.)
If the current AffineTransform represents a simple translation (no scale, shear, or rotation), then the translation factors will be applied to the parameters of the operation and the operation will be performed by OpenGL. If the current AffineTransform is more complex, then the shape will be sent to the software rasterizer and converted into spans, which will then be rendered by OpenGL as a list of simple quads. (The composite and paint operations will still be accelerated by OpenGL as described above when rendering the spans.)
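If you want to predict whether a simple draw operation can go straight to OpenGL, you can inspect the stroke and transform yourself. The sketch below is my reading of the two rules above in code form, using only standard java.awt calls. DirectGeometrySketch and its methods are hypothetical helpers, and isThinStroke() is deliberately conservative: it only accepts a stroke that is equal to the default BasicStroke.

```java
import java.awt.BasicStroke;
import java.awt.Stroke;
import java.awt.geom.AffineTransform;

/** Hypothetical helpers mirroring the stroke and transform rules described above. */
final class DirectGeometrySketch {

    /** True only for a stroke equal to the default BasicStroke (width 1.0, no dashes). */
    static boolean isThinStroke(Stroke stroke) {
        return new BasicStroke().equals(stroke);
    }

    /** True if the transform is the identity or a pure translation. */
    static boolean isTranslationOnly(AffineTransform tx) {
        int type = tx.getType();
        return type == AffineTransform.TYPE_IDENTITY
                || type == AffineTransform.TYPE_TRANSLATION;
    }

    /** True if a simple draw operation could be handed to OpenGL without spans. */
    static boolean canSendDirectly(Stroke stroke, AffineTransform tx) {
        return isThinStroke(stroke) && isTranslationOnly(tx);
    }

    public static void main(String[] args) {
        AffineTransform translate = AffineTransform.getTranslateInstance(10, 20);
        AffineTransform rotate = AffineTransform.getRotateInstance(Math.PI / 4);
        System.out.println(canSendDirectly(new BasicStroke(), translate));   // true
        System.out.println(canSendDirectly(new BasicStroke(3f), translate)); // false
        System.out.println(canSendDirectly(new BasicStroke(), rotate));      // false
    }
}
```

In a real application you would pass in g2d.getStroke() and g2d.getTransform(). Keep in mind that a "false" result here does not mean the operation is unaccelerated; it only means the geometry is converted to spans first.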
3.2) Antialiased Rendering (ANTIALIAS_ON)
When antialiasing is enabled, shape rendering operations go through the software geometry rasterizer, which knows how to optimally apply the current transform, stroke, and clip state in order to produce something easily digestible by OpenGL. Specifically, the geometry is converted into a series of alpha mask tiles. (There is actually a ton of things going on here, but for the sake of simplicity I'll just talk about this process from the perspective of the OpenGL-based pipeline, which only knows how to take these alpha tiles and turn them into something visible on the screen.)
Even though the software rasterizer is heavily involved when antialiasing is enabled, I would still argue that the operation can be considered "accelerated", since OpenGL can be used to apply the mask to the current Paint and composite the result to the destination OpenGL surface.
Due to the way the operation is defined, OpenGL will only accelerate the alpha mask operation if one of the following two sets of conditions holds:
- the current Paint is a Color object (either opaque or translucent) AND
  the current Composite is of type AlphaComposite.SRC_OVER
- the current Paint is an opaque Color object AND
  the current Composite is of type AlphaComposite.SRC AND
  the "extra alpha" value of the AlphaComposite is 1.0
If the above restrictions are not met (e.g. a GradientPaint is installed), we will use a slower path, but rest assured that we will use OpenGL whenever possible to render the antialiased shape to the destination surface.
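The conditions for the fast alpha mask path can be written down as a simple predicate. Again, this is a sketch of the rules as stated in this document, not the pipeline's real code: MaskFillSketch and usesFastMaskPath() are hypothetical names, and only standard java.awt accessors are used.

```java
import java.awt.AlphaComposite;
import java.awt.Color;
import java.awt.Composite;
import java.awt.Paint;

/** Hypothetical predicate for the antialiased fast path described above. */
final class MaskFillSketch {

    static boolean usesFastMaskPath(Paint paint, Composite composite) {
        if (!(paint instanceof Color) || !(composite instanceof AlphaComposite)) {
            return false; // e.g. GradientPaint or a custom Composite
        }
        Color color = (Color) paint;
        AlphaComposite ac = (AlphaComposite) composite;

        // Case 1: any Color (opaque or translucent) with SRC_OVER.
        if (ac.getRule() == AlphaComposite.SRC_OVER) {
            return true;
        }
        // Case 2: opaque Color with SRC and an extra alpha of 1.0.
        return ac.getRule() == AlphaComposite.SRC
                && color.getAlpha() == 255
                && ac.getAlpha() == 1.0f;
    }

    public static void main(String[] args) {
        Color translucentRed = new Color(255, 0, 0, 128);
        System.out.println(usesFastMaskPath(translucentRed, AlphaComposite.SrcOver)); // true
        System.out.println(usesFastMaskPath(Color.RED, AlphaComposite.Src));          // true
        System.out.println(usesFastMaskPath(translucentRed, AlphaComposite.Src));     // false
    }
}
```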
4) Text Rendering
Operations in this category include drawString(), drawGlyphVector(), etc.
Rendering of text, both antialiased and non-antialiased, is accelerated by the OpenGL-based pipeline. We maintain an OpenGL texture that acts as a hardware glyph cache, so commonly used glyphs can simply be texture mapped to the destination surface, taking advantage of the hardware accelerated compositing offered by OpenGL. The heuristics used by the OpenGL glyph cache are subject to change, but in J2SE 5.0, we attempt to cache a glyph if its width and height are each less than or equal to 16 pixels. If the glyph cannot fit in the OpenGL glyph cache (which can hold approximately 1024 16x16 glyphs), we render each glyph individually using a process very similar to that described in Section 3.2 (including the same restrictions on the current Paint and Composite).
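To get a rough feel for whether the glyphs of a given font and size would fall under the 16x16 limit, you can measure them with the standard GlyphVector API. The sketch below is only an approximation: GlyphCacheSketch and its methods are hypothetical, and the pixel bounds reported by getGlyphPixelBounds() are not guaranteed to match exactly what the pipeline measures internally.

```java
import java.awt.Font;
import java.awt.Rectangle;
import java.awt.font.FontRenderContext;
import java.awt.font.GlyphVector;

/** Hypothetical helper that applies the 16x16 glyph cache heuristic described above. */
final class GlyphCacheSketch {

    /** Size limit quoted above for J2SE 5.0; subject to change. */
    static final int MAX_CACHED_GLYPH_SIZE = 16;

    static boolean fitsGlyphCache(int width, int height) {
        return width <= MAX_CACHED_GLYPH_SIZE && height <= MAX_CACHED_GLYPH_SIZE;
    }

    /** Counts the glyphs of the given text whose pixel bounds fit the limit. */
    static int countCacheable(Font font, String text, FontRenderContext frc) {
        GlyphVector gv = font.createGlyphVector(frc, text);
        int count = 0;
        for (int i = 0; i < gv.getNumGlyphs(); i++) {
            Rectangle bounds = gv.getGlyphPixelBounds(i, frc, 0f, 0f);
            if (fitsGlyphCache(bounds.width, bounds.height)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Non-antialiased, non-fractional metrics, identity transform.
        FontRenderContext frc = new FontRenderContext(null, false, false);
        String text = "Hello, OpenGL!";
        for (int size : new int[] {12, 24, 48}) {
            Font font = new Font(Font.SANS_SERIF, Font.PLAIN, size);
            System.out.println(size + "pt: " + countCacheable(font, text, frc)
                    + " of " + text.length() + " glyphs within the limit");
        }
    }
}
```

The exact counts depend on the fonts installed on your system, but the trend should be clear: body-sized text fits comfortably, while large headline text is likely to miss the cache.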
5) Image Rendering
Operations in this category include all the drawImage() variants. If you are unfamiliar with the concepts of VolatileImages and "managed images", I highly suggest you read through Chet's blogs on those subjects.
Imaging operations are usually accelerated in hardware by OpenGL, even if one of the 12 AlphaComposite rules is installed on the Graphics2D. Generally speaking, the OpenGL-based pipeline will accelerate the following operations:
- simple copies (e.g. drawImage(img, x, y, null))
- simple scales (e.g. drawImage(img, x, y, w, h, null))
- arbitrary transforms (e.g. drawImage(img, xform, null))
Exactly how the image data is rendered to an OpenGL surface depends on the types of images involved. Each type of imaging operation is described below.
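For reference, here are the three call shapes side by side. This is just a small self-contained sketch (DrawImageVariants is a made-up class name) that renders into a BufferedImage so it can run anywhere; rendering to a BufferedImage goes through the software loops, so to exercise the OpenGL-based pipeline you would issue the same calls on the Graphics2D of an OpenGL surface instead.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.geom.AffineTransform;
import java.awt.image.BufferedImage;

/** Demonstrates the three drawImage() call shapes listed above. */
final class DrawImageVariants {

    static BufferedImage render() {
        // 4x4 opaque red source image.
        BufferedImage src = new BufferedImage(4, 4, BufferedImage.TYPE_INT_ARGB);
        Graphics2D sg = src.createGraphics();
        sg.setColor(Color.RED);
        sg.fillRect(0, 0, 4, 4);
        sg.dispose();

        // 32x32 transparent destination.
        BufferedImage dst = new BufferedImage(32, 32, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = dst.createGraphics();

        // Simple copy: covers (0,0)-(4,4).
        g.drawImage(src, 0, 0, null);

        // Simple scale: 4x4 source stretched to 8x8 at (8,8).
        g.drawImage(src, 8, 8, 8, 8, null);

        // Arbitrary transform: rotate 90 degrees, then translate.
        // The image ends up covering x in [20,24], y in [16,20].
        AffineTransform xform = AffineTransform.getTranslateInstance(24, 16);
        xform.rotate(Math.PI / 2);
        g.drawImage(src, xform, null);

        g.dispose();
        return dst;
    }

    public static void main(String[] args) {
        BufferedImage out = render();
        System.out.println(Integer.toHexString(out.getRGB(1, 1)));   // inside the copy
        System.out.println(Integer.toHexString(out.getRGB(12, 12))); // inside the scaled copy
        System.out.println(Integer.toHexString(out.getRGB(22, 18))); // inside the rotated copy
    }
}
```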
5.1) System Memory Surface --> OpenGL Surface
System memory surfaces (e.g. a BufferedImage that has not yet been cached in an OpenGL texture) of the following types can be rendered directly by OpenGL:
The glDrawPixels() operation can handle simple copies and simple scales (in conjunction with glPixelZoom()), so these operations should be relatively performant. However, glDrawPixels() is known to be somewhat slow, especially on graphics hardware in the x86 world, so this is not the optimal path.
There is no direct way in OpenGL for transforming system memory surfaces (barring the "pixel transform" extension, which is either not available or not performant on most graphics hardware). Therefore, the OpenGL-based pipeline will use a special tiled approach that uses an intermediate OpenGL texture object to transform the system memory surface:
This approach is reasonably fast since the intermediate texture operations are handled in hardware, but note that it is currently defined only for NEAREST_NEIGHBOR interpolation. (We have an RFE open that would make this work for BILINEAR as well, but for now BILINEAR and BICUBIC hints are handled by our software transform loops in this case.)
5.2) Managed Image (OpenGL Texture) --> OpenGL Surface
Managed images of all types can be cached in an OpenGL texture (there are direct loops defined for the types mentioned in Section 5.1, but generally speaking we can cache any image type by first going through an intermediate surface). Once an image has been cached in an OpenGL texture object, that image can be rendered to an OpenGL surface by mapping the texture to an OpenGL quad. The texture-mapped quad will respect the current AffineTransform state, and will therefore be transformed.
For example, if there is a rotation transform set on the Graphics2D object, the texture will be rotated by the graphics hardware. Likewise, the variants of drawImage() that take scaling parameters will scale the texture mapped quad before rendering to the destination OpenGL surface. Transforming a managed image with either NEAREST_NEIGHBOR or BILINEAR interpolation RenderingHints will be accelerated by OpenGL in hardware. Unfortunately, OpenGL does not support BICUBIC interpolation for textures, so we fall back on our software transform loops for the BICUBIC case.
5.3) VolatileImage (OpenGL Pbuffer) --> OpenGL Surface
Simple copies and scaled copies from a pbuffer-backed VolatileImage to an OpenGL surface will be accelerated by the VRAM->VRAM glCopyPixels() operation, and should be relatively performant. There is no direct way in OpenGL for transforming pbuffers (barring the render-to-texture approach, which is not discussed here). Therefore, copying a pbuffer-backed VolatileImage with an arbitrary transform will use a tiled approach similar to that described in Section 5.1:
This approach is reasonably fast since the intermediate texture operations are handled in hardware, but note that it is currently defined only for NEAREST_NEIGHBOR interpolation, as mentioned in Section 5.1.
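Since the interpolation rules differ across Sections 5.1 through 5.3, it may help to see them condensed into one place. The following is a hypothetical summary function (TransformPathSketch, Source, and transformAccelerated() are names I made up for this sketch); it simply encodes what this document says about transformed drawImage() calls in J2SE 5.0 and is not an API you can query at runtime.

```java
import java.awt.RenderingHints;

/** Hypothetical summary of which transformed image copies stay on the OpenGL path. */
final class TransformPathSketch {

    /** The three kinds of source image discussed in Sections 5.1 to 5.3. */
    enum Source { SYSTEM_MEMORY, MANAGED_TEXTURE, PBUFFER }

    /**
     * Returns true if a transformed copy from the given source kind is handled
     * by OpenGL for the given KEY_INTERPOLATION hint value.
     */
    static boolean transformAccelerated(Source source, Object interpolationHint) {
        if (RenderingHints.VALUE_INTERPOLATION_NEAREST_NEIGHBOR.equals(interpolationHint)) {
            // All three source kinds support NEAREST_NEIGHBOR.
            return true;
        }
        if (RenderingHints.VALUE_INTERPOLATION_BILINEAR.equals(interpolationHint)) {
            // Only texture-mapped quads (managed images) support BILINEAR.
            return source == Source.MANAGED_TEXTURE;
        }
        // BICUBIC always falls back to the software transform loops.
        return false;
    }

    public static void main(String[] args) {
        for (Source s : Source.values()) {
            System.out.println(s
                    + " nearest=" + transformAccelerated(s, RenderingHints.VALUE_INTERPOLATION_NEAREST_NEIGHBOR)
                    + " bilinear=" + transformAccelerated(s, RenderingHints.VALUE_INTERPOLATION_BILINEAR)
                    + " bicubic=" + transformAccelerated(s, RenderingHints.VALUE_INTERPOLATION_BICUBIC));
        }
    }
}
```

The practical takeaway is that if you need smooth (BILINEAR) transformed images with hardware acceleration, the source should be a managed image rather than a VolatileImage or an uncached BufferedImage.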
- The Graphics2D.copyArea() operation is accelerated for OpenGL surfaces, using the very fast VRAM->VRAM glCopyPixels() operation.
- The BufferStrategy.show() method (for "flip" strategies) will result in a native SwapBuffers() operation, which causes the contents of the hardware OpenGL backbuffer to be "flipped" to the front buffer (i.e. the screen). Depending on the platform and graphics drivers, this operation may be synchronized with the vertical refresh of the monitor.
- The Image.flush() method will delete any OpenGL textures in use by a managed image, or any OpenGL pbuffer in use by a VolatileImage.
I hope this article answers most of the questions developers have been asking for the past few months. If you see any glaring omissions, something you would like clarified, or topics for a future "Behind the Graphics2D" article, please post a comment. I'll try to incorporate your suggestions into this document so that it can be the "definitive source" for this topic (if that's possible).
In my ears: Wire, "154"
In my eyes: Kobo Abe, "The Woman in the Dunes"