In this article, we will discuss a method for rendering high-fidelity text with Metal. It’s easy to overlook text rendering when thinking about 3D graphics. However, very few games or applications can get by without displaying any text, so it’s important to consider how we can best use the GPU to incorporate text into our Metal apps.
The Structure and Interpretation of Fonts
We begin the process of drawing text by selecting a font. Fonts are collections of glyphs, which are the graphical representation of characters or portions of characters. In modern font formats, glyphs are represented as piecewise curves, specifically line segments and quadratic Bezier curves.
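To make the curve representation concrete, a quadratic Bézier segment is defined by an on-curve start point, an off-curve control point, and an on-curve end point. Here is a small sketch (in C++, standing in for whatever language a rasterizer might use) of evaluating such a segment:

```cpp
#include <cassert>
#include <cmath>

struct Point { float x, y; };

// Evaluate a quadratic Bezier curve at parameter t in [0, 1].
// p0 and p2 are on-curve points; p1 is the off-curve control point.
// B(t) = (1-t)^2 * p0 + 2(1-t)t * p1 + t^2 * p2
Point quadraticBezier(Point p0, Point p1, Point p2, float t) {
    float u = 1.0f - t;
    return { u * u * p0.x + 2.0f * u * t * p1.x + t * t * p2.x,
             u * u * p0.y + 2.0f * u * t * p1.y + t * t * p2.y };
}
```

A glyph outline is simply a closed loop of such segments (with line segments as the degenerate case where the control point lies on the line between its endpoints).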
Drawing a string of text consists of at least two distinct phases. First, the text layout engine determines which glyphs will be used to represent the string and how they’ll be positioned relative to one another. Then, the rendering engine is responsible for turning the abstract description of the glyphs into text on the screen.
Approaches to Real-Time Text Rendering
There are numerous ways you might go about rendering text on iOS. You are probably familiar with UIKit controls such as UITextField. These UI elements are backed by a powerful framework called Core Text. Core Text is a Unicode text layout engine that integrates tightly with Quartz 2D (Core Graphics) to lay out and render text.
Text layout is an enormously complicated subject that must account for different scripts, writing directions, and typographic conventions. We would never want to reinvent this functionality ourselves, so we’ll let Core Text do a lot of the heavy lifting as we settle in on our solution for real-time text rendering.
Let’s take a brief tour through common ways of drawing text in the context of real-time 3D graphics.
One of the most flexible approaches to text rendering is dynamic rasterization, in which strings are rasterized on the CPU, and the resulting bitmap is uploaded as a texture to the GPU for drawing. This is the approach taken by libraries such as stb_truetype.
The disadvantage of dynamic rasterization is the computational cost of redrawing the glyphs whenever the text string changes. Even though much of the cost of text rendering occurs in the layout phase, rasterizing glyphs is nontrivial in its demand on the CPU, and there is no extant GPU implementation of font rasterization on iOS. This technique also requires an amount of texture memory proportional to the font size and the length of the string being rendered. Finally, when magnified, rasterized text tends to become blurry or blocky, depending on the magnification filter.
Many applications that use the GPU to draw text prefer to render all possible glyphs up-front rather than drawing them dynamically. This approach trades off texture memory for the computational cost of rasterizing glyphs on-demand. In order to minimize the amount of texture memory required by a font, the glyphs are packed into a single rectangular texture, called an atlas. The figure below illustrates such a texture.
An ancillary data structure stores the texture coordinates describing the bounding rectangle of each glyph. When drawing a string, the application generates a mesh with the appropriate positions and texture coordinates of the string’s constituent glyphs.
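That ancillary structure might look something like the following sketch (the names and layout are illustrative, not taken from the sample project):

```cpp
#include <cassert>

// Texture coordinates of one glyph's bounding rectangle in the atlas,
// normalized to [0, 1] in both dimensions. A glyph index retrieved during
// layout indexes directly into an array of these.
struct GlyphDescriptor {
    float minS, minT;  // top-left corner in texture space
    float maxS, maxT;  // bottom-right corner in texture space
};

// Convert a glyph's pixel rectangle in the atlas into normalized texture
// coordinates, given the atlas dimensions.
GlyphDescriptor makeDescriptor(float x, float y, float w, float h,
                               float atlasWidth, float atlasHeight) {
    return { x / atlasWidth, y / atlasHeight,
             (x + w) / atlasWidth, (y + h) / atlasHeight };
}
```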
One disadvantage of font atlases is that they consume a sizable amount of memory even when many of the glyphs they contain are not used at runtime.
Like dynamic rasterization, text rendered with an atlas texture suffers from a loss of fidelity upon magnification. Here, the problem may be even worse, since the glyphs are often drawn smaller in order to pack an entire font into one texture.
In the next section, we will seek to rectify some of the issues with naïve atlas-based text rendering.
The approach we will explore in depth uses a signed-distance field, which is a precomputed representation of a font atlas that stores the glyph outlines implicitly. Specifically, the texel values of a signed-distance field texture correspond to the distance of the texel to the nearest glyph edge, where texels outside the glyph take on negative values.
In order to store a signed-distance field in a texture, it must be scaled and quantized to match the pixel format. Our sample project uses a single-channel 8-bit texture, consuming one byte per pixel. By this construction, texels that fall exactly on the edge of a glyph have a value of 127, or 50%.
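As a concrete sketch of this quantization (the normalization span is an assumed parameter here, not a value taken from the sample project):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Map a signed distance (in pixels) into an 8-bit texel. Distances are
// normalized by `spread`, the distance at which values saturate, then
// remapped from [-1, 1] to [0, 255]. A distance of exactly 0 -- a texel
// on the glyph edge -- lands at 127, i.e. roughly 50%.
uint8_t quantizeDistance(float distance, float spread) {
    float n = std::clamp(distance / spread, -1.0f, 1.0f);
    return static_cast<uint8_t>((n + 1.0f) * 0.5f * 255.0f);
}
```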
In its simplest incarnation, signed-distance field rendering can be done with fixed-function alpha testing. By discarding all fragments whose value is less than 50%, only those pixels inside the glyph will be rendered. Unfortunately, this has the effect of producing “wiggles” along the edges of the glyph, since the downsampled and quantized distance texture does not capture adequate detail to perfectly recreate the ideal outline.
An improvement to the alpha test technique is to use a pixel shader to interpolate between the inside and outside of the glyph, thus smoothing out the wiggly discontinuity. This is described in detail below.
Signed-distance field rendering was brought into the mainstream by Chris Green’s 2007 article describing the use of the technique in Valve’s hit game Team Fortress 2. Our implementation will closely follow the scheme laid out in Green’s paper.
Signed-Distance Field Rendering in Metal
In this section we will describe in detail a method for achieving smooth text rendering on the GPU with Metal.
Generating a Font Atlas Texture
The first step is to render all of the available glyphs in our chosen font into an atlas. For this article, an optimal packing is not used; rather, the glyphs are simply laid out left to right, wrapping lines from top to bottom in a greedy fashion. This simplifies the implementation greatly, at the cost of some wasted space.
The sample code constructs a font atlas from a UIFont by determining the maximum size of the selected font whose glyphs will fit entirely within the bitmap used to construct the atlas (4096×4096 pixels). It then uses Core Text to retrieve the glyph outlines from the font and render them, without antialiasing, into the atlas image.
As the font atlas is being drawn, the implementation also stores the origin and extent (i.e., the texture coordinates) of each glyph in a separate array. This array is used during rendering to map from the laid-out glyphs to their respective area on the atlas texture.
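The greedy layout described above can be sketched as follows (glyph widths, the shared row height, and the atlas width are hypothetical inputs; the sample project derives them from Core Text):

```cpp
#include <cassert>
#include <vector>

struct GlyphRect { float x, y, w, h; };

// Lay out glyphs left to right, wrapping to a new row whenever a glyph
// would overrun the atlas width. Every row uses the same height, which
// wastes some space but keeps the bookkeeping trivial.
std::vector<GlyphRect> layoutGlyphs(const std::vector<float> &widths,
                                    float glyphHeight, float atlasWidth) {
    std::vector<GlyphRect> rects;
    float x = 0, y = 0;
    for (float w : widths) {
        if (x + w > atlasWidth) {   // wrap to the next row
            x = 0;
            y += glyphHeight;
        }
        rects.push_back({ x, y, w, glyphHeight });
        x += w;
    }
    return rects;
}
```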
Generating a Signed-Distance Field
The above procedure produces a binary image of the font at a fairly high resolution. That is, pixels that fall inside of a glyph are all the way “on” (255), and pixels that are outside a glyph are all the way “off” (0). We now need to perform a signed-distance transform on this bitmap to produce a signed-distance field representation of the font atlas, which we will use for rendering.
A Brute Force Approach
Generating a signed-distance field entails finding the distance from each texel to the closest glyph edge. Some implementations, such as GLyphy, perform this calculation directly against the piecewise curve representation of the glyphs. This approach can have spectacular fidelity, but the implementation is complicated and fraught with edge cases.
As we have chosen to generate a bitmap representation of the font, we could simply iterate over the neighborhood of each texel, performing a minimizing search as we encounter texels on the other side of the edge. To even approach tractability, this requires that we choose a reasonably-sized area in which to conduct our search.
A reasonable heuristic is half the average stroke width of the font. For example, in a font for which a typical stroke is 20 pixels wide, texels that are so far inside the glyph that they have a distance of greater than 10 are already going to be clamped to the maximum value of “insideness” during rendering. Similarly, texels at a distance of more than 10 outside the glyph are unlikely to influence the way the glyph is rendered. Therefore, we would restrict the search to a radius of 10 texels around each texel.
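A sketch of this brute-force search over a binary bitmap (the radius parameter is the search bound just discussed):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Compute the signed distance for one texel of a binary image by searching
// a bounded window for the nearest texel on the opposite side of the edge.
// Positive inside the glyph, negative outside; saturates at +/- radius.
float signedDistanceAt(const std::vector<uint8_t> &img, int w, int h,
                       int x, int y, int radius) {
    bool inside = img[y * w + x] != 0;
    float best = static_cast<float>(radius);
    for (int ny = std::max(0, y - radius); ny <= std::min(h - 1, y + radius); ++ny)
        for (int nx = std::max(0, x - radius); nx <= std::min(w - 1, x + radius); ++nx)
            if ((img[ny * w + nx] != 0) != inside)
                best = std::min(best, std::hypot(float(nx - x), float(ny - y)));
    return inside ? best : -best;
}
```

Repeating this search for every texel is what makes the approach quadratic in the window size, and hence slow.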
According to Green, the brute-force approach was suitably fast for creating distance fields for text and vector artwork on the class of workstations used in the production of TF2. However, since there has been so much research into signed-distance field generation, let’s take a look at a slightly better approach that will allow us to generate them fairly quickly, even on mobile hardware.
A Better Approach: Dead Reckoning
Signed-distance fields have broad applicability, and have therefore been the subject of much research. G. J. Grevera, building on a venerable algorithm known as the chamfer distance algorithm, constructed a more precise heuristic known as “dead reckoning.” In essence, the algorithm performs two passes over the source image, first propagating the minimal distance down and to the right, then propagating the minimal distances found in the first pass back up and to the left. At each step, the distance value is determined as the minimum distance value over some mask surrounding the center texel, plus the distance along the vector to the nearest previously-discovered edge.
Without going into all the details of this algorithm, it is worth noting that it is drastically faster than the brute-force approach. Across both passes, dead reckoning consults a neighborhood of just 3×3 texels, far fewer than a brute-force algorithm of similar accuracy. Although I have not implemented it with a compute shader, I strongly suspect it could be made faster still with a GPU-based implementation.
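A compact sketch of the two passes follows. It is simplified relative to Grevera's full formulation, but it shows the bookkeeping that distinguishes dead reckoning from a plain chamfer transform: each texel propagates the coordinates of its nearest known edge texel, and distances are always recomputed from that point rather than accumulated.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <utility>
#include <vector>

// Unsigned distance transform by dead reckoning. The sign can afterwards be
// applied from the source image, as in the brute-force case.
std::vector<float> deadReckoning(const std::vector<uint8_t> &img, int w, int h) {
    const float INF = 1e9f;
    std::vector<float> d(w * h, INF);
    std::vector<std::pair<int, int>> p(w * h, { -1, -1 });
    auto isInside = [&](int x, int y) { return img[y * w + x] != 0; };

    // Seed: inside texels with at least one outside 4-neighbor lie on the edge.
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            if (isInside(x, y) &&
                ((x > 0 && !isInside(x - 1, y)) || (x < w - 1 && !isInside(x + 1, y)) ||
                 (y > 0 && !isInside(x, y - 1)) || (y < h - 1 && !isInside(x, y + 1)))) {
                d[y * w + x] = 0.0f;
                p[y * w + x] = { x, y };
            }

    // Take the neighbor's nearest edge point and recompute the true distance to it.
    auto relax = [&](int x, int y, int nx, int ny) {
        if (nx < 0 || ny < 0 || nx >= w || ny >= h) return;
        auto edge = p[ny * w + nx];
        if (edge.first < 0) return;
        float nd = std::hypot(float(x - edge.first), float(y - edge.second));
        if (nd < d[y * w + x]) { d[y * w + x] = nd; p[y * w + x] = edge; }
    };

    for (int y = 0; y < h; ++y)          // forward pass: down and to the right
        for (int x = 0; x < w; ++x) {
            relax(x, y, x - 1, y - 1); relax(x, y, x, y - 1);
            relax(x, y, x + 1, y - 1); relax(x, y, x - 1, y);
        }
    for (int y = h - 1; y >= 0; --y)     // backward pass: up and to the left
        for (int x = w - 1; x >= 0; --x) {
            relax(x, y, x + 1, y + 1); relax(x, y, x, y + 1);
            relax(x, y, x - 1, y + 1); relax(x, y, x + 1, y);
        }
    return d;
}
```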
Using Core Text for Layout
Once we have the signed-distance field representation of a font, we need a way to transform it into glyphs on the screen. The first part of this process is, again, using the Core Text layout engine to tell us which glyphs should be rendered, and how they should be positioned. We use a CTFramesetter object to lay out the text in a chosen rectangle. The framesetting process produces an array of CTLine objects, each containing a series of glyphs.
To construct a text mesh for rendering with Metal, we enumerate the glyphs provided by the Core Text framesetter, which gives us their screen-space coordinates and an index into the table of texture coordinates we constructed earlier when building the font atlas texture. These two pieces of data allow us to create an indexed triangle mesh representing the text string. This mesh can then be rendered in the usual fashion, with a fragment shader doing the heavy lifting of transforming from the signed-distance field texture to an appropriate color for each pixel.
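Generating the indexed mesh amounts to appending one quad (two triangles) per glyph; a sketch, with an illustrative vertex layout:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Vertex { float x, y, s, t; };  // screen position + texture coordinates

struct TextMesh {
    std::vector<Vertex>   vertices;
    std::vector<uint16_t> indices;
};

// Append one glyph quad: four vertices and six indices forming two triangles.
// (xMin..yMax) is the glyph's screen-space rect from layout; (sMin..tMax) is
// its rect in the atlas, looked up from the texture-coordinate table.
void appendGlyphQuad(TextMesh &mesh,
                     float xMin, float yMin, float xMax, float yMax,
                     float sMin, float tMin, float sMax, float tMax) {
    uint16_t base = static_cast<uint16_t>(mesh.vertices.size());
    mesh.vertices.push_back({ xMin, yMin, sMin, tMin });
    mesh.vertices.push_back({ xMax, yMin, sMax, tMin });
    mesh.vertices.push_back({ xMax, yMax, sMax, tMax });
    mesh.vertices.push_back({ xMin, yMax, sMin, tMax });
    for (int i : { 0, 1, 2, 2, 3, 0 })
        mesh.indices.push_back(static_cast<uint16_t>(base + i));
}
```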
The Orthographic Projection
When drawing text onto the screen, we use an orthographic, or parallel projection. This kind of projection flattens the mesh to the near viewing plane without introducing the foreshortening inherent in a perspective projection.
When rendering UI elements, it is convenient to select an orthographic projection whose extents match the native resolution of the screen. Therefore, the orthographic projection used by the sample application uses (0, 0) as the upper-left corner of the screen and the drawable width and height of the screen, in pixels, as the bottom-right corner, matching UIKit’s convention.
Mathematically, this transformation is represented by the following matrix:
Here $l$, $r$, $t$, $b$, $n$, and $f$ are the left, right, top, bottom, near, and far clipping plane values. The near and far planes are assumed to sit at $z = 0$ and $z = 1$, respectively.

$$
\begin{bmatrix}
\frac{2}{r-l} & 0 & 0 & -\frac{r+l}{r-l} \\
0 & \frac{2}{t-b} & 0 & -\frac{t+b}{t-b} \\
0 & 0 & \frac{1}{f-n} & -\frac{n}{f-n} \\
0 & 0 & 0 & 1
\end{bmatrix}
$$
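This matrix can be built directly in code; a sketch in C++ (row-major storage, and a function name of our own choosing rather than the sample project's math library):

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Build a row-major orthographic projection mapping x in (l..r) and
// y in (t..b) to Metal's NDC range [-1, 1], and z in (n..f) to Metal's
// clip-space depth range [0, 1]. With t = 0 and b = the screen height,
// the top-left corner of the screen maps to NDC (-1, +1).
std::array<float, 16> orthoProjection(float l, float r, float t, float b,
                                      float n, float f) {
    return {
        2.0f / (r - l), 0,              0,              -(r + l) / (r - l),
        0,              2.0f / (t - b), 0,              -(t + b) / (t - b),
        0,              0,              1.0f / (f - n), -n / (f - n),
        0,              0,              0,              1
    };
}
```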
The Vertex and Fragment Functions
The vertex function for drawing text is very straightforward; it looks exactly like vertex functions we’ve used in the past. Each vertex of the text mesh is transformed by a model matrix (which can be used to position and scale the text) and a combined view-projection matrix, which is simply the orthographic projection matrix discussed above.
The fragment function, on the other hand, is a little more involved. Since we are using the signed-distance field as more of a look-up table than a texture, we need to transform the sampled texel from the field into a color value that varies based on the proximity of the pixel to the corresponding glyph’s edge.
We apply antialiasing at the edges of the glyphs by interpolating from opaque to translucent in a narrow band around the edge. The width of this band is computed per-pixel by finding the length of the gradient of the distance field using the built-in dfdx and dfdy functions. We then use the smoothstep function, which transitions from 0 to 1 across the width of this smoothed edge, using the sampled distance value itself as the final parameter. This produces an edge band that is roughly one pixel wide, regardless of how much the text is scaled up or down. This refinement on Green’s original approach is due to Gustavson and is explained in detail in his chapter in OpenGL Insights.
Here is the complete fragment function for rendering a glyph from a signed-distance field representation (the parameter names and binding indices shown here are illustrative; your pipeline may bind these resources differently):

```metal
fragment half4 text_fragment(TransformedVertex vert [[stage_in]],
                             constant float4 &textColor [[buffer(0)]],
                             sampler samp [[sampler(0)]],
                             texture2d<float> texture [[texture(0)]])
{
    float edgeDistance = 0.5;
    float dist = texture.sample(samp, vert.texCoords).r;
    float edgeWidth = 0.7 * length(float2(dfdx(dist), dfdy(dist)));
    float opacity = smoothstep(edgeDistance - edgeWidth, edgeDistance + edgeWidth, dist);
    return half4(textColor.r, textColor.g, textColor.b, opacity);
}
```
Note that we return a color that has an alpha component, so the pipeline state we use should have alpha blending enabled in order for the text to properly blend with the geometry behind it. This also implies that the text should be drawn after the rest of the scene geometry.
The Sample App
The sample project for this post renders a paragraph of text that can be zoomed and panned in real-time. Interactivity is achieved with a pair of UIGestureRecognizers. Notice how the edges of the glyphs remain quite sharp even under extreme magnification, in contrast to the way a pre-rasterized bitmap texture would become jagged or blurry under magnification.
In this post we have considered a handful of ways to draw text with the GPU. We took a deep look at the use of signed-distance fields to increase the fidelity of rendered text while keeping texture memory requirements to a minimum. This technique allows quick generation of font atlases which can then be used to draw text on the GPU extremely fast. Give it a try, and please comment below if you’ve found it useful.