skip to main content
research-article
Open Access

A Function-Based Approach to Interactive High-Precision Volumetric Design and Fabrication

Published:29 September 2023Publication History

Skip Abstract Section

Abstract

We present a novel function representation (F-Rep) based geometric modeling kernel tailor-made to support computer aided design (CAD) and fabrication of high resolution volumetric models containing hundreds of billions of voxel grid elements. Our modeling kernel addresses existing limitations associated with evaluating, storing, and accessing volumetric data produced by F-Reps in contexts outside of rendering. The result is an F-Rep modeling kernel well suited for CAD-based applications.

Our kernel utilizes a sparse volume data structure to manage F-Rep data while efficient F-Rep evaluation is achieved through a combination of interval arithmetic (IA), just-in-time (JIT) compilation of user-defined functions, and massively parallel evaluation on the GPU. We employ IA as the basis for local pruning of the function evaluation tree to minimize total function evaluations, we use a novel JIT compilation scheme to optimize function execution, and we take advantage of GPU-parallelism to enhance computational throughput. We illustrate the kernel’s effectiveness in visualizing and slicing models with complex defining functions and detailed geometry, and utilize the geometry kernel to manufacture a physical part. Additionally, we present performance metrics across multiple hardware configurations demonstrating significant performance improvements over existing F-Rep geometry kernels, and we examine how our geometry kernel scales with computing power.

Skip 1INTRODUCTION Section

1 INTRODUCTION

Computer aided design (CAD) software is typically powered by a geometry kernel which defines the operations and representations available for designing and storing models. Traditional CAD software implements geometry kernels which describe models using boundary-representations (B-Rep) that aim to define a solid as a collection of surface patches properly connected to separate internal regions from the exterior ambient space. New fabrication technologies like additive manufacturing (AM) have pushed the boundaries on what is manufacturable, extending the gamut of parts that can be built beyond what is feasibly represented with existing B-rep systems. AM systems provide a mechanism for converting digital models to physical parts through direct control over local material deposition, meaning full utilization of AM technologies necessitate a complete description throughout the interior of the model. The information needed for AM fabrication corresponds more directly to a volumetric representation; however, volumetric models encounter practical limitations. Conversion of B-Rep model descriptions into a volumetric form can be challenging due to the computational cost and potential robustness issues associated with B-Rep point membership classification (PMC). Volumetric representations can also be difficult to handle at scales supported by AM technologies, with existing AM systems capable of micron-level features (including internal variations in material or process properties) across meter-sized build volumes [Bader et al. 2018]. The corresponding voxel models can include hundreds of billions of elements with associated memory requirements running upwards of terabytes. Current modeling software is incapable of designing and representing models at these resolutions, and the issue will only be exacerbated by enhanced capabilities of future AM systems.

To address the challenge of representing and designing high resolution models, we introduce a geometric modeling kernel built using a function-based representation (F-Rep) aimed at providing the capabilities to efficiently design, visualize, and print large scale volumetric models. F-Reps provide an alternative model basis which allows for unambiguous and highly parallelizable PMC [Bloomenthal and Wyvill 1997; Pasko et al. 1995; Keeter 2020] while still supporting the modeling techniques common to B-Rep geometry kernels. Our geometric modeling kernel framework consists of two components: an interpreter used to evaluate user-defined models and a sparse volume data structure that stores output produced by the interpreter for visualization and manipulation. Together these two components, in conjunction with the graphics processing unit (GPU), provide the means to support large-scale volumetric modeling applications at scales and speeds not currently achievable by existing software tools.

Skip 2RELATED WORK Section

2 RELATED WORK

2.1 Function-based Representation

F-Reps serve as a powerful alternative to B-Reps for solid modeling, support many common model assembly techniques like constructive solid geometry (CSG), are capable of exact representations, and have been employed in numerous applications ranging from the modeling of micro-structures [Pasko et al. 2011] to large scale terrain [Génevaux et al. 2015]. However, F-Reps define models using implicit functions, and implicit functions are conventionally considered difficult to render directly. In addition, F-Reps have traditionally taken a monolithic approach to model representation; all modifications made to a model’s geometry reside in a single expression. As a monolithic F-Rep model grows in complexity, even modifications that apply only to localized regions make the model globally more expensive to evaluate. A sizable portion of research around F-Reps has been focused on reducing the computational cost required for evaluation and visualization.

Traditionally two methods have been used for F-Rep rendering. The first involves polygonization of an implicitly defined level-set using algorithms like marching cubes or dual contouring [Maple 2003; Ju et al. 2002; de Araújo et al. 2015] followed by polygon rendering via rasterization. The polygonization approach takes advantage of the extensive software and hardware support for polygonal mesh visualization, however there are also drawbacks involving robustness and accuracy. Polygonal meshes cannot exactly represent curved surfaces; shading methods can be employed to make the surface mesh appear smooth, but polygonization inevitably introduces geometric artifacts that degrade accuracy of the model. Polygonization is also not guaranteed to produce a mesh corresponding to the boundary of a valid 3D solid. As a result, it is wise to avoid relying on the robustness of PMC based on polygonized models [Manson and Schaefer 2010; Fryazinov and Pasko 2007]. The second and more common method for F-Rep visualization is ray-casting; models are rendered by casting rays through space which are tested for intersection with the region described by the geometric defining function. Finding the F-Rep surface in a robust fashion while also limiting the computational cost associated with computing the intersection is a challenging problem equivalent to root-finding.

Intersections can be computed analytically only for limited classes of simple geometric defining functions, e.g., low degree polynomials [Loop and Blinn 2006; Kanamori et al. 2008; Singh and Narayanan 2010]. When the intersection cannot be computed analytically, marching techniques can be used to search along the ray for surface intersections. To reduce the total number of intersection checks, if a bound on the derivative of the defining function is known either locally or globally, Lipshitz-based techniques can be used to generate an adaptive step size to enable more efficient marching along the ray [Hart et al. 1989; Hart 1995; Keinert et al. 2014; Seyb et al. 2019; Galin et al. 2020]. Alternatively, if a bound on the derivative is not known, interval arithmetic (IA) [Moore 1966] techniques can be applied to recursively subdivide a ray until a root of the defining function is isolated [Mitchell 1990; de Figueiredo and Stolfi 2004; De Cusatis et al. 1999; Knoll et al. 2007;, 2009a].

In addition to root isolation, IA-based techniques can also be used to reduce the computation cost associated with evaluation of the geometric defining function. Duff [1992] demonstrates how interval arithmetic can be used to construct a hierarchical spatial partitioning and locally simplify the defining function. When constructing an F-Rep defined through CSG operations with geometric primitives (e.g., cube, sphere, torus, etc.), a reduced or “pruned” evaluation tree that preserves the local geometry can be derived for any spatial subdivision. Evaluations associated with geometric primitives which do not overlap the spatial subdivision are omitted, reducing the function evaluation count. Keeter [2020] extends the work of Duff [1992] by presenting a GPU interpreter which utilizes IA to prune the geometric defining function. However, instead of pruning the tree at the level of geometric primitives, Keeter takes the approach of pruning the individual mathematical operations which comprise the geometric defining function. Jazar and Kry [2023] apply IA to the temporal domain, presenting methods to efficiently evaluate closed-form animated implicit surfaces. Their evaluation scheme uses IA to range-bound the time derivatives and locally evaluate only the regions of the defining function which change between time steps.

2.2 Sparse Volume Data Structures

We utilize a sparse volume data structure to manage output produced by our GPU interpreter. Sparse volumes are well studied and arise in many application domains. For a thorough examination of state-of-the-art techniques, including sparse volume construction and visualization, we refer the reader to Aleksandrov et al. [2021] and Beyer et al. [2015]. Our focus lies primarily on GPU-based sparse volume structures which can be integrated into our geometry kernel to store the interpreter output for visualization and manipulation at scales which meet and exceed existing software and hardware CAD tools.

Sparse volumes are necessary due to the extreme memory requirements of volumetric data. Sparsity can significantly reduce memory consumption of a given volume by allocating memory to describe only the relevant properties in regions of space within the volume that are needed. Octrees are an example of a commonly used structure which provide sparsity [Knoll et al. 2006;, 2009b; Kruger and Westermann 2003] and, in addition to reducing memory overhead, can be used to improve rendering performance with techniques like empty space skipping [Ruijters and Vilanova 2006; Zellmann 2019]. When sparsity alone is insufficient, out-of-core implementations can also be used [Gobbetti et al. 2008; Fogal et al. 2013; Liu et al. 2012] in which data is dynamically streamed to the GPU as needed, allowing for sparse grids to contain information exceeding GPU memory.

Other hierarchical multi-level grids exist, for example the bounding volume hierarchy (BVH) [Ganter and Manzke 2019]. Of most immediate interest is the \(N^3\)-tree, a structure which provides the means for easily re-partitioning input data to optimize memory efficiency and traversal speed. Crassin et al. [2009] present the earliest implementation of a GPU-based sparse grid which utilizes an \(N^3\)-tree, dubbed GigaVoxel, to enable visualization of massive volumetric data. Hoetzlein [2016] introduces GVDB, a GPU-based sparse grid inspired by Museth’s OpenVDB [2013]. In an approach similar to Crassin et al. [2009], GVDB pairs an \(N^3\)-tree with a voxel atlas comprised of bricks assigned to tree nodes. GVDB uses a novel memory pooling architecture for storing the tree topology to achieve major performance improvements over previous methods.

More recently Museth provided a GPU extension of OpenVDB in the form of NanoVDB [2021]; a linearized, read-only representation of the OpenVDB structure. While limited in use outside of direct volume rendering due to its read-only nature, NanoVDB provides impressive results in render-based workloads. Kim et al. [2022] also extend OpenVDB with NeuralVDB by incorporating neural networks to learn the sparse grid topology and volumetric data contained within. The trained networks greatly reduce memory needed to store the sparse grid, but the requirement for both training and decoding steps provide a significant challenge for implementation of real-time applications.

2.3 Contributions

Our geometry kernel design addresses issues associated with current F-Rep geometry kernels and their applications to CAD. While existing kernels, like the one presented by Keeter [2020], can achieve impressive rendering speeds, they do not provide a means of efficiently accessing data produced by an F-Rep model’s defining function outside of a rendering context. The data derived from an F-Rep model is treated as ephemeral, existing only for the length of a render frame. Interacting with a model in any context outside of rendering is typically implemented by first tessellating the model’s surface. This approach not only makes common CAD operations like computing a model’s bulk properties reliant on the robustness and accuracy of the tessellation process, it also results in the discarding of all volumetric information obtained by querying the F-Rep. To overcome such limits on access to data for the underlying F-Rep, our geometry kernel is designed to both improve performance over existing F-Rep geometry kernels and to address the challenges of managing large scale volumetric data produced by F-Rep evaluation, avoiding the need for surface tessellation. Contributions include:

  • JIT compilation of user-created model defining functions, improving interpreter performance.

  • Enhanced pruning capabilities which extend prunability to the arithmetic operations, allowing for additional defining function optimization.

  • Reduced memory requirements for storing optimized copies of a model’s defining function.

  • Integration of a sparse volume data structure to efficiently manage and store data produced by the interpreter.

  • A demonstration of how the sparse volume data structure can be utilized to efficiently perform CAD operations like slice decomposition and bulk property evaluation without the need for surface tessellation.

  • A thorough evaluation of rendering performance, demonstrating improvements in frame-rate and supported model grid size compared to existing methods.

Skip 3F-REP INTERPRETER Section

3 F-REP INTERPRETER

3.1 Model Representation

Within modeling software, the geometry kernel defines the allowable set of operations which can be used to describe a model and forms the foundation from which a model is constructed. Our approach is to implement a geometry kernel which uses a general-purpose interpreter on the GPU to handle model evaluation. Models are defined through function composition; individual functions representing mathematical operations are exposed to the user and can be composed form a hierarchical graph structure, or “function composition tree”, describing a model’s geometric defining function. The mathematical operations behave as an assembly language for authoring models. This approach is similar in principle to Keeter, but deviates significantly in regards to implementation to enable support of additional features such as JIT compilation.

An example function composition tree associated with Equation (1), which defines a unit disc in \(\mathbb {R}^2\) or a unit radius cylinder aligned with the z-axis in \(\mathbb {R}^3\), is provided in Figure 1. (1) \(\begin{equation} f_{\textrm {disk}}(x,y) \lt 0 \;\; \textrm {where} \;\; f_{\textrm {disk}}(x,y) = \sqrt {x^2 + y^2} - 1 \end{equation}\)

Fig. 1.

Fig. 1. An implicit equation, level-set visualization, and slice stack of a wood screw as well as a physical model fabricated on a Stratasys J750 3D printer.

Fig. 2.

Fig. 2. A defining function for a unit disk depicted as a function composition tree where each node in the tree represents an algebraic operation. x and y represent the x and y components of a given point in space for which the function is evaluated at.

The interpreter traverses the composition tree, evaluating the function associated with each node, to build the model. While this approach to evaluation is typically less efficient when compared to a model-specific program, we just-in-time (JIT) compile and locally optimize the defining function to reduce interpreter overhead.

3.2 JIT Operation Grouping

Our interpreter supports a range of built-in mathematical operations, including functions corresponding to algebraic operations, which can be used to construct a model. In addition to the fixed set of base operations, our interpreter uses a novel JIT compilation scheme, allowing for dynamic expansion of the interpreter’s function set to reduce the cost associated with interpreter evaluation.

A primary drawback of interpreters is the overhead required to fetch and execute instructions. Each node in the function composition tree has a corresponding instruction which represents a mathematical function to be called by the interpreter. The larger the tree, the more functions that need to be called, the greater the interpreter overhead necessary to evaluate the tree. When performing localized evaluation, the total number of instructions can be reduced by eliminating, or “pruning”, branches of the function composition tree which cannot affect local geometry [Keeter 2020]. We apply pruning when possible but, as detailed in Section 3.5, only certain classes of operations are capable of pruning the function composition tree. Defining functions containing few to no prunable candidate operations either cannot be locally optimized or may not offset evaluation cost enough to warrant usage of the interpreter. We employ JIT compilation as an additional optimization strategy to operations which do not meet the pruning classification.

To reduce interpreter overhead the function composition tree is simplified by grouping together sequential non-prune candidate operations. Each sequence of operations is merged, or “fused”, together to form a single function, or “operation group”, which requires only one interpreter instruction to execute. Operations which are capable of pruning the function composition tree are kept independent of operation groups as to preserve all possible local reductions of the function composition tree. Figure 2 demonstrates how the algebraic operations which form the unit disk’s defining function from Equation (1) can be packaged into a single new function replacing the tree of composed algebraic functions from Figure 1.

Fig. 3.

Fig. 3. A function composition tree representing the unit disk in which the algebraic operations have been merged into a single node. Instead of defining the disk using several functions as can be seen in Figure 1, JIT compilation allows for the defining function to be represented using one function which combines all algebraic operations together. This reduces the number of functions in the composition tree enabling faster evaluation by the interpreter.

Operation groups are generated automatically through the geometry kernel by traversing the function composition tree and recursively merging parent nodes with their children as described in Algorithm 1. Once the function composition tree is fully traversed, an additional optimization pass is applied to each operation group where repeat expressions and intermediate values are identified for reuse and stored locally within slots. It should be noted that we do not implement this step directly and instead take advantage of the CUDA compiler to handle value reuse identification. Instead, we topologically sort each operation group subgraph and store values using static single-assignment. The compiler then produces optimized code during interpreter compilation.

Listing 1 shows the result of the topological sorting pass applied to the \(f_{\textrm {disk}}\) operation group from Figure 2. For reasons explained in Section 3.6, identification of repeat expressions is limited to operation groups and is not applied to the function composition tree as a whole.

Listing 1.

Listing 1. C++ implementation of the \(f_{\textrm {disk}}\) operation group generated by the geometry kernel. Static single-assignment is used so that the CUDA compiler is able to further optimize the operation group implementation.

3.3 Operation Encoding

In order to facilitate communication between the CPU, where function composition trees are constructed, and the GPU interpreter, where function composition trees are evaluated, we assign each node of the function composition tree an operation identifier, or “opcode”, which the interpreter uses to obtain a reference to the associated function. The set of built-in operations are assigned opcodes which index into a fixed function table. For operation groups, the geometry kernel assigns opcodes that index into a dynamically generated JIT function table.

3.4 Compilation Pipeline

In addition to opcode generation, we register newly constructed operation groups with the interpreter so they can be compiled for use during evaluation on the GPU. We construct a string representation of each registered operation group to be JIT compiled using NVRTC, NVIDIA’s runtime compilation library [Cook 2013]. NVRTC produces parallel thread execution (PTX) files which are loaded by the CUDA linker to generate a CUDA module. Each time a new operation group is defined, the module must be recompiled to update the JIT function table. Caching is employed at several levels of the compilation pipeline to both prevent unnecessary compilation and to make recompilation as fast as possible. Compilation is deferred until just before the interpreter is executed to avoid repeated compilations when introducing multiple new functions.

3.5 Pruning

After interpreter compilation, we move the function composition tree to the GPU. A final preprocessing step is applied to the defining function where branches of the function composition tree which have no affect on localized regions of the geometry are omitted, or “pruned”. Pruning produces locally simplified function composition trees that preserve model geometry while avoiding unnecessary function evaluations. Because functions in the composition tree often influence the geometry only in localized regions, it is worthwhile to avoid unnecessary evaluations of a component function outside of the region where it contributes to defining the geometry. Without pruning, adding new localized features makes the function composition tree globally more expensive to evaluate, and large amounts of compute time can be spent evaluating functions within the composition tree which do not contribute to the local geometry.

The pruning component of the GPU interpreter is supported by employing interval arithmetic (IA) [Moore 1966; Moore et al. 2009], a technique initially developed as a method for tracking error inherent in finite-precision floating point representation of real numbers. IA involves extending functions from operation on real inputs to operate over intervals representing sets of real numbers. A generic interval \(\hat{x}\) is defined as: (2) \(\begin{equation} \hat{x} = (\bar{x}, \bar{x}) \; \textrm {where} \; \lbrace x \in \mathbb {R} \; | \; \bar{x} \le x \le \bar{x} \rbrace \end{equation}\)

The interval extension, \(\hat{f}\), of the real function f, satisfies the following condition: (3) \(\begin{equation} \lbrace f(x) \in \hat{f}(\hat{x}) \; \forall \; x \in \hat{x} \rbrace \end{equation}\)

An interval extension (IE) operates on intervals and returns an interval whose range contains all possible output values of the real function for any real value in the input interval. We use IA to enable pruning by identifying components of the function composition tree that do not contribute to the model’s geometry in a given sub-region [Keeter 2020].

Pruning is performed during evaluation of the IE of the model’s defining function which we generate by replacing each function, \(f_n\), comprising the function composition tree with its IE, \(\hat{f}_n\). We built our own IA library which defines the IE of standard mathematical operations using a suite of rounding-controlled floating-point intrinsics provided within CUDA. A point, \(\boldsymbol {p} = \lbrace x,y,z \rbrace\), provided as input to the defining function is also replaced with an axis aligned box represented through coordinate intervals, \(\boldsymbol {\hat{p}} = \lbrace \hat{x}, \hat{y}, \hat{z} \rbrace\). As the function composition tree is traversed, attempts to prune the tree are initiated at any node containing an operation designated as a “prunable candidate”. Any binary function belonging to the fixed set of built-in interpreter operations is classified as a candidate for pruning if assigned a “prune function” within the geometry kernel. A prune function specifies the criteria by which one of the binary function’s interval inputs is incapable of contributing to the output. When the prune function successfully identifies an input as not contributing, the branch of the function composition tree producing the non-contributing interval can be omitted without modification to the resulting geometry within the interval region \(\boldsymbol {\hat{p}}\). The pruned function composition tree can then be used in place of the unpruned function composition tree when evaluating any point or subinterval contained in \(\boldsymbol {\hat{p}}\).

The most straightforward candidates for pruning in a F-Rep implementation of CSG are min and max, which implement union and intersection, respectively. The prune functions associated with min and max are able to identify non-contributing components of the function composition tree by checking for disjoint input intervals (Algorithm 2). An illustration which uses color-coding to demonstrate how pruned function composition trees manifest themselves within a model is provided in Figure 3. The model’s defining function consists of a single prune candidate function, \(f_{\textrm {min}}\), which combines a sphere and block resulting in a function composition tree with a total of three pruned tree permutations. Each color on the model corresponds to one of the three pruned tree arrangements. The first pruned configuration, shown in green, appears at the interface between the sphere and the block. In this region the input intervals provided to \(f_{\textrm {min}}\) during tree pruning are not disjoint, indicating that both \(f_{\textrm {sphere}}\) and \(f_{\textrm {block}}\) are capable of contributing to the model’s geometry. Pruning in this region produces a function composition tree with no omitted operations and evaluation of the defining function requires traversal of the full unpruned tree. In the red and blue regions of the model the input intervals to \(f_{\textrm {min}}\) are disjoint, producing pruned function composition trees with removed branches. Areas of the model in red denote a pruned tree which omits \(f_{\textrm {sphere}}\) while blue regions omit \(f_{\textrm {block}}\).

Fig. 4.

Fig. 4. A rendering of an F-Rep model with shading based on a color-coding of pruned function composition trees. Each color represents a unique pruned tree configuration assigned to a leaf node of the sparse volume structure used to store the model. There are a total of three pruned tree permutations: \(f_{\textrm {sphere}}\) pruned (red), \(f_{\textrm {block}}\) pruned (blue), and no pruning (green). \(f_{\textrm {block}}\) and \(f_{\textrm {sphere}}\) each represent an operation grouping containing the defining functions for the block and sphere, respectively, while \(f_{\textrm {min}}\) defines the union operation via the min function.

We extend the list of supported prunable candidate functions beyond the min and max operations in the geometry kernel from Keeter [2020] by including prune functions for addition, subtraction, multiplication, and division. Inclusion of the algebraic operation also motivates our support to introduce user control to selectively disable pruning candidacy. While pruning of min and max inputs is generally desired, the algebraic functions require much stricter criteria when identifying non-contributing inputs and do not typically result in successfully pruned inputs. For example, addition requires one input to be tightly bounded to \((0,0)\) for pruning to take place. However, there are many modeling operations like lofting, which routinely occur with intervals known to meet such algebraic pruning criteria. User-controlled disabling of operation prunability allows for pruning optimizations to be applied in situations where an operation is expected to produce a pruned output, and saves the interpreter from having to call prune functions when identification of non-contributing intervals is unlikely. The geometry kernel is also free to incorporate operation which are no longer prunable into operation groups, further simplifying the function composition tree.

3.6 Evaluation

Evaluation of either the model’s defining function or IE involves traversing the function composition tree. A vector valued input \(\boldsymbol {p}\), specifying either a three-dimensional region or point in space is provided to the interpreter to begin evaluation. Regions, represented through intervals, describe axis-aligned volumes over which the IE should be evaluated. Points, on the other hand, define a location for which the defining function should be computed. The input type of \(\boldsymbol {p}\) controls if the scalar or IE function tables are loaded, dictating whether the defining function or its IE is evaluated. Given a point \(\boldsymbol {p}\), the interpreter’s main execution loop (Algorithm 3) traverses the function composition tree using a top-down approach, alternating between descent and ascent until each node has been visited (Algorithms 4 and 5). Functions associated with internal nodes are evaluated after their children have been visited while leaf nodes are executed when reached (Algorithms 6 and 7). Intermediate values produced while traversing the tree are stored within a stack-based memory system; leaf node evaluations push new values onto the stack and internal node evaluations both pop old values from, and push new values onto the stack. The essential distinction made between evaluation of the defining function and its IE is found in Algorithm 6, where we call the prune function associated with the prunable candidate operation which can lead to pruning of the input branches.

Each region provided as input to the IE of the defining function can potentially have a distinct pruned function composition tree. Minimizing the memory consumption of prune trees is crucial, as we are concerned primarily with large volumes containing many regions over which a significant number of pruned trees may be generated. To avoid excessive memory utilization we store the locally optimized trees in a compact representation which does not require generating copies of the initial function composition tree. Successfully pruned branches are encoded as traversal decisions which are used during subsequent evaluations to skip over irrelevant components of the defining function. Each prune decision is stored as two bits, a “prune bit” and a “choice bit”. An active prune bit specifies that a branch of the tree has been pruned, while the choice bit determines the direction to traverse if pruned. During evaluation of the defining function’s IE, when a prunable candidate operation is encountered with an inactive prune bit its prune function is evaluated. If either input to the function is determined not to contribute, the prune and choice bit associated with the prunable operation are updated and used on subsequent visits to direct traversal, skipping pruned branches (Algorithm 4 and 6). Each prune and choice bit pair constitute a single prune decision and, when combined with all other prune and choice bits, form a bitmask representing a pruned function composition tree. The bitmask serves as a compact and lightweight representation allowing for millions of pruned tree copies (associated with numerous spatial sub-domains) to be stored with minimal memory overhead. The bitmask-based approach helps maximize memory access speeds by maintaining only a single copy of the function composition tree which can be cached and accessed efficiently by all GPU threads.

Our need to minimize pruned tree memory usage also makes it difficult to apply the slot-based value-reuse optimization, used within operation groups and in earlier work by Keeter, to the entire function composition tree. Complications arise in the slot-based method with the inclusion of branch pruning; as branches are pruned, slots may become inactive. Information pertaining to the slots each operation reads from and writes to must be updated and maintained for each pruned tree. This information becomes prohibitively expensive when storing even a moderate number of pruned trees. While implementation differences prevent direct comparison between our pruned tree representation and Keeter’s, estimates can be made to illustrate the memory savings of our approach. For example, Keeter benchmarks his geometry kernel using a bear head model, provided by Hazel Fraticelli and Anthony Taconi, with a defining function consisting of 541 terms, 27 of which are CSG operations. Keeter allocates pruned tree nodes in blocks of 64, totalling 512 bytes (8 bytes per node). Provided every pruned tree is reduced to a single 64 node chunk, the lower bound on memory consumption for a single pruned tree is 512 bytes. Our pruned tree bitmasks representation requires 2 bits per CSG operation, totaling 8 bytes per pruned tree, consuming at least 64 times less memory. Storing \(1,\!000,\!000\) unique pruned trees requires between 512 MB and 4.5 GB using Keeter’s geometry kernel while our implementation needs only 8 MB.

Instead of using the slot-based memory system, our solution is to implement the hybrid scheme detailed in this section. A stack is used to store output produced during traversal of the function composition tree, while slots are used within the confines of operation groups to maintain support of value reuse. Prunable candidate operations are excluded from operation groups preventing value reuse between branches, leaving slots within operation groups unaffected when a branch is pruned. The hybrid implementation allows for value reuse without the overhead required to track slot related information.

Skip 4SPARSE VOLUME DATA STRUCTURE Section

4 SPARSE VOLUME DATA STRUCTURE

4.1 Data Layout

To manage high resolution models we use a spacial subdivision approach to store function values and pruned function composition trees produced by the interpreter in a compact and accessible form. We implement an \(N^3\)-tree sparse volume data structure which builds from the memory pooling system of both OpenVDB and GVDB [Museth 2013; Hoetzlein 2016]. The design of the \(N^3\)-tree consists of nodes at multiple levels forming a grid hierarchy. The subdivision factor, N, of the hierarchy is configurable in much the same way as OpenVDB and GVDB; a target tree topology configuration is specified as a vector of values representing the log\(_2\) resolution of each grid level. For example, a configuration of \(\mathinner {\langle {4,3,2}\rangle }\) corresponds to a 3-level tree where each leaf brick contains \((2^4)^3\) voxels, interior nodes contain \((2^3)^3\) leaf node children, and top level nodes contain \((2^2)^3\) interior node children. The resulting sparse grid supports up to \(512^3\) voxels.

We store the \(N^3\)-tree in an efficient memory layout through the use of an allocator which provides the means to access the \(N^3\)-tree data on both the CPU and GPU. Museth and Hoetzlein observe that while node data and resolution vary between grid levels, nodes at a particular level are divided similarly. As such, we allocate a “pool group” at each tree depth level responsible for storing node data (Figure 4). A pool group is organized into individual pools of the same number of elements. Each pool within a pool group describes a collection of data, or “property”, for all nodes, while entries of each pool can be accessed via an index to obtain all properties for a single node. Organization through memory pools and node properties provides a highly configurable approach to tree representation.

Fig. 5.

Fig. 5. A visual representation of the data and topology layout of our \(N^3\) -tree sparse data structure. Each depth level of the tree has an associated pool group which stores information on the tree. The pool group is divided into individual pools each storing a grouping of data, or “property”, for all nodes at a given depth level. Every pool can be accessed via an index to obtain a node-specific copy of a property. The root depth of the tree consists of a single node while the internal and leaf depths can contain multiple nodes. In addition to the pool groups which describes the topology layout, a geometry pool group containing voxel bricks storing output of the interpreter is allocated alongside the leaf depth. The geometry pool group is divided into channels each of which describes a single value at a given voxel location. Leaf nodes reserve voxel bricks from the geometry pool group to store model information generated by the interpreter corresponding to the node’s location in space.

The default configuration of our \(N^3\)-tree consists of a primary pool group allocated at each depth level. This primary pool group is referred to as the topology pool group as it stores the connectivity information of the hierarchical grid. The topology pool group is composed of three properties; Node, Children, and Geometry (Figure 5). The Node property’s primary responsibility is to maintain information related to the following:

Fig. 6.

Fig. 6. Each data structure defines a property used to represent our \(N^3\) tree. The value of N within the Children property represents the log \(_2\) number of children for a given tree depth level specified in the topology configuration.

  • depth: topology pool group which the node belongs to

  • topology_idx: location of the node within its topology pool group

  • parent: topology_idx of the node’s parent, located in the pool group one depth level higher

  • origin: node’s spatial position within the sparse grid

Together, depth and topology_idx form a value pair which uniquely define a node within the tree.

The Children, and Geometry properties track depth dependent information. Topology pool groups at depths other than the leaf depth store the Children property, maintaining information on each node’s children. The Children property defines a bitmask used to track active children and an index to store the position of the first active child in the topology pool group one depth level lower. During \(N^3\)-tree construction, a node’s children are allocated in a single batch so that they are contiguous in memory. A child node can be accessed by adding children_idx to the sum of all active bits within children_mask up to the desired child node’s position.

At the leaf depth of the \(N^3\)-tree, the subdivisions are associated with the grid at its full resolution. Node children are replaced with function values produced by interpreter evaluation of the model’s defining function, sampled at each grid location spanned by the leaf node. The sample function values represent model geometry and are stored in small groups of voxels, or “voxel bricks”, containing a number of elements equal to the leaf depth subdivision factor. The Children property is replaced with the Geometry property to track leaf node voxel brick ownership. Voxel bricks are kept in a separate pool group, referred to as the geometry pool group, which is allocated along side the leaf depth topology pool group. Individual pools within the geometry pool group are referred to as channels, terminology we adopt from GVDB, and contain voxel bricks which store information describing a single attribute (e.g., function values representing model geometry). The Geometry property contains an index, geometry_idx, pointing into the geometry pool group representing node ownership of a voxel brick. As voxel bricks are assigned to nodes, we update the contents via the interpreter. Additionally, the \(N^3\)-tree can be extended to support multi-resolution grids by allocating additional geometry pool groups and including the Geometry property alongside depth levels outside the leaf depth.

4.2 Construction

\(N^3\)-tree construction involves identifying and selectively activating nodes within the tree which contain model geometry. We denote the set of active \(N^3\)-tree nodes which intersect the model as the model’s topology. To obtain a model-node intersection, we must determine if the surface level-set of the defining function intersects the node (i.e., a multi-dimensional equivalent to the root finding problem). The process of obtaining model-node intersections has historically limited usage of sparse volume structures due to the associated computational cost. Traditional methods involve evaluating a majority of the grid and activating regions which contain grid values within some user-definable range of the level-set defining the surface, a process which is computationally intractable for large grids.

Recall that an interval returned by a function’s IE contains all possible output values of the real function for any real value in the input interval. We leverage this property of IA to determine active regions of the \(N^3\)-tree using the IE of the model’s defining function. The output interval, \(\hat{F}\), produced by evaluating the defining function’s IE, contains all possible real values of the defining function within the input interval region \(\boldsymbol {\hat{p}}\). The interval range of \(\hat{F}\) then determines if the region \(\boldsymbol {\hat{p}}\) may contain model geometry.

To build the \(N^3\)-tree we implement a recursive algorithm on the GPU which takes advantage of the interpreter and model defining function to simultaneously prune the function composition tree and determine active nodes of the \(N^3\)-tree (Algorithm 8). Our algorithm, ConstructTopology, is launched to initially operate on the root node of the \(N^3\)-tree. For each potential child node, a thread is created on the GPU to evaluate the IE of the defining function using the interpreter. Evaluation of a topology node’s children occurs within a single thread-block on the GPU so that all threads in the block iterate over the same pruned function composition tree, eliminating thread divergence. The input \(\boldsymbol {\hat{p}}\) provided to the interpreter by each thread corresponds to an axis-aligned box defined by intervals which cover the region occupied by the child node. Evaluating \(\hat{F} = \hat{f}(\boldsymbol {\hat{p}})\) allows for complete classification of the child node. One of three possible outcomes is produced based on the output interval result:

  • \(0 \lt\) \(\, \hat{F}\) : The child node is fully outside the model and does not need to be allocated.

  • \(0 \in \hat{F}\) : The child node cannot be classified as fully interior or exterior to the model. It may contain a portion of the model so further subdivide is required.

  • \(0 \gt\) \(\hat{F}\) : The child node is fully interior to the model. If dynamic topology, introduced in Section 4.3, is enabled then the node is marked as dynamic to be constructed later if needed. Otherwise, the node is subdivided to further resolve the grid.

In each case where the grid must be subdivided, a recursive call to ConstructTopology is made in which the root node input is replaced by the newly constructed child node. A parent node’s interval range is subdivided such that each child node interval is a sub-interval of its parent, allowing the pruned function composition tree output of the parent node to be used in place of the non-pruned function composition tree during evaluation. Subdivision is repeated until a maximum depth is reached or all children have been fully classified. The minimal memory footprint of our bitmask-based pruned tree representation allows for all pruned trees generated during topology construction to be stored alongside the associated node by introducing a new pool property to the \(N^3\)-tree which stores each bitmask. Additionally, the node classification provided by \(\hat{F}\) is encoded within each \(N^3\)-tree node to differentiate nodes interior to the model from nodes which might contain a portion of the model’s surface.

Once model topology is determined, model geometry is evaluated on-demand at active \(N^3\)-tree leaf nodes. Voxel brick assignment and geometry evaluation occur when a value from the grid is requested. The \(N^3\)-tree leaf node containing the desired grid value reserves a voxel brick and updates the brick contents using the interpreter. The pruned function composition tree associated with the region spanned by the node is used in place of the unpruned tree to reduce evaluation time. When the number of active leaf nodes is less than or equal to the geometry pool group size, each leaf node can be assigned a unique voxel brick. In situations where the leaf node count exceeds geometry pool group capacity, voxel bricks are obtained from the geometry pool group in a least recently used (LRU) fashion. By using the interpreter to evaluate the geometry as-needed, we reduce the in-memory footprint of the sparse grid to that of just the grid topology.

4.3 Dynamic Nodes

Our \(N^3\)-tree structure optionally makes use of “dynamic nodes” which we introduce to decrease the memory required to store model topology by allocating and constructing a bulk of the \(N^3\)-tree dynamically. Models which approximately fill the space spanned by an \(N^3\)-tree occupy close to the entire volume, resulting in nearly every node at every level of the \(N^3\)-tree being allocated. In such scenarios, nodes interior to the model comprise a majority of the total node allocations. As a result, the handling of “interior nodes”, nodes fully inside the model, has the potential to drastically reduce memory requirements for storing model topology. Some sparse grid structures, including OpenVDB and GVDB, support alternative approaches to interior node representation. One such approach is to replace interior node children with a constant value, assumed to apply to all grid entries spanned by the node, indicating the grid should not be subdivided further. Another strategy is to disregard interior node allocations altogether and only allocate memory for nodes intersecting the model’s narrow band surface level-set. In rendering-related applications these approaches are often satisfactory as the focus is on the model’s surface geometry. In other applications like CAD, function values within the model can be sufficiently important as to require the grid be fully resolved interior to the model.

Existing internal node representations do not provide ways to reduce topology memory requirements without sacrificing grid resolution. Our solution to handling the issue of interior node allocation is to implement dynamic nodes, which provide a means to fully resolve the grid through dynamic construction of the model’s topology. To accommodate dynamic nodes the topology pool groups comprising the \(N^3\)-tree are initially sized to fit only non-dynamic, or “static”, nodes (i.e., nodes which cannot be classified as fully interior or exterior) and are extended by a user-controlled value, D, of new nodes which represent dynamic nodes. An additional property is added to each topology pool group to track dynamic children as well as dynamic node ownership.

During \(N^3\)-tree assembly, when dynamic topology is enabled, nodes determined to be interior to the model’s geometry are classified as dynamic and are not subdivided further. The newly added dynamic property contains a bitmask which uses set bits to indicate the dynamic status of child nodes. Once the \(N^3\)-tree is built, if a child node is requested which corresponds to an active entry in the parent node’s dynamic bitmask, the parent node reserves one of the allocated dynamic nodes, sets the dynamic node as its child, and records the node index to reflect dynamic node ownership. This process is repeated at each depth level until a dynamic leaf node is constructed, which can then be assigned a voxel brick to be evaluated by the interpreter. The result is a “dynamic branch” which extends from the first non-dynamic parent node down to a leaf node and remains active until any dynamic node within the dynamic branch is reassigned to another dynamic branch. By adjusting the value of D, internal node allocations can be directly controlled. Setting D to zero eliminates internal node allocations altogether, useful in situations where grid values inside the model are not needed (e.g., when rendering the surface of the model). This produces topology with a memory footprint comparable to existing alternative interior node representations. In situations where interior nodes are needed, the number of allocated dynamic node count can be set accordingly.

4.4 Visualization

We display models with a ray-casting rendering algorithm that makes use of the hierarchical 3D digital differential analyzer (DDA) developed by Museth [2014] and later adapted to the GPU by Hoetzlein.We divide the rendering process into two phases which are repeated until each ray intersects the model or exits the \(N^3\)-tree. We also allocate a node buffer and a ray buffer to track ray information between passes. The node buffer stores topology_idx values of leaf nodes while the distance buffer maintains information on total distance traveled along each ray.

The first rendering phase involves determining ray-model intersections. To start, rays are tested for intersection against the \(N^3\)-tree. Any intersection found results in traversal of the \(N^3\)-tree via the 3D DDA until a leaf node is found or the tree is exited. Successful leaf node intersections result in traversal of the leaf node’s voxel geometry brick. We assume only \(C^0\) continuity of the model’s defining function and use a uniform step size to march through a voxel brick. Interpolation is used to estimate values of the defining function at points along the ray until either the brick is exited or a ray-surface intersection is found. For shading purposes, surface normals are determined by computing the gradient of the function at the point of intersection (e.g., using finite differences). In cases where the class of defining function is limited, for example to signed distance functions, techniques like sphere tracing can be integrated into the DDA to dynamically set the voxel step size to enhance performance.

During traversal, active leaf nodes encountered which have not yet been assigned a voxel brick cause the ray to stop traversal and record the current ray distance and leaf node topology_idx in the distance and node buffer. To prevent a node from being added to the node buffer multiple times, one bit is reserved within the flag attribute of the Node property which is checked and atomically set before an attempt is made to record the node. The second rendering phase is triggered if the node buffer contains a non-zero number of entries after the first phase completes, indicating that active leaf nodes were hit that have not yet been assigned geometry. Unassigned voxel bricks, or voxel bricks belonging to nodes which have already been traversed during the first phase, are assigned to nodes recorded in the node buffer. The contents of the newly assigned voxel bricks are updated by the interpreter and the node flag is reset. Because the interpreter is implemented on the GPU, bricks can be updated directly on the GPU avoiding costly memory transfers between the CPU and GPU. The first phase is repeated and each ray uses the value recorded in the distance buffer to resume traversal from where it left off. These two rendering phases are repeated until the first phase finishes with an empty node buffer indicating that the rendering is completed. Because geometry is evaluated on-demand and evaluation is informed by ray-model intersection, only visible \(N^3\)-tree nodes have their voxel bricks contents updated, reducing total evaluation overhead when viewing large models.

4.5 Slice Decomposition

Model slice decomposition, or “slicing”, is the process by which a model is decomposed into a representation 3D printers can interpret to enable manufacturing of a physical copy of the model. For high-resolution volumetric models, a raster-based printing process is typically used as the means of manufacturing and takes as input a stack of images corresponding to horizontal slices of the model. Model slicing within our framework is enabled through the \(N^3\)-tree and consists of two steps. The first step involves iterating over active \(N^3\)-tree leaf nodes and collecting all voxels along a particular z-height into a slice buffer. Similar to visualization, all leaf nodes which have not yet been assigned geometry are assigned voxel bricks from the geometry pool group which are updated using the interpreter before their voxel contents are extracted. The second and final step is to move the slice buffer from the GPU to the CPU where the buffers are saved as images. Support can be extended to vector-based printers by introducing a third step which extracts the level-set contours defining the model’s surface from the slice buffer.

Printer resolution along the Cartesian axes of a raster-based additive manufacturing system are typically not identical, for example the Stratasys J750 PolyJet 3D printer has an x and y resolution of 42 microns while its z axis has a resolution of 14 microns. To accommodate axes with different resolutions, we implement coordinate-specific grid spacings to support sampling on non-cubic voxels. Evaluation of the defining function and its IE remain the same, all that is altered is the value p provided as input to the interpreter. The resulting slice images are distorted but recover the original geometry when printed due to the printer scaling along each axis.

4.6 Bulk Properties

A model’s physical characteristics, like center of mass, are useful for validating aspects of the model’s design. Computing these properties requires integrating over the model’s geometry. We evaluate volume, surface, and line integrals using methods presented by Yurtoglu et al. [2018], an integration approach which operates on arrays of function values sampled on a regular grid. Integral contributions arise from a stencil computation applied to each grid point. Non-zero contributions occur where the stencil crosses over the level-set defining the model’s surface. The node classification provided by IA during construction of the model topology allows for efficient filtering of topology nodes which do not contain the model’s surface. Only nodes which produce a non-zero integral contribution are evaluated.

Skip 5RESULTS Section

5 RESULTS

Waiting minutes to hours to construct and view models has a severely negative effect on productivity. The productivity consideration puts a significant emphasis not just on supporting large scale volumetric models, but doing so at reasonable rates on the kind of computing hardware typically used by designers and engineers. To validate our geometry kernel’s responsiveness we characterize its performance in two ways. First, we compare our geometry kernel with the kernel presented in Keeter [2020], which at the time was a state-of-the art GPU implementation of libfive [Keeter 2019], a library used as a basis for the kernel within the CAD software nTopology. Comparisons are made through a render-based workload involving complex models. Second, we benchmark the geometry kernel under a standard additive manufacturing workload involving generation of raster slice data of a high resolution model. In addition to benchmarking, we utilize our slicing pipeline to produce slice stack data of a wood screw which we fabricate using a 3D printer. All discussion and tables below describe performance on two systems: a lower end laptop equipped with a GTX 1650-Max Q GPU and Intel i7-1065G7 CPU as well as a higher end desktop computer containing an RTX 3090 and Intel i7-11700K. These systems are chosen to illustrate the capabilities of the geometry kernel on hardware corresponding to well-separated portions of the spectrum of computing power.

5.1 Performance Comparison

Due to substantial differences in geometry kernel implementation, direct comparisons of individual components of our kernel and the kernel presented by Keeter are difficult. The kernel implementation from Keeter requires that the interval pruning and geometry evaluation of a model be reevaluated every frame, and uses a screen-aligned and orthographic coordinate system which ties the model resolution directly to the resolution of the screen (e.g., a \(2048^3\) model volume requires a \(2048^2\) screen resolution to visualize). Within our kernel implementation the \(N^3\)-tree stores pruning information as well as model geometry, thus evaluation of these properties need not be performed each frame. The \(N^3\)-tree is built once, before rendering begins, and geometry is evaluated and stored as needed in the geometry pool group of the \(N^3\)-tree. Additionally, our rendering approach allows for a fixed dimension screen to render a model of any resolution. The \(N^3\)-tree also provides support for advanced rendering techniques like level-of-detail control which significantly improves performance for higher resolution models, when geometry voxel bricks approach the size of a pixel. Given these large differences, we use rendering frame-rate of the model at its maximum resolution as the metric for comparison as it best quantifies end-to-end performance of each geometry kernel.

Tables 1, 2, and 3 contain performance metrics for three different models, shown in Figure 7, from [Keeter 2020] using both Keeter’s and our geometry kernel. The bear head model supports limited pruning due to the small number of min and max functions, making it a good candidate for JIT optimization. The gear and architecture model contain larger numbers of min and max functions, limiting JIT optimization. Results for our kernel are presented under the table header “FVD” (function-based volumetric design) while Keeter’s are contained under the “MPR” (massively parallel rendering) header. We include results for each geometry kernel up to the point at which either the GPU runs out of memory or produces a frame-rate that is lower than one frame-per-second, denoted by a dash.

Table 1.

Table 1. Rendering Benchmark Results for the Bear Head Model over Various Grid Sizes and Topology Configurations

Table 2.

Table 2. Rendering Benchmark Results for the Gear Model

Table 3.

Table 3. Rendering Benchmark Results for the Architecture Model

Fig. 7.

Fig. 7. A rendering of an F-Rep wood screw designed and visualized using our geometry kernel. The top row shows two views of the wood screw rendered with shading based on a simple diffuse illumination model. The images in the bottom row show the same views of the wood screw but with shading based on a color-coding of the pruned function composition tree. Each colored region on the screw corresponds to a particular pruned tree configuration belonging to an \(N^3\) -tree leaf node. The color is generated by converting the bits of a node’s pruned tree into a single integer, scaling the integer to the range \([0, 1]\) , and mapping that range to the HSV color space. Prune bits which appear earlier in the pruned tree cause larger color differences from pruned bits which appear later in the tree due to occupying more significant bits of the resulting integer value.

Fig. 8.

Fig. 8. The bear head, gear, and architecture model from [Keeter 2020] used to benchmark our geometry kernel.

Table 1 includes an extensive parameter sweep of various topology configurations to characterize the affects of model topology on rendering performance and topology construction, while the remaining tables present more concise results covering only the topology configuration which produces the most favorable rendering performance for a given grid size. The “Configuration” header in Table 1 details the topology configuration, number of active \(N^3\)-tree leaf nodes, and occupancy rate specifying the percentage of the dense grid occupied by the model. To obtain the number of active voxel elements contained within the sparse grid, the grid size can be multiplied by the occupancy fraction. For our geometry kernel, in addition to frame-rate, we include timing results which detail additional work required for the first render frame. In particular, we present timing results for the sparse grid construction, as well as visible geometry identification and evaluation. Lastly, all rendering results for our kernel are determined using a fixed screen size of \(1280 \times 720\).

Comparing frame-rate results between geometry kernels shows that our kernel outperforms Keeter’s on each of the three models, with rendering improvement of over \(10 \times\). We are also able to visualize models at interactive rates at grid sizes previously unachievable. Variability in rendering performance of our geometry kernel is due in large part to geometry pool capacity and \(N^3\)-tree topology configuration which are discussed in detail in the following two paragraphs.

Geometry pool capacity dictates the number of geometry bricks which can be stored and reused for multiple rendering frames. Topology nodes which have not yet been assigned geometry prevent rays from progressing through the \(N^3\)-tree while rendering, requiring that new geometry be evaluated by the interpreter and that the tree be traversed again once the geometry has been updated, as described in Section 4.4. The timings associated with this process are provided under the “Traverse” and “Evaluate” table columns. Minimizing the number of tree traversal passes is key to achieving an optimal frame-rate. The ratio of visible geometry to geometry pool capacity should preferably be less than or equal to one so that complete tree traversal can be achieved in a single pass. The drop in frame-rate to below one frame-per-second when rendering each of the three models using our kernel is a result of visible geometry exceeding the allocated geometry pool capacity (and each GPU’s total memory capacity). Each frame requires close to a complete reevaluation of all visible geometry greatly reducing performance.

The \(N^3\)-tree topology configuration specifies the spatial subdivision of the grid, and thus how the underlying grid is partitioned. Varying the \(N^3\)-tree topology alters 3D DDA traversal time as well as geometry voxel brick dimension. When looking at variations in leaf depth topology configuration for a given grid size in Table 1, the downward trend in frame-rate timings indicate that smaller leaf node sizes support more responsive rendering. Larger leaf nodes correspond to larger geometry bricks and an increased grid occupancy. The larger geometry bricks are also more costly to traverse when intersected during rendering and limit geometry pool brick allocations due to requiring more memory. Conversely, smaller leaf nodes increase total \(N^3\)-tree node count, raising model topology memory requirements. \(N^3\)-tree build times, shown under the “Build” column of each table, also increase with smaller leaf node sizes as a result of having to evaluate the IE of the model’s defining function over more node regions. An optimal topology configuration is therefore application dependent, with smaller leaf nodes being more favorable for rendering and larger leaf nodes being more favorable to tree construction. It should be noted that we do not provide rendering results over varying internal node topology configurations. We use the 3D DDA algorithm from GVDB, for which metrics on how internal node size variations affects frame-rate are discussed by Hoetzlein [Hoetzlein 2016]. We concur with the conclusion that an internal node configurations on the order of \(N=3\) provide optimal results.

5.2 Model Slicing

The second metric by which we gauge performance of our geometry kernel is through the model slicing process described in Section 4.5. To demonstrate the effectiveness of our slicing pipeline and characterize how it scales with grid size we design and slice a wood screw model depicted in Figure 6. Table 4 presents slicing performance metrics for the screw model over several topology configurations and increasingly larger grid sizes. The “Evaluation” column reports time taken to evaluate all leaf node geometry and extract the geometry into slice buffers, while the “GPU to CPU” column reports time needed to move the slice buffers from the GPU to the CPU. Total slice time is obtained by adding the two columns together.

Table 4.
GridTopology# of NodesNode Occup.Evaluate (s)CPU to GPU (s)
StaticDynamic1650309016503090
\(128^2 \times 512\)\(\mathinner {\langle {3,3,3}\rangle }\quad\)4,5871,16835%0.1300.0550.0860.023
\(\mathinner {\langle {4,3,2}\rangle }\quad\)1,1704859%0.0660.032
\(\mathinner {\langle {5,3,1}\rangle }\quad\)228089%0.0650.032
\(256^2 \times 1024\)\(\mathinner {\langle {3,3,3,1}\rangle }\)17,31113,06823%0.3900.1200.2200.085
\(\mathinner {\langle {4,3,3}\rangle }\quad\)4,5871,16835%0.1900.065
\(\mathinner {\langle {5,3,2}\rangle }\quad\)1,1704859%0.1400.055
\(512^2 \times 2048\)\(\mathinner {\langle {3,3,3,2}\rangle }\)67,329122,10718%1.3000.4501.2000.409
\(\mathinner {\langle {4,3,3,1}\rangle }\)17,31113,06823%0.4700.160
\(\mathinner {\langle {5,3,3}\rangle }\quad\)4,5871,16835%0.3200.120
\(1024^2 \times 4096\)\(\mathinner {\langle {3,3,3,3}\rangle }\)265,8671,054,30616%3.9001.7007.5002.900
\(\mathinner {\langle {4,3,3,2}\rangle }\)67,329122,10718%2.1000.780
\(\mathinner {\langle {5,3,3,1}\rangle }\)17,31113,06823%1.4000.360
\(2048^2 \times 8192\)\(\mathinner {\langle {3,3,3,4}\rangle }\)1,056,3198,763,00715%23.0008.40059.00021.000
\(\mathinner {\langle {4,3,3,3}\rangle }\)265,8671,054,30616%12.0003.600
\(\mathinner {\langle {5,3,3,2}\rangle }\)67,329122,10718%12.0003.700
  • Performance was measured on a laptop equipped with a GTX 1650 Max-Q and a desktop containing a GTX 3090. Table entries containing a dash denote results which are omitted due to requiring an unreasonable time scale to complete.

Table 4. Slicing Benchmark Results for the Wood Screw Model over Various Grid Sizes and Topology Configurations

  • Performance was measured on a laptop equipped with a GTX 1650 Max-Q and a desktop containing a GTX 3090. Table entries containing a dash denote results which are omitted due to requiring an unreasonable time scale to complete.

Evaluating and filling the slice buffer dominates total slice time for smaller grid sizes while GPU to CPU transfer time becomes the dominant factor as the grid size increases. For a given grid size, slice evaluation time is affected by \(N^3\)-tree topology, primarily leaf node size. Leaf node geometry bricks with a voxel count that allows the GPU to achieve maximum compute throughput when evaluating a model’s defining function produce more optimal timings. This typically corresponds to larger leaf node sizes. However, as leaf node size increases voxel grid occupancy also rises, equating to more total evaluations of the model’s defining function. Evaluation count and compute throughput compete against one another resulting in grid-size dependent optimal topology configurations. For smaller grid sizes, where the total voxel count is low, larger leaf brick sizes perform more favorably. As the grid size increases, a leaf node size which strikes a balance between voxel grid occupancy and compute throughput produces the best timings. For both the GTX 1650 and RTX 3090, the preferred configuration initially corresponds to one with a leaf node size of \((2^5)^3\) but transitions to \((2^4)^3\) at the largest grid resolution. We also use the slicing pipeline to fabricate the wood screw using a Stratasys J750 PolyJet 3D printer and provide images of a representative slice stack as well as the finished print in Figure .

Skip 6FUTURE WORK Section

6 FUTURE WORK

Here we present several directions for future research with a focus on performance improvements to the geometry kernel.

6.1 Topology Evaluation

Our topology construction algorithm builds a model’s topology in a single pass. However, during visualization a majority of a model’s topology may be occluded. Ideally, only the visible topology would be constructed. Extending our view-dependent geometry evaluation scheme to model topology construction would reduce both the compute and memory requirements needed for visualization. A view-dependent evaluation would also aid in further reducing pruned tree memory requirements as only the visible topology regions would produce pruned tree data.

6.2 Geometry Evaluation

Geometry evaluation consumes a bulk of total model evaluation time. We use techniques such as tree pruning and view-dependent evaluation to significantly reduce this evaluation cost. However, after any edit has been made to the defining function we discard all geometry and rebuild the model from scratch. Incorporation of strategies which identify geometry that does not change between model edits would further reduce the compute overhead required when frequently modifying models, a common occurrence in CAD applications.

6.3 Interval Bounds

We use IA to inform topology construction so that model geometry is not needed to determine region occupancy. IA is conservative in its evaluation estimate resulting in spurious topology node allocations which do not contain model geometry. Replacing or augmenting IA with alternative techniques, for example affine arithmetic, which improve accuracy would aid in generating topology that more tightly fits to the model’s geometry, reducing both model evaluation time and memory requirements.

Skip 7CONCLUSION Section

7 CONCLUSION

We have presented an F-Rep geometry kernel designed for large scale volumetric modeling applications including large-volume, high-resolution CAD and additive manufacturing. The modeler consists of a novel F-Rep GPU interpreter and a sparse volume data structure. Our F-Rep geometry kernel builds from previous work by Keeter and provides improvements in the form of JIT compilation, increased function prunability, and a reduction of the memory footprint for storing pruned trees. The sparse volume data structure we develop is used to store interpreter output and provides the means to visualize and manipulate high resolution volumetric data. In addition, our geometry kernel framework performs well on GPU-enabled systems ranging from laptops to desktops, and further performance enhancements are expected as a result of ongoing improvements in GPU computing throughput, data transfer rates, and memory capacity.

REFERENCES

  1. Aleksandrov Mitko, Zlatanova Sisi, and Heslop David J.. 2021. Voxelisation algorithms and data structures: A review. Sensors 21, 24 (2021). Google ScholarGoogle ScholarCross RefCross Ref
  2. Bader Christoph, Kolb Dominik, Weaver James C., Sharma Sunanda, Hosny Ahmed, Costa João, and Oxman Neri. 2018. Making data matter: Voxel printing for the digital fabrication of data across scales and domains. Science Advances 4, 5 (2018). Google ScholarGoogle ScholarCross RefCross Ref
  3. Beyer Johanna, Hadwiger Markus, and Pfister Hanspeter. 2015. State-of-the-art in GPU-based large-scale volume visualization. Comput. Graph. Forum 34, 8 (Dec.2015), 1337. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Bloomenthal Jules and Wyvill Brian. 1997. Introduction to Implicit Surfaces. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Cook Shane. 2013. CUDA Programming: A Developer’s Guide to Parallel Computing with GPUs (1st ed.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.Google ScholarGoogle Scholar
  6. Crassin Cyril, Neyret Fabrice, Lefebvre Sylvain, and Eisemann Elmar. 2009. GigaVoxels: Ray-guided streaming for efficient and detailed voxel rendering. In I3D ’09.Google ScholarGoogle Scholar
  7. Araújo B. R. de, Lopes Daniel S., Jepp Pauline, Jorge Joaquim A., and Wyvill Brian. 2015. A survey on implicit surface polygonization. ACM Comput. Surv. 47, 4, Article 60 (May2015), 39 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Cusatis A. De, Figueiredo L. H. De, and Gattass M.. 1999. Interval methods for ray casting implicit surfaces with affine arithmetic. In XII Brazilian Symposium on Computer Graphics and Image Processing (Cat. No.PR00481). 6571. Google ScholarGoogle ScholarCross RefCross Ref
  9. Figueiredo Luiz Henrique de and Stolfi Jorge. 2004. Affine arithmetic: Concepts and applications. Numerical Algorithms 37 (2004), 147158.Google ScholarGoogle ScholarCross RefCross Ref
  10. Duff Tom. 1992. Interval arithmetic recursive subdivision for implicit functions and constructive solid geometry. SIGGRAPH Comput. Graph. 26, 2 (July1992), 131138. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Fogal Thomas, Schiewe Alexander, and Krüger Jens. 2013. An analysis of scalable GPU-based ray-guided volume rendering. In 2013 IEEE Symposium on Large-Scale Data Analysis and Visualization (LDAV). Google ScholarGoogle ScholarCross RefCross Ref
  12. Fryazinov Oleg and Pasko Alexander. 2007. GPU-based real time FRep ray casting. GraphiCon 2007 - International Conference on Computer Graphics and Vision, Proceedings (012007).Google ScholarGoogle Scholar
  13. Galin Eric, Guérin Eric, Paris Axel, and Peytavie Adrien. 2020. Segment tracing using local Lipschitz bounds. Computer Graphics Forum 39, 2 (2020), 545554. . arXiv: https://onlinelibrary.wiley.com/doi/pdf/10.1111/cgf.13951Google ScholarGoogle ScholarCross RefCross Ref
  14. Ganter D. and Manzke M.. 2019. An analysis of region clustered BVH volume rendering on GPU. Computer Graphics Forum 38, 8 (2019), 1321. . arXiv:Google ScholarGoogle ScholarCross RefCross Ref
  15. Gobbetti Enrico, Marton Fabio, and Guitián José Iglesias. 2008. A single-pass GPU ray casting framework for interactive out-of-core rendering of massive volumetric datasets. The Visual Computer 24 (072008), 797806. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Génevaux Jean-David, Galin Eric, Peytavie Adrien, Guérin Eric, Briquet Cyril, Grosbellet Francois, and Benes Bedrich. 2015. Terrain modeling from feature primitives. Computer Graphics Forum 34 (052015). Google ScholarGoogle ScholarCross RefCross Ref
  17. Hart John. 1995. Sphere tracing: A geometric method for the antialiased ray tracing of implicit surfaces. The Visual Computer 12 (061995). Google ScholarGoogle ScholarCross RefCross Ref
  18. Hart J. C., Sandin D. J., and Kauffman L. H.. 1989. Ray tracing deterministic 3-D fractals. SIGGRAPH Comput. Graph. 23, 3 (July1989), 289296. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Hoetzlein Rama Karl. 2016. GVDB: Raytracing sparse voxel database structures on the GPU. In Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics, Assarsson Ulf and Hunt Warren (Eds.). The Eurographics Association. Google ScholarGoogle ScholarCross RefCross Ref
  20. Jazar Kavosh and Kry Paul. 2023. Temporal set inversion for animated implicits. ACM Trans. Graph. 42, 4 (82023). Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Ju Tao, Losasso Frank, Schaefer Scott, and Warren Joe. 2002. Dual contouring of Hermite data. ACM Trans. Graph. 21, 3 (2002). Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Kanamori Yoshihiro, Szego Zoltan, and Nishita Tomoyuki. 2008. GPU-based fast ray casting for a large number of metaballs. Computer Graphics Forum 27, 2 (2008), 351360. . arXiv: https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1467-8659.2008.01132.xGoogle ScholarGoogle ScholarCross RefCross Ref
  23. Keeter Matthew. 2019. Libfive. https://libfive.com/Google ScholarGoogle Scholar
  24. Keeter Matthew J.. 2020. Massively parallel rendering of complex closed-form implicit surfaces. ACM Trans. Graph. 39, 4, Article 141 (July2020), 10 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Keinert Benjamin, Schäfer Henry, Korndörfer Johann, Ganse Urs, and Stamminger Marc. 2014. Enhanced sphere tracing. In Smart Tools and Apps for Graphics - Eurographics Italian Chapter Conference, Giachetti Andrea (Ed.). The Eurographics Association. Google ScholarGoogle ScholarCross RefCross Ref
  26. Kim Doyub, Lee Minjae, and Museth Ken. 2022. NeuralVDB: High-resolution Sparse Volume Representation using Hierarchical Neural Networks. arXiv: 2208.04448 [cs.LG]Google ScholarGoogle Scholar
  27. Knoll Aaron, Hijazi Younis, Hansen Charles, Wald Ingo, and Hagen Hans. 2007. Interactive ray tracing of arbitrary implicits with SIMD interval arithmetic. In 2007 IEEE Symposium on Interactive Ray Tracing. 1118. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Knoll A., Hijazi Y., Kensler A., Schott M., Hansen C., and Hagen H.. 2009a. Fast ray tracing of arbitrary implicit surfaces with interval and affine arithmetic. Computer Graphics Forum 28, 1 (2009), 2640. . arXiv: https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1467-8659.2008.01189.xGoogle ScholarGoogle ScholarCross RefCross Ref
  29. Knoll Aaron, Wald Ingo, and Hansen Charles. 2009b. Coherent multiresolution isosurface ray tracing. The Visual Computer 25 (072009), 209225. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Knoll Aaron, Wald Ingo, Parker Steven, and Hansen Charles. 2006. Interactive isosurface ray tracing of large octree volumes. In 2006 IEEE Symposium on Interactive Ray Tracing. 115124. Google ScholarGoogle ScholarCross RefCross Ref
  31. Kruger J. and Westermann R.. 2003. Acceleration techniques for GPU-based volume rendering. In IEEE Visualization, 2003. VIS 2003.287292. Google ScholarGoogle ScholarCross RefCross Ref
  32. Liu Baoquan, Clapworthy Gordon J., Dong Feng, and Prakash Edmond C.. 2012. Octree rasterization: Accelerating high-quality out-of-core GPU volume rendering. IEEE Transactions on Visualization and Computer Graphics (2012). Google ScholarGoogle ScholarCross RefCross Ref
  33. Loop Charles and Blinn Jim. 2006. Real-time GPU rendering of piecewise algebraic surfaces. ACM Trans. Graph. 25, 3 (July2006), 664670. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Manson Josiah and Schaefer Scott. 2010. Isosurfaces over simplicial partitions of multiresolution grids. Computer Graphics Forum 29, 2 (2010), 377385. . arXiv: https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1467-8659.2009.01607.xGoogle ScholarGoogle ScholarCross RefCross Ref
  35. Maple C.. 2003. Geometric design and space planning using the marching squares and marching cube algorithms. In 2003 International Conference on Geometric Modeling and Graphics, 2003. Proceedings. 9095. Google ScholarGoogle ScholarCross RefCross Ref
  36. Mitchell D. P.. 1990. Robust ray intersection with interval arithmetic. In Proceedings on Graphics Interface ’90 (Halifax, Nova Scotia). Canadian Information Processing Society, CAN, 6874.Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Moore Ramon E.. 1966. Interval Analysis. Vol. 4. Prentice-Hall Englewood Cliffs.Google ScholarGoogle Scholar
  38. Moore Ramon E., Kearfott R. Baker, and Cloud Michael J.. 2009. Introduction to Interval Analysis. Society for Industrial and Applied Mathematics, USA.Google ScholarGoogle ScholarCross RefCross Ref
  39. Museth Ken. 2013. VDB: High-resolution sparse volumes with dynamic topology. ACM Trans. Graph. 32, 3, Article 27 (July2013), 22 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Museth Ken. 2014. Hierarchical digital differential analyzer for efficient ray-marching in OpenVDB. In ACM SIGGRAPH 2014 Talks (Vancouver, Canada) (SIGGRAPH ’14). Association for Computing Machinery, New York, NY, USA, Article 40, 1 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Museth Ken. 2021. NanoVDB: A GPU-friendly and portable VDB data structure for real-time rendering and simulation. In ACM SIGGRAPH 2021 Talks (Virtual Event, USA) (SIGGRAPH ’21). Association for Computing Machinery, New York, NY, USA, Article 1, 2 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Pasko Alexander, Adzhiev Valery, Sourin Alexei, and Savchenko Vladimir. 1995. Function representation in geometric modeling: Concepts, implementation and applications. The Visual Computer 11 (081995), 429446. Google ScholarGoogle ScholarCross RefCross Ref
  43. Pasko Alexander, Fryazinov Oleg, Vilbrandt Turlif, Fayolle P. A., and Adzhiev Valery. 2011. Procedural function-based modelling of volumetric microstructures. Graphical Models 73 (092011), 165181. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Ruijters D. and Vilanova Anna. 2006. Optimizing GPU volume rendering. In WSCG - Winter School of Computer Graphics, Vol. 14. 916. http://graphics.tudelft.nl/Publications-new/2006/RV06Google ScholarGoogle Scholar
  45. Seyb Dario, Jacobson Alec, Nowrouzezahrai Derek, and Jarosz Wojciech. 2019. Non-linear sphere tracing for rendering deformed signed distance fields. ACM Transactions on Graphics (TOG) 38 (2019), 112.Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Singh Jag Mohan and Narayanan P. J.. 2010. Real-time ray tracing of implicit surfaces on the GPU. IEEE Transactions on Visualization and Computer Graphics 16, 2 (2010), 261272. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Yurtoglu Mete, Carton Molly, and Storti Duane. 2018. Treat all integrals as volume integrals: A unified, parallel, grid-based method for evaluation of volume, surface, and path integrals on implicitly defined domains. Journal of Computing and Information Science in Engineering 18 (032018). Google ScholarGoogle ScholarCross RefCross Ref
  48. Zellmann Stefan. 2019. Comparing hierarchical data structures for sparse volume rendering with empty space skipping. ArXiv abs/1912.09596 (2019).Google ScholarGoogle Scholar

Index Terms

  1. A Function-Based Approach to Interactive High-Precision Volumetric Design and Fabrication

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image ACM Transactions on Graphics
      ACM Transactions on Graphics  Volume 43, Issue 1
      February 2024
      211 pages
      ISSN:0730-0301
      EISSN:1557-7368
      DOI:10.1145/3613512
      Issue’s Table of Contents

      Copyright © 2023 Copyright held by the owner/author(s).

      This work is licensed under a Creative Commons Attribution International 4.0 License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 29 September 2023
      • Online AM: 7 September 2023
      • Accepted: 22 August 2023
      • Revised: 19 August 2023
      • Received: 5 May 2023
      Published in tog Volume 43, Issue 1

      Check for updates

      Qualifiers

      • research-article
    • Article Metrics

      • Downloads (Last 12 months)1,500
      • Downloads (Last 6 weeks)161

      Other Metrics

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader