Thursday, September 25, 2008

FBX and XNA - Part 6 - 40 Million Triangles Per Second

Now that we have pretty 3D models on the screen, the first impression is "Golly that's slow".
One test model I have is about 18,000 polygons or 35,000 triangles. It displays at about 21 frames per second (FPS) on my PC. In a game that would mean that 2 models on the screen would be butt-ugly slow. So let's move lots of the processing onto the Graphics Processing Unit (GPU).

The GPU uses a language called HLSL (High Level Shader Language). What you do is write a something.fx file, add it to your project and then load the Effect with the Content loader. Sounds easy. The trick is what to put in the .fx file and how to drive it. Fortunately Microsoft has already provided a SkinnedModelProcessor and it has a file called SkinnedModel.fx. We just use that as-is.

HLSL defines method names in a .fx file for doing two things. First is converting a model X,Y,Z point to world space so it can be used as one of the points in a triangle or line drawing. Second is to determine the color of a point on a triangle as it is filled in on the screen (A Shader).

SkinnedModel.fx has a section like this...

// Vertex shader input structure.
struct VS_INPUT
{
    float4 Position : POSITION0;
    float3 Normal : NORMAL0;
    float2 TexCoord : TEXCOORD0;
    float4 BoneIndices : BLENDINDICES0;
    float4 BoneWeights : BLENDWEIGHT0;
};
It indicates that each triangle corner is supposed to have a Position, a normal to the surface, and a texture coordinate. That is just like our previous code. BoneIndices holds the index of each Deformer (Bone) that influences the point location, and BoneWeights holds the matching weight. The trick here is that a float4 has X, Y, Z and W values, meaning that there can be up to 4 deformers and their associated matrices that can influence the point.

Note: The Position has a vector4 which means it has a X,Y,Z and the mysterious W. Just set all the W's to 1.0f and it will work fine.

So we modify our code to generate this new structure in place of the previous DrawData point list. We use the structure...

public struct SkinnedModelRec
{
    public Vector4 Position;
    public Vector3 Normal;

    /// The UV texture coord.
    public Vector2 TexCoord;

    /// We can have up to 4 bones influencing one vertex point and normal.
    public Vector4 BoneIndices;

    /// We can have up to 4 bones per vertex. These are the weights. Leave them 0 if no bone.
    public Vector4 BoneWeights;

    /// Description of the elements of this structure.
    public static readonly VertexElement[] VertexElements =
        new VertexElement[] {
            new VertexElement(0, 0, VertexElementFormat.Vector4, VertexElementMethod.Default, VertexElementUsage.Position, 0),
            new VertexElement(0, sizeof(float) * 4, VertexElementFormat.Vector3, VertexElementMethod.Default, VertexElementUsage.Normal, 0),
            new VertexElement(0, sizeof(float) * 7, VertexElementFormat.Vector2, VertexElementMethod.Default, VertexElementUsage.TextureCoordinate, 0),
            new VertexElement(0, sizeof(float) * 9, VertexElementFormat.Vector4, VertexElementMethod.Default, VertexElementUsage.BlendIndices, 0),
            new VertexElement(0, sizeof(float) * 13, VertexElementFormat.Vector4, VertexElementMethod.Default, VertexElementUsage.BlendWeight, 0)
        };

    // ... some methods ...
}

Notice the vertex element definition. It tells XNA where the elements in the structure go in relation to the definitions in SkinnedModel.fx. Each offset is the running total of the floats that come before that element.
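
To double check the offsets used in the vertex elements, here is a small sketch in plain C# (no XNA types needed). The class and method names are made up for this illustration. It just adds up the float counts of Position, Normal, TexCoord, BoneIndices and BoneWeights, and the final number is the stride, which is presumably what SizeInBytes holds in the real structure.

```csharp
using System;

// Hypothetical helper for illustration only; not part of XNA or the original code.
public static class VertexLayoutSketch
{
    // Float counts for Position, Normal, TexCoord, BoneIndices, BoneWeights.
    public static readonly int[] FloatCounts = { 4, 3, 2, 4, 4 };

    // Returns the byte offset of each field, followed by the total stride.
    public static int[] ByteOffsets(int[] floatCounts)
    {
        int[] result = new int[floatCounts.Length + 1];
        int running = 0;
        for (int i = 0; i < floatCounts.Length; i++)
        {
            result[i] = running;                       // where this field starts
            running += floatCounts[i] * sizeof(float); // advance past this field
        }
        result[floatCounts.Length] = running;          // total size of one vertex
        return result;
    }
}
```

With the counts above this gives offsets of 0, 16, 28, 36 and 52 bytes, which match sizeof(float) times 0, 4, 7, 9 and 13, and a stride of 68 bytes.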

Next we load all the points in the model, and the draw indices into the GPU

TheVertexBuffer = new VertexBuffer(gd, typeof(SkinnedModelRec), DrawData.Length, BufferUsage.WriteOnly);
TheVertexDeclaration = new VertexDeclaration(gd, SkinnedModelRec.VertexElements);
TheIndexBuffer = new IndexBuffer(gd, typeof(int), DrawIndices.Length, BufferUsage.WriteOnly);
And then we can draw them. I realize that all the code is not here. This is just to get the idea of the style.

private void DrawSingleMesh(GraphicsDevice gd, PTake CurrentTake, Effect effect, PMesh mesh, PModelTracker tracker)
{
    //mesh.CalcCurrentState(CurrentTake, tracker);
    mesh.CalcAllDeformerMatrices(CurrentTake, tracker);

    gd.RenderState.CullMode = CullMode.None;
    gd.VertexDeclaration = mesh.TheVertexDeclaration;
    gd.Vertices[0].SetSource(mesh.TheVertexBuffer, 0, SkinnedModelRec.SizeInBytes);
    gd.Indices = mesh.TheIndexBuffer;

    // We draw all the triangles in a single texture at once. Then switch to next texture.
    // Sometimes textures repeat later in the list.
    foreach (PTextureRun tr in mesh.DrawDataTextureSegments)
    {
        if (tr.TextureId < 0)
        {
            // Some polygons have no texture and are just white with illumination.
            // We should probably not allow untextured polygons.
            //effect.TextureEnabled = false;
        }
        else
        {
            //effect.TextureEnabled = true;
        }

        // Max bones in the SkinnedModel.fx is 59
        Matrix[] bonesMatricies = new Matrix[59];
        for (int i = 0; i < mesh.DrawDeformers.Length; i++)
        {
            bonesMatricies[i] = mesh.DrawDeformers[i].Transform;
        }

        foreach (EffectPass pass in effect.CurrentTechnique.Passes)
        {
            // ...
        }
    }
}


Easy as that. Then I tested it with a loop. We can now put up lots of our model on the screen and hold 60 FPS. The result is 40 million triangles per second. Now that's where we need to be to play a game.


Thursday, September 18, 2008

FBX and XNA Part 5 - Animation Interpolation

One thing I glossed over was how you arrive at the 'current animation rotation' or translation.

The Key: entries represent instants in time that the LimbNode is at a particular translation and rotation. (And scale and color and transparency, but we ignore these.) What we do is have a ModelTracker class that keeps track of the current values for the animation for every LimbNode.
We simply interpolate between the instants in time. But remember, the first entry is the beginning of the time line, and the last entry is the end so it is a tricky interpolation.

If there is only one entry in the Key: section we just use that value. If there are no Key: entries at all we use the default. If there is not even a take: section for that LimbNode in the given Take, we create a new key frame and initialize its default values to the LimbNode Lcl Rotation and Translation.
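
Here is a minimal sketch of those rules for a single channel (say the X of a translation), in plain C#. The names are made up for this illustration, and two things are assumptions on my part rather than a description of the real ModelTracker: the value is held at the first or last key when the time falls outside the time line, and the blend between two keys is a straight linear one.

```csharp
using System;

// Hypothetical sketch; KeyInterpolationSketch is not the ModelTracker class.
public static class KeyInterpolationSketch
{
    // times must be sorted ascending and be the same length as values.
    public static double Sample(double[] times, double[] values, double defaultValue, double t)
    {
        if (times.Length == 0) return defaultValue;   // no Key: entries, use the default
        if (times.Length == 1) return values[0];      // a single entry, just use it
        if (t <= times[0]) return values[0];          // assumption: hold the first key before the start

        int last = times.Length - 1;
        if (t >= times[last]) return values[last];    // assumption: hold the last key at the end

        int hi = 1;
        while (times[hi] < t) hi++;                   // first key at or after t
        int lo = hi - 1;

        double span = times[hi] - times[lo];
        if (span <= 0.0) return values[hi];           // guard against duplicate key times

        double f = (t - times[lo]) / span;            // 0 at the lo key, 1 at the hi key
        return values[lo] + (values[hi] - values[lo]) * f;
    }
}
```

For example, with keys at times 0 and 1 holding the values 10 and 20, sampling at 0.5 gives 15. Rotations done this way are interpolated one angle at a time, which is the simplest thing that could work and may not be what you want for large swings.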

This all happens in the game's Update section. Many ModelTrackers can share the same Model.

We also detect when the animation hits the end and provide a callback so the game can take action when the animation ends.

Not yet implemented but soon to come: tracking where a given LimbNode is in world coordinates so we can detect when a blade strikes an enemy. Update: This is done. It turns out that on a LimbNode in Maya you can set arbitrary attributes, which then show up in the Properties60 section of the LimbNode entry. We just set a property "Attack" and look for those and track them during attack moves. Same for "Step" for tracking footprints and putting prints in the snow.

Also, KeyFrame sequences can be specified on any property, so we can put an animation on the value of an Attack attribute and vary the attack deadliness through the attack swing.

I have not yet accounted for the Initial Velocity Tangent and Final Velocity Tangent. This allows for smoother blending between key frame segments.


FBX and XNA Part 4 - Drawing the model.

Now that we know what all the parts of the model are in the FBX, let's combine them.


for each Mesh in the Model
    make a copy of the Vertices
    make an array of Matrices, one for each point in the Vertices.
        They all get initialized to all zeros (not Identity)
    for each SubDeformer
        M = SubDeformer.Transform
        if it has any Indexes (some do not)
            find the animation in the current Take for this Deformer.
                It is the one in Connections that has the
                same name as the SubDeformer
            M = M * the animation current rotation * the PreRotation *
                the animation current translation
            if there is a parent LimbNode recursively do the
                M = M * rot * prerot * trans
            remember M for this deformer.

    Now for all the remembered Ms influence the array of Matrices by
    doing weighted sums of the elements in the matrices. Note that
    this is not a matrix multiply, it is an element
    by element weighted sum.
        vMats[Indexes[i]] += Matrix.Multiply(Transform, Weights[i])

    Now for every element in the copy of Vertices, do a matrix multiply by
    the weighted sum matrices
    to arrive at the position. Also do a TransformNormal to get the normals right.

Phew, that is how it's done.
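
The weighted sum step is the part that is easiest to get wrong, so here is a small sketch of just that step. It uses System.Numerics (Matrix4x4 and Vector3) in place of the XNA Matrix and Vector3 types so that it stands alone, and the class and method names are made up for this illustration. The per-deformer matrix M is assumed to have been worked out already.

```csharp
using System;
using System.Numerics;

// Hypothetical sketch of the element-by-element weighted sum; not the original PMesh code.
public static class SkinningSketch
{
    // Adds weight * transform into the matrix of every vertex this deformer touches.
    // vertexMatrices must start out as all zeros (the default for Matrix4x4), not Identity.
    public static void Accumulate(Matrix4x4[] vertexMatrices, Matrix4x4 transform, int[] indexes, float[] weights)
    {
        for (int i = 0; i < indexes.Length; i++)
        {
            // Matrix4x4.Multiply(matrix, float) scales every element; this is not a matrix product.
            vertexMatrices[indexes[i]] += Matrix4x4.Multiply(transform, weights[i]);
        }
    }

    // Transforms each vertex by its own blended matrix.
    public static Vector3[] Apply(Vector3[] vertices, Matrix4x4[] vertexMatrices)
    {
        Vector3[] result = new Vector3[vertices.Length];
        for (int i = 0; i < vertices.Length; i++)
        {
            result[i] = Vector3.Transform(vertices[i], vertexMatrices[i]);
        }
        return result;
    }
}
```

As a quick check, take one vertex at (1, 0, 0) influenced half by an identity matrix and half by a translation of 2 along X. The blended matrix moves it 1 unit, so it ends up at (2, 0, 0). Normals would go through Vector3.TransformNormal with the same blended matrix.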

Now take the copy of Vertices that is now in world coordinates, and draw it on the screen.

foreach Polygon in the list of polygon indices, call
GraphicsDevice.DrawUserPrimitives. Like this
gd.DrawUserPrimitives(PrimitiveType.TriangleFan, drawVerticesXformed, polys[i].StartIdx, polys[i].PrimCount);

It's easy as that.


FBX and XNA Part 3 - Models cont.

Now we look at the LimbNodes.

A LimbNode is a bone in the model. Just like your bones they have a length (translation) and an angle (rotation). There is a section in LimbNodes called Properties60: and there are some important properties.
Lcl Translation and Lcl Rotation are the default values to use for the limb if there is no animation information. There is also a PreRotation which defines a transform on the limb before translation is added in. This is necessary because limbs may have rotation in the joint.

The Deformer and SubDeformer are what connect the LimbNode to the mesh so the points in the mesh can actually be moved in world coordinates. A Deformer is just the parent of one or more SubDeformers and is most important because it links (with the Connections: section) to the LimbNode.
A SubDeformer simply has a series of indexes in the Vertices, and a series of weights. In all our models if there are multiple deformers for a given Vertex, the sum of weights will be 1.0.
There is also a very important Transform. The Transform is where the Deformer is relative to the LimbNode.

I think we have all the pieces now. Next we actually combine them to get a model that moves in world coordinates.


FBX and XNA Part 2 - 3D Models

The general file format of FBX is divided into sections.

They are
  • Header (ignored)
  • Definition (ignored)
  • Objects
  • Relations (ignored)
  • Connections
  • Takes
  • Version5 (ignored)
The Objects section is the data about the model. It contains:
  • Cameras (ignored) Things like Quicktime will use these to position the view when showing the model. Quicktime is a great way to get a look at models, and "if it works in QuickTime then it should work in my code".
  • Mesh - The actual points that make up the model. More on this later.
  • LimbNode - the Bones of the file. These are definitions of the links between different parts of the mesh and Deformers and Animations. Just like your body has a skin (the mesh) and bones (the LimbNodes).
  • Pose - (ignored) In modeling tools there are pre-done poses for the model. You then animate as changes to the pose. We ignore this since it only applies to tools that will be editing and changing the model.
  • Deformer and SubDeformer - these are attached to the LimbNodes and tell which mesh points should be moved when the LimbNode moves, and by how much. Several Deformers may influence a single mesh point to give smooth bends to the mesh around the joints.
  • GlobalSettings (ignored) If I ever see a model that has these changed I'll have to account for them.
The Connections section tells what the tree hierarchy is and which Deformers apply to which LimbNodes. We use this a lot when loading the model to get everything stitched together.


Takes are the animations on the model. A single take is a single animation. It has a range of time in some strange units that is an unsigned long. I convert it to seconds by a simple conversion factor until the speed looks right.
The conversion is N / (1539538600.0 * 30.0) . Which may or may not be correct.
In a Take are Keyframes for various values on a LimbNode.
First there are the joints. One section is the Transform where it says Channel: "T" {
Then in the section there is X, Y and Z. There is a default value for each part, and then there may or may not be keyframes with the tag Key:
Key: entries are a series of values at given times. The format is Time, Value, U, [s or a(?)], then if it was s there is a start velocity and end velocity (ignored), then the letter n.
So an entry might look like this:
Key: 1539538600,4.30184507369995,U,s,0,0,n,23093079000,4.30184507369995,U,s,0,0,n,46186158000
1539538600 is the start time of the first position, 4.3... is the value, since it is an s then the 0,0 are start and end velocity, and then the letter n.
There is also a KeyCount that I should use in future versions for a more efficient load.
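
Here is a rough sketch of reading the body of a Key: entry and converting the times with the factor above. The names are made up for this illustration. Since the fields between the value and the letter n are only partly understood, the sketch makes an assumption: it skips everything after the value up to and including the n, whatever the flag letter was. A trailing time with no value (like the cut-off end of the example above) is dropped.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical sketch; not the original loader.
public static class FbxKeySketch
{
    // The conversion factor from the text: 1539538600 * 30 = 46186158000 units per second.
    public const double TicksPerSecond = 1539538600.0 * 30.0;

    public static double ToSeconds(ulong ticks)
    {
        return ticks / TicksPerSecond;
    }

    // Parses "time,value,U,s,0,0,n,time,value,..." into (seconds, value) pairs.
    public static List<KeyValuePair<double, double>> Parse(string body)
    {
        var keys = new List<KeyValuePair<double, double>>();
        string[] f = body.Split(',');
        int i = 0;
        while (i + 1 < f.Length) // need at least a time and a value
        {
            ulong ticks = ulong.Parse(f[i].Trim(), CultureInfo.InvariantCulture);
            double value = double.Parse(f[i + 1].Trim(), CultureInfo.InvariantCulture);
            keys.Add(new KeyValuePair<double, double>(ToSeconds(ticks), value));
            i += 2;

            // Assumption: ignore the flags and tangents, up to and including the closing "n".
            while (i < f.Length && f[i].Trim() != "n") i++;
            i++;
        }
        return keys;
    }
}
```

Run on the example entry, this gives two keys, at 1/30 of a second and at 0.5 seconds, both with the value 4.30184507369995. Those round numbers are at least consistent with the conversion factor being right.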

Correction: Not really start and end velocity. Actually the tangent of the velocity curve at the beginning and end of the segment, which is almost the same thing. In the future I will use these to get smoother animations.

The only sections I use are the "T" section for translation, and "R" for rotation.

A note on take: If the take has a Key: entry, then that determines the translation and rotation. If it does not then you use the default values (Default: key). If there is no animation for any LimbNode, then you have to fill in a dummy animation with the LimbNode Lcl Translation and Lcl Rotation as the default values. Another way to do this would be to have the LimbNodes driving the drawing and looking to see if there is corresponding animation data for that limb. I could do it that way, but my code already is driven by the animations, so I create dummy animations for Limbs that don't have one. After loading and building the Mesh, Animation, and Deformers, I throw out all the other model data. This is more compact.

(Does any one know what "liw" stands for?)

The Mesh sections define the surface mesh of points in a model. They are the basis of the model.
The mesh has a Properties60: section that has lots of settings, all ignored. But then there is the Vertices: section. This is a list of X,Y,Z coordinates of all the points in the mesh and is often quite large. We read all these and keep them.

Then is the PolygonVertexIndex: list. This tells the index in Vertices of each point in a polygon on the model surface. Obviously points often are part of several polygons, usually 4 since the vertices are the corners of rectangular patches. The one trick here is that the last index of each polygon is stored as a negative number to mark the end of the polygon. The stored value is -(index + 1), which is the same as the index XOR'd with -1.
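
A small sketch of undoing the end-of-polygon trick and splitting the flat list into polygons. The names are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; not the original loader.
public static class PolygonIndexSketch
{
    // Splits a raw PolygonVertexIndex list into one int[] of real indices per polygon.
    public static List<int[]> Split(int[] polygonVertexIndex)
    {
        var polygons = new List<int[]>();
        var current = new List<int>();
        foreach (int raw in polygonVertexIndex)
        {
            if (raw < 0)
            {
                current.Add(raw ^ -1);           // recover the real index (same as -raw - 1)
                polygons.Add(current.ToArray()); // a negative value closes the polygon
                current.Clear();
            }
            else
            {
                current.Add(raw);
            }
        }
        return polygons;
    }
}
```

For example, the raw list 0, 1, 2, -4, 3, 2, -2 decodes to a quad with indices 0, 1, 2, 3 followed by a triangle with indices 3, 2, 1.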

We ignore Edges: They are for showing the model as a wire frame which we never do.

Another section is the Normals: section. These are the XYZ normals at the Vertices points. This is used by the graphics processor to get the lighting shade correct when drawing the polygons. There are two flavors, ByPolygonVertex, and ByVertex. If it is ByPolygonVertex then every index in PolygonVertexIndex has a corresponding normal. If it is ByVertex then each point in the Vertices list has a single corresponding normal.

Next is UV: which is the texture coordinates. A UV entry for a given Vertex is the X,Y location in the texture that corresponds to that point. In a triangle of the mesh there will be 3 UV coordinates that form a triangle in the texture that is painted onto the triangle when it is drawn on the screen. UV can also be ByPolygonVertex or ByVertex.
The rest of the Mesh data gets ignored.

When we read in the Mesh we convert it to a flat list of VertexPositionNormalTexture in XNA.
Thus if the normal or UV data is ByVertex, it gets expanded out to ByPolygonVertex.
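
The expansion itself is just a lookup through the polygon indices. Here is a sketch with made-up names. It assumes the indices have already had their end-of-polygon negative values decoded back to real indices.

```csharp
using System;

// Hypothetical sketch; not the original loader.
public static class ExpandSketch
{
    // Turns one-entry-per-vertex data into one-entry-per-polygon-corner data.
    public static T[] ToByPolygonVertex<T>(T[] byVertex, int[] decodedIndices)
    {
        T[] expanded = new T[decodedIndices.Length];
        for (int i = 0; i < decodedIndices.Length; i++)
        {
            expanded[i] = byVertex[decodedIndices[i]]; // copy the shared value to this corner
        }
        return expanded;
    }
}
```

The same call works for normals and for UVs, since all it does is repeat the shared value at every polygon corner that uses that vertex.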

Yikes, that's a lot of info.
Next we talk about the other sections...


FBX and XNA Part 1 - FBX Format and Reader

So you have an FBX file and want to read it.

FBX Encoding
FBX files come in two flavors. ASCII and Binary. The ASCII format can be viewed in any text editor. The binary format is more compact. So far we only handle ASCII.

The file is encoded as tagged fields. For example, here is the definitions section of a file...

Definitions: {
    Version: 100
    Count: 33
    ObjectType: "Model" {
        Count: 14
    }
    ObjectType: "Geometry" {
        Count: 3
    }
    ObjectType: "Material" {
        Count: 1
    }
    ObjectType: "Deformer" {
        Count: 12
    }
    ObjectType: "Pose" {
        Count: 3
    }
    ObjectType: "GlobalSettings" {
        Count: 1
    }
}

The Definitions: part is a tag, and what follows it is the values. In this case the value of Definitions: is a compound set of values. The tag Version: is followed by a single value, 100.

Some entries have both single values and compound values following a tag.

The Parser
The parser is organized to read the general format of the FBX ASCII file but does not know what any of it means.
The lowest level of the parser is the FbxWordReader class. It knows how to read the next token in the FBX file. The method NextToken does this. There is also a PushBack and SkipWhite.

The next level up is converting tokens into nodes in a tree, representing the hierarchy of the file.
The base class is FbxNode, and its three subclasses are StringNode, KeyNode and ListNode. The StringNode is a single element that may be a string, integer, or float, or several other types, but at this level is always seen as a string. The ListNode is a set of FbxNodes representing a compound entry. The KeyNode is a key (tag) and then a series of values.
These nodes know how to find sub-nodes (both single sub nodes, and lists of sub nodes.)

The KeyNode knows about the key and its values, and how to extract a Matrix.

StringNode knows about String, float, int, ulong, time, and char.

The FbxReader is the top level and does the recursive work of parsing the entire FBX file into a tree of nodes.
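
To give a feel for the tokens-then-tree approach, here is a very cut-down sketch. It is not the FbxWordReader or FbxReader code, and every name in it is made up for this illustration. It rests on several assumptions about the ASCII format: a tag is a bare word ending in a colon, commas only separate values, a semicolon starts a comment that runs to the end of the line, and strings sit in double quotes with no escapes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical node type: a tag, its single values, and its compound (braced) children.
public class SketchNode
{
    public string Key = "";
    public List<string> Values = new List<string>();
    public List<SketchNode> Children = new List<SketchNode>();
}

public static class FbxAsciiSketch
{
    // Splits text into tokens: "{", "}", quoted strings (quotes kept), and bare words.
    public static List<string> Tokenize(string text)
    {
        var tokens = new List<string>();
        int i = 0;
        while (i < text.Length)
        {
            char c = text[i];
            if (char.IsWhiteSpace(c) || c == ',') { i++; }
            else if (c == ';') { while (i < text.Length && text[i] != '\n') i++; } // comment
            else if (c == '{' || c == '}') { tokens.Add(c.ToString()); i++; }
            else if (c == '"')
            {
                int end = text.IndexOf('"', i + 1);
                if (end < 0) end = text.Length - 1;       // unterminated string: take the rest
                tokens.Add(text.Substring(i, end - i + 1));
                i = end + 1;
            }
            else
            {
                int start = i;
                while (i < text.Length && !char.IsWhiteSpace(text[i]) && ",{}\"".IndexOf(text[i]) < 0) i++;
                tokens.Add(text.Substring(start, i - start));
            }
        }
        return tokens;
    }

    // A tag is a bare (unquoted) token that ends with a colon.
    static bool IsKey(string t) { return t.EndsWith(":") && !t.StartsWith("\""); }

    // Recursively builds nodes until the end of input or a closing brace.
    public static List<SketchNode> Parse(List<string> tokens, ref int pos)
    {
        var nodes = new List<SketchNode>();
        while (pos < tokens.Count && tokens[pos] != "}")
        {
            string t = tokens[pos++];
            if (!IsKey(t)) continue;                       // stray value with no tag: skip it
            var node = new SketchNode { Key = t.Substring(0, t.Length - 1) };
            while (pos < tokens.Count && !IsKey(tokens[pos]) && tokens[pos] != "{" && tokens[pos] != "}")
                node.Values.Add(tokens[pos++].Trim('"'));  // single values, quotes removed
            if (pos < tokens.Count && tokens[pos] == "{")
            {
                pos++;                                     // step into the compound value
                node.Children = Parse(tokens, ref pos);
                if (pos < tokens.Count) pos++;             // consume the closing brace
            }
            nodes.Add(node);
        }
        return nodes;
    }
}
```

Feeding it a Definitions: block gives one root node whose children are Version, Count and the ObjectType entries, and an entry like ObjectType: "Model" { ... } ends up with both a single value (Model) and children, which is the mixed case mentioned earlier.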

Now we have the whole FBX file in memory, what does it all mean?

Partial Loads
When we load a model we check to see if there is a subdirectory called Animations. If there is, we remove all the takes from the model, load all the FBX files in Animations and extract their single take. Then we rename the take to match the file name (e.g. the take in Run.fbx gets its name set to Run) and insert it into the main model. This way all our animations are single files in a subdirectory. This is way easier on the modeler. Of course, we have to be careful that all bone names remain the same and such. We do this insertion in the FBX parsed tree before building the model in memory.


XNA FBX and Custom Loader

At Sandswept Studios we are working on a Hack-n-slash side scroller.
We were using the SkinnedModelProcessor but it had some problems.
  • Some models displayed all twisted up. We then did lots of tweaks in Maya and sometimes it worked, and sometimes not.
  • We need our 3D modelers to be able to drop a model into the Content directory and immediately see the model in the game.
  • We need to Mix-n-Match meshes, animations, and textures. For example, the player may change weapons.
  • We need to detect end of animation, movement of bones (LimbNodes) and such for in game event and attack/collision detection.
  • We need to be able to dynamically modify model meshes, for example, exploding monsters.
  • We need to be able to use the same texture between models.
  • We need to be able to strip the models of any unnecessary data so we have the smallest possible data set.
  • It has to run on the PC and the XBox.
  • When run on the XBox the FBX file should be pre-built into a binary format that reads really fast and is compact. (Not yet implemented)
Well, that's a tall order. In short, we picked apart the FBX file format and learned how to load the ASCII FBX file, and display the animated model on screen in the game. There is a dearth of info on the FBX file format. The next few posts here will be about how it is done.

Shameless commercial plug: If you are interested in using our library in your game, contact me and we can arrange it. We are thinking of an initial low fee to develop with the library, and a single lump sum when (and if) your game makes any money.