Box2D, Web workers, Better performance

WARNING: There is a bug in Chrome 15.0.874.5 dev (and probably earlier builds in the 15 dev channel series) that fires requestAnimationFrame twice per paint, which leads to very different results than you want. To see the demo correctly, use Canary or Chrome Stable.

(this post should have been called "Fixin' muh shit")

Following up on a recent post on Box2D, Web workers, and the Page Visibility API, we will now take a closer look at performance with Box2D and Web workers. And we fixed my broken code.

Intro

I was initially pleased with the early work on Box2D and Web workers. Moving the physics simulation to a worker makes sense, as it frees up our render loop. After a closer look, though, I could see hitching and jitter in the Web worker version, especially compared to the inline Box2D version. Why was this happening?

Out of sync

After closer inspection, and some help from my awesome colleagues, we concluded that the worker was not synced with the renderer, and every once in a while a frame would be skipped.

It turns out, and you probably knew this, that setInterval() and setTimeout() only handle integer time spans. This means a common call of setInterval(paint, 1000/60) will not in fact fire every 16.6667 ms (an interval that just so happens to match the 60 Hz VSync of most monitors and the requestAnimationFrame rate). The fractional delay is truncated to 16 ms, so the callback runs up to 62.5 times per second instead of 60.
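To make the arithmetic concrete, here is a small sketch of how far an integer-delay timer drifts from a 60 Hz paint loop. It assumes the delay is simply truncated to whole milliseconds, as described above, and it ignores the extra jitter and clamping real timers add; `effectiveDelayMs` is a hypothetical helper, not a browser API.

```javascript
// Assumption: the fractional delay passed to setInterval is truncated to a
// whole number of milliseconds. Timer jitter and clamping are ignored.
function effectiveDelayMs(requestedMs) {
  return Math.trunc(requestedMs);
}

const requestedMs = 1000 / 60;                  // 16.666... ms
const actualMs = effectiveDelayMs(requestedMs); // 16 ms
const timerHz = 1000 / actualMs;                // 62.5 updates per second
const displayHz = 60;                           // typical VSync / rAF rate

// The update loop gains (timerHz - displayHz) ticks per second over the
// paint loop, so it gets one whole tick ahead every 1000 / 2.5 = 400 ms,
// i.e. every 24 painted frames at 60 Hz.
const driftPeriodMs = 1000 / (timerHz - displayHz);

console.log(actualMs, timerHz, driftPeriodMs); // 16 62.5 400
```

Even with a perfectly steady timer, then, the update loop pulls a full step ahead of the paint loop a few times per second.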

This means my requestAnimationFrame render loop was running at 60Hz while my worker thread was running at slightly more than 60Hz. The paint thread and the update thread weren't in sync, and it showed.

How to verify

Chrome, it turns out, has a very nifty way to gain visibility into its render processes. Open a new tab and type about:gpu.  You'll find two tabs of interest.

The first tab lists your GPU configuration, including your graphics card, the current hardware accelerated functionalities (like WebGL or Canvas), and information about your driver.


This is handy information for verifying key components of your browser's graphics configuration.

The second tab available in about:gpu is where you really get to see how the sausage is made. And by sausage I clearly mean "getting stuff drawn onto the screen." Welcome to GPU Profiling!

GPU Tracing

You can record your browser's GPU calls for analysis, and drill down into how long the browser has spent on each call.

Below is an example of a trace from my MacBook Air:


You can see a highlighted WebViewImpl::animate call, which is the requestAnimationFrame call itself. This is where the browser runs your particular code and waits for it to finish. You can also see, in the text at the bottom, a Duration of 12.237 ms for the highlighted animate call.

I've turned on the red vertical lines by clicking a call and pressing 'g'. If you want the lines to start at the end of the call instead, press shift-g. These red lines are spaced 16.66667 ms apart. While they don't mark the VSync cycle itself, you can use them to visualize one frame. If you never want to miss a frame, make sure your animate method, and the supporting Chrome methods that follow it (like the WebViewImpl::composite above), do not span these red vertical lines.

This was a wake-up call for me. As you can see, even if my code was awesome and finished in under 16.6667 ms (the length of a frame), I may still end up hitching, because the browser has compositing and other tasks to perform afterwards.
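The frame budget arithmetic is easy to sketch. The 12.237 ms figure is the animate duration quoted above; the 3 ms and 5 ms composite times are made-up values for illustration, since no composite duration is given. `headroomMs` and `crossesGridLine` are hypothetical helpers that mimic reading the red grid lines.

```javascript
const FRAME_MS = 1000 / 60; // one 60 Hz frame: the spacing of the red grid lines

// Milliseconds left in the frame after the given tasks run back to back.
function headroomMs(taskDurationsMs) {
  const total = taskDurationsMs.reduce((sum, d) => sum + d, 0);
  return FRAME_MS - total;
}

// True if work starting at startMs (measured from a grid line at 0) ends in
// a later frame slot, i.e. it spans one of the red vertical lines.
function crossesGridLine(startMs, taskDurationsMs) {
  const endMs = startMs + taskDurationsMs.reduce((sum, d) => sum + d, 0);
  return Math.floor(startMs / FRAME_MS) !== Math.floor(endMs / FRAME_MS);
}

const animateMs = 12.237;                          // animate duration from the trace
console.log(headroomMs([animateMs]).toFixed(2));   // 4.43 ms left for compositing etc.
console.log(crossesGridLine(0, [animateMs, 3]));   // false (hypothetical 3 ms composite)
console.log(crossesGridLine(0, [animateMs, 5]));   // true  (hypothetical 5 ms composite)
```

In other words, a 12 ms animate call leaves only about 4.4 ms for everything else the browser does in that frame.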

Of course, this is different if your Canvas happens to be hardware accelerated (that doesn't mean it'll go faster, though I hope it does, just that it will be different).

The profiling doesn't look like the above when you start out. In fact, it looks like the below:


Exactly... huh?

The keyboard commands will come in very handy:

Keyboard shortcuts:
 w/s   : Zoom in/out
 a/d   : Pan left/right
 e     : Center on mouse
 g/G   : Shows grid at the start/end of the selected task

Using the w key, you can zoom in enough to see the pattern of frames.

We saw the hitches easily when we used the Tracing tool. The hitches show up as blank spots where a GPU call would be. You'll see call after call fairly regularly, and then some open space. Boom. Hitch.
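The tracing tool is visual, but the same "blank spot" idea can be expressed in code if you record your own frame timestamps (for example, the timestamp passed to a requestAnimationFrame callback). This is a minimal sketch: `findHitches`, the sample timestamps, and the 1.5-frame tolerance are all hypothetical choices, not values taken from the trace.

```javascript
const FRAME_MS = 1000 / 60;

// Return the frames that started noticeably late: a gap of more than
// toleranceFrames frames since the previous frame is a "blank spot".
function findHitches(frameStartsMs, toleranceFrames = 1.5) {
  const hitches = [];
  for (let i = 1; i < frameStartsMs.length; i++) {
    const gapMs = frameStartsMs[i] - frameStartsMs[i - 1];
    if (gapMs > toleranceFrames * FRAME_MS) {
      hitches.push({ index: i, gapMs });
    }
  }
  return hitches;
}

// Hypothetical frame start times with one missing frame between 50 and 83.3 ms.
const starts = [0, 16.7, 33.3, 50.0, 83.3, 100.0];
console.log(findHitches(starts).map((h) => h.index)); // [ 4 ]
```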

Dev Tools to the rescue

After we used the Tracing tool to confirm our hitching, we wanted to find the culprit. Another tool, the Timeline panel in the WebKit Developer Tools, confirmed our suspicion.


The above screenshot is from the (mostly) fixed version, but even here you can see a problem. Notice the nice Paint and Function Call sequence (the Function Call is the page's onmessage handler for the worker, signaling the arrival of new state from Box2D). Normally everything is fine: Paint and Function Call, over and over. This is a good sequence, with the two in lockstep.

Notice, though, the two Paint calls in a row. This means that Box2D took too long to send a message with the new state of the bodies, and the render loop just painted again. This is a small hitch, as the render loop painted the same scene twice.
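Spotting the double Paint by eye works, but if you copy the event names out of the timeline into an array, a few lines can count them. The event list and the `countDoublePaints` helper are hypothetical; the names simply mirror the Paint and Function Call labels described above.

```javascript
// Count places where a Paint directly follows another Paint, meaning no
// new state arrived from the worker and the same scene was painted twice.
function countDoublePaints(events) {
  let count = 0;
  for (let i = 1; i < events.length; i++) {
    if (events[i] === "Paint" && events[i - 1] === "Paint") count += 1;
  }
  return count;
}

const events = ["Paint", "Function Call", "Paint", "Function Call",
                "Paint", "Paint", "Function Call", "Paint"];
console.log(countDoublePaints(events)); // 1
```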

(This reminds me: if the code doesn't receive an updated message from the worker, why should it paint? We can make our code smarter by only painting if it has received a message since the last time it painted. I'll go try that now...)

Success!

By not updating (admittedly a very quick op in the main thread) and not painting (costly) unless there's new state from Box2D via a message, I've seemingly made the main page much more consistent and less hitchy. I no longer see the double Paints or double Function Calls. Of course, YMMV.
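Here is a minimal sketch of the "only paint when there's new state" idea, using a simple dirty flag. It is not the demo's actual code: `createRenderer`, the state shape, and the `draw` callback are hypothetical. In a real page, `onWorkerMessage` would be called from `worker.onmessage` (with `event.data`) and `renderFrame` from the requestAnimationFrame loop; here both are plain functions so the logic can run on its own.

```javascript
function createRenderer(draw) {
  let latestState = null;
  let hasNewState = false;

  return {
    // Cheap: just remember the newest state from the worker.
    onWorkerMessage(state) {
      latestState = state;
      hasNewState = true;
    },
    // The costly paint happens only if a message arrived since the last paint.
    renderFrame() {
      if (!hasNewState) return false; // nothing new: skip the paint
      hasNewState = false;
      draw(latestState);
      return true;
    },
  };
}

const painted = [];
const renderer = createRenderer((state) => painted.push(state.step));
renderer.onWorkerMessage({ step: 1 });
renderer.renderFrame(); // paints step 1
renderer.renderFrame(); // skipped: no new message yet
renderer.onWorkerMessage({ step: 2 });
renderer.renderFrame(); // paints step 2
console.log(painted);   // [ 1, 2 ]
```

If two messages arrive between frames, only the newest state is painted, which is usually what you want for a physics view.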

Example

The standalone example is too big to fit here, as I was upping my game and made a 1024x768 demo. I encourage you to check it out. You can run the example with the simulation either in a worker or inline, so you can compare the performance and results. The source code is also available.

Summary

By using tools such as GPU Tracing and the Dev Tools Timeline, we can better understand the browser's performance and call sequences, and gain insight into what's happening under the covers.

By syncing our app's worker process to compute physics simulations on demand instead of continually, we've greatly improved performance and reduced hitching. By only drawing when there's new information, we've reduced the amount of work the main thread has to perform as it waits for new state from the worker process. This seems to further reduce hitching.

Bonus: because we calculate physics on demand, there's no need for the Page Visibility API to enable or disable the simulation in the worker.
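The post doesn't show its messaging code, so here is one plausible shape for on-demand stepping, not necessarily what the demo does. The message objects, the `createWorkerSide` helper, and the semi-implicit Euler falling body standing in for Box2D's world step are all hypothetical. In a real page the two callbacks would be `worker.postMessage` on the page and `postMessage` inside the worker. Because the page only requests a step from its requestAnimationFrame loop, and browsers throttle or pause requestAnimationFrame in background tabs, the simulation also idles when the tab is hidden.

```javascript
// Worker side: advance the simulation only when asked.
function createWorkerSide(postToPage) {
  const dt = 1 / 60;            // fixed timestep: one 60 Hz frame per request
  const body = { y: 0, vy: 0 }; // hypothetical stand-in for Box2D bodies
  let steps = 0;

  return function onmessage(msg) {
    if (msg.type !== "step") return;
    // Stand-in for Box2D's world step: semi-implicit Euler under gravity.
    body.vy += 9.8 * dt;
    body.y += body.vy * dt;
    steps += 1;
    postToPage({ type: "state", steps, y: body.y });
  };
}

// Page side: every animation frame requests exactly one step, so the
// simulation advances in lockstep with painting instead of on its own timer.
const received = [];
const workerOnMessage = createWorkerSide((msg) => received.push(msg));

function onAnimationFrame() {
  workerOnMessage({ type: "step" }); // in a page: worker.postMessage({ type: "step" })
}

for (let frame = 0; frame < 3; frame++) onAnimationFrame();
console.log(received.map((m) => m.steps)); // [ 1, 2, 3 ]
```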

Next Steps

I've gone too far down the rabbit hole here with frame rates and workers. We'll next turn our attention back to pure Box2D with a look at polygons. Stay tuned, and please leave your questions and thoughts in the comments below.
