Tuesday, March 3, 2015

Improving V8's performance using the serializer/deserializer

V8 has long been using so-called snapshots to improve start-up time. By capturing a snapshot of a fresh V8 instance, initializing JavaScript globals no longer requires executing the native scripts in which they are defined, but simply deserializing the snapshot. A d8 shell built with a snapshot starts within 15ms, as opposed to 55ms without.

A minor, sometimes overlooked detail is that the snapshot not only speeds up creating a V8 isolate via v8::Isolate::New, but also contexts created via v8::Context::New, as the snapshot is used as a template.

Recently, the serializer and deserializer implementing the start-up snapshot have been improved to support new features. In the following I want to give a brief introduction to two new embedder-facing APIs that, if used correctly, can improve performance significantly.

For more complete code examples, take a look at the related test cases.

Custom start-up snapshot

Applications embedding V8 often run additional scripts at start-up to set up the global object, define utility functions, load libraries, and so on. This is not much different from the native scripts V8 executes to initialize JavaScript globals, so we might as well use the start-up snapshot to skip this work too.

v8::V8::CreateSnapshotDataBlob, introduced in V8 4.3, is intended to do exactly that. Internally, it creates a new isolate and executes the custom script source passed as an argument inside a new context. It then captures a snapshot, which includes not only functions defined in the custom script, but the entire heap state, for example changes to the global object.

To create a new isolate using the snapshot, you can pass the snapshot to v8::Isolate::New via v8::Isolate::CreateParams.
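Put together, the flow might look like the sketch below. It is written against the V8 4.3-era embedder API described here; the exact CreateParams fields and the ownership rules for the blob's memory vary between V8 versions, and the start-up script is a made-up example.

```cpp
#include <v8.h>

// Capture a snapshot of an isolate in which the start-up script has run,
// then create a fresh isolate from that snapshot.
v8::Isolate* NewIsolateFromCustomSnapshot() {
  // The custom script runs in a new context inside a temporary isolate;
  // everything it leaves on the heap ends up in the blob.
  const char* startup_script =
      "var table = [];"
      "for (var i = 0; i < 1000; i++) table[i] = i * i;";

  // The embedder owns the blob's memory and must keep it alive as long
  // as any isolate created from it. (Stored statically here for brevity.)
  static v8::StartupData blob =
      v8::V8::CreateSnapshotDataBlob(startup_script);

  // Hand the blob to the new isolate via CreateParams.
  v8::Isolate::CreateParams params;
  params.snapshot_blob = &blob;
  return v8::Isolate::New(params);
}
```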

If, for example, a look-up table is meticulously calculated at start-up of the application, then creating a snapshot with that start-up script bakes the calculated table into the snapshot. Starting up from the snapshot bypasses calculating the look-up table completely.
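As an illustration, a start-up script along these lines (a made-up example, not from the actual test cases) would get its fully populated table into the snapshot. Note that it uses a plain Array, which lives on the V8 heap; a typed array's backing store is allocated outside V8 and could not be captured.

```javascript
// Hypothetical start-up script: precompute a sine look-up table once.
// Baked into a snapshot, the filled table is part of the serialized heap,
// so deserializing skips this loop entirely.
var TABLE_SIZE = 1024;
var sineTable = new Array(TABLE_SIZE);  // plain Array: lives on the V8 heap
for (var i = 0; i < TABLE_SIZE; i++) {
  sineTable[i] = Math.sin(2 * Math.PI * i / TABLE_SIZE);
}

function fastSin(x) {
  // Map x (radians) onto the nearest table slot.
  var idx = Math.round(x / (2 * Math.PI) * TABLE_SIZE) % TABLE_SIZE;
  return sineTable[(idx + TABLE_SIZE) % TABLE_SIZE];
}
```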

Limitations
The deserializer can only replay changes to the V8 heap. As a rule of thumb, everything outside of V8 cannot be captured in the snapshot. This includes, for example:
  • API callbacks during start-up, for example functions created via v8::FunctionTemplate.
  • Typed array objects, as the backing store is allocated outside of V8.
  • Return values of Math.random(), once captured in the snapshot, are not really random anymore.

Tip
Functions are usually compiled lazily in V8, i.e. the code for a function is not compiled until it is actually called. If you define functions in the start-up script but don't call them, their code is not compiled and consequently not included in the snapshot. You can force compilation by calling them, wrapped in a try-catch block if necessary to catch errors when these functions attempt to call into the API. There is a trade-off though: including compiled code in the snapshot saves compilation time later on, but at the price of a larger snapshot, which takes longer to deserialize.
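A start-up script following this tip might look like the sketch below; the function name is invented for illustration.

```javascript
// Define a helper...
function formatGreeting(name) {
  return "Hello, " + name + "!";
}

// ...and call it once so its compiled code is included in the snapshot.
// The try-catch guards against errors from embedder API calls that are
// not available at snapshot time.
try {
  formatGreeting("snapshot");
} catch (e) {
  // Ignore start-up-only failures; the code is compiled regardless.
}
```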


Performance
There is little point in discussing the performance gains here. If the start-up script is small, but takes a long time to execute, for example by calculating the first billion digits of pi, then of course using a snapshot yields a huge boost.

Code cache

Since V8 4.2, the serializer/deserializer can also be used to bypass code compilation. Scripts are compiled by calling v8::ScriptCompiler::Compile inside an existing isolate and context. The result is deterministic and therefore an ideal subject for caching.

By compiling with v8::ScriptCompiler::kProduceCodeCache as the cache option, the source object can afterwards be used to obtain a cache data blob. To use the cache, attach that blob to the source object and compile with v8::ScriptCompiler::kConsumeCodeCache.

By providing cached data, V8 can bypass parsing/compiling, and simply deserialize the compiled code.
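In code, the round trip might look roughly like this. It is a sketch against the V8 4.2-era API; the function names other than the v8:: calls are made up, and persisting the cache bytes between runs is left to the embedder.

```cpp
#include <v8.h>

// First run: compile with kProduceCodeCache and keep the cache blob.
v8::Local<v8::Script> CompileAndProduceCache(
    v8::Isolate* isolate, v8::Local<v8::String> source_text,
    const v8::ScriptCompiler::CachedData** cache_out) {
  v8::ScriptCompiler::Source source(source_text);
  v8::Local<v8::Script> script = v8::ScriptCompiler::Compile(
      isolate, &source, v8::ScriptCompiler::kProduceCodeCache);
  // After compilation, the source object carries the cache data blob;
  // the embedder should copy it out and write it to disk.
  *cache_out = source.GetCachedData();
  return script;
}

// Later runs: attach the persisted bytes and compile with kConsumeCodeCache.
v8::Local<v8::Script> CompileWithCache(v8::Isolate* isolate,
                                       v8::Local<v8::String> source_text,
                                       const uint8_t* data, int length) {
  v8::ScriptCompiler::CachedData* cache =
      new v8::ScriptCompiler::CachedData(data, length);
  v8::ScriptCompiler::Source source(source_text, cache);  // takes ownership
  v8::Local<v8::Script> script = v8::ScriptCompiler::Compile(
      isolate, &source, v8::ScriptCompiler::kConsumeCodeCache);
  if (cache->rejected) {
    // Version/flag/CPU/source mismatch or corruption: V8 fell back to
    // parsing and compiling the source from scratch.
  }
  return script;
}
```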

Limitations
As we are preserving code compiled in one context to use it in another context, the code has to be context-independent. That means optimized code cannot be cached, since it ties deeply into objects that only exist in the context it was compiled for. This currently also applies to code compiled by TurboFan. Only unoptimized code can be cached.

Tip
Similar to the start-up snapshot above, function literals are usually compiled lazily and are therefore not included in the cache. An exception is function literals that are wrapped in parentheses, for example

var foo = (function() {
  alert();
});

Undefined references at compile time are not an issue, since reference errors occur only at run time and don't cause compilation to fail.
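A small sketch of that behavior:

```javascript
// This function compiles (and could be cached) even though
// `missingHelper` is not defined anywhere.
function callsMissingHelper() {
  return missingHelper(42);
}

// Only calling it raises a ReferenceError, at run time:
var threw = false;
try {
  callsMissingHelper();
} catch (e) {
  threw = e instanceof ReferenceError;
}
```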

The cached data is checksummed against V8's version, flag configuration, CPU features, and the source string. If there is a mismatch or the data is corrupted, V8 falls back to actually parsing and compiling the source code. The v8::ScriptCompiler::CachedData object contains a rejected field that indicates whether this happened.

Performance
Again, the performance gain depends largely on the script you intend to cache. Some script sources are large but don't actually compile to a lot of code. Here are some numbers for caching the code of popular libraries:

library         | compile w/o cache (ms) | compile w/ cache (ms) | cache size (bytes) | zipped (bytes)
angular.js      | 19.5                   | 0.47                  | 132,873            | 29,190
angular.min.js  | 11.8                   | 0.43                  | 121,358            | 24,329
backbone.js     |  1.1                   | 0.08                  |   3,369            |  1,486
backbone-min.js |  1.0                   | 0.08                  |   3,408            |  1,481
ember.js        | 41.5                   | 0.47                  |  89,165            | 16,453
ember.prod.js   | 38.2                   | 0.44                  |  86,286            | 16,016
jquery.js       |  5.1                   | 0.03                  |   2,686            |  1,231
jquery-min.js   |  3.1                   | 0.02                  |   1,446            |    745
react.js        |  6.7                   | 0.02                  |     988            |    552
react-min.js    |  3.4                   | 0.02                  |     996            |    561

The time it takes to serialize code is roughly the same as it takes to deserialize.

Data compression

The serialization format V8 uses is not particularly compact, as the table above shows. It is mainly tuned for deserialization performance. In some cases, however, it makes sense to compress the data to save disk space. On mobile devices in particular, compression might even speed up reading from slow flash storage.

4 comments:

  1. Thanks for the writeup!
    Do you know if there is any plan to cache optimization result in the future? It seems like a difficult problem as optimization results are tightly coupled with context and heap layout.

    ReplyDelete
    Replies
Unfortunately there are no plans to implement that in the short to medium term. Porting code that embeds context-specific objects is an unsolved problem.

      Delete
  2. We, as an embedder, have global variables and global functions exposed through the global template, which slows down the context creation. Reading your first item in the "Limitations" section, I wonder whether we could still leverage the custom snapshot to speed up context creation.

    Similarly, since we have yet to move up to 4.3, using a 3.x version, I wonder whether we could use the mechanism implemented in mksnapshot to speed up context creation by capturing a snapshot with a global object containing functions with callbacks, and later using that to initialize v8.

    Thanks in advance.

    ReplyDelete
    Replies
Jane: you definitely can. When you create a context from a snapshot, a global object is deserialized from the snapshotted context, but a second global object is also created from the object template you passed in. The two are then merged, so you end up with a single global object that has properties from both the object template and the snapshot.

      The statement in the limitations section refers to actually executing those template functions, since those executions cannot be replayed. But defining them is fine.

      Delete