Tuesday, January 29, 2013

Minification is not enough, you need tree shaking

In which the virtues of automated mechanical arboreal pruning are extolled over quaint manual labor, as applied to web development build processes.

The setup

Ever notice how the primary bit of marketing for many traditional web programming libraries is their download size? Why is that?

Check this out:

jQuery claims it is only 32kb minified.

Zepto claims it is less than a quarter the size of jQuery.

Dojo claims its nano core is 3.8kb.

Why does size matter so much for these libraries? Your first instinct is probably, "because the more bytes you shuttle across the wire, the slower the app starts up." Yes, this is true. I'd also say you're wrong. The primary reason that size matters for these libraries is that traditional web development has no intelligent or automated way to prune unused code, so that only the code that is actually used gets shipped over the wire.

The web is full of links, yet web dev has no linker

The web development workflow is missing a linking step. A linker's job is to combine distinct project files into a single executable. A smart linker will only include the symbols and code that are actually used by the application, thus pruning unused code. The traditional web developer does not have an intelligent linker.

It's 2013, and the job of micro-managing web development libraries is still being done by humans. Humans: the same people that brought you this little gem. Web developers need machines and tools to take care of linking and minifying so they can get back to comparing traditional web development libraries based on actual feature sets instead of how femto they are.

I want my web programming language to offer enough structure and intelligent tools to take care of pruning, minification, and more. This is why I dig Dart, because it has the structure (classes, libraries, packages, type annotations, metadata, etc) and the tools (dart2js) for a modern development workflow.

Don't just prune unused code, shake it off

Dart tools support tree shaking, a technique to "shake" off unused code, thus shrinking the size of the deployed application. I can import rich libraries chock full of useful goodness into my application, but only the functions I actually use will be included in my generated output. Awesome!

The source application goes through a tree-shaking compiler and its output is smaller.
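
To make the idea concrete, here is a minimal sketch of tree shaking as a reachability walk over a call graph. This is only an illustration of the technique, not how dart2js is actually implemented. The callGraph map and the reachableFrom function are hypothetical names invented for this sketch, and the code assumes a current Dart SDK.

```dart
// Hypothetical call graph: each function name maps to the functions it
// calls directly. The names are illustrative, not read from a real program.
const callGraph = <String, List<String>>{
  'main': ['print', 'embiggen'],
  'embiggen': ['toUpperCase'],
  'unembiggen': ['toLowerCase'],
  'print': [],
  'toUpperCase': [],
  'toLowerCase': [],
};

/// Returns every function reachable from [entry] by following calls.
Set<String> reachableFrom(String entry, Map<String, List<String>> graph) {
  final kept = <String>{};
  final worklist = <String>[entry];
  while (worklist.isNotEmpty) {
    final name = worklist.removeLast();
    if (!kept.add(name)) continue; // Already visited.
    worklist.addAll(graph[name] ?? const <String>[]);
  }
  return kept;
}

void main() {
  final kept = reachableFrom('main', callGraph);
  // Anything not reachable from main is "shaken off".
  final shaken = callGraph.keys.where((name) => !kept.contains(name)).toList();
  print('kept: $kept');
  print('shaken off: $shaken');
}
```

Starting from main, the walk never reaches unembiggen or toLowerCase, so a compiler working this way is free to leave them out of the output.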

Real code example, shaken not stirred

Consider this simple Dart library, ironically named embiggen. There are two top-level functions in this library, embiggen and unembiggen.

 library embiggen;  
   
 String embiggen(String msg) {  
  if (msg == null) {  
   throw new ArgumentError("must not be null");  
  }  
    
  return msg.toUpperCase();  
 }  
   
 String unembiggen(String msg) {  
  if (msg == null) {  
   throw new ArgumentError("must not be null");  
  }  
    
  return msg.toLowerCase();  
 }  

Here is the main program, which uses only embiggen:

 import 'package:embiggen/embiggen.dart';  
   
 main() {  
  var args = new Options().arguments;  
  if (args.length == 0) {  
   print("Usage: dart embiggen.dart phrase");  
   return;  
  }  
    
  var phrase = args[0];  
    
  print(embiggen(phrase));  
 }  
   

I love embiggen, but I'm less enthralled with unembiggen and will never use it. Do I have to search for nano-embiggen?! Nay! Let the linker do its tree-shaking magic.

Run the main application through the dart2js tool, which supports tree-shaking for both JavaScript and Dart outputs. Note there is no command-line option for tree shaking, because dart2js is always tree shaking. For simplicity's sake, let's generate Dart.

 dart2js --output-type=dart embiggen.dart  

Gaze into the tree-shook generated output (reformatted to make it easy to read):

main() {
  var args=new Options().arguments;
  if (args.length == 0) {
    print("Usage: dart embiggen.dart phrase");
    return;
  }
  var phrase=args[0];
  print(embiggen(phrase));
}

String embiggen(String msg) {
  if (msg == null) {
    throw new ArgumentError("must not be null");
  }
  return msg.toUpperCase();
}

(Note: the actual output is all on one line, with white space removed. I reformatted the code above to make it easy to read.)

Notice how the embiggen function is included, but unembiggen is nowhere to be seen, even though I imported the library. The tree, it is shaken!

But is this the best we can do? The dart2js tool also supports minification with the --minify flag.

 dart2js --minify --output-type=dart embiggen.dart  

The minified, single-line, tree-shook generated output:

 main(){var A=new Options().arguments;if(A.length==0){print("Usage: dart embiggen.dart phrase");return;}var C=A[0];print(B(C));}B( A){if(A==null){throw new ArgumentError("must not be null");}return A.toUpperCase();}  

Both outputs have unused code eliminated, and the minified version also replaces variable names. This is exactly the kind of help I want from my tools.

Why this works

The structure of Dart programs cannot change after compilation. In other words, Dart does not support altering class structure at runtime. Dart also does not have extreme dynamism like eval(). Dart compilers and linkers can therefore assume more about the structure of the program, and can be more aggressive about tree shaking and minification.
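
A rough way to see why dynamism hurts: in the sketch below, a call whose target is only known at runtime (think eval()) is modeled with a made-up '*' marker. As soon as the walk hits one, it has to give up and keep everything. The dynamicCall marker, the functionsToKeep function, and the two toy programs are all assumptions for illustration; real compilers are far more nuanced than this.

```dart
// Hypothetical marker for a call whose target is computed at runtime.
const dynamicCall = '*';

/// Returns the functions that must be kept, starting from [entry].
/// If any reachable function makes a dynamic call, every function is kept.
Set<String> functionsToKeep(String entry, Map<String, List<String>> graph) {
  final kept = <String>{};
  final worklist = <String>[entry];
  while (worklist.isNotEmpty) {
    final name = worklist.removeLast();
    if (!kept.add(name)) continue; // Already visited.
    for (final callee in graph[name] ?? const <String>[]) {
      if (callee == dynamicCall) {
        // The target could be anything, so nothing can be safely removed.
        return graph.keys.toSet();
      }
      worklist.add(callee);
    }
  }
  return kept;
}

void main() {
  const staticProgram = <String, List<String>>{
    'main': ['embiggen'],
    'embiggen': [],
    'unembiggen': [],
  };
  const dynamicProgram = <String, List<String>>{
    'main': [dynamicCall],
    'embiggen': [],
    'unembiggen': [],
  };
  print(functionsToKeep('main', staticProgram)); // unembiggen is dropped
  print(functionsToKeep('main', dynamicProgram)); // everything is kept
}
```

With only static calls, unembiggen can be dropped. With a single dynamic call reachable from main, this conservative model keeps all three functions.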

Moral of the story

I believe that web developers need a better workflow that automates tree shaking, dead code elimination, minification, and more. Stop caring how big a library is, and instead let a tool or build step produce the smallest output possible for you, ideally by tree shaking the application.

One option to consider is Dart, with its structured language and intelligent tools, like a tree-shaking and minifying compiler. With dart2js, you can import entire libraries, regardless of size, and generate only the code that is required to run the program.

Regardless of what language you use, demand more from your tools.

 

Acknowledgements

Thanks to Bob Nystrom's OSCON presentation from 2012, from which I humbly embraced-and-extended the setup of this post.

Disclaimer

I'm probably required to say that the views expressed in this blog are my own, and do not necessarily reflect those of my employer. Also, except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 3.0 License, and code samples are licensed under the BSD License.