## December 15, 2014

### Web Engines Hackfest 2014

#### Gustavo Noronha

For the 6th year in a row, Igalia has organized a hackfest focused on web engines. The 5 years before this one were actually focused on the GTK+ port of WebKit, but the number of web engines that matter to us as Free Software developers and consultancies has grown, and so has the scope of the hackfest.

It was a very productive and exciting event. It has already been covered by Manuel RegoPhilippe Normand, Sebastian Dröge and Andy Wingo! I am sure more blog posts will pop up. We had Martin Robinson telling us about the new Servo engine that Mozilla has been developing as a proof of concept for both Rust as a language for building big, complex products and for doing layout in parallel. Andy gave us a very good summary of where JS engines are in terms of performance and features. We had talks about CSS grid layouts, TyGL – a GL-powered implementation of the 2D painting backend in WebKit, the new Wayland port, announced by Zan Dobersek, and a lot more.

With help from my colleague ChangSeok OH, I presented a description of how a team at Collabora led by Marco Barisione made the combination of WebKitGTK+ and GNOME’s web browser a pretty good experience for the Raspberry Pi. It took a not so small amount of both pragmatic limitations and hacks to get to a multi-tab browser that can play youtube videos and be quite responsive, but we were very happy with how well WebKitGTK+ worked as a base for that.

One of my main goals for the hackfest was to help drive features that were lingering in the bug tracker for WebKitGTK+. I picked up a patch that had gone through a number of iterations and rewrites: the HTML5 notifications support, and with help from Carlos Garcia, managed to finish it and land it at the last day of the hackfest! It provides new signals that can be used to authorize notifications, show and close them.

To make notifications work in the best case scenario, the only thing that the API user needs to do is handle the permission request, since we provide a default implementation for the show and close signals that uses libnotify if it is available when building WebKitGTK+. Originally our intention was to use GNotification for the default implementation of those signals in WebKitGTK+, but it turned out to be a pain to use for our purposes.

GNotification is tied to GApplication. This allows for some interesting features, like notifications being persistent and able to reactivate the application, but those make no sense in our current use case, although that may change once service workers become a thing. It can also be a bit problematic given we are a library and thus have no GApplication of our own. That was easily overcome by using the default GApplication of the process for notifications, though.

The show stopper for us using GNotification was the way GNOME Shell currently deals with notifications sent using this mechanism. It will look for a .desktop file named after the application ID used to initialize the GApplication instance and reject the notification if it cannot find that. Besides making this a pain to test – our test browser would need a .desktop file to be installed, that would not work for our main API user! The application ID used for all Web instances is org.gnome.Epiphany at the moment, and that is not the same as any of the desktop files used either by the main browser or by the web apps created with it.

For the future we will probably move Epiphany towards this new era, and all users of the WebKitGTK+ API as well, but the strictness of GNOME Shell would hurt the usefulness of our default implementation right now, so we decided to stick to libnotify for the time being.

Other than that, I managed to review a bunch of patches during the hackfest, and took part in many interesting discussions regarding the next steps for GNOME Web and the GTK+ and Wayland ports of WebKit, such as the potential introduction of a threaded compositor, which is pretty exciting. We also tried to have Bastien Nocera as a guest participant for one of our sessions, but it turns out that requires more than a notebook on top of a bench hooked up to   a TV to work well. We could think of something next time ;D.

I’d like to thank Igalia for organizing and sponsoring the event, Collabora for sponsoring and sending ChangSeok and myself over to Spain from far away Brazil and South Korea, and Adobe for also sponsoring the event! Hope to see you all next year!

### Philippe Normand: Web Engines Hackfest 2014

#### Igalia WebKit

Last week I attended the Web Engines Hackfest. The event was sponsored by Igalia (also hosting the event), Adobe and Collabora.

As usual I spent most of the time working on the WebKitGTK+ GStreamer backend and Sebastian Dröge kindly joined and helped out quite a bit, make sure to read his post about the event!

We first worked on the WebAudio GStreamer backend, Sebastian cleaned up various parts of the code, including the playback pipeline and the source element we use to bridge the WebCore AudioBus with the playback pipeline. On my side I finished the AudioSourceProvider patch that was abandoned for a few months (years) in Bugzilla. It’s an interesting feature to have so that web apps can use the WebAudio API with raw audio coming from Media elements.

I also hacked on GstGL support for video rendering. It’s quite interesting to be able to share the GL context of WebKit with GStreamer! The patch is not ready yet for landing but thanks to the reviews from Sebastian, Mathew Waters and Julien Isorce I’ll improve it and hopefully commit it soon in WebKit ToT.

Sebastian also worked on Media Source Extensions support. We had a very basic, non-working, backend that required… a rewrite, basically :) I hope we will have this reworked backend soon in trunk. Sebastian already has it working on Youtube!

The event was interesting in general, with discussions about rendering engines, rendering and JavaScript.

## December 11, 2014

### Javier Fernández: Web Engines Hackfest 2014

#### Igalia WebKit

An awesome week is coming to the end and I’d like to thanks the sponsors for helping us to make possible that such an amazing group of hackers could work together to improve the web. We focused on some of the consolidated web engines but also about the most promising ones and, of course, hacking on them producing a good amount of patches.

The talks were great, and even better the breakout sessions about many hot topics for the future of the web engines.

Because of the work I’ve been doing lately, I was specially involved in the CSS Features session, which among other things, it complemented the talk Sergio Villar gave us about the Grid Layout implementation we are doing at Igalia in collaboration with Bloomberg.  I introduced as well the work I’ve been doing on the implementation of the Box Alignment specification in both Blink and WebKit web engines; we evaluated how it would impact other layout models, like MathML, CSS Regions, CSS Flexible Box, to ease the logic of blocks and content alignment. We also discussed about the future of CSS regarding new layout models, which is a bit uncertain; there is actually an interesting discussion inside the W3C about this topic, so we will see how it evolves. We talked about graphics and CSS and the SVG specification  (the 2.0 version is being defined) which  will have an important role in the future, as I could personally notice during the last CSSConfEU conference in Berlin; it was also an important topic in other conferences along this year.

This week was a nice opportunity to discuss with other web core hackers the issues I’ve found to properly implement the CSS Box Alignment specification in WebKit, see discussion in the bugzilla for details. We have concluded  that is not an easy problem that should be discussed in the mailing list, as it would imply assuming a performance overhead in CSS parsing. The ‘auto’ value defined by the spec for all the Box Alignment properties, to be resolved during the cascade depending on the type of elements, is affecting the current implementation of Flexible Box and MathML so we will have to find a solution.

I also produced a bunch of patches for WebKit to improve the way we were managing margins and float elements, properly identifying the elements creating a new formatting context and applying some refactoring to make the code clearer; these improvements fixed several issues in Grid Layout as well. Besides, borders, margin and padding was finally adapted in Grid Layout to the different writing-modes, which was a patch I’ve been working for some weeks already and had the opportunity to complete during this hackfest.

I think that’s all for now, hope to see you all in the next Web Engines Hackfest 2015.

## December 10, 2014

### Manuel Rego: Web Engines Hackfest 2014

#### Igalia WebKit

The Hackfest

Web Engines Hackfest 2014 (picture by Adrián Pérez)

Today we’re finishing the first edition of the Web Engines Hackfest. This is like the continuation of the “WebKitGTK+ Hackfest” that has been organized by Igalia since 2009.

Checking some numbers, we’ve been 30 people working together for 4 full-days (even nights in some cases). On top of that, there’ve been 6 nice talks and several interesting discussions around the open web platform.

As usual the hackfest started with an initial brainstorming where everybody introduced themselves and their plans for the following days. Martin Robinson led the discussion and identified the main topics, that were later debated in detail on the breakout sessions.

The Talks

Despite of being an un-conference format event, this year we’ve arranged a few nice talks about different topics (you can find the slides in the hackfest wiki page):

• Raspberry Pi Browser by Gustavo Noronha & ChangSeok Oh: They talked about the work done to have a full-browser working in a low specs device like the RPi. They used WebKitGTK+, but not WebKit2 yet, and they need to do some cool tricks to avoid the hardware limitations.
• CSS Grid Layout by Sergio Villar: Sergio explained the work we’ve been doing around CSS Grid Layout on Blink and WebKit. Apart from explaining some of the nice things of the feature, he revealed our plans to ship it on Chrome during the first quarter of 2015.
• State of JS Implementations by Andy Wingo: As usual a great talk by Wingo summarizing the evolution of JavaScript during the last years in the open source engines (JSC, SpiderMonkey and V8) and what we can expect in the future. Next year we’ll bring us further optimizations and a faster web for the users.

TyGL presentation by Zoltán Herczeg (picture by Adrián Pérez)

• GPU Based 2D Rendering With TyGL by Zoltán Herczeg: Good introduction to TyGL port explaining all the internals and architecture. The different technical challenges and the solutions provided by TyGL to face them.
• WebKit for Wayland by Žan Doberšek: Žan presented a new WebKit port for Wayland platform. The main use case for this port is to have a just one web view full-screen (like in a set top box). He explained the internal details and how it’s done the integration with Wayland.
• A Short Introduction to Servo by Martin Robinson: Martin explained the new parallel browser engine Servo and the Rust programming language behind the scenes. He talked about the current status and the next steps, it should be soon in dog-foodable state.

CSS Features breakout session (picture by Adrián Pérez)

My personal experience

From the personal point of view, I’ve been mostly focused on CSS stuff. We’ve a nice discussing regarding the new layout models (from multi-column, flexbox and grid to regions and shapes) and how long they take to be a major feature widely used on the web.

On the one hand, I’ve been carrying on with the regular work I’m lately doing related to CSS Grid Layout. I found and fixed a crash in the absolutely positioned grid children support that landed past week. In addition I managed to get few more tests landed on the grid W3C test suite that we’re starting to create (thanks to Gérard Talbot for the reviews).

A Coruña from San Pedro (picture by Adrián Pérez)

On the other hand, I’ve been talking with Mihnea Ovidenie about CSS Regions and how CSS Grid Layout could help to avoid the criticism regarding empty HTML elements required to create regions.
Grid allows you to divide the webpage in areas directly from CSS, each area is not associated with an HTML element (a DOM node). So if regions are defined using those areas, it won’t be needed to create empty elements on the DOM tree.
Basically, this seems to match with what is defined in the CSS Pagination Templates module that is being drafted by Alan Stearns.

Thanks

Finally I’d like to thanks to all the people attending (this event wouldn’t be possible without all of you). We hope you enjoined the hackfest and also your stay in A Coruña. See you next year!

Also big thanks the sponsors Adobe, Collabora and Igalia for making possible such a great event.

## December 09, 2014

### Žan Doberšek: Announcing WebKit for Wayland

#### Igalia WebKit

Here at Igalia we are happy to announce the Wayland port of WebKit.

This port avoids using traditional GUI toolkits in favor of directly operating with the Wayland display protocol. Leveraging the WebKit2 multi-process architecture, the UIProcess is implemented as a shared library and loaded by the Wayland compositor, enabling the WebProcess to act as a direct client of the compositor while still being controlled by the UIProcess.

EGL, the Wayland EGL platform, and OpenGL ES are used for hardware-accelerated compositing of the rendered Web content. GLib, Libsoup and Cairo are used under the hood.

The port serves as a good base for building systems and environments that are mostly or completely relying on the Web platform technologies for building the desired interface.

Overall the port is still in its early days, with some basic functionality (e.g. functional keyboard and mouse input support) and many other Web platform features still not supported. But with Wayland EGL support constantly growing in graphics drivers for different GPUs, it can already be tested on devices like the Raspberry Pi or the Jetson TK1 development board.

In terms of supported Wayland compositors, for the moment we only support Weston (the reference Wayland compositor implementation), which is also used for development purposes. It’s also used for running the layout tests by again pushing WebKitTestRunner functionality into a shared library, though all that is still in very early stages.

The code is available on GitHub. There are also short instructions for building the dependencies and the port, and how to run it.

There’s also additional repositories there (for Cairo, Weston), containing changes that haven’t yet been pushed upstream. In the following days we’ll also be providing Buildroot configurations that can be used for cross-compiling the whole software stack for the different supported hardware.

We look forward to continuing evolving this work, enabling further features and improving performance on the software side and adding support for additional devices.

## December 08, 2014

### How to build TyGL

#### University of Szeged

This is a follow-up blog post of our announcement of TyGL - the 2D-accelerated GPU rendering port of WebKit.

We have been received lots of feedback about TyGL and we would like to thank you for all questions, suggestions and comments. As we promised lets get into some technical details.

## December 04, 2014

### Gwang Yoon Hwang: Introducing Threaded Compositor

#### Igalia WebKit

Since CSS Animation appeared, the graphics performance of WebKit has become an important issue, especially in embedded systems world. Many users (including me) want fancy and stunning user interfaces, even in a small handset device which have very restricted CPU and GPU.

Luckily, I could work through the years to improve the graphics performance of WebKit. And with the support from colleagues in Igalia, I just finished to make the Threaded Compositor for WebKitGTK+ as a review-ready state.

The Threaded Compositor focused to improve the performance of WebKitGTK+ for CSS Animation using dedicated thread for compositing. But not only CSS Animation, it also provides a way to improve the performance of scrolling, scaling, rendering of canvas and video element.

Before diving into the detail, it would be good to watch a video which introducing the Threaded Compositor.

The example used in this video can be accessed from here. I modified the original version to run automatically for benchmarking purpose.

Also, the current implementation of Threaded Compositor is available on the github. It is based on the WebKit r176538.

## Motivation

The main-thread is where everything gets executed, including layout, JavaScript execution, and other heavy operations. Thus, running accelerated compositing in the main-thread severely limits responsiveness and rendering performance. By having a separate thread for compositing, we can bring a significant performance improvement in scrolling, zooming, and rendering.

Currently, several ports have already implemented a way to perform compositing off the main-thread. Coordinate Graphics, which is used by Qt and EFL, runs accelerated compositing in UI Process, and Chromium has implemented Compositor Thread that runs off the main render thread.

Threaded Compositor is a threaded model of Coordinated Graphics. Unlike Coordinated Graphics, it composites contents in the dedicated thread of Web Process and doesn’t use complicated inter-process shareable textures. Because of that, it is much easier to implement other multi-threaded rendering techniques which is covered at Compositing the Media and the Canvas.

Compositing a contents using OpenGL[ES] is a basic technique to accelerate CSS animations. However, even we are using GPU accelerated compositing, we can face a V-sync pitfall. ( This happens more frequently in the embedded systems. ) Most of GPU uses V-sync (Vertical synchronization) to prevent the screen tearing problem. It may blocks OpenGL’s SwapBuffer call at most 16 ms - if the monitor’s refresh rate is 60 Hz - until the next vblank interrupt.

As you can see in the below diagram, this problem can waste the main-thread’s CPU time.

Again, when the main thread executing OpenGL[ES]’s SwapBuffer call, the driver should wait V-sync interrupt from the DRM/KMS module. The above diagram shows the worst case scenario. The main-thread could render 6 frames in given time if there was no V-sync problem. But with the V-sync, it renders just 4 frames. It can be happened more frequently in embedded environments which have less powerful CPU and GPU.

To overcome this problem, we uses a very common technique to use a different thread to update screen. As visualized at above diagram, the compositing thread takes all overheads from the V-sync problem. The main-thread can do other job while the compositing thread handles OpenGL calls. (Purple rectangles in the main-thread indicate inter-thread overheads.)

Moreover, the Threaded Compositor accelerates CSS Animations, also.

Because all of the layer properties (including GraphicsLayerAnimation) are passed to the compositing thread, it can make available to run CSS animations on the compositing thread without interrupting the main-thread. When CoordinatedGraphicsScene finishes to render a scene, it will check is there any running animations in the its layer tree. If there are active one, it sets a timer to render a scene itself.

Attached sequence diagram can be helpful to understand this idea.

### Scrolling and Scaling

Scrolling and Scaling are also expensive job especially in the embedded device.

When we are scaling a web page, WebCore needs to update whole layout of its web page. As all of us knows, it is really expensive operation in the embedded platform.

Let’s assume that a user tries to scale a web page using a two finger gesture. The WebCore in the Web Process tries to produce scaled web page with a exact layout as soon as possible. However, if it is a embedded device, it can need more than 100 ms - 10 fps.

The Threaded Compositor renders a web page with a requested scale using its cached layer tree. By doing like that, we can give a immediate visual feed back to the user. And re-layouted web page will be updated later.

It is similar in the scrolling operation. When we are scrolling the web page, this operation doesn’t need a huge re-layout operation. But it needs re-painting and bliting operation to update a web page.

Because the Threaded Compositor uses TILED_BACKING_STORE, it can render the scrolled web page with cached tiles with a minimal latency. During ThreadedCompositor renders a web page, the main-thread scrolls "actual" view. Whenever the main-thread finishes "actual" scroll, these changes are collected by CompositingCoordinator. And these changes are passed to the ThreadedCompositor to update the view.

Unfortunately, these optimization is only supported when we are using fixed layout. To support this without fixed layout, we need to refactor TILED_BACKING_STORE in WebKit and implement proper overlay scrollbars in WebKitGTK+.

### Compositing the Media and the Canvas

Not only CSS3 Animation and scaling but also the video and canvas can be accelerated by the Threaded Compositor. Because it is little bit out of scope of this post - Because it is not implemented for upstreaming - I’ll only describe about it briefly here.

In the TextureMapper, Media (Plugins, Video element, …) and Canvas (WebGL, 2D Canvas) can be rendered by implementing TextureMapperPlatformLayer. Most important interface of the TextureMapperPlatformLayer is TextureMapperPlatformLayerClient::setPlatformLayerNeedsDisplay which requests the compositor to composite the its contents.

If the actual renderer of a Media and a Canvas element runs on off-the-main-thread, it is possible to bypass the main-thread entirely. The renderer can calls TextureMapperPlatformLayerClient::setPlatformLayerNeedsDisplay when it updates its result from its own thread. And compositing thread will composite the result without using the main-thread.

Also, if the target platform supports streams of texture, we can avoid pipeline hazards when drawing video layers in modern mobile GPUs which uses tile based rendering.

## Performance

This performance comparison was presented in Linux.Conf.AU 2013. It is based on pretty old implementation but it shows meaningful performance improvement compare to plain TextureMapper method. I could not regenerate this result using my current laptop because it is too fast to make stressed condition. I hope I can share updated the performance comparison using the embedded device soon.

• Test Cases [*]

• leaves n

• Modified the famous accelerated compositing example, WebKit Leaves, to draw n leaves.
• 3DCube n

• n cubes rotating using CSS3 animation.
• Test Environment

• Prototype of Threaded Compositor Implementation based on r134883
• WebKit2 Gtk+ r140386
• Intel Core i5-2400 3.10Ghz, Geforce GTS450, Ubuntu 12.04 x86_64
Tests ThreadedCompositor (fps) WebKit2 Gtk+ (fps) Improved %
leaves500 58.96 35.86 64.42
leaves1000 58.98 25.88 127.90
leaves2000 32.98 18.04 82.82
3DCube360 57.27 32.09 78.47
3DCube640 33.64 23.52 43.03
 [*] These tests are made by Company 100

## Class Diagram

I made this diagram to help my understanding during the implementation, but It would be good to share to help others who are interested in it.

• CompositingCoordinator

A class takes the responsibility of managing compositing resources in the main-thread.

This class has a dedicate thread for compositing. It owns ViewportController and CoordinatedGraphicsScene to render scene on the actual surface.

A concrete class of LayerTreeHost and CompositingCoordinator::Client for CoordinatedGraphics. And it has ThreadedCompositor to use the Threaded Compositor.

• CoordinatedGraphicsScene

A class which has a complete scene graph and rendering functionality. It synchronizes its own scene graph with a graphics layer tree in compositing coordinator.

This class implements a surface using ImageBuffer as a backend to use in the Web Process.

• TextureMapperLayer

It has actual properties to render a layer.

• ViewportController

This class is responsible to handle scale factor and scrolling position in the compositing thread.

In the main-thread, all of the visual changes of each GraphicsLayer are passed to CompositingCoordinator via CoordinatedGraphicsLayer. And CompositingCoordinator collects state changes from the GraphicsLayer tree when it is requested to do so.

From this point, Threaded Compositor and Coordinated Graphics behaves differently.

In Coordinated Graphics, the collected state changes of the layer tree is encoded and passed to CoordinatedLayerTreeHostProxy in UIProcess. And CoordinatedGraphicsScene applies these state changes to its own TextureMapperLayer tree and renders these on the frame buffer.

But in Threaded Compositor, these states are passed to CoordinatedGraphicsScene which owned by ThreadedCompositor. ThreadedCompositor has its own RunLoop and thread, all of the actual compositing operations are executed in the dedicated compositing thread in Web Process.

## Current Status

Most of re-factoring for Threaded Compositor in Coordinated Graphics side was done in the last year.

However, upstreaming of Threaded Compositor was delayed to various issues. I had to simplify the design (which was quite complicate at that time) and resolve various issues Before starting upstreaming process.

In this year, WebKitGTK+ decided to deprecate WebKit1. Because of that it was possible to make much less complicated design. Also, I fixed the Threaded Compositor not to break current behavior of WebKitGTK+ without fixed layout.

I’m happy that I can start upstreaming process of this implementation, from now on.

## Things To Do

• Reduce memory uses when updating texture

• RenderObjects draw its contents to UpdateAtlas to pass bitmaps to the compositing thread. However, The size of each UpdateAtlas is 4MB. It is quite big memory consumption in the embedded device. Still, I couldn’t find a way to solve it without using proprietary GPU driver.
• Avoid pipeline hazards in the mobile GPU

• Modern mobile GPUs uses tile-based [deferred] rendering. These GPUs can render multiple frames (~ 3 frames) concurrently to overcome its low performance. It is well working technique in the game industry which uses static textures.
• In the WebKit, most of textures are dynamically generated. Whenever we are updating a texture using texSubImage2D, GPU would be stalled because it would be used in the previous rendering call.
• To avoid this problem, chromium uses producer/consumer model of texture
• We can create a wrapper for a texture which provides multiple buffering. In this way we can avoid the pipeline hazards.
• Reduce the maintenance burden

• we need to re-factor Coordinated Graphics to share codes with Apple’s UI SIDE COMPOSITING codes. Most of messages are duplicated. So it can be easily done by defining platform specific parts.
• Accelerate the performance of 2D Canvas element and Video element

• Check the performance of Threaded Compositor in the embedded device

• I hope I can share about it soon.
• Most of all, upstreaming!

## November 12, 2014

### Announcing the TyGL-WebKit port to accelerate 2D web rendering with GPU

#### University of Szeged

We are proud to announce the TyGL port (link: http://github.com/szeged/TyGL) on the top of EFL-WebKit. TyGL (pronounced as tigel) is part of WebKit and provides 2D-accelerated GPU rendering on embedded systems. The engine is purely GPU based. It has been developed on and tested against ARM-Mali GPU, but it is designed to work on any GPU conforming to OpenGL ES 2.0 or higher.

The GPU involvement on future graphics is inevitable considering the pixel growth rate of displays, but harnessing the GPU power requires a different approach than CPU-based optimizations.

## October 28, 2014

### Manuel Rego: Presenting the Web Engines Hackfest

#### Igalia WebKit

After the Google’s fork back in April 2013, WebKit and Blink communities have been working independently, however patches often move from one project to another. In addition, a fair amount of the source code continues to be similar. Thus, it seems interesting to have a common place to discuss topics of shared interest and make plans for the future.

For that reason, Igalia is announcing the Web Engines Hackfest that would happen in December 7-10 in our main office in A Coruña (Spain). This is like a new edition of the WebKitGTK+ Hackfest that we’ve been organizing since 2009, but this year changing the focus in order to make a more inclusive event.

The Hackfest

The new event will include members from all parts of the Web Platform community, not restricted to WebKit and Blink, but open to the different Web (Gecko and Servo) and JavaScript (V8, JavaScriptCore and SpiderMoney) free software engines.

Past year hackfest picture by Claudio Saavedra

The idea is to bring together implementors from the Open Web Platform community in a 4-day hacking/discussion/unconference event with the goal of moving forward different topics, discuss common issues, make plans for the future, etc.

And that’s what makes this event something different. It’s a completely hacking oriented event, where developers sit together in small groups to work for a few days pursuing some specific goals. These groups of interest are defined at the beginning of the hackfest and they agree the different tasks to be done on during the event.
Thereby, don’t expect a full scheduling in advance with a list of talks and speakers like in a regular conference, this is something totally different focused on real hacking.

After the first round of invitations we’ve already a nice list of people from different companies that will be attending the hackfest this year.

Thanks to Adobe and Igalia for sponsoring the Web Engines Hackfest and make possible such an exciting event.

Looking forward to meet you there!

## October 22, 2014

#### University of Szeged

It's been a while since I last (and actually first) posted about Fuzzinator. Now I think that I have enough new experiences worth sharing.

More than a year ago, when I started fuzzing, I was mostly focusing on mutation-based fuzzer technologies since they were easy to build and pretty effective. Having a nice error-prone test suite (e.g. LayoutTests) was the warrant for fresh new bugs. At least for a while.

## What is ASM.JS?

Now that mobile computers and cloud services become part of our lives, more and more developers see the potential of the web and online applications. ASM.JS, a strict subset of JavaScript, is a technology that provides a way to achieve near native speed in browsers, without the need of any plugin or extension. It is also possible to cross-compile C/C++ programs to it and running them directly in your browser.

In this post we will compare the JavaScript and ASM.JS performance in different browsers, trying out various kinds of web applications and benchmarks.

## August 28, 2014

### CSS Shapes now available in Chrome 37 release

Support for CSS Shapes is now available in the latest Google Chrome 37 release.

### What can I do with CSS Shapes?

CSS Shapes lets you think out of the box! It gives you the ability to wrap content outside any shape. Shapes can be defined by geometric shapes, images, and even gradients. Using Shapes as part of your website design takes a visitor’s visual and reading experience to the next level. If you want to start with some tutorials, please go visit Sarah Soueidan’s article about Shapes.

### Demo

The following shapes use case is from the Good Looking Shapes Gallery blog post.

Without CSS Shapes
With CSS Shapes

In the first picture, we don’t use CSS Shapes. The text wraps around the rectangular image container, which leads to a lot of empty space between the text and the visible part of the image.

In the second picture, we use CSS Shapes. You can see the wrapping behavior around the image. In this case the white parts of the image are transparent, thus the browser can automatically wrap the content around the visible part, which leads to this nice and clean, visually more appealing wrapping behavior.

### How do I get CSS Shapes?

I’d like to thank the collaboration of WebKit and Blink engineers, and everyone else in the community who has contributed to this feature. The fact that Shapes is shipping in two production browsers — Chrome 37 now and Safari 8 later this year — is the upshot of the open source collaboration between the people who believe in a better, more expressive web. Although Shapes will be available in these browsers, you’ll need another solution for the other browsers. The CSS Shapes Polyfill is one method of achieving consistent behavior across browsers.

### Where should I start?

Let us know your thoughts or if you have nice demos, here or on Twitter: @AdobeWeb and @ZoltanWebKit.

## August 16, 2014

### A Quick'n'Dirty Set-up of an Aarch64 Ubuntu 14.04 VM with QEMU

#### University of Szeged

Lately, I came up with the idea to do some development on Aarch64. However, I couldn't get my hands on real hardware easily so I started to look for alternatives (i.e., emulators). The ARMv8 Foundation Model seemed to be the trivial solution but I've heard that QEMU is somewhat faster so I gave it a try. My goal was to set up the VM as quick as possible: reuse whatever is already "out there" and rebuild only what's utterly necessary. In the end it turned out that it's quite easy to get such a VM working ... once you know what you need.

## June 10, 2014

### Using ARIA 1.0 and the WebKit Accessibility Node Inspector

#### Surfin’ Safari

On the heels of the 25th birthday of the Web, WAI-ARIA 1.0—the specification for Accessible Rich Internet Applications—is a W3C Recommendation, thanks in part to WebKit’s implementation. Most major web applications use ARIA and related techniques to improve accessibility and general usability.

Many web developers are familiar with the simple parts of ARIA, such as retrofitting roles in legacy or otherwise non-semantic uses of HTML like <div role="button" tabindex="0">, but ARIA has other equally important uses:

• Improving languages like SVG where no accessibility semantics exist natively.
• Augmenting technologies like EPUB that build on existing HTML semantics.
• Allowing accessibility in native implementations, like the sub-DOM controls of <video> elements.
• Supporting accessibility and full keyboard access when HTML is insufficient, such as with data grids, tree controls, or application dialogs.
• Enabling accessible solutions where there is no equivalent semantic or functionality. For example, HTML has no concept similar to “live” regions.

More on these topics below, including how to diagnose and debug problems using new accessibility inspection features in the WebKit Web Inspector.

## Example 1: ARIA in an SVG Map of Africa

The Scalable Vector Graphics (SVG) language does not include semantics to indicate what type of content is represented, such as a chart, illustration, or interactive application control. However, ARIA roles and attributes can be used in SVG today for both raster- and vector-based images, and the SVG Working Group recently adopted ARIA officially into SVG 2.

The following video shows VoiceOver’s touchscreen navigation of an accessible map. It uses a simple role="img" on each country path, and an aria-labelledby relationship to associate that country path with the text label. After watching the video, view the source of the test case SVG map of Africa to see how it works.

Prior to WebKit’s implementation of ARIA in SVG, the best opportunity for a blind individual to experience spatial data like charts and maps was to print expensive tactile graphics on swell paper or with a modified braille embosser. Along with WebKit’s first native implementation of accessible MathML, accessible graphics represent new possibilities in the category of study collectively referred to as STEM: science, technology, engineering, and math.

Note: The test case SVG map of Africa is based on an original by Peter Collingridge, with accessibility added for the demo.

## Introducing the Accessibility Node Inspector

Recent nightly builds of WebKit include a new accessibility section in the node properties of the Web Inspector. Select any DOM element to see relevant accessibility information, such as the computed role.

The properties and relationships displayed come from WebCore. Accessibility children and parent nodes cannot be detected through a JavaScript interface and are not a one-to-one mapping to the DOM, so these relationships have not previously been available to web developers. Many other accessibility properties are likewise not detectable through the rendered DOM or a JavaScript interface.

We’ll use the WebKit Accessibility Node Inspector to diagnose and inspect the examples below.

## Complex ARIA Widget Examples

Many of the original features of ARIA (such as dialogs, landmarks, and menus) have been adopted into recent versions of HTML as elements. However, there are interaction patterns, common on the Web since the 1990s, that still have no native support or unreliable rendering support in HTML, such as date pickers, combo boxes, tab sets, data grids, tree controls, and more. Web developers must render all these controls themselves, using custom code or a JavaScript framework.

Making ARIA widgets can be challenging. The examples below illustrate some of the trickier bits of implementation. Debugging these controls is made easier by observing accessibility properties in the Web Inspector.

### Example 2: Selectable List Box with Full Keyboard Support Using Native Focus Events

This demo was created in 2009 for Apple’s World Wide Developer Conference (WWDC) and uses the “roaming tabindex” technique on a custom list box.

Assistive technologies do not change the DOM, so there’s no hidden magic happening. JavaScript running in the page uses standard event handling and DOM interfaces like setAttribute() and focus(). View the source or step through in the WebKit debugger for a deeper understanding.

For a full explanation of the techniques and test case roaming tabindex demo used in the video, see WWDC 2009: Improving Accessibility in Web Applications.

### Example 3: Combo Box with a “Status” Live Region

During the life cycle of a web application, there may be multiple points of user interest. In the list box example, the web application moves focus to the updated item, but moving focus is not always appropriate. In a combo box, keyboard focus should remain on the text field so the user can continue typing. The selected item in the related drop-down list is conveyed to the API when the selection changes, allowing a a screen reader to speak updates to both elements. Some combo boxes have an additional status display to indicate the total number of results. In this demo, we’ll use an ARIA “live region” for the status updates.

As with the previous example, there’s no hidden magic occurring in the DOM. JavaScript running in the page uses standard event handling and DOM interfaces like setAttribute(). View the source or step through in the WebKit debugger for a deeper understanding of the techniques.

As this combo box demo shows, the ability for an assistive technology to simultaneously follow and report changes to multiple points of user interest was never possible in web applications prior to ARIA.

## Major Contributors to WebKit’s ARIA Implementation

WebKit’s implementation of ARIA played a significant part in the ARIA 1.0 Recommendation milestone, and many individuals collaborated on the work.

The initial implementation was completed in 2008 by Alice Liu Barraclough and Beth Dakin. Much of the remaining ARIA implementation in WebCore, as well as the Mac and iOS platform mapping, was completed by Chris Fleizach. Sam White made major improvements to WebKit’s accessibility architecture. Jon Honeycutt, Brent Fulgham, Dominic Mazzoni, Mario Sánchez Prada, and others completed the platform mapping to the Windows and Linux accessibility APIs. Credit for the ARIA test harness and WebKit test results goes to Michael Cooper, Jon Gunderson, Mona Heath, Joseph Scheuhammer, Rich Schwerdtfeger, and others. The full list of working group acknowledgements is available in the ARIA spec.

The Web is a more enabling resource for everyone because of the efforts of these individuals. Thank you!

## Future Direction of ARIA

ARIA 1.0 has much room for improvement, but it’s an incredibly important step toward ensuring equal access, opportunity, and usability of the web platform.

Future work on ARIA will cover additional semantics for HTML, SVG, and EPUB, and some of the work proposed includes non-declarative JavaScript accessibility API support for custom view technologies like WebGL and Canvas. There is also work to be done for rich text editing that is beyond the capability of contenteditable, such as the custom display and input-proxied views that are used on application suites like Google Docs and iWork for iCloud.

## A Call to Action

Many of the widget libraries available in JavaScript frameworks do NOT include support for accessibility and keyboard behavior. If you are a front-end engineer, you have an opportunity to change this situation.

When you contribute to JavaScript UI libraries, include support for accessibility. Test your code for accessibility and keyboard behavior using focus() where appropriate. Detect and update your web app state based on user focus events. Don’t just style the CSS of controls to look focused; use real keyboard focus.

The amount of effort it takes to add and test for accessibility is well worth the fit-and-finish it will bring to your web app. You’ll improve the experience for all users.

Each of these videos is about an hour in length. They cover ARIA and related techniques in detail.

## June 02, 2014

### Introducing the JetStream benchmark suite

#### Surfin’ Safari

Today we are introducing a new WebKit JavaScript benchmark test suite, JetStream. JetStream codifies what our de facto process has been — to combine latency and throughput benchmarks with roughly equal weighting, and capturing both metrics of traditional JavaScript programming styles as well as new JavaScript-based technologies that have captured our imaginations. Scores on JetStream are a good indicator of the performance users would see in advanced web applications like games.

Optimizing the performance of our JavaScript engine is a high priority for the WebKit project. Examples of some of the improvements we introduced in the last year include concurrent compilation, generational GC, and the FTL JIT. Engineering such improvements requires focus: we try to prioritize high-impact projects over building and maintaining complex optimizations that have smaller benefits. Thus, we motivate performance work with benchmarks that illustrate the kinds of workloads that WebKit users will likely encounter. This philosophy of benchmark-driven development has long been part of WebKit.

## The previous state of JavaScript benchmarking

As we made enhancements to the WebKit JavaScript engine, we found that no single benchmark suite was entirely representative of the scenarios that we wanted to improve. We like that JSBench measures the performance of JavaScript code on popular websites, but WebKit already does very well on this benchmark. We like SunSpider for its coverage of commonly-used language constructs and for the fact that its running time is representative of the running time of code on the web, but it falls short for measuring peak throughput. We like Octane, but it skews too far in the other direction: it’s useful for determining our engine’s peak throughput but isn’t sensitive enough to the performance you’d be most likely to see on typical web workloads. It also downplays novel JavaScript technologies like asm.js; only one of Octane’s 15 benchmarks was an asm.js test, and this test ignores floating point performance.

Finding good asm.js benchmarks is difficult. Even though Emscripten is gaining mindshare, its tests are long-running and until recently, lacked a web harness. So we built our own asm.js benchmarks by using tests from the LLVM test suite. These C and C++ tests are used by LLVM developers to track performance improvements of the clang/LLVM compiler stack. Emscripten itself uses LLVM to generate JavaScript code. This makes the LLVM test suite particularly appropriate for testing how well a JavaScript engine handles native code. Another benefit of our new tests is that they are much quicker to run than the Emscripten test suite.

Having good JavaScript benchmarks allows us to confidently pursue ambitious improvements to WebKit. For example, SunSpider guided our concurrent compilation work, while the asm.js tests and Octane’s throughput tests motivated our work on the FTL JIT. But allowing our testing to be based on a hodgepodge of these different benchmark suites has become impractical. It’s difficult to tell contributors what they should be testing if there is no unified test suite that can tell them if their change had the desired effect on performance. We want one test suite that can report one score in the end, and we want this one score to be representative of WebKit’s future direction.

## Designing the new JetStream benchmark suite

Different WebKit components require different approaches to measuring performance. For example, for DOM performance, we just introduced the Speedometer benchmark. In some cases, the obvious approach works pretty well: for example, many layout and rendering optimizations can be driven by measuring page load time on representative web pages. But measuring the performance of a programming language implementation requires more subtlety. We want to increase the benchmarks’ sensitivity to core engine improvements, but not so much so that we lose perspective on how those engine improvements play out in real web sites. We want to minimize the opportunities for system noise to throw off our measurements, but anytime a workload is inherently prone to noise, we want a benchmark to show this. We want our benchmarks to represent a high-fidelity approximation of the workloads that WebKit users are likely to care about.

JetStream includes benchmarks from the SunSpider 1.0.2 and Octane 2 JavaScript benchmark suites. It also includes benchmarks from the LLVM compiler open source project, compiled to JavaScript using Emscripten 1.13. It also includes a benchmark based on the Apache Harmony open source project’s HashMap, hand-translated to JavaScript. More information about the benchmarks included in JetStream is available on the JetStream In Depth page.

We’re excited to be introducing this new benchmark. To run it, simply visit browserbench.org/JetStream. You can file bugs against the benchmark using WebKit’s bug management system under the Tools/Tests component.

### Speedometer: Benchmark for Web App Responsiveness

#### Surfin’ Safari

Today we are pleased to announce Speedometer, a new benchmark that measures the responsiveness of web applications.

## Benchmark

Speedometer measures simulated user interactions in web applications. Version 1.0 of Speedometer uses TodoMVC to simulate user actions for adding, completing, and removing to-do items. Speedometer repeats the same actions using DOM APIs — a core set of web platform APIs used extensively in web applications — as well as six popular JavaScript frameworks: Ember.js, Backbone.js, jQueryAngularJS, React, and Flight. Many of these frameworks are used on the most popular websites in the world, such as Facebook and Twitter. The performance of these types of operations depends on the speed of the DOM APIs, the JavaScript engine, CSS style resolution, layout, and other technologies.

## Motivation

When we set out to improve the performance of interactive web applications in WebKit last year, we looked for a benchmark to guide our work. However, many browser benchmarks we checked were micro-benchmarks, and didn’t reflect how DOM APIs were used in the real world, or how individual APIs interacted with the rest of the web browser engine.

For example, one popular DOM benchmark assigns the value of element.id to a global variable repeatedly inside a loop:

for (var i = 0; i < count; i++)
globalVariable = element.id; 

We like these micro-benchmarks for tracking regressions in heavily used DOM APIs like element.id. However, we couldn’t use them to guide our performance work because they don’t tell us the relative importance of each DOM API. With these micro-benchmarks, we could have easily over optimized APIs that don’t matter as much in actual web applications.

These micro-benchmarks can also encourage browser vendors to implement optimizations that don’t translate into any real-world benefit. In the example above for instance, some browser engines detect that element.id doesn’t have any side effect and eliminate the loop entirely; assigning the value exactly once. However, real-world websites rarely access element.id repeatedly without ever using the result or modifying the DOM in between.

Hence we decided to write a new benchmark for the end-to-end performance of a complete web application instead of testing individual DOM calls.

## Mechanics

We tried to make Speedometer faithfully simulate a typical workload on a demo application by replaying a sequence of user interactions. We did have to work around certain limitations of the browser, however. For instance, we call click() on each checkbox in order to simulate a mouse click since many browsers don’t allow web content to create fake mouse or keyboard events. To make the run time long enough to measure with the limited precision, we synchronously execute a large number of the operations, such as adding one hundred to-do items.

We also noticed that some browser engines have used an optimization strategy of doing some work asynchronously to reduce the run time of synchronous operations. Returning control back to JavaScript execution as soon as possible is worth pursuing. However, a holistic, accurate measurement of web application performance involves measuring when these related, asynchronous computations actually complete since they could still eat up a big proportion of the 16-millisecond-per-frame budget required to achieve smooth, 60 frames-per-second user interaction. Thus, we measure the time browsers spend executing those asynchronous tasks in Speedometer, estimated as the time between when a zero-second delay timer is scheduled and when it is fired.

It is worth noting that Speedometer is not meant to compare the performance of different JavaScript frameworks. The mechanism by which we simulate user actions is different for each framework, and we’re forcing frameworks to do more work synchronously than needed in some cases to ensure that run time can be measured.

## Optimizations in WebKit

Over the past eight months since we introduced the first beta version of Speedometer under a different name, we have been optimizing WebKit’s performance on this benchmark.

One of our biggest improvements on Speedometer came from making render tree creation lazy. In WebKit, we create render objects for each element that gets displayed on screen in order to compute each element’s style and position. To avoid thrash, we changed WebKit to only create an element’s render object when it was needed, giving WebKit a huge performance boost. We also made many DOM APIs not trigger synchronous style resolutions or synchronous layout so that we get further benefit from lazily creating the render tree.

Another area we made substantial improvements was JavaScript bindings, the layer that sits between C++ browser code and JavaScript. We removed redundant layers of abstraction and made more properties and member functions on DOM objects inline cacheable (See Introducing the WebKit FTL JIT). For example, we added a new JavaScriptCore feature to deal with named properties on the document object so that its attributes could be inline cached. We also optimized node lists returned by getElementsByTagName as well as inline caching the length property.

Finally, WebKit’s performance on Speedometer benefited from two major architectural changes. JavaScriptCore’s concurrent and parallel JIT (See Introducing the WebKit FTL JIT), which allows JavaScript to be compiled while the main thread is running other code, reduced the run time of JavaScripts in Speedometer. The CSS JIT, which allows CSS selectors to be compiled up front and rapidly checked against elements, reduced time spent in style resolution and made querySelector and querySelectorAll much faster.

Because Speedometer is an end-to-end benchmark that uses popular JavaScript frameworks, it also helped us catch surprising performance regressions. For instance, when we tried to optimize calls to bound functions in JavaScript by doing more work up front in Function.prototype.bind, we saw a few percent performance regression on Speedometer because many bound functions were called only once before being discarded. We initially doubted that this result reflected the behavior of real web applications, so we collected statistics on popular websites like Facebook and Twitter. To our surprise, we found exactly the same behavior: the average bound function was called only once or twice on the websites we studied.

## Future Plans

In future versions, we hope to add more variations of web applications and frameworks. If you know of any good demo web applications distributed under MIT license or BSD license that could be incorporated into Speedometer, please let us know.

With Speedometer, the web browser community now has a benchmark that measures the responsiveness of real-world web applications. We are looking forward to making further performance improvements in WebKit using Speedometer, and we hope other browser vendors will join us.

## May 13, 2014

### Introducing the WebKit FTL JIT

#### Surfin’ Safari

Just a decade ago, JavaScript – the programming language used to drive web page interactions – was thought to be too slow for serious application development. But thanks to continuous optimization efforts, it’s now possible to write sophisticated, high-performance applications – even graphics-intensive games – using portable standards-compliant JavaScript and HTML5. This post describes a new advancement in JavaScript optimization: the WebKit project has unified its existing JavaScript compilation infrastructure with the state-of-the-art LLVM optimizer. This will allow JavaScript programs to leverage sophisticated optimizations that were previously only available to native applications written in languages like C++ or Objective-C.

All major browser engines feature sophisticated JavaScript optimizations. In WebKit, we struck a balance between optimizations for the JavaScript applications that we see on the web today, and optimizations for the next generation of web content. Websites today serve large amounts of highly dynamic JavaScript code that typically runs for a relatively short time. The dominant cost in such code is the time spent loading it and the memory used to store it. These overheads are illustrated in JSBench, a benchmark directly based on existing web applications like Facebook and Twitter. WebKit has a history of performing very well on this benchmark.

WebKit also features a more advanced compiler that eliminates dynamic language overhead. This compiler benefits programs that are long-running, and where a large portion of the execution time is concentrated in a relatively small amount of code. Examples of such code include image filters, compression codecs, and gaming engines.

As we worked to improve WebKit’s optimizing compiler, we found that we were increasingly duplicating logic that would already be found in traditional ahead-of-time (AOT) compilers. Rather than continue replicating decades of compiler know-how, we instead investigated unifying WebKit’s compiler infrastructure with LLVM – an existing low-level compiler infrastructure. As of r167958, this project is no longer an investigation. I’m happy to report that our LLVM-based just-in-time (JIT) compiler, dubbed the FTL – short for Fourth Tier LLVM – has been enabled by default on the Mac and iOS ports.

This post summarizes the FTL engineering that was undertaken over the past year. It first reviews how WebKit’s JIT compilers worked prior to the FTL. Then it describes the FTL architecture along with how we solved some of the fundamental challenges of using LLVM as a dynamic language JIT. Finally, this post shows how the FTL enables a couple of JavaScript-specific optimizations.

## Overview of the WebKit JavaScript Engine

This section outlines how WebKit’s JavaScript engine worked prior to the addition of the FTL JIT. If you are already familiar with concepts such as profile-directed type inference and tiered compilation, you can skip this section.

Figure 1. The WebKit three-tier architecture. Arrows indicate on-stack replacement, or OSR for short.

Prior to the FTL JIT, WebKit used a three-tier strategy for optimizing JavaScript, as shown in Figure 1. The runtime can pick from three execution engines, or tiers, on a per-function basis. Each tier takes our internal JavaScript bytecode as input and either interprets or compiles the bytecode with varying levels of optimization. Three tiers are available: the LLInt (Low Level Interpreter), Baseline JIT, and DFG JIT (Data Flow Graph JIT). The LLInt is optimized for low latency start-up, while the DFG is optimized for high throughput. The first execution of any function always starts in the interpreter tier. As soon as any statement in the function executes more than 100 times, or the function is called more than 6 times (whichever comes first), execution is diverted into code compiled by the Baseline JIT. This eliminates some of the interpreter’s overhead but lacks any serious compiler optimizations. Once any statement executes more than 1000 times in Baseline code, or the Baseline function is invoked more than 66 times, we divert execution again to the DFG JIT. Execution can be diverted at almost any statement boundary by using a technique called on-stack replacement, or OSR for short. OSR allows us to deconstruct the bytecode state at bytecode instruction boundaries regardless of the execution engine, and reconstructs it for any other engine to continue execution using that engine. As Figure 1 shows, we can use this to transition execution to higher tiers (OSR entry) or to deoptimize down to a lower tier (OSR exit; more on that below).

The three-tier strategy provides a good balance between low latency and high throughput. Running an optimizing JIT such as the DFG on every JavaScript function would be prohibitively expensive; it would be like having to compile an application from source each time you wanted to run it. Optimizing compilers take time to run and JavaScript optimizing JIT compilers are no exception. The multi-tier strategy ensures that the amount of CPU time – and memory – that we commit to optimizing a JavaScript function is always proportional to the amount of CPU time that this function has previously used. The only a priori assumption about web content that our engine makes is that past execution frequency of individual functions is a good predictor for those functions’ future execution frequency. The three tiers each bring unique benefits. Thanks to the LLInt, we don’t waste time compiling, and space storing, code that only runs once. This might be initialization code in a top-level <script> tag or some JSONP content that sneaked past our specialized JSON path. This conserves total execution time since compiling takes longer than interpreting if each statement only executes once; the JIT compiler itself would still have to loop and switch over all of the instructions in the bytecode, just as the interpreter would do. The LLInt’s advantage over a JIT starts to diminish as soon as any instruction executes more than once. For frequently executed code, about three quarters of the execution time in the LLInt is spent on indirect branches for dispatching to the next bytecode instruction. This is why we tier up to the Baseline JIT after just 6 function invocations or if a statement reexecutes 100 times: this JIT is responsible primarily for removing interpreter dispatch overhead, but otherwise the code that it generates closely resembles the code for the interpreter. This enables the Baseline JIT to generate code very rapidly, but it leaves some performance optimizations on the table, such as register allocation and type specialization. We leave the advanced optimizations to the DFG JIT, which is invoked only after the code reexecutes in the Baseline JIT enough times.

Figure 2. The DFG JIT optimization pipeline. The DFG starts by converting bytecode into the DFG CPS form, which reveals data flow relationships between variables and temporaries. Then profiling information is used to infer guesses about types, and those guesses are used to insert a minimal set of type checks. Traditional compiler optimizations follow. The compiler finishes by generating machine code directly from the DFG CPS form.

The DFG JIT converts bytecode into a more optimizable form that attempts to reveal the data flow relationships between operations; the details of the DFG’s intermediate code representation are closest to the classic continuation-passing style (CPS) form that is often used in compilers for functional languages. The DFG does a combination of traditional compiler optimizations such as register allocation, control flow graph simplification, common subexpression elimination, dead code elimination, and sparse conditional constant propagation. But serious compilers require knowledge about the types of variables, and the structures of objects in the heap, to perform optimizations. Hence the DFG JIT uses profiling feedback from the LLInt and Baseline JIT to build predictions about the types of variables. This allows the DFG to construct a guess about the type of any incoming value. Consider the following simple program as an example:

function foo(a, b) { return a + b + 42; }

Looking at the source code alone, it is impossible to tell if a and b are numbers, strings, or objects; and if they are numbers, we cannot tell if they will be integers or doubles. Hence we have the LLInt and Baseline JIT collect value profiles for each variable whose value originates in something non-obvious, like a function parameter or load from the heap. The DFG won’t compile foo until it has executed lots of times, and each execution will contribute values to the profiles of these parameters. For example, if the parameters are integers, then the DFG will be able to compile this function with cheap ‘are you an integer?’ checks for both parameters, and overflow checks on the two additions. No paths for doubles, string concatenation, or calls to object methods like valueOf() will need to be generated, which saves both space and total execution time. If these checks fail, the DFG will use on-stack replacement (OSR) to transfer execution back to the Baseline JIT. In this way, the Baseline JIT serves as a permanent fall-back for whenever any of the DFG’s speculative optimizations are found to be invalid.

An additional form of profiling feedback are the Baseline JIT’s polymorphic inline caches. Polymorphic inline caches are a classic technique for optimizing dynamic dispatch that originated in the Smalltalk community. Consider the following JavaScript function:

function foo(o) { return o.f + o.g; }

In this example, the property accesses may lead to anything from a simple load from well-known locations in the heap to invocations of getters or even complicated DOM traps, like if o were a document object and “f” were the name of an element in the page. The Baseline JIT will initially execute these property accesses as fully polymorphic dispatch. But as it does so it will record the steps it takes, and will then modify the heap accesses in-place to be caches of the necessary steps to repeat a similar access in the future. For example, if the object has a property “f” at offset 16 from the base of the object, then the code will be modified to first quickly check if the incoming object consists of a property “f” at offset 16 and then perform the load. These caches are said to be inline because they are represented entirely as generated machine code. They are said to be polymorphic because if different object structures are encountered, the machine code is modified to switch on the previously encountered object types before doing a fully dynamic property lookup. When the DFG compiles this code, it will check if the inline cache is monomorphic – optimized for just one object structure – and if so, it will emit just a check for that object structure followed by the direct load. In this example, if o was always an object with properties “f” and “g” at invariant offsets, then the DFG would only need to emit one type check for o followed by two direct loads.

Figure 3. Relative speed-up (higher is better) on the Richards benchmark from each of the three tiers.

Figure 2 shows an overview of the DFG optimization pipeline. The DFG uses value profiles and inline caches as a starting point for type inference. This allows it to generate code that is specialized just for the types that we had previously observed. The DFG then inserts a minimal set of type checks to guard against the program using different types in the future. When these checks fail, the DFG will OSR exit back to Baseline JIT’s code. Since any failing check causes control to leave DFG code, the DFG compiler can optimize the code following the check under the assumption that the check does not need to be repeated. This enables the DFG to aggressively minimize the set of type checks. Most reuses of a variable will not need any checks.

Figure 3 shows the relative speed-up of each of the tiers on one representative benchmark, an OS simulation by Martin Richards. Each tier takes longer to generate code than the tiers below it, but the increase in throughput due to each tier more than makes up for it.

The three-tier strategy allows WebKit to adapt to the requirements and characteristics of JavaScript code. We choose the tier based on the amount of time the code is expected to spend running. We use the lower tiers to generate profiling information that serves to bootstrap type inference in the DFG JIT. This works well – but even with three tiers, we are forced to make trade-offs. We frequently found that tuning the DFG JIT to do more aggressive optimizations would improve performance on long-running code but would hurt short-running programs where the DFG’s compile time played a larger role in total run-time than the quality of the code that the DFG generated. This led us to believe that we should add a fourth tier.

## Architecting a Fourth Tier JIT

WebKit’s strengths arise from its ability to dynamically adapt to different JavaScript workloads. But this adaptability is only as powerful as the tiers that we have to choose from. The DFG couldn’t simultaneously be a low-latency optimizing compiler and a compiler that produced high-throughput code – any attempts to make the DFG produce better code would slow it down, and would increase latency for shorter-running code. Adding a fourth tier and using it for the heavier optimizations while keeping the DFG relatively light-weight allows us to balance the requirements of longer- and shorter-running code.

Figure 4. The DFG and FTL JIT optimization pipelines. We reuse most of the DFG phases including its CPS-based optimizations. The new FTL pipeline is a drop-in replacement for the third-tier DFG backend. It involves additional JavaScript-aware optimizations over DFG SSA form, followed by a phase that lowers DFG IR (intermediate representation) to LLVM IR. We then invoke LLVM’s optimization pipeline and LLVM’s MCJIT backend to generate machine code.

The FTL JIT is designed to bring aggressive C-like optimizations to JavaScript. We wanted to ensure that the work we put into the FTL impacted the largest variety of JavaScript programs – not just ones written in restricted subsets – so we reused our existing type inference engine from the DFG JIT and our existing OSR exit and inline cache infrastructure for dealing with dynamic types. But we also wanted the most aggressive and comprehensive low-level compiler optimizations. For this reason, we chose the Low Level Virtual Machine (LLVM) as our compiler backend. LLVM has a sophisticated production-quality optimization pipeline that features many of the things we knew we wanted, such as global value numbering, a mature instruction selector, and a sophisticated register allocator.

Figure 5. Relative speed-up (higher is better) on the Richards benchmark with all four tiers.

Figure 6. Execution time in milliseconds (lower is better) on a selection of asm.js benchmarks with the DFG and FTL. The FTL is on average 35% faster.

Figure 4 shows an overview of the FTL architecture. The FTL exists primarily in the form of an alternate backend for the DFG. Instead of invoking the DFG’s machine code generator – which is very fast but does almost no low-level optimizations – we convert the DFG’s representation of the JavaScript function into static single assignment (SSA) form and perform additional optimizations. Conversion to SSA is more expensive than the DFG’s original CPS conversion but it enables more powerful optimizations, like loop-invariant code motion. Once those optimizations are done, we convert the code from the DFG SSA form to LLVM IR. Since LLVM IR is also based on SSA, this is a straight-forward linear and one-to-many conversion. It only requires one pass over the code, and each instruction within the DFG SSA form results in one or more LLVM IR instructions. At this point, we eliminate any JavaScript-specific knowledge from the code. All JavaScript idioms are lowered to either an inlined implementation of the JavaScript idiom using LLVM IR, or a procedure call if we know that the code path is uncommon.

This architecture gives us freedom to add heavy-weight optimizations to the FTL while retaining the old DFG compiler as a third tier. We even have the ability to add new optimizations that work within both the DFG and FTL by adding them before the “fork” in the compilation pipeline. It also integrates well with the design of the LLVM IR: LLVM IR is statically typed, so we need to perform our dynamic language optimizations before we transition to LLVM IR. Fortunately, the DFG is already good at inferring types – so by the time that LLVM sees the code, we will have ascribed types to all variables. Converting JavaScript to LLVM IR turns out to be quite natural with our indirect approach – first the source code turns into WebKit’s bytecode, then into DFG CPS IR, then DFG SSA IR, and only then into LLVM IR. Each transformation is responsible for removing some of JavaScript’s dynamism and by the time we get to LLVM IR we will have eliminated all of it. This allows us to leverage LLVM’s optimization capabilities even for ordinary human-written JavaScript; for example on the Richards benchmark from Figure 3, the FTL provides us with an additional 40% performance boost over the DFG as shown in Figure 5. Figure 6 shows a summary of what the FTL gives us on asm.js benchmarks. Note that the FTL isn’t special-casing for asm.js by recognizing the "use asm" pragma. All of the performance is from the DFG’s type inference and LLVM’s low-level optimizing power.

The WebKit FTL JIT is the first major project to use the LLVM JIT infrastructure for profile-directed compilation of a dynamic language. To make this work, we needed to make some big changes – in WebKit and LLVM. LLVM needs significantly more time to compile code compared to our existing JITs. WebKit uses a sophisticated generational garbage collector, but LLVM does not support intrusive GC algorithms. Profile-driven compilation implies that we might invoke an optimizing compiler while the function is running and we may want to transfer the function’s execution into optimized code in the middle of a loop; to our knowledge the FTL is the first compiler to do on-stack-replacement for hot-loop transfer into LLVM-compiled code. Finally, LLVM previously had no support for the self-modifying code and deoptimization tricks that we rely on to handle dynamic code. These issues are described in detail below.

### Side-stepping LLVM compile times

Figure 7. The FTL is invoked only after the code had run in the LLInt, Baseline JIT, and DFG JIT for a while. By the time that we invoke the FTL compiler from a concurrent thread, the function in question is already executing relatively fast code generated by the DFG JIT. The FTL can then pull in profiling collected from the LLInt, Baseline JIT, and DFG JIT to maximize the precision with which it devirtualizes JavaScript idioms.

Compiler design is subject to a classic latency-throughput trade-off. It’s hard to design a compiler that can generate excellent code quickly – the compilers that generate code most rapidly also generate the worst code, while compilers like LLVM that generate excellent code are usually slow. Moreover, where a compiler lands in the latency-throughput continuum tends to be systemic; it’s hard to make a compiler that can morph from low-latency to high-throughput or vice-versa.

For example, the DFG’s backend generates mediocre code really quickly. It’s fast at generating code precisely because it does just a single pass over a relatively high-level representation; but this same feature means that it leaves lots of optimization opportunities unexploited. LLVM is the opposite. To maximize the number of optimization opportunities, LLVM operates over very low-level code that has the best chance of revealing redundancies and inefficient idioms. In the process of converting that code to machine code, LLVM performs multiple lowering transformations through several stages that involve different representations. Each representation reveals different opportunities for speed-ups. But this also means that there is no way to short-circuit LLVM’s slowness compared to the DFG. Just converting code to LLVM’s IR is more costly than running the entire DFG backend, because LLVM IR is so low-level. Such is the trade-off: compiler architects must choose a priori whether their compiler will have low latency or high throughput.

The FTL project is all about combining the benefits of high-throughput and low-latency compilation. For short running code, we ensure that we don’t pay the price of invoking LLVM unless it’s warranted by execution count profiling. For long-running code, we hide the cost of invoking LLVM by using a combination of concurrency and parallelism.

Our second and third tiers – the Baseline and DFG JITs – are invoked only when the function gets sufficiently “hot”. The FTL is no different. No function will be compiled by the FTL unless it has already been compiled by the DFG. The DFG then reuses the execution count profiling scheme from the Baseline JIT. Each loop back-edge adds 1 to an internal per-function counter. Returning adds 15 to the same counter. If the counter crosses zero, we trigger compilation. The counter starts out as a negative value that corresponds to an estimate of the number of “executions” of a function we desire before triggering a higher-tier compiler. For Baseline-to-DFG tier-up, we set the counter to -1000 × C where C is a function of the size of the compilation unit and the amount of available executable memory. C is usually close to 1. DFG-to-FTL tier-up is more aggressive; we set the counter to -100000 × C. This ensures that short-running code never results in an expensive LLVM-based compile. Any function that runs for more than approximately 10 milliseconds on modern hardware will get compiled by the FTL.

In addition to a conservative compilation policy, we minimize the FTL’s impact on start-up time by using concurrent and parallel compilation. The DFG has been a concurrent compiler for almost a year now. Every part of the DFG’s pipeline is run concurrently, including the bytecode parsing and profiling. With the FTL, we take this to the next level: we launch multiple FTL compiler threads and the LLVM portion of the FTL’s pipeline can even run concurrently to WebKit’s garbage collector. We also keep a dedicated DFG compiler thread that runs at a higher priority, ensuring that even if we have FTL compilations in the queue, any newly discovered DFG compilation opportunities take higher priority.

Figure 7 illustrates the timeline of a function as it tiers-up from the LLInt all the way to the FTL. Parsing and Baseline JIT compilation are still not concurrent. Once Baseline JIT code starts running, the main thread will never wait for the function to be compiled since all subsequent compilation happens concurrently. Also, the function will not get compiled by the DFG, much less the FTL, if it doesn’t run frequently enough.

### Integrating LLVM with garbage collection

Figure 8. The WebKit object model for JavaScript objects. Fixed-size objects and their inferred properties are stored in a small non-moving cell in a segregated free-list mark-sweep space. The payloads of variable-size objects such as arrays and dictionaries are stored in a movable butterfly in a bump-pointer quick-release mark-region copy space. We allow any of the code in WebKit – both the C++ code for our runtime and code generated by any of our JITs including LLVM – to directly reference both the cell and the butterfly and to have interior pointers to both.

WebKit uses a generational garbage collector with a non-moving space for object cells, which hold fixed-size data, and a hybrid mark-region/copy space for variable-size data (see Figure 8). The generation of objects is tracked by a sticky mark bit. Our collector also features additional optimizations such as parallel collection and carefully tuned allocation and barrier fast paths.

Our approach to integrating garbage collection with LLVM is designed to maximize throughput and collector precision while eliminating the need for LLVM to know anything about our garbage collector. The approach that we use, due to Joel Bartlett, has been known since the late 1980’s and we have already been using it for years. We believe, for high-performance systems like WebKit, that this approach is preferable to a fully conservative collector like Boehm-Demers-Weiser or various accurate collection strategies based on forcing variables to be spilled onto the stack. Full Boehm-style conservatism means that every pointer-sized word in the heap and on the stack is conservatively considered to possibly be a pointer. This is both slow – each word needs to be put through a pointer testing algorithm – and very imprecise, risking retention of lots of dead objects. On the other hand, forcing anything – pointers or values suspected to be pointers – onto the stack will result in slow-downs due to additional instructions to store those values to memory. One of our reasons for using LLVM is to get better register allocation; placing things on the stack goes against this goal. Fortunately, Bartlett garbage collection provides an elegant middle ground that still allows us to use a sophisticated generational copying collector.

In a Bartlett collector, objects in the heap are expected to have type maps that tell us which of their fields have pointers and how those pointers should be decoded. Objects that are only reachable from other heap objects may be moved and pointers to them may be updated. But the stack is not required to have a stack map and objects thought to be referenced from the stack must be pinned. The collector’s stack scan must: (1) ensure that it reads all registers in addition to the stack itself, (2) conservatively consider every pointer-width word to possibly be a pointer, and (3) pin any object that looks like it might be reachable from the stack. This means that the compilers we use to compile JavaScript are allowed to do anything they want with values and they do not have to know whether those values are pointers or not.

Because program stacks are ordinarily several orders of magnitude smaller than the rest of the heap, the likelihood of rogue values on the stack accidentally causing a large number of dead heap objects to be retained is very low. In fact, over 99% of objects in a typical heap are only referenced from other heap objects, which allows a Bartlett collector to move those objects – either for improved cache locality or to defragment the heap – just as a fully accurate collector would. Retention of some dead objects is of course possible, and we can tolerate it so long as it corresponds to only a small amount of space overhead. In practice, we’ve found that so long as we use additional stack sanitization tricks – such as ensuring that our compilers use compacted stack frames and occasionally zeroing regions of the stack that are known to be unused – the amount of dead retention is negligible and not worth worrying about.

We had initially opted to go with the Bartlett approach to garbage collection because it allowed our runtime to be written in ordinary C++ that directly referenced garbage collected objects in local variables. This is both more efficient – no overhead from object handles – and easier to maintain. The fact that this approach also allowed us to adopt LLVM as a compiler backend without requiring us to handicap our performance with stack spilling has only reinforced our confidence in this well-known algorithm.

### Hot-loop transfer

One of the challenges of using multiple execution engines for running JavaScript code is replacing code compiled by one compiler with code compiled by another, such as when we decide to tier-up from the DFG to the FTL. This is usually easy. Most functions have a short lifecycle – they will tend to return shortly after they are called. This is true even in the case of functions that account for a large fraction of total execution time. Such functions tend to be called frequently rather than running for a long time during a single invocation. Hence our primary mechanism for replacing code is to edit the data structures used for virtual call resolution. Additionally, for any calls that were devirtualized, we unlink all incoming calls to the function’s old code and then relinking them to the newly compiled code. For most functions this is enough: shortly after the newly optimized code is available, all previous calls into the function return and all future calls go into optimized code.

While this is effective for most functions, sometimes a function will run for a long time without returning. This happens most frequently in benchmarks – particularly microbenchmarks. But sometimes it affects real code as well. One example that we found was a typed array initialization loop, which runs 10× faster in the FTL than in the DFG, and the difference is so large that its total run-time in the DFG is significantly greater than the time it takes to compile the function with the FTL. For such functions, it makes sense to transfer execution from DFG-compiled code to FTL-compiled code while the function is running. This problem is called hot-loop transfer (since this will only happen if the function has a loop) and in WebKit’s code we refer to this as OSR entry.

It’s worth noting that LLInt-to-Baseline and Baseline-to-DFG tier-up already supports OSR entry. OSR entry into the Baseline JIT takes advantage of the fact that we trivially know how the Baseline JIT represents all variables at every instruction boundary. In fact, its representation is identical to the LLInt, so LLInt-to-Baseline OSR entry just involves jumping to the appropriate machine code address. OSR entry into the DFG JIT is slightly more complicated since the DFG JIT will typically represent state differently than the Baseline JIT. So the DFG makes OSR entry possible by treating the control flow graph of a function as if it had multiple entrypoints: one entrypoint for the start of the function, and one entrypoint at every loop header. OSR entry into the DFG operates much like a special function call. This inhibits many optimizations but it makes OSR entry very cheap.

Unfortunately the FTL JIT cannot use either approach. We want LLVM to have maximum freedom in how it represents state and there doesn’t exist any mechanism for asking LLVM how it represents state at arbitrary loop headers. This means we cannot do the Baseline style of OSR entry. Also, LLVM assumes that functions have a single entrypoint and changing this would require significant architectural changes. So we cannot do the DFG style of OSR entry, either. But more fundamentally, having LLVM assume that execution may enter a function through any path other than the function’s start would make some optimizations – like loop-invariant code motion, or LICM for short – harder. We want the FTL JIT to be able to move code out of loops. If the loop has two entrypoints – one along a path from the function’s start and another via OSR entry – then LICM would have to duplicate any code it moves out of loops, and hoist it into both possible entrypoints. This would represent a significant architectural challenge and would, in practice, mean that adding new optimizations to the FTL would be more challenging than adding those optimizations in a compiler that can make the single-entrypoint assumption. In short, the complexity of multiple entrypoints combined with the fact that LLVM currently doesn’t support it means that we’d prefer not to use such an approach in the FTL.

The approach we ended up using for hot-loop transfer is to have a separate copy of a function for each entrypoint that we want to use. When a function gets hot enough to warrant FTL compilation, we opportunistically assume that the function will eventually return and reenter the FTL by being called again. But, if the DFG version of the function continues to get hot after we already have an FTL compilation and we detect that it is getting hot because of execution counting in a loop, then we assume that it would be most profitable to enter through that loop. We then trigger a second compile of that function in a special mode called FTLForOSREntry, where we use the loop pre-header as the function’s entrypoint. This involves a transformation over DFG IR where we create a new starting block for the function and have that block load all of the state at the loop from a global scratch buffer. We then have the block jump into the loop. All of the code that previously preceded the loop ends up being dead. Thus, from LLVM’s standpoint, the function takes no arguments. Our runtime’s OSR entry machinery then has the DFG dump its state at the loop header into that global scratch buffer and then we “call” the LLVM-compiled function, which will load the state from that scratch buffer and continue execution.

FTL OSR entry may involve invoking LLVM multiple times, but we only let this happen in cases where we have strong evidence to suggest that the function absolutely needs it. The approach that we take requires zero new features from LLVM and it maximizes the optimization opportunities available to the FTL.

### Deoptimization and inline caches

Self-modifying code is a cornerstone of high-performance virtual machine design. It arises in three scenarios that we want our optimizing JavaScript compilers to be able to handle:

Partial compilation. It’s common for us to leave some parts of a function uncompiled. The most obvious benefit is saving memory and compile times. This comes into play when using speculative optimizations that involve type checks. In the DFG, each operation in the original JavaScript code usually results in at least one speculative check – either to validate some value’s type or to ensure that some condition, like an array index being in-bounds, is known to hold. In some carefully written JavaScript programs, or in programs generated by a transpiler like Emscripten, it’s common for the majority of these checks to be optimized out. But in plain human-written JavaScript code many of these checks will remain and having these checks is still better than not performing any speculative optimizations. Therefore it’s important for the FTL to be able to handle code that may have thousands of “side exit” paths that trigger OSR to the Baseline JIT. Having LLVM compile these paths eagerly wouldn’t make sense. Instead, we want to patch these paths in lazily by using self-modifying code.

Invalidation. The DFG optimization pipeline is capable of installing watchpoints on objects in the heap and various aspects of the runtime’s state. For example, objects that are used as prototypes may be watchpointed so that any accesses to instances of the prototype can constant-fold the prototype’s entries. We also use this technique for inferring which variables are constants and this makes it easy for developers to use JavaScript variables the same way that they would use “static final” variables in Java. When these watchpoints fire – for example because an inferred-constant variable is written to &dash we invalidate any optimized code that had made assumptions about those variables’ values. Invalidation means unlinking any calls to those functions and making sure that any callsites within the function immediately trigger OSR exit after their callees return. Both invalidation and the partial compilation are broadly part of our deoptimization strategy.

Polymorphic inline caches. The FTL has many different strategies for generating code for a heap access or function call. Both of these JavaScript operations may be polymorphic. They may operate over objects of many types, and the set of types that flow into the operation may be infinite – in JavaScript it is possible to create new objects or functions programmatically and our type inference may ascribe a different type to each object if a common type cannot be inferred. Most heap accesses and calls end up with a finite (and small) set of types that flow into it, and so the FTL can usually emit relatively straight-forward code. But our profiling may tell us that the set of types that flow into an operation is large. In that case, we have found that the best way of handling the operation is with an inline cache, just like what the Baseline JIT would do. Inline caches are a kind of dynamic devirtualization where the code we emit for the operation may be repatched as the types that the operation uses change.

Each of these scenarios can be thought of as self-modifying inline assembly, but at a lower level with a programmatic interface to patch the machine code contents and select which registers or stack locations should be used. Much like inline assembly, the actual content is decided by the client of LLVM rather than by LLVM itself. We refer to this new capability as a patchpoint and it is exposed in LLVM as the llvm.experimental.patchpoint intrinsic. The workflow of using the intrinsic looks as follows:

1. WebKit (or any LLVM client) decides which values need to be available to the patchpoint and chooses the constraints on those values’ representation. It’s possible to specify that the values could be given in any form, in which case they may end up as compile-time constants, registers, register with addend (the value is recoverable by taking some register and adding a constant value to it), or memory locations (the value is recoverable by loading from the address computed by taking some register and adding a constant to it). It’s also possible to specify more fine-grained constraints by having the values passed by calling convention, and then selecting the calling convention. One of the available calling conventions is anyreg, which is a very lightweight constraint – the values are always made available in registers but LLVM can choose which registers are used.

WebKit must also choose the size, in bytes, of the machine code snippet, along with the return type.

2. LLVM emits a nop sled – a sequence of nop instructions – whose length is the size that WebKit chose. Using a separate data section that LLVM hands over to WebKit, LLVM describes where the nop sled appears in the code along with the locations that WebKit can use to find all of the arguments that were passed to the patchpoint. Locations may be constants, registers with addends, or memory locations.

LLVM also reports which registers are definitely in use at each patchpoint. WebKit is allowed to use any of the unused registers without worrying about saving their values.

3. WebKit emits whatever machine code it wants within the nop sleds for each patchpoint. WebKit may later repatch that machine code in whatever way it wants.

In addition to patchpoints, we have a more constrained intrinsic called llvm.experimental.stackmap. This intrinsic is useful as an optimization for deoptimization. It serves as a guarantee to LLVM that we will only overwrite the nop sled with an unconditional jump out to external code. Either the stackmap is not patched, in which case it does nothing, or it is patched in which case execution will not fall through to the instructions after the stackmap. This could allow LLVM to optimize away the nop sled entirely. If WebKit overwrites the stackmap, it will be overwriting the machine code that comes immediately after the point where the stackmap would have been. We envision the “no fall-through” rule having other potential uses as we continue to work with the LLVM project to refine the stackmap intrinsic.

Armed with patchpoints and stackmaps, the FTL can use all of the self-modifying code tricks that the DFG and Baseline JITs already use for dealing with fully polymorphic code. This guarantees that tiering-up from the DFG to the FTL is never a slow-down.

## FTL-specific high-level optimizations

So far this post has given details on how we integrated with LLVM and managed to leverage its low-level optimization capabilities without losing the capabilities of our DFG JIT. But adding a higher-tier JIT also empowers us to do optimizations that would have been impossible if our tiering strategy ended with the DFG. The sections that follow show two capabilities that a fourth tier makes possible, that aren’t specific to LLVM.

### Polyvariant Devirtualization

Figure 9. JS idioms such as the inheritance.js or Prototype will funnel execution through helpers, such as the object constructor in this figure. This causes the helper to appear polymorphic. Note that it would not be polymorphic if it was inlined: inlining constructor at the new Point(...) callsite causes the call to initialize to always call Point’s initialize method.

To understand the power of polyvariance, let’s just consider the following common idiom for creating classes in JavaScript:

var Class = {
create: function() {
function constructor() {
this.initialize.apply(this, arguments);
}
return constructor;
}
}

Frameworks such as Prototype use a more sophisticated form of this idiom. It allows the creation of class-like objects by saying “Class.create()”. The returned “class” can then be used as a constructor. For example:

var Point = Class.create();
Point.prototype = {
initialize: function(x, y) {
this.x = x;
this.y = y;
}
};
var p = new Point(1, 2); // Creates a new object p such that p.x == 1, p.y == 2

Every object instantiated with this idiom will involve a call to constructor, and the constructor’s use of this.initialize will appear to be polymorphic. It may invoke the Point class’s initialize method, or it may invoke the initialize methods of any of the other classes created this idiom. Figure 9 illustrates this problem. The call to initialize appears polymorphic, thereby preventing us from completely inlining the body of Point’s constructor at the point where we call new Point(1, 2). The DFG will succeed in inlining the body of constructor(), but it will stop there. It will use a virtual call for invoking this.initialize.

But consider what would happen if we then asked the DFG to profile its virtual calls in the same way that the Baseline JIT does. In that case, the DFG would be able to report that if constructor is inlined into an expression that says new Point(), then the call to this.initialize always calls Point.prototype.initialize. We call this polyvariant profiling and it’s one of the reasons why we retain the DFG JIT even though the FTL JIT is more powerful – the DFG JIT acts as a polyvariant profiler by running profiling after inlining. This means that the FTL JIT doesn’t just do more traditional compiler optimizations than the DFG. It also sees a richer profile of the JavaScript code, which allows it to do more devirtualization and inlining.

It’s worth noting that having two different compiler tiers, like DFG and FTL, is not strictly necessary for getting polyvariant profiling. For example, we could have eliminated the DFG compiler entirely and just had the FTL “tier up” to itself after gathering additional profiling. We could even have infinite recompilation with the same optimizing compiler, with any polymorphic idioms compiled with profiling instrumentation. As soon as that instrumentation observed opportunities for any additional optimizations, we could compile the code again. This could work, but it relies on polymorphic code always having profiling in case it might eventually be found to be monomorphic after all. Profiling can be very cheap – such as in the case of an inline cache – but there is no such thing as a free lunch. The next section shows the benefits of aggressively “locking in” all of the paths of a polymorphic access and allowing LLVM to treat it as ordinary code. This is why we prefer to use an approach where after some number of executions, a function will no longer have profiling for any of its code paths and we use a low-latency optimizing compiler (the DFG) as the “last chance” for profiling.

Figure 10. Execution time in milliseconds (lower is better) with and without polyvariant devirtualization on the Raytrace benchmark.

Figure 10 shows one of the benchmark speed-ups we get from polyvariant devirtualization. This benchmark uses the Prototype-style class system and does many calls to constructors that are trivially inlineable if we can devirtualize through the constructor helper. Polyvariant devirtualization is a 38% speed-up on this benchmark.

### Polymorphic Inlining

The DFG has two strategies for optimizing heap accesses. Either the access is known to be completely monomorphic in which case the code for the access is inlined, or the access is turned into the same kind of inline cache that the Baseline JIT would have used. Inlined accesses are profitable because DFG IR can explicitly represent all of the steps needed for the access, such as the type checking, the loads of any intermediate pointers such as the butterfly (see Figure 8), any GC barriers, and finally the load or store. But if the access is believed to be polymorphic based on the available profiling, it’s better for the DFG to use an inline cache, for two reasons:

• The access may have been thought to be polymorphic but may actually end up being monomorphic. An inline cache can be very fast if the access ends up being monomorphic, perhaps because of imprecise profiling. As the previous section noted, the DFG sometimes sees imprecise profiling because the Baseline JIT knows nothing of calling context.
• The FTL wants to have additional profiling for polymorphic accesses in order to perform polyvariant devirtualization. Polymorphic inline caches are a natural form of profiling – we lazily emit only the code that we know we need for the types we know we saw, and so enumerating over the set of seen types is as simple as looking at what cases the inline cache ended up emitting. If we inlined the polymorphic access, we would lose this information, and so polyvariant profiling would be more difficult.

For these reasons, the DFG represents polymorphic operations – even ones where the number of incoming types is believed to be small and finite – as inline caches. But the FTL can inline the polymorphic operations just fine. In fact, doing so leads to some interesting optimization opportunities in LLVM.

We represent polymorphic heap accesses as single instructions in DFG IR. These instructions carry meta-data that summarizes all of the possible things that may happen if that instruction executes, but the actual control and data flow of the polymorphic operation is not revealed. This allows the DFG to cheaply perform “macro” optimizations on polymorphic heap accesses. For example, a polymorphic heap load is known to the DFG to be pure and so may be hoisted out of loops without any need to reason about its constituent control flow. But when we lower DFG IR to LLVM IR, we represent the polymorphic heap accesses as a switch over the predicted types and a basic block for each type. LLVM can then do optimizations on this switch. It can employ one of its many techniques for efficiently emitting code for switches. If we do multiple accesses on the same object in a row, LLVM’s control flow simplification can thread the control flow and eliminate redundant switches. More often, LLVM will find common code in the different switch cases. Consider an access like o.f where we know that o either has type T and “f” is at offset 24 or it has type U and “f” is at offset 32. We have seen LLVM be clever enough where it turns the access into something like:

o + 24 + ((o->type == U) < < 3)

In other words, what our compiler would have thought of as control flow, LLVM can convert to data flow, leading to faster and simpler code.

Figure 11. Execution time in milliseconds (lower is better) with and without polymorphic inlining on the Deltablue benchmark.

One of the benchmarks that benefited from our polymorphic inlining was Deltablue, shown in Figure 11. This program performs many heap accesses on objects with different structures. Revealing the nature of this polymorphism and the specific code paths for all of the different access modes on each field allows LLVM to perform additional optimizations, leading to a 18% speed-up.

## Conclusion

Unifying the compiler infrastructures of WebKit and LLVM has been an extraordinary ride, and we are excited to finally enable the FTL in WebKit trunk as of r167958. In many ways, work on the FTL is just beginning – we still need to increase the set of JavaScript operations that the FTL can compile and we still have unexplored performance opportunities. In a future post I will show more details on the FTL’s implementation along with open opportunities for improvement. For now, feel free to try it out yourself in a WebKit nightly. And, as always, be sure to file bugs!

### Good-Looking Shapes Gallery

As a modern consumer of media, you rarely crack open a magazine or a pamphlet or anything that would be characterized as “printed”. Let me suggest that you take a walk on the wild side. The next time you are in a doctor’s office, or a supermarket checkout lane, or a library, thumb though a magazine. Most of the layouts you’ll find inside can also be found on the web, but not all of them. Layouts where content hugs the boundaries of illustrations are common in print and rare on the web. One of the reasons non-rectangular contour-hugging layouts are uncommon on the web is that they are difficult to produce.

They are not difficult to produce anymore.

The CSS Shapes specification is now in the final stages of standardization. This feature enables flowing content around geometric shapes (like circles and polygons), as well as around shapes defined by an image’s alpha channel. Shapes make it easy to produce the kinds of layouts you can find in print today, with all the added flexibility and power that modern online media affords. You can use CSS Shapes right now with the latest builds of WebKit and Blink based browsers, like Safari and Chrome.

Development of CSS Shapes has been underway for about two years, and we’ve been regularly heralding its progress here. Many of those reports have focused on the evolution of the spec and implementations, and they’ve included examples that emphasized basics over beauty. This article is an attempt to tilt the balance back towards good-looking. Listed below are simple shapes demos that we think look pretty good. Everyone on Adobe’s CSS Shapes engineering team contributed at least one.

There’s a live CodePen.io version of each demo in the gallery. Click on the demo screenshot or one of the handy links to take a look. You’ll want to view the demos with a browser that supports Shapes and you’ll need to enable CSS Shapes in that browser. For example you can use a nightly build of the Safari browser or you can enable shapes in Chrome or Chrome Canary like this:

1. Copy and paste chrome://flags/#enable-experimental-web-platform-features into the address bar, then press enter.
2. Click the ‘Enable’ link within that section.
3. Click the ‘Relaunch Now’ button at the bottom of the browser window.

A few of the demos use the new Shapes Polyfill and will work in most browsers.

And now, without further ado, please have a look through our good-looking shapes gallery.

## Ozma of Oz

This demo reproduces the layout style that opens many of the chapters of the L. Frank Baum books, including Ozma of Oz.  The first page is often dominated by an illustration on the left or right. The chapter’s text conforms to the illustration, but not too tightly. The books were published over 100 years ago and they still look good print.  With CSS Shapes they can still look good on the web.

## Top Cap

The conventional “drop-cap” opens a paragraph by enlarging and highlighting the first letter, word or phrase. The drop-cap’s goal is to draw your attention to where you can start reading. This demo delivers the same effect by crowning the entire opening paragraph with a “top cap” that funnels your attention into the article. In both cases, what’s going on is a segue from a graphic element to the text.

## Violator

A violator is small element that “violates” rectangular text layout by encroaching on a corner or a small part of an edge. This layout idiom is common in short-form magazines and product packaging. That “new and improved” banner which blazes through the corner of thousands of consumer products (whether or not they are new or improved) – it’s a violator.

## Column Interest

When a print magazine feels the need to incorporate some column layout melodrama, they often reach for this idiom. The shape spans a pair of columns, which creates visual interest in the middle of the page. Without it you’d be faced with a wall of attention sapping text and more than likely turn the page.

## Caption

The old-school approach for including a caption with an image is to put the caption text alongside or below the image. Putting a caption on top of an image requires a little more finesse, since you have to ensure that the text doesn’t obscure anything important and that the text is rendered in a way that preserves readability.  The result can be relatively attractive.

This photograph was taken by Zoltan Horvath who has pointed out that I’ve combined a quote about tea with a picture of a ceremonial wine jug.  I apologize for briefly breaching that beverage boundary. It’s just a demo.

## Paging

With a layout like this, one could simple let the content wrap and around the shape on the right and then expand into the usual rectangle.  In this demo the content is served up a paragraph at a time, in response to the left and right arrow keys.

Note also: yes in fact the mate gourd is perched on exactly the same windowsill as the previous demo. Zoltan and Pope Francis are among the many fans of yerba mate tea.

## Ersatz shape-inside

Originally the CSS Shapes spec included shape-inside as well as shape-outside. Sadly, shape-inside was promoted to “Level 2″ of the spec and isn’t available in the current implementations. Fortunately for shape insiders everywhere, it’s still sometimes possible to mimic shape-inside with an adjacent pair of carefully designed shape-outside floats. This demo is a nice example of that, where the text appears inside a bowl of oatmeal.

## Animation

This is an animated demo, so to appreciate it you’ll really need to take a look at the live version. It is an example of using an animated shape to draw the user’s attention to a particular message.  Of course one must use this approach with restraint, since an animated loop on a web page doesn’t just gently tug at the user’s attention. It drags at their attention like a tractor beam.

## Performance

Advertisements are intended to grab the user’s attention and a second or two of animation will do that. In this demo a series of transition motions have been strung together into a tiny performance that will temporarily get the reader’s attention. The highlight of the performance is – of course – the text snapping into the robot’s contour for the finale. Try and imagine a soundtrack that punctuates the action with some whirring and clanking noises, it’s even better that way.

## April 24, 2014

### Adobe Web Platform Goes to the 2014 WebKit Contributors’ Meeting

Last week, Apple hosted the 2014 WebKit Contributors’ Meeting at their campus in Cupertino. As usual it was an unconference-style event, with session scheduling happening on the morning of the first day. While much of the session content was very specific to WebKit implementation, there were topics covered that are interesting to the wider web community. This post is a roundup of some of these topics from the sessions that Adobe Web Platform Team members attended.

### CSS Custom Properties for Cascading Variables

Alan Stearns suggested a session on planning a new implementation of CSS Custom Properties for Cascading Variables. While implementations of this spec have been attempted in WebKit in the past, they never got past the experimental stage. Despite this, there is still much interest in implementing this feature. In addition, the current version of the spec has addressed many of the issues that WebKit contributors had previously expressed. We talked about a possible issue with using variables in custom property values, which Alan is investigating. More detail is available in the notes from the Custom Properties session.

### CSS Regions

Andrei Bucur presented the current state of the CSS Regions implementation in WebKit. The presentation was well received and well attended. Notably, this was one of the few sessions with enough interest that it had a time slot all to itself.

While CSS Regions shipped last year in iOS 7 and Safari 6.1 and 7, the implementation in WebKit hasn’t been standing still. Andrei mentioned the following short list of changes in WebKit since the last Safari release:

• correct painting of fragments and overflow
• scrollable regions
• accelerated content inside regions
• position: fixed elements
• the regionoversetchange event
• better selection
• better WebInspector integration
• and more…

Andrei’s slides outlining the state of CSS Regions also contain a roadmap for the feature’s future in WebKit as well as a nice demo of the fix to fragment and overflow handling. If you are following the progress of CSS Regions in WebKit, the slides are definitely worth a look. (As of this writing, the Regions demo in the slides only works in Safari and WebKit Nightly.)

### CSS Shapes

Zoltan Horvath, Bear Travis, and I covered the current state of CSS Shapes in WebKit. We are almost done implementing the functionality in Level 1 of the CSS Shapes Specification (which is itself a Candidate Recommendation, the last step before becoming an official W3C standard). The discussion in this session was very positive. We received good feedback on use cases for shape-outside and even talked a bit about the possibilities for when shape-inside is revisited as part of CSS Shapes Level 2. While I don’t have any slides or demos to share at the moment, we will soon be publishing a blog post to bring everyone up to date on the latest in CSS Shapes. So watch this space for more!

### Subpixel Layout

This session was mostly about implementation. However, Zalan Bujtas drew an interesting distinction between subpixel layout and subpixel painting. Subpixel layout allows for better space utilization when laying out elements on the page, as boxes can be sized and positioned more precisely using fractional units. Subpixel painting allows for better utilization of high DPI displays by actually drawing elements on the screen using fractional CSS pixels (For example: on a 2x “Retina” display, half of a CSS pixel is one device pixel). Subpixel painting allows for much cleaner lines and smoother animations on high DPI displays when combined with subpixel layout. While subpixel layout is currently implemented in WebKit, subpixel painting is currently a work in progress.

### Web Inspector

The Web Inspector is full of shiny new features. The front-end continues to shift to a new design, while the back-end gets cleaned up to remove cruft. The architecture for custom visual property editors is in place and will hopefully enable quick and intuitive editing of gradients, transforms, and animations in the future. Other goodies include new breakpoint actions (like value logging), a redesigned timeline, and IndexedDB debugging support. The Web Inspector still has room for new features, and you can always check out the #webkit-inspector channel on freenode IRC for the latest and greatest.

### Web Components

The Web Components set of features continues to gather interest from the browser community. Web Components is made up of four different features: HTML Components, HTML Imports, Shadow DOM, and HTML Templates. The general gist of the talk was that the Web Components concepts are desirable, but there are concerns that the features’ complexity may make implementation difficult. The main concerns seemed to center around performance and encapsulation with Shadow DOM, and will hopefully be addressed with a prototype implementation of the feature (in the works). You can also take a look at the slides from the Web Components session.

### CSS Grid Layout

The WebKit implementation of the CSS Grid Layout specification is relatively advanced. After learning in this session that the only way to test out Grid Layout in WebKit was to make a custom build with it enabled, session attendees concluded that it should be turned on by default in the WebKit Nightlies. So in the near future, experimenting with Grid Layout in WebKit should be as easy as installing a nightly build.

### More?

As I mentioned earlier, this was just a high-level overview of a few of the topics at this year’s WebKit Contributors’ Meeting. Notes and slides for some of the topics not mentioned here are available on the 2014 WebKit Meeting page in the wiki. The WebKit project is always welcoming new contributors, so if you happen to see a topic on that wiki page that interests you, feel free to get in touch with the community and see how you can get involved.

### Acknowledgements

This post would not have been possible without the notes and editing assistance of my colleagues on the Adobe Web Platform Team that attended the meeting along with me: Alan Stearns, Andrei Bucur, Bear Travis, and Zoltan Horvath.

## March 26, 2014

### Little overview of WebKit’s CSS JIT Compiler

#### Surfin’ Safari

When it comes to performance, JavaScript generally gets the headlines. But if you look carefully, web pages are not all JavaScript, the performance of all the other parts also has a dramatic impact on the user experience. Observing only JavaScript gives a single point of view over the complex mosaic that is web performance.

Making CSS faster and more scalable is an area of research in the WebKit project. The DOM and CSS do not always scale very well and it is sadly still common to see missed frames during complex animations.

A technique we use to speed things up in JavaScript is eliminating slow paths with Just In Time Compilers (JIT). CSS is a great candidate for JIT compilation, and since the end of last year, WebKit compiles certain CSS Selectors on the fly.

This blog introduces the CSS Selector JIT, its pros and cons, and how you can help us make CSS better.

## How and why compile CSS?

CSS is the language telling the engine how to present the DOM tree on screen. Every web engine uses a complex machinery that goes over everything in the DOM and applies the rules defined by CSS.

There is no direct link between the DOM tree and the style rules. For each element, the engine needs to collect a list of rules that apply to it. The way CSS and the DOM tree are connected is through the CSS selectors.

Each selector describes in a compact format what properties are required for an element to get a particular style. The engine has to figure out which elements have the required properties, and this is done by testing the selectors on the elements. As you can imagine, this is a complicated task for anything but the most trivial DOM tree (fortunately WebKit has many optimizations in this area).

So how did we make this faster? A simple solution we picked is to make testing a selector on an element very fast.

The way selector matching used to work is through a software machine, the SelectorChecker instance, taking two inputs: a selector and an input element. Given the inputs, a SelectorChecker goes over each part of the selector, and tries to find the required properties in the tree ending with the input element.

The following illustrates a simplified version of how selector testing used to work:

The problem with SelectorChecker is that it needs to be completely generic. We had a complicated selector interpreter, capable of handling any combination of difficult cases for any selector. Unfortunately, big generic machines are not exactly fast.

When using the CSS JIT, the task of matching a selector is split in two: first compiling, then testing. A JIT compiler takes the selector, does all the complicated computations when compiling, and generates a tiny binary blob corresponding to the input selector: a compiled selector. When it is time to find if an element that matches the selector, WebKit can just invoke the compiled selector.

The following animation illustrates the same process as above, but using the CSS Selector JIT:

Obviously, all the complexity related to CSS Selectors is still there. The Selector JIT Compiler is a big generic machine, just like SelectorChecker was. What changed is that most of that complexity has been moved to compilation time, and it only happens once. The binary generated at runtime is only as complex as the input selector.

## There is beauty in simplicity

Although one might think that employing a JIT always makes execution faster, it is a fallacy. The truth is adding a compiler starts by making everything slower, and the compiler makes up for it by creating very fast machine code. The overall process is only a gain when the combined execution time of the compiler and compiled code is smaller than the execution time of the compiler.

When the workload is small, the time taken by the compiler is bigger than the gain. For example, let’s say we have a JIT compiler that is 4 times slower than SelectorChecker, but the compiled code is 4 times as fast as SelectorChecker. Here is the time diagram of one execution:

With this kind of timing, we can run 5 full queries on the old C++ selector checker and still be faster than the JIT.

When the JIT compiler is fast enough and the workload is large enough, the compiled version wins:

This constraint is also a reason why benchmarks running for a long time can be misleading, they can hide slow compilers. JIT compilers can help to have a great throughput for long running programs, but no real web page behaves that way. The latency introduced by compilation also has the potential to become a disaster for animations.

Does this mean we shot ourselves in the foot by making something that is only fast in benchmarks? Not really, we fixed that problem too

There are several ways to mitigate the latency introduced by a JIT compiler. JavaScriptCore uses multiple advanced subsystems to reach that goal. So far, the Selector JIT can get away with a simple solution: make the compiler extremely fast.

There are two key parts to the speed of this compiler.

1. First, the compiler is very simple. Making optimizations can take a lot of time, so we decided to optimize very little. The generated binary is not perfect but it is fast to produce.
2. The second trick is to use very fast binary generation. To do that, the compiler is built on top of JavaScriptCore’s infrastructure. JavaScriptCore has tools to generate binaries extremely quickly, and we use that directly in WebCore.

In the most recent versions of the JIT, the compilation phase is within one order of magnitude of a single execution of SelectorChecker. Given that even small pages have dozen of selectors and hundreds of elements, it becomes easy to reclaim the time taken by the compiler.

## How fast is it?

To give an idea of order of magnitude, I have prepared a little microbenchmark for this blog. It tests various use cases, including things that used to be slow on WebKit.

On my Retina Macbook Pro, the benchmark runs in about 1100 milliseconds on a WebKit from December, and in less than 500 milliseconds on today’s WebKit nightly. A gain of 2x is generally what we expect on common selectors.

Obviously, the speed gains depends greatly on the page. Gains are sometimes much larger if the old WebKit was hitting one of the slow paths, or could be smaller for selectors that are either trivial or not compiled. I expect a lot to change in the future and I hope we will get even more feedback to help shaping the future of CSS performance.

The functions querySelector() and querySelectorAll() currently share a large part of infrastructure with style resolution. In many cases, both functions will also enjoy the CSS JIT Compiler.

Typically, the querySelector API is used quite differently from style resolution. As a result, we optimize it separately so that each subsystem can be the fastest for its specific use cases. A side effect of this is that querySelector does not always give a good picture of selector performance for style resolution, and vice versa.

## How can you help?

There is ongoing work to support everything SelectorChecker can handle. Currently, some pseudo types are not supported by the JIT compiler and WebKit fall backs to the old code. The missing pieces are being added little by little.

There are many opportunities to help making CSS faster. The existing benchmarks for CSS are extremely limited, there is nothing like JSBench for CSS. As a result, the input we get from performance problems on real websites is immensely valuable.

If you are a web developer, or a WebKit enthusiast, try your favorite website with WebKit Nigthly. If you run into performance problems with CSS, please file a bug on WebKit’s bug tracker. So far, every single bug that has been filed about the CSS JIT has be hugely useful.

Finally, if you are interested in the implementation details, everything is open source and available on webkit.org. You are welcome to help making the web better

You can send me questions to @awfulben on twitter. For more in-depth discussions, you can write an email on webkit-help (or maybe file a bug report).

## March 18, 2014

### QtWebKit is no more, what now?

#### Gustavo Noronha

Driven by the technical choices of some of our early clients, QtWebKit was one of the first web engines Collabora worked on, building the initial support for NPAPI plugins and more. Since then we had kept in touch with the project from time to time when helping clients with specific issues, hardware or software integration, and particularly GStreamer-related work.

With Google forking Blink off WebKit, a decision had to be made by all vendors of browsers and platform APIs based on WebKit on whether to stay or follow Google instead. After quite a bit of consideration and prototyping, the Qt team decided to take the second option and build the QtWebEngine library to replace QtWebKit.

The main advantage of WebKit over Blink for engine vendors is the ability to implement custom platform support. That meant QtWebKit was able to use Qt graphics and networking APIs and other Qt technologies for all of the platform-integration needs. It also enjoyed the great flexibility of using GStreamer to implement HTML5 media. GStreamer brings hardware-acceleration capabilities, support for several media formats and the ability to expand that support without having to change the engine itself.

People who are using QtWebKit because of its being Gstreamer-powered will probably be better served by switching to one of the remaining GStreamer-based ports, such as WebKitGTK+. Those who don’t care about the underlying technologies but really need or want to use Qt APIs will be better served by porting to the new QtWebEngine.

It’s important to note though that QtWebEngine drops support for Android and iOS as well as several features that allowed tight integration with the Qt platform, such as DOM manipulation through the QWebElement APIs, making QObject instances available to web applications, and the ability to set the QNetworkAccessManager used for downloading resources, which allowed for fine-grained control of the requests and sharing of cookies and cache.

It might also make sense to go Chromium/Blink, either by using the Chrome Content API, or switching to one its siblings (QtWebEngine included) if the goal is to make a browser which needs no integration with existing toolkits or environments. You will be limited to the formats supported by Chrome and the hardware platforms targeted by Google. Blink does not allow multiple implementations of the platform support layer, so you are stuck with what upstream decides to ship, or with a fork to maintain.

It is a good alternative when Android itself is the main target. That is the technology used to build its main browser. The main advantage here is you get to follow Chrome’s fast-paced development and great support for the targeted hardware out of the box. If you need to support custom hardware or to be flexible on the kinds of media you would like to support, then WebKit still makes more sense in the long run, since that support can be maintained upstream.

At Collabora we’ve dealt with several WebKit ports over the years, and still actively maintain the custom WebKit Clutter port out of tree for clients. We have also done quite a bit of work on Chromium-powered projects. Some of the decisions you have to make are not easy and we believe we can help. Not sure what to do next? If you have that on your plate, get in touch!

## February 25, 2014

### Improving your site’s visual details: CSS3 text-align-last

In this post, I want to give a status report regarding the text-align-last CSS3 property. If you are interested in taking control of the small visual details of your site with CSS, I encourage you to keep reading.

The problem

First, let’s talk about why we need this property. You’ve probably already seen many text blocks on pages that don’t quite seem visually correct, because the last line isn’t justified with the previous lines. Check out the example paragraph below:

In the first column, the last line isn’t justified. This is the expected behavior, when you apply the ‘text-align: justify’ CSS property on a container. On the other hand, in the second column, the content is entirely justified, including the last line.

The solution

This magic is the ‘text-align-last’ CSS3 property, which is set to justify on the second container. The text-align-last property is part of the CSS Text Module Level 3 specification, which is currently a working draft. The text-align-last property describes how the last line of a block or a line right before a forced line break is aligned when ‘text-align’ is ‘justify’, which means you gain full control over the alignment of the last line of a block. The property allows several more options, which you can read about on WebPlatform.org docs, or the CSS Text Module Level 3 W3C Specification.

A possible use case (Added April – 2014)

After looking at the previous example (which was rather focusing on the functionality of the property), let’s move on to a more realistic use case. The feature is perfect to make our multi-line captions look better. Check out the centered, and the justified image caption examples below.

And now, compare them with a justified, multi-line caption, where the last line has been centered by text-align-last: center.

I think the proper alignment of the last line gives a better overlook to the caption.

Browser Support

I recently added rendering support for the property in WebKit (Safari) based on the latest specification. Dongwoo Joshua Im from Samsung added rendering support in Blink (Chrome). If you like to try it out in WebKit, you’ll need to make a custom developer build and use the CSS3 text support build flag (--css3-text).

The property is already included in Blink’s developer nightlies by default, so after launching your latest Chrome Canary, you only need to enable ‘Enable experimental Web Platform features’ under chrome://flags, and enjoy the full control over your last lines.

Developer note

Please keep in mind that both the W3C specification and the implementations are under experimental status. I’ll keep blogging about the feature and let you know if anything changes, including when the feature ships for production use!

## January 05, 2014

### Funding MathML Developments in Gecko and WebKit (part 2)

#### Frédéric Wang

As I mentioned three months ago, I wanted to start a crowdfunding campaign so that I can have more time to devote to MathML developments in browsers and (at least for Mozilla) continue to mentor volunteer contributors. Rather than doing several crowdfunding campaigns for small features, I finally decided to do a single crowdfunding campaign with Ulule so that I only have to worry only once about the funding. This also sounded more convenient for me to rely on some French/EU website regarding legal issues, taxes etc. Also, just like Kickstarter it's possible with Ulule to offer some "rewards" to backers according to the level of contributions, so that gives a better way to motivate them.

As everybody following MathML activities noticed, big companies/organizations do not want to significantly invest in funding MathML developments at the moment. So the rationale for a crowdfunding campaign is to rely on the support of the current community and on the help of smaller companies/organizations that have business interest in it. Each one can give a small contribution and these contributions sum up in enough money to fund the project. Of course this model is probably not viable for a long term perspective, but at least this allows to start something instead of complaining without acting ; and to show bigger actors that there is a demand for these developments. As indicated on the Ulule Website, this is a way to start some relationship and to build a community around a common project. My hope is that it could lead to a long term funding of MathML developments and better partnership between the various actors.

Because one of the main demand for MathML (besides accessibility) is in EPUB, I've included in the project goals a collection of documents that demonstrate advanced Web features with native MathML. That way I can offer more concrete rewards to people and federate them around the project. Indeed, many of the work needed to improve the MathML rendering requires some preliminary "code refactoring" which is not really exciting or immediately visible to users...

Hence I launched the crowdfunding campaign the 19th of November and we reached 1/3 of the minimal funding goal in only three days! This was mainly thanks to the support of individuals from the MathML community. In mid december we reached the minimal funding goal after a significant contribution from the KWARC Group (Jacobs University Bremen, Germany) with which I have been in communication since the launch of the campaign. Currently, we are at 125% and this means that, minus the Ulule commision and my social/fiscal obligations, I will be able to work on the project during about 3 months.

I'd like to thank again all the companies, organizations and people who have supported the project so far! The crowdfunding campaign continues until the end of January so I hope more people will get involved. If you want better MathML in Web rendering engines and ebooks then please support this project, even a symbolic contribution. If you want to do a more significant contribution as a company/organization then note that Ulule is only providing a service to organize the crowdfunding campaign but otherwise the funding is legally treated the same as required by my self-employed status; feel free to contact me for any questions on the project or funding and discuss the long term perspective.

Finally, note that I've used my savings and I plan to continue like that until the official project launch in February. Below is a summary of what have been done during the five weeks before the holiday season. This is based on my weekly updates for supporters where you can also find references to the Bugzilla entries. Thanks to the Apple & Mozilla developers who spent time to review my patches!

## Collection of documents

The goal is to show how to use existing tools (LaTeXML, itex2MML, tex4ht etc) to build EPUB books for science and education using Web standards. The idea is to cover various domains (maths, physics, chemistry, education, engineering...) as well as Web features. Given that many scientific circles are too much biased by "math on paper / PDF" and closed research practices, it may look innovative to use the Open Web but to be honest the MathML language and its integration with other Web formats is well established for a long time. Hence in theory it should "just work" once you have native MathML support, without any circonvolutions or hacks. Here are a couple of features that are tested in the sample EPUB books that I wrote:

• Rendering of MathML equations (of course!). Since the screen size and resolution vary for e-readers, automatic line breaking / reflowing of the page is "naturally" tested and is an important distinction with respect to paper / PDF documents.
• CSS styling of the page and equations. This includes using (Web) fonts, which are very important for mathematical publishing.
• Using SVG schemas and how they can be mixed with MathML equations.
• Using non-ASCII (Arabic) characters and RTL/LTR rendering of both the text and equations.
• Interactive document using Javascript and <maction>, <input>, <button> etc. For those who are curious, I've created some videos for an algebra course and a lab practical.
• Using the <video> element to include short sequences of an experiment in a physics course.
• Using the <canvas> element to draw graphs of functions or of physical measurements.
• Using WebGL to draw interactive 3D schemas. At the moment, I've only adapted a chemistry course and used ChemDoodle to load Crystallographic Information Files (CIF) and provide 3D-representation of crystal structures. But of course, there is not any problem to put MathML equations in WebGL to create other kinds of scientific 3D schemas.

## WebKit

I've finished some work started as a MathJax developer, including the maction support requested by the KWARC Group. I then tried to focus on the main goals: rendering of token elements and more specifically operators (spacing and stretching).

• I improved LTR/RTL handling of equations (full RTL support is not implemented yet and not part of the project goal).
• I improved the maction elements and implemented the toggle actiontype.
• I refactored the code of some "mrow-like" elements to make them all behave like an <mrow> element. For example while WebKit stretched (some) operators in <mrow> elements it could not stretch them in <mstyle>, <merror> etc Similarly, this will be needed to implement correct spacing around operators in <mrow> and other "mrow-like" elements.
• I analyzed more carefully the vertical stretching of operators. I see at least two serious bugs to fix: baseline alignment and stretch size. I've uploaded an experimental patch to improve that.
• Preliminary work on the MathML Operator Dictionary. This dictionary contains various properties of operators like spacing and stretchiness and is fundamental for later work on operators.
• I have started to refactor the code for mi, mo and mfenced elements. This is also necessary for many serious bugs like the operator dictionary and the style of mi elements.
• I have written a patch to restore support for foreign objects in annotation-xml elements and to implement the same selection algorithm as Gecko.

## Gecko

I've continued to clean up the MathML code and to mentor volunteer contributors. The main goal is the support for the Open Type MATH table, at least for operator stretching.

• Xuan Hu's work on the <mpadded> element landed in trunk. This element is used to modify the spacing of equations, for example by some TeX-to-MathML generators.
• On Linux, I fixed a bug with preferred widths of MathML token elements. Concretely, when equations are used inside table cells or similar containers there is a bug that makes equations overflow the containers. Unfortunately, this bug is still present on Mac and Windows...
• James Kitchener implemented the mathvariant attribute (e.g used by some tools to write symbols like double-struck, fraktur etc). This also fixed remaining issues with preferred widths of MathML token elements. Khaled Hosny started to update his Amiri and XITS fonts to add the glyphs for Arabic mathvariants.
• I finished Quentin Headen's code refactoring of mtable. This allowed to fix some bugs like bad alignment with columnalign. This is also a preparation for future support for rowspacing and columnspacing.
• After the two previous points, it was finally possible to remove the private "_moz-" attributes. These were visible in the DOM or when manipulating MathML via Javascript (e.g. in editors, tree inspector, the html5lib etc)
• Khaled Hosny fixed a regression with script alignments. He started to work on improvements regarding italic correction when positioning scripts. Also, James Kitchener made some progress on script size correction via the Open Type "ssty" feature.
• I've refactored the stretchy operator code and prepared some patches to read the OpenType MATH table. You can try experimental support for new math fonts with e.g. Bill Gianopoulos' builds and the MathML Torture Tests.

MathML developments in Chrome or Internet Explorer is not part of the project goal, even if obviously MathML improvements to WebKit could hopefully be imported to Blink in the future. Users keep asking for MathML in IE and I hope that a solution will be found to save MathPlayer's work. In the meantime, I've sent a proposal to Google and Microsoft to implement fallback content (alttext and semantics annotation) so that authors can use it. This is just a couple of CSS rules that could be integrated in the user agent style sheet. Let's see which of the two companies is the most reactive...

## December 11, 2013

### WebKitGTK+ hackfest 5.0 (2013)!

#### Gustavo Noronha

For the fifth year in a row the fearless WebKitGTK+ hackers have gathered in A Coruña to bring GNOME and the web closer. Igalia has organized and hosted it as usual, welcoming a record 30 people to its office. The GNOME foundation has sponsored my trip allowing me to fly the cool 18 seats propeller airplane from Lisbon to A Coruña, which is a nice adventure, and have pulpo a feira for dinner, which I simply love! That in addition to enjoying the company of so many great hackers.

Web with wider tabs and the new prefs dialog

The goals for the hackfest have been ambitious, as usual, but we made good headway on them. Web the browser (AKA Epiphany) has seen a ton of little improvements, with Carlos splitting the shell search provider to a separate binary, which allowed us to remove some hacks from the session management code from the browser. It also makes testing changes to Web more convenient again. Jon McCan has been pounding at Web’s UI making it more sleek, with tabs that expand to make better use of available horizontal space in the tab bar, new dialogs for preferences, cookies and password handling. I have made my tiny contribution by making it not keep tabs that were created just for what turned out to be a download around. For this last day of hackfest I plan to also fix an issue with text encoding detection and help track down a hang that happens upon page load.

Martin Robinson and Dan Winship hack

Martin Robinson and myself have as usual dived into the more disgusting and wide-reaching maintainership tasks that we have lots of trouble pushing forward on our day-to-day lives. Porting our build system to CMake has been one of these long-term goals, not because we love CMake (we don’t) or because we hate autotools (we do), but because it should make people’s lives easier when adding new files to the build, and should also make our build less hacky and quicker – it is sad to see how slow our build can be when compared to something like Chromium, and we think a big part of the problem lies on how complex and dumb autotools and make can be. We have picked up a few of our old branches, brought them up-to-date and landed, which now lets us build the main WebKit2GTK+ library through cmake in trunk. This is an important first step, but there’s plenty to do.

Hackers take advantage of the icecream network for faster builds

Under the hood, Dan Winship has been pushing HTTP2 support for libsoup forward, with a dead-tree version of the spec by his side. He is refactoring libsoup internals to accomodate the new code paths. Still on the HTTP front, I have been updating soup’s MIME type sniffing support to match the newest living specification, which includes specification for several new types and a new security feature introduced by Internet Explorer and later adopted by other browsers. The huge task of preparing the ground for a one process per tab (or other kinds of process separation, this will still be topic for discussion for a while) has been pushed forward by several hackers, with Carlos Garcia and Andy Wingo leading the charge.

Jon and Guillaume battling code

Other than that I have been putting in some more work on improving the integration of the new Web Inspector with WebKitGTK+. Carlos has reviewed the patch to allow attaching the inspector to the right side of the window, but we have decided to split it in two, one providing the functionality and one the API that will allow browsers to customize how that is done. There’s a lot of work to be done here, I plan to land at least this first patch durign the hackfest. I have also fought one more battle in the never-ending User-Agent sniffing war, in which we cannot win, it looks like.

Hackers chillin’ at A Coruña

I am very happy to be here for the fifth year in a row, and I hope we will be meeting here for many more years to come! Thanks a lot to Igalia for sponsoring and hosting the hackfest, and to the GNOME foundation for making it possible for me to attend! See you in 2014!

## October 30, 2013

#### Surfin’ Safari

Co-written by Beth Dakin and Mihnea-Vlad Ovidenie

CSS regions are an exciting technology that make it easier than ever to create rich, magazine-like layouts within web content. Regions have been under development in WebKit for a while now, and we’re delighted to tell you that they are available for use in Safari on iOS 7, Safari 7 on Mavericks, and Safari 6.1 on Mountain Lion.

Magazine-like layout

So I wrote this little article for my personal blog:

That’s cool and all, but wouldn’t it be so much cooler if it had a more interesting layout like this?

So fab! Without regions, achieving a layout like this is a pain. You have to figure out exactly which parts of the article can fit into each box and then hard-code the article content into the appropriate boxes. And after all that work, the design will get totally messed up if the user changes the font size! The layout looks cool, but doing it this way is a lot of work, and it isn’t even a little bit flexible.

Regions make achieving this layout as easy as pie. They allow authors to indicate that some sections of content are intended to define an overall layout template for a portion of the document and that other sections of markup represent the content that is intended to fill that template. The semantically-related content that will flow through the template is called a “named flow.” In our example above, the named flow is the text of my article. Once it has been named, the named flow is distributed into disjointed containers called regions, which can be positioned in any way to achieve the desired layout.

Our simple example only scratches the surface of what you can do with regions. We’ll get to more sophisticated applications later, but first let’s take a closer look at the code.

What is a named flow?

A named flow is a collection of HTML elements extracted from the document’s normal flow and displayed separately. Any HTML element can be part of a named flow. When an element is collected in a named flow, all of its children are collected with it.

You identify a collection of HTML elements as a named flow by using the CSS property -webkit-flow-into. In our example, the named flow will be the elements that contain the text of our article:

<style>
#flow-content { -webkit-flow-into: pizza-manifesto; }
</style>
<div id=”flow-content”><h1>Pizza is amazing</h1>...</div>

Our example only has one, but a document can have any number of named flows, each with its own name.

Flowing into regions

A region is a block-level element that displays content from a named flow instead of its own content. Regions can have any size and can be positioned anywhere in the document. They are not required to be siblings or to be positioned next to each other in the layout.

A region consumes content from a single named flow. Most of the time, to achieve an interesting layout, there will be more than one region associated with a named flow, and when that is the case those regions form a region chain. When content from a named flow does not fit into a region, the content simply flows into the next region in the chain.

Making an element a region is as easy as adding the -webkit-flow-from CSS property. In our example, the regions are the elements that form the layout template for the document’s overall design:

<style>
.region { -webkit-flow-from: pizza-manifesto; }
</style>
<div class="region" id="region-1"></div>
<img src="pizza.jpg" width=512 height=342>
<div class="region" id="region-2"></div>
<div class="region" id="region-3"></div>
<div class="region" id="region-4"></div>
<div class="region" id="region-5"></div>

Take a look at the code for the actual document to see the code for the regions side-by-side with the code for the named flow.

One key thing to remember about regions is that they are only visual containers. Region elements do not become the DOM parents of the elements flowed inside them; they only establish the bounding boxes that visually constrain the flow content.

One cool feature in the CSS Regions specification is region styling. With region styling, a designer can style the content based on which region it ends up flowing through. For example, if you wanted to change the color of the text displayed in the second region of the article flow, you could do so with region styling:

<style>
@-webkit-region #region-2 {
p { color: green; }
}
</style>

The extra styles are dynamically applied behind the scenes whenever the layout of the article content in the regions changes. So for example, if the user resizes the browser window and different pieces of content end up flowing through the styled region, the content will update dynamically. At this time, you can only style regions with the CSS properties color and background-color, but we intend to progressively add support for more properties, so stay tuned! In the meantime, check out this version of our article that uses region styling.

There is also a whole object model available for interacting with regions and named flows from within JavaScript. The proposed API will make it even easier to create fluid designs that adapt to layout changes. For example, authors can use the API to determine whether or not there are enough regions to display the content from the named flow. Handy stuff!

Dreaming with regions

CSS regions are powerful, and when they are combined with other advanced CSS features like shapes, filters, flexible boxes, transforms, and media queries, incredibly sophisticated designs can emerge.

Back in February, during a CSS regions pattern rodeo hosted by CodePen, Tyler Fry and Joshua Hibbert created some awesome regions demos. Tyler won the contest with his reading carousel made out of regions and transforms, and Joshua created an exploding book, featuring a nice hover effect when opening the book.

The Adobe WebPlatform team has created some very compelling demos with regions in partnership with National Geographic. Check out this article that seamlessly integrates text and photographs to create a flexible design. Adobe has also created a demo that is so cutting edge you will need to download a WebKit Nightly to view it properly. This beautiful prototype uses regions, so the article content breaks up automatically across the different containers, and if font size or window size changes, or if the user zooms in, everything reflows automatically. Check out the source code here!

We are so excited about regions as a technology and excited that they are already available for use in shipping browsers. We plan to continue to refine our implementation in WebKit and to add additional features, so be sure to check back for improvements.

## October 15, 2013

### Funding MathML Developments in Gecko and WebKit

#### Frédéric Wang

update 2013-10-15: since I got feedback, I have to say that my funding plan is independent of my work at MathJax ; I'm not a MathJax employee but I have an independent contractor status. Actually, I already used my business to fund an intern for Gecko MathML developments during Summer 2011 :-)

## Retrospect

Since last April, I have been allowed by the MathJax Consortium to dedicate a small amount of my time to do MathML development in browsers, until possibly more serious involvement later. At the same time, we mentioned this plan to Google developers but unfortunately they just decided to drop the WebKit MathML code from Blink, making external contributions hard and unwelcome...

Hence I have focused mainly on Gecko and WebKit: You can find the MathML bugs that have been closed during that period on bugzilla.mozilla.org and bugs.webkit.org. For Gecko, this has allowed me to finish some of the work I started as a volunteer before I was involved full-time in MathJax as well as to continue to mentor MathML contributors. Regarding WebKit, I added a few new basic features like MathML lengths, <mspace> or <mmultiscripts> while I was getting familiar with the MathML code and WebKit organization/community. I also started to work on <semantics> and <maction>. More importantly, I worked with Martin Robinson to address the design concerns of Google developers and a patch to fix these issues finally landed early this week.

However, my progress has been slow so as I mentioned in my previous blog post, I am planning to find a way to fund MathML developments...

## Why funding MathML?

Note: I am assuming that the readers of this blog know why MathML is important and are aware of the benefits it can bring to the Web community. If not, please check Peter Krautzberger's Interview by Fidus Writer or the MozSummit MathML slides for a quick introduction. Here my point is to explain why we need more than volunteer-driven development for MathML.

First the obvious thing: Volunteer time is limited so if we really want to see serious progress in MathML support we need to give a boost to MathML developments. e-book publishers/readers, researchers/educators who are stuck outside the Web in a LaTeX-to-PDF world, developers/users of accessibility tools or the MathML community in general want good math support in browsers now and not to wait again for 15 more years until all layout engines catch up with Gecko or that the old Gecko bugs are fixed.

There are classical misunderstandings from people thinking that non-native MathML solutions and other polyfills are the future or that math on the Web could be implemented via PNG/SVG images or Web Components. Just open a math book and you will see that e.g. inline equations must be correctly aligned with the text or participate in line wrapping. Moreover we are considering math on the Web not math on paper, so we want it to be compatible with HTML, SVG, CSS, Javascript, Unicode, Bidi etc and also something that is fast and responsive. Technically, this means that a clean solution must be in the core rendering engine, spread over several parts of the code and must have strong interaction with the various components like the HTML5 parser, the layout tree, the graphic and font libraries, the DOM module, the style tree and so forth. I do not see any volunteer-driven Blink/Gecko/WebKit feature off the top of my head that has this characteristic and actually even SVG or any other kind of language for graphics have less interaction with HTML than MathML has.

The consequence of this is that it is extremely difficult for volunteers to get involved in native MathML and to do quick progress because they have to understand how the various components of the Blink/Gecko/WebKit code work and be sure to do things correctly. Good mathematical rendering is already something hard by itself, so that is even more complicated when you are not writing an isolated rendering engine for math on which you can have full control. Also, working at the Blink/Gecko/WebKit level requires technical skills above the average so finding volunteers who can work with the high-minded engineers of the big browser companies is not something easy. For instance, among the enthusiastic people coming to me and willing to help MathML in Gecko, many got stuck when e.g. they tried to build the Firefox source or do something more advanced and I never heard back from them. In the other direction, Blink/Gecko/WebKit paid developers are generally not familiar with MathML and do not have time to learn more about it and thus can not always provide a relevant review of the code, or they may break something while trying to modify code they do not entirely understand. Moreover, both the volunteers and paid staff have only a small amount of time to do MathML stuff while the other parts of the engine evolve very quickly, so it's sometimes hard to keep everything in sync. Finally, the core layout engines have strong security requirements that are difficult to satisfy in a volunteer-driven situation...

## Beyond volunteer-driven MathML developments

At that point, there are several options. First the lazy one: Give up with native math rendering, only focus on features that have impact on the widest Web audience (i.e. those that would allow browser vendors to get more market share and thus increase their profit), thank the math people for creating the Web and kindly ask them to use whatever hacks they can imagine to display equations on the Web. Of course as a Mozillian, I think people must decide the Web they want and thus exclude this option.

Next there is the ingenuous option: Expect that browser companies will understand the importance of math-on-the-Web and start investing seriously in MathML support. However, Netscape and Microsoft rejected the <MATH> tag from the 1995 HTML 3.0 draft and the browser companies have kept repeating they would only rely on volunteer contributions to move MathML forward, despite the repeated requests from MathML folks and other scientific communities. So that option is excluded too, at least in the short to medium term.

So it remains the ambitious option: Math people and other interested parties must get together and try to fund native MathML developments. Despite the effort of my manager at MathJax to convince partners and raise funds, my situation has not changed much since April and it is not clear when/if the MathJax Consortium can take the lead in native MathML developments. Given my expertise in Gecko, WebKit and MathML, I feel the duty to do something. Hence I wish to reorganize my work time: Decrease my involvement in MathJax core in order to increase my involvement in Gecko/WebKit developments. But I need the help of the community for that purpose. If you run a business with interest for math-on-the-Web and are willing to fund my work, then feel free to contact me directly by mail for further discussion. In the short term, I want to experiment with Crowd Funding as discussed in the next section. If this is successful we can think of a better organization for MathML developments in the long term.

## Crowd Funding

Wikipedia defines Crowd funding as "the collective effort of individuals who network and pool their money, usually via the Internet, to support efforts initiated by other people or organizations". There are several Crowd Funding platforms with similar rule/interface. I am considering Catincan which is specialized in Open Source Crowd Funding, can be used by any backer/developer around the world, can rely on Bugzilla to track the bug status and seems to have good process to collect the fund from backers and to pay developers. You can easily login to the Catincan Website if you have a GitHub, Facebook or Google account (apparently Persona is not supported yet...). Finally, it seems to have a communication interface between backers and developers, so that everybody can follow the development on the funded features.

One distinctive feature of catincan is that only well-established Open Source projects can be funded and only developers from these projects can propose and work on the new features ; so that backers can trust that the features will be implemented. Of course, I have been working on Gecko, WebKit and MathML projects so I hope people believe I sincerely want to improve MathML support in browsers and that I have the skills to do so ;-)

As said in my previous blog post, it is not clear at all (at least to me) whether Crowd Funding can be a reliable method, but it is worth trying. There are many individuals and small businesses showing interest in MathML, without the technical knowledge or appropriate staff to improve MathML in browsers. So if each one fund a small amount of money, perhaps we can get something.

One constraint is that each feature has 60 days to reach the funding goal. I do not have any idea about how many people are willing to contribute to MathML and how much money they can give. The statistical sample of projects currently funded is too small to extract relevant information. However, I essentially see two options: Either propose small features and split the big ones in small steps, so that each catincat submission will need less work/money and improvements will be progressive with regular feedback to backers ; or propose larger features so they look more attractive and exciting to people and will require less frequent submissions to catincat. At the beginning, I plan to start with the former and if the crowd funding is successful perhaps try the latter.

## Status in Open Source Layout Engines

Note: Obviously, Open Source Crowd Funding does not apply to Internet Explorer, which is the one main rendering engine not mentioned below. Although Microsoft has done a great job on MathML for Microsoft Word, they did not give any public statement about MathML in Internet Explorer and all the bug reports for MathML have been resolved "by design" so far. If you are interested in MathML rendering and accessibility in Internet Explorer, please check Design Science blog for the latest updates and tools.

Note: I am actually focusing on the history of Chromium here but of course there are other Blink-based browsers. Note that programs like QtWebEngine (formerly WebKit-based) or Opera (formerly Presto-based) lost the opportunity to get MathML support when they switched to Blink.

Alex Milowski and François Sausset's first MathML implementation did not pass Google's security review. Dave Barton fixed many issues in that implementation and as far as I know, there were not any known security vulnerabilities when Dave submitted his last version. MathML was enabled in Chrome 24 but Chrome developers had some concerns about the design of the MathML implementation in WebKit, which indeed violated some assumptions of WebKit layout code. So MathML was disabled in Chrome 25 and as said in the introduction, the source code was entirely removed when they forked.

Currently, the Chromium Dashboard indicates that MathML is shipped in Firefox/Safari, has positive feedback from developers and is an established standard ; but the Chromium status remains "No active development". If I understand correctly, Google's official position is that they do not plan to invest in MathML development but will accept external contributions and may re-enable MathML when it is ready (for some sense of "ready" to be defined). Given the MathML story in Chrome, it seems really unlikely that any volunteer will magically show up and be willing to submit a MathML patch. Incidentally, note the interesting work of the ChromeVox team regarding MathML accessibility: Their recent video provides a good overview of what they achieve (where Volker Sorge politely regrets that "MathML is not implemented in all browsers").

Although Google's design concerns have now been addressed in WebKit, one most serious remark from one Google engineer is that the WebKit MathML implementation is of too low quality to be shipped so they just prefer to have no MathML at all. As a consequence, the best short term strategy seems to be improving WebKit MathML support and, once it is good enough, to submit a patch to Google. The immediate corollary is that if you wish to see MathML in Chrome or other Blink-based browsers you should help WebKit MathML development. See the next section for more details.

chromatic

Actually, I tried to import MathML into Blink one day this summer. However, there were divergences between the WebKit and Blink code bases that made that a bit difficult. I do not plan to try again anytime soon, but if someone is interested, I have published my script and patch on GitHub. Note there may be even more divergences now and the patch is certainly bit-rotten. I also thought about creating/maintaining a "Chromatic" browser (Chrome + mathematics) that would be a temporary fork to let Blink users benefit from native MathML until it is integrated back in Blink. But at the moment, that would probably be too much effort for one person and I would prefer to focus on WebKit/Gecko developments for now.

### WebKit

The situation in WebKit is much better. As said before, Google's concerns are now addressed and MathML will be enabled again in all WebKit releases soon. Martin Robinson is interested in helping the MathML developments in WebKit and his knowledge of fonts will be important to improve operator stretching, which is one of the biggest issue right now. One new volunteer contributor, Gurpreet Kaur, also started to do some work on WebKit like support for the *scriptshifts attributes or for the <menclose> element. Last but not least, a couple of Apple/WebKit developers reviewed and accepted patches and even helped to fix a few bugs, which made possible to move development forward.

When he was still working on WebKit, Dave Barton opened bug 99623 to track the top priorities. When the bugs below and their related dependencies are fixed, I think the rendering in WebKit will be good enough to be usable for advanced math notations and WebKit will pass the MathML Acid 1 test.

• Bug 44208: For example, in expression like $\mathrm{sin}\left(x\right)$, the "x" should be in italic but not the "sin". This is actually slightly more complicated: It says when the default mathvariant value must be normal/italic. mathvariant is more like the text-transform CSS property in the sense that it remaps some characters to the corresponding mathematical characters (italic, bold, fraktur, double-struck...) for example $\mathfrak{A}$ (mathvariant=fraktur A) should render exactly the same as $𝔄$ (U+1D504). By the way, there is the related bug 24230 on Windows, that prevents to use plane 1 characters. The best solution will probably be to implement mathvariant correctly. See also Gecko's ongoing work by James Kitchener below.
• Bug 99618: Implement <mmultiscripts>, that allows expressions like ${}_{6}{}^{14}\mathrm{C}$ or $R_{i}{}_{j}{}_{;}{}^{j}=\frac{1}{2}S_{;}{}_{i}$. As said in the introduction, this is fixed in WebKit Nightly.
• Bug 99614: Support for stretchy operators like in ${\left(\frac{\overline{{z}_{1}+{z}_{2}}}{3}\right)}^{4}$. Currently, WebKit can only stretch operators vertically using a few Unicode constructions like ⎛ (U+239B) + ⎜ (U+239C) + ⎝ (U+239D) for the left parenthesis. Essentially only similar delimiters like brackets, braces etc are supported. For small sizes like $\left(\text{}$ or for large operators like $\sum {n}^{2}$ it is necessary to use non-unicode glyphs in various math fonts, but this is not possible in WebKit MathML yet. All of this will require a fair amount of work: implementing horizontal stretching, font-specific stuff, largeop/symmetric properties etc
• Bug 99620: Implement the operator dictionary. Currently, WebKit treats all the operators the same way, so for example it will use the same 0.2em spacing before and after parenthesis, equal sign or invisible operators in e.g. $f\left(x\right)={x}^{2}$. Instead it should use the information provided by the MathML operator dictionary. This dictionary also specifies whether operators are stretchy, symmetric or largeop and thus is related to the previous point.
• Bug 119038: Use the code for vertical stretchy operators to draw the radical symbols in roots like $\sqrt{\frac{2}{3}}$. Currently, WebKit uses graphic primitives which do not give a really good rendering.
• Bug 115610: Implement <mspace> which is used by many MathML generators to do some spacing in mathematical formulas. As said in the introduction, this is fixed in WebKit Nightly.

In order to pass the Mozilla MathML torture tests, at least displaystyle and scriptlevel must be implemented too, probably as internal CSS properties. This should also allow to pass Joe Java's MathML test, although that one relies on the infamous <mfenced> that duplicates the stretchy operator features and is implemented inconsistently in rendering engines. I think passing the MathML Acid 2 test will require slightly more effort, but I expect this goal to be achievable if I have more time to work on WebKit:

• Bug 115610: Implement <mspace>. Fixed!
• Bug 120164: Implement negative spacing for <mspace> (I have an experimental patch).
• Bug 85730: Implement <mpadded>, which is also used by MathML generators to do some tweaking of formulas. I have only done some experiments, that would be a generalization of <mspace>
• Bug 85733: Implement the href attribute ; well I guess the name is explicit enough to understand what it is used for! I only have some experimental patch here too. That would be mimicing what is done in SVG or HTML.
• Bug 120059 and bug 100626: Implement <maction> (at least partially) and <semantics>, which have been asked by long-time MathML users Jacques Distler and Michael Kohlhase. I have patches ready for that and this could be fixed relatively soon, I just need to find time to finish the work.

In general passing the MathML Acid 2 test is not too hard, you merely only need to implement those few MathML elements whose exact rendering is clearly defined by the MathML specification. Passing the MathML Acid 3 test is not expected in the medium term. However, the score will naturally increase while we improve WebKit MathML implementation. The priority is to implement what is currently known to be important to users. To give examples of bugs not previously mentioned: Implementing menclose or fixing various DOM issues like bugs 57695, 57696 or 107392.

More advanced features like those mentioned in the next section for Gecko are probably worth considering later (Open type MATH, linebreaking, mlabeledtr...). It is worth noting that Apple has already done some work on accessibility (with MathML being readable by VoiceOver in iOS7), authoring and EPUB (MathML is enabled in WebKit-based ebook readers and ibooks-author has an integrated LaTeX-to-MathML converter).

### Gecko

In general I think I have a good relationship with the Mozilla community and most people have respect for the work that has been done by volunteers for almost 15 years now. The situation has greatly improved since I joined the project, at that time some people claimed the Mozilla MathML project was dead after Roger Sidge's departure. One important point is that Karl Tomlinson has worked on repairing the MathML support when Roger Sidge left the project. Hence there is at least one Mozilla employee with good knowledge of MathML who can review the volunteer patches. Another key ingredient is the work that has recently been made by Mozilla to increase engagement of the volunteer community like good documentation on MDN, the #Introduction channel, Josh Matthews' mentored bugs and of course programs like GSOC. However, as said above, it is one thing to attract enthusiastic contributors and another thing to get long-term contributors who can work independently on more advanced features. So let's go back to my latest Roadmap for the Mozilla MathML Project and see what has been accomplished for one year:

• Font support: Dmitry Shachnev created a Debian package for the MathJax fonts and Mike Hommey added MathJax and Asana fonts in the list of suggested packages for Iceweasel. The STIX fonts have also been updated in Fedora and are installed by default on Mac OS X Lion (10.7). For Linux distributions, it would be helpful to implement Auto Installation Support. The bug to add mathematical fonts to Android has been assigned in June but no more progress has happened so far. Henri Sinoven opened a bug for FirefoxOS but there has not been any progress there either. I had some patches to restore the "missing MathML fonts" warning (using an information bar) but it was refused by Firefox reviewers. However, the code to detect missing MathML font could still be used for the similar bug 648548, which also seems inactive since January. There are still some issues on the MathJax side that prevent to integrate Web fonts for the native MathML output mode. So at the moment the solution is still to inform visitors about MathML fonts or to add MathML Web fonts to your Web site. Khaled Hosny (font and LaTeX expert) recently updated my patches to prepare the support for Open Type fonts and he offered to help on that feature. After James Kitchener's work on mathvariant, we realized that we will probably need to provide Arabic mathematical fonts too.
• Spacing: Xuan Hu continued to work on <mpadded> improvements and I think his patch is close to be accepted. Quentin Headen has done some progress on <mtable> before focusing on his InstantBird GSOC project. He is still far from being able to work on mtable@rowspacing/columnspacing but a work around for that has been added to MathJax. I fixed the negative space regression which was missing to pass the MathML Acid 2 test and is used in MathJax. Again, Khaled Hosny is willing to help to use the spacing of the Open Type MATH, but that will still be a lot of work.
• <mlabeledtr>: A work around for native MathML has been added in MathJax.
• Linebreaking: No progress except that I have worked on fixing a bug with intrinsic width computation. The unrelated printing issues mentioned in the blog post have been fixed, though.
• Operator Stretching: No progress. I tried to analyze the regression more carefully, but nothing is ready yet.
• Tabular elements: As said above, Quentin Headen has worked a bit on cleaning up <mtable> but not much improvements on that feature so far.
• Token elements: My patch for <ms> landed and I have done significant progress on the bad measurement of intrinsic width for token elements (however, the fix only seems to work on Linux right now). James Kitchener has taken over my work on improving our mathvariant support and doing related refactoring of the code. I am confident that he will be able to have something ready soon. The primes in exponents should render correctly with MathJax fonts but for other math fonts we will have to do some glyph substitutions.
• Dynamic MathML: No progress here but there are not so many bugs regarding Javascript+MathML, so that should not be too serious.
• Documentation: It is now possible to use MathML in code sample or directly in the source code. The MathML project pages have been entirely migrated to MDN. Also, Florian Scholz has recently been hired by Mozilla as a documentation writer (congrats!) and will in particular continue the work he started as a volunteer to document MathML on MDN.

I apologize to volunteers who worked on bugs that are not mentioned above or who are doing documentation or testing that do not appear here. For a complete list of activity since September 2012, Bugzilla is your friend. There are two ways to consider the progress above. If you see the glass half full, then you see that several people have continued the work on various MathML issues, they have made some progress and we now pass the MathML Acid 2 test. If you see the glass half empty, then you see that most issues have not been addressed yet and in particular those that are blocking the native MathML to be enabled in MathJax: bug 687807, bug 415413, the math font issues discussed in the first point, and perhaps linebreaking too. That is why I believe we should go beyond volunteer-driven MathML developments.

Most of the bugs mentioned above are tested by the MathML Acid 3 tests and we will win a few points when they are fixed. Again, passing MathML Acid 3 test is not a goal by itself so let's consider what are the big remaining areas it contains:

• Improving Tabular Elements and Operator Stretching, which are obviously important and used a lot in e.g. MathJax.
• Linebreaking, which as I said is likely to become fundamental with small screens and ebooks.
• Elementary Mathematics (you know addition, subtraction, multiplication, and division that kids learn), which I suspect will be important for educational tools and ebooks.
• Alignment: This is the one part of MathML that I am not entirely sure is relevant to work on in the short term. I understand it is useful for advanced layout but most MathML tools currently just rely on tables to do that job and as far as I know the only important engine that implements that is MathPlayer.

Finally there are other features outside the MathML rendering engines that I also find important but for which I have less expertise:

• Transferring MathML that is implementing copy/cut/drag and paste. Currently, we can do that by treating MathML as normal HTML5 code or by using the "show MathML source" feature and copying the source code. However, it would be best to implement a standard way to communicate with other MathML applications like Microsoft Word, Mathematica, Mapple, Windows' Handwriting panel etc I wrote some work-in-progress patches last year.
• Authoring MathML: Essentially implementing things like deletion, insertion etc maybe simple MathML token creation ; in Gecko's core editor, which is used by BlueGriffon, KompoZer, SeaMonkey, Thunderbird or even MDN. Other things like integrating Javascript parsers (e.g. ASCIIMath) or equation panels with buttons like are probably better done at the higher JS/HTML/XUL level. Daniel Glazman already wrote math input panels for BlueGriffon and Thunderbird.
• MathML Accessibility: This is one important application of MathML for which there is strong demand and where Mozilla is behind the competitors. James Teh started some experimental work on his NVDA tool before the summit.
• EPUB reader for FirefoxOS (and other mobile platforms): During the "Co-creating Action Plans" session, the Mozilla Taipei people were thinking about missing features for FirefoxOS and this idea about EPUB reader was my modest contribution. There are a few EPUB readers relying on Gecko and it would be good to check if they work in FirefoxOS and if they could be integrated by default, just like Apple has iBooks. BTW, there is a version of BlueGriffon that can edit EPUB books.

## Conclusion

I hope I have convinced some of the readers about the need to fund MathMLin browsers. There is a lot of MathML work to do on Gecko and WebKit but both projects have volunteers and core engineers who are willing to help. There are also several individuals / companies relying on MathML support in rendering engines for their projects and could support the MathML developments in some way. I am willing to put more of my time on Gecko and WebKit developments, but I need financial help for that purpose. I'm proposing catincan Crowd Funding in the short term so that anyone can contribute at the appropriate level, but other alternatives to fund the MathML development can be found like asking Peter Krautzberger about native MathML funding in MathJax, discussing with Igalia about funding Martin Robinson to work more on WebKit MathML or contacting me directly to establish some kind of part-time consulting agreement.

Please leave a comment on this blog or send me a private mail, if you agree that funding MathML in browsers is important, if you like the crowd funding idea and plan to contribute ; or if you have any opinions about alternative funding options. Also, please tell me what seem to be the priority for you and your projects among what I have mentioned above (layout engines, features etc) or among others that I may have forgotten. Of course, any other constructive comment to help MathML support in browsers is welcome. I plan to submit features on catincan soon, once I have more feedback on what people are interested in. Thank you!

## October 07, 2013

### Post-Summit Thoughts on the MathML Project

#### Frédéric Wang

I'm back from a great Mozilla Summit 2013 and I'd just like to write a quick blog post about the MathML booths at the Innovation Fairs. I did not have the opportunity to talk with the MathML people who ran the booth at Santa Clara yet. However, everything went pretty well at Brussels, modulo of course some demos failing when done in live... If you are interested, the slides and other resources are available on my GitHub page.

Many Mozillians did not know about MathML or that it had been available in Gecko since the early days of the Mozilla project. Many people who use math (or just knowing someone who does) were curious about that feature and excited about the MathML potentials. I appreciated to get this positive feedback from Mozillians willing to use math on the Web and related media, instead of the scorn or hatred I sometimes see by misinformed people. I expect to provide more updates on LaTeXML, MediaWiki Math and MathJax when their next versions are released. The Gecko MathML support improves slowly but there has been interesting work by James Kitchener recently that I'd like to mention too.

Let's do an estimation à la Fermi: only a few volunteers have been contributing regularly and simultaneously to MathML in Gecko while most Mozilla-funded Gecko projects have certainly development teams that are 3 times as large. Let's be optimistic and assume that these volunteers have been able to dedicate a mean of 1 work day per week, compared to 5 for full-time staff. Given that the Mozilla MathML project will celebrate its 15 years next May, that means that the volunteer work transposed in terms of paid-staff time is only $\le \frac{15}{3\cdot 5}=1$ year. To be honest, I'm disregarding here the great work made by the Mozilla NZ team around 2007 to repair MathML after the Cairo migration. But still, what we have achieved in quality and completeness with such limited resources and time is really impressive.

As someone told me at the MathML booth, it's really frustating that something that is so important for the small portion of math-educated people is ignored because it is useless for the vast majority of people. This is not entirely true, since even elementary mathematics taught at school like the one of this blog post are not easily expressed with standard HTML and even less in a way accessible to people with visual disabilities. However, it summarizes well the feeling MathML folks had when they tried to convince Google to accept the volunteer work on MathML, despite its low quality.

As explained at the Summit Sessions, Mozilla's mission is different and the goal is to give people the right to control the Web they want. The MathML project is perhaps one of the oldest and successful volunteer-driven Mozilla project that is still active and demonstrates concretely the idea of the Mozilla's mission with e.g. the work of Roger Sidge who started to write the MathML implementation when Netscape opened its source code or the one of Florian Scholz who made MDN one of the most complete Web resource for MathML.

Mozilla Corporation has kept saying they don't want to invest in MathML developments and the focus right now is clearly on other features like FirefoxOS. Even projects that have a larger audience than the MathML support like the mail client or the editor are not in the priorities so someone else definitely need to step in for MathML. I've tried various methods, with more or less success, to boost the MathML developments like mentoring a GSoC project, funding a summer internship or relying on mentored bugs. I'm now considering crowd funding to help the MathML developments in Gecko (and WebKit). I don't want to do another Fermi estimation now but at first that looks like a very unreliable method. The only revenue generated by the MathML project so far are the $2\frac{⌊100\cdot \pi ⌋}{100}=2\cdot 3.14=6.28$ dollars to the Mozilla Fundation via contributions to my MathML-fonts add-on, so it's hard to get an idea of how much people would contribute to the Gecko implementaton. However, that makes sense since the only people who showed interest in native MathML support so far are individuals or small businesses (e.g. working on EPUB or accessibility) and I think it's worth trying it anyway. That's definitely something I'll consider after MathJax 2.3 is released...

## August 27, 2013

### HTML Alchemy – Combining CSS Shapes with CSS Regions

Note: Support for shape-inside is only available until the following nightly builds: WebKit r166290 (2014-03-26); Chromium 260092 (2014-03-28).

I have been working on rendering for almost a year now. Since I landed the initial implementation of Shapes on Regions in both Blink and WebKit, I’m incredibly excited to talk a little bit about these features and how you can combine them together.

Don’t know what CSS Regions and Shapes are? Start here!

The first ingredient in my HTML alchemy kitchen is CSS Regions. With CSS Regions, you can flow content into multiple styled containers, which gives you enormous creative power to make magazine style layouts. The second ingredient is CSS Shapes, which gives you the ability to wrap content inside or outside any shape. In this post I’ll talk about the “shape-inside” CSS property, which allows us to wrap content inside an arbitrary shape.

Let’s grab a bowl and mix these two features together, CSS Regions and CSS Shapes to produce some really interesting layouts!

In the latest Chrome Canary and Safari WebKit Nightly, after enabling the required experimental features, you can flow content continuously through multiple kinds of shapes. This rocks! You can step out from the rectangular text flow world and break up text into multiple, non-rectangular shapes.

## Demo

If you already have the latest Chrome Canary/Safari WebKit Nightly, you can just go ahead and try a simple example on codepen.io. If you are too lazy, or if you want to extend your mouse button life by saving a few button clicks, you can continue reading.

In the picture above we see that the “Lorem ipsum” story flows through 4 different, colorful regions. There is a circle shape on each of the first two fixed size regions. Check out the code below to see how we apply the shape to the region. It’s pretty straightforward, right?
#region1, #region2 {
-webkit-flow-from: flow;
background-color: yellow;
width: 200px;
height: 200px;
-webkit-shape-inside: circle(50%, 50%, 50%);
}
The content flows into the third (percentage sized) region, which represents a heart (drawn by me, all rights reserved). I defined the heart’s coordinates in percentages, so the heart will stretch as you resize the window.
#region3 {
-webkit-flow-from: flow;
width: 50%;
height: 400px;
background-color: #EE99bb;
-webkit-shape-inside: polygon(11.17% 10.25%,2.50% 30.56%,3.92% 55.34%,12.33% 68.87%,26.67% 82.62%,49.33% 101.25%,73.50% 76.82%,85.17% 65.63%,91.63% 55.51%,97.10% 31.32%,85.79% 10.21%,72.47% 5.35%,55.53% 14.12%,48.58% 27.88%,41.79% 13.72%,27.50% 5.57%);
}

The content that doesn’t fit in the first three regions flows into the fourth region. The fourth region (see the retro-blue background color) has its CSS width and height set to auto, so it grows to fit the remaining content.

## Real world examples

After trying the demo and checking out the links above, I’m sure you’ll see the opportunities for using shape-inside with regions in your next design. If you have some thoughts on this topic, don’t hesitate to comment. Please keep in mind that these features are under development, and you might run into bugs. If you do, you should report them on WebKit’s Bugzilla for Safari or Chromium’s issue tracker for Chrome. Thanks for reading!

## August 06, 2013

### WebGL, at last!

#### Brent Fulgham

It's been a long time since I've written an update -- but my lack of blog posting is not an indication of a lack of progress in WebKit or the WinCairo port. Since I left my former employer (who *still* hasn't gotten around to updating the build machine I set up there), we've:

• Migrated from Visual Studio 2005 to Visual Studio 2010 (and soon, VS2012)
• Enabled New-run-webkit-tests
• Updated the WinCairo Support Libraries to support 64-bit builds
• Integrated a ton of cURL improvements and extensions thanks to the TideSDK guys
• and ...
... thanks to the hard work of Alex Christensen, brought up WebGL on the WinCairo port.  This is a little exciting for me, because it marks the first time (I can recall) where the WinCairo port actually gained a feature that was not already part of the core Apple Windows port.

The changes needed to see these circa-1992 graphics in all their three-dimensional glory are already landed in the WebKit tree.  You just need to:

1. Enable the libEGL, libGLESv2, translator_common, translator_glsl, and translator_hlsl for the WinCairo build (they are currently turned off).
2. Make the following change to WTF/wtf/FeatureDefines.h:

Brent Fulgham@WIN7-VM ~/WebKit/Source/WTF/wtf

## 2. Directory setup

It is suggested (and actually required by some build scripts) to have a base directory which holds Qt5, Qt Components and WebKit project sources. The suggested base directory can be created by running:

### 3.2. rsync-scripts

$wget http://trac.webkit.org/attachment/wiki/SettingUpDevelopmentEnvironmentForN9/rsync-scripts.tar.gz?format=raw$ tar xzf rsync-scripts.tar.gz

$git clone git://gitorious.org/qtwebkit/testfonts.git ### 4.2. Qt5, QtComponents and WebKit The script below when successfully run will create ~/swork/qt5, ~/swork/qtcomponents and ~/swork/webkit directories: $ browser-scripts/clone-sources.sh --no-ssh

NOTE: You can also manually download sources, but remember to stick with the directory names described above.

# 5. Pre-build hacks

### 5.1. Qt5 translations

Qt5 translations are not being properly handled by cross-platform toolchain. This happens mainly because lrelease application is called to generate Qt message files, but since it is an ARMEL binary your system is probably not capable of running it natively (unless you have a misc_runner kernel module properly set, then you can safely skip this step). In this case, you can use lrelease from your system’s Qt binaries without any worries.

If you have a Scratchbox environment set, it is suggested for you to stop its service first:

$sudo service scratchbox-core stop Now you can manually generate Qt message files by running this: $ cd ~/swork/qt5/qttranslations/translations
$for file in ls *ts; do lrelease$file -qm echo "$file" | sed 's/ts$/qm/'; done

### 5.2. Disable jsondb-client tool

QtJsonDB module from Qt5 contains a tool called jsondb-client, which depends on libedit (not available on MADDE target). It is safe to disable its compilation for now:

$ln -s ~/swork/qt5/qtbase/mkspecs ~/QtSDK/Madde/sysroots/harmattan_sysroot_10.2011.34-1_slim/home/<USER>/swork/qt5/mkspecs ## 6. Build sources You can execute the script that will build all sources using cross-compilation setup: $ browser-scripts/build-sources.sh --cross-compile

If everything went well, you now have the most up-to-date binaries for Qt5/WebKit2 development for Nokia N9. Please have a look at WebKit’s wiki for more information about how to update sources after a previous build and information on how to keep files in sync with device. The guide assumes PR1.1 firmware for N9 device, which is already outdated, so I might come up next with updated instructions on how to safely sync files to your PR1.2-enabled device.

## March 10, 2012

### WebKitGTK+ Debian packaging repository changes

#### Gustavo Noronha

For a while now the git repository used for packaging WebKitGTK+ has been broken. Broken as in nobody was able to clone it. In addition to that, the packaging workflow had been changing over time, from a track-upstream-git/patches applied one to a import-orig-only/patches-not-applied one.

After spending some more time trying to unbreak the repository for the third time I decided it might be a good time for a clean up. I created a new repository, imported all upstream versions for series 1.2.x (which is in squeeze), 1.6.x (unstable), and 1.7.x (experimental). I also imported packaging-related commis for those versions using git format-patch and black magic.

One of the good things about doing this move, and which should make hacking the WebKitGTK+ debian package more pleasant and accessible can be seen here:

 kov@goiaba ~/s/debian-webkit> du -sh webkit/.git webkit.old/.git 27M webkit/.git 1.6G webkit.old/.git 

If you care about the old repository, it’s on git.debian.org still, named old-webkit.git. Enjoy!

## December 07, 2011

### WebKitGTK+ hackfest \o/

#### Gustavo Noronha

It’s been a couple days since I returned from this year’s WebKitGTK+ hackfest in A Coruña, Spain. The weather was very nice, not too cold and not too rainy, we had great food, great drinks and I got to meet new people, and hang out with old friends, which is always great!

Hackfest black board, photo by Mario

I think this was a very productive hackfest, and as usual a very well organized one! Thanks to the GNOME Foundation for the travel sponsorship, to our friends at Igalia for doing an awesome job at making it happen, and to Collabora for sponsoring it and granting me the time to go there! We got a lot done, and although, as usual, our goals list had many items not crossed, we did cross a few very important ones. I took part in discussions about the new WebKit2 APIs, got to know the new design for GNOME’s Web application, which looks great, discussed about Accelerated Compositing along with Joone, Alex, Nayan and Martin Robinson, hacked libsoup a bit to port the multipart/x-mixed-replace patch I wrote to the awesome gio-based infrastructure Dan Winship is building, and some random misc.

The biggest chunk of time, though, ended up being devoted to a very uninteresting (to outsiders, at least), but very important task: making it possible to more easily reproduce our test results. TL;DR? We made our bots’ and development builds use jhbuild to automatically install dependencies; if you’re using tarballs, don’t worry, your usual autogen/configure/make/make install have not been touched. Now to the more verbose version!

The need

Our three build slaves reporting a few failures

For a couple years now we have supported an increasingly complex and very demanding automated testing infrastructure. We have three buildbot slaves, one provided by Collabora (which I maintain), and two provided by Igalia (maintained by their WebKitGTK+ folks). Those bots build as many check ins as possible with 3 different configurations: 32 bits release, 64 bits release, and 64 bits debug.

In addition to those, we have another bot called the EWS, or Early Warning System. There are two of those at this moment: one VM provided by Collabora and my desktop, provided by myself. These bots build every patch uploaded to the bugzilla, and report build failures or passes (you can see the green bubbles). They are very important to our development process because if the patch causes a build failure for our port people can often know that before landing, and try fixes by uploading them to bugzilla instead of doing additional commits. And people are usually very receptive to waiting for EWS output and acting on it, except when they take way too long. You can have an idea of what the life of an EWS bot looks like by looking at the recent status for the WebKitGTK+ bots.

Maintaining all of those bots is at times a rather daunting task. The tests require a very specific set of packages, fonts, themes and icons to always report the same size for objects in a render. Upgrades, for instance, had to be synchronized, and usually involve generating new baselines for a large number of tests. You can see in these instructions, for instance, how strict the environment requirements are – yes, we need specific versions of fonts, because they often cause layouts to change in size! At one point we had tests fail after a compiler upgrade, which made rounding act a bit different!

So stability was a very important aspect of maintaining these bots. All of them have the same version of Debian, and most of the packages are pinned to the same version. On the other hand, and in direct contradition to the stability requirement, we often require bleeding edge versions of some libraries we rely on, such as libsoup. Since we started pushing WebKitGTK+ to be libsoup-only, its own progress has been pretty much driven by WebKitGTK+’s requirements, and Dan Winship has made it possible to make our soup backend much, much simpler and way more featureful. That meant, though, requiring very recent versions of soup.

To top it off, for anyone not running Debian testing and tracking the exact same versions of packages as the bots it was virtually impossible to get the tests to pass, which made it very difficult for even ourselves to make sure all patches were still passing before committing something. Wow, what a mess.

The explosion^Wsolution

So a few weeks back Martin Robinson came up with a proposed solution, which, as he says, is the “nuclear bomb” solution. We would have a jhbuild environment which would build and install all of the dependencies necessary for reproducing the test expectations the bots have. So over the first three days of the hackfest Martin and myself hacked away in building scripts, buildmaster integration, a jhbuild configuration, a jhbuild modules file, setting up tarballs, and wiring it all in a way that makes it convenient for the contributors to get along with. You’ll notice that our buildslaves now have a step just before compiling called “updated gtk dependencies” (gtk is the name we use for our port in the context of WebKit), which runs jhbuild to install any new dependencies or version bumps we added. You can also see that those instructions I mentioned above became a tad simpler.

It took us way more time than we thought for the dust to settle, but it eventually began to. The great thing of doing it during the hackfest was that we could find and fix issues with weird configurations on the spot! Oh, you build with AR_FLAGS=cruT and something doesn’t like it? OK, we fix it so that the jhbuild modules are not affected by that variable. Oh, turns out we missed a dependency, no problem, we add it to the modules file or install them on the bots, and then document the dependency. I set up a very clean chroot which we could use for trying out changes so as to not disrupt the tree too much for the other hackfest participants, and I think overall we did good.

The aftermath

By the time we were done our colleagues who ran other distributions such as Fedora were already being able to get a substantial improvements to the number of tests passing, and so did we! Also, the ability to seamlessly upgrade all the bots with a simple commit made it possible for us to very easily land a change that required a very recent (as in unreleased) version of soup which made our networking backend way simpler. All that red looks great, doesn’t it? And we aren’t done yet, we’ll certainly be making more tweaks to this infrastructure to make it more transparent and more helpful to the users (contributors and other people interested in running the tests).

If you’ve been hit by the instability we caused, sorry about that, poke mrobinson or myself in the #webkitgtk+ IRC channel on FreeNode, and we’ll help you out or fix any issues. If you haven’t, we hope you enjoy all the goodness that a reproducible testing suite has to offer! That’s it for now, folks, I’ll have more to report on follow-up work started at the hackfest soon enough, hopefully =).

## November 29, 2011

### Accelerated Compositing in webkit-clutter

#### Gustavo Noronha

For a while now my fellow Collaboran Joone Hur has been working on implementing the Accelerated Compositing infrastructure available in WebKit in webkit-clutter, so that we can use Clutter’s powers for compositing separate layers and perform animations. This work is being done by Collabora and is sponsored by BOSCH, whom I’d like to thank! What does all this mean, you ask? Let me tell me a bit about it.

The way animations usually work in WebKit is by repainting parts of the page every few milliseconds. What that means in technical terms is that an area of the page gets invalidated, and since the whole page is one big image, all of the pieces that are in that part of the page have to be repainted: the background, any divs, images, text that are at that part of the page.

What the accelerated compositing code paths allow is the creation of separate pieces to represent some of the layers, allowing the composition to happen on the GPU, removing the need to perform lots of cairo paint operations per second in many cases. So if we have a semi-transparent video moving around the page, we can have that video be a separate texture that is layered on top of the page, made transparent and animated by the GPU. In webkit-clutter’s case this is done by having separate actors for each of the layers.

I have been looking at this code on and off, and recently joined Joone in the implementation of some of the pieces. The accelerated compositing infrastructure was originally built by Apple and is, for that reason, works in a way that is very similar to Core Animation. The code is still a bit over the place as we work on figuring out how to best translate the concepts into clutter concepts and there are several bugs, but some cool demos are already possible! Bellow you have one of the CSS3 demos that were made by Apple to demo this new functionality running on our MxLauncher test browser.

You can also see that the non-Accelerated version is unable to represent the 3D space correctly. Also, can you guess which of the two MxLauncher instances is spending less CPU? In this second video I show the debug borders being painted around the actors that were created to represent layers.

The code, should you like to peek or test is available in the ac2 branch of our webkit-clutter repository: http://gitorious.org/webkit-clutter/webkit-clutter/commits/ac2

We still have plenty of work to do, so expect to hear more about it. During our annual hackfest in A Coruña we plan to discuss how this work could be integrated also in the WebKitGTK+ port, perhaps by taking advantage of clutter-gtk, which would benefit both ports, by sharing code and maintenance, and providing this great functionality to Epiphany users. Stay tuned!

## October 09, 2011

### Tests Active

#### Brent Fulgham

Looking back over this blog, I see that it was around a year ago that I got the initial WinCairo buildbot running. I'm very pleased to announce that I have gotten ahold of a much more powerful machine, and am now able to run a full build and tests in slightly under an hour -- a huge improvement over the old hardware which took over two hours just to build the software!

This is a big step, because we can now track regressions and gauge correctness compared to the other platforms. Up to now, testing has largely consisted of periodic manual runs of the test suite, and a separate set of high-level tests run as part of a larger application. This was not ideal, because it was easy for low-level functions in WebKit that I rarely use to be broken and missed.

All is not perfect, of course. Although over 12,000 tests now run (successfully) with each build, that is effectively two thirds of the full test suite. Most of the tests I have disabled are due to small differences in the output layout. I'm trying to understand why these differences exist, but I suspect many of them simply reflect small differences in Cairo compared to the CoreGraphics rendering layer.

If any of you lurkers are interested in helping out, trying out some of the tests I have disabled and figuring out why they fail would be a huge help!

## July 14, 2011

### An Unseasonable Snowfall

#### Brent Fulgham

A year or two ago I ported the Cocoa "CallJS" application to MFC for use with WebKit. The only feedback I ever got on the topic was a complaint that it would not build under the Visual Studio Express software many people used.

After seeing another few requests on the webkit-help mailing list for information on calling JavaScript from C++ (and vice-versa), I decided to dust off the old program and convert it to pure WINAPI calls so that VS Express would work with it.

Since my beloved Layered Window patches finally landed in WebKit, I also incorporated a transparent WebKit view floating over the main application window. Because I suck at art, I stole appropriated the Let It Snow animation example to give the transparent layer something to do.

Want to see what it looks like?

## July 10, 2011

### Updated WebKit SDK (@r89864)

#### Brent Fulgham

I have updated the WebKitSDK to correspond to SVN revision r8984.

Major changes in this revision:
* JavaScript engine improvements.
* Rendering improvements.
* New 'Transparent Web View' support.
* General performance and memory use improvements.

This ZIP file also contains updated versions of Zlib, OpenSSL, cURL, and OpenCFLite.

Note that I have stopped statically linking Cairo; I'm starting to integrate some more recent Cairo updates (working towards some new rendering features), and wanted to be able to update it incrementally as changes are made.

This package contains the same Cairo library (in DLL form) as used in previous versions.

As usual, please let me know if you encounter any problems with this build.

[Update] I forgot to include zlib1.dll! Fixed in the revised zip file.

## July 05, 2011

### WinCairoRequirements Sources Archive

#### Brent Fulgham

I've posted the 80 MB source archive of the requirements needed to build the WinCairo port of WebKit.

Note that you do NOT need these sources unless you plan on building them yourself or wish to archive the source code for these modules. The binaries are always present in the WinCairoRequirements.zip file, which is downloaded and unzipped to the proper place when you execute the update-webkit --wincairo command.

## June 28, 2011

### Towards a Simpler WinCairo Build

#### Brent Fulgham

For the past couple of years, anyone interested in trying to build the WinCairo port of WebKit had to track down a number of support libraries, place them in their development environment's include (and link search) paths, and then cross their fingers and hope everything built.

To make things a little easier, I wrapped up the libraries and headers I use for building and posted them as a zip file on my .Mac account. This made things a little easier, but you still had to figure out where to drop the files and figure out if I had secretly updated my 'requirements.zip' file without telling anyone. Not ideal.

A couple of days ago, while trolling through the open review queue, I ran across a Bug filed by Carl Lobo, which automated the task of downloading the requirements file when running build-webkit --wincairo. This was a huge improvement!

Today, I hijacked Carl's changes and railroaded the patch through the review process (making a few modifications along the way):

• I renamed my requirements file WinCairoRequirements.zip.

• I added a timestamp file, so that build-webkit --wincairo can check to see if the file changed, and download it if necessary.

• I propagated Carl's changes to update-webkit, so that now by adding the --wincairo argument it will update the WinCairoRequirements file.

I'm really excited about this update. If you've been wanting to try out the WinCairo port of WebKit, this would be a great time to try it out. I'd love to hear your experiences!

## June 14, 2011

### Benchmarking Javascript engines for EFL

#### Lucas De Marchi

The Enlightenment Foundation Libraries has several bindings for other languages in order to ease the creation of end-user applications, speeding up its development. Among them, there’s a binding for Javascript using the Spidermonkey engine. The questions are: is it fast enough? Does it slowdown your application? Is Spidermonkey the best JS engine to be used?

To answer these questions Gustavo Barbieri created some C, JS and Python benchmarks to compare the performance of EFL using each of these languages. The JS benchmarks were using Spidermonkey as the engine since elixir was already done for EFL. I then created new engines (with only the necessary functions) to also compare to other well-known JS engines: V8 from Google and JSC (or nitro) from WebKit.

## Libraries setup

For all benchmarks EFL revision 58186 was used. Following the setup of each engine:

• Spidermonkey: I’ve used version 1.8.1-rc1 with the already available bindings on EFL repository, elixir;
• V8: version ﻿3.2.5.1, using a simple binding I created for EFL. I named this binding ev8;
• JSC: ﻿WebKit’s sources are needed to compile JSC. I’ve used revision 83063. Compiling with CMake, I chose the EFL port and enabled the option SHARED_CORE in order to have a separated library for Javascript;

## Benchmarks

Startup time: This benchmark measures the startup time by executing a simple application that imports evas, ecore, ecore-evas and edje, bring in some symbols and then iterates the main loop once before exiting. I measured the startup time for both hot and cold cache cases. In the former the application is executed several times in sequence and the latter includes a call to drop all caches so we have to load the library again from disk

Runtime – Stress: This benchmark executes as many frames per second as possible of a render-intensive operation. The application is not so heavy, but it does some loops, math and interacts with EFL. Usually a common application would do far less operations every frame because many operations are done in EFL itself, in C, such as list scrolling that is done entirely in elm_genlist. This benchmark is made of 4 phases:

• ﻿Phase 0 (P0): Un-scaled blend of the same image 16 times;
• Phase 1 (P1): Same as P0, with additional 50% alpha;
• Phase 2 (P2): Same as P0, with additional red coloring;
• Phase 3 (P3): Same as P0, with additional 50% alpha and red coloring;

The C and Elixir’s versions are available at EFL repository.

Runtime – animation: usually an application doesn’t need “as many FPS as possible”, but instead it would like to limit to a certain amount of frames per second. E.g.: iphone’s browser tries to keep a constant of 60 FPS. This is the value I used on this benchmark. The same application as the previous benchmark is executed, but it tries to keep always the same frame-rate.

## Results

The first computer I used to test these benchmarks on was my laptop. It’s a Dell Vostro 1320, Intel Core 2 Duo with 4 GB of RAM and a standard 5400 RPM disk. The results are below.

Benchmarks on Dell 1320 laptop

First thing to notice is there are no results for “Runtime – animation” benchmark. This is because all the engines kept a constant of 60fps and hence there were no interesting results to show. The first benchmark shows that V8′s startup time is the shortest one when considering we have to load the application and libraries from disk. JSC was the slowest and  Spidermonkey was in between.

With hot caches, however, we have another complete different scenario, with JSC being almost as fast as the native C application. Following, V8 with a delay a bit larger and Spidermonkey as the slowest one.

The runtime-stress benchmark shows that all the engines are performing well when there’s some considerable load in the application, i.e. removing P0 from from this scenario. JSC was always at the same speed of native code; Spidermonkey and V8 had an impact only when considering P0 alone.

Next computer to consider in order to execute these benchmarks was  a Pandaboard, so we can see how well the engines are performing in an embedded platform. Pandaboard has an ARM Cortex-A9 processor with 1GB of RAM and the partition containing the benchmarks is in an external flash storage drive. Following the results for each benchmark:

Benchmarks on Pandaboard

Once again, runtime-animation is not shown since it had the same results for all engines. For the startup tests, now Spidermonkey was much faster than the others, followed by V8 and JSC in both hot and cold caches. In runtime-stress benchmark, all the engines performed well, as in the first computer, but now JSC was the clear winner.

There are several points to be considered when choosing an engine to be use as a binding for a library such as EFL. The raw performance and startup time seems to be very near to the ones achieved with native code. Recently there were some discussions in EFL mailing list regarding which engine to choose, so I think it would be good to share these numbers above. It’s also important to notice that these bindings have a similar approach of elixir, mapping each function call in Javascript to the correspondent native function. I made this to be fair in the comparison among them, but depending on the use-case it’d  be good to have a JS binding similar to what python’s did, embedding the function call in real python objects.

## April 29, 2011

### Collection of WebKit ports

#### Holger Freyther

WebKit is a very successfull project. It is that in many ways. The code produced seems to very fast, the code is nice to work on, the people are great, the partys involved collaborate with each other in the interest of the project. The project is also very successfull in the mobile/smartphone space. All the major smartphone platforms but Windows7 are using WebKit. This all looks great, a big success but there is one thing that stands out.

From all the smartphone platforms no one has fully upstreamed their port. There might be many reasons for that and I think the most commonly heard reason is the time needed to get it upstreamed. It is specially difficult in a field that is moving as fast as the mobile industry. And then again there is absolutely no legal obligation to work upstream.

For most of today I collected the ports I am aware of, put them into one git repository, maybe find the point where they were branched, rebase their changes. The goal is to make it more easy to find interesting things and move them back to upstream. One can find the combined git tree with the tags here. I started with WebOS, moved to iOS, then to Bada and stopped at Android as I would have to pick the sourcecode for each android release for each phone from each vendor. I think I will just be happy with the Android git tree for now. At this point I would like to share some of my observations in the order I did the import.

## Palm

Palm's release process is manual. In the last two releases they call the file .tgz but forgot to gzip it, in 2.0.0 the tarball name was in camel case. The thing that is very nice about Palm is that they provide their base and their changes (patch) separately. From looking at the 2.1.0 release it looks that for the Desktop version they want to implement Complex Font rendering. Earlier versions (maybe it is still the case) lack the support for animated GIF.

## iOS

Apple's release process seems to be very structured. The source can be downloaded here. What I think is to note is that the release tarball contains some implementations of WebCore only as .o file and Apple has stopped releasing the WebKit sourcecode beginning with iOS 4.3.0.

This port is probably not known by many. The release process seems to be manual as well, the name of directories changed a lot between the releases, they come with a WML Script engine and they do ship something they should not ship.

I really hope that this combined tree is useful for porters that want to see the tricks used in the various ports and don't want to spend the time looking for each port separately.

## February 13, 2011

### How to make the GNU Smalltalk Interpreter slower

#### Holger Freyther

This is another post about a modern Linux based performance measurement utility. It is called perf, it is included in the Linux kernel sources and it entered the kernel in v2.6.31-rc1. In many ways it is obsoleting OProfile, in fact for many architectures oprofile is just a wrapper around the perf support in the kernel. perf comes with a few nice application. perf top provides a statistics about which symbols in user and in kernel space are called, perf record to record an application or to start an application to record it and then perf report to browse this report with a very simple CLI utility. There are tools to bundle the record and the application in an archive, a diff utility.

For the last year I was playing a lot with GNU Smalltalk and someone posted the results of a very simplistic VM benchmark ran across many different Smalltalk implementations. In one of the benchmarks GNU Smalltalk is scoring last among the interpreters and I wanted to understand why it is slower. In many cases the JavaScriptCore interpreter is a lot like the GNU Smalltalk one, a simple direct-threaded bytecode interpreter, uses computed goto (even is compiled with -fno-gcse as indicated by the online help, not that it changed something for JSC), heavily inlined many functions.

There are also some differences, the GNU Smalltalk implementation is a lot older and in C. The first notable is that it is a Stack Machine and not register based, there are global pointers for the SP and the IP. Some magic to make sure that in the hot loop the IP/SP is 'local' in a register, depending on the available registers also keep the current argument in one, the interpreter definition is in a special file format but mostly similar to how Interepreter::privateExecute is looking like. The global state mostly comes from the fact that it needs to support switching processes and there might be some event during the run that requires access to the IP to store it to resume the old process. But in general the implementation is already optimized and there is little low hanging fruits and most experiments result in a slow down.

The two important things are again: Having a stable benchmark, having a tool to help to know where to look for things. In my case the important tools are perf stat, perf record, perf report and perf annotate. I have put a copy of the output to the end of this blog post. The stat utility provides one with number of instructions executed, branches, branch misses (e.g. badly predicted), L1/L2 cache hits and cache misses.

The stable benchmark helps me to judge if a change is good, bad or neutral for performance within the margin of error of the test. E.g. if I attempt to reduce the code size the instructions executed should decrease, if I start putting __builtin_expect.. into my code the number of branch misses should go down as well. The other useful utility is to the perf report that allows one to browse the recorded data, this can help to identify the methods one wants to start to optimize, it allows to annotate these functions inside the simple TUI interface, but does not support searching in it.

Because the codebase is already highly optimized any of my attempts should either decrease the code size (and the pressure on the i-cache), the data size (d-cache), remove stores or loads from memory (e.g. reorder instructions), fix branch predictions. The sad truth is that most of my changes were either slow downs or neutral to the performance and it is really important to undo these changes and not have false pride (unless it was also a code cleanup or such).

So after about 14 hours of toying with it the speed ups I have managed to make come from inlining a method to unwind a context (callframe), reordering some compares on the GC path and disabling the __builtin_expect branch hints as they were mostly wrong (something the kernel people found to be true in 2010 as well). I will just try harder, or try to work on the optimizer or attempt something more radical...

This probe will be executed whenever the sqlite3_get_table function of the mentioned library will be called. The $zSql is a variable passed to the sqlite3_get_table function and contains the query to be executed. I am converting the pointer to a local variable and then can print it. Using this simple probe helped me to see which queries were executed by the database library and helped me to do an easy optimisation. In general it could be very useful to build a set of probes (I think one calls set a tapset) that check for API misusage, e.g. calling functions with certain parameters where something else might be better. E.g. in Glib use truncate instead of assigning "" to the GString, or check for calls to QString::fromUtf16 coming from Qt code itself. On second thought this might be better as a GCC plugin, or both. ## December 17, 2010 ### In the name of performance #### Holger Freyther I tend to see people doing weird things and then claim that the change is improving performance. This can be re-ordering instructions to help the compiler, attempting to use multiple cores of your system, writing a memfill in assembly. On the one hand people can be right and the change is making things faster, on the other hand they could use assembly to make things look very complicated, justify their pay, and you might feel awkward to question if it is making any sense. In the last couple of weeks I have stumbled on some of those things. For some reason I found this bug report about GLIBC changing the memcpy routine for SSE and breaking the flash plugin (because it uses memcpy in the wrong way). The breakage is justified that the new memcpy was optimized and is faster. As Linus points out with his benchmark the performance improvement is mostly just wishful thinking. Another case was someone providing MIPS optimized pixman code to speed-up all drawing which turned out to be wishful thinking as well... The conclusion is. If someone claims that things are faster with his patch. Do not simply trust him, make sure he refers to his benchmark, is providing numbers of before and after and maybe even try to run it yourself. If he can not provide this, you should wonder how he measured the speed-up! There should be no place for wishful thinking in benchmarking. This is one of the areas where Apple's WebKit team is constantly impressing me. ## October 23, 2010 ### Easily embedding WebKit into your EFL application #### Lucas De Marchi This is the first of a series of posts that I’m planning to do using basic examples in EFL, the Enlightenment Foundation Libraries. You may have heard that EFL is reaching its 1.0 release. Instead of starting from the very beginning with the basic functions of these libraries, I decided to go the opposite way, showing the fun stuff that is possible to do. Since I’m also an WebKit developer, let’s put the best of both softwares together and have a basic window rendering a webpage. Before starting off, just some remarks: 1. I’m using here the basic EFL + WebKit-EFL (sometimes called ewebkit). Developing an EFL application can be much simpler, particularly if you use an additional library with pre-made widgets like Elementary. However, it’s good to know how the underlying stuff works, so I’m providing this example. 2. This could have been the last post in a series when talking about EFL since it uses at least 3 libraries. Don’t be afraid if you don’t understand what a certain function is for or if you can’t get all EFL and WebKit running right now. Use the comment section below and I’ll make my best to help you. ### Getting EFL and WebKit In order to able to compile the example here, you will need to compile two libraries from source: EFL and WebKit. For both libraries, you can either get the last version from svn or use the last snapshots provided. • EFL: Grab a snapshot from the download page. How to checkout the latest version from svn is detailed here, as well as some instructions on how to compile • WebKit-EFL: A very detailed explanation on how to get WebKit-EFL up and running is available on trac. Recently, though, WebKit-EFL started to be released too. It’s not detailed in the wiki yet, but you can grab a snapshot instead of checking out from svn. ### hellobrowser! In the spirit of “hello world” examples, our goal here is to make a window showing a webpage rendered by WebKit. For the sake of simplicity, we will use a default start page and put a WebKit-EFL “widget” to cover the entire window. See below a screenshot: hellobrowser - WebKit + EFL The code for this example is available here. Pay attention to a comment in the beginning of this file that explains how to compile it: gcc -o hellobrowser hellobrowser.c \ -DEWK_DATADIR="\"$(pkg-config --variable=datadir ewebkit)\"" \ \$(pkg-config --cflags --libs ecore ecore-evas evas ewebkit)

The things worth noting here are the dependencies and a variable. We directly depend on ecore and evas from EFL and on WebKit. We define a variable, EWK_DATADIR, using pkg-config so our browser can use the default theme for web widgets defined in WebKit. Ecore handles events like mouse and keyboard inputs, timers etc whilst evas is the library responsible for drawing. In a later post I’ll detail them a bit more. For now, you can read more about them on their official site.

The main function is really simple. Let’s divide it by pieces:

 // Init all EFL stuff we use evas_init(); ecore_init(); ecore_evas_init(); ewk_init();

Before you use a library from EFL, remember to initialize it. All of them use their own namespace, so it’s easy to know which library you have to initialize: for example, if you call a function starting by “ecore_”, you know you first have to call “ecore_init()”. The last initialization function is WebKit’s, which uses the “ewk_” namespace.

 window = ecore_evas_new(NULL, 0, 0, 800, 600, NULL); if (!window) { fprintf(stderr, "something went wrong... :(\n"); return 1; }

Ecore-evas then is used to create a new window with size 800×600. The other options are not relevant for an introduction to the libraries and you can find its complete documentation here.

 // Get the canvas off just-created window evas = ecore_evas_get(window);

From the Ecore_Evas object we just created, we grab a pointer to the evas, which is the space in which we can draw, adding Evas_Objects. Basically an Evas_Object is an object that you draw somewhere, i.e. in the evas. We want to add only one object to our window, that is where WebKit you render the webpages. Then, we have to ask WebKit to create this object:

 // Add a View object into this canvas. A View object is where WebKit will // render stuff. browser = ewk_view_single_add(evas);

Below I demonstrate a few Evas’ functions that you use to manipulate any Evas_Object. Here we are manipulating the just create WebKit object, moving to the desired position, resizing to 780x580px and then telling Evas to show this object. Finally, we tell Evas to show the window we created too. This way we have a window with an WebKit object inside with a little border.

 // Make a 10px border, resize and show evas_object_move(browser, 10, 10); evas_object_resize(browser, 780, 580); evas_object_show(browser); ecore_evas_show(window);

We need to setup a bit more things before having a working application. The first one is to give focus to the Evas_Object we are interested on in order to receive keyboard events when opened. Then we connect a function that will be called when the window is closed, so we can properly exit our application.

 // Focus it so it will receive pressed keys evas_object_focus_set(browser, 1);   // Add a callback so clicks on "X" on top of window will call // main_signal_exit() function ecore_event_handler_add(ECORE_EVENT_SIGNAL_EXIT, main_signal_exit, window);

After this, we are ready to show our application, so we start the mainloop. This function will only return when the application is closed:

 ecore_main_loop_begin();

The function called when the application is close, just tell Ecore to exit the mainloop, so the function above returns and the application can shutdown. See its implementation below:

static Eina_Bool main_signal_exit(void *data, int ev_type, void *ev) { ecore_evas_free(data); ecore_main_loop_quit(); return EINA_TRUE; }

Before the application exits, we shutdown all the libraries that were initialized, in the opposite order:

 // Destroy all the stuff we have used ewk_shutdown(); ecore_evas_shutdown(); ecore_shutdown(); evas_shutdown();

This is a basic working browser, with which you can navigate through pages, but you don’t have an entry to set the current URL, nor “go back” and “go forward” buttons etc. All you have to do is start adding more Evas_Objects to your Evas and connect them to the object we just created. For a still basic example, but with more stuff implemented, refer to the EWebLauncher that we ship with the WebKit source code. You can see it in the “WebKitTools/EWebLauncher/” folder or online at webkit’s trac. Eve is another browser with a lot more features that uses Elementary in addition to EFL, WebKit. See a blog post about it with some nice pictures.

Now, let’s do something funny with our browser. With a bit more lines of code you can turn your browser upside down. Not really useful, but it’s funny. All you have to do is to rotate the Evas_Object WebKit is rendering on. This is implemented by the following function:

// Rotate an evas object by 180 degrees static void _rotate_obj(Evas_Object *obj) { Evas_Map *map = evas_map_new(4);   evas_map_util_points_populate_from_object(map, obj); evas_map_util_rotate(map, 180.0, 400, 300); evas_map_alpha_set(map, 0); evas_map_smooth_set(map, 1); evas_object_map_set(obj, map); evas_object_map_enable_set(obj, 1);   evas_map_free(map); }

See this screenshot below and  get the complete source code.

EFL + WebKit doing Politreco upside down

## October 02, 2010

### Deploying WebKit, common issues

#### Holger Freyther

From my exposure to people deploying QtWebKit or WebKit/GTK+ there are some things that re-appear and I would like to discuss these here.

• Weird compile error in JavaScript?
• It is failing in JavaScriptCore as it is the first that is built. It is most likely that the person that provided you with the toolchain has placed a config.h into it. There are some resolutions to it. One would be to remove the config.h from the toolchain (many things will break), or use -isystem instead of -I for system includes.
The best way to find out if you suffer from this problem is to use -E instead of -c to only pre-process the code and see where the various includes are coming from. It is a strategy that is known to work very well.

• Most likely you do not have a DNS Server set, or no networking, or the system your board is connected to is not forwarding the data. Make sure you can ping a website that is supposed to work, e.g. ping www.yahoo.com, the next thing would be to use nc to execute a simple HTTP 1.1 get on the site and see if it is working. In most cases you simply lack networking connectivity.

• HTTPS does not work
• It might be either an issue with Qt or an issue with your system time. SSL Certificates at least have two dates (Expiration and Creation) and if your system time is after the Expiration or before the Creation you will have issues. The easiest thing is to add ntpd to your root filesystem to make sure to have the right time.

The possible issue with Qt is a bit more complex. You can build Qt without OpenSSL support, you can make it link to OpenSSL or you can make it to dlopen OpenSSL at runtime. If SSL does not work it is most likely that you have either build it without SSL support, or with runtime support but have failed to install the OpenSSL library.

Depending on your skills it might be best to go back to ./configure and make Qt link to OpenSSL to avoid the runtime issue. strings is a very good tool to find out if your libQtNetwork.so contains SSL support, together with using objdump -x and search for _NEEDED you will find out which config you have.

• Local pages are not loaded
• This is a pretty common issue for WebKit/GTK+. In WebKit/GTK+ we are using GIO for local files and to determine the filetype it is using the freedesktop.org shared-mime-info. Make sure you have that installed.

• The page only displays blank
• This is another issue that comes back from time to time. It only appears on WebKit/GTK+ with the DirectFB backend but sadly people never report back if and how they have solved it. You could make a difference and contribute back to the WebKit project.

In general most of these issues can be avoided by using a pre-packaged Embedded Linux Distribution like Ångström (or even Debian). The biggest benefit of that approach is that someone else made sure that when you install WebKit, all dependencies will be installed as well and it will just work for your ARM/MIPS/PPC system. It will save you a lot of time.

## August 28, 2010

### WebKit

#### Lucas De Marchi

After some time working with the EFL port of WebKit, I’ve been nominated as an official webkit developer. Now I have super powers in the official repository , but I swear I intend to use it with caution and responsibility. I’ll not forget Uncle Ben’s advice: ﻿﻿”with great power comes great responsibility”.

I’m preparing a post to talk about WebKit, EFL, eve (a new web browser based on WebKit + EFL) and how to easily embed a browser in your application. Stay tuned.

## August 10, 2010

### Coscup2010/GNOME.Asia with strong web focus

#### Holger Freyther

On the following weekend the Coscup 2010/GNOME.Asia is taking place in Taipei. The organizers have decided to have a strong focus on the Web as can be seen in the program.

On saturday there are is a keynote and various talks about HTML5, node.js. The Sunday will see three talks touching WebKit/GTK+. There is one about building a tablet OS with WebKit/GTK+, one by Xan Lopez on how to build hybrid applications (a topic I have devoted moiji-mobile.com to) and a talk by me using gdb to explain how WebKit/GTK+ is working and how the porting layer interacts with the rest of the code.

I hope the audience will enjoy the presentations and I am looking forward to attend the conference, there is also a strong presence of the ex-Openmoko Taiwan Engineering team. See you on Saturday/Sunday and drop me an email if you want to talk about WebKit or GSM...

## September 06, 2008

### Skia graphics library in Chrome: First impressions

#### Alp Toker

With the release of the WebKit-based Chrome browser, Google also introduced a handful of new backends for the browser engine including a new HTTP stack and the Skia graphics library. Google’s Android WebKit code drops have previously featured Skia for rendering, though this is the first time the sources have been made freely available. The code is apparently derived from Google’s 2005 acquisition of North Carolina-based software firm Skia and is now provided under the Open Source Apache License 2.0.

Weighing in at some 80,000 lines of code (to Cairo’s 90,000 as a ballpark reference) and written in C++, some of the differentiating features include:

• Optimised software-based rasteriser (module sgl/)
• Optional GL-based acceleration of certain graphics operations including shader support and textures (module gl/)
• Animation capabilities (module animator/)
• Some built-in SVG support (module (svg/)
• Built-in image decoders: PNG, JPEG, GIF, BMP, WBMP, ICO (modules images/)
• Text capabilities (no built-in support for complex scripts)
• Some awareness of higher-level UI toolkit constructs (platform windows, platform events): Mac, Unix (sic. X11, incomplete), Windows, wxwidgets
• Performace features
• Copy-on-write for images and certain other data types
• Extensive use of the stack, both internally and for API consumers to avoid needless allocations and memory fragmentation

The library is portable and has (optional) platform-specific backends:

• Fonts: Android / Ascender, FreeType, Windows (GDI)
• XML: expat, tinyxml
• Android shared memory (ashmem) for inter-process image data references

### Skia Hello World

In this simple example we draw a few rectangles to a memory-based image buffer. This also demonstrates how one might integrate with the platform graphics system to get something on screen, though in this case we’re using Cairo to save the resulting image to disk:

#include "SkBitmap.h" #include "SkDevice.h" #include "SkPaint.h" #include "SkRect.h" #include <cairo.h>   int main() { SkBitmap bitmap; bitmap.setConfig(SkBitmap::kARGB_8888_Config, 100, 100); bitmap.allocPixels(); SkDevice device(bitmap); SkCanvas canvas(&device); SkPaint paint; SkRect r;   paint.setARGB(255, 255, 255, 255); r.set(10, 10, 20, 20); canvas.drawRect(r, paint);   paint.setARGB(255, 255, 0, 0); r.offset(5, 5); canvas.drawRect(r, paint);   paint.setARGB(255, 0, 0, 255); r.offset(5, 5); canvas.drawRect(r, paint);   { SkAutoLockPixels image_lock(bitmap); cairo_surface_t* surface = cairo_image_surface_create_for_data( (unsigned char*)bitmap.getPixels(), CAIRO_FORMAT_ARGB32, bitmap.width(), bitmap.height(), bitmap.rowBytes()); cairo_surface_write_to_png(surface, "snapshot.png"); cairo_surface_destroy(surface); }   return 0; }

You can build this example for yourself linking statically to the libskia.a object file generated during the Chrome build process on Linux.

### Not just for Google Chrome

The Skia backend in WebKit, the first parts of which are already hitting SVN (r35852, r36074) isn’t limited to use in the Chrome/Windows configuration and some work has already been done to get it up and running on Linux/GTK+ as part of the ongoing porting effort.

The post Skia graphics library in Chrome: First impressions appeared first on Alp Toker.

## June 12, 2008

### WebKit Meta: A new standard for in-game web content

#### Alp Toker

Over the last few months, our browser team at Nuanti Ltd. has been developing Meta, a brand new WebKit port suited to embedding in OpenGL and 3D applications. The work is being driven by Linden Lab, who are eagerly investigating WebKit for use in Second Life.

While producing Meta we’ve paid great attention to resolving the technical and practical limitations encountered with other web content engines.

uBrowser running with the WebKit Meta engine

### High performance, low resource usage

Meta is built around WebKit, the same engine used in web browsers like Safari and Epiphany, and features some of the fastest content rendering around as well as nippy JavaScript execution with the state of the art SquirrelFish VM. The JavaScript SDK is available independently of the web renderer for sandboxed client-side game scripting and automation.

It’s also highly scalable. Some applications may need only a single browser context but virtual worlds often need to support hundreds of web views or more, each with active content. To optimize for this use case, we’ve cut down resource usage to an absolute minimum and tuned performance across the board.

### Stable, easy to use cross-platform SDK

Meta features a single, rock-solid API that works identically on all supported platforms including Windows, OS X and Linux. The SDK is tailored specifically to embedding and allows tight integration (shared main loop or operation in a separate rendering thread, for example) and hooks to permit seamless visual integration and extension. There is no global setup or initialization and the number of views can be adjusted dynamically to meet resource constraints.

### Minimal dependencies

Meta doesn’t need to use a conventional UI toolkit and doesn’t need any access to the underlying windowing system or the user’s filesystem to do its job, so we’ve done away with these concepts almost entirely. It adds only a few megabytes to the overall redistributable application’s installed footprint and won’t interfere with any pre-installed web browsers on the user’s machine.

### Nuanti will be offering commercial and community support and is anticipating involvement from the gaming industry and homebrew programmers.

In the mid term, we aim to submit components of Meta to the WebKit Open Source project, where our developers are already actively involved in maintaining various subsystems.

### Find out more

Today we’re launching meta.nuanti.com and two mailing lists to get developers talking. We’re looking to make this site a focal point for embedders, choc-full of technical details, code samples and other resources.

The post WebKit Meta: A new standard for in-game web content appeared first on Alp Toker.

## April 21, 2008

### Acid3 final touches

#### Alp Toker

Recently we’ve been working to finish off and land the last couple of fixes to get a perfect pixel-for-pixel match against the reference Acid3 rendering in WebKit/GTK+. I believe we’re the first project to achieve this on Linux — congratulations to everyone on the team!

Epiphany using WebKit r32284

We also recently announced our plans to align more closely with the GNOME desktop and mobile platform. To this end we’re making a few technology and organisational changes that I hope to discuss in an upcoming post.

The post Acid3 final touches appeared first on Alp Toker.

## April 06, 2008

### WebKit Summer of Code Projects

#### Alp Toker

With the revised deadline for Google Summer of Code ’08 student applications looming, we’ve been getting a lot of interest in browser-related student projects. I’ve put together a list of some of my favourite ideas.

If in doubt, now’s the time to submit proposals. Already-listed ideas are the most likely to get mentored but students are free to propose their own ideas as well. Proposals for incremental improvements will tend to be favoured over ideas for completely new applications, but a proof of concept and/or roadmap can help when submitting plans for larger projects.

Update: There’s no need to keep asking about the status of an application on IRC/private mail etc. It’s a busy time for the upstream developers but they’ll get back in touch as soon as possible.

The post WebKit Summer of Code Projects appeared first on Alp Toker.

## March 27, 2008

### WebKit gets 100% on Acid3

#### Alp Toker

Today we reached a milestone with WebKit/GTK+ as it became the first browser engine on Linux/X11 to get a full score on Acid3, shortly after the Acid3 pass by WebKit for Safari/Mac.

Epiphany using WebKit r31371

There is actually still a little work to be done before we can claim a flawless Acid3 pass. Two of the most visible remaining issues in the GTK+ port are :visited (causing the “LINKTEST FAILED” notice in the screenshot) and the lack of CSS text shadow support in the Cairo/text backend which is needed to match the reference rendering.

It’s amazing to see how far we’ve come in the last few months, and great to see the WebKit GTK+ team now playing an active role in the direction of WebCore as WebKit continues to build momentum amongst developers.

Update: We now also match the reference rendering.

The post WebKit gets 100% on Acid3 appeared first on Alp Toker.

## March 15, 2008

### Bossa Conf ’08

#### Alp Toker

Am here in the LHR lounge. In a couple of hours, we take off for the INdT Bossa Conference, Pernambuco, Brazil via Lisbon. Bumped in to Pippin who will be presenting Clutter. Also looking forward to Lennart‘s PulseAudio talk amongst others.

If you happen to be going, drop by on my WebKit Mobile presentation, 14:00 Room 01 this Monday. We have a small surprise waiting for Maemo developers.

The post Bossa Conf ’08 appeared first on Alp Toker.