January 06, 2021

Release Notes for Safari Technology Preview 118

Surfin’ Safari

Safari Technology Preview Release 118 is now available for download for macOS Big Sur and macOS Catalina. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 270230-270749.

Web Inspector

  • Elements
    • Added an experimental Font details sidebar panel for showing information about the currently used font of the selected node (r270637)
  • Sources
    • Added support for intercepting and overriding network requests (r270604)


CSS

  • Implemented Definite and Indefinite Sizes in flexbox (r270578)
  • Corrected cases in which box-sizing was border-box and didn’t use the content box to compute size based on aspect ratio (r270617)
  • Fixed preserving aspect ratio when computing cross size of flexed images in auto-height flex container (r270288)
  • Added support for aspect-ratio on replaced and non-replaced elements (r270551, r270618)
  • Changed text-decoration-color animation not to be discrete (r270597)
  • Changed getComputedStyle rounding lineHeight to nearest pixel (r270248)
  • Changed to trigger web font loads earlier (r270590)


Scrolling

  • Made only the first wheel event in a gesture cancelable (r270425)


JavaScript

  • Enabled “at” methods (r270550)
  • Changed get and set for object literal and class to not be escaped (r270487)
  • Accepted escaped keywords for class and object property names (r270481)
  • Aligned %TypedArray% constructor/slice behavior with the specification strictly (r270552, r270371)
  • Added a JSC API to allow acquiring the JSLock to accelerate performance (r270659)
  • Removed unnecessary JSLock use from various JSC APIs (r270665)
  • Aligned [[DefineOwnProperty]] method of mapped arguments object with the specification strictly (r270664)
  • Changed Reflect.preventExtensions to not throw when called on WindowProxy or Location (r270702)
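
The “at” methods in the list above are presumably the relative-indexing at() methods on arrays, strings and typed arrays; the note itself does not spell this out, so treat that reading as an assumption. A minimal sketch of how they behave, with made-up sample values:

```javascript
// Relative indexing with at(): negative indices count back from the end.
const revisions = [270230, 270550, 270749];

const lastRevision = revisions.at(-1);            // same as revisions[revisions.length - 1]
const firstChar = "WebKit".at(0);                 // strings support at() too
const lastByte = new Uint8Array([1, 2, 3]).at(-1); // and so do typed arrays
const outOfRange = revisions.at(10);              // undefined, like plain indexing
```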


WebGL

  • Fixed rasterizer discard interfering with implicit clears in WebGL 2 (r270253)


Media

  • Implemented WebVTT VTTCue region attribute (r270738)

Private Click Measurement

  • Exposed an API for enabling or disabling Private Click Measurement (r270710)


WebRTC

  • Added support for RTCRtpSender::setStreams (r270486)
  • Enabled use of new socket API for WebRTC TLS connections by default (r270680)
  • Fixed ICE not resolving for turns relay candidates rooted in LetsEncrypt CA (r270626)
  • Improved RTCRtpSender and RTCRtpReceiver transforms support (r270641, r270290, r270294, r270507, r270532)
  • Introduced an experimental flag specific to VP9 profile 2 (r270256)


Web API

  • Changed to allow blob URLs with fragments (r270269)
  • Fixed lazy loaded iframe to not lazy load when scripting is disabled (r270300)
  • Changed Reflect.preventExtensions to not throw if called on WindowProxy or Location (r270702)
  • Changed sessionStorage to not be cloned when a window is opened with rel=noopener (r270273)
  • Updated the list of blocked ports according to the Fetch specification (r270321)


Accessibility

  • Fixed VoiceOver not announcing the aria-checked state for ARIA treeitem (r270333)


Web Extensions

  • Fixed the onClicked listener not being called for page actions

January 06, 2021 09:10 PM

December 22, 2020

Manuel Rego: 2020 Recap

Igalia WebKit

2020 is not a great year to do any kind of recap, but there have been some positive things happening in Igalia during this year. Below are some highlights, in no particular order.

CSS Working Group A Coruña F2F

The year couldn’t have started better: in January Igalia hosted a CSS Working Group face-to-face meeting in our office in A Coruña (Galicia, Spain). Igalia has experience arranging other events in our office, but this was the first time that the CSSWG came here. It was an amazing week and I believe everyone enjoyed the visit to this corner of the world. 🌍

Brian Kardell from Igalia was talking to everybody about Container Queries. This is one of the features that web authors have been asking for forever, and Brian was trying to push the topic forward and find some kind of solution (even if not 100% feature complete). During that week there were discussions about the relationship with other topics like Resize Observer or CSS Containment, and new ideas appeared too. Brian published a blog post after the event, explaining some of those ideas. Later my colleague Javi Fernández worked on an experiment that Brian mentioned in a recent post. The good news is that all these conversations managed to bring this topic back to life, and this past November Google announced that they have started working on a Container Queries prototype in Chromium.

During the meeting Jen Simmons (at Mozilla at that time, now at Apple) presented some topics from Mozilla, including a detailed proposal for Masonry Layout based on Grid. This is something authors have also shown interest in, and Firefox already has a prototype implementation behind a runtime flag.

Apart from the three days full of meetings and interesting discussions, some of the CSSWG members participated in a local meetup giving 4 nice talks:

Finally, I remember some corridor conversations about the Mozilla layoffs that had happened just a few days before the event, but nobody could have expected what was going to happen during the summer. It looks like 2020 has been a bad year for Mozilla in general and Servo in particular. 😢

Open Prioritization

This summer Igalia launched the Open Prioritization campaign, where we proposed a list of topics to be implemented in the different browser engines, and people supported them with different pledges; I wrote a blog post about it at the time.

Open Prioritization: :focus-visible in Safari/WebKit: $30.8K pledged out of $35K.

This was a cool experiment, and it looks like a successful one, as :focus-visible in WebKit/Safari has been the winner. Igalia is currently collecting funds through Open Collective in order to start the implementation of :focus-visible in WebKit; you still have time to support it if you’re interested. If everything goes fine this should happen during the first quarter of 2021. 🚀

Igalia Chats

This actually started in late 2019, but it has been ongoing during the whole of 2020. Brian Kardell has been recording a podcast series about the web platform and some of its features with different people from the industry. They have been getting more and more popular, and Brian was even asked to record one of these for the last BlinkOn edition.

So far 8 episodes of around 1 hour each have been published, with 13 different guests. More to come in 2021! If you are curious and want to know more, you can find them on the Igalia website or on your favourite podcasting platform.

Igalia contributions

This is not a comprehensive list but just some highlights of what Igalia has been doing in 2020 around CSS:

We’re working on a demo about these features, that we’ll be publishing next year.

In February Chromium published the requirements to become an API owner. Due to my involvement in the Blink project since the fork from WebKit back in 2013, I was nominated and became a Blink API Owner this past March. 🥳

Yoav Weiss on the BlinkOn 13 Keynote announcing me as API owner

The API owners meet on a weekly basis to review the intent threads and discuss them; it’s an amazing learning experience to be part of this group. In my case, when reviewing intents I usually pay attention to things related to interoperability, like the status of the spec, test suites and other implementations. In addition, I have the support of all my awesome colleagues at Igalia, who help me to play this role. Thank you all!

2021 and beyond…

Igalia keeps growing and a bunch of amazing folks will join us soon; in particular, Delan Azabani and Felipe Erias are starting these days as part of the Web Platform team.

Open Prioritization should have its first successful project, as funding for :focus-visible is advancing and it should get implemented in WebKit. We hope this can lead to new similar experiments in the future.

And I’m sure many other cool things will happen at Igalia next year, stay tuned!

December 22, 2020 11:00 PM

December 14, 2020

CSS Individual Transform Properties

Surfin’ Safari

CSS Transforms appeared on the Web along with CSS Animations and CSS Transitions to add visual effects and motion on the Web. Those technologies have been a staple of the Web platform and Web developers’ toolkit for well over a decade. In fact, the CSS transform property first shipped in Safari all the way back in July 2008 when iPhone OS 2.0 shipped. You can find some historical posts about initial support in WebKit from October 2007, and another post from July 2009 focusing on 3D transforms when CSS Transforms shipped in Mac OS X Leopard.

And now, there is some news in the world of CSS Transforms: individual transform properties are enabled by default in Safari Technology Preview 117. This means that, as in Firefox and Chrome Canary, you can now use the new translate, rotate and scale CSS properties to specify what have so far been functions of the transform property, including 3D operations.

Using these properties is simple and should make Web developers feel right at home. Consider these two equivalent examples:

div.transform-property {
    transform: translate(100px, 100px) rotate(180deg) scale(2);
}

div.individual-properties {
    translate: 100px 100px;
    rotate: 180deg;
    scale: 2;
}

But why would you use these new properties over the transform property? One reason is convenience, as you might deem it simpler to write scale: 2 rather than transform: scale(2) when all you intend to do is scale an element.

But I think the main draw here is that you are now free to compose those various transform properties any way you see fit. For instance, you can easily write a CSS class to flip an element using the scale property without worrying that you might override other transform-related properties:

.flipped {
    scale: -1;
}

Your flipped class will work just fine even if a rotate or transform property applies a rotation to the element.

This feature also comes in handy when animating transforms. Let’s say you’re writing an animation that scales an element up over its entire duration but also applies a rotation for the second half of that animation. With the transform property, you would have had to pre-compute what the intermediate values for the scale should have been when the rotation would start and end:

@keyframes scale-and-rotate {
    0%   { transform: scale(1) }
    50%  { transform: scale(1.5) rotate(0deg) }
    100% { transform: scale(2) rotate(180deg) }
}

While this may not look like such a big deal when you look at it, making any further changes to those keyframes would require recomputing those values. Now, consider this same animation written with the individual transform properties:

@keyframes scale-and-rotate {
    0%   { scale: 1 }
    50%  { rotate: 0deg }
    100% { scale: 2; rotate: 180deg; }
}

You can easily change the keyframes and add other properties as you like, leaving the browser to work out how to correctly apply those individual transform properties.
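
To make the pre-computation concrete, here is a minimal sketch of where the scale(1.5) in the transform-based keyframes comes from. It assumes a linear timing function, and the interpolate helper is a name invented for this illustration rather than anything the browser exposes:

```javascript
// Linear interpolation between two keyframe values at a given progress (0..1).
function interpolate(from, to, progress) {
    return from + (to - from) * progress;
}

// The scale runs from 1 to 2 over the whole animation and the rotation starts
// at 50%, so the transform-based keyframes must spell out the scale at that point.
const scaleAtHalfway = interpolate(1, 2, 0.5);  // 1.5

// Moving the rotation start to 25% would mean recomputing that value by hand.
const scaleAtQuarter = interpolate(1, 2, 0.25); // 1.25
```

With the individual properties, the browser does this interpolation for each property on its own, so no such value needs to appear in the keyframes.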

But that’s not all; there is also the case where you want separate animations to apply to an element at the same time. You could split out this single set of keyframes into two different sets and tweak the timing instead:

.animated {
    /* Apply the scale keyframes for 1s and the rotate
       keyframes for 500ms with a 500ms delay. */
    animation: scale 1s, rotate 500ms 500ms;
}

@keyframes scale {
    from { scale: 1 }
    to   { scale: 2 }
}

@keyframes rotate {
    from { rotate: 0deg }
    to   { rotate: 180deg }
}

Now keyframes applying to transforms are not only easier to author, but you can better separate the timing and the keyframes by composing multiple transform animations. And if you are a seasoned CSS Animations developer, you’ll know how important this can be when you factor in timing functions.

Additionally, animating the new individual transform properties retains the same great performance as animating the transform property since these properties support hardware acceleration.

But what about the transform property? How does it relate to those new individual transform properties?

First, remember that the transform property supports transform functions that are not represented as individual transform properties. There are no equivalent CSS properties for the skew(), skewX() and skewY() functions and no property equivalent to the matrix() function.

But what happens when you specify some of the individual transform properties as well as the transform property? The CSS Transforms Level 2 specification explains how individual transform properties and the transform-origin and transform properties are composed to form the current transformation matrix. To summarize, first the individual transform properties are applied – translate, rotate, and then scale – and then the functions in the transform property are applied.

This means that there’s a clear model to use those individual transform properties and the transform property together to enhance your ability to transform content on the Web platform.
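
The composition order can be sketched with plain 2D affine matrices. This is only an illustration under simplifying assumptions: it ignores transform-origin and 3D operations, and every helper name below is made up for the example.

```javascript
// 2D affine matrices in CSS matrix(a, b, c, d, e, f) order.
const IDENTITY = [1, 0, 0, 1, 0, 0];

// Matrix product m × n: n is applied to a point first, then m.
function multiply(m, n) {
    const [a1, b1, c1, d1, e1, f1] = m;
    const [a2, b2, c2, d2, e2, f2] = n;
    return [
        a1 * a2 + c1 * b2, b1 * a2 + d1 * b2,
        a1 * c2 + c1 * d2, b1 * c2 + d1 * d2,
        a1 * e2 + c1 * f2 + e1, b1 * e2 + d1 * f2 + f1,
    ];
}

const translate = (tx, ty) => [1, 0, 0, 1, tx, ty];
const rotate = (degrees) => {
    const r = (degrees * Math.PI) / 180;
    return [Math.cos(r), Math.sin(r), -Math.sin(r), Math.cos(r), 0, 0];
};
const scale = (s) => [s, 0, 0, s, 0, 0];

// Order from the summary above: translate, rotate, scale, then each function
// of the transform property, multiplied left to right.
function composeTransforms({ translate: t, rotate: r, scale: s, transform = [] }) {
    return [translate(...t), rotate(r), scale(s), ...transform]
        .reduce((matrix, next) => multiply(matrix, next), IDENTITY);
}

function applyToPoint([a, b, c, d, e, f], [x, y]) {
    return [a * x + c * y + e, b * x + d * y + f];
}

// translate: 100px 100px; rotate: 180deg; scale: 2; with no transform property.
const composed = composeTransforms({ translate: [100, 100], rotate: 180, scale: 2 });
const corner = applyToPoint(composed, [1, 0]); // approximately [98, 100]
```

The point (1, 0) is scaled to (2, 0), rotated to (-2, 0) and then translated, which matches what transform: translate(100px, 100px) rotate(180deg) scale(2) would produce.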

And before you start using these new properties, it is important that you know how to detect their availability and use transform as a fallback. Here, the @supports rule will allow you to do what you need:

@supports (translate: 0) {
    /* Individual transform properties are supported */
    div {
        translate: 100px 100px;
    }
}

@supports not (translate: 0) {
    /* Individual transform properties are NOT supported */
    div {
        transform: translate(100px, 100px);
    }
}
We encourage you to start exploring how to use those three new properties in Safari Technology Preview in your projects and file bug reports on bugs.webkit.org should you encounter unexpected issues. You can also send a tweet to @webkit or @jonathandavis to share your thoughts on individual transform properties.

December 14, 2020 06:00 PM

December 10, 2020

Release Notes for Safari Technology Preview 117

Surfin’ Safari

Safari Technology Preview Release 117 is now available for download for macOS Big Sur and macOS Catalina. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 268651-270230.

Web Inspector

  • Elements
    • Added the option to “Edit Breakpoint…” or “Reveal Breakpoint” in Sources Tab (r269337)
    • Fixed an extra closing parenthesis being added after var in the Styles panel (r269201)
  • Network
    • Fixed background color of rows from previous session (r269127)
    • Truncated data URLs in the Resources sidebar and Headers panel (r269075)
  • Search
    • Prevented stealing focus from the search field when shown (r269074)
  • Sources
    • Changed the default breakpoint action to be evaluate (r269547)
  • Console
    • Exposed console command line API to breakpoint conditions and actions (r269023, r269044)
    • Fixed using Show JavaScript Console in an empty tab in Safari Technology Preview (r270060)
  • Other Changes
    • Updated styles to use CSS properties with neutral directionality (r269166)


CSS

  • Added support for discrete animations of many CSS properties (r269812, r269333, r269357, r268792, r268718, r268726)
  • Added support for animations on more pseudo-elements (such as ::marker) (r269813)
  • Added support for more properties on ::marker (r269774)
  • Added parse support for aspect-ratio CSS property (r269641)
  • Made CSS font shorthands parsable within a worker (r269957)
  • Changed images as flex items to use the overridingLogicalHeight when defined to compute the logical width (r270073)
  • Changed images as flex items to use the overridingLogicalWidth when defined to compute the logical height (r270116)
  • Changed background-size to not accept negative values (r269237)
  • Fixed issues with percentage height on grid item replaced children when the grid item has a scrollbar (r269717)
  • Serialized aspect ratio with spaces around the slash (r268659)


JavaScript

  • Enabled static public class fields (r269922, r269939)
  • Enabled static and instance private class fields (r270066)
  • Implemented Intl.DateTimeFormat.formatRangeToParts (r269706)
  • Implemented Intl.ListFormat (r268956)
  • Aligned %TypedArray% behavior with recent spec adjustments (r269670)
  • Implemented @@species support in ArrayBuffer#slice (r269574)
  • Fixed toLocaleDateString() resolving incorrect date for some old dates (r269502)
  • Resurrected SharedArrayBuffer and Atomics behind a flag (JSC_useSharedArrayBuffer=1) (r269531)
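
As a rough illustration of the class fields and Intl.ListFormat entries above, here is a small sketch; the class and its values are invented for the example and are not taken from the release.

```javascript
class RevisionRange {
    static engine = "WebKit";   // static public field
    static #instances = 0;      // static private field
    #first;                     // instance private fields
    #last;

    constructor(first, last) {
        this.#first = first;
        this.#last = last;
        RevisionRange.#instances++;
    }

    get size() { return this.#last - this.#first; }
    static get count() { return RevisionRange.#instances; }
}

const range = new RevisionRange(268651, 270230);

// Intl.ListFormat joins items according to the locale's list conventions.
const features = new Intl.ListFormat("en", { style: "long", type: "conjunction" })
    .format(["class fields", "Intl.ListFormat", "formatRangeToParts"]);
// features is "class fields, Intl.ListFormat, and formatRangeToParts"
```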


WebAssembly

  • Added wasm atomics instructions, partially behind a flag (JSC_useSharedArrayBuffer=1) (r270208)
  • Fixed opcodes for table.grow and table.size (r269790)
  • Implemented shared WebAssembly.Memory behind a flag (JSC_useSharedArrayBuffer=1) (r269940)
  • Implemented i32 sign-extension-ops (r269929)


Web API

  • Added proper garbage collection to ResizeObserver (r268860)
  • Changed Worklet.addModule() to reject promise with an AbortError when the network load fails (r270033)
  • Changed event targets to be cleared after dispatch if the target pointed to a shadow tree (r269546)
  • Changed WebSocket constructor to not throw when the port is blocked (r269459)
  • Fixed toggling dark mode to update the scrollbar appearance in overflow: scroll elements (r269437)
  • Fixed navigator.clipboard to be exposed on *.localhost pages (r269960)
  • Fixed auto-focus of text input to not select text (r269587)
  • Fixed Canvas drawImage to not raise an IndexSizeError on empty sources (r270126)
  • Fixed getIndexedParameter indexing crash (r270160)
  • Fixed text getting clobbered when assigning to input.defaultValue (r269528)
  • Fixed <input disabled> to fire click events after dispatchEvent (r269452)
  • Fixed the space between minute and meridiem fields in time inputs being too large (r270148)
  • Fixed window.event to not be affected by nodes moving post-dispatch (r269500)
  • Improved exception messages when AudioContext.suspend() / resume() promises are rejected (r268999)
  • Fixed promises returned by DOM APIs having the caller’s global instead of the callee’s (r269227)
  • Removed unneeded whitespace between content and <br> (r268958, r269036)

Speech Recognition

  • Added audio capture for SpeechRecognition (r270158)
  • Added a default action for SpeechRecognition permission request (r269918)
  • Implemented basic permission check for SpeechRecognition (r269810)


WebRTC

  • Added WebRTC SFrame transform (r269830)
  • Added infrastructure for WebRTC transforms (r269764)
  • Added support for RTCPeerConnection.onicecandidateerror event (r270101)
  • Added support for RTCRtpScriptTransform (r270107)
  • Added support for VP9 Profile 2 (10-bit color) in WebRTC (r268971)
  • Increased camera failing timer to 30 seconds (r269190)


Media

  • Fixed a video element failing to enter picture-in-picture from fullscreen (r268816)
  • Added handling trackId changes across Initialization Segments in MSE (r269121)
  • Added addOutput() and removeOutput() utility functions to AudioSummingJunction (r268820)
  • Added skeleton implementation of Media Session API (r268735)
  • Changed to ensure WebAudio API throws exceptions with useful error messages (r268812)
  • Changed AudioBuffer channels to be neuterable and detachable (r269108)
  • Fixed an infinite loop in sample eviction when duration is NaN in MSE (r270106)
  • Fixed Web Audio continuing to play when navigating off the web page via an iframe (r268893)
  • Fixed poor resampling quality when using AudioContext sampleRate parameter (r270141, r270157)
  • Fixed AudioBuffer.getChannelData(x) to keep returning the same JavaScript wrapper for a given channel (r269081)
  • Fixed AudioContext.suspend() to not reject promise when the audio session is interrupted (r269039)
  • Fixed transparent video poster image to keep element transparent once the first frame is preloaded (r269407)
  • Fixed fetching an audio worklet module using a data URL (r270046)
  • Improved the speed of audio and video element creation by up to 50x (r269077)

Web Animations

  • Ensured animation updates are not scheduled when there are no styles to update (r269963)
  • Fixed KeyframeEffect.pseudoElement to return a valid string when targeting ::marker or ::first-letter (r269623)
  • Fixed accelerated animations of individual transform properties to apply rotate before scale (r269527)


Scrolling

  • Changed programmatic scroll to stop rubberbanding (r269373, r269559)
  • Changed to update scrolling geometry immediately for programmatic scrolls (r269558)

Scroll Snap

  • Fixed scroll snap specified on :root (r269506)
  • Fixed scroll-snap on root aligning to the body margin edge, not the viewport edge (r269622)
  • Made axis in scroll-snap-type required (r268665)
  • Made scroll-margin independent of scroll snapping and applied it when scrolling to anchors (r269144)
  • Made scroll-padding independent of scroll-snap and have it affect scrollIntoView (r270023)
  • Stopped creating implicit snap points at scrollmin and scrollmax (r268856)

Private Click Measurement

  • Added persistence for pending ad clicks and attributions so they survive browser restart (r270136)
  • Changed to accept ad click data when the link opens a new window (r269129)
  • Changed attribute and JSON key names according to the W3C conversation (r269886)
  • Switched to JSON report format (r269489)

Web Driver

  • Added handling for surrogate pairs in keyboard actions (r269421)
  • Added support for a sequence of character key presses (r269035)
  • Added handling HTTPS configuration for WebDriver tests (r268723)
  • Fixed elements in Shadow DOM incorrectly marked as stale (r268867)

December 10, 2020 09:15 PM

November 29, 2020

Philippe Normand: Catching up on WebKit GStreamer WebAudio backends maintenance

Igalia WebKit

Over the past few months the WebKit development team has been working on modernizing support for the WebAudio specification. This post highlights some of the changes that were recently merged, focusing on the GStreamer ports.

My fellow WebKit colleague, Chris Dumez, has been very active lately, updating the WebAudio implementation …

By Philippe Normand at November 29, 2020 12:45 PM

November 26, 2020

Víctor Jáquez: Notes on using Emacs (LSP/ccls) for WebKit

Igalia WebKit

I used to regard myself as an austere programmer in terms of tooling: Emacs —with a plain configuration— and grep. This approach forces you to understand all the elements involved in a project.

Some time ago I had to code in Rust, so I needed to learn the language as fast as possible. I looked for packages in MELPA that could help me to be productive quickly. Obviously, I installed rust-mode, but I also found racer for auto-completion. I tried it out. It was messy to set up and unstable, but it helped me to code while learning. When I felt comfortable with the base code, I uninstalled it.

This year I returned to work on WebKit. The last time I contributed to it was around five years ago, but now in a different area (still in the multimedia stack). WebKit is huge, and because of C++, I found gtags rather limited. Out of curiosity I looked for something similar to racer but for C++. And I spent a while digging into it.

The solution consists of the integration of three MELPA packages:

  • lsp-mode: a client for Language Server Protocol for Emacs.
  • company-mode: a text completion framework.
  • ccls: a C/C++ language server. In addition, emacs-ccls adds more functionality to lsp-mode.

(I know, there’s a simpler alternative to lsp-mode, but I haven’t tried it yet.)

First, let’s explain what LSP is. It stands for Language Server Protocol, a protocol defined with JSON-RPC messages between the editor and the language server. It was originally developed by Microsoft for Visual Studio Code, and its purpose is to support auto-completion, finding a symbol’s definition, showing early error markers, etc., inside the editor. Therefore, lsp-mode is an Emacs mode that communicates with different language servers over LSP and operates in Emacs accordingly.
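
To give an idea of what travels between the editor and the server, here is a small sketch of how a JSON-RPC request is framed in LSP: a Content-Length header, a blank line, then the JSON body. The helper name, the file URI and the position are made up for the example; lsp-mode does all of this internally.

```javascript
// Frame a JSON-RPC payload the way the LSP base protocol expects:
// a Content-Length header (counted in bytes), a blank line, then the body.
function encodeLspMessage(payload) {
    const body = JSON.stringify(payload);
    const length = new TextEncoder().encode(body).length;
    return `Content-Length: ${length}\r\n\r\n${body}`;
}

// A "go to definition" request such as an editor might send.
// The URI and position are invented values.
const request = encodeLspMessage({
    jsonrpc: "2.0",
    id: 1,
    method: "textDocument/definition",
    params: {
        textDocument: { uri: "file:///home/user/WebKit/Source/WTF/wtf/Vector.h" },
        position: { line: 10, character: 4 },
    },
});
```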

In order to support the auto-completion use case, lsp-mode uses company-mode. This Emacs mode is capable of creating a floating context menu where the editing cursor is placed.

The third part of the puzzle is, of course, the language server. There are language servers for different programming languages. For C and C++ there are two servers: clangd and ccls. The former uses the Clang compiler; the latter can use either Clang, GCC or MSVC. Throughout this text ccls will be used, for reasons explained later. In between, emacs-ccls leverages and extends the support of ccls in lsp-mode, though it’s not mandatory.

In short, the basic .emacs configuration, using use-package, would have these lines:

(use-package company
  :config (global-company-mode 1))

(use-package lsp-mode
  :diminish "L"
  :init (setq lsp-keymap-prefix "C-l"
              lsp-enable-file-watchers nil
              lsp-enable-on-type-formatting nil
              lsp-enable-snippet nil)
  :hook (c-mode-common . lsp-deferred)
  :commands (lsp lsp-deferred))

(use-package ccls
  :init (setq ccls-sem-highlight-method 'font-lock)
  :hook ((c-mode c++-mode objc-mode) . (lambda () (require 'ccls) (lsp-deferred))))

The snippet first configures company-mode. It is enabled globally because, normally, it is a nice feature to have, even in non-coding buffers, such as this very one, for writing a blog post in markdown format. Diminish mode hides or abbreviates the mode description in Emacs’ mode line.

Next comes lsp-mode. It’s big and aims to do a lot of things, so basically we have to tell it to disable certain features: the file watcher, which is not viable in massive projects such as WebKit; snippets (generic text templates), which I don’t use; and on-type formatting, because I don’t know how the code style is figured out, but in my experience it’s always detected wrong. Finally, lsp-mode is launched when a buffer uses c-mode-common, which is shared by c++-mode too. lsp-mode is launched deferred, meaning it won’t start up until the buffer is visible; this is important since we want to delay the ccls session creation until the buffer’s .dir-locals.el file is processed, where ccls is configured for the specific project.

And lastly, the ccls configuration, hooked to run when c-mode or c++-mode are loaded, again in a deferred fashion (already explained).

It’s important to understand how ccls works in order to integrate it in our workflow for a specific project, since it might need to be configured using Emacs’ per-directory local variables.

We are living in an (almost) post-Makefile world, and ccls is proof of that: instead of a makefile, it uses a compilation database, a record of the compile options used to build the files in a project. It’s commonly described in JSON and generated automatically by build systems such as meson or cmake, and later consumed by ninja or ccls to execute the compilation. Bear in mind that ccls uses a cache, which can eat a couple of gigabytes of disk.
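
To give a feel for what a compilation database holds, here is a sketch with two entries in the shape of compile_commands.json and a lookup of the flags a language server would reuse for one file. The paths, the flags and the flagsFor helper are invented for illustration, and the whitespace split ignores shell quoting.

```javascript
// Entries in the shape of a JSON compilation database (compile_commands.json).
// An entry gives either a single "command" string or an "arguments" array.
const compilationDatabase = [
    {
        directory: "/app/webkit/WebKitBuild/Release",
        command: "c++ -std=c++17 -DNDEBUG -I../../Source/WTF -c ../../Source/WTF/wtf/Assertions.cpp -o Assertions.o",
        file: "../../Source/WTF/wtf/Assertions.cpp",
    },
    {
        directory: "/app/webkit/WebKitBuild/Release",
        arguments: ["c++", "-std=c++17", "-I../../Source/WebCore", "-c", "../../Source/WebCore/dom/Node.cpp"],
        file: "../../Source/WebCore/dom/Node.cpp",
    },
];

// Collect the include, define and language-standard flags recorded for a file.
function flagsFor(database, file) {
    const entry = database.find((candidate) => candidate.file === file);
    if (!entry)
        return null;
    const args = entry.arguments ?? entry.command.split(/\s+/);
    return args.filter((arg) => /^-(I|D|std=)/.test(arg));
}

const flags = flagsFor(compilationDatabase, "../../Source/WTF/wtf/Assertions.cpp");
```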

Now, let’s review the concrete details of using these features with WebKit. Let me assume that WebKit local repository is cloned in ~/WebKit.

As you may know, the cool way to compile WebKit is with flatpak. Flatpak adds an indirection in the compilation process, since it’s done in an isolated environment, above the native system. As a consequence, ccls has to be the one inside the Flatpak environment. In ~/.local/bin/webkit-ccls:

#!/bin/sh
# Run ccls inside the WebKit flatpak environment, forwarding all arguments.
set -eu
cd "$HOME/WebKit/"
exec Tools/Scripts/webkit-flatpak -c ccls "$@"

Basically, the script calls ccls inside flatpak, where it is available in the SDK. And this is why I use ccls instead of clangd: the latter is not provided.

By default ccls assumes the compilation database is in the project’s root directory, but in our case it’s not, so we need to configure the database directory for our WebKit setup. For that, as we already said, a .dir-locals.el file is used.

  (indent-tabs-mode . nil)
  (c-basic-offset . 4))
  (indent-tabs-mode . nil)
  (c-basic-offset . 4))
  (indent-tabs-mode . nil)
  (c-basic-offset . 4))
  (indent-tabs-mode . nil))
  (fill-column . 100)
  (ccls-executable . "/home/vjaquez/.local/bin/webkit-ccls")
  (ccls-initialization-options . (:compilationDatabaseDirectory "/app/webkit/WebKitBuild/Release"
                                  :cache (:directory ".ccls-cache")))
  (compile-command . "build-webkit --gtk --debug")))

As you may notice, ccls-executable is defined here, though it’s not a safe local variable. The same goes for ccls-initialization-options, which is a safe local variable. It is important to notice that the compilation database directory is a path inside flatpak, and that it always uses the Release path. I don’t understand why, but the Debug path didn’t work for me. This means that WebKit should be compiled as Release frequently, even if we only use the Debug type for coding (as you may see in my compile-command).

Update: Now we can explain why it’s important to configure lsp-mode as deferred: to avoid connections to ccls before processing the .dir-locals.el file.

And that’s all. Now I have early detection of programming errors, auto-completion, and so on. I hope you find these notes helpful.

Update: Sadly, because of the flatpak indirection, finding symbol definitions won’t work, because the file paths stored in the ccls cache are relative to flatpak’s file system. For that I still rely on global and its Emacs mode.

By vjaquez at November 26, 2020 04:20 PM

November 23, 2020

MediaRecorder API

Surfin’ Safari

Safari Technology Preview 105 and Safari in the latest iOS 14.3 beta enabled support for the MediaRecorder API by default. This API takes as input live audio/video content to produce compressed media. While the immediate use case is to record from the camera and/or microphone, this API can take any MediaStreamTrack as input, be it a capture track, coming from the network using WebRTC, or generated from HTML (Canvas, WebAudio), as illustrated in the chart below.

The generated output, exposed as blobs, can be readily rendered in a video element to preview the content, edit it, and/or upload to servers for sharing with others.

This API can be feature-detected, as can the set of supported file/container formats and audio/video codecs. Safari currently supports the MP4 file format with H.264 as video codec and AAC as audio codec. MediaRecorder support can be checked as follows:

function supportsRecording(mimeType)
{
    if (!window.MediaRecorder)
        return false;
    if (!MediaRecorder.isTypeSupported)
        return mimeType.startsWith("audio/mp4") || mimeType.startsWith("video/mp4");
    return MediaRecorder.isTypeSupported(mimeType);
}
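
Building on the same check, a page that can work with several formats might pick the first one the browser supports. The sketch below is one way to do that; pickRecordingType and its recorder parameter are names made up for this example, and the parameter exists only so the logic can be tried outside a browser.

```javascript
// Return the first candidate MIME type the recorder supports, or null.
// `recorder` defaults to the global MediaRecorder (hypothetical parameter).
function pickRecordingType(candidates, recorder = globalThis.MediaRecorder) {
    if (!recorder)
        return null;
    // Same fallback as supportsRecording(): without isTypeSupported, assume MP4 only.
    if (typeof recorder.isTypeSupported !== "function") {
        return candidates.find((type) =>
            type.startsWith("audio/mp4") || type.startsWith("video/mp4")) ?? null;
    }
    return candidates.find((type) => recorder.isTypeSupported(type)) ?? null;
}

// Ordered by preference; the list itself is just an example.
const preferredTypes = ["video/webm", "video/mp4", "audio/mp4"];
const chosenType = pickRecordingType(preferredTypes);
```

Whether the chosen type can then be passed as the mimeType option of the MediaRecorder constructor depends on the browser, so treat that step as something to verify.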

The following example shows how camera and microphone can be recorded as mp4 content and locally previewed on the same page.

<button onclick="startRecording()">start</button><br>
<button onclick="endRecording()">end</button>
<video id="video" autoplay playsInline muted></video>
<script>
let blobs = [];
let stream;
let mediaRecorder;
async function startRecording()
{
    stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });
    mediaRecorder = new MediaRecorder(stream);
    mediaRecorder.ondataavailable = (event) => {
       // Let's append blobs for now, we could also upload them to the network.
       if (event.data)
            blobs.push(event.data);
    };
    mediaRecorder.onstop = doPreview;
    // Let's receive 1 second blobs
    mediaRecorder.start(1000);
}

function endRecording()
{
    // Let's stop capture and recording
    mediaRecorder.stop();
    stream.getTracks().forEach(track => track.stop());
}

function doPreview()
{
    if (!blobs.length)
        return;
    // Let's concatenate blobs to preview the recorded content
    video.src = URL.createObjectURL(new Blob(blobs, { type: mediaRecorder.mimeType }));
}
</script>

Future work may extend support to additional codecs and add options such as video/audio bitrates.

getUserMedia in WKWebView

Speaking of Safari in the latest iOS 14.3 beta and local capture, navigator.mediaDevices.getUserMedia can now be exposed to WKWebView applications. navigator.mediaDevices.getUserMedia is automatically exposed if the embedding application is able to natively capture either audio or video. Please refer to Apple documentation to meet these requirements. Access to camera and microphone is gated by a user prompt similar to Safari and SafariViewController prompts. We hope to extend WKWebView APIs to allow applications to further control their camera and microphone management in future releases.

We hope you will like these new features. As always, please let us know if you encounter any bugs (or if you have ideas for future enhancements) by filing bugs on bugs.webkit.org.

November 23, 2020 06:00 PM

November 20, 2020

Paulo Matos: A tour of the for..of implementation for 32bits JSC

Igalia WebKit

We look at the implementation of the for-of intrinsic in 32bit JSC (JavaScriptCore).


By Paulo Matos at November 20, 2020 02:00 PM

November 19, 2020

Release Notes for Safari Technology Preview 116

Surfin’ Safari

Safari Technology Preview Release 116 is now available for download for macOS Big Sur and macOS Catalina. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 267959-268651.

Web Extensions

  • Added support for non-persistent background pages
  • Fixed browser.tabs.update() to accept calls without a tabId parameter
  • Fixed browser.tabs.update() to allow navigations to a URL with a custom scheme

Web Inspector

  • Sources
    • Added support for creating a local override from resources that failed to load (r267977)
    • Added a + to the Local Overrides section in the navigation sidebar to streamline creating custom local overrides (r267979)
    • Fixed issue where event breakpoints were not editable after being added (r267976)
    • Fixed issue where line-based JavaScript breakpoints were not added on reload (r268629)
    • Fixed issue where the Sources Tab had wrong icon when paused (r268427)

Web Audio API

  • Enabled AudioWorklet API by default (r268459)
  • Added implementation for AudioWorkletGlobalScope.registerProcessor() (r268103)
  • Added implementation for AudioWorkletGlobalScope’s currentFrame, currentTime, and sampleRate attributes (r268076)
  • Changed to use AudioWorkletProcessor to process audio (r268365)
  • Changed calling AudioContext.resume() right after AudioContext.suspend() to be a no-op (r268368)
  • Changed AudioWorkletGlobalScope to perform a microtask checkpoint after each rendering quantum (r268369)
  • Fixed parameters argument for AudioWorkletProcessor.process() to be spec-compliant (r268414)


  • Enabled video capture by default on macOS (r268052)
  • Added support for MediaRecorder bitrate getters (r268363)
  • Added support for MediaRecorder pause and resume (r268130)
  • Added support for respecting enabled and muted tracks (r267987)
  • Added support for BlobEvent.timecode (r268136)
  • Fixed MediaRecorder .stop to not throw in Inactive state (r268477)
  • Made sure to fire the correct set of events in case MediaRecorder stream has track changes (r268119)


  • Added support for the individual transform properties translate, rotate, scale, including accelerated animation (r267985, r268627)
  • Fixed flex-grow property to be animatable (r268516)
  • Fixed CSS image-orientation: none to be ignored for cross-origin images (r268249)
  • Changed the computed style of CSS transform to not reflect individual transform properties (r268263)
  • Added painting CSS highlights over images (r268487)
  • Fixed clip-path: path() ignoring page zooming (r268138)
  • Fixed background-clip: var(--a) invalidating -webkit-background-clip: text when --a: text (r268158)


  • Respect the font size when presenting the <select> dropdown when custom fonts are used (r268126)


  • Changed arguments.callee to become ThrowTypeError if the function has a complex-parameter-list (spec-term) (r268323)
  • Changed BigInt constructor to be constructible while it always throws an error (r268322)
  • Fixed Array.prototype.sort’s sortBucketSort which accessed an array in an invalid way leading to incorrect results with indexed properties on the prototype chain (r268375)
  • Improved the essential internal methods for %TypedArray% to adhere to spec (r268640)


  • Removed the alg field from the attestation statement (r268602)


  • Fixed AirPlay menu not showing up when the AirPlay button is clicked (r268308)
  • Improved computation of default audio input and output devices (r268396)


  • Allowed passive mouse wheel event listeners to not force synchronous scrolling (r268476)
  • Implemented Blob.stream (r268228)
  • Updated FileReader.result to return null if it isn’t done yet (r268232)
  • Improved xhr.response conformance to the specification (r267959)

URL Parsing

  • Aligned URL setters to reasonable behaviors of other browsers (r268050)
  • Changed to parse “#” as a delimiter for fragment identifier in data URIs (r267995)
  • Changed to fail parsing URLs with hosts containing invalid punycode encodings (r267965)
  • Fixed UTF-8 encoding in URL parsing (r267963)

Storage Access API

  • Enabled per-page storage access scope (r267973)


  • Fixed accessibility on Presidential Executive Order pages (r268117, r268206)


  • Fixed WebDriver Input clear/value commands when the target is inside a Shadow DOM (r267978)

November 19, 2020 10:17 PM

November 16, 2020

New WebKit Features in Safari 14

Surfin’ Safari

With the release of Safari 14 for macOS Big Sur, iPadOS 14, iOS 14, and watchOS 7, WebKit brings significant improvements to performance and privacy along with a host of new features for web developers.

Take a look at all of the improvements WebKit is adding with the release of Safari 14.

Safari Web Extensions

This release brings support for Safari Web Extensions. They are a type of extension primarily built with JavaScript, HTML, and CSS packaged with native apps. This allows extension developers to maintain a single codebase that can be packaged for other browsers.

It also means developers with extensions for other browsers can easily bring their projects to Safari with a command-line tool. It jump-starts your development by converting your web extension into an Xcode project, ready to build and test. After testing, you can submit it to the App Store.

You can learn more about Safari’s web extension support by watching the “Meet Safari Web Extensions” session from WWDC 2020.

Webpage Translation

WebKit with Safari 14 on macOS Big Sur, iOS 14, and iPadOS 14 allows users to translate webpages between English, Spanish, Simplified Chinese, French, German, Russian, and Brazilian Portuguese. Safari automatically detects the language of webpages and offers translation based on the user’s Preferred Languages list.

Content authors can instruct Safari on the specific elements that should or should not be translated. Enable translation of element contents with an empty translate attribute or translate="yes", or disable with translate="no". It’s best to mark specific elements and avoid using the attribute on a single container for the entire document.
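
To make the attribute's behavior concrete, here is a minimal sketch of how the effective translation state of an element can be derived, following the HTML standard's rule that an element without a valid translate value inherits from its parent. It operates on plain objects rather than the DOM; the node shape ({ translate, parent }) and the function name are assumptions for illustration only.

```javascript
// Sketch of the HTML "translation mode" rules using plain objects.
// Each node is { translate: string | null, parent: node | null }, where
// translate holds the raw attribute value (null when the attribute is absent).
function isTranslateEnabled(node) {
    for (let current = node; current; current = current.parent) {
        const value = current.translate;
        if (value === null || value === undefined)
            continue; // attribute absent: inherit from the parent
        const keyword = value.toLowerCase();
        if (keyword === "" || keyword === "yes")
            return true;
        if (keyword === "no")
            return false;
        // Any other value is invalid and also inherits from the parent.
    }
    return true; // the document element defaults to translate-enabled
}
```

This is also why marking specific elements works well: a translate="no" on a code sample disables translation for everything inside it, while a nested element with translate="yes" opts back in.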

Performance Improvements

One area of focus in WebKit was performance. Significant gains improve both page load performance and runtime page performance. Loading a previously unvisited page is 13% faster, and loading recently visited pages is 42-52% faster. Tab closing performance improved from 3.5 seconds to 50 milliseconds. WebKit also added support for incrementally loading PDF files and now renders the first page up to 60× faster.

For web developers, WebKit improved asynchronous scrolling for iframes and overflow: scroll containers on macOS. Faster IndexedDB operations, for-of loops, JavaScript Promises, JavaScript cookie access, and JavaScript delete operations improve page performance for web developers and users.

WebKit and Safari can now use platform support for HTTP/3 for improved network efficiency and faster load times. HTTP/3 makes use of multiplexed connections over UDP to reduce congestion and transport latency. It all adds up to better perceived performance for your web apps.

For more details, see the “What’s new for web developers” session from WWDC 2020.

Improved Compatibility

Another area of focus was improving WebKit’s interoperability. One measure of that is passing Web Platform Tests, a set of tests used by browser developers to ensure implementations are cross-browser compatible, which helps developers write more interoperable code. In these releases, WebKit improved the pass rates for over 140,000 tests across Service Workers, SVG, CSS, XHR+Fetch, and more.

Learn more by watching the “What’s new for web developers” session from WWDC 2020.

Privacy Updates

With each release, WebKit refines its privacy protections for users. This year WebKit enabled full third-party cookie blocking and added support for the Storage Access API in Private Browsing mode in Safari. In addition, Safari added a Privacy Report that shows users the trackers that Intelligent Tracking Prevention prevented from accessing identifying information.

Learn more about WebKit’s privacy enhancements in the “CNAME Cloaking and Bounce Tracking Defense” and “Full Third-Party Cookie Blocking and More” blog posts.

Touch ID and Face ID for the Web

Web developers can now support logging into websites with Face ID and Touch ID. New platform authenticator support in WebKit’s Web Authentication implementation provides a highly secure alternative to usernames and passwords. Support for WebAuthn was introduced in Safari 13 on macOS and iOS 13.3 with support for hardware security keys. New in this release is added support for PIN entry and account selection on external Web Authentication security keys.

For more, read the “Meet Face ID and Touch ID for the Web” blog post.

WebP Support

Improvements for media in WebKit include support for a new image format and new video playback capabilities. This release of WebKit in Safari 14 adds support for the WebP open-source image format. It offers content authors smaller file sizes for lossy and lossless formats with advanced features like alpha-channel transparency and animations.

Learn more about WebP support from the “What’s new for web developers” talk from WWDC 2020.

Reserving Layout Space for Images

Another image-related improvement eliminates layout shifting. It comes from a change to how WebKit derives the aspect ratio of an image. Web authors can simply add width and height attributes to an <img> element with a numeric value to tell WebKit the proportions of an image to reserve when calculating image size from CSS. It’s a simple change that significantly improves the user experience.
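
The arithmetic behind the reserved space is simple enough to sketch. The helper below is a hypothetical illustration (reservedHeight is not a WebKit or DOM function): it takes the width and height attribute values and the width the image is laid out at by CSS, and returns the height that can be reserved before the image data arrives.

```javascript
// Hypothetical helper: derive the height to reserve for an <img> from its
// width/height attributes and the width it is rendered at by CSS.
function reservedHeight(attrWidth, attrHeight, renderedWidth) {
    if (!(attrWidth > 0) || !(attrHeight > 0))
        return null; // no usable ratio, so no space can be reserved
    return renderedWidth * (attrHeight / attrWidth);
}

// An image marked up with width="1600" height="900", sized by CSS to 400px wide
console.log(reservedHeight(1600, 900, 400)); // 225
```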

To see this in action watch the “What’s new for web developers” session from WWDC 2020.

New CSS Features

Safari 14 supports the image-orientation property in CSS to override WebKit’s default behavior of rotating based on image EXIF data. The default image-orientation: from-image can be set to image-orientation: none to override the behavior and ignore the EXIF orientation flag.

New support for the :is() pseudo-selector works as a synonym for the previously supported :matches(). It can be used to match a list of selectors with the specificity of the most specific selector.

It can be used to avoid repetitive selectors. Compare the following:

/* Removing margins from any subsequent headings */
h1, h2, h3, h4, h5, h6 {
    margin-top: 3em;
}

h1 + h2, h1 + h3, h1 + h4, h1 + h5, h1 + h6,
h2 + h3, h2 + h3, h2 + h4, h2 + h5, h2 + h6,
h3 + h4, h3 + h3, h3 + h4, h3 + h5, h3 + h6,
h4 + h5, h4 + h3, h4 + h4, h4 + h5, h4 + h6,
h5 + h6, h5 + h3, h5 + h4, h5 + h5, h5 + h6 {
    margin-top: 0;
}

The override could be written with the :is() pseudo-selector like this instead:

:is(h1, h2, h3, h4, h5, h6) + :is(h1, h2, h3, h4, h5, h6) {
    margin-top: 0;
}

The :where() pseudo-selector is also supported and works like :is() except it resets the specificity back to 0 making it easy to override complex matches.

Other notable CSS additions include support for line-break: anywhere to break long content before it overflows the container, and image-set() support for all other image functions including image(), -webkit-canvas(), -webkit-cross-fade(), and -webkit-*-gradient().

Learn more about these CSS features by watching the “What’s new for web developers” session from WWDC 2020.

Media Enhancements

For video, Safari on iOS 14 adds support for the Picture-in-Picture API for iPhone. On macOS, new support for high-dynamic range (HDR) video playback is added. Content authors can use media queries or the matchMedia method in JavaScript to detect high-dynamic range display capability and deliver a progressively enhanced experience for users with HDR displays.

    @media only screen and (dynamic-range: high) {
        /* HDR-only CSS rules */
    }

    if (window.matchMedia("(dynamic-range: high)").matches) {
        // HDR-specific JavaScript
    }

You can learn more about these media enhancements by watching the “What’s new for web developers” session from WWDC 2020.

JavaScript Improvements

Beyond performance improvements, WebKit added several new capabilities to its JavaScript engine. This release includes support for BigInt, a new datatype for integers that are larger than Number.MAX_SAFE_INTEGER.

let bigInt = BigInt(Number.MAX_SAFE_INTEGER) + 2n;
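
A short sketch of why this matters: past Number.MAX_SAFE_INTEGER, ordinary Number arithmetic can no longer distinguish neighboring integers, while BigInt arithmetic stays exact. The variable names here are illustrative only.

```javascript
// Number loses integer precision past Number.MAX_SAFE_INTEGER (2^53 - 1)...
const max = Number.MAX_SAFE_INTEGER;   // 9007199254740991
const lossy = (max + 1) === (max + 2); // true: both results round to 2^53

// ...while BigInt arithmetic stays exact.
const exact = (BigInt(max) + 1n) === (BigInt(max) + 2n); // false
const big = BigInt(max) + 2n;          // 9007199254740993n

console.log(lossy, exact, big.toString()); // true false 9007199254740993
```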

Three new types of logical assignment operators are available: AND, OR, and nullish. Using these operators only evaluates the left-hand side of an expression once and can be used non-destructively when assigning values.

let foo = null;

foo ??= 1; // nullish assignment operator
> 1

foo &&= 2; // AND assignment operator
> 2

foo ||= 3; // OR assignment operator
> 2

foo ??= 4; // nullish assignment operator
> 2
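
One place where these operators tend to be handy is filling in defaults without clobbering values the caller set deliberately. The sketch below uses a hypothetical options object; the property names are made up for illustration.

```javascript
// Hypothetical options object; the property names are illustrative only.
function withDefaults(options) {
    options.retries ??= 3;        // assigned only if retries is null or undefined
    options.label ||= "untitled"; // assigned only if label is falsy (e.g. "")
    options.cache &&= new Map();  // assigned only if cache is already truthy
    return options;
}

const result = withDefaults({ retries: 0, label: "", cache: false });
console.log(result.retries, result.label, result.cache); // 0 untitled false
```

Note how retries stays 0 with the nullish operator, whereas the OR operator would have replaced it, since 0 is falsy but not nullish.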

WebKit also introduces support for the optional chaining operator that gives you a shortcut for safely accessing object properties.

function optionalChaining(object) {
    return object?.foo;
}

function optionalChainingTranspiled(object) {
    if (object !== null && object !== undefined)
        return object.foo;
    return undefined;
}

There’s also added support for the EventTarget constructor, which means developers can create custom instances of EventTarget of their own design without the overhead of repurposing a DOM element, giving non-DOM objects an interface for dispatching custom events.
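
As a minimal sketch of what a constructible EventTarget enables, the class below gives a plain, non-DOM object the usual addEventListener and dispatchEvent interface. The Counter class and the "change" event name are illustrative assumptions.

```javascript
// A non-DOM object that can dispatch events by subclassing EventTarget.
class Counter extends EventTarget {
    constructor() {
        super();
        this.value = 0;
    }

    increment() {
        this.value += 1;
        // Listeners run synchronously during dispatchEvent().
        this.dispatchEvent(new Event("change"));
    }
}

const counter = new Counter();
const seen = [];
counter.addEventListener("change", (event) => seen.push(event.target.value));
counter.increment();
counter.increment();
console.log(seen); // [ 1, 2 ]
```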

You can learn more about JavaScript improvements by watching the “What’s new for web developers” session from WWDC 2020.

Web Inspector Updates

Web Inspector in Safari 14 on macOS added the Sources Tab, combining the Resources Tab and Debugger Tab. It lists all resources loaded by the inspected page since Web Inspector opened, along with XHR+Fetch resources and long-lived WebSockets. Web Inspector’s JavaScript debugging tools are here too, with all of the stepping and the breakpoint controls, organized in a more compact and unified way alongside the resources of the inspected page. The Sources Tab also offers new capabilities such as organizing by file path instead of file type, Local Overrides for completely replacing the content and headers of responses loaded over the network, and the Inspector Bootstrap Script to evaluate JavaScript before anything else in the page.

In the Timelines Tab is the new Media & Animations timeline to capture events related to media elements, CSS animations and CSS transitions. It makes it easy to correlate activity captured in other timelines to state changes in media elements, such as pausing or resuming playback, or CSS animations or transitions, such as when they’re created and each time they iterate.

Among the enhancements, Web Inspector offers improved VoiceOver support and a new HSL color picker with Display-P3 color support.

You can learn more by watching the “What’s new in Web Inspector” video session from WWDC 2020 or referring to the Web Inspector Reference documentation.


These improvements are available to users running watchOS 7, iOS 14 and iPadOS 14, macOS Big Sur, macOS Catalina and macOS Mojave. These features were also available to web developers with Safari Technology Preview releases. Changes in this release of Safari were included in the following Safari Technology Preview releases: 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109. Download the latest Safari Technology Preview release to stay on the forefront of future web platform and Web Inspector features. You can also use the WebKit Feature Status page to watch for changes to your favorite web platform features.

Send a tweet to @webkit or @jonathandavis to share your thoughts on this release. If you run into any issues, we welcome your bug reports for Safari, or WebKit bugs for web content issues.

November 16, 2020 05:00 PM

November 12, 2020

CNAME Cloaking and Bounce Tracking Defense

Surfin’ Safari

This blog post covers several enhancements to Intelligent Tracking Prevention (ITP) in Safari 14 on macOS Big Sur, Catalina, and Mojave, iOS 14, and iPadOS 14 to address our latest discoveries in the industry around tracking.

CNAME Cloaking Defense

ITP now caps the expiry of cookies set in so-called third-party CNAME-cloaked HTTP responses to 7 days. On macOS, this enhancement is specific to Big Sur.

What Is CNAME Cloaking?

In the eyes of web browsers, the first party of a website is typically defined by its registrable domain. This means that www.blog.example and comments.blog.example are considered same-site and the same party. If the user loads a webpage from www.blog.example, and that page makes a subresource request to comments.blog.example, that request will carry all cookies that are set to cover the blog.example site, including login cookies and user identity cookies. In addition, the response to that comments.blog.example subresource request can set cookies for blog.example, and those cookies will be first-party cookies.

Enter CNAMEs. CNAME stands for canonical name record and maps one domain name to another as part of the Domain Name System, or DNS. This means a site owner can configure one of their subdomains, such as sub.blog.example, to resolve to thirdParty.example, before resolving to an IP address. This happens underneath the web layer and is called CNAME cloaking — the thirdParty.example domain is cloaked as sub.blog.example and thus has the same powers as the true first party.

CNAME Cloaking and Tracking

Cross-site trackers have convinced site owners to set up CNAME cloaking in order to circumvent tracking prevention, such as ITP’s 7-day expiry cap on cookies set in JavaScript. In our blog case, this would be making track.blog.example resolve to tracker.example.

A recent paper from researchers at the Graduate University for Advanced Studies (Sokendai) and the French National Cybersecurity Agency (ANSSI) found 1,762 websites CNAME cloaking 56 trackers in total.

CNAME Cloaking and Website Security

Site owners who set up CNAME cloaking risk full website takeovers or customer cookie hijacking if the CNAME records aren’t properly managed, for instance if CNAME cloaking isn’t decommissioned when no longer in use. It was recently reported that 250 websites of banks, healthcare companies, restaurant chains, and civil rights groups had been compromised through mismanaged CNAME cloaking. In June this year, Microsoft documented these attacks and how their cloud customers should prevent them.

ITP’s Defense Against CNAME Cloaking Tracking

ITP now detects third-party CNAME cloaking requests and caps the expiry of any cookies set in the HTTP response to 7 days. This cap is aligned with ITP’s expiry cap on all cookies created through JavaScript.

Third-party CNAME cloaking is defined as a first-party subresource that resolves through a CNAME that differs from the first-party domain and differs from the top frame host’s CNAME, if one exists. Yes, the whole site can be CNAME cloaked, when it uses so-called edge servers.

The best way to explain this is through a table (1p means first-party, 3p means third-party):

1p host, e.g. www.blog.example | 1p subdomain other than the 1p host, e.g. track.blog.example | Capped cookie expiry?
No cloaking | No cloaking | No cap
No cloaking | other.blog.example (1p cloaking) | No cap
No cloaking | tracker.example (3p cloaking) | 7-day cap
abc123.edge.example (cloaking) | No cloaking | No cap
abc123.edge.example (cloaking) | abc123.edge.example (matching cloaking) | No cap
abc123.edge.example (cloaking) | other.blog.example (1p cloaking) | No cap
abc123.edge.example (cloaking) | tracker.example (3p cloaking) | 7-day cap
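
The table can also be read as a small decision function. The sketch below is not ITP's implementation; it is an illustration under two stated assumptions: the first party's registrable domain is given as an input (real browsers consult the Public Suffix List), and "matching" cloaking means the two CNAMEs are identical. The function names are hypothetical.

```javascript
// Simplified site check: a host belongs to the first party if it equals the
// registrable domain or is a subdomain of it.
function isSameSite(host, firstPartyDomain) {
    return host === firstPartyDomain || host.endsWith("." + firstPartyDomain);
}

// topFrameCNAME and subresourceCNAME are null when no CNAME is involved.
// Returns true when cookies set in the response would get the 7-day cap.
function shouldCapCookieExpiry(firstPartyDomain, topFrameCNAME, subresourceCNAME) {
    if (!subresourceCNAME)
        return false; // no cloaking
    if (isSameSite(subresourceCNAME, firstPartyDomain))
        return false; // 1p cloaking
    if (topFrameCNAME && subresourceCNAME === topFrameCNAME)
        return false; // matching cloaking
    return true;      // 3p cloaking
}

console.log(shouldCapCookieExpiry("blog.example", null, "tracker.example")); // true
```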

SameSite=Strict Cookie Jail for Bounce Trackers

In June 2018, we announced an update to ITP to detect and defend against first party bounce trackers. In March 2020, we announced an enhancement to also detect delayed bounce tracking. Since then, we have received a report of one specific website engaged in bounce tracking while also being likely to get frequent user interaction. To combat such issues, we proposed to the W3C Privacy Community Group what we call a SameSite=Strict jail as well as other escalations.

What the SameSite=Strict jail does is detect bounce tracking and, at a certain threshold, rewrite all the tracking domain’s cookies to SameSite=Strict. This means that they will not be sent in cross-site, first-party navigations, and they can no longer be used for simple redirect-based bounce tracking.

Our implementation is rather relaxed, with the threshold set to 10 unique navigational, first-party redirects (unique in the sense of going to unique domains), and an automatic reset of that counter once the cookies are rewritten to SameSite=Strict. This automatically gives the domain a new chance so that they can disengage from bounce tracking and “get out of jail.”
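
The counting logic described above could be sketched roughly as follows. This is an illustration only, not WebKit's code: the class name, its fields, and the idea of tracking one record per suspected domain are assumptions; only the threshold of 10 unique redirect targets and the counter reset come from the description.

```javascript
const BOUNCE_THRESHOLD = 10; // unique redirect targets, as described above

// Hypothetical bookkeeping for one suspected bounce-tracking domain.
class BounceTrackerRecord {
    constructor() {
        this.redirectTargets = new Set(); // unique domains redirected to
        this.rewriteCount = 0;            // times cookies were rewritten
    }

    // Record a navigational, first-party redirect to `targetDomain`.
    // Returns true when this redirect triggers the SameSite=Strict rewrite.
    recordRedirect(targetDomain) {
        this.redirectTargets.add(targetDomain); // repeats are not counted twice
        if (this.redirectTargets.size < BOUNCE_THRESHOLD)
            return false;
        this.rewriteCount += 1;
        this.redirectTargets.clear(); // the counter resets after the rewrite
        return true;
    }
}
```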

Our current list of domains we subject to this protection is empty because the domain reported to us has stopped their bounce tracking. But this protection remains in our toolbox.

Partitioned Ephemeral IndexedDB

Up until now, WebKit has blocked cross-origin IndexedDB. WebKit now allows partitioned and ephemeral third-party IndexedDB in an effort to align with other browsers now that they are interested in storage partitioning too. You can partake in the ongoing standardization effort for storage partitioning on GitHub.

Partitioned means a unique IndexedDB instance per first-party site, and ephemeral means in-memory only, i.e. it goes away on browser quit.

Third-Party Cookie Blocking and Storage Access API In Private Browsing

Private Browsing in Safari is based on WebKit’s ephemeral sessions where nothing is persisted to disk. This means ITP would not be able to learn things between launches of Safari. Further, Private Browsing also uses a separate ephemeral session for each new tab the user opens. To uphold this separation between tabs, ITP wouldn’t be able to classify cross-site trackers from the user’s full browsing even in-memory.

However, full third-party cookie blocking doesn’t need classification and is now enabled by default in Private Browsing. This might seem simple to support but the challenge was to make the Storage Access API work with the aforementioned tab separation. This is how it works: Say identityProvider.example wants to request storage access as third-party on the login page for social.example in Tab A. Interacting with identityProvider.example as a first party website in Tab B will not suffice to allow it to request storage access in Tab A since that would leak state between the separate ephemeral sessions. Thus, the user must interact with identityProvider.example in the same tab as where identityProvider.example later requests storage access as third-party. This makes sure that login flows where two different parties are involved and third-party cookie access is required are possible in Private Browsing mode.

Home Screen Web Application Domain Exempt From ITP

Back in March 2020, when we announced ITP’s 7-day cap on all script-writeable storage, developers asked about home screen web applications and whether they were exempt from this 7-day cap. We explained how ITP’s counter of “days of use” and capture of user interaction effectively made sure that the first party of home screen web applications would not be subjected to the new 7-day cap. To make this more clear, we have implemented an explicit exception for the first-party domain of home screen web applications to make sure ITP always skips that domain in its website data removal algorithm.

In addition, the website data of home screen web applications is kept isolated from Safari and thus will not be affected by ITP’s classification of tracking behavior in Safari.

Thanks To My Coworkers

The above updates to WebKit and ITP would not have been possible without the help from Kate, Jiten, Scott, Tommy, Sihui, and David. Thank you!

November 12, 2020 06:30 PM

October 29, 2020

Claudio Saavedra: Thu 2020/Oct/29

Igalia WebKit

In this line of work, we all stumble at least once upon a problem that turns out to be extremely elusive and very tricky to narrow down and solve. If we're lucky, we might have everything at our disposal to diagnose the problem, but sometimes that's not the case, and in embedded development it's often not the case. Add to the mix proprietary drivers, lack of debugging symbols, a bug that's very hard to reproduce under a controlled environment, and weeks in partial confinement due to a pandemic and what you have is better described as a very long lucid nightmare. Thankfully, even the worst of nightmares end when morning comes, even if sometimes morning might be several days away. And when the fix to the problem is in an unimaginable place, the story is definitely one worth telling.

The problem

It all started with one of Igalia's customers deploying a WPE WebKit-based browser in their embedded devices. Their CI infrastructure had detected a problem caused when the browser was tasked with creating a new webview (in layman's terms, you can imagine that to be the same as opening a new tab in your browser). Occasionally, this view would never load, causing ongoing tests to fail. For some reason, the test failure had a reproducibility of ~75% in the CI environment, but during manual testing it would occur with less than 1% probability. For reasons that are beyond the scope of this post, the CI infrastructure was not reachable in a way that would allow access to running processes in order to diagnose the problem more easily. So with only logs at hand and less than a 1/100 chance of reproducing the bug myself, I set out to debug this problem locally.


The first thing that became evident was that, whenever this bug occurred, the WebKit feature known as web extension (an application-specific loadable module that is used to allow the program to have access to the internals of a web page, as well as to enable customizable communication with the process where the page contents are loaded, the web process) wouldn't work. The browser would wait forever for the web extension to load, and since that wouldn't happen, the expected page wouldn't load. The first place to look, then, is the web process, to try to understand what is preventing the web extension from loading. Enter here our good friend GDB, with less than spectacular results thanks to stripped libraries.

#0  0x7500ab9c in poll () from target:/lib/libc.so.6
#1  0x73c08c0c in ?? () from target:/usr/lib/libEGL.so.1
#2  0x73c08d2c in ?? () from target:/usr/lib/libEGL.so.1
#3  0x73c08e0c in ?? () from target:/usr/lib/libEGL.so.1
#4  0x73bold6a8 in ?? () from target:/usr/lib/libEGL.so.1
#5  0x75f84208 in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
#6  0x75fa0b7e in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
#7  0x7561eda2 in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
#8  0x755a176a in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
#9  0x753cd842 in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
#10 0x75451660 in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
#11 0x75452882 in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
#12 0x75452fa8 in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
#13 0x76b1de62 in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
#14 0x76b5a970 in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
#15 0x74bee44c in g_main_context_dispatch () from target:/usr/lib/libglib-2.0.so.0
#16 0x74bee808 in ?? () from target:/usr/lib/libglib-2.0.so.0
#17 0x74beeba8 in g_main_loop_run () from target:/usr/lib/libglib-2.0.so.0
#18 0x76b5b11c in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
#19 0x75622338 in ?? () from target:/usr/lib/libWPEWebKit-1.0.so.2
#20 0x74f59b58 in __libc_start_main () from target:/lib/libc.so.6
#21 0x0045d8d0 in _start ()

Of all the threads in the web process, after much tinkering around it slowly became clear that one of the places to look into is that poll() call. I will spare you the details related to what other threads were doing; suffice it to say that whenever the browser would hit the bug, there was a similar stacktrace in one thread, going through libEGL to a call to poll() on top of the stack, that would never return. Unfortunately, a stripped EGL driver coming from a proprietary graphics vendor was a bit of a showstopper, as was the inability to have proper debugging symbols running inside the device (did you know that a non-stripped WebKit library binary with debugging symbols can easily get GDB and your device out of memory?). The best one could do to improve that was to use the gcore feature in GDB, and extract a core from the device for post-mortem analysis. But for some reason, such a stacktrace wouldn't give anything interesting below the poll() call to understand what's being polled here. Did I say this was tricky?

What polls?

Because WebKit is a multiprocess web engine, having system calls that signal, read, and write in sockets communicating with other processes is an everyday thing. Not knowing what a poll() call is doing and who it is trying to listen to is not very good. Because the call is happening under the EGL library, one can presume that it's graphics related, but there are still different possibilities, so trying to find out what this is polling is a good idea.

A trick I learned while debugging this is that, in the absence of debugging symbols that would give a straightforward look into variables and parameters, one can examine the CPU registers and try to figure out from them what the parameters to function calls are. Let's do that with poll(). First, its signature.

int poll(struct pollfd *fds, nfds_t nfds, int timeout);

Now, let's examine the registers.

(gdb) f 0
#0  0x7500ab9c in poll () from target:/lib/libc.so.6
(gdb) info registers
r0             0x7ea55e58	2124766808
r1             0x1	1
r2             0x64	100
r3             0x0	0
r4             0x0	0

Registers r0, r1, and r2 contain poll()'s three parameters. Because r1 is 1, we know that there is only one file descriptor being polled. fds is a pointer to an array with one element then. Where is that first element? Well, right there, in the memory pointed to directly by r0. What does struct pollfd look like?

struct pollfd {
  int   fd;         /* file descriptor */
  short events;     /* requested events */
  short revents;    /* returned events */
};

What we are interested in here is the contents of fd, the file descriptor that is being polled. Memory alignment is again on our side: fd is the first member of the struct, so we don't need any pointer arithmetic here. We can directly dereference the address held in register r0 and find out what the value of fd is.

(gdb) print *0x7ea55e58
$3 = 8
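
To see why dereferencing that address directly yields fd, here is a small sketch that decodes a struct pollfd from raw bytes. It assumes a little-endian 32-bit target with a 4-byte int and 2-byte short; the byte values for events and revents are hypothetical (only fd = 8 comes from the session above), and decodePollfd is an illustrative name.

```javascript
// Decode a struct pollfd from raw bytes, assuming a little-endian target
// where int is 4 bytes and short is 2 bytes.
function decodePollfd(bytes) {
    const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    return {
        fd: view.getInt32(0, true),      // offset 0: what "print *ptr" reads
        events: view.getInt16(4, true),  // offset 4
        revents: view.getInt16(6, true), // offset 6
    };
}

// Hypothetical memory at the address held in r0: fd = 8,
// events = POLLIN (0x0001 on Linux), revents = 0.
const memory = Uint8Array.from([0x08, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00]);
console.log(decodePollfd(memory)); // { fd: 8, events: 1, revents: 0 }
```

Reading events or revents in GDB would need an offset of 4 or 6 bytes from the same address, which is the pointer arithmetic that fd lets us skip.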

So we now know that the EGL library is polling the file descriptor with an identifier of 8. But where is this file descriptor coming from? What is on the other end? The /proc file system can be helpful here.

# pidof WPEWebProcess
1944 1196
# ls -lh /proc/1944/fd/8
lrwx------    1 x x      64 Oct 22 13:59 /proc/1944/fd/8 -> socket:[32166]

So we have a socket. What else can we find out about it? Turns out, not much without the unix_diag kernel module, which was not available on our device. But we are slowly getting closer. Time to call another good friend.

Where GDB fails, printf() triumphs

Something I have learned from many years working with a project as large as WebKit is that debugging symbols can be very difficult to work with. To begin with, it takes ages to build WebKit with them. When cross-compiling, it's even worse. And then, very often the target device doesn't even have enough memory to load the symbols when debugging. So they can be pretty useless. That is when just using fprintf() and logging useful information can simplify things. Since we know that it's at some point during initialization of the web process that we end up stuck, and we also know that we're polling a file descriptor, let's find some early calls in the code of the web process and add some fprintf() calls with a bit of information, especially in those that might have something to do with EGL. What can we find out now?

Oct 19 10:13:27.700335 WPEWebProcess[92]: Starting
Oct 19 10:13:27.720575 WPEWebProcess[92]: Initializing WebProcess platform.
Oct 19 10:13:27.727850 WPEWebProcess[92]: wpe_loader_init() done.
Oct 19 10:13:27.729054 WPEWebProcess[92]: Initializing PlatformDisplayLibWPE (hostFD: 8).
Oct 19 10:13:27.730166 WPEWebProcess[92]: egl backend created.
Oct 19 10:13:27.741556 WPEWebProcess[92]: got native display.
Oct 19 10:13:27.742565 WPEWebProcess[92]: initializeEGLDisplay() starting.

Two interesting findings from the fprintf()-powered logging here: first, it seems that file descriptor 8 is one known to libwpe (the general-purpose library that powers the WPE WebKit port). Second, that the last EGL API call right before the web process hangs on poll() is a call to eglInitialize(). fprintf(), thanks for your service.

Number 8

We now know that file descriptor 8 is coming from WPE and is not internal to the EGL library. libwpe gets this file descriptor from the UI process, as one of the many creation parameters that are passed via IPC to the nascent process in order to initialize it. It turns out that this file descriptor in particular, the so-called host client file descriptor, is the one that the freedesktop backend of libwpe, from here onwards WPEBackend-fdo, creates when a new client is set to connect to its Wayland display. In a nutshell, in the presence of a new client, a Wayland display is supposed to create a pair of connected sockets, create a new client on the display side, give it one of the file descriptors, and pass the other one to the client process. Because this will be useful later on, let's see how that is currently implemented in WPEBackend-fdo.

    int pair[2];
    if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0, pair) < 0)
        return -1;

    int clientFd = dup(pair[1]);

    wl_client_create(m_display, pair[0]);

The file descriptor we are tracking down is the client file descriptor, clientFd. So we now know what's going on in this socket: Wayland-specific communication. Let's enable Wayland debugging next, by running all relevant processes with WAYLAND_DEBUG=1. We'll get back to that code fragment later on.
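
As a side note on the fragment above, here is a minimal, standalone sketch of the socket-pair mechanism, with no Wayland involved. It is not WPEBackend-fdo code: the function name socketpair_roundtrip is made up, SOCK_CLOEXEC is left out for portability, and the "display" and "client" roles are only labels. It shows that a duplicate of one end still receives whatever is written to the other end.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Illustrative only: creates a connected pair, duplicates one end, writes on
 * the "display" end and reads on the duplicated "client" end.
 * Returns the number of bytes read, or -1 on failure. */
int socketpair_roundtrip(char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);

    int pair[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, pair) < 0)
        return -1;

    /* The duplicate refers to the same socket as pair[1], so the original
     * descriptor can be closed once the copy exists. */
    int client_fd = dup(pair[1]);
    close(pair[1]);
    if (client_fd < 0) {
        close(pair[0]);
        return -1;
    }

    const char message[] = "hello";
    ssize_t received = -1;
    if (write(pair[0], message, sizeof(message)) == (ssize_t)sizeof(message))
        received = read(client_fd, out, out_size);

    close(pair[0]);
    close(client_fd);
    return (int)received;
}
```

Calling it with a 16-byte buffer should read back the six bytes of "hello" including its terminating NUL.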

A Heisenbug is a Heisenbug is a Heisenbug

Turns out that enabling Wayland debugging output for a few processes is enough to alter the state of the system in such a way that the bug does not happen at all when doing manual testing. Thankfully the CI's reproducibility is much higher, so after waiting overnight for the CI to continuously run until it hit the bug, we have logs. What do the logs say?

WPEWebProcess[41]: initializeEGLDisplay() starting.
  -> wl_display@1.get_registry(new id wl_registry@2)
  -> wl_display@1.sync(new id wl_callback@3)

So the EGL library is trying to fetch the Wayland registry and it's doing a wl_display_sync() call afterwards, which will block until the server responds. That's where the blocking poll() call comes from. So, it turns out, the problem is not necessarily on this end of the Wayland socket, but perhaps on the other side, that is, in the so-called UI process (the main browser process). Why is the Wayland display not replying?

The loop

Something that is worth mentioning before we move on is how the WPEBackend-fdo Wayland display integrates with the system. This display is a nested display, with each web view a client, while it is itself a client of the system's Wayland display. This can be a bit confusing if you're not very familiar with how Wayland works, but fortunately there is good documentation about Wayland elsewhere.

The way that the Wayland display in the UI process of a WPEWebKit browser is integrated with the rest of the program, when it uses WPEBackend-fdo, is through the GLib main event loop. Wayland itself has an event loop implementation for servers, but for a GLib-powered application it can be useful to use GLib's and integrate Wayland's event processing with the different stages of the GLib main loop. That is precisely how WPEBackend-fdo is handling its clients' events. As discussed earlier, when a new client is created, a pair of connected sockets is created and one end is given to Wayland to control communication with the client. GSourceFunc functions are used to integrate Wayland with the application main loop. In these functions, we make sure that whenever there are pending messages to be sent to clients, those are sent, and that whenever any of the client sockets has pending data to be read, Wayland reads from it and dispatches the events that might be necessary in response to the incoming data. And here is where things start getting really strange, because after doing a bit of fprintf()-powered debugging inside the Wayland GSourceFuncs functions, it became clear that the Wayland events from the clients were never dispatched, because the dispatch() GSourceFunc was not being called, as if there was nothing coming from any Wayland client. But how is that possible, if we already know that the web process client is actually trying to get the Wayland registry?

To move forward, one needs to understand how the GLib main loop works, in particular, with Unix file descriptor sources. A very brief summary of this is that, during an iteration of the main loop, GLib will poll file descriptors to see if there are any interesting events to be reported back to their respective sources, in which case the sources will decide whether to trigger the dispatch() phase. A simple source might decide in its dispatch() method to directly read or write from/to the file descriptor; a Wayland display source (as in our case), will call wl_event_loop_dispatch() to do this for us. However, if the source doesn't find any interesting events, or if the source decides that it doesn't want to handle them, the dispatch() invocation will not happen. More on the GLib main event loop in its API documentation.

So it seems that for some reason the dispatch() method is not being called. Does that mean that there are no interesting events to read from? Let's find out.

System call tracing

Here we resort to another helpful tool, strace. With strace we can try to figure out what is happening when the main loop polls file descriptors. The strace output is huge (because it easily takes over a hundred attempts to reproduce this), but we already know some of the calls that involve file descriptors from the code we looked at above, when the client is created. So we can use those calls as a starting point when searching through the several MBs of logs. Fast-forward to the relevant logs.

socketpair(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0, [128, 130]) = 0
dup(130)               = 131
close(130)             = 0
fcntl64(128, F_DUPFD_CLOEXEC, 0) = 130
epoll_ctl(34, EPOLL_CTL_ADD, 130, {EPOLLIN, {u32=1639599928, u64=1639599928}}) = 0

What we see there is, first, WPEBackend-fdo creating a new socket pair (128, 130) and then, when file descriptor 130 is passed to wl_client_create() to create a new client, Wayland adds that file descriptor to its epoll() instance for monitoring clients, which is referred to by file descriptor 34. This way, whenever there are events in file descriptor 130, we will hear about them in file descriptor 34.
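
The "hear about them in file descriptor 34" part can be tried in isolation. The following is a Linux-only sketch, not Wayland code: the function name poll_epoll_fd is made up, and a plain socket pair stands in for the client connection. It registers one end of the pair in an epoll instance and then polls the epoll descriptor itself, which is what an outer main loop would do.

```c
#include <assert.h>
#include <poll.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the revents that poll() reports for an epoll descriptor that is
 * watching one end of a socket pair, or -1 on failure. If write_first is
 * nonzero, one byte is written to the other end of the pair beforehand. */
int poll_epoll_fd(int write_first)
{
    int pair[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, pair) < 0)
        return -1;

    int epfd = epoll_create1(0);
    if (epfd < 0) {
        close(pair[0]);
        close(pair[1]);
        return -1;
    }

    struct epoll_event ev = { .events = EPOLLIN, .data = { .fd = pair[0] } };
    int result = -1;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pair[0], &ev) == 0) {
        if (!write_first || write(pair[1], "x", 1) == 1) {
            /* Poll the epoll descriptor, not the socket it is watching. */
            struct pollfd pfd = { .fd = epfd, .events = POLLIN, .revents = 0 };
            if (poll(&pfd, 1, 0) >= 0)
                result = pfd.revents;
        }
    }

    close(epfd);
    close(pair[0]);
    close(pair[1]);
    return result;
}
```

With nothing written, the epoll descriptor reports no events; once the watched socket has pending data, the epoll descriptor itself becomes readable.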

So what we would expect to see next is that, after the web process is spawned, when a Wayland client is created using the passed file descriptor and the EGL driver requests the Wayland registry from the display, there should be a POLLIN event coming in file descriptor 34 and, if the dispatch() call for the source was called, an epoll_wait() call on it, as that is what wl_event_loop_dispatch() would do when called from the source's dispatch() method. But what do we have instead?

poll([{fd=30, events=POLLIN}, {fd=34, events=POLLIN}, {fd=59, events=POLLIN}, {fd=110, events=POLLIN}, {fd=114, events=POLLIN}, {fd=132, events=POLLIN}], 6, 0) = 1 ([{fd=34, revents=POLLIN}])
recvmsg(30, {msg_namelen=0}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)

strace can be a bit cryptic, so let's explain those two function calls. The first one is a poll in a series of file descriptors (including 30 and 34) for POLLIN events. The return value of that call tells us that there is a POLLIN event in file descriptor 34 (the Wayland display epoll() instance for clients). But unintuitively, the call right after is trying to read a message from socket 30 instead, which we know doesn't have any pending data at the moment, and consequently returns an error value with an errno of EAGAIN (Resource temporarily unavailable).
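
The same pair of system calls can be reproduced in miniature. This is a sketch under simplified assumptions: two local socket pairs play the roles of descriptors 30 and 34, and all names (poll_demo_result, run_poll_demo) are made up for this illustration. It polls both, then performs a non-blocking read on the one that poll() did not flag.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

struct poll_demo_result {
    int ready_count;      /* return value of poll() */
    short revents_idle;   /* revents of the socket with no pending data */
    short revents_busy;   /* revents of the socket with pending data */
    int idle_read_errno;  /* errno after recv() on the idle socket */
};

static void close_pair(int pair[2])
{
    close(pair[0]);
    close(pair[1]);
}

/* Returns 0 on success and fills *result, or -1 if the setup failed. */
int run_poll_demo(struct poll_demo_result *result)
{
    assert(result != NULL);

    int idle[2], busy[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, idle) < 0)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, busy) < 0) {
        close_pair(idle);
        return -1;
    }

    int status = -1;
    /* Only the "busy" pair gets data written to it. */
    if (write(busy[1], "x", 1) == 1) {
        struct pollfd fds[2] = {
            { .fd = idle[0], .events = POLLIN, .revents = 0 },
            { .fd = busy[0], .events = POLLIN, .revents = 0 },
        };
        result->ready_count = poll(fds, 2, 0);
        result->revents_idle = fds[0].revents;
        result->revents_busy = fds[1].revents;

        /* Reading from the socket that was NOT flagged, without blocking. */
        char byte;
        errno = 0;
        ssize_t n = recv(idle[0], &byte, 1, MSG_DONTWAIT);
        result->idle_read_errno = (n < 0) ? errno : 0;
        status = 0;
    }

    close_pair(idle);
    close_pair(busy);
    return status;
}
```

poll() flags only the socket with pending data, and the read on the other one fails with EAGAIN, which is the pattern visible in the strace output.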

Why is the GLib main loop triggering a read from 30 instead of 34? And who is 30?

We can answer the latter question first. Breaking on a running UI process instance at the right time shows who is reading from the file descriptor 30:

#1  0x70ae1394 in wl_os_recvmsg_cloexec (sockfd=30, msg=msg@entry=0x700fea54, flags=flags@entry=64)
#2  0x70adf644 in wl_connection_read (connection=0x6f70b7e8)
#3  0x70ade70c in read_events (display=0x6f709c90)
#4  wl_display_read_events (display=0x6f709c90)
#5  0x70277d98 in pwl_source_check (source=0x6f71cb80)
#6  0x743f2140 in g_main_context_check (context=context@entry=0x2111978, max_priority=<optimized out>, fds=fds@entry=0x6165f718, n_fds=n_fds@entry=4)
#7  0x743f277c in g_main_context_iterate (context=0x2111978, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>)
#8  0x743f2ba8 in g_main_loop_run (loop=0x20ece40)
#9  0x00537b38 in ?? ()

So it's also Wayland, but on a different level. This is the Wayland client source (remember that the browser is also a Wayland client?), which is installed by cog (a thin browser layer on top of WPE WebKit that makes writing browsers easier to do) to process, among others, input events coming from the parent Wayland display. Looking at the cog code, we can see that the wl_display_read_events() call happens only if GLib reports that there is a G_IO_IN (POLLIN) event in its file descriptor, but we already know that this is not the case, as per the strace output. So at this point we know that there are two things here that are not right:

  1. An FD source with a G_IO_IN condition is not being dispatched.
  2. An FD source without a G_IO_IN condition is being dispatched.

Someone here is not telling the truth, and as a result the main loop is dispatching the wrong sources.

The loop (part II)

It is at this point that it would be a good idea to look at what exactly the GLib main loop is doing internally in each of its stages and how it tracks the sources and file descriptors that are polled and that need to be processed. Fortunately, debugging symbols for GLib are very small, so debugging this step by step inside the device is rather easy.

Let's look at how the main loop decides which sources to dispatch, since for some reason it's dispatching the wrong ones. Dispatching happens in the g_main_dispatch() method. This method goes over a list of pending source dispatches and after a few checks and setting the stage, the dispatch method for the source gets called. How is a source set as having a pending dispatch? This happens in g_main_context_check(), where the main loop checks the results of the polling done in this iteration and runs the check() method for sources that are not ready yet so that they can decide whether they are ready to be dispatched or not. Breaking into the Wayland display source, I know that the check() method is called. How does this method decide to be dispatched or not?

    [](GSource* base) -> gboolean
    {
        auto& source = *reinterpret_cast<Source*>(base);
        return !!source.pfd.revents;
    },

In this lambda function we're returning TRUE or FALSE, depending on whether the revents field in the GPollFD structure has been filled during the polling stage of this iteration of the loop. A return value of TRUE indicates to the main loop that we want our source to be dispatched. From the strace output, we know that there is a POLLIN (or G_IO_IN) condition, but we also know that the main loop is not dispatching it. So let's look at what's in this GPollFD structure.

For this, let's go back to g_main_context_check() and inspect the array of GPollFD structures that it received when called. What do we find?

(gdb) print *fds
$35 = {fd = 30, events = 1, revents = 0}
(gdb) print *(fds+1)
$36 = {fd = 34, events = 1, revents = 1}

That's the result of the poll() call! So far so good. Now the method is supposed to update the polling records that it keeps and uses when calling each of the sources' check() functions. What do these records hold?

(gdb) print *pollrec->fd
$45 = {fd = 19, events = 1, revents = 0}
(gdb) print *(pollrec->next->fd)
$47 = {fd = 30, events = 25, revents = 1}
(gdb) print *(pollrec->next->next->fd)
$49 = {fd = 34, events = 25, revents = 0}

We're not interested in the first record quite yet, but clearly there's something odd here. The polling records are showing a different value in the revents fields for both 30 and 34. Are these records updated correctly? Let's look at the algorithm that is doing this update, because it will be relevant later on.

  pollrec = context->poll_records;
  i = 0;
  while (pollrec && i < n_fds)
    {
      while (pollrec && pollrec->fd->fd == fds[i].fd)
        {
          if (pollrec->priority <= max_priority)
            {
              pollrec->fd->revents =
                fds[i].revents & (pollrec->fd->events | G_IO_ERR | G_IO_HUP | G_IO_NVAL);
            }
          pollrec = pollrec->next;
        }

      i++;
    }

In simple words, what this algorithm does is traverse the polling records and the GPollFD array simultaneously, updating the polling records' revents with the results of polling. From reading how the pollrec linked list is built internally, it's possible to see that it's purposely sorted by increasing file descriptor identifier value. So the first item in the list will have the record for the lowest file descriptor identifier, and so on. The GPollFD array is also built in this way, allowing for a nice optimization: if more than one polling record (that is, more than one polling source) needs to poll the same file descriptor, this can be done at once. This is why this otherwise O(n^2) nested loop can actually be reduced to linear time.

One thing stands out here though: the linked list is only advanced when we find a match. Does this mean that we always have a match between polling records and the file descriptors that have just been polled? To answer that question we need to check how the array of GPollFD structures is filled. This is done in g_main_context_query(), as we hinted before. I'll spare you the details, and just focus on what seems relevant here: when is a poll record not used to fill a GPollFD?

  n_poll = 0;
  lastpollrec = NULL;
  for (pollrec = context->poll_records; pollrec; pollrec = pollrec->next)
    {
      if (pollrec->priority > max_priority)
        continue;

Interesting! If a polling record belongs to a source whose priority is lower than the maximum priority that the current iteration is going to process, the polling record is skipped. Why is this?

In simple terms, this happens because each iteration of the main loop finds out the highest priority among the sources that are ready in the prepare() stage, before polling, and then only those file descriptor sources with at least such a priority are polled. The idea behind this is to make sure that high-priority sources are processed first, and that no file descriptor sources with lower priority are polled in vain, as they shouldn't be dispatched in the current iteration.
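
To make the filtering step concrete, here is a simplified model of it in plain C. It is a sketch, not GLib's code: PollRecord is a stripped-down, hypothetical stand-in for GLib's internal polling record, struct pollfd stands in for GPollFD, and build_poll_array is an invented name. As in GLib, a numerically lower priority value means a higher priority.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for GLib's internal polling record. */
typedef struct PollRecord {
    int fd;
    short events;
    int priority;             /* lower value means higher priority */
    struct PollRecord *next;  /* list is sorted by increasing fd */
} PollRecord;

/* Fills fds with one entry per distinct descriptor whose record passes the
 * priority cut, and returns how many entries were written. */
size_t build_poll_array(const PollRecord *records, int max_priority,
                        struct pollfd *fds, size_t capacity)
{
    assert(fds != NULL);
    size_t n = 0;
    const PollRecord *last = NULL;

    for (const PollRecord *rec = records; rec; rec = rec->next) {
        if (rec->priority > max_priority)
            continue;  /* priority too low for this iteration: not polled */

        if (last && last->fd == rec->fd) {
            /* Several records share the descriptor: merge their events. */
            fds[n - 1].events |= rec->events;
        } else {
            if (n == capacity)
                break;
            fds[n].fd = rec->fd;
            fds[n].events = rec->events;
            fds[n].revents = 0;
            n++;
        }
        last = rec;
    }
    return n;
}
```

With records for descriptors 19, 30, and 34, where only the record for 19 has a priority value above the cut, the resulting array holds just 30 and 34. The priorities used for 30 and 34 in any such experiment are assumptions; only the priority of the record for 19 is known from the debugging session.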

GDB tells me that the maximum priority in this iteration is -60. From an earlier GDB output, we also know that there's a source for a file descriptor 19 with a priority 0.

(gdb) print *pollrec
$44 = {fd = 0x7369c8, prev = 0x0, next = 0x6f701560, priority = 0}
(gdb) print *pollrec->fd
$45 = {fd = 19, events = 1, revents = 0}

Since 19 is lower than 30 and 34, we know that this record is before theirs in the linked list (and, as it happens, it's the first one in the list too). But we know that, because its priority is 0, it is too low to be added to the file descriptor array to be polled. Let's look at the loop again.

  pollrec = context->poll_records;
  i = 0;
  while (pollrec && i < n_fds)
    {
      while (pollrec && pollrec->fd->fd == fds[i].fd)
        {
          if (pollrec->priority <= max_priority)
            {
              pollrec->fd->revents =
                fds[i].revents & (pollrec->fd->events | G_IO_ERR | G_IO_HUP | G_IO_NVAL);
            }
          pollrec = pollrec->next;
        }

      i++;
    }

The first polling record was skipped when the GPollFD array was filled, so the condition pollrec && pollrec->fd->fd == fds[i].fd is never going to be satisfied, because 19 is not in the array. The innermost while() is not entered, and as such the pollrec list pointer never moves forward to the next record. So no polling record is updated here, even if we have updated revents information from the polling results.
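
This behavior can be reproduced outside of GLib with a small model. The sketch below keeps the shape of the loop above but uses simplified, hypothetical types: PollRecord stores revents directly instead of pointing to a GPollFD, struct pollfd replaces GPollFD, and the function name is invented.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for GLib's internal polling record. */
typedef struct PollRecord {
    int fd;
    short events;
    short revents;
    int priority;             /* lower value means higher priority */
    struct PollRecord *next;  /* list is sorted by increasing fd */
} PollRecord;

/* Same structure as the update loop quoted above, on the simplified types. */
void update_revents_as_quoted(PollRecord *records, int max_priority,
                              const struct pollfd *fds, size_t n_fds)
{
    assert(fds != NULL || n_fds == 0);
    PollRecord *rec = records;
    size_t i = 0;

    while (rec && i < n_fds) {
        /* Only entered when the current record matches fds[i]. */
        while (rec && rec->fd == fds[i].fd) {
            if (rec->priority <= max_priority)
                rec->revents = (short)(fds[i].revents &
                    (rec->events | POLLERR | POLLHUP | POLLNVAL));
            rec = rec->next;
        }
        i++;
    }
}
```

Feeding it a list that starts with a record for descriptor 19, followed by records for 30 and 34 holding leftover revents from an earlier iteration, and a polled array containing only 30 and 34, leaves the records for 30 and 34 untouched: the list pointer stays parked on 19 while the array index runs to the end.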

What happens next should be easy to see. The check() methods for all polled sources are called with outdated revents. In the case of the source for file descriptor 30, we wrongly tell it there's a G_IO_IN condition, so it asks the main loop to dispatch it, triggering a wl_connection_read() call on a socket with no incoming data. For the source with file descriptor 34, we tell it that there's no incoming data and its dispatch() method is not invoked, even though on the other side of the socket we have a client waiting for data to come and blocking in the meantime. This explains what we see in the strace output above. If the source with file descriptor 19 continues to be ready and with its priority unchanged, then this situation repeats in every further iteration of the main loop, leading to a hang in the web process, which waits forever for the UI process to read from its socket.

The bug – explained

I have been using GLib for a very long time, and I have only fixed a couple of minor bugs in it over the years. Very few actually, which is why it was very difficult for me to come to accept that I had found a bug in one of the most reliable and complex parts of the library. Impostor syndrome is a thing and it really gets in the way.

But in a nutshell, the bug in the GLib main loop is that the very clever linear update of polling records is missing something very important: it should skip to the first matching polling record before attempting to update its revents. Without this, in the presence of a file descriptor source with the lowest file descriptor identifier and also a lower priority than the cutting priority in the current main loop iteration, revents in the polling records are not updated and therefore the wrong sources can be dispatched. The simplest patch to avoid this would look as follows.

   i = 0;
   while (pollrec && i < n_fds)
     {
+      while (pollrec && pollrec->fd->fd != fds[i].fd)
+        pollrec = pollrec->next;
+
       while (pollrec && pollrec->fd->fd == fds[i].fd)
         {
           if (pollrec->priority <= max_priority)

Once we find the first matching record, let's update all consecutive records that also match and need an update, then let's skip to the next record, rinse and repeat. With this two-line patch, the web process was finally unlocked, the EGL display initialized properly, the web extension and the web page were loaded, CI tests started passing again, and this exhausted developer could finally put his mind to rest.
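
The effect of the extra skipping loop can be checked on a small model as well. As before, this is a sketch on simplified, hypothetical types rather than the code that went into GLib: PollRecord stores revents directly, struct pollfd replaces GPollFD, and update_revents_patched is an invented name.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for GLib's internal polling record. */
typedef struct PollRecord {
    int fd;
    short events;
    short revents;
    int priority;             /* lower value means higher priority */
    struct PollRecord *next;  /* list is sorted by increasing fd */
} PollRecord;

/* Update loop with the skip step from the patch applied. */
void update_revents_patched(PollRecord *records, int max_priority,
                            const struct pollfd *fds, size_t n_fds)
{
    assert(fds != NULL || n_fds == 0);
    PollRecord *rec = records;
    size_t i = 0;

    while (rec && i < n_fds) {
        /* New: skip records that were left out of the polled array. */
        while (rec && rec->fd != fds[i].fd)
            rec = rec->next;

        /* Update every consecutive record that shares this descriptor. */
        while (rec && rec->fd == fds[i].fd) {
            if (rec->priority <= max_priority)
                rec->revents = (short)(fds[i].revents &
                    (rec->events | POLLERR | POLLHUP | POLLNVAL));
            rec = rec->next;
        }
        i++;
    }
}
```

With a list of records for 19, 30, and 34 and a polled array containing only 30 and 34, the record for 19 is now stepped over, the stale G_IO_IN on 30 is cleared, and 34 receives the event that poll() reported.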

A complete patch, including improvements to the code comments around this fascinating part of GLib and also a minimal test case reproducing the bug have already been reviewed by the GLib maintainers and merged to both stable and development branches. I expect that at least some GLib sources will start being called in a different (but correct) order from now on, so keep an eye on your GLib sources. :-)

Standing on the shoulders of giants

At this point I should acknowledge that without the support from my colleagues in the WebKit team in Igalia, getting to the bottom of this problem would have probably been much harder and perhaps my sanity would have been at stake. I want to thank Adrián and Žan for their input on Wayland, debugging techniques, and for allowing me to bounce back and forth ideas and findings as I went deeper into this rabbit hole, helping me to step out of dead-ends, reminding me to use tools out of my everyday box, and ultimately, to be brave enough to doubt GLib's correctness, something that much more often than not I take for granted.

Thanks also to Philip and Sebastian for their feedback and prompt code review!

October 29, 2020 01:10 PM

October 22, 2020

Release Notes for Safari Technology Preview 115

Surfin’ Safari

Safari Technology Preview Release 115 is now available for download for macOS Big Sur and macOS Catalina. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 267325-267959.

Web Inspector

  • Sources Tab
    • Added a checkbox to the popover when configuring a local override to allow it to skip the network (r267723)

Web Audio

  • Enabled the modern unprefixed WebAudio API (r267488, r267504)
  • Changed AnalyserNode to downmix input audio to mono (r267346)
  • Changed AnalyserNode’s getByteFrequencyData() and getFloatFrequencyData() to only do FFT analysis once per render quantum (r267349)
  • Changed AudioBufferSourceNode to update grain parameters when the buffer is set after rendering has started (r267386)
  • Updated AudioParam.setValueCurveAtTime() to have an implicit call to setValueAtTime() at the end (r267435)
  • Updated AudioParams with automations to process timelines (r267432)
  • Fixed BiquadFilterNode’s lowpass and highpass filters (r267444)
  • Fixed Web Audio API outputting silence for 302 redirected resource (r267507, r267532)
  • Made AudioBufferSourceNode loop fixes (r267443)
  • Changed to properly handle AudioParam.setTargetAtTime() followed by a ramp (r267381)
  • Improved AudioBufferSourceNode resampling (r267453)

  • Added stubs for AudioWorklet (r267744)

  • Added basic infrastructure for AudioWorklet (r267859)
  • Added stubs for AudioWorkletProcessor and AudioWorkletGlobalScope (r267891)


  • Fixed BigInt to work with Map and Set (r267373)
  • Enabled Intl.DateTimeFormat dayPeriod (r267454)
  • Updated Intl rounding behavior to align with specifications update (r267500)
  • Updated functions to consistently enumerate length property before name property (r267364)
  • Updated Array.prototype.sort to be consistent with specifications (r267514)
  • Implemented item method proposal, note that this will be renamed to at later (r267814)


  • Fixed Performance.navigation and Performance.timing being incorrectly exposed to workers (r267333)
  • Updated User Timing interfaces to User Timing Level 3 (r267402)
  • Fixed visibilitychange:hidden event to fire during page navigations (r267614)
  • Re-aligned HTMLElement with the HTML spec (r267893)


  • Added support for HTMLMediaElement.setSinkId (r267472)
  • Fixed webkitfullscreenchange to fire for Shadow DOM elements (r267724)


  • Added support for the individual transform properties translate, rotate, scale, including accelerated animation (r267887, r267937, r267958)
  • Changed to clear the override width to properly compute percent margins in CSS Grid (r267503)

  • Implemented the CSS math-style property (r267578)

  • Implemented row-gap and column-gap for flex layout (r267829)
  • Implemented list-style-type: <string> (r267940)
  • Fixed CSS Selector an-plus-b serialization (r267812)
  • CSS serialization expects comments between certain tokens (r267766)
  • Fixed CSS variable causing a background url() to get resolved with a different base (r267951)
  • Updated to repaint as needed when adding and removing highlights (r267863)


  • Changed to not set the UV option if the authenticator doesn’t support it (r267369)

Selection API

  • Fixed selectAllChildren to return InvalidNodeTypeError when passed a DocumentType node (r267327)
  • Improved VisibleSelection, FrameSelection, and DOMSelection to preserve anchor and focus (r267329)


  • Updated toRTCIceProtocol to handle ssltcp candidates (r267401)


  • Added support for accessing the ‘SameSite’ cookie attribute (r267919)
  • Fixed several issues when switching to new browser context (r267918)

October 22, 2020 09:00 PM

October 19, 2020

Meet Face ID and Touch ID for the Web

Surfin’ Safari

People often see passwords as the original sin of authentication on the web. Passwords can be easy to guess and vulnerable to breaches. Frequent reuse of the same password across the web makes breaches even more profitable. As passwords are made stronger and unique, they can quickly become unusable for many users. Passwords indeed look notorious, but are passwords themselves the problem, or is it their use as a sole factor for authentication?

Many believe the latter, and thus multi-factor authentication has become more and more popular. The introduction of a second factor does fix most of the security issues with passwords, but it inevitably makes the whole authentication experience cumbersome with an additional step. Therefore, multi-factor authentication has not become the de facto authentication mechanism on the web. Face ID and Touch ID for the web provides both the security guarantees of multi-factor authentication and ease of use. It offers multi-factor authentication in a single step. Using this technology, available on over a billion capable Apple devices, web developers can now broadly offer traditional multi-factor authentication with a smooth, convenient experience. And being built on top of the Web Authentication API makes Face ID and Touch ID phishing resistant as well.

This blog post extends the content of WWDC 2020 “Meet Face ID and Touch ID for the web” session by providing detailed examples to assist developers’ adoption of this new technology, including how to manage different user agent user interfaces, how to propagate user gestures from user-activated events to WebAuthn API calls, and how to interpret Apple Anonymous Attestation. This article will end by summarizing the unique characteristics of Apple’s platform authenticator and the current status of security key support. If you haven’t heard about WebAuthn before, you’re strongly encouraged to first watch the WWDC 2020 session, which covers the basic concepts. Otherwise, please enjoy.

Managing User Experiences

Although user agents are not required to offer UI guidance to users during WebAuthn flows, the reality is that all of them do. This allows user agents to share some of the burden from websites to manage the user experience, but it creates another complexity for websites as each user agent has a different way of presenting the WebAuthn ceremony in its UI. A WebAuthn ceremony could either be the authentication process or the registration process. This section presents how WebAuthn ceremony options map to WebKit/Safari’s UI and the recommended user experience for Face ID and Touch ID for the web.

One challenge is to manage different user experiences among the platform authenticator and security keys. Although the WebAuthn API allows presenting both options to the user simultaneously, it’s not the best approach. First, most users are probably only familiar with the branding of the platform authenticator, i.e., Face ID and Touch ID on Apple’s platforms, but are unfamiliar with security keys. Offering both at the same time can confuse users and make it difficult for them to decide what to do. Secondly, the platform authenticator has different behaviors and use cases from security keys. For example, Face ID and Touch ID are suitable for use as a more convenient, alternative mechanism to sign in when most security keys are not. And credentials stored in security keys can often be used across different devices and platforms while those stored in the platform authenticator are typically tied to a platform and a device. Therefore, it is better to present these two options to the user separately.

Presenting Face ID and Touch ID Alone

What follows is the recommended way to invoke Face ID and Touch ID for the web. Below is the corresponding Safari UI for registration ceremonies. Here, the Relying Party ID is picked to be displayed in the dialog.

Here is the corresponding code snippet to show the above dialog.

const options = {
    publicKey: {
        rp: { name: "example.com" },
        user: {
            name: "john.appleseed@example.com",
            id: userIdBuffer,
            displayName: "John Appleseed"
        },
        pubKeyCredParams: [ { type: "public-key", alg: -7 } ],
        challenge: challengeBuffer,
        authenticatorSelection: { authenticatorAttachment: "platform" }
    }
};

const publicKeyCredential = await navigator.credentials.create(options);

The essential option is to specify authenticatorSelection: { authenticatorAttachment: "platform" } , which tells WebKit to only invoke the platform authenticator. After the publicKeyCredential is returned, one of the best practices is to store the Credential ID in a server-set, secure, httpOnly cookie, and mark its transport as "internal". This cookie can then be used to improve the user experience of future authentication ceremonies.

To protect users from tracking, the WebAuthn API doesn’t allow websites to query the existence of credentials on a device. This important privacy feature, however, requires some extra effort for websites to store provisioned Credential IDs in a separate source and query it before the authentication ceremony. The separate source is often on the backend server. This practice works well for security keys given that they can be used across platforms. Unfortunately, it does not work for the platform authenticator as credentials can only be used on the device where they were created. A server-side source cannot tell whether or not a particular platform authenticator indeed preserves a credential. Hence, a cookie is especially useful. This cookie should not be set through the document.cookie API since Safari’s Intelligent Tracking Prevention caps the expiry of such cookies to seven days. It’s also important to mark those credentials as "internal" such that websites could supply it in the authentication ceremony options to prevent WebKit from asking users for security keys at the same time.

Below are two different UIs for authentication ceremonies. The first one is streamlined for the case where the user agent only has a single credential, while the second one shows how the user agent allows the user to select one of many credentials. For both cases, only user.name submitted in the registration ceremony is selected to display. For the second case, the order of the list is sorted according to the last used date of the credential. WebKit keeps track of the last used date. Websites thus do not need to worry about it.

Here is the corresponding code snippet to show the above dialogs.

const options = {
    publicKey: {
        challenge: challengeBuffer,
        allowCredentials: [
            { type: "public-key", id: credentialIdBuffer1, transports: ["internal"] },
            // ... more Credential IDs can be supplied.
        ],
    },
};

const publicKeyCredential = await navigator.credentials.get(options);

Note that WebKit could be improved so that transports: ["internal"] is no longer needed to keep it from asking users for security keys, as long as all allowed credentials are found within the platform authenticator. That would only cover the happy path, however. When no credentials are found, this extra property tells WebKit to show an error message instead of asking the user for security keys.
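As a sketch of how a page might assemble these options from several stored Credential IDs, the helper below marks every entry with transports: ["internal"]. The function name and the placeholder byte values are assumptions made for illustration.

```javascript
// Hypothetical helper: turns stored Credential IDs into the options object
// expected by navigator.credentials.get().
function buildAuthenticationOptions(challengeBuffer, credentialIdBuffers) {
    return {
        publicKey: {
            challenge: challengeBuffer,
            // Every entry carries transports: ["internal"] so that WebKit
            // does not ask for security keys at the same time.
            allowCredentials: credentialIdBuffers.map((id) => ({
                type: "public-key",
                id,
                transports: ["internal"],
            })),
        },
    };
}

const options = buildAuthenticationOptions(
    new Uint8Array([1, 2, 3, 4]),                 // placeholder challenge
    [new Uint8Array([10]), new Uint8Array([20])]  // placeholder Credential IDs
);
console.log(options.publicKey.allowCredentials.length); // 2
```

In a page, the result would then be passed to navigator.credentials.get(options) from within a user gesture.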

Presenting Face ID and Touch ID along with Security Keys

Despite the fact that the following usage is discouraged, WebKit/Safari has prepared dedicated UI to allow the user to select a security key in addition to the platform authenticator. Below is the one for registration ceremonies.

The above dialog can be obtained by deleting authenticatorSelection: { authenticatorAttachment: "platform" } from the registration ceremony code snippet above.

The above dialog will be shown if any entry in the allowCredentials array from the authentication ceremony code snippet above doesn’t have the transports: ["internal"] property.

To be noted, security keys can be used immediately in both cases after the UI is shown. “Use Security Key” and “Account from Security Key” options are there to show instructions of how to interact with security keys.

Specifying allowCredentials or not

allowCredentials is optional for authentication ceremonies. However, omitting it will result in undetermined behavior in WebKit/Safari’s UI. If credentials are found, the authentication ceremony UI above will be shown. If no credentials are found, WebKit will ask the user for their security keys. Therefore, it is highly recommended not to omit this option.

Propagating User Gestures

Unsolicited permission prompts are annoying. Mozilla has conducted surveys [1, 2] that verify this. Even though WebAuthn prompts are not as often seen on the web as notification prompts today, this situation will change with the release of Face ID and Touch ID for the web.

Websites don’t ask for notification permission for fun. They ask because notifications can bring users back to their sites and increase their daily active users metric. A similar financial incentive could be found with WebAuthn prompts especially when platform authenticators are available as a fulfilled authentication request results in a high fidelity, persistent unique identifier of the user. This is a universal truth about authentication and that is why many sites ask for it before users even interact with the site. Though it is inevitable that WebAuthn credential will be leveraged to serve targeted ads to users, at least a similar protection that Mozilla did in Firefox for notification permission prompts can be utilized to make those WebAuthn prompts less annoying to users, which is to require user gestures for the WebAuthn API to eliminate annoying ‘on load’ prompts.

We foresaw this problem some time ago and filed an issue on the WebAuthn specification, but it didn’t get much traction back then. One reason is that it is a breaking change. Another reason is that the risk is not as high with security keys since they are not that popular and not always attached to the platform. The number of unsolicited prompts has been surprisingly low. The situation is different with the release of Face ID and Touch ID for the web. So, Face ID and Touch ID for the web require user gestures to function. (User gestures are not required for security keys for backward compatibility.)

A user gesture is an indicator that signals to WebKit that the execution of the current JavaScript context is a direct result of a user interaction, or more precisely that it comes from a handler for a user activated event, such as a touchend, click, dblclick, or keydown event [3]. Requiring user gestures for the WebAuthn API means API calls must happen within the above JavaScript context. Normally, the user gesture will not be propagated to any async executors within the context. Since it is common for websites to fetch a challenge asynchronously from a server right before invoking the WebAuthn API, WebKit allows the WebAuthn API to accept user gestures propagated through XHR events and the Fetch API. Here are examples of how websites can invoke Face ID and Touch ID for the web from user activated events.

Calling the API Directly from User Activated Events

// Fetching the challengeBuffer before the onclick event.

button.addEventListener("click", async () => {
    const options = {
        publicKey: {
            challenge: challengeBuffer,
            // ... the remaining registration options.
        },
    };

    const publicKeyCredential = await navigator.credentials.create(options);
});

Propagating User Gestures Through XHR Events

button.addEventListener("click", () => {
    const xhr = new XMLHttpRequest();
    xhr.onreadystatechange = async function() {
        if (this.readyState == 4 && this.status == 200) {
            const challenge = this.responseText;
            const options = {
                publicKey: {
                    challenge: hexStringToUint8Array(challenge), // a custom helper
                    // ... the remaining registration options.
                },
            };

            const publicKeyCredential = await navigator.credentials.create(options);
        }
    };
    xhr.open("POST", "/WebKit/webauthn/challenge", true);
    xhr.setRequestHeader("Content-type", "application/x-www-form-urlencoded");
    xhr.send();
});

Propagating User Gestures Through Fetch API

button.addEventListener("click", async () => {
    const response = await fetch("/WebKit/webauthn/challenge", { method: "POST" });
    const challenge = await response.text();

    const options = {
        publicKey: {
            challenge: hexStringToUint8Array(challenge), // a custom helper
            // ... the remaining registration options.
        },
    };
    const publicKeyCredential = await navigator.credentials.create(options);
});

To be noted, readable streams cannot propagate user gestures yet (related bug). Also, the user gesture will expire after 10 seconds for both XHR events and Fetch API.
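The snippets above call hexStringToUint8Array, which is described only as a custom helper. One possible implementation is sketched here, assuming the server returns the challenge as a plain hexadecimal string; the validation behavior is a choice made for this sketch.

```javascript
// One possible implementation of the custom helper used in the snippets above.
// It assumes the challenge arrives as an even-length hexadecimal string.
function hexStringToUint8Array(hexString) {
    if (hexString.length % 2 !== 0 || /[^0-9a-fA-F]/.test(hexString)) {
        throw new TypeError("Expected an even-length hexadecimal string");
    }
    const bytes = new Uint8Array(hexString.length / 2);
    for (let i = 0; i < bytes.length; i++) {
        // Each byte is encoded as two hexadecimal digits.
        bytes[i] = parseInt(hexString.slice(i * 2, i * 2 + 2), 16);
    }
    return bytes;
}

const challengeBytes = hexStringToUint8Array("00ff10");
console.log(Array.from(challengeBytes)); // [ 0, 255, 16 ]
```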

Easter Egg: Propagating User Gestures Through setTimeout

button.addEventListener("click", () => {
    setTimeout(async () => {
        const options = { ... };
        const publicKeyCredential = await navigator.credentials.create(options);
    }, 500);
});

The user gesture in the above example will expire after 1 second.

On iOS 14, iPadOS 14 and macOS Big Sur Beta Seed 1, only the very first case is supported. Thanks to early feedback from developers, we were able to identify limitations and add the later cases. This also helped us recognize that user gestures are not a well understood concept among web developers. Therefore, we are going to contribute to the HTML specification and help establish a well-defined concept of a user gesture for consistency among browser vendors. Depending on how it goes, we might reconsider expanding the user gesture requirement to security keys.

Interpreting Apple Anonymous Attestation

Attestation is an optional feature which provides websites a cryptographic proof of the authenticator’s provenance such that websites that are restricted by special regulations can make a trust decision. Face ID and Touch ID for the web offers Apple Anonymous Attestation. Once verified, this attestation guarantees that an authentic Apple device performed the WebAuthn registration ceremony, but it does not guarantee the operating system running on that device is untampered. If the operating system is untampered, it also guarantees that the private key of the just generated credential is protected by the Secure Enclave and the usage of the private key is guarded with Face ID or Touch ID. (A note: the guard falls back to device passcode if biometric fails multiple times in a row.)

Apple Anonymous Attestation is the first of its kind, providing a service like an Anonymization CA, where the authenticator works with a cloud operated CA owned by its manufacturer to dynamically generate per-credential attestation certificates such that no identification information of the authenticator will be revealed to websites in the attestation statement. Furthermore, among data relevant to the registration ceremony, only the public key of the credential along with a hash of the concatenated authenticator data and client data are sent to the CA for attestation, and the CA will not store any of these. This approach makes the whole attestation process privacy preserving. In addition, this approach avoids the security pitfall of Basic Attestation, where the compromise of a single device results in revoking the certificates of all devices that share the same attestation certificate.

Enabling Apple Anonymous Attestation

const options = {
    publicKey: {
        attestation: "direct", // the essential option
        // ... the remaining registration options.
    },
};

const publicKeyCredential = await navigator.credentials.create(options);

Verifying the Statement Format

This is the definition of the Apple Anonymous Attestation statement format. Issue 1453 is tracking the progress of adding this statement format to the WebAuthn standard.

$$attStmtType //= (
                       fmt: "apple",
                       attStmt: appleStmtFormat
                   )

appleStmtFormat = {
                       x5c: [ credCert: bytes, * (caCert: bytes) ]
                   }

The semantics of the above fields are as follows:

x5c: credCert followed by its certificate chain, each encoded in X.509 format.
credCert: The credential public key certificate used for attestation, encoded in X.509 format.

Here is the verification procedure given inputs attStmt, authenticatorData and clientDataHash:

  1. Verify that attStmt is valid CBOR conforming to the syntax defined above and perform CBOR decoding on it to extract the contained fields.
  2. Concatenate authenticatorData and clientDataHash to form nonceToHash.
  3. Perform SHA-256 hash of nonceToHash to produce nonce.
  4. Verify nonce matches the value of the extension with OID ( 1.2.840.113635.100.8.2 ) in credCert. The nonce here is used to prove that the attestation is live and to protect the integrity of the authenticatorData and the client data.
  5. Verify credential public key matches the Subject Public Key of credCert.
  6. If successful, return implementation-specific values representing attestation type Anonymous CA and attestation trust path x5c.

The final step is to verify x5c is a valid certificate chain starting from the credCert to the Apple WebAuthn root certificate, which then proves the attestation. (This step is usually shared among different types of attestations that utilize x5c [4].) To be noted, the AAGUID is all zeros even if the attestation is enabled as all Apple devices that support Face ID and Touch ID for the web should have the same properties as explained at the beginning of this section and no other devices can request Apple Anonymous Attestation.
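Steps 2 to 4 of the procedure above can be sketched for a Node.js relying party as follows. This is only an illustration: the function names are hypothetical, and extracting the nonce from the credCert extension with OID 1.2.840.113635.100.8.2 is assumed to have been done already by an X.509 library.

```javascript
const { createHash, timingSafeEqual } = require("node:crypto");

// Steps 2 and 3: nonce = SHA-256(authenticatorData || clientDataHash).
function computeAttestationNonce(authenticatorData, clientDataHash) {
    const nonceToHash = Buffer.concat([
        Buffer.from(authenticatorData),
        Buffer.from(clientDataHash),
    ]);
    return createHash("sha256").update(nonceToHash).digest();
}

// Step 4: compare the computed nonce with the one taken from credCert.
// expectedNonce is assumed to be the already extracted extension value.
function nonceMatches(expectedNonce, authenticatorData, clientDataHash) {
    const nonce = computeAttestationNonce(authenticatorData, clientDataHash);
    const expected = Buffer.from(expectedNonce);
    // timingSafeEqual throws on different lengths, so check the length first.
    return expected.length === nonce.length && timingSafeEqual(expected, nonce);
}

// Placeholder inputs, only to show the call shape.
const authenticatorData = new Uint8Array([1, 2, 3]);
const clientDataHash = new Uint8Array(32);
const nonce = computeAttestationNonce(authenticatorData, clientDataHash);
console.log(nonce.length); // 32
```

The remaining steps (matching the credential public key and validating x5c up to the Apple WebAuthn root certificate) need certificate parsing and are left out of this sketch.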

Unique Characteristics of Apple’s Platform Authenticator

Here is a summary about unique characteristics of Apple’s platform authenticator, i.e., Face ID and Touch ID for the web.

  • Different option sets result in different UIs, so please specify them wisely.
  • Only RP ID and user.name are selected to display in the UI.
  • User gestures are required to invoke the platform authenticator.
  • Apple Anonymous Attestation is available. Use it only if attestation is necessary for you.
  • AAGUID is all zeros even if attestation is used.
  • Face ID and Touch ID for the web is available in Safari, SFSafariViewController and ASWebAuthenticationSession on iOS 14, iPadOS 14 and macOS Big Sur. For macOS, Safari 14 with downlevel OS will not get this feature because the attestation relies on a new system framework.
  • All public key credentials generated by the platform authenticator are resident keys regardless of what option is specified.
  • Credentials can only be cleared for all via Safari > History > Clear History… on Mac Safari or Settings > Safari > Clear History and Website Data on iOS & iPadOS.
  • The signature counter is not implemented and therefore it is always zero. Secure Enclave is used to prevent the credential private key from leaking instead of a software safeguard.
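Because the signature counter is always zero, a relying party should not treat an unchanged counter as a sign of cloning. The sketch below follows the WebAuthn rule that the counter comparison only applies when the stored or the received value is nonzero; the function name is hypothetical.

```javascript
// Hypothetical server-side check for the signature counter of an assertion.
// Returns true when the counter gives no indication of a cloned authenticator.
function isSignCountAcceptable(storedSignCount, receivedSignCount) {
    // Both zero: the authenticator does not implement a counter, as is the
    // case for Apple's platform authenticator, so there is nothing to compare.
    if (storedSignCount === 0 && receivedSignCount === 0) {
        return true;
    }
    // Otherwise the counter must have strictly increased.
    return receivedSignCount > storedSignCount;
}

console.log(isSignCountAcceptable(0, 0)); // true
console.log(isSignCountAcceptable(5, 6)); // true
console.log(isSignCountAcceptable(5, 5)); // false
```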

Current Status of Security Key Support

Besides the introduction of Face ID and Touch ID for the web, iOS 14, iPadOS 14 and Safari 14 on all supported macOS also have improved security key support including PIN entry and account selection. Here is a list of features that are currently supported. All of them have been supported since iOS 13.3, iPadOS 13.3 and Safari 13 except the two aforementioned.

  • All MUST features in WebAuthn Level 1 and all optional features except CollectedClientData.tokenBinding and most of the extensions. Only the appid extension is supported.
  • All CTAP 2.0 authenticator API except setPin and changePin.
  • USB, Lightning, and NFC transports are supported on capable devices.
  • U2F security keys are supported via CTAP 2.0 but not CTAP 1/U2F JS.
  • Like Face ID and Touch ID for the web, security key support is available in Safari, SFSafariViewController and ASWebAuthenticationSession.


In this blog post, we introduced Face ID and Touch ID for the web. We believe it is a huge leap forward for authentication on the web. It serves as a great alternative way to sign in, especially for traditional multi-factor authentication mechanisms. With the assistance of this technology, we believe multi-factor authentication will replace sole-factor password as the de facto authentication mechanism on the web. Developers, please start testing this feature today and let us know how it works for you by sending feedback on Twitter (@webkit, @alanwaketan, @jonathandavis) or by filing a bug.

October 19, 2020 05:00 PM

October 08, 2020

Release Notes for Safari Technology Preview 114

Surfin’ Safari

Safari Technology Preview Release 114 is now available for download for macOS Big Sur and macOS Catalina. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 265893-267325.

Web Inspector

  • Elements Tab
    • Changed to grey out properties that aren’t used or don’t apply (r266066)
    • Changed to hide non-inheritable properties when viewing inherited rules (r266069)
    • Changed to not show inline swatches for properties that aren’t used or don’t apply (r266070)
  • Sources Tab
    • Changed to allow event breakpoints to be configured (r266074, r266480)
    • Changed to evaluate breakpoint conditions before incrementing the ignore count (r266138)
    • Changed to allow DOM breakpoints to be configured (r266669)
    • Changed to allow special JavaScript breakpoints to be configured (r266534)
    • Changed to allow URL breakpoints to be configured (r266538)
  • Network Tab
    • Fixed WebSockets to be reported as type websocket (r266441)
    • Fixed issue where response content was not shown for 304 responses from XHR requests (r266568)
  • Timelines Tab
    • Fixed duplicate “Timeline Recording 1” on open (r266477)
    • Fixed re-enabling the JavaScript Allocations timeline to show previously captured heap snapshots in the table (r266463)
    • Fixed the record button disappearing when interface is narrow (r266537)
    • Fixed the Stop Recording button to actually stop the recording (r267038)
  • Audit Tab
    • Allow audits to be created and edited in Edit mode in Web Inspector (r266317)
  • Miscellaneous
    • Fixed issue where the docking buttons wouldn’t work when docked if the window is too small (r267031)


JavaScript

  • Added Intl.DateTimeFormat dateStyle and timeStyle (r266035)
  • Added Intl.Segmenter (r266032)
  • Added a syntax error for async function in a single-statement context (r266340)
  • Added Object.getOwnPropertyNames caching and accelerated Object.getOwnPropertyDescriptor (r265934)
  • Aligned legacy Intl constructor behavior to spec (r266655)
  • Applied Intl.DateTimeFormat hour-cycle correctly when timeStyle is used (r267108)
  • Enabled Intl.DisplayNames (r266029)
  • Changed to not allow let [ sequence to appear in an ExpressionStatement context (r266327)
  • Changed to allow new super.property syntax (r266322)
  • Changed to allow new import.meta() syntax (r266318)
  • Changed to use locale-sensitive grouping for grouping options in Intl.RelativeTimeFormat (r266341)
  • Implemented Intl.DateTimeFormat dayPeriod (r266323)
  • Implemented Intl Language Tag Parser (r266039)
  • Implemented Intl.DateTimeFormat.prototype.formatRange (r266033)
  • Implemented unified Intl.NumberFormat (r266031)
  • Fixed an invalid early error for object literal method named __proto__ (r266117)
  • Fixed implementation of the class “extends” clause incorrectly using __proto__ for setting prototypes (r266106)
  • Fixed Performance and PerformanceNavigation interfaces missing toJSON operations (r267316)
  • Updated Intl.Collator to take a collation option (r267102)
  • Updated Array.prototype.push to always perform Set in strict mode (r266581, r266641)
  • Updated Promise.prototype.finally to perform PromiseResolve (r266896)

Date and Time Inputs

  • Added editing to <input type="datetime-local"> (r266830)
  • Updated date inputs to contain editable components (r266351)
  • Updated date picker appearance to match system date pickers (r267085)
  • Updated date picker when the inner control is edited (r266461)
  • Updated date pickers to respect the document’s color scheme (r267131)
  • Updated date/time inputs to focus the next editable component when entering a separator key (r267281)
  • Updated date/time inputs to preserve focus on value change (r266739)
  • Updated date/time inputs to not use user-specified formats to prevent fingerprinting (r267283)

Web Audio

  • Added AudioParam.automationRate attribute (r265980)
  • Added proper support for AudioContextOptions.sampleRate (r267014)
  • Allowed direct creation of replacement codec (r266466)
  • Changed AudioParam.value setter to call setValueAtTime(value, now) (r266293)
  • Changed AudioParam.linearRampToValueAtTime() formula to match specification (r266261)
  • Changed AudioBufferSourceNode to use final values for playbackRate and detune (r265981)
  • Fixed AnalyserNode.getFloatFrequencyData() to fill array with -Infinity when input is silent (r267202)
  • Fixed AudioBufferSourceNode.start() behavior when the offset is past the end of the buffer (r267169)
  • Fixed AudioBufferSourceNode.start() ignoring when parameter when the pitch rate is 0 (r267170)
  • Fixed AudioContext not rendering until an AudioNode is constructed (r266922)
  • Fixed AudioDestinationNode.maxChannelCount always returning 0 (r266559)
  • Fixed AudioParam.linearRampToValueAtTime() and exponentialRampToValueAtTime() having no effect when there is no preceding event (r266788)
  • Fixed BiquadFilterNode.getFrequencyResponse() to return NaN for out-of-bounds frequencies (r266541)
  • Fixed the types of Panner.setPosition() and setOrientation() parameters to not be unrestricted float (r267071)
  • Dropped non-standard AudioBuffer.gain (r267065)
  • Made AudioParam.cancelScheduledValues() standards compliant (r266558)
  • Improved interpolation algorithm in OscillatorNode (r266627)
  • Introduced StereoPannerNode Interface (r265962)
  • Stopped performing “de-zippering” when applying gain (r266794)


MediaRecorder

  • Enabled MediaRecorder by default on macOS (r267225)
  • Changed to not report the end of media capture before 3 seconds after the start of capture (r267081)
  • Fixed the MediaRecorder timeslice parameter causing an internal error on longer videos (r266611)

Paint Timing

  • Enabled paint timing by default (r267235)


WebGL

  • Enabled WebGL2 by default (r267027)
  • Added WebGL and WebGL2 context support to OffscreenCanvas (r266275)
  • Fixed WebGL getting into a bad state where glContext.createProgram() returns null (r266362)


CSS

  • Fixed text-transform inheritance to ::marker (r266288)
  • Changed to set available column space before grid items prelayout (r266173)
  • Added support for flow-relative shorthand and offset properties (r266674)
  • Changed to allow indefinite size flex items to be definite with respect to resolving percentages inside them (r266696)
  • Changed to not skip flexboxes with auto height for percentage computations in quirks mode (r266716)
  • Changed to use min-content size for intrinsic maximums resolution (r266675)
  • Fixed min-height: auto not getting applied to nested flexboxes (r266695)
  • Fixed :visited color taken on a non-visited link when using CSS variables (r266656)
  • Fixed CSS revert to serialize as “revert”, not “Revert” (r266660)
  • Updated to safely handle overly-long CSS variable values (r266989)


Web API

  • Aligned length properties of function prototypes with specifications (r266018)
  • Updated ReadableStream.pipeTo implementation to match specifications (r266129)
  • Updated Web Share API to prevent non-HTTP(S) URLs (r266151)
  • Aligned ISO-8859-{3,6,7,8,8-I} and windows-{874,1253,1255,1257} encodings with specifications (r266527)
  • Changed XML documents in iframes to not inherit encoding from the parent frame (r266671)
  • Changed Element to not set an attribute inside its constructor (r267074)
  • Changed new URL("#") to throw an error (r266748)
  • Fixed consecutive requestAnimationFrame callbacks that may get passed the same timestamp (r266526)
  • Fixed XHR.timeout getting affected by long tasks (r267227)
  • Fixed taking too long to fetch images from memory cache (r266699)
  • Implemented encodeInto() TextEncoder method (r266533)
  • Updated the URL fragment percent encode set (r266399)
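
Two of the changes in this list can be observed from script. The short sketch below is not taken from the release notes; it assumes an environment with the standard TextEncoder and URL globals and uses placeholder input.

```javascript
// TextEncoder.encodeInto() writes UTF-8 bytes into an existing buffer and
// reports how many UTF-16 code units were read and how many bytes written.
const destination = new Uint8Array(8);
const result = new TextEncoder().encodeInto("h\u00e9llo", destination);
console.log(result.read, result.written); // 5 6

// new URL("#") has no base URL to resolve the fragment against, so it throws.
let threw = false;
try {
    new URL("#");
} catch (error) {
    threw = error instanceof TypeError;
}
console.log(threw); // true
```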

Lazy Loading


Media

  • Fixed the PiP window getting closed when the video element is removed from the DOM (r265904)
  • Fixed an HDCP error for all streams on Netflix (r266176)
  • Fixed <video> element preventing screen from sleeping even after playback finishes (r266410)


WebRTC

  • Added RTCRtpSynchronizationSource.rtpTimestamp (r266052)
  • Exposed RTCPeerConnection.restartIce (r266511)
  • Fixed Safari not being able to hear audio when using WebRTC in multiple tabs (r266454)


Rendering

  • Fixed animations invalidating too often (r266229)
  • Fixed flickering on sedona.dev (r266189)
  • Fixed a cut-off scrollbar on Facebook posts with lots of comments that couldn’t scroll to the bottom (r266156)
  • Changed to handle fonts that lie about being monospaced (r266118)
  • Fixed programmatic selection of text in a text field that causes the highlight overlay to spill out (r266051)
  • Fixed overflow: scroll rubber-banding getting interrupted by post-layout scrolling (r267002, r266337)
  • Fixed a flash when closing a webpage (r267250)

Text Rendering

  • Changed letter-spacing to disable ligatures (r266683)


Scrolling

  • Fixed vertical scrolling getting stuck when a horizontal scroller is under the mouse (r266292)
  • Fixed select element scrolling after scrolling the page (r266262)

Back-Forward Cache

  • Added support for third-party domains to get stored for back-forward navigations (r265916)

Storage Access API

  • Allowed requests for storage access from nested iframes (r266479)

October 08, 2020 05:40 PM

October 01, 2020

Sergio Villar: Closing the gap (in flexbox 😇)

Igalia WebKit

Flexbox had a lot of early problems, but by mid-May 2020 where our story begins, both Firefox and Chromium had done a lot of work on improving things with this feature. WebKit, however, hadn’t caught up. Prioritizing the incredible amounts of work a web engine requires is difficult. The WebKit implementation was still passable for very many (most) cases of the core features, and it didn’t have problems that caused crashes or something that urgently demanded attention, so engineers dedicated their limited time toward other things. The net result, however, was that as this choice repeated many times, the comparative state of WebKit’s flexbox implementation had fallen behind pretty significantly.
Web Platform Tests (WPT) is a huge ongoing effort from many people to come up with a very extensive list of tests that could help both spec editors and implementors to make sure we have great compatibility. In the case of flexbox, for example, there are currently 773 tests (2926 subtests) and WebKit was failing a good amount of them. This matters a lot because there are things that flexbox is ideal for, and it is exceptionally widely used. In mid-May, Igalia was contracted to improve things here, and in this post, I’ll explain and illustrate how we did that.

The Challenge

The main issues were (in no particular order):
  • min-width:auto and min-height:auto handling
  • Nested flexboxes in column flows
  • Flexboxes inside tables and vice versa
  • Percentages in heights with indefinite sizes
  • WebKit CI not running many WPT flexbox tests
  • and of course… lack of gap support in Flexbox
Modifying Flexbox layout code is a challenge by itself. Tiny modifications in the source code could cause huge differences in the final layout. You might even have a patch that passes all the tests and regresses multiple popular web sites.
Good news is that we were able to tackle most of those issues. Let’s review what changes you could eventually expect from future releases of Safari (note that Apple doesn’t disclose information about future products and/or releases) and the other WebKit based browsers (like GNOME Web).

Flexbox gaps 🥳🎉

Probably one of the most awaited features in WebKit by web developers. It’s finally here after Firefox and Chrome landed it not so long ago. The implementation was initially inspired by the one in Chrome, but it diverged a bit in the final version of the patch. The important thing is that the behaviour should be the same; at least, all the tests in WPT related to gaps are now passing in WebKit trunk.

<div style="display: flex; flex-wrap: wrap; gap: 1ch">
  <div style="background: magenta; color: white">Lorem</div>
  <div style="background: green; color: white">ipsum</div>
  <div style="background: orange; color: white">dolor</div>
  <div style="background: blue; color: white">sit</div>
  <div style="background: brown; color: white">amet</div>
</div>

Tables as flex items

Tables should obey the flex container sizing whenever they are flex items. As can be seen in the examples below, the tables’ layout code was kicking in and ignoring the constraints set by the flex container. Tables should do what the flex algorithm mandates and thus they should allow being stretched/squeezed as required.

<div style="display:flex; width:100px; background:red;">
  <div style="display:table; width:10px; max-width:10px; height:100px; background:green;">
    <div style="width:100px; height:10px; background:green;"></div>
  </div>
</div>

Tables with items exceeding the 100% of available size

This is the case of tables placed inside flex items. The automatic table layout algorithm was generating tables with unlimited widths when the sum of the sizes of their columns (expressed in percentages) exceeded 100%. It was impossible to satisfy the constraints set by the table and flexbox algorithms at the same time.

<div style="display:flex; width:100px; height:100px; align-items:flex-start; background:green;">
  <div style="flex-grow:1; flex-shrink:0;">
    <table style="height:50px; background:green;" cellpadding="0" cellspacing="0">
      <tr>
        <td style="width:100%; background:green;"> </td>
        <td style="background:green;"> </td>
      </tr>
    </table>
  </div>
</div>

Note how the table was growing indefinitely (I cropped the “Before” picture to fit in the post) to the right before the fix.

Alignment in single-line flexboxes

Interesting case. The code considered single-line flexboxes to be those where all the flex items were placed in a single line after computing the required space for them. Though sensible, that’s not what a single-line flexbox is: it’s a flex container with flex-wrap:nowrap. This means that a flex container with flex-wrap:wrap whose children do not need more than 1 flex line to be placed is not a single-line flex container from the specs POV (corollary: implementing specs is hard).

<div style="display: flex; flex-wrap: wrap; align-content: flex-end; width: 425px; height: 70px; border: 2px solid black">
  <div style="height: 20px">This text should be at the bottom of its container</div>
</div>

Percentages in flex items with indefinite sizes

One of the trickiest ones. Although it didn’t involve a lot of code, it caused two serious regressions, in YouTube’s upload form and when viewing Twitter videos in fullscreen, which required some previous fixes and delayed the landing of this patch a bit. Note that this behaviour was really contentious from the pure specification POV, as it changed many times over the years. Defining a good behaviour is really complicated. Without going into too much detail, flexbox has a couple of cases where sizes are considered definite when they are theoretically indefinite. In this case we consider that if the flex container main size is definite then the post-flexing size of flex items is also treated as definite.

<div style="display: flex; flex-direction: column; height: 150px; width: 150px; border: 2px solid black;">
  <div>
    <div style="height: 50%; overflow: hidden;">
      <div style="width: 50px; height: 50px; background: green;"></div>
    </div>
  </div>
  <div style="flex: none; width: 50px; height: 50px; background: green;"></div>
</div>

Hit testing with overlapping flex items

There were some issues with pointer events passing through overlapping flex items (due to negative margins for example). This was fixed by letting the hit testing code proceed in reverse (the opposite to painting) order-modified document order instead of using the raw order from the DOM.

<div style="display:flex; border: 1px solid black; width: 300px;">
  <a style="width: 200px;" href="#">Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua</a>
  <div style="margin-left: -200px; width: 130px; height: 50px; background: orange;"></div>
</div>

In the “Before” case hit testing was bypassing the orange block and thus, the cursor was showing a hand because it detected that it was hovering a link. After the fix, the cursor is properly rendered as an arrow because the orange block covers the underneath link.
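The ordering rule described above can be modelled in a few lines of script. This is an illustrative model only, not WebKit's code: each item is a plain object with a CSS order value and a hypothetical contains predicate, and the geometry is reduced to the horizontal axis of the example.

```javascript
// Paint order for flex items: sorted by `order`, ties broken by DOM order.
function orderModifiedDocumentOrder(items) {
    // Array.prototype.sort is stable, so items with equal `order`
    // keep their document (DOM) order.
    return [...items].sort((a, b) => a.order - b.order);
}

// Hit testing walks the paint order backwards: painted last means on top.
function hitTest(items, x, y) {
    const paintOrder = orderModifiedDocumentOrder(items);
    for (let i = paintOrder.length - 1; i >= 0; i--) {
        if (paintOrder[i].contains(x, y)) return paintOrder[i];
    }
    return null;
}

// Simplified boxes from the example: the link spans x in [0, 200) and the
// orange block, pulled back by its negative margin, spans x in [0, 130).
const rect = (left, width) => (x, y) => x >= left && x < left + width;
const link = { name: "link", order: 0, contains: rect(0, 200) };
const block = { name: "orange block", order: 0, contains: rect(0, 130) };

console.log(hitTest([link, block], 50, 10).name);  // orange block
console.log(hitTest([link, block], 150, 10).name); // link
```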

Computing percentages with scrollbars

In this case the issue was that, in order to compute percentages in heights, we were incorrectly using the size of the scrollbars too.

<div style="display: inline-flex; height: 10em;">
  <div style="overflow-x: scroll;">
    <div style="width: 200px; height: 100%; background: green"></div>
  </div>
</div>

Note that in the “After” picture the horizontal scrollbar background is visible while in the “Before” the wrong height computation made the flex item overlap the scrollbar.

Image items with specific sizes

The flex layout algorithm needs the intrinsic sizes of the flex items to compute their sizes and the size of the flex container. Changes to those intrinsic sizes should trigger new layouts, and the code was not doing that.

<!-- Just to showcase how the img below is not properly sized -->
<div style="position: absolute; background-color: red; width: 50px; height: 50px; z-index: -1;"></div>
<div style="display: flex; flex-direction: column; width: 100px; height: 5px;">
  <img style="width: 100px; height: 100px;" src="https://wpt.live/css/css-flexbox/support/100x100-green.png">
</div>

Nested flexboxes with ‘min-height: auto’

Another tricky one and another one related to the handling of nested column flexboxes. As in the previous issue with nested column flexboxes, the problem was that we were not supporting this case. For those wanting to have a deeper understanding of the issue, this bug was about implementing section 4.5 of the specs. This was one of the more complicated ones to fix. Edward Lorenz would love that part of the layout code: the slightest change in one of those source code lines could trigger huge changes in the final rendering.

<div style='display:flex; flex-direction: column; overflow-y: scroll; width: 250px; height: 250px; border: 1px solid black'>
  <div style='display:flex;'>
    <div style="width: 100px; background: blue"></div>
    <div style='width: 120px; background: orange'></div>
    <div style='width: 10px; background: yellow; height: 300px'></div>
  </div>
</div>

As can be seen, in the “Before” picture the blue and orange blocks are sized differently from the yellow one. That’s fixed in the “After” picture.

Percentages in quirks mode

Another one affecting how percentages are computed in heights, but this one is specific to quirks mode. We now match Firefox, Chrome and pre-Chromium Edge, i.e., flexbox should not care much about quirks mode since it was invented many years after quirky browsers dominated the earth.

<div style="width: 100px; height: 50px;">
  <div style="display: flex; flex-direction: column; outline: 2px solid blue;">
    <div style="flex: 0 0 50%"></div>
  </div>
</div>

Percentages in ‘flex-basis’

Percentages were generally working fine inside flex-basis; however, there was one particularly problematic case. It arose whenever that percentage was referring to, oh surprise, an indefinite height. And again, we’re talking about nested flexboxes with column flows. Indeed, definite/indefinite sizes are one of the toughest things to get right from the layout POV. In this particular case, the fix was to ignore the percentages and treat them as height: auto.

<div style="display: flex; flex-direction: column; width: 200px;">
  <div style="flex-basis: 0%; height: 100px; background: red;">
    <div style="background: lime">Here's some text.</div>
  </div>
</div>
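The rule applied in this fix can be sketched in a few lines. This is a simplified model rather than the actual layout code: `resolveFlexBasis` is a hypothetical name, an indefinite container size is represented as `null`, and the string `"content"` stands for "size the item from its content, as if the height were auto".

```javascript
// Hypothetical model of how a percentage flex-basis is resolved.
// containerMainSize is a pixel value, or null when the size is indefinite.
function resolveFlexBasis(flexBasis, containerMainSize) {
  const match = /^(\d+(?:\.\d+)?)%$/.exec(flexBasis);
  if (!match) {
    return flexBasis; // Not a percentage: nothing to resolve here.
  }
  if (containerMainSize === null) {
    // The percentage refers to an indefinite size, so it is ignored and the
    // item is sized from its content instead.
    return "content";
  }
  return (Number(match[1]) / 100) * containerMainSize;
}

// The column container in the example above has no definite height.
console.log(resolveFlexBasis("0%", null));
// With a definite 200px main size the percentage resolves normally.
console.log(resolveFlexBasis("50%", 200));
```

In the example above this means the flex item wraps the text and ignores both the 0% basis and the 100px height, so no red should be visible.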

Flex containers inside STF tables

Fixing a couple of test cases submitted by an anonymous Opera employee 8! years ago. This is another case of competing layout contexts trying to do things their own way.

<div style="display: table; background:red">
   <div style="display: flex; width: 0px">
      <p style="margin: 1em 1em;width: 50px">Text</p>
      <p style="margin: 1em 1em;width: 50px">Text</p>
      <p style="margin: 1em 1em;width: 50px">Text</p>
   </div>
</div>

After the fix the table is properly sized to 0px width and thus no red is seen.


These examples are just some interesting ones I’ve chosen to highlight. In the end, almost 50 new flexbox tests are passing in WebKit that weren’t back in May! I wouldn’t like to forget the great job done by my colleague Carlos Lopez, who imported tons of WPT flexbox tests into the WebKit source tree. He also performed awesome triage work which made my life a lot easier.
Investing in interoperability is a huge deal for the web. It’s good for everyone, from spec authors to final users, including browser vendors, downstream ports or web authors. So if you care about the web, or your business orbits around web technologies, you should definitely promote and invest in interoperability.

Implementing standards or fixing bugs in web engines is the kind of work we happily do at Igalia on a daily basis. We are the second largest contributor to both WebKit and Chrome/Blink, so if you have an annoying bug on a particular web engine (Gecko and Servo as well) that you want fixed, don’t hesitate to contact us, we’d be glad to help. Also, should you want to be part of a worker-owned cooperative with an assembly-based decision-making mechanism and a strong focus on free software technologies, join us!


Many thanks to WebKit reviewers from Apple and Igalia like Darin Adler, Manuel Rego, Javier Fernández or Daniel Bates who made the process really easy for me, always providing very nice feedback for the patches I submitted.
I’m also really thankful to Googlers like Christian Biesinger, David Grogan and Stephen McGruer who worked on the very same things in Blink and/or provided very nice guidance and support when porting patches.

By svillar at October 01, 2020 11:34 AM

September 28, 2020

Adrián Pérez de Castro: Sunsetting NPAPI support in WebKitGTK (and WPE)

Igalia WebKit

  1. Summary
  2. What is NPAPI?
  3. What is NPAPI used for?
  4. Why are NPAPI plug-ins being phased out?
  5. What are other browsers doing?
  6. Is WebKitGTK following suit?


Here’s a tl;dr list of bullet points:

  • NPAPI is an old mechanism to extend the functionality of a web browser. It is time to let it go.
  • One year ago, WebKitGTK 2.26.0 removed support for NPAPI plug-ins which used GTK2, but the rest of the plug-ins kept working.
  • WebKitGTK 2.30.x will be the last stable series with support for NPAPI plug-ins at all. Version 2.30.0 was released a couple of weeks ago.
  • WebKitGTK 2.32.0, due in March 2021, will be the first stable release to ship without support for NPAPI plug-ins.
  • We have already removed the relevant code from the WebKit repository.
  • While the WPE WebKit port allowed running windowless NPAPI plug-ins, this was never advertised nor supported by us.

What is NPAPI?

In 1995, Netscape Navigator 2.0 introduced a mechanism to extend the functionality of the web browser. That was NPAPI, short for Netscape Plugin Application Programming Interface. NPAPI allowed third parties to add support for new content types; for example Future Splash (.spl files), which later became Flash (.swf).

When an NPAPI plug-in is used to render content, the web browser carves a hole in the rectangular location where content handled by the plug-in will be placed, and hands off the rendering responsibility to the plug-in. This would end up calling for trouble, as we will see later.

What is NPAPI used for?

A number of technologies have used NPAPI over the years for different purposes:

  • Displaying multimedia content using the Flash Player or Silverlight plug-ins.
  • Running rich Java™ applications in the browser.
  • Displaying documents in non-Web formats (PDF, DjVu) inside browser windows.
  • A number of questionable practices, like VPN client software using a browser plug‑in for configuration.

Why are NPAPI plug-ins being phased out?

The design of NPAPI makes the web browser give full responsibility to plug-ins: the browser has no control whatsoever over what plug-ins do to display content, which makes it hard to make them participate in styling and layout. More importantly, plug-ins are compiled, native code over which browser developers cannot exercise quality control, which resulted in a history of security incidents, crashes, and browser hangs.

Today, Web browsers’ rendering engines can do a better job than plug-ins, more securely and efficiently. The Web platform is mature and there is no place to blindly trust third party code to behave well. NPAPI is a 25-year-old technology showing its age: it has served its purpose, but it is no longer needed.

The last nail in the coffin was Adobe’s 2017 announcement that the Flash plugin will be discontinued in January 2021.

What are other browsers doing?

Glad that you asked! It turns out that all major browsers have plans for incrementally reducing how much of NPAPI usage they allow, until they eventually remove it.


Let’s take a look at the Firefox roadmap first:

Version | Date | Plug-in support changes
47 | June 2016 | All plug-ins except Flash need the user to click on the element to activate them.
52 | March 2017 | Only loads the Flash plug‑in by default.
55 | August 2017 | Does not load the Flash plug‑in by default, instead it asks users to choose whether sites may use it.
56 | September 2017 | On top of asking the user, Flash content can only be loaded from http:// and https:// URIs; the Android version completely removes plug‑in support. There is still an option to allow always running the Flash plug-in without asking.
69 | September 2019 | The option to allow running the Flash plug-in without asking the user is gone.
85 | January 2021 | Support for plug-ins is gone.
Table: Firefox NPAPI plug-in roadmap.

In conclusion, the Mozilla folks have been slowly boiling the frog for the last four years and will completely remove the support for NPAPI plug-ins coinciding with the Flash player reaching EOL status.

Chromium / Chrome

Here’s a timeline of the Chromium roadmap, merged with some highlights from their Flash Roadmap:

Version | Date | Plug-in support changes
? | Mid 2014 | The interface to unblock running plug-ins is made more complicated, to discourage usage.
? | January 2015 | Plug-ins blocked by default, some popular ones allowed.
42 | April 2015 | Support for plug-ins disabled by default, setting available in chrome://flags.
45 | September 2015 | Support for NPAPI plug-ins is removed.
55 | December 2016 | Browser does not advertise Flash support to web content, the user is asked whether to run the plug-in for sites that really need it.
76 | July 2019 | Flash support is disabled by default, can still be enabled with a setting.
88 | January 2021 | Flash support is removed.
Table: Chromium NPAPI/Flash plug-in roadmap.

Note that Chromium continued supporting Flash content even when it already removed support for NPAPI in 2015: by means of their acute NIH syndrome, Google came up with PPAPI, which replaced NPAPI and which was basically designed to support Flash and is currently used by Chromium’s built-in PDF viewer—which will go away also coinciding with Flash being EOL, nevertheless.


In the Apple camp, the story is much easier to tell:

  • Their handheld devices—iPhone, iPad, iPod Touch—never supported NPAPI plug-ins to begin with. Easy-peasy.
  • On desktop, Safari has required explicit approval from the user to allow running plug-ins since June 2016. The Flash plug-in has not been preinstalled in Mac OS since 2010, requiring users to manually install it.
  • NPAPI plug-in support will be removed from WebKit by the end of 2020.

Is WebKitGTK following suit?

Yes. In September 2019 WebKitGTK 2.26 removed support for NPAPI plug-ins which use GTK2. This included Flash, but the PPAPI version could still be used via freshplayerplugin.

In March 2021, when the next stable release series is due, WebKitGTK 2.32 will remove the support for NPAPI plug-ins. This series will receive updates until September 2021.

The above gives a full two years since we started restricting which plug-ins can be loaded before they stop working, which we reckon should be enough. At the time of writing this article, the support for plug-ins was already gone from the WebKit source for the GTK and WPE ports.

Yes, you read that right: WPE supported NPAPI plug-ins, but in a limited fashion, as only windowless plug-ins worked. In practice, making NPAPI plug-ins work on Unix-like systems required using the XEmbed protocol to allow them to place their rendered content overlaid on top of WebKit’s, but the WPE port does not use X11. Given that we never advertised nor officially supported NPAPI in the WPE port, we do not expect any trouble removing it.

September 28, 2020 09:50 PM

September 09, 2020

Release Notes for Safari Technology Preview 113

Surfin’ Safari

Safari Technology Preview Release 113 is now available for download for macOS Big Sur and macOS Catalina. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 265179-265893.

Web Inspector

  • Timelines Tab
    • Fixed background colors for odd and even items in Dark Mode in the Timelines tab (r265498)
    • Media & Animations timeline shouldn’t shift when sorting (r265356)
  • Adapted Web Inspector’s user interface and styling to better match macOS Big Sur (r265237, r265507)

Web Audio

  • Added constructor for GainNode (r265227)
  • Added constructor for BiquadFilterNode (r265290)
  • Added constructor for ConvolverNode (r265298)
  • Added constructor for DelayNode (r265221)
  • Added constructor for AudioBuffer (r265210)
  • Added constructor for AnalyserNode (r265196)
  • Added support for suspending and resuming an OfflineAudioContext (r265701)
  • Added constructor for the MediaElementAudioSourceNode interface (r265330)
  • Aligned AudioListener with the W3C specification (r265266)
  • Aligned BiquadFilterNode.getFrequencyResponse() with the specification (r265291)
  • Fixed BiquadFilterNode’s lowpass filter (r265517)
  • Fixed missing length attribute on OfflineAudioContext (r265388)
  • Fixed missing baseLatency attribute on the AudioContext interface (r265393)


  • Added support for MediaRecorder bitrate options (r265328)


  • Updated to avoid triggering redundant compositing updates when trying to run a steps() animation on transform (r265358)
  • Fixed inconsistent spacing of Chinese characters in Safari for macOS Big Sur (r265488)


  • Enabled H.264 low latency code path by default for macOS (r265547)
  • Fixed the picture-in-picture button disappearing in the fullscreen YouTube player after starting a new video in a playlist (r265690)


  • Changed to apply aspect ratios when computing flex-basis (r265855)
  • Fixed updating min-height: auto after an image loads when the image has a specified height and width (r265858)
  • Fixed @font-face font-weight descriptor to reject bolder and lighter (r265677)
  • Fixed the CSS specificity of :host() pseudo-classes (r265812)


  • Fixed window.print to not invoke native UI (r265207)


  • Added VoiceOver access to font styling at insertion point (r265259)


  • Fixed font loads quickly followed by navigations failing indefinitely (r265603)


  • Implemented Canvas.transferControlToOffscreen and OffscreenCanvasRenderingContext2D.commit (r265543)
  • Implemented createImageBitmap(ImageData) (r265360)
  • Implemented PerformanceObserverInit.buffered (r265390)
  • Fixed text input autocorrect="off" attribute getting ignored on macOS (r265509)

Gamepad API

  • Added a special HID mapping for the Google Stadia controller (r265180)
  • Added HID mapping for the Logitech F310/F710 controllers (r265183)


  • Fixed table data incorrectly translated in some articles on wikipedia.org (r265188)
  • Fixed leading and trailing spaces to be ignored when comparing content (r265361)

September 09, 2020 05:32 PM

September 07, 2020

Víctor Jáquez: Review of Igalia Multimedia activities (2020/H1)

Igalia WebKit

This blog post is a review of the various activities the Igalia Multimedia team was involved in during the first half of 2020.

Our previous reports are:

Just before a new virus turned into a pandemic, we could enjoy our traditional FOSDEM. There, our colleague Phil gave a talk about many of the topics covered in this report.


GstWPE’s wpesrc element produces a video texture representing a web page rendered off-screen by WPE.

We have worked on a new iteration of the GstWPE demo, focusing on one-to-many, web-augmented overlays, broadcasting with WebRTC and Janus.

Also, since the gstwpe plugin was merged into gst-plugins-bad (the staging area for new elements), new users have come along, spotting rough areas and improving the element along the way.

Video Editing

GStreamer Editing Services (GES) is a library that simplifies the creation of multimedia editing applications. It is based on the GStreamer multimedia framework and is heavily used by the Pitivi video editor.

Implemented frame accuracy in the GStreamer Editing Services (GES)

As required by the industry, it is now possible to express all times as frame numbers, providing a precise mapping between frame number and play time. Many issues were fixed in GStreamer to reach enough precision to make this work. Intensive regression tests were also added.
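To illustrate why precision matters in such a mapping, here is a minimal sketch of converting between frame numbers and nanosecond timestamps with exact integer arithmetic. It is not the GES implementation: the function names are hypothetical, the frame rate is assumed to be given as a fraction (for example 30000/1001), and rounding the timestamp up is just one possible choice that makes the round trip exact.

```javascript
const NS_PER_SECOND = 1000000000n;

// Frame number -> timestamp in nanoseconds, rounded up to the next integer.
function frameToNanoseconds(frame, fpsNumerator, fpsDenominator) {
  const scaled = BigInt(frame) * NS_PER_SECOND * BigInt(fpsDenominator);
  const divisor = BigInt(fpsNumerator);
  return (scaled + divisor - 1n) / divisor; // Integer ceiling division.
}

// Timestamp in nanoseconds -> number of the frame shown at that time.
function nanosecondsToFrame(nanoseconds, fpsNumerator, fpsDenominator) {
  const scaled = BigInt(nanoseconds) * BigInt(fpsNumerator);
  return Number(scaled / (NS_PER_SECOND * BigInt(fpsDenominator)));
}

// Frame 1 at 30000/1001 fps starts at 33366666.67ns, which has no exact
// integer representation. Rounding it down instead would map back to frame 0.
const start = frameToNanoseconds(1, 30000, 1001);
console.log(start, nanosecondsToFrame(start, 30000, 1001));
```

With floating point arithmetic or inconsistent rounding, errors of a single nanosecond are enough to land on the neighbouring frame, which is the kind of issue frame accuracy has to rule out.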

Implemented time effects support in GES

Important refactoring has happened inside GStreamer Editing Services to allow changing the playback speed of individual clips cleanly and safely.

Implemented reverse playback in GES

Several issues have been fixed inside GStreamer core elements and base classes in order to support reverse playback. This allows us to implement reliable and frame accurate reverse playback for individual clips.

Implemented ImageSequence support in GStreamer and GES

Since OpenTimelineIO implemented ImageSequence support, many users in the community had said it was really required. We reviewed and finished up the imagesequencesrc element, which had been awaiting review for years.

This feature is now also supported in the OpenTimelineIO GES adapter.

Optimized nested timelines preroll time by an order of magnitude

Caps negotiation, which is done while the pipeline transitions from the paused state to the playing state and exercises the whole pipeline, was the bottleneck for nested timelines, so pipelines were reworked to avoid useless negotiations. At the same time, other members of the GStreamer community have improved caps negotiation performance in general.

Last but not least, our colleague Thibault gave a talk at The Pipeline Conference about The Motion Picture Industry and Open Source Software: GStreamer as an Alternative, explaining how and why GStreamer could be leveraged in the motion picture industry to allow faster innovation, and solve issues by reusing all the multi-platform infrastructure the community has to offer.

WebKit multimedia

There has been a lot of work on WebKit multimedia, particularly for the WebKitGTK and WPE ports, which use the GStreamer framework as their backend.

WebKit Flatpak SDK

But first of all we would like to draw readers’ attention to the new WebKit Flatpak SDK. It was not a contribution only from the multimedia team, but rather a joint effort among different teams in Igalia.

Before the WebKit Flatpak SDK, JHBuild was used for setting up a WebKitGTK/WPE environment for testing and development. Its purpose is to provide a common set of well defined dependencies instead of relying on the ones available in the different Linux distributions, which might produce different outputs. Nonetheless, Flatpak offers a much more coherent environment for testing and development, isolated from the rest of the build host, approaching reproducible outputs.

Another great advantage of the WebKit Flatpak SDK, at least for the multimedia team, is the possibility of using gst-build to set up a custom GStreamer environment, with the latest master, for example.

Now, for the sake of brevity, let us sketch a non-exhaustive list of activities and achievements related to WebKit multimedia.

General multimedia

Media Source Extensions (MSE)

Encrypted Media Extension (EME)

One of the major results of this first half is the upstreaming of ThunderCDM, which is an implementation of a Content Decryption Module, providing Widevine decryption support. Recently, our colleague Xabier published a blog post about it.

And it has enabled client-side video rendering support, which ensures video frames remain protected in GPU memory so they can’t be reached by third parties. This is a requirement for DRM/EME.



Though we normally contribute to GStreamer through the activities listed above, there are other tasks not related to WebKit. Among these we can enumerate the following:

GStreamer VAAPI

  • Reviewed a lot of patches.
  • Support for media-driver (iHD), the new VAAPI driver for Intel, mostly for Gen9 onwards. There are a lot of features with this driver.
  • A new vaapioverlay element.
  • Deep code cleanups. Among these we would like to mention:
    • Added quirk mechanism for different backends.
    • Changed the base classes of most classes and buffer types to GstObject and GstMiniObject.
  • Enhanced caps negotiation given current driver’s constraints


The multimedia team in Igalia has kept working, throughout the first half of this strange year, on our three main areas: browsers (mainly WebKitGTK and WPE), video editing and the GStreamer framework.

We worked on adding and enhancing WebKitGTK and WPE multimedia features in order to offer a solid platform for media providers.

We have enhanced the video editing support in GStreamer.

And, along with these tasks, we have contributed just as much to the GStreamer framework, particularly in hardware accelerated decoding and encoding and VA-API.

By vjaquez at September 07, 2020 03:12 PM

September 02, 2020

Xabier Rodríguez Calvar: Serious Encrypted Media Extensions on GStreamer based WebKit ports

Igalia WebKit

Encrypted Media Extensions (a.k.a. EME) is the W3C standard for encrypted media on the web. This way, media providers such as Hulu, Netflix, HBO, Disney+, Prime Video, etc. can provide their contents with a reasonable amount of confidence that it will be very complicated for people to “save” their assets without their permission. Why do I use the word “serious” in the title? In WebKit there is already support for Clear Key, which is the W3C EME reference implementation, but EME supports more encryption systems, even proprietary ones (I have my opinion about this, you can ask me privately). No service provider (that I know) supports Clear Key; they usually rely on Widevine, PlayReady or some other.

Three years ago, my colleague Žan Doberšek finished the implementation of what was going to be the shell of WebKit’s modern EME implementation, following the latest W3C proposal. We implemented that downstream (at Web Platform for Embedded) as well using Thunder, which includes as a plugin a fork of what was Open Content Decryption Module (a.k.a. OpenCDM). The OpenCDM API changed quite a lot during this journey. It works well and there are millions of set-top-boxes using it currently.

The delta between downstream and the upstream GStreamer based WebKit ports was quite big, testing was difficult and syncing was not always easy, so we decided to reverse the situation.

Our first step was done by my colleague Charlie Turner, who made Clear Key work upstream again while adapting some changes the Apple folks had made in the meantime. It was amazing to see Clear Key tests passing again and his work with the CDMProxy related classes was awesome. After having Clear Key working, I had to adapt them a bit to accommodate Thunder. To explain a bit about the WebKit EME architecture, I must say that there are two layers. The first is the crossplatform one, which implements the W3C API (MediaKeys, MediaKeySession, CDM…). These classes rely on the platform ones (CDMPrivate, CDMInstance, CDMInstanceSession) to handle the platform management, message exchange, etc. which would be the second layer. Apple’s playback system is fully integrated with their DRM system so they don’t need anything else. We do because we need to integrate our own decryptors to defer to Thunder for decryption so in the GStreamer based ports we also need the CDMProxy related classes, which would be CDMProxy, CDMInstanceProxy, CDMInstanceSessionProxy… The last two extend CDMInstance and CDMInstanceSession respectively to be able to deal with the key management, which is abstracted to the KeyHandle and KeyStore.

Once the abstraction is there (let’s remember that the abstraction works both for Clear Key and Thunder), the Thunder implementation is quite simple, just gluing the CDMProxy, CDMInstanceProxy and CDMInstanceSessionProxy classes to the Thunder system and writing a GStreamer decryptor element for it. I might have made a mistake when selecting the files but considering Thunder classes + the GStreamer common decryptor code, cloc says it is just 1198 lines of platform code. I think it is pretty low for what it does. Apart from that, obviously, there are 5760 lines of crossplatform code.

To build and run all this you need to do several things:

  1. Build the dependencies with WEBKIT_JHBUILD=1 JHBUILD_ENABLE_THUNDER="yes" to enable the old fashioned JHBuild build and force it to build the Thunder dependencies. All dependencies are on JHBuild, even Widevine is referenced but to download it you need the proper credentials as it is closed source.
  2. Pass --thunder when calling build-webkit.sh.
  3. Run MiniBrowser with WEBKIT_GST_EME_RANK_PRIORITY="Thunder" and pass parameters --enable-mediasource=TRUE --enable-encrypted-media=TRUE --autoplay-policy=allow. The autoplay policy is usually optional but in this case it is necessary for the YouTube TV tests. We need to give the Thunder decryptor a higher priority because WebM does not specify a key system, and without that priority the Clear Key decryptor can be selected and fail. MP4 does not create trouble because the protection system is specified and the caps negotiation does its magic.

As you may have guessed, if you take a closer look at the GStreamer JHBuild moduleset you’ll see that only Widevine is supported. To support more, you only have to make them build in the Thunder ecosystem and add them to CDMFactoryThunder::supportedKeySystems.

When I coded this, all YouTube TV tests for Widevine were green in the desktop. At the moment of writing this post they aren’t because of some problem with the Widevine installation that will be sorted quickly, I hope.

By calvaris at September 02, 2020 02:59 PM

August 27, 2020

Chris Lord: OffscreenCanvas, jobs, life

Igalia WebKit

Hoo boy, it’s been a long time since I last blogged… About 2 and a half years! So, what’s been happening in that time? This will be a long one, so if you’re only interested in a part of it (and who could blame you), I’ve titled each section.

Leaving Impossible

Well, unfortunately my work with Impossible ended, as we essentially ran out of funding. That’s really a shame, we worked on some really cool, open-source stuff, and we’ve definitely seen similar innovations in the field since we stopped working on it. We took a short break (during which we also, unsuccessfully, searched for further funding), after which Rob started working on a cool, related project of his own that you should check out, and I, being a bit less brave, started seeking out a new job. I did consider becoming a full-time musician, but business wasn’t picking up as quickly as I’d hoped it might in that down-time, and with hindsight, I’m glad I didn’t (Covid-19 and all).

I interviewed with a few places, which was certainly an eye-opening experience. The last ‘real’ job interview I did was for Mozilla in 2011, which consisted mainly of talking with engineers that worked there, and working through a few whiteboard problems. Being a young, eager coder at the time, this didn’t really faze me back then. Turns out either the questions have evolved or I’m just not quite as sharp as I used to be in that very particular environment. The one interview I had that involved whiteboard coding was a very mixed bag. It seemed a mix of two types of questions; those that are easy to answer (but unless you’re in the habit of writing very quickly on a whiteboard, slow to write down) and those that were pretty impossible to answer without specific preparation. Perhaps this was the fault of recruiters, but you might hope that interviews would be tailored somewhat to the person you’re interviewing, or the work they might actually be doing, neither of which seemed to be the case? Unsurprisingly, I didn’t get past that interview, but in retrospect I’m also glad I didn’t. Igalia’s interview process was much more humane, and involved mostly discussions about actual work I’ve done, hypothetical situations and ethics. They were very long discussions, mind, but I’m very glad that they were happy to hire me, and that I didn’t entertain different possibilities. If you aren’t already familiar with Igalia, I’d highly recommend having a read about them/us. I’ve been there a year now, and the feeling is quite similar to when I first joined Mozilla, but I believe with Igalia’s structure, this is likely to stay a happier and safer environment. Not that I mean to knock Mozilla, especially now, but anyone that has worked there will likely admit that along with the giddy highs, there are also some unfortunate lows.


I joined Igalia as part of the team that works on WebKit, and that’s what I’ve been doing since. It almost makes perfect sense in a way. Surprisingly, although I’ve spent overwhelmingly more time on Gecko, I did actually work with WebKit first while at OpenedHand, and for a short period at Intel. While celebrating my first commit to WebKit, I did actually discover it wasn’t my first commit at all, but I’d contributed a small embedding-related fix-up in 2008. So it’s nice to have come full-circle! My first work at Igalia was fixing up some patches that Žan Doberšek had prototyped to allow direct display of YUV video data via pixel shaders. Later on, I was also pleased to extend that work somewhat by fixing some vc3 driver bugs and GStreamer bugs, to allow for hardware decoding of YUV video on Raspberry Pi 3b (this, I believe, is all upstream at this point). WebKit Gtk and WPE WebKit may be the only Linux browser backends that leverage this pipeline, allowing for 1080p30 video playback on a Pi3b. There are other issues making this less useful than you might think, but either way, it’s a nice first achievement.


After that introduction, I was pointed at what could be fairly described as my main project, OffscreenCanvas. This was also a continuation of Žan’s work (he’s prolific!), though there has been significant original work since. This might be the part of this post that people find most interesting or relevant, but having not blogged in over 2 years, I can’t be blamed for waffling just a little. OffscreenCanvas is a relatively new web standard that allows the use of canvas API disconnected from the DOM, and within Workers. It also makes some provisions for asynchronously updated rendering, allowing canvas updates in Workers to bypass the main thread entirely and thus not be blocked by long-running processes on that thread. The most obvious use-case for this, and I think the most practical, is essentially non-blocking rendering of generated content. This is extremely handy for maps, for example. There are some other nice use-cases for this as well – you can, for example, show loading indicators that don’t stop animating while performing complex DOM manipulation, or procedurally generate textures for games, asynchronously. Any situation where you might want to do some long-running image processing without blocking the main thread (image editing also springs to mind).
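A minimal sketch of the worker hand-off described above, assuming an engine where the standard `transferControlToOffscreen()` method and 2d contexts on OffscreenCanvas are available. The function names `handOffCanvas` and `installPainter` are made up for this example, and the drawing is deliberately trivial.

```javascript
// Main thread: hand a <canvas> element over to a worker when supported.
// Returns false so the caller can fall back to drawing on the main thread.
function handOffCanvas(canvas, worker) {
  if (typeof canvas.transferControlToOffscreen !== "function") {
    return false;
  }
  const offscreen = canvas.transferControlToOffscreen();
  // The OffscreenCanvas is transferred to the worker, not copied.
  worker.postMessage({ canvas: offscreen }, [offscreen]);
  return true;
}

// Worker side: call installPainter(self) at the top of the worker script.
// Everything drawn here happens off the main thread.
function installPainter(workerScope) {
  workerScope.onmessage = (event) => {
    const context = event.data.canvas.getContext("2d");
    context.fillStyle = "green";
    context.fillRect(0, 0, 100, 100);
  };
}
```

On a page this would be used as `handOffCanvas(document.querySelector("canvas"), new Worker("painter.js"))`, where `painter.js` is a hypothetical worker script that calls `installPainter(self)`.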

Currently, the only complete implementation is within Blink. Gecko has a partial implementation that only supports WebGL contexts (and last time I tried, crashed the browser on creation…), but as far as I know, that’s it. I’ve been working on this, with encouragement and cooperation from Apple, on and off for the past year. In fact, as of August 12th, it’s even partially usable, though there is still a fair bit missing. I’ve been concentrating on the 2d context use-case, as I think it’s by far the most useful part of the standard. It’s at the point where it’s mostly usable, minus text rendering and minus some edge-case colour parsing. Asynchronous updates are also not yet supported, though I believe that’s fairly close for Linux. OffscreenCanvas is enabled with experimental features, for those that want to try it out.

My next goal, after asynchronous updates on Linux, is to enable WebGL context support. I believe these aren’t particularly tough goals, given where it is now, so hopefully they’ll happen by the end of the year. Text rendering is a much harder problem, but I hope that between us at Igalia and the excellent engineers at Apple, we can come up with a plan for it. The difficulty is that both styling and font loading/caching were written with the assumption that they’d run on just one thread, and that that thread would be the main thread. A very reasonable assumption in a pre-Worker and pre-many-core-CPU world of course, but increasingly less so now, and very awkward for this particular piece of work. Hopefully we’ll persevere though, this is a pretty cool technology, and I’d love to contribute to it being feasible to use widely, and lessen the gap between native and the web.

And that’s it from me. Lots of non-work related stuff has happened in the time since I last posted, but I’m keeping this post tech-related. If you want to hear more of my nonsense, I tend to post on Twitter a bit more often these days. See you in another couple of years 🙂

By Chris Lord at August 27, 2020 08:56 AM

August 18, 2020

Release Notes for Safari Technology Preview 112

Surfin’ Safari

Safari Technology Preview Release 112 is now available for download for macOS Big Sur and macOS Catalina. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 264601-265179.

Web Inspector

  • Changed the default tab order to display most commonly used tabs first (r264959)
  • Changed the background, text, and border colors to match the OS (r265120)
  • Changed to only show scrollbars when needed (r265118)
  • Fixed issue where a failed initial subresource load would break the Sources Tab (r264717)
  • Fixed the ability to save files that are base64 encoded (r264669)
  • Prevented blurring the add class input when a class is added in the Styles sidebar of the Elements tab (r264667)


  • Fixed pop-up dialog sizing for percentage height values applied to <html> (r264960)
  • Added support for replacing a Safari App Extension with a Safari Web Extension by specifying the SFSafariAppExtensionBundleIdentifiersToReplace key in the NSExtension element in your Safari Web Extension Info.plist file. The value for the key should be an array of strings, each of which is the bundle identifier on a Safari App Extension you want to replace.


  • Fixed align-content in grid containers with small content area (r265020)
  • Fixed the CSS clip-path being applied to the view-box coordinates (r264622)
  • Fixed scroll snap when using RTL layout (r264908)


  • Implemented Intl.DisplayNames (r264639)
  • Changed eval?.() to be an indirect eval (r264633)


  • Added support for SVG <a> element’s rel and relList attributes (r264789)


  • Added behaviors for YouTube to offer HDR variants to devices which support HDR (r265167)
  • Adopted AVPlayer.videoRangeOverride (r264710)
  • Added HDR decode support in software-decoded VP9 (r265073)
  • Fixed becoming unresponsive after playing a video from a YouTube playlist in picture-in-picture mode (r264684)


  • Added OfflineAudioContext constructor (r264657)
  • Fixed scaleResolutionDownBy on RTCRtpSender (r265047)


  • Added support for the type attribute to PerformanceObserver (r265001)
  • Changed date and time input types to have a textfield appearance (r265157)
  • Changed to propagate the user gesture through Fetch API (r264853)
  • Fixed highlight color to update after being set in System Preferences (r265072)
  • Fixed datalist dropdown scrollbar position to match the visible region (r264783)
  • Made mousemove event cancelable (r264658)

Text Manipulation

  • Changed text manipulation to not extract non-breaking spaces (r264947)
  • Fixed article headlines being split across multiple lines after translating (r264729)


  • Changed to allow IndexedDB in third-party frames (r264790)

August 18, 2020 06:20 PM

August 13, 2020

Javier Fernández: Improving CSS Custom Properties performance

Igalia WebKit

Chrome 84 reached the stable channel a few weeks ago, and there are already several great posts describing the many important additions, interesting new features, security fixes and improvements in privacy policies ([1], [2], [3], [4]) it contains. However, there is a change that I worked on in this release which might have passed unnoticed by most, but I think is very valuable: a change regarding CSS Custom Properties (variables) performance.

The design of CSS, in general, takes great care in considering how features are designed with respect to making it possible for them to perform well. However, implementations may not perform as well as they could, and it takes a considerable amount of time to understand how authors use the features and which cases are more relevant for them.

CSS Custom Properties are an interesting example to look at here: They are a wonderful feature that provides a lot of advantages for web authors. For a whole lot of cases, all of the implementations of CSS Custom Properties perform well enough that most people won’t notice. However, we at Igalia have been analyzing several use cases and looking at some reports around their performance in different implementations.

Let’s consider a fairly straightforward example in which an author sets a single property in a toggleable class in the body, and then uses that property several times deeper in the tree to change the foreground color of some text.

   .red { --prop: red; }
   .green { --prop: green; }

Only about 20% of those actually use this property, 5 elements deep into the tree, and only to change the foreground color.

To evaluate Chromium’s performance in a case like this we can define a new perf test, using the perf tools the Chromium project has available for browser engineers. In this case, we want a huge tree so that we can better evaluate the impact of the different optimizations.

    .green { --prop: green; }
    .red { --prop: red; }


These are the results obtained running the test in Chrome 83:

avg median
163.74 ms 163.79 ms 3.69 ms 158.59 ms 163.74 ms

I admit that it’s difficult to evaluate the results, especially considering the number of nodes of such a huge DOM tree. Let’s compare the results of the same test on Firefox, using different numbers of nodes.

Nodes 50K 20K 10K 5K 1K 500
Chrome 83 163.74 ms 55.05 ms 25.12 ms 14.18 ms 2.74 ms 1.50 ms
FF 78 28.35 ms 12.05 ms 6.10 ms 3.50 ms 1.15 ms 0.55 ms
1/6 1/5 1/4 1/4 1/2 1/3

As I commented before, the data are more accurate when the DOM tree has a lot of nodes; in any case, the difference is quite clear and shows there is plenty of room for improvement. WebKit-based browsers also have results more similar to Chromium’s.

Performance tests like the one above can be added to browsers for tracking improvements and regressions over time, so we’ve added (r763335) that to Chromium’s tree: We’d like to see it get faster over time, and definitely cannot afford regressions (see Chrome Performance Dashboard and the ChangeStyleCustomPropertyDeclaration test for details).

So… What can we do?

In Chrome 83 and lower, whenever the custom property declaration changed, the new declaration would be inherited by the whole tree. This inheritance implied executing the whole CSS cascade and recalculating the styles of all the nodes in the entire tree, since with this approach, all nodes may be affected.

Chrome had already implemented an optimization in the CSS cascade implementation for regular CSS properties that don’t depend on any other property to resolve their value. This subset of CSS properties is defined as Independent Properties in the Chromium codebase. The optimization affects how the inheritance mechanism is implemented for these Independent Properties: whenever one of them changes, instead of recalculating the styles of the inherited properties, children can just copy the whole parent’s computed style. Blink’s style engine has a component known as the Matched Properties Cache, responsible for deciding when it is possible to avoid the style resolution of an element and instead perform an efficient copy of the matched computed style. I’ll get back to this concept in the last part of this post.

In the case of CSS Custom Properties, we could apply a similar approach as a good step. We can consider that the nodes with computed styles that don’t have references to custom properties declarations shouldn’t be affected by the new declaration, and we can implement the inheritance directly by copying the parent’s computed style. The patch with the optimization I’ve implemented in r765278 initially landed in Chrome 84.0.4137.0.
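To make the idea concrete, here is a toy model in plain JavaScript. It is only a sketch and not Blink’s code: the node shape, the `usesVar` flag and the helper names (`makeNode`, `resolveStyle`, `recalcAll`, `recalcWithCopy`) are all invented for illustration, and the "style" holds just two inherited properties. It counts how many nodes go through the expensive resolution path when roughly 20% of the nodes reference the custom property.

```javascript
// Toy model of style recalculation after a custom property changes on the root.
// Every name here is hypothetical; this is not how Blink's style engine is written.

function makeNode(usesVar, children = []) {
  return { usesVar, children, style: null };
}

// Stand-in for the expensive path: run the cascade and resolve var() references.
function resolveStyle(node, parentStyle, stats) {
  stats.resolved++;
  const style = { "--prop": parentStyle["--prop"], color: parentStyle.color };
  if (node.usesVar) style.color = style["--prop"]; // models `color: var(--prop)`
  return style;
}

// Unoptimized model: every node in the tree is fully resolved.
function recalcAll(node, parentStyle, stats) {
  node.style = resolveStyle(node, parentStyle, stats);
  for (const child of node.children) recalcAll(child, node.style, stats);
}

// Optimized model: nodes without var() references copy the parent's computed style.
function recalcWithCopy(node, parentStyle, stats) {
  if (node.usesVar) {
    node.style = resolveStyle(node, parentStyle, stats);
  } else {
    stats.copied++;
    node.style = { ...parentStyle };
  }
  for (const child of node.children) recalcWithCopy(child, node.style, stats);
}

// A root with ten children, two of which (20%) use the custom property.
const children = Array.from({ length: 10 }, (_, i) => makeNode(i % 5 === 0));
const root = makeNode(false, children);
const inherited = { "--prop": "green", color: "black" };

const before = { resolved: 0, copied: 0 };
recalcAll(root, inherited, before);

const after = { resolved: 0, copied: 0 };
recalcWithCopy(root, inherited, after);

console.log(before, after);
```

In this model the copy is only valid because both properties are inherited; a real engine has to decide per element whether a copy is safe, which is the job described above for the Matched Properties Cache.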

Let’s look at the result of this one action in the Chrome Performance Dashboard:

That’s a really good improvement!

However, it’s also just a first step. It’s clear that Chrome still has a wide margin for improvement in this case, as does any WebKit-based browser; Firefox is still, impressively, markedly faster, as described in the bug report filed to track this issue. The following table shows the results of the different browsers together; even with the multi-thread capabilities of Firefox’s Stylo engine disabled (STYLO_THREAD=1), FF is much faster than Chrome with the optimization applied.

Chrome 83   163.74 ms 163.79 ms 3.69 ms 158.59 ms 163.74 ms
Chrome 84   117.37 ms 117.52 ms 1.98 ms 113.66 ms 120.87 ms
FF 78       28.35 ms 28.50 ms 0.93 ms 26.00 ms 30.00 ms
FF 78 th=1  38.25 ms 38.50 ms 1.86 ms 35.00 ms 41.00 ms

Before continuing, I want to get back to the Matched Properties Cache (MPC) concept, since it has an important role in these style optimizations. This cache is not a new concept in Chrome’s engine; as a matter of fact, it’s also used in WebKit, since it was implemented long ago, before the fork that created the new Blink engine. However, Google has been working a lot on this area in recent years, and some of the most recent changes in the MPC have had an important impact on style resolution performance. As a result of this work, elements with independent and non-independent properties using CSS Variables might produce cache hits in the MPC. The results of the Performance Dashboard show a considerable improvement in the mentioned ChangeStyleCustomPropertyDeclaration test (avg: 108.06 ms).

Additionally, there are several other cases where the use of CSS Variables has a considerable impact on performance, compared with using regular CSS properties. Obviously, resolving CSS Variables has a cost, so it’s clear that we could apply additional optimizations that reduce the impact of the variable resolution, especially for handling specific style changes that might not affect a substantial portion of the DOM tree. I’ve been experimenting with the MPC to explore the idea of an independent CSS Custom Properties cache; nodes with variables referencing the same custom property will produce cache hits in the MPC, even though other properties don’t match. The preliminary approach I’ve been implementing consists of a new matching function, specific to custom properties, and a mechanism to transfer/copy the property’s data to avoid resolving the variable again, since the property’s declaration hasn’t changed. We would need to apply the CSS cascade again, but at least we could save the cost of the variable resolution.

Of course, at the end of the day, improving performance has costs and challenges, and it’s hard to keep performance even once you get it. But if we really want performant CSS Custom Properties, this means that we have to decide to prioritize this work. Currently there is reluctance to explore the concept of a new Custom Properties specific cache: the challenge is big and the risks are real; cache invalidation can get complicated. But the point is that we have to understand that we aren’t all going to agree on what is important enough to warrant attention, or how much investment, or when. Web authors must convince vendors that these use cases are worth optimizing and that the cost and risks of such complex challenges should be assumed by them.

This work has been sponsored by Bloomberg, which I consider one of the most important contributors to the Web Platform. Over several years, the vision of this company and its responsibility as a consumer of the platform have led to many important contributions that we all enjoy now. Although CSS Grid Layout might be the most remarkable one, there are many others that are not as big, like this work on CSS Custom Properties, or several other new features of the CSS Text specification. This is a perfect example of a company that tries to change priorities and adapt the web platform to its needs and the use cases it considers more aligned with its business strategy.

I understand that not every user of the web platform can make this kind of investment. This is why I believe that initiatives like Open Prioritization could help to move the web platform in a positive direction, by providing a way for us to move past a lot of these conversations and focus on the needs that some web authors and users of the platform consider more important, or higher priority. Improving performance for CSS Custom Properties isn’t currently one of the projects we’ve listed, but perhaps it would be an interesting one we might try in the future if we are successful with these. If you haven’t already, have a look and see if there is something there that is interesting to you or your company. Pledges of any size are good: ten thousand $1 donations are every bit as good as ten $1000 donations. Together, we can make a difference, and we all benefit.

Also, we would love to hear about your ideas. Is improving CSS Custom Properties performance important to you? What else is? Share your comments with us on Twitter, either me (@lajava77) or our developer advocate Brian Kardell (@briankardell), or email me at jfernandez@igalia.com. I’d be glad to answer any question about the Open Prioritization experiment.

By jfernandez at August 13, 2020 06:16 PM

July 29, 2020

Release Notes for Safari Technology Preview 111

Surfin’ Safari

Safari Technology Preview Release 111 is now available for download for macOS Big Sur and macOS Catalina. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 263988-264601.

Web Inspector

  • Added an error message if unable to fetch shader source in the Canvas tab (r264045)
  • Fixed Heap Snapshot Object Graph view not getting populated in some cases when inspecting a JSContext (r264124)
  • Updated the tab bar colors of undocked Web Inspector to match Safari in macOS Big Sur (r264410)
  • Updated the title bar of undocked Web Inspector to be white in macOS Big Sur (r264204)

Web Extensions

  • Fixed chrome.tabs.update() so it does not open a new tab for safari-web-extension URLs
  • Fixed chrome.tabs.create() so it passes a valid tab object to the callback for relative extension URLs


  • Fixed content changes not triggering re-snapping with scroll snap after a scroll gesture (r264190)
  • Fixed scrolling pages with non-invertible transforms in children of an overflow: scroll element (r264031)
  • Fixed stuttery scrolling by ensuring a layout-triggered scroll snap does not happen if a user scroll is in progress on the scrolling thread (r264203)


  • Fixed high CPU usage on Bitbucket search results pages (r264008)


  • Fixed line name positions after implicit grid track (r264465)


  • Made String.prototype.toLocaleLowerCase’s availableLocales HashSet more efficient (r264293)
  • Changed Intl.Locale maximize, minimize to return Intl.Locale instead of String (r264275)
  • Fixed Math.max() yielding the wrong result for max(0, -0) (r264507)
  • Fixed redefining a property that should not change its insertion index (Object.keys order) (r264574)

Web Authentication

  • Added a console message to indicate a user gesture is required to use the platform authenticator (r264490)
  • Relaxed the user gesture requirement to allow it to be propagated through XHR events (r264528)


  • Fixed the ability to pause playback of MediaStream video track (r264312)
  • Added support for parsing VP-style codec strings (r264367)


  • Changed URL.host to not override the port (r264516)
  • Fixed autocapitalize="words" capitalizing every word’s second character (r264112)
  • Multiplexed the HID and GameController gamepad providers on macOS (r264207)
  • Removed the concept of “initial connected gamepads” (r264004)

Storage Access API

  • Added the capability to open a popup and get user interaction so we can call the Storage Access API as a quirk, on behalf of websites that should be doing it themselves (r263992)

Intelligent Tracking Prevention

  • Added an artificial delay to WebSocket connections to mitigate port scanning attacks (r264306)


  • Implemented user action specifications for Escape action (r264000)

Text Manipulation

  • Fixed text manipulation to observe manipulated text after update (r264305)
  • Fixed text manipulation to ignore white spaces between nodes (r264120)
  • Fixed the caret leaving trails behind when the editable content is subpixel positioned (r264386)

July 29, 2020 05:32 PM

Speculation in JavaScriptCore

Surfin’ Safari

This post is all about speculative compilation, or just speculation for short, in the context of the JavaScriptCore virtual machine. Speculative compilation is ideal for making dynamic languages, or any language with enough dynamic features, run faster. In this post, we will look at speculation for JavaScript. Historically, this technique or closely related variants have been applied successfully to Smalltalk, Self, Java, .NET, Python, and Ruby, among others. Starting in the 90’s, intense benchmark-driven competition between many Java implementations helped to create an understanding of how to build speculative compilers for languages with small amounts of dynamism. Despite being a lot more dynamic than Java, the JavaScript performance war that started in the naughts has generally favored increasingly aggressive applications of the same speculative compilation tricks that worked great for Java. It seems like speculation can be applied to any language implementation that uses runtime checks that are hard to reason about statically.

This is a long post that tries to demystify a complex topic. It’s based on a two hour compiler lecture (slides also available in PDF). We assume some familiarity with compiler concepts like intermediate representations (especially Static Single Assignment Form, or SSA for short), static analysis, and code generation. The intended audience is anyone wanting to understand JavaScriptCore better, or anyone thinking about using these techniques to speed up their own language implementation. Most of the concepts described in this post are not specific to JavaScript and this post doesn’t assume prior knowledge about JavaScriptCore.

Before going into the details of speculation, we’ll provide an overview of speculation and an overview of JavaScriptCore. This will help provide context for the main part of this post, which describes speculation by breaking it down into five parts: bytecode (the common IR), control, profiling, compilation, and OSR (on stack replacement). We conclude with a small review of related work.

Overview of Speculation

The intuition behind speculation is to leverage traditional compiler technology to make dynamic languages as fast as possible. Construction of high-performance compilers is a well-understood art, so we want to reuse as much of that as we can. But we cannot do this directly for a language like JavaScript because the lack of type information means that the compiler can’t do meaningful optimizations for any of the fundamental operations (even things like + or ==). Speculative compilers use profiling to infer types dynamically. The generated code uses dynamic type checks to validate the profiled types. If the program uses a type that is different from what we profiled, we throw out the optimized code and try again. This lets the optimizing compiler work with a statically typed representation of the dynamically typed program.

Types are a major theme of this post even though the techniques we are describing are for implementing dynamically typed languages. When languages include static types, it can be to provide safety properties for the programmer or to help give an optimizing compiler leverage. We are only interested in types for performance and the speculation strategy in JavaScriptCore can be thought of in broad strokes as inferring the kinds of types that a C program would have, but using an internal type system purpose built for our optimizing compiler. More generally, the techniques described in this post can be used to enable any kind of profile-guided optimizations, including ones that aren’t related to types. But both this post and JavaScriptCore focus on the kind of profiling and speculation that is most natural to think of as being about type (whether a variable is an integer, what object shapes a pointer points to, whether an operation has effects, etc).

To dive into this a bit deeper, we first consider the impact of types. Then we look at how speculation gives us types.

Impact of Types

We want to give dynamically typed languages the kind of optimizing compiler pipeline that would usually be found in ahead-of-time compilers for high-performance statically typed languages like C. The input to such an optimizer is typically some kind of internal representation (IR) that is precise about the type of each operation, or at least a representation from which the type of each operation can be inferred.

To understand the impact of types and how speculative compilers deal with them, consider this C function:

int foo(int a, int b)
{
    return a + b;
}

In C, types like int are used to describe variables, arguments, return values, etc. Before the optimizing compiler has a chance to take a crack at the above function, a type checker fills in the blanks so that the + operation will be represented using an IR instruction that knows that it is adding 32-bit signed integers (i.e. ints). This knowledge is essential:

  • Type information tells the compiler’s code generator how to emit code for this instruction. We know to use integer addition instructions (not double addition or something else) because of the int type.
  • Type information tells the optimizer how to allocate registers for the inputs and outputs. Integers mean using general purpose registers. Floating point means using floating point registers.
  • Type information tells the optimizer what optimizations are possible for this instruction. Knowing exactly what it does allows us to know what other operations can be used in place of it, allows us to do some algebraic reasoning about the math the program is doing, and allows us to fold the instruction to a constant if the inputs are constants. If there are types for which + has effects (like in C++), then the fact that this is an integer + means that it’s pure. Lots of compiler optimizations that work for + would not work if it wasn’t pure.

Now consider the same program in JavaScript:

function foo(a, b)
{
    return a + b;
}

We no longer have the luxury of types. The program doesn’t tell us the types of a or b. There is no way that a type checker can label the + operation as being anything specific. It can do a bunch of different things based on the runtime types of a and b:

  • It might be a 32-bit integer addition.
  • It might be a double addition.
  • It might be a string concatenation.
  • It might be a loop with method calls. Those methods can be user-defined and may perform arbitrary effects. This’ll happen if a or b are objects.

Figure 1. The best that a nonspeculative compiler can do if given a JavaScript plus operation. This figure depicts a control flow graph as a compiler like JavaScriptCore’s DFG might see. The Branch operation is like an if and has outgoing edges for the then/else outcomes of the condition.

Based on this, it’s not possible for an optimizer to know what to do. Instruction selection means emitting either a function call for the whole thing or an expensive control flow subgraph to handle all of the various cases (Figure 1). We won’t know which register file is best for the inputs or results; we’re likely to go with general purpose registers and then do additional move instructions to get the data into floating point registers in case we have to do a double addition. It’s not possible to know if one addition produces the same results as another, since they have loops with effectful method calls. Anytime a + happens we have to allow for the possibility that the whole heap might have been mutated.
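The different behaviors of + are easy to observe from plain JavaScript. The snippet below is just an illustration; the `noisy` object and the `effects` log are made up for the example. It runs the same `foo` on integers, doubles, strings, and an object whose `valueOf` method has a visible side effect.

```javascript
function foo(a, b) {
  return a + b;
}

// Records side effects performed while + is being evaluated.
const effects = [];

// An object whose conversion to a primitive runs user-defined code.
const noisy = {
  valueOf() {
    effects.push("valueOf ran");
    return 40;
  },
};

const results = [
  foo(1, 2),      // integer addition
  foo(0.5, 0.25), // double addition
  foo("a", 1),    // string concatenation
  foo(noisy, 2),  // calls noisy.valueOf() before adding
];

console.log(results); // [ 3, 0.75, 'a1', 42 ]
console.log(effects); // [ 'valueOf ran' ]
```

The last call is the one that hurts an optimizer most: because `valueOf` can do anything, a compiler that knows nothing about `a` and `b` cannot assume that + leaves the heap untouched.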

In short, it’s not practical to use optimizing compilers for JavaScript unless we can somehow provide types for all of the values and operations. For those types to be useful, they need to help us avoid basic operations like + seeming like they require control flow or effects. They also need to help us understand which instructions or register files to use. Speculative compilers get speed-ups by applying this kind of reasoning to all of the dynamic operations in a language — ranging from those represented as fundamental operations (like + or memory accesses like o.f and o[i]) to those that involve intrinsics or recognizable code patterns (like calling Function.prototype.apply).

Speculated Types

This post focuses on those speculations where the collected information can be most naturally understood as type information, like whether or not a variable is an integer and what properties a pointed-to object has (and in what order). Let’s appreciate two aspects of this more deeply: when and how the profiling and optimization happen and what it means to speculate on type.

Figure 2. Optimizing compilers for C and JavaScript.

Let’s consider what we mean by speculative compilation for JavaScript. JavaScript implementations pretend to be interpreters; they accept JS source as input. But internally, these implementations use a combination of interpreters and compilers. Initially, code starts out running in an execution engine that does no speculative type-based optimizations but collects profiling about types. This is usually an interpreter, but not always. Once a function has a satisfactory amount of profiling, the engine will start an optimizing compiler for that function. The optimizing compiler is based on the same fundamentals as the one found in a C compiler, but instead of accepting types from a type checker and running as a command-line tool, here it accepts types from a profiler and runs in a thread in the same process as the program it’s compiling. Once that compiler finishes emitting optimized machine code, we switch execution of that function from the profiling tier to the optimized tier. Running JavaScript code has no way of observing this happening to itself except if it measures execution time. (However, the environment we use for testing JavaScriptCore includes many hooks for introspecting what has been compiled.) Figure 2 illustrates how and when profiling and optimization happens when running JavaScript.

Roughly, speculative compilation means that our example function will be transformed to look something like this:

function foo(a, b)
{
    return a + b;
}

The tricky thing is what exactly it means to speculate. One simple option is what we call diamond speculation. This means that every time that we perform an operation, we have a fast path specialized for what the profiler told us and a slow path to handle the generic case:

if (is int)
    int add
else
    Call(slow path)
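Written out as ordinary JavaScript, a diamond speculation for + might look like the sketch below. This only models the shape of the control flow; a real compiler emits machine code, and the helper names (`isInt32`, `slowPathAdd`, `addWithDiamond`) and the call counter are hypothetical.

```javascript
// Hypothetical helper: true when v is a number that fits a 32-bit signed integer.
// Object.is rejects -0, NaN and fractional values.
function isInt32(v) {
  return typeof v === "number" && Object.is(v | 0, v);
}

// Counts how often the generic path runs, purely for demonstration.
let slowPathCalls = 0;

// Generic path: whatever JavaScript's + would do for arbitrary values.
function slowPathAdd(a, b) {
  slowPathCalls++;
  return a + b;
}

function addWithDiamond(a, b) {
  if (isInt32(a) && isInt32(b)) {
    const sum = a + b;
    if (isInt32(sum)) return sum; // fast path: integer add, no overflow
  }
  return slowPathAdd(a, b); // slow path: non-integers, or the sum overflowed
}

console.log(addWithDiamond(1, 2));          // 3, fast path
console.log(addWithDiamond("a", 1));        // "a1", slow path
console.log(addWithDiamond(2147483647, 1)); // 2147483648, slow path (int32 overflow)
console.log(slowPathCalls);                 // 2
```

Both paths rejoin at the return of `addWithDiamond`, so a caller learns nothing about the types of its operands from the fact that the call completed.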

To see how that plays out, let’s consider a slightly different example:

var tmp1 = x + 42;
... // things
var tmp2 = x + 100;

Here, we use x twice, both times adding it to a known integer. Let’s say that the profiler tells us that x is an integer but that we have no way of proving this statically. Let’s also say that x’s value does not change between the two uses and we have proved that statically.

Figure 3. Diamond speculation that x is an integer.

Figure 3 shows what happens if we speculate on the fact that x is an integer using a diamond speculation: we get a fast path that does the integer addition and a slow path that bails out to a helper function. Speculations like this can produce modest speed-ups at modest cost. The cost is modest because if the speculation is wrong, only the operations on x pay the price. The trouble with this approach is that repeated uses of x must recheck whether it is an integer. The rechecking is necessary because of the control flow merge that happens at the things block and again at more things.

The original solution to this problem was splitting, where the region of the program between things and more things would get duplicated to avoid the branch. An extreme version of this is tracing, where the entire remainder of a function is duplicated after any branch. The trouble with these techniques is that duplicating code is expensive. We want to minimize the number of times that the same piece of code is compiled so that we can compile a lot of code quickly. The closest thing to splitting that JavaScriptCore does is tail duplication, which optimizes diamond speculations by duplicating the code between them if that code is tiny.

A better alternative to diamond speculations or splitting is OSR (on stack replacement). When using OSR, a failing type check exits out of the optimized function back to the equivalent point in the unoptimized code (i.e. the profiling tier’s version of the function).

Figure 4. OSR speculation that x is an integer.

Figure 4 shows what happens when we speculate that x is an integer using OSR. Because there is no control flow merge between the case where x is an int and the case where it isn’t, the second check becomes redundant and can be eliminated. The lack of a merge means that the only way to reach the second check is if the first check passed.
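A rough way to picture this in JavaScript is to model the OSR exit as an exception that abandons the optimized function. This is only an analogy with invented names (`OSRExit`, `speculateInt32`, `optimized`, `unoptimized`, `run`). In particular, a real OSR exit resumes at the equivalent point in the unoptimized code, whereas this sketch restarts from the top, which is only acceptable here because nothing observable happens before the check. Overflow checks are left out to keep it short.

```javascript
// Hypothetical marker thrown when a speculation check fails.
class OSRExit extends Error {}

// Counts executed type checks, purely for demonstration.
let checks = 0;

function speculateInt32(v) {
  checks++;
  if (typeof v !== "number" || !Object.is(v | 0, v)) throw new OSRExit();
}

// Stand-in for the profiling tier: no speculation, handles every case.
function unoptimized(x) {
  const tmp1 = x + 42;
  const tmp2 = x + 100;
  return [tmp1, tmp2];
}

// Stand-in for the optimized tier: x is checked once, before its first use.
function optimized(x) {
  speculateInt32(x);
  const tmp1 = x + 42;  // x is known to be an integer here
  const tmp2 = x + 100; // no second check: this line is only reachable if the check passed
  return [tmp1, tmp2];
}

function run(x) {
  try {
    return optimized(x);
  } catch (e) {
    if (!(e instanceof OSRExit)) throw e;
    return unoptimized(x); // simplified exit: restart in the unoptimized version
  }
}

console.log(run(1));   // [ 43, 101 ]
console.log(run("a")); // [ 'a42', 'a100' ]
console.log(checks);   // 2, one check per call rather than one per use of x
```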

OSR speculations are what gives our traditional optimizing compiler its static types. After any OSR-based type check, the compiler can assume that the property that was checked is now fact. Moreover, because OSR check failure does not affect semantics (we exit to the same point in the same code, just with fewer optimizations), we can hoist those checks as high as we want and infer that a variable always has some type simply by guarding all assignments to it with the corresponding type check.

Note that what we call OSR exit in this post and in JavaScriptCore is usually called deoptimization elsewhere. We prefer to use the term OSR exit in our codebase because it emphasizes that the point is to exit an optimized function using an exotic technique (OSR). The term deoptimization makes it seem like we are undoing optimization, which is only true in the narrow sense that a particular execution jumps from optimized code to unoptimized code. For this post we will follow the JavaScriptCore jargon.

JavaScriptCore uses OSR or diamond speculations depending on our confidence that the speculation will be right. OSR speculation has higher benefit and higher cost: the benefit is higher because repeated checks can be eliminated but the cost is also higher because OSR is more expensive than calling a helper function. However, the cost is only paid if the exit actually happens. The benefits of OSR speculation are so superior that we focus on that as our main speculation strategy, with diamond speculation being the fallback if our profiling indicates lack of confidence in the speculation.

Figure 5. Speculating with OSR and exiting to bytecode.

OSR-based speculation relies on the fact that traditional compilers are already good at reasoning about side exits. Trapping instructions (like for null check optimization in Java virtual machines), exceptions, and multiple return statements are all examples of how compilers already support exiting from a function.

Assuming that we use bytecode as the common language shared between the unoptimizing profiled tier of execution and the optimizing tier, the exit destinations can just be bytecode instruction boundaries. Figure 5 shows how this might work. The machine code generated by the optimizing compiler contains speculation checks against unlikely conditions. The idea is to do lots of speculations. For example, the prologue (the enter instruction in the figure) may speculate about the types of the arguments — that’s one speculation per argument. An add instruction may speculate about the types of its inputs and about the result not overflowing. Our type profiling may tell us that some variable tends to always have some type, so a mov instruction whose source is not proved to have that type may speculate that the value has that type at runtime. Accessing an array element (what we call get_by_val) may speculate that the array is really an array, that the index is an integer, that the index is in bounds, and that the value at the index is not a hole (in JavaScript, loading from a never assigned array element means walking the array’s prototype chain to see if the element can be found there — something we avoid doing most of the time by speculating that we don’t have to). Calling a function may speculate that the callee is the one we expected or at least that it has the appropriate type (that it’s something we can call).
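To give a feel for how many speculations hide behind one array access, the sketch below spells out the four get_by_val checks mentioned above as JavaScript. It is illustrative only: the function names and the `{ exit: reason }` return value are assumptions made for the example, and where this code returns an exit reason, real optimized code would perform an OSR exit. The last few lines show why the hole check matters, since reading a hole consults the prototype chain.

```javascript
// Hypothetical sketch of the checks an optimized get_by_val might speculate on.
// Returns { value } on the fast path, or { exit: reason } where real code would OSR exit.
function getByValSpeculative(array, index) {
  if (!Array.isArray(array)) return { exit: "not an array" };
  if (!Number.isInteger(index)) return { exit: "index is not an integer" };
  if (index < 0 || index >= array.length) return { exit: "index out of bounds" };
  if (!Object.prototype.hasOwnProperty.call(array, index)) return { exit: "hole" };
  return { value: array[index] };
}

// Generic path: the full JavaScript property lookup, prototype chain included.
function getByValGeneric(base, key) {
  return base[key];
}

const dense = [10, 20, 30];
const holey = [10, , 30]; // index 1 was never assigned

console.log(getByValSpeculative(dense, 1)); // { value: 20 }
console.log(getByValSpeculative(holey, 1)); // { exit: 'hole' }

// A hole is not simply undefined: the lookup continues on the prototype chain.
Array.prototype[1] = "from prototype";
const viaPrototype = getByValGeneric(holey, 1);
delete Array.prototype[1]; // undo the global change
console.log(viaPrototype); // from prototype
```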

While exiting out of a function is straightforward without breaking fundamental assumptions in optimizing compilers, entering turns out to be super hard. Entering into a function somewhere other than at its primary entrypoint pessimises optimizations at any merge points between entrypoints. If we allowed entering at every bytecode instruction boundary, this would negate the benefits of OSR exit by forcing every instruction boundary to make worst-case assumptions about type. Even allowing OSR entry just at loop headers would break lots of loop optimizations. This means that it’s generally not possible to reenter optimized execution after exiting. We only support entry in cases where the reward is high, like when our profiler tells us that a loop has not yet terminated at the time of compilation. Put simply, the fact that traditional compilers are designed for single-entry multiple-exit procedures means that OSR entry is hard but OSR exit is easy.

JavaScriptCore and most speculative compilers support OSR entry at hot loops, but since it’s not an essential feature for most applications, we’ll leave understanding how we do it as an exercise for the reader.

Figure 6. Speculation broken into the five topics of this post.

The main part of this post describes speculation in terms of its five components (Figure 6): the bytecode, or common IR, of the virtual machine that allows for a shared understanding about the meaning of profiling and exit sites between the unoptimized profiling tier and the optimizing tier; the unoptimized profiling tier that is used to execute functions at start-up, collect profiling about them, and to serve as an exit destination; the control system for deciding when to invoke the optimizing compiler; the optimizing tier that combines a traditional optimizing compiler with enhancements to support speculation based on profiling; and the OSR exit technology that allows the optimizing compiler to use the profiling tier as an exit destination when speculation checks fail.

Overview of JavaScriptCore

Figure 7. The tiers of JavaScriptCore.

JavaScriptCore embraces the idea of tiering and has four tiers for JavaScript (and three tiers for WebAssembly, but that’s outside the scope of this post). Tiering has two benefits: the primary benefit, described in the previous section, of enabling speculation; and a secondary benefit of allowing us to fine-tune the throughput-latency tradeoff on a per-function basis. Some functions run so briefly (like straight-line run-once initialization code) that running any compiler on those functions would be more expensive than interpreting them. Some functions get invoked so frequently, or have such long loops, that their total execution time far exceeds the time to compile them with an aggressive optimizing compiler. But there are also lots of functions in the grey area in between: they don’t run long enough to make an aggressive compiler profitable, but they do run long enough that some intermediate compiler designs can provide speed-ups. JavaScriptCore has four tiers as shown in Figure 7:

  • The LLInt, or low-level interpreter, which is an interpreter that obeys JIT compiler ABI. It runs on the same stack as the JITs and uses a known set of registers and stack locations for its internal state.
  • The Baseline JIT, also known as a bytecode template JIT, which emits a template of machine code for each bytecode instruction without trying to reason about relationships between multiple instructions in the function. It compiles whole functions, which makes it a method JIT. Baseline does no OSR speculations but does have a handful of diamond speculations based on profiling from the LLInt.
  • The DFG JIT, or data flow graph JIT, which does OSR speculation based on profiling from the LLInt, Baseline, and in some rare cases even using profiling data collected by the DFG JIT and FTL JIT. It may OSR exit to either baseline or LLInt. The DFG has a compiler IR called DFG IR, which allows for sophisticated reasoning about speculation. The DFG avoids doing expensive optimizations and makes many compromises to enable fast code generation.
  • The FTL JIT, or faster than light JIT, which does comprehensive compiler optimizations. It’s designed for peak throughput. The FTL never compromises on throughput to improve compile times. This JIT reuses most of the DFG JIT’s optimizations and adds lots more. The FTL JIT uses multiple IRs (DFG IR, DFG SSA IR, B3 IR, and Assembly IR).

An ideal example of this in action is this program:

"use strict";

let result = 0;
for (let i = 0; i < 10000000; ++i) {
    let o = {f: i};
    result += o.f;
}

Thanks to the object allocation inside the loop, it will run for a long time until the FTL JIT can compile it. The FTL JIT will kill that allocation, so then the loop finishes quickly. The long running time before optimization virtually guarantees that the FTL JIT will take a stab at this program’s global function. Additionally, because the function is clean and simple, all of our speculations are right and there are no OSR exits.
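One way to picture what "killing the allocation" means is to compare the loop as written with a hand-written version that has no object in it. The second function below is only a source-level illustration of the effect, not actual FTL output, and both function names are hypothetical.

```javascript
// The loop as written: allocates one object per iteration.
function loopWithAllocation(n) {
    let result = 0;
    for (let i = 0; i < n; ++i) {
        let o = {f: i};
        result += o.f;
    }
    return result;
}

// Hand-written picture of the same loop once the object never escapes
// and its field is replaced by the value that was stored into it.
function loopWithoutAllocation(n) {
    let result = 0;
    for (let i = 0; i < n; ++i)
        result += i;
    return result;
}
```

Both functions compute the same sum; only the amount of work per iteration differs.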

Figure 8. Example timeline of a simple long loop executing in JavaScriptCore. Execution times recorded on my computer one day.

Figure 8 shows the timeline of this benchmark executing in JavaScriptCore. The program starts executing in the LLInt. After about a thousand loop iterations, the loop trigger causes us to start a baseline compiler thread for this code. Once that finishes, we do an OSR entry into the baseline JITed code at the for loop’s header. The baseline JIT also counts loop iterations, and after about a thousand more, we spawn the DFG compiler. The process repeats until we are in the FTL. When I measured this, I found that the DFG compiler needs about 4× the time of the baseline compiler, and the FTL needs about 6× the time of the DFG. While this example is contrived and ideal, the basic idea holds for any JavaScript program that runs long enough since all tiers of JavaScriptCore support the full JavaScript language.

Figure 9. JavaScriptCore tier architecture.

JavaScriptCore is architected so that having many tiers is practical. Figure 9 illustrates this architecture. All tiers share the same bytecode as input. That bytecode is generated by a compiler pipeline that desugars many language features, such as generators and classes, among others. In many cases, it’s possible to add new language features just by modifying the bytecode generation frontend. Once linked, the bytecode can be understood by any of the tiers. The bytecode can be interpreted by the LLInt directly or compiled with the baseline JIT, which mostly just converts each bytecode instruction into a preset template of machine code. The LLInt and Baseline JIT share a lot of code, mostly in the slow paths of bytecode instruction execution. The DFG JIT converts bytecode to its own IR, the DFG IR, and optimizes it before emitting code. In many cases, operations that the DFG chooses not to speculate on are emitted using the same code generation helpers as the Baseline JIT. Even operations that the DFG does speculate on often share slow paths with the Baseline JIT. The FTL JIT reuses the DFG’s compiler pipeline and adds new optimizations to it, including multiple new IRs that have their own optimization pipelines. Despite being more sophisticated than the DFG or Baseline, the FTL JIT shares slow path implementations with those JITs and in some cases even shares code generation for operations that we choose not to speculate on.

Even though the various tiers try to share code whenever possible, they aren’t required to. Take the get_by_val (access an array element) instruction in bytecode. This has duplicate definitions in the bytecode liveness analysis (which knows the liveness rules for get_by_val), the LLInt (which has a very large implementation that switches on a bunch of the common array types and has good code for all of them), the Baseline (which uses a polymorphic inline cache), and the DFG bytecode parser. The DFG bytecode parser converts get_by_val to the DFG IR GetByVal operation, which has separate definitions in the DFG and FTL backends as well as in a bunch of phases that know how to optimize and model GetByVal. The only thing that keeps those implementations in agreement is good convention and extensive testing.

To give a feeling for the relative throughput of the various tiers, I’ll share some informal performance data that I’ve gathered over the years out of curiosity.

Figure 10. Relative performance of the four tiers on JetStream 2 on my computer at the time of that benchmark’s introduction.

We’re going to use the JetStream 2 benchmark suite since that’s the main suite that JavaScriptCore is tuned for. Let’s first consider an experiment where we run JetStream 2 with the tiers progressively enabled starting with the LLInt. Figure 10 shows the results: the Baseline and DFG are more than 2× better than the tier below them and the FTL is 1.1× better than the DFG.

The FTL’s benefits may be modest but they are unique. If we did not have the FTL, we would have no way of achieving the same peak throughput. A great example is the gaussian-blur subtest. This is the kind of compute test that the FTL is built for. I managed to measure the benchmark’s performance when we first introduced it and did not yet have a chance to tune for it. So, this gives a glimpse of the speed-ups that we expect to see from our tiers for code that hasn’t yet been through the benchmark tuning grind. Figure 11 shows the results. All of the JITs achieve spectacular speed-ups: Baseline is 3× faster than LLInt, DFG is 6× faster than Baseline, and FTL is 1.6× faster than DFG.

Figure 11. Relative performance of the four tiers on the gaussian-blur subtest of JetStream 2.

The DFG and FTL complement one another. The DFG is designed to be a fast-running compiler and it achieves this by excluding the most aggressive optimizations, like global register allocation, escape analysis, loop optimizations, or anything that needs SSA. This means that the DFG will always get crushed on peak throughput by compilers that have those features. It’s the FTL’s job to provide those optimizations if a function runs long enough to warrant it. This ensures that there is no scenario where a hypothetical competing implementation could outperform us unless they had the same number of tiers. If you wanted to make a compiler that compiles faster than the FTL then you’d lose on peak throughput, but if you wanted to make a compiler that generates better code than the DFG then you’d get crushed on start-up times. You need both to stay in the game.

Another way of looking at the performance of these tiers is to ask: how much time does a bytecode instruction take to execute in each of the tiers on average? This tells us just about the throughput that a tier achieves without considering start-up at all. This can be hard to estimate, but I made an attempt at it by repeatedly running each JetStream 2 benchmark and having it limit the maximum tier of each function at random. Then I employed a stochastic counting mechanism to get an estimate of the number of bytecode instructions executed at each tier in each run. Combined with the execution times of those runs, this gave a simple linear regression problem of the form:

ExecutionTime = (Latency of LLInt) * (Bytecodes in LLInt)
              + (Latency of Baseline) * (Bytecodes in Baseline)
              + (Latency of DFG) * (Bytecodes in DFG)
              + (Latency of FTL) * (Bytecodes in FTL)

Where the Latency of LLInt means the average amount of time it takes to execute a bytecode instruction in LLInt.

After excluding benchmarks that spent most of their time outside JavaScript execution (like regexp and wasm benchmarks) and fiddling with how to weight benchmarks (I settled on solving each benchmark separately and computing the geomean of the coefficients since this matches JetStream 2 weighting), the solution I arrived at was:

ExecutionTime = (3.97 ns) * (Bytecodes in LLInt)
              + (1.71 ns) * (Bytecodes in Baseline)
              + (0.349 ns) * (Bytecodes in DFG)
              + (0.225 ns) * (Bytecodes in FTL)

In other words, Baseline executes code about 2× faster than LLInt, DFG executes code about 5× faster than Baseline, and the FTL executes code about 1.5× faster than DFG. Note how this data is in the same ballpark as what we saw for gaussian-blur. That makes sense since that was a peak throughput benchmark.
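The ratios quoted above follow directly from the fitted coefficients. As a quick check, the sketch below recomputes them and also evaluates the fitted model for a given mix of bytecodes. The names `latencyNs`, `speedup`, and `estimateExecutionTimeNs` are made up for this illustration.

```javascript
// Fitted per-bytecode latencies from the regression above, in nanoseconds.
const latencyNs = { LLInt: 3.97, Baseline: 1.71, DFG: 0.349, FTL: 0.225 };

// How many times faster the higher tier executes a bytecode than the lower tier.
function speedup(lowerTier, higherTier) {
    return latencyNs[lowerTier] / latencyNs[higherTier];
}

// Evaluates the fitted model for counts like { LLInt: 1000, FTL: 5000 }.
function estimateExecutionTimeNs(bytecodesPerTier) {
    let total = 0;
    for (const [tier, count] of Object.entries(bytecodesPerTier))
        total += latencyNs[tier] * count;
    return total;
}
```

Here `speedup("LLInt", "Baseline")` comes out a little above 2.3, `speedup("Baseline", "DFG")` a little under 5, and `speedup("DFG", "FTL")` a little above 1.5.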

Although this isn’t a garbage collection blog post, it’s worth understanding a bit about how the garbage collector works. JavaScriptCore picks a garbage collection strategy that makes the rest of the virtual machine, including all of the support for speculation, easier to implement. The garbage collector has the following features that make speculation easier:

  • The collector scans the stack conservatively. This means that compilers don’t have to worry about how to report pointers to the collector.
  • The collector doesn’t move objects. This means that if a data structure (like the compiler IR) has many possible ways of referencing some object, we only have to report one of them to the collector.
  • The collector runs to fixpoint. This makes it possible to invent precise rules for whether objects created by speculation should be kept alive.
  • The collector’s object model is expressed in C++. JavaScript objects look like C++ objects, and JS object pointers look like C++ pointers.

These features make the compiler and runtime easier to write, which is great, since speculation requires us to write a lot of compiler and runtime code. JavaScript is a slow enough language even with the optimizations we describe in this post that garbage collector performance is rarely the longest pole in the tent. Therefore, our garbage collector makes many tradeoffs to make it easier to work on the performance-critical parts of our engine (like speculation). It would be unwise, for example, to make it harder to implement some compiler optimization as a way of getting a small garbage collector optimization, since the compiler has a bigger impact on performance for typical JavaScript programs.

To summarize: JavaScriptCore has four tiers, two of which do speculative optimizations, and all of which participate in the collection of profiling. The first two tiers are an interpreter and bytecode template JIT while the last two are optimizing compilers tuned for different throughput-latency trade-offs.

Speculative Compilation

Now that we’ve established some basic background about speculation and JavaScriptCore, this section goes into the details. First we will discuss JavaScriptCore’s bytecode. Then we show the control system for launching the optimizing compiler. Next will be a detailed section about how JavaScriptCore’s profiling tiers work, which focuses mostly on how they collect profiling. Finally we discuss JavaScriptCore’s optimizing compilers and their approach to OSR.


Speculation requires having a profiling tier and an optimizing tier. When the profiling tier reports profiling, it needs to be able to say what part of the code that profiling is for. When the optimizing compiler wishes to compile an OSR exit, it needs to be able to identify the exit site in a way that both tiers understand. To solve both issues, we need a common IR that is:

  • Used by all tiers as input.
  • Persistent for as long as the function that it represents is still live.
  • Immutable (at least for those parts that all tiers are interested in).

In this post, we will use bytecode as the common IR. This isn’t required; abstract syntax trees or even SSA could work as a common IR. We offer some insights into how we designed our bytecode for JavaScriptCore. JavaScriptCore’s bytecode is register-based, compact, untyped, high-level, directly interpretable, and transformable.

Our bytecode is register-based in the sense that operations tend to be written as:

add result, left, right

Which is taken to mean:

result = left + right

Where result, left, and right are virtual registers. Virtual registers may refer to locals, arguments, or constants in the constant pool. Functions declare how many locals they need. Locals are used both for named variables (like var, let, or const variables) and temporaries arising from expression tree evaluation.
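As a rough illustration of what register-based means in practice, here is a toy interpreter for a made-up, unencoded form of such bytecode. The operand representation (`{kind, index}` objects), the `ret` instruction, and all function names are assumptions made for this sketch; JavaScriptCore’s real bytecode is a compact byte stream, not an array of objects.

```javascript
// Hypothetical operand constructors: a virtual register names a local,
// an argument, or a constant-pool entry.
const local = (index) => ({ kind: "local", index });
const argument = (index) => ({ kind: "argument", index });
const constant = (index) => ({ kind: "constant", index });

function readOperand(frame, operand) {
    switch (operand.kind) {
    case "local": return frame.locals[operand.index];
    case "argument": return frame.args[operand.index];
    case "constant": return frame.constants[operand.index];
    default: throw new Error("unknown operand kind: " + operand.kind);
    }
}

function execute(code, args) {
    // The function declares how many locals it needs up front.
    const frame = {
        locals: new Array(code.numLocals).fill(undefined),
        args,
        constants: code.constants,
    };
    for (const inst of code.instructions) {
        switch (inst.op) {
        case "add": // add result, left, right  =>  result = left + right
            frame.locals[inst.result.index] =
                readOperand(frame, inst.left) + readOperand(frame, inst.right);
            break;
        case "ret":
            return readOperand(frame, inst.value);
        default:
            throw new Error("unknown opcode: " + inst.op);
        }
    }
    return undefined;
}

// Roughly: function (a, b) { let t = a + b; return t + 1; }
// Local 0 holds the named variable t; local 1 is an expression temporary.
const exampleCode = {
    numLocals: 2,
    constants: [1],
    instructions: [
        { op: "add", result: local(0), left: argument(0), right: argument(1) },
        { op: "add", result: local(1), left: local(0), right: constant(0) },
        { op: "ret", value: local(1) },
    ],
};
```

Because `add` is untyped and carries the full meaning of the JavaScript + operator, the same instruction stream works for `execute(exampleCode, [2, 3])` and for `execute(exampleCode, ["a", "b"])`.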

Our bytecode is compact: each opcode and operand is usually encoded as one byte. We have wide prefixes to allow 16-bit or 32-bit operands. This is important since JavaScript programs can be large and the bytecode must persist for as long as the function it represents is still live.

Our bytecode is untyped. Virtual registers never have static type. Opcodes generally don’t have static type except for the few opcodes that have a meaningful type guarantee on their output (for example, the | operator always returns int32, so our bitor opcode returns int32). This is important since the bytecode is meant to be a common source of truth for all tiers. The profiling tier runs before we have done type inference, so the bytecode can’t have any more types than the JavaScript language.

Our bytecode is almost as high-level as JavaScript. While we use desugaring for many JavaScript features, we only do that when implementation by desugaring isn’t thought to cost performance. So, even the “fundamental” features of our bytecode are high level. For example, the add opcode has all of the power of the JavaScript + operator, including that it might mean a loop with effects.

Our bytecode is directly interpretable. The same bytecode stream that the interpreter executes is the bytecode stream that we will save in the cache (to skip parsing later) and feed to the compiler tiers.

Finally, our bytecode is transformable. Normally, intermediate representations use a control flow graph and make it easy to insert and remove instructions. That’s not how bytecode works: it’s an array of instructions encoded using a nontrivial variable-width encoding. But we do have a bytecode editing API and we use it for generatorification (our generator desugaring bytecode-to-bytecode pass). We can imagine this facility also being useful for other desugarings or for experimenting with bytecode instrumentation.

Compared to non-bytecode IRs, the main advantages of bytecode are that it’s easy to:

  • Identify targets for OSR exit. OSR exit in JavaScriptCore requires entering into an unoptimized bytecode execution engine (like an interpreter) at some arbitrary bytecode instruction. Using bytecode instruction index as a way of naming an exit target is intuitive since it’s just an integer.
  • Compute live state at exit. Register-based bytecode tends to have dense register numberings so it’s straightforward to analyze liveness using bitvectors. That tends to be fast and doesn’t require a lot of memory. It’s practical to cache the results of bytecode liveness analysis, for example.
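The bitvector point can be sketched in a few lines. The example below is a simplified assumption-laden version: it handles only straight-line code with at most 32 locals, using one JavaScript integer as the bitvector, and it describes each instruction by the locals it reads (`uses`) and writes (`defs`). A real analysis also has to iterate over control flow, which this sketch ignores; `computeLiveBefore` and `liveLocalsAt` are hypothetical names.

```javascript
// Backward liveness over straight-line code. Bit i of the result is set
// when local i is live immediately before the instruction.
function computeLiveBefore(instructions) {
    const liveBefore = new Array(instructions.length);
    let live = 0; // nothing is live after the last instruction
    for (let i = instructions.length - 1; i >= 0; --i) {
        const { uses, defs } = instructions[i];
        for (const d of defs)
            live &= ~(1 << d); // a write kills the old value
        for (const u of uses)
            live |= 1 << u; // a read makes the local live
        liveBefore[i] = live;
    }
    return liveBefore;
}

// The locals an OSR exit at bytecodeIndex would have to materialize.
function liveLocalsAt(liveBefore, bytecodeIndex) {
    const result = [];
    for (let local = 0; local < 32; ++local) {
        if (liveBefore[bytecodeIndex] & (1 << local))
            result.push(local);
    }
    return result;
}

const exampleInstructions = [
    { uses: [], defs: [0] },     // 0: loc0 = arg0 + arg1
    { uses: [0], defs: [1] },    // 1: loc1 = loc0 + const0
    { uses: [0, 1], defs: [2] }, // 2: loc2 = loc0 + loc1
    { uses: [2], defs: [] },     // 3: ret loc2
];
```

For this example, an exit targeting instruction 2 needs locals 0 and 1, while an exit targeting instruction 3 needs only local 2. Since the bytecode is immutable, a table like `liveBefore` can be computed once and cached.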

JavaScriptCore’s bytecode format is independently implemented by the execution tiers. For example, the baseline JIT doesn’t try to use the LLInt to create its machine code templates; it just emits those templates itself and doesn’t try to match the LLInt exactly (the behavior is identical but the implementation isn’t). The tiers do share a lot of code – particularly for inline caches and slow paths – but they aren’t required to. It’s common for bytecode instructions to have algorithmically different implementations in the four tiers. For example, the LLInt might implement some instruction with a large switch that handles all possible types, the Baseline might implement the same instruction with an inline cache that repatches based on type, and the DFG and FTL might try to do some combination of inline speculations, inline caches, and emitting a switch on all types. This exact scenario happens for add and other arithmetic ops as well as get_by_val/put_by_val. This independence lets each tier take advantage of its unique properties to make things run faster. Of course, this approach also means that adding new bytecodes or changing bytecode semantics requires changing all of the tiers. For that reason, we try to implement new language features by desugaring them to existing bytecode constructs.

It’s possible to use any sensible IR as the common IR for a speculative compiler, including abstract syntax trees or SSA, but JavaScriptCore uses bytecode so that’s what we’ll talk about in the rest of this post.


Speculative compilation needs a control system to decide when to run the optimizing compiler. The control system has to balance competing concerns: compiling functions as soon as it’s profitable, avoiding compiling functions that aren’t going to run long enough to benefit from it, avoiding compiling functions that have inadequate type profiling, and recompiling functions if a prior compilation did speculations that turned out to be wrong. This section describes JavaScriptCore’s control system. Most of the heuristics we describe were necessary, in our experience, to make speculative compilation profitable. Otherwise the optimizing compiler would kick in too often, not often enough, or not at the right rate for the right functions. This section describes the full details of JavaScriptCore’s tier-up heuristics because we suspect that to reproduce our performance, one would need all of these heuristics.

JavaScriptCore counts executions of functions and loops to decide when to compile. Once a function is compiled, we count exits to decide when to throw away compiled functions. Finally, we count recompilations to decide how much to back off from recompiling a function in the future.

Execution Counting

JavaScriptCore maintains an execution counter for each function. This counter gets incremented as follows:

  • Each call to the function adds 15 points to the execution counter.
  • Each loop execution adds 1 point to the execution counter.

We trigger tier-up once the counter reaches some threshold. Thresholds are determined dynamically. To understand our thresholds, first consider their static versions; then we will look at how we modulate these thresholds based on other information.

  • LLInt→Baseline tier-up requires 500 points.
  • Baseline→DFG tier-up requires 1000 points.
  • DFG→FTL tier-up requires 100000 points.
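Put together, the point values and static thresholds above amount to a very small counter. The sketch below models it; the class and constant names are hypothetical and the real engine stores and checks its counters differently.

```javascript
const POINTS_PER_CALL = 15;
const POINTS_PER_LOOP_ITERATION = 1;

// Static thresholds from the list above, before any dynamic adjustment.
const STATIC_THRESHOLDS = {
    llintToBaseline: 500,
    baselineToDFG: 1000,
    dfgToFTL: 100000,
};

class ExecutionCounter {
    constructor(threshold) {
        this.threshold = threshold;
        this.points = 0;
    }

    // Each method returns true once tier-up should be triggered.
    countCall() {
        return this.addPoints(POINTS_PER_CALL);
    }

    countLoopIteration() {
        return this.addPoints(POINTS_PER_LOOP_ITERATION);
    }

    addPoints(points) {
        this.points += points;
        return this.points >= this.threshold;
    }
}
```

With the static LLInt→Baseline threshold, a function with no loops reaches tier-up on its 34th call (34 × 15 = 510 points), while a single call containing a loop gets there after 485 iterations (15 + 485 = 500 points).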

Over the years we’ve found ways to dynamically adjust these thresholds based on other sources of information, like:

  • Whether the function got JITed the last time we encountered it (according to our cache). Let’s call this wasJITed.
  • How big the function is. Let’s call this S. We use the number of bytecode opcodes plus operands as the size.
  • How many times it has been recompiled. Let’s call this R.
  • How much executable memory is available. Let M be the total amount of executable memory we have, and let U be the total amount we estimate that we would be using if we compiled this function.
  • Whether profiling is “full” enough.

We select the LLInt→Baseline threshold based on wasJITed. If we don’t know (the function wasn’t in the cache) then we use the basic threshold, 500. Otherwise, if the function wasJITed then we use 250 (to accelerate tier-up), and if it was not then we use 2000. This optimization is especially useful for improving page load times.
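Written as code, this selection is a three-way choice. In this sketch the function name is hypothetical and `undefined` is an assumed encoding for "the function wasn’t in the cache".

```javascript
// wasJITed: true or false if the cache knows, undefined if it does not.
function llintToBaselineThreshold(wasJITed) {
    if (wasJITed === undefined)
        return 500; // no cache entry: basic threshold
    return wasJITed ? 250 : 2000;
}
```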

Baseline→DFG and DFG→FTL use the same scaling factor based on S, R, M, and U. The scaling factor is defined as follows:

(0.825914 + 0.061504 * sqrt(S + 1.02406)) * pow(2, R) * M / (M - U)

We multiply this by 1000 for Baseline→DFG and by 100000 for DFG→FTL. Let’s break down what this scaling factor does:

First we scale by the square root of the size. The expression 0.825914 + 0.061504 * sqrt(S + 1.02406) gives a scaling factor that is between 1 and 2 for functions smaller than about 350 bytecodes, which we consider to be “easy” functions to compile. The scaling factor uses square root so it grows somewhat gently. We’ve also tried having the scaling factor be linear, but that’s much worse. It is worth it to delay compilations of large functions a bit, but it’s not worth it to delay it too much. Note that the ideal delay doesn’t just have to do with the cost of compilation. It’s also about running long enough to get good profiling. Maybe there is some deep reason why square root works well here, but all we really care about is that scaling by this amount makes programs run faster.

Then we introduce exponential backoff based on the number of times that the function has been recompiled. The pow(2, R) expression means that each recompilation doubles the thresholds.

After that we introduce a hyperbolic scaling factor, M / (M - U), to help avoid cases where we run out of executable memory altogether. This is important since some configurations of JavaScriptCore run with a small available pool of executable memory. This expression means that if we use half of executable memory then the thresholds are doubled. If we use 3/4 of executable memory then the thresholds are quadrupled. This makes filling up executable memory a bit like going at the speed of light: the math makes it so that as you get closer to filling it up the thresholds get closer to infinity. However, it’s worth noting that this is imperfect for truly large programs, since those might have other reasons to allocate executable memory not covered by this heuristic. The heuristic is also imperfect in cases of multiple things being compiled in parallel. Using this factor increases the maximum program size we can handle with small pools of executable memory, but it’s not a silver bullet.
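The three factors are easy to experiment with once the formula is written as a function. In the sketch below the function names are hypothetical, M and U can be in any unit as long as it is the same one, and returning `Infinity` when U reaches M is an assumption made here to keep the function total.

```javascript
// S: bytecode size, R: recompilation count,
// M: total executable memory, U: estimated use if we compile.
function scalingFactor(S, R, M, U) {
    if (U >= M)
        return Infinity; // assumption: never compile if memory would be exhausted
    const sizeFactor = 0.825914 + 0.061504 * Math.sqrt(S + 1.02406);
    const backoffFactor = Math.pow(2, R);
    const memoryFactor = M / (M - U);
    return sizeFactor * backoffFactor * memoryFactor;
}

function baselineToDFGThreshold(S, R, M, U) {
    return 1000 * scalingFactor(S, R, M, U);
}

function dfgToFTLThreshold(S, R, M, U) {
    return 100000 * scalingFactor(S, R, M, U);
}
```

For a 100-bytecode function that has never been recompiled and with negligible memory pressure, the factor is roughly 1.44, so the Baseline→DFG threshold is roughly 1444 points. Each recompilation doubles it, as does using half of executable memory.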

Finally, if the execution count does reach this dynamically computed threshold, we check that some kinds of profiling (specifically, value and array profiling, discussed in detail in the upcoming profiling section) are full enough. We say that profiling is full enough if more than 3/4 of the profiling sites in the function have data. If this threshold is not met, we reset the execution counters. We let this process repeat five times. The optimizing compilers tend to speculate that unprofiled code is unreachable. This is profitable if that code really won’t ever run, but we want to be extra sure before doing that, hence we give functions with partial profiling 5× the time to warm up.
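A minimal sketch of this check follows. The data layout (`profilingSites` with a `hasData` flag, a `warmUpResets` count) and the function names are hypothetical. The sketch also assumes that a function with no profiling sites counts as full enough, and that after five resets compilation proceeds regardless, which is one reading of "we let this process repeat five times".

```javascript
const MAX_WARM_UP_RESETS = 5;

// Full enough means more than 3/4 of the profiling sites have data.
function isProfilingFullEnough(profilingSites) {
    if (profilingSites.length === 0)
        return true; // assumption: nothing to wait for
    const sitesWithData = profilingSites.filter((site) => site.hasData).length;
    return sitesWithData > 0.75 * profilingSites.length;
}

// Called when the execution counter reaches its threshold.
// Returns true if the optimizing compiler should be launched now.
function shouldLaunchCompiler(fn) {
    if (isProfilingFullEnough(fn.profilingSites))
        return true;
    if (fn.warmUpResets >= MAX_WARM_UP_RESETS)
        return true; // waited long enough; compile with partial profiling
    fn.warmUpResets++;
    fn.executionCount = 0; // reset and let the function warm up again
    return false;
}
```

Note that exactly three out of four sites having data is not enough under this reading, since the text asks for more than 3/4.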

This is an exciting combination of heuristics! These heuristics were added early in the development of tiering in JSC. They were all added before we built the FTL, and the FTL inherited those heuristics just with a 100× multiplier. Each heuristic was added because it produced either a speed-up or a memory usage reduction or both. We try to remove heuristics that are not known to be speed-ups anymore, and to our knowledge, all of these still contribute to better performance on benchmarks we track.

Exit Counting

After we compile a function with the DFG or FTL, it’s possible that one of the speculations we made is wrong. This will cause the function to OSR exit back to LLInt or Baseline (we prefer Baseline, but may throw away Baseline code during GC, in which case exits from DFG and FTL will go to LLInt). We’ve found that the best way of dealing with a wrong speculation is to throw away the optimized code and try optimizing again later with better profiling. We detect if a DFG or FTL function should be recompiled by counting exits. The exit count thresholds are:

  • For a normal exit, we require 100 * pow(2, R) exits to recompile.
  • If the exit causes the Baseline JIT to enter its loop trigger (i.e. we got stuck in a hot loop after exit), then it’s counted specially. We only allow 5 * pow(2, R) of those kinds of exits before we recompile. Note that this can mean exiting five times and tripping the loop optimization trigger each time or it can mean exiting once and tripping the loop optimization trigger five times.

The first step to recompilation is to jettison the DFG or FTL function. That means that all future calls to the function will call the Baseline or LLInt function instead.


If a function is jettisoned, we increment the recompilation counter (R in our notation) and reset the tier-up functionality in the Baseline JIT. This means that the function will keep running in Baseline for a while (twice as long as it did before it was optimized last time). It will gather new profiling, which we will be able to combine with the profiling we collected before to get an even more accurate picture of how types behave in the function.
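The exit-counting rules and the effect of jettison on R can be modelled with a small state object. This is a hypothetical sketch: the class and method names are invented, the comparison is assumed to be "greater than or equal", and jettison is simplified to always return the function to Baseline even though, as noted above, exits may land in the LLInt.

```javascript
function exitThresholds(R) {
    return {
        normal: 100 * Math.pow(2, R),
        stuckInLoop: 5 * Math.pow(2, R),
    };
}

class FunctionTierState {
    constructor() {
        this.tier = "Baseline";
        this.recompilations = 0; // R in the text
        this.normalExits = 0;
        this.stuckInLoopExits = 0;
    }

    // A fresh optimized compilation starts with fresh exit counts.
    installOptimizedCode(tier) {
        this.tier = tier;
        this.normalExits = 0;
        this.stuckInLoopExits = 0;
    }

    // stuckInLoop: the exit led to the Baseline loop trigger firing.
    recordExit(stuckInLoop) {
        if (stuckInLoop)
            this.stuckInLoopExits++;
        else
            this.normalExits++;
        const thresholds = exitThresholds(this.recompilations);
        if (this.normalExits >= thresholds.normal
            || this.stuckInLoopExits >= thresholds.stuckInLoop)
            this.jettison();
    }

    jettison() {
        this.tier = "Baseline";
        this.recompilations++; // doubles future tier-up and exit thresholds
    }
}
```

After the first jettison R is 1, so the next optimized version tolerates 200 normal exits, and the tier-up thresholds that use pow(2, R) are doubled as well.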

It’s worth looking at an example of this in action. We already showed an idealized case of tier-up in Figure 8, where a function gets compiled by each compiler exactly once and there are no OSR exits or recompilations. We will now show an example where things don’t go so well. This example is picked because it’s a particularly awful outlier. This isn’t how we expect our engine to behave normally. We expect amusingly bad cases like the following to happen occasionally since the success or failure of speculation is random and random behavior means having bad outliers.

_handlePropertyAccessExpression = function (result, node)
{
    result.possibleGetOverloads = node.possibleGetOverloads;
    result.possibleSetOverloads = node.possibleSetOverloads;
    result.possibleAndOverloads = node.possibleAndOverloads;
    result.baseType = Node.visit(node.baseType, this);
    result.callForGet = Node.visit(node.callForGet, this);
    result.resultTypeForGet = Node.visit(node.resultTypeForGet, this);
    result.callForAnd = Node.visit(node.callForAnd, this);
    result.resultTypeForAnd = Node.visit(node.resultTypeForAnd, this);
    result.callForSet = Node.visit(node.callForSet, this);
    result.errorForSet = node.errorForSet;
};

This function belongs to the WSL subtest of JetStream 2. It’s part of the WSL compiler’s AST walk. It ends up being a large function after inlining Node.visit. When I ran this on my computer, I found that JSC did 8 compilations before hitting equilibrium for this function:

  1. After running the function in LLInt for a bit, we compile this with Baseline. This is the easy part since Baseline doesn’t need to be recompiled.
  2. We compile with DFG. Unfortunately, the DFG compilation exits 101 times and gets jettisoned. The exit is due to a bad type check that the DFG emitted on this.
  3. We again compile with the DFG. This time, we exit twice due to a check on result. This isn’t enough times to trigger jettison and it doesn’t prevent tier-up to the FTL.
  4. We compile with the FTL. Unfortunately, this compilation gets jettisoned due to a failing watchpoint. Watchpoints (discussed in greater detail in later sections) are a way for the compiler to ask the runtime to notify it when bad things happen rather than emitting a check. Failing watchpoints cause immediate jettison. This puts us back in Baseline.
  5. We try the DFG again. We exit seven times due to a bad check on result, just like in step 3. This still isn’t enough times to trigger jettison and it doesn’t prevent tier-up to the FTL.
  6. We compile with the FTL. This time we exit 402 times due to a bad type check on node. We jettison and go back to Baseline.
  7. We compile with the DFG again. This time there are no exits.
  8. We compile with the FTL again. There are no further exits or recompilations.

This sequence of events has some intriguing quirks in addition to the number of compilations. Notice how in steps 3 and 5, we encounter exits due to a bad check on result, but none of the FTL compilations encounter those exits. This seems implausible since the FTL will do at least all of the speculations that the DFG did and a speculation that doesn’t cause jettison also cannot pessimise future speculations. It’s also surprising that the speculation that jettisons the FTL in step 6 wasn’t encountered by the DFG. It is possible that the FTL does more speculations than the DFG, but that usually only happens in inlined functions, and this speculation on node doesn’t seem to be in inlined code. A possible explanation for all of these surprising quirks is that the function is undergoing phase changes: during some parts of execution, it sees one set of types, and during another part of execution, it sees a somewhat different set. This is a common issue. Types are not random and they are often a function of time.

JavaScriptCore’s compiler control system is designed to get good outcomes both for functions where speculation “just works” and for functions like the one in this example that need some extra time. To summarize, control is all about counting executions, exits, and recompilations, and either launching a higher tier compiler (“tiering up”) or jettisoning optimized code and returning to Baseline.


This section describes the profiling tiers of JavaScriptCore. The profiling tiers have the following responsibilities:

  • To provide a non-speculative execution engine. This is important for start-up (before we do any speculation) and for OSR exits. OSR exit needs to exit to something that does no speculation so that we don’t have chains of exits for the same operation.
  • To record useful profiling. Profiling is useful if it enables us to make profitable speculations. Speculations are profitable if doing them makes programs run faster.

In JavaScriptCore, the LLInt and Baseline are the profiling tiers while DFG and FTL are the optimizing tiers. However, DFG and FTL also collect some profiling, usually only when it’s free to do so and for the purpose of refining profiling collected by the profiling tiers.

This section is organized as follows. First we explain how JavaScriptCore’s profiling tiers execute code. Then we explain the philosophy of how to profile. Finally we go into the details of JavaScriptCore’s profiling implementation.

How Profiled Execution Works

JavaScriptCore profiles using the LLInt and Baseline tiers. LLInt interprets bytecode while Baseline compiles it. The two tiers share a nearly identical ABI so that it’s possible to jump from one to the other at any bytecode instruction boundary.

LLInt: The Low Level Interpreter

The LLInt is an interpreter that obeys JIT ABI (in the style of HotSpot’s interpreter). To that end, it is written in a portable assembly language called offlineasm. Offlineasm has a functional macro language (you can pass macro closures around) embedded in it. The offlineasm compiler is written in Ruby and can compile to multiple CPUs as well as C++. This section tells the story of why this crazy design produces a good outcome.

The LLInt simultaneously achieves multiple goals for JavaScriptCore:

  • LLInt is JIT-friendly. The LLInt runs on the same stack that the JITs run on (which happens to be the C stack). The LLInt even agrees on register conventions with the JITs. This makes it cheap for LLInt to call JITed functions and vice versa. It makes LLInt→Baseline and Baseline→LLInt OSR trivial and it makes any JIT→LLInt OSR possible.
  • LLInt allows us to execute JavaScript code even if we can’t JIT. JavaScriptCore in no-JIT mode (we call it “mini mode”) has some advantages: it’s harder to exploit and uses less memory. Some JavaScriptCore clients prefer the mini mode. JSC is also used on CPUs that we don’t have JIT support for. LLInt works great on those CPUs.
  • LLInt reduces memory usage. Any machine code you generate from JavaScript is going to be big. Remember, there’s a reason why they call JavaScript “high level” and machine code “low level”: it refers to the fact that when you lower JavaScript to machine code, you’re going to get many instructions for each JavaScript expression. Having the LLInt means that we don’t have to generate machine code for all JavaScript code, which saves us memory.
  • LLInt starts quickly. LLInt interprets our bytecode format directly. It’s designed so that we could map bytecode from disk and point the interpreter at it. The LLInt is essential for achieving great page load time in the browser.
  • LLInt is portable. It can be compiled to C++.

It would have been natural to write the LLInt in C++, since that’s what most of JavaScriptCore is written in. But that would have meant that the interpreter would have a C++ stack frame constructed and controlled by the C++ compiler. This would have introduced two big problems:

  1. It would be unclear how to OSR from the LLInt to the Baseline JIT or vice versa, since OSR would have to know how to decode and reencode a C++ stack frame. We don’t doubt that it’s possible to do this with enough cleverness, but it would create constraints on exactly how OSR works and it’s not an easy piece of machinery to maintain.
  2. JS functions running in the LLInt would have two stack frames instead of one. One of those stack frames would have to go onto the C++ stack (because it’s a C++ stack frame). We have multiple choices of how to manage the JS stack frame (we could try to alloca it on top of the C++ frame, or allocate it somewhere else) but this inevitably increases cost: calls into the interpreter would have to do twice the work. A common optimization to this approach is to have interpreter→interpreter calls reuse the same C++ stack frame by managing a separate JS stack on the side. Then you can have the JITs use that separate JS stack. This still leaves cost when calling out of interpreter to JIT or vice versa.

A natural way to avoid these problems is to write the interpreter in assembly. That’s basically what we did. But a JavaScript interpreter is a complex beast. It would be awful if porting JavaScriptCore to a new CPU meant rewriting the interpreter in another assembly language. Also, we want to use abstraction to write it. If we wrote it in C++, we’d probably have multiple functions, templates, and lambdas, and we would want all of them to be inlined. So we designed a new language, offlineasm, which has the following features:

  • Portable assembly with our own mnemonics and register names that match the way we do portable assembly in our JIT. Some high-level mnemonics require lowering. Offlineasm reserves some scratch registers to use for lowering.
  • The macro construct. It’s best to think of this as a lambda that takes some arguments and returns void. Then think of the portable assembly statements as print statements that output that assembly. So, the macros are executed for effect and that effect is to produce an assembly program. These are the execution semantics of offlineasm at compile time.

Macros allow us to write code with rich abstractions. Consider this example from the LLInt:

macro llintJumpTrueOrFalseOp(name, op, conditionOp)
    llintOpWithJump(op_%name%, op, macro (size, get, jump, dispatch)
        get(condition, t1)
        loadConstantOrVariable(size, t1, t0)
        btqnz t0, ~0xf, .slow
        conditionOp(t0, .target)
        dispatch()

    .target:
        jump(target)

    .slow:
        callSlowPath(_llint_slow_path_%name%)
        nextInstruction()
    end)
end

This is a macro that we use for implementing both the jtrue and jfalse opcodes. There are only three lines of actual assembly in this listing: the btqnz (branch test quad not zero) and the two labels (.target and .slow). This also shows the use of first-class macros: on the second line, we call llintOpWithJump and pass it a macro closure as the third argument. The great thing about having a lambda-like construct like macro is that we don’t need much else to have a pleasant programming experience. The LLInt is written in about 5000 lines of offlineasm (if you only count the 64-bit version).
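One way to picture these compile-time semantics is to model macros as C++ lambdas that append lines to an assembly program. The sketch below is only an analogy, not how the offlineasm compiler (which is written in Ruby) works internally. The names Program, MacroBody, opWithJump, and jumpTrueOrFalse are hypothetical, and the emitted text is placeholder assembly.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// The "assembly program" being produced: one string per emitted line.
using Program = std::vector<std::string>;

// A macro body is a closure that is run for its effect of emitting lines.
using MacroBody = std::function<void(Program&)>;

// Hypothetical analogue of a wrapper macro such as llintOpWithJump: it emits
// an entry label and then runs the macro closure it was handed.
void opWithJump(Program& out, const std::string& label, const MacroBody& body)
{
    out.push_back(label + ":");
    body(out);
}

// Hypothetical analogue of llintJumpTrueOrFalseOp: one definition can serve
// two opcodes because the condition is itself passed in as a closure.
void jumpTrueOrFalse(Program& out, const std::string& name, const MacroBody& conditionOp)
{
    assert(!name.empty());
    opWithJump(out, "op_" + name, [&conditionOp](Program& p) {
        p.push_back("    btqnz t0, ~0xf, .slow"); // A "printed" assembly line.
        conditionOp(p);                           // Expanded in place, not called at run time.
    });
}
```

Calling jumpTrueOrFalse twice with different conditionOp closures yields two separate straight-line programs, which is the sense in which every macro ends up inlined.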

To summarize, LLInt is an interpreter written in offlineasm. LLInt understands JIT ABI so calls and OSR between LLInt and JIT are cheap. The LLInt allows JavaScriptCore to load code more quickly, use less memory, and run on more platforms.

Baseline: The Bytecode Template JIT

The Baseline JIT achieves a speed-up over the LLInt at the cost of some memory and the time it takes to generate machine code. Baseline’s speed-up is thanks to two factors:

  • Removal of interpreter dispatch. Interpreter dispatch is the costliest part of interpretation, since the indirect branches used for selecting the implementation of an opcode are hard for the CPU to predict. This is the primary reason why Baseline is faster than LLInt.
  • Comprehensive support for polymorphic inline caching. It is possible to do sophisticated inline caching in an interpreter, but currently our best inline caching implementation is the one shared by the JITs.

The Baseline JIT compiles bytecode by turning each bytecode instruction into a template of machine code. For example, a bytecode instruction like:

add loc6, arg1, arg2

Is turned into something like:

0x2f8084601a65: mov 0x30(%rbp), %rsi
0x2f8084601a69: mov 0x38(%rbp), %rdx
0x2f8084601a6d: cmp %r14, %rsi
0x2f8084601a70: jb 0x2f8084601af2
0x2f8084601a76: cmp %r14, %rdx
0x2f8084601a79: jb 0x2f8084601af2
0x2f8084601a7f: mov %esi, %eax
0x2f8084601a81: add %edx, %eax
0x2f8084601a83: jo 0x2f8084601af2
0x2f8084601a89: or %r14, %rax
0x2f8084601a8c: mov %rax, -0x38(%rbp)

The only parts of this code that would vary from one add instruction to another are the references to the operands. For example, 0x30(%rbp) (that’s x86 for the memory location at frame pointer plus 0x30) is the machine code representation of arg1 in bytecode.
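To make the idea of a template concrete, here is a minimal sketch that produces the listing above as text, parameterized only by the frame offsets of the operands. It is not the Baseline JIT's actual code generator: frameSlot and emitAddTemplate are hypothetical names, the output is assembly text rather than machine code, and ".slow" stands in for the branch target address shown in the listing.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Formats a frame slot as an x86 (AT&T syntax) memory operand, such as
// 0x30(%rbp) or -0x38(%rbp).
std::string frameSlot(int offset)
{
    unsigned magnitude = static_cast<unsigned>(offset < 0 ? -offset : offset);
    char buffer[32];
    std::snprintf(buffer, sizeof(buffer), "%s0x%x(%%rbp)", offset < 0 ? "-" : "", magnitude);
    return buffer;
}

// Emits the add template. The instruction sequence is copied from the
// listing above; only the three frame offsets vary between add instructions.
std::vector<std::string> emitAddTemplate(int dst, int lhs, int rhs)
{
    return {
        "mov " + frameSlot(lhs) + ", %rsi",
        "mov " + frameSlot(rhs) + ", %rdx",
        "cmp %r14, %rsi",
        "jb .slow",
        "cmp %r14, %rdx",
        "jb .slow",
        "mov %esi, %eax",
        "add %edx, %eax",
        "jo .slow",
        "or %r14, %rax",
        "mov %rax, " + frameSlot(dst),
    };
}
```

With the offsets from the listing, emitAddTemplate(-0x38, 0x30, 0x38) reproduces the eleven instructions shown for add loc6, arg1, arg2.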

The Baseline JIT does few optimizations beyond just emitting code templates. It does no register allocation between instruction boundaries, for example. It does perform some local optimizations, such as specializing a math operation when one of its operands is a constant, or using profiling information collected by the LLInt. Baseline also has good support for code repatching, which is essential for implementing inline caching. We discuss inline caching in detail later in this section.

To summarize, the Baseline JIT is a mostly unoptimized JIT compiler that focuses on removing interpreter dispatch overhead. This is enough to make it a ~2× speed-up over the LLInt.

Profiling Philosophy

Profiling in JSC is designed to be cheap and useful.

JavaScriptCore’s profiling aims to incur little or no cost in the common case. Running with profiling turned on but never using the results to do optimizations should result in throughput that is about as good as if all of the profiling was disabled. We want profiling to be cheap because even in a long running program, lots of functions will only run once or for too short a time to make an optimizing JIT profitable. Some functions might finish running in less time than it takes to optimize them. The profiling can’t be so expensive that it makes functions like that run slower.

Profiling is meant to help the compiler make the kinds of speculations that cause the program to run faster when we factor in both the speed-ups from speculations that are right and the slow-downs from speculations that are wrong. It’s possible to understand this formally by thinking of speculation as a bet. We say that profiling is useful if it turns the speculation into a value bet. A value bet is one where the expected value (EV) is positive. That’s another way of saying that the average outcome is profitable, so if we repeated the bet an infinite number of times, we’d be richer. Formally the expected value of a bet is:

p * B - (1 - p) * C

Where p is the probability of winning, B is the benefit of winning, and C is the cost of losing (both B and C are positive). A bet is a value bet iff:

p * B - (1 - p) * C > 0

Let’s view speculation using this formula. The scenario in which we have the choice to make a bet or not is that we are compiling a bytecode instruction, we have some profiling that implies that we should speculate, and we have to choose whether to speculate or not. Let’s say that B and C both have to do with the latency, in nanoseconds, of executing a bytecode instruction once. B is the improvement to that latency if we do some speculation and it turns out to be right. C is the regression to that latency if the speculation we make is wrong. Of course, after we have made a speculation, it will run many times and may be right sometimes and wrong sometimes. But B is just about the speed-up in the right cases, and C is just about the slow-down in the wrong cases. The baseline relative to which B and C are measured is the latency of the bytecode instruction if it was compiled with an optimizing JIT but without that particular OSR-exit-based speculation.

For example, we may have a less-than operation, and we are considering whether to speculate that neither input is double. We can of course compile less-than without making that speculation, so that’s the baseline. If we do choose to speculate, then B is the speed-up to the average execution latency of that bytecode in those cases when neither input is double. Meanwhile, C is the slow-down to the average execution latency of that bytecode in those cases when at least one input is a double.

For B, let’s just compute some bounds. The lower bound is zero, since some speculations are not profitable. A pretty good first order upper bound for B is the difference in per-bytecode-instruction latency between the baseline JIT and the FTL. Usually, the full speed-up of a bytecode instruction between baseline to FTL is the result of multiple speculations as well as nonspeculative compiler optimizations. So, a single speculation being responsible for the full difference in performance between baseline and FTL is a fairly conservative upper bound for B. Previously, we said that on average in the JetStream 2 benchmark on my computer, a bytecode instruction takes 1.71 ns to execute in Baseline and .225 ns to execute in FTL. So we can say:

B <= 1.71 ns - .225 ns = 1.48 ns

Now let’s estimate C. C is how many more nanoseconds it takes to execute the bytecode instruction if we have speculated and we experience speculation failure. Failure means executing an OSR exit stub and then reexecuting the same bytecode instruction in baseline or LLInt. Then, all subsequent bytecodes in the function will execute in baseline or LLInt rather than DFG or FTL. Every 100 exits or so, we jettison and eventually recompile. Compiling is concurrent, but running a concurrent compiler is sure to slow down the main thread even if there is no lock contention. To fully capture C, we have to account for the cost of the OSR exit itself and then amortize the cost of reduced execution speed of the remainder of the function and the cost of eventual recompilation. Fortunately, it’s pretty easy to measure this directly by hacking the DFG frontend to randomly insert pointless OSR exits with low probability and by having JSC report a count of the number of exits. I did an experiment with this hack for every JetStream 2 benchmark. Running without the synthetic exits, we get an execution time and a count of the number of exits. Running with synthetic exits, we get a longer execution time and a larger number of exits. The slope between these two points is an estimate of C. This is what I found, on the same machine that I used for running the experiments to compute B:

[DFG] C = 2499 ns
[FTL] C = 9998 ns

Notice how C is way bigger than B! This isn’t some slight difference. We are talking about three orders of magnitude for the DFG and four orders of magnitude for the FTL. This paints a clear picture: speculation is a bet with tiny benefit and enormous cost.
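The slope-based estimate of C described earlier can be sketched as follows. This is a minimal illustration and not the instrumentation used for the experiment: the Run struct and estimateExitCostNs are hypothetical names, and the sketch assumes each run reports a total execution time and an exit count.

```cpp
#include <cassert>
#include <cstdint>

// One benchmark run: total execution time and the number of OSR exits reported.
struct Run {
    double timeNs;
    std::uint64_t exits;
};

// Estimates the per-exit cost C as the slope between a run without synthetic
// exits and a run with them: extra time divided by extra exits.
double estimateExitCostNs(const Run& normal, const Run& withSyntheticExits)
{
    assert(withSyntheticExits.exits > normal.exits);
    double extraTimeNs = withSyntheticExits.timeNs - normal.timeNs;
    double extraExits = static_cast<double>(withSyntheticExits.exits - normal.exits);
    return extraTimeNs / extraExits;
}
```

For example, with made-up inputs Run{1.0e9, 1000} and Run{1.025e9, 11000}, the estimate is 2500 ns per exit. Those inputs are illustrative only and are not measurements from the experiment.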

For the DFG, this means that we need:

p > 0.9994

For speculation to be a value bet. p has to be even closer to 1 for FTL. Based on this, our philosophy for speculation is we won’t do it unless we think that:

p ~ 1

Since the cost of speculation failure is so enormous, we only want to speculate when we know that we won’t fail. The speed-up of speculation happens because we make lots of sure bets and only a tiny fraction of them ever fail.
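The threshold on p follows from rearranging the value-bet inequality: p * B - (1 - p) * C > 0 is equivalent to p > C / (B + C). The following sketch evaluates both forms with the numbers quoted above; the function names are hypothetical.

```cpp
#include <cassert>

// Expected value, in nanoseconds, of one speculated execution:
// p * B - (1 - p) * C.
double expectedValue(double p, double benefit, double cost)
{
    assert(p >= 0.0 && p <= 1.0);
    return p * benefit - (1.0 - p) * cost;
}

// Smallest success probability at which the bet breaks even.
// Solving p * B - (1 - p) * C > 0 for p gives p > C / (B + C).
double breakEvenProbability(double benefit, double cost)
{
    assert(benefit > 0.0 && cost > 0.0);
    return cost / (benefit + cost);
}
```

With B = 1.48 ns and the DFG's C = 2499 ns, breakEvenProbability returns about 0.99941, which matches the bound above. With the FTL's C = 9998 ns it returns about 0.99985.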

It’s pretty clear what this means for profiling:

  • Profiling needs to focus on noting counterexamples to whatever speculations we want to do. We don’t want to speculate if profiling tells us that the counterexample ever happened, since if it ever happened, then the EV of this speculation is probably negative. This means that we are not interested in collecting probability distributions. We just want to know if the bad thing ever happened.
  • Profiling needs to run for a long time. It’s common to wish for JIT compilers to compile hot functions sooner. One reason why we don’t is that we need about 3-4 “nines” of confidence that the counterexamples didn’t happen. Recall that our threshold for tiering up into the DFG is about 1000 executions. That’s probably not a coincidence.

Finally, since profiling is a bet, it’s important to approach it with a healthy gambler’s philosophy: the fact that a speculation succeeded or failed in a particular program does not tell us if the speculation is good or bad. Speculations are good or bad only based on their average behavior. Focusing too much on whether profiling does a good job for a particular program may result in approaches that cause it to perform badly on average.

Profiling Sources in JavaScriptCore

JavaScriptCore gathers profiling from multiple different sources. These profiling sources use different designs. Sometimes, a profiling source is a unique source of data, but other times, profiling sources are able to provide some redundant data. We only speculate when all profiling sources concur that the speculation would always succeed. The following sections describe our profiling sources in detail.

Case Flags

Case flags are used for branch speculation. This applies anytime the best way to implement a JS operation involves branches and multiple paths, like a math operation having to handle either integers or doubles. The easiest way to profile and speculate is to have the profiling tiers implement both sides of the branch and set a different flag on each side. That way, the optimizing tier knows that it can profitably speculate that only one path is needed if the flags for the other paths are not set. In cases where there is clearly a preferred speculation (for example, speculating that an integer add did not overflow is clearly preferred over speculating that it did overflow), we only need flags on the paths that we don’t like (like the overflow path).

Let’s consider two examples of case flags in more detail: integer overflow and property accesses on non-object values.

Say that we are compiling an add operation that is known to take integers as inputs. Usually the way that the LLInt interpreter or Baseline compiler would “know” this is that the add operation we’ll talk about is actually the part of a larger add implementation after we’ve already checked that the inputs are integers. Here’s the logic that the profiling tier would use written as if it was C++ code to make it easy to parse:

int32_t left = ...;
int32_t right = ...;
ArithProfile* profile = ...; // This is the thing with the case flags.
int32_t intResult;
JSValue result; // This is a tagged JavaScript value that carries type.
if (UNLIKELY(addOverflowed(left, right, &intResult))) {
    result = jsNumber(static_cast<double>(left) +
                      static_cast<double>(right));

    // Set the case flag indicating that overflow happened.
    profile->setObservedInt32Overflow();
} else
    result = jsNumber(intResult);

When optimizing the code, we will inspect the ArithProfile object for this instruction. If !profile->didObserveInt32Overflow(), we will emit something like:

int32_t left = ...;
int32_t right = ...;
int32_t result;
speculate(!addOverflowed(left, right, &result));

I.e. we will add and branch to an exit on overflow. Otherwise we will just emit the double path:

double left = ...;
double right = ...;
double result = left + right;

Unconditionally doing double math is not that expensive; in fact on benchmarks that I’ve tried, it’s cheaper than doing integer math and checking overflow. The only reason why integers are profitable is that they are cheaper to use for bit operations and pointer arithmetic. Since CPUs don’t accept floats or doubles for bit and pointer math, we need to convert the double to an integer first if the JavaScript program uses it that way (pointer math arises when a number is used as an array index). Such conversions are relatively expensive even on CPUs that support them natively. Usually it’s hard to tell, using profiling or any static analysis, whether a number that a program computed will be used for bit or pointer math in the future. Therefore, it’s better to use integer math with overflow checks so that if the number ever flows into an operation that requires integers, we won’t have to pay for expensive conversions. But if we learn that any such operation overflows — even occasionally — we’ve found that it’s more efficient overall to unconditionally switch to double math. Perhaps the presence of overflows is strongly correlated with the result of those operations not being fed into bit math or pointer math.

A simpler example is how case flags are used in property accesses. As we will discuss in the inline caches section, property accesses have associated metadata that we use to track details about their behavior. That metadata also has flags, like the sawNonCell bit, which we set to true if the property access ever sees a non-object as the base. If the flag is set, the optimizing compilers know not to speculate that the property access will see objects. This typically forces all kinds of conservatism for that property access, but that’s better than speculating wrong and exiting in this case. Lots of case flags look like sawNonCell: they are casually added as a bit in some existing data structure to help the optimizing compiler know which paths were taken.

To summarize, case flags are used to record counterexamples to the speculations that we want to do. They are a natural way to implement profiling in those cases where the profiling tiers would have had to branch anyway.
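The overflow example can be turned into a small self-contained sketch. The types here are hypothetical simplifications: ArithProfileSketch stands in for ArithProfile, a plain double stands in for the tagged JSValue, and chooseAddStrategy stands in for the decision the optimizing compiler makes when it inspects the profile.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for ArithProfile: a single case flag.
struct ArithProfileSketch {
    bool sawInt32Overflow = false;
};

// Adds in 64 bits so that overflow of the 32-bit result is detected portably.
bool addOverflowed(std::int32_t left, std::int32_t right, std::int32_t* result)
{
    assert(result);
    std::int64_t wide = static_cast<std::int64_t>(left) + right;
    if (wide < INT32_MIN || wide > INT32_MAX)
        return true;
    *result = static_cast<std::int32_t>(wide);
    return false;
}

// Profiling tier: implement both paths and flag the one we would rather not compile.
double profiledAdd(std::int32_t left, std::int32_t right, ArithProfileSketch& profile)
{
    std::int32_t intResult;
    if (addOverflowed(left, right, &intResult)) {
        profile.sawInt32Overflow = true;
        return static_cast<double>(left) + static_cast<double>(right);
    }
    return intResult;
}

enum class AddStrategy { SpeculateInt32, UnconditionalDouble };

// Optimizing tier: speculate only if the counterexample was never observed.
AddStrategy chooseAddStrategy(const ArithProfileSketch& profile)
{
    return profile.sawInt32Overflow ? AddStrategy::UnconditionalDouble : AddStrategy::SpeculateInt32;
}
```

Note that the flag is sticky: once a single overflow has been recorded, later non-overflowing adds do not restore the integer speculation.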

Case Counts

A predecessor to case flags in JavaScriptCore is case counts. It’s the same idea as flags, but instead of just setting a bit to indicate that a bad thing happened, we would count. If the count never got above some threshold, we would speculate.

Case counts were written before we realized that the EV of speculation is awful unless the probability of success is basically 1. We thought that we could speculate in cases where we knew we’d be right a majority of the time, for example. Initial versions of case counts had variable thresholds — we would compute a ratio with the execution count to get a case rate. That didn’t work as well as fixed thresholds, so we switched to a fixed count threshold of 100. Over time, we lowered the threshold to 20 or 10, and then eventually found that the threshold should really be 1, at which point we switched to case flags.

Some functionality still uses case counts. We still have case counts for determining if the this argument is exotic (some values of this require the function to perform a possibly-effectful conversion in the prologue). We still have case counts as a backup for math operations overflowing, though that is almost certainly redundant with our case flags for math overflow. It’s likely that we will remove case counts from JavaScriptCore eventually.
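For comparison with case flags, a case count can be sketched as below. CaseCountSketch is a hypothetical type, and the sketch assumes that speculation is allowed while the count is strictly below the threshold, so that a threshold of 1 behaves like a flag.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical case counter: counts how often the unwanted path was taken.
struct CaseCountSketch {
    std::uint32_t count = 0;

    // Saturating increment so that the counter never wraps back to zero.
    void recordBadCase()
    {
        if (count < UINT32_MAX)
            ++count;
    }

    // Speculate only while the count is below the threshold.
    bool shouldSpeculate(std::uint32_t threshold) const
    {
        assert(threshold > 0);
        return count < threshold;
    }
};
```

With a threshold of 1, shouldSpeculate returns false after the first recorded counterexample, which is the same information a single case flag carries.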

Value Profiling

Value profiling is all about inferring the types of JavaScript values (JSValues). Since JS is a dynamic language, JSValues have a runtime type. We use a 64-bit JSValue representation that uses bit encoding tricks to hold either doubles, integers, booleans, null, undefined, or pointers to cell, which may be JavaScript objects, symbols, or strings. We refer to the act of encoding a value in a JSValue as boxing it and the act of decoding as unboxing (note that boxing is a term used in other engines to refer specifically to the act of allocating a box object in the heap to hold a value; our use of the term boxing is more like what others call tagging). In order to effectively optimize JavaScript, we need to have some way of inferring the type so that the compiler can assume things about it statically. Value profiling tracks the set of values that a particular program point saw so that we can predict what types that program point will see in the future.

Figure 12. Value profiling and prediction propagation for a sample data flow graph.

We combine value profiling with a static analysis called prediction propagation. The key insight is that prediction propagation can infer good guesses for the types for most operations if it is given a starting point for certain opaque operations:

  • Arguments incoming to the function.
  • Results of most load operations.
  • Results of most calls.

There’s no way that a static analysis running just on some function could guess what types loads from plain JavaScript arrays or calls to plain JavaScript functions could have. Value profiling is about trying to help the static analysis guess the types of those opaque operations. Figure 12 shows how this plays out for a sample data flow graph. There’s no way static analysis can tell the type of most GetByVal and GetById operations, since those are loads from dynamically typed locations in the heap. But if we did know what those operations return then we can infer types for this entire graph by using simple type rules for Add (like that if it takes integers as inputs and the case flags tell us there was no overflow then it will produce integers).

Let’s break down value profiling into the details of how exactly values are profiled, how prediction propagation works, and how the results of prediction propagation are used.

Recording value profiles. At its core, value profiling is all about having some program point (either a point in the interpreter or something emitted by the Baseline JIT) log the value that it saw. We log values into a single bucket so that each time the profiling point runs, it overwrites the last seen value. The code looks like this in the LLInt:

macro valueProfile(op, metadata, value)
    storeq value, %op%::Metadata::profile.m_buckets[metadata]
end

Let’s look at how value profiling works for the get_by_val bytecode instruction. Here’s part of the code for get_by_val in LLInt:

llintOpWithMetadata(
    op_get_by_val, OpGetByVal,
    macro (size, get, dispatch, metadata, return)
        macro finishGetByVal(result, scratch)
            get(dst, scratch)
            storeq result, [cfr, scratch, 8]
            valueProfile(OpGetByVal, t5, result)
            dispatch()
        end

        ... // more code for get_by_val

The implementation of get_by_val includes a finishGetByVal helper macro that stores the result in the right place on the stack and then dispatches to the next instruction. Note that it also calls valueProfile to log the result just before finishing.

Each ValueProfile object has a pair of buckets and a predicted type. One bucket is for normal execution. The valueProfile macro in the LLInt uses this bucket. The other bucket is for OSR exit: if we exit due to a speculation on a type that we got from value profiling, we feed the value that caused OSR exit back into the second bucket of the ValueProfile.

Each time that our execution counters (used for controlling when to invoke the next tier) count about 1000 points, the execution counting slow path updates all predicted types for the value profiles in that function. Updating value profiles means computing a predicted type for the value in the bucket and merging that type with the previously predicted type. Therefore, after repeated predicted type updates, the type will be broad enough to be valid for multiple different values that the code saw.
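The bucket-and-merge scheme can be sketched in a few lines. Everything here is a hypothetical simplification: the three-bit type set stands in for the real predicted types, Value stands in for a boxed JSValue, and clearing the buckets after each update is an assumption of this sketch rather than something stated above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, much-reduced type set: one bit per kind of value.
using PredictedType = std::uint32_t;
constexpr PredictedType TypeNone = 0;
constexpr PredictedType TypeInt32 = 1u << 0;
constexpr PredictedType TypeDouble = 1u << 1;
constexpr PredictedType TypeOther = 1u << 2;

// Hypothetical stand-in for a boxed JSValue. TypeNone marks an empty bucket.
struct Value {
    PredictedType type;
    double payload;
};

struct ValueProfileSketch {
    Value normalBucket{TypeNone, 0.0};  // Overwritten on every profiled execution.
    Value osrExitBucket{TypeNone, 0.0}; // Fed by OSR exits caused by a bad type speculation.
    PredictedType predicted = TypeNone;

    void record(const Value& value)
    {
        assert(value.type != TypeNone);
        normalBucket = value;
    }

    void recordFromOSRExit(const Value& value)
    {
        assert(value.type != TypeNone);
        osrExitBucket = value;
    }

    // Run from the execution-counting slow path: merge the types currently in
    // the buckets into the prediction, then empty the buckets (an assumption).
    void updatePrediction()
    {
        predicted |= normalBucket.type | osrExitBucket.type;
        normalBucket = Value{TypeNone, 0.0};
        osrExitBucket = Value{TypeNone, 0.0};
    }
};
```

Because each update only ever adds bits, the prediction widens over time and never narrows, which is how it becomes valid for all of the different values that were sampled.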

Predicted types use the SpeculatedType type system. A SpeculatedType is a 64-bit integer in which we use the low 40 bits to represent a set of 40 fundamental types. The fundamental types, shown in Figure 13, represent non-overlapping sets of possible JSValues. 2^40 SpeculatedTypes are possible by setting any combination of bits.

Figure 13. All of the fundamental SpeculatedTypes.

This allows us to invent whatever types are useful for optimization. For example, we distinguish between 32-bit integers whose value is either 0 or 1 (BoolInt32) versus whose value is anything else (NonBoolInt32). Together these form the Int32Only type, which just has both bits set. BoolInt32 is useful for cases where integers are converted to booleans.
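A bit-set type system of this kind is easy to sketch. The bit positions below are illustrative and are not JavaScriptCore's actual assignments, and the helper functions typeOfInt32, mergeTypes, and isSubtype are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// A type is a set of fundamental types, one bit each.
using SpeculatedTypeSketch = std::uint64_t;

// Illustrative bit positions only.
constexpr SpeculatedTypeSketch BoolInt32 = 1ull << 0;    // Int32 whose value is 0 or 1.
constexpr SpeculatedTypeSketch NonBoolInt32 = 1ull << 1; // Any other int32.
constexpr SpeculatedTypeSketch Int32Only = BoolInt32 | NonBoolInt32;

// Fundamental type of a concrete int32 value.
constexpr SpeculatedTypeSketch typeOfInt32(std::int32_t value)
{
    return (value == 0 || value == 1) ? BoolInt32 : NonBoolInt32;
}

// Merging two predictions is set union.
constexpr SpeculatedTypeSketch mergeTypes(SpeculatedTypeSketch a, SpeculatedTypeSketch b)
{
    return a | b;
}

// True if every fundamental type in 'type' is also in 'allowed'.
constexpr bool isSubtype(SpeculatedTypeSketch type, SpeculatedTypeSketch allowed)
{
    return (type & ~allowed) == 0;
}
```

Since the fundamental types do not overlap, union and subset tests reduce to a bitwise OR and a mask, which keeps merging predictions cheap.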

Prediction propagation. We use value profiling to fill in the blanks for the prediction propagation pass of the DFG compiler pipeline. Prediction propagation is an abstract interpreter that tracks the set of types that each variable in the program can have. It’s unsound since the types it produces are just predictions (it can produce any combination of types and at worst we will just OSR exit too much). However, it can be said that we optimize it to be sound; the more sound it is, the fewer OSR exits we have. Prediction propagation fills in the things that the abstract interpreter can’t reason about (loads from the heap, results returned by calls, arguments to the function, etc.) using the results of value profiling. On the topic of soundness, we would consider it to be a bug if the prediction propagation was unsound in a world where value profiling is never wrong. Of course, in reality, we know that value profiling will be wrong, so we know that prediction propagation is unsound.

Let’s consider some of the cases where prediction propagation can be sure about the result type of an operation based on the types of its inputs.

Figure 14. Some of the prediction propagation rules for Add. This figure doesn’t show the rules for string concatenation and objects.

Figure 15. Some of the prediction propagation rules for GetByVal (the DFG opcode for subscript access like array[index]). This figure only shows a small sample of the GetByVal rules.

Figure 14 shows some of the rules for the Add operation in DFG IR. Prediction propagation and case flags tell us everything we want to know about the output of Add. If the inputs are integers and the overflow flag isn’t set, the output is an integer. If the inputs are any other kinds of numbers or there are overflows, the output is a double. We don’t need anything else (like value profiling) to understand the output type of Add.

Figure 15 shows some of the rules for GetByVal, which is the DFG representation of array[index]. In this case, there are types of arrays that could hold any type of value. So, even knowing that it is a JSArray isn’t enough to know the types of values inside the array. Also, if the index is a string, then this could be accessing some named property on the array object or one of its prototypes and those could have any type. It’s in cases like GetByVal that we leverage value profiling to guess what the result type is.
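The contrast between the two opcodes can be sketched as a pair of transfer functions. This covers only the numeric Add rules described above and the GetByVal case that falls back to value profiling. The Prediction type and function names are hypothetical, and the real rules shown in the figures are more extensive.

```cpp
#include <cassert>
#include <cstdint>

using Prediction = std::uint32_t;
constexpr Prediction PredNone = 0;
constexpr Prediction PredInt32 = 1u << 0;
constexpr Prediction PredDouble = 1u << 1;
constexpr Prediction PredNumber = PredInt32 | PredDouble;

// True if the prediction is non-empty and contains only number types.
constexpr bool isNumberOnly(Prediction p)
{
    return p != PredNone && (p & ~PredNumber) == 0;
}

// Add: the output type follows from the input types plus the overflow case
// flag. Non-numeric inputs are outside this sketch and yield PredNone.
Prediction predictAdd(Prediction left, Prediction right, bool sawOverflow)
{
    if (!isNumberOnly(left) || !isNumberOnly(right))
        return PredNone;
    if (left == PredInt32 && right == PredInt32 && !sawOverflow)
        return PredInt32;
    return PredDouble;
}

// GetByVal: the input types do not determine the result type in the general
// case, so take the prediction recorded by value profiling for that
// instruction.
Prediction predictGetByVal(Prediction valueProfilePrediction)
{
    return valueProfilePrediction;
}
```

Chaining the two, predictAdd(predictGetByVal(PredInt32), PredInt32, false) yields PredInt32, so a single value profile on the load is enough to type the Add that consumes it.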

Prediction propagation combined with value profiling allows the DFG to infer a predicted type at every point in the program where a variable is used. This allows operations that don’t do any profiling on their own to still perform type-based speculations. It’s of course possible to also have bytecode instructions that can speculate on type collect case flags (or use some other mechanism) to drive those speculations — and that approach can be more precise — but value profiling means that we don’t have to do this for every operation that wants type-based speculation.

Using predicted types. Consider the CompareEq operation in DFG IR, which is used for the DFG lowering of the eq, eq_null, neq, neq_null, jeq, jeq_null, jneq, and jneq_null bytecodes. These bytecodes do no profiling of their own. But CompareEq is one of the most aggressive type speculators in all of the DFG. CompareEq can speculate on the types it sees without doing any profiling of its own because the values it uses will either have value profiling or will have a predicted type filled in by prediction propagation.

Type speculations in the DFG are written like:

CompareEq(Int32:@left, Int32:@right)

This example means that the CompareEq will speculate that both operands are Int32. CompareEq supports the following speculations, plus others we don’t list here:

CompareEq(Boolean:@left, Boolean:@right)
CompareEq(Int32:@left, Int32:@right)
CompareEq(Int32:BooleanToNumber(Boolean:@left), Int32:@right)
CompareEq(Int32:BooleanToNumber(Untyped:@left), Int32:@right)
CompareEq(Int32:@left, Int32:BooleanToNumber(Boolean:@right))
CompareEq(Int32:@left, Int32:BooleanToNumber(Untyped:@right))
CompareEq(Int52Rep:@left, Int52Rep:@right)
CompareEq(DoubleRep:DoubleRep(Int52:@left), DoubleRep:DoubleRep(Int52:@right))
CompareEq(DoubleRep:DoubleRep(Int52:@left), DoubleRep:DoubleRep(RealNumber:@right))
CompareEq(DoubleRep:DoubleRep(Int52:@left), DoubleRep:DoubleRep(Number:@right))
CompareEq(DoubleRep:DoubleRep(Int52:@left), DoubleRep:DoubleRep(NotCell:@right))
CompareEq(DoubleRep:DoubleRep(RealNumber:@left), DoubleRep:DoubleRep(RealNumber:@right))
CompareEq(DoubleRep:..., DoubleRep:...)
CompareEq(StringIdent:@left, StringIdent:@right)
CompareEq(String:@left, String:@right)
CompareEq(Symbol:@left, Symbol:@right)
CompareEq(Object:@left, Object:@right)
CompareEq(Other:@left, Untyped:@right)
CompareEq(Untyped:@left, Other:@right)
CompareEq(Object:@left, ObjectOrOther:@right)
CompareEq(ObjectOrOther:@left, Object:@right)
CompareEq(Untyped:@left, Untyped:@right)

Some of these speculations, like CompareEq(Int32:, Int32:) or CompareEq(Object:, Object:), allow the compiler to just emit an integer compare instruction. Others, like CompareEq(String:, String:), emit a string compare loop. We have lots of variants to optimally handle bizarre comparisons that are not only possible in JS but that we have seen happen frequently in the wild, like comparisons between numbers and booleans and comparisons between one value that is always a number and another that is either a number or a boolean. We provide additional optimizations for comparisons between doubles, comparisons between strings that have been hash-consed (so-called StringIdent, which can be compared using comparison of the string pointer), and comparisons where we don’t know how to speculate (CompareEq(Untyped:, Untyped:)).

The basic idea of value profiling — storing a last-seen value into a bucket and then using that to bootstrap a static analysis — is something that we also use for profiling the behavior of array accesses. Array profiles and array allocation profiles are like value profiles in that they save the last result in a bucket. Like value profiling, data from those profiles is incorporated into prediction propagation.

To summarize, value profiling allows us to predict the types of variables at all of their use sites by just collecting profiling at those bytecode instructions whose output cannot be predicted with abstract interpretation. This serves as the foundation for how the DFG (and FTL, since it reuses the DFG’s frontend) speculates on the types of JSValues.
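
The bucket idea can be illustrated with a minimal JavaScript sketch. This is a toy model, not JavaScriptCore’s code: the ValueProfile class, the typeOf helper, and the four type bits are all hypothetical names chosen for the example, and a real engine tracks many more type cases.

```javascript
// Hypothetical type bits for this sketch only.
const SpecNone = 0, SpecInt32 = 1, SpecDouble = 2, SpecString = 4, SpecOther = 8;

function typeOf(value) {
    if (typeof value === "number") {
        const isInt32 = (value | 0) === value && !Object.is(value, -0);
        return isInt32 ? SpecInt32 : SpecDouble;
    }
    if (typeof value === "string")
        return SpecString;
    return SpecOther;
}

class ValueProfile {
    constructor() {
        this.hasBucket = false;     // has a value been stored since the last merge?
        this.bucket = undefined;    // last-seen value, overwritten on every execution
        this.prediction = SpecNone; // union of the types found at past merges
    }
    // Called by the profiled instruction: a single store, no analysis.
    record(value) {
        this.bucket = value;
        this.hasBucket = true;
    }
    // Called occasionally (for example before compiling): fold the bucket
    // into the running prediction and clear it.
    merge() {
        if (this.hasBucket) {
            this.prediction |= typeOf(this.bucket);
            this.hasBucket = false;
        }
        return this.prediction;
    }
}

const profile = new ValueProfile();
for (const v of [1, 2, 3])
    profile.record(v);
const afterInts = profile.merge();   // only the last value (3) is inspected
profile.record(0.5);
const afterDouble = profile.merge(); // now Int32 or Double
```

The instrumented instruction pays for one store per execution, and all of the type reasoning happens later, when the bucket is merged.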

Inline Caches

Property accesses and function calls are particularly difficult parts of JavaScript to optimize:

  • Objects behave as if they were just ordered mappings from strings to JSValues. Lookup, insertion, deletion, replacement, and iteration are possible. Programs do these operations a lot, so they have to be fast. In some cases, programs use objects the same way that programs in other languages would use hashtables. In other cases, programs use objects the same way that they would in Java or some sensibly-typed object-oriented language. Most programs do both.
  • Function calls are polymorphic. You can’t make static promises about what function will be called.

Both of these dynamic features are amenable to optimization with Deutsch and Schiffman’s inline caches (ICs). For dynamic property access, we combine this with structures, based on the idea of maps in Chambers, Ungar, and Lee’s Self implementation. We also follow Hölzle, Chambers, and Ungar: our inline caches are polymorphic and we use data from these caches as profiling of the types observed at a particular property access or call site.

It’s worth dwelling a bit on the power of inline caching. Inline caches are great optimizations separately from speculative compilation. They make the LLInt and Baseline run faster. Inline caches are our most powerful profiling source, since they can precisely collect information about every type encountered by an access or call. Note that we previously said that good profiling has to be cheap. We think of inline caches as negative cost profiling since inline caches make the LLInt and Baseline faster. It doesn’t get cheaper than that!

This section focuses on inline caching for dynamic property access, since it’s strictly more complex than for calls (accesses use structures, polymorphic inline caches (PICs), and speculative compilation; calls only use polymorphic inline caches and speculative compilation). We organize our discussion of inline caching for dynamic property access as follows. First we describe how structures work. Then we show the JavaScriptCore object model and how it incorporates structures. Next we show how inline caches work. Then we show how profiling from inline caches is used by the optimizing compilers. After that we show how inline caches support polymorphism and polyvariance. Finally we talk about how inline caches are integrated with the garbage collector.

Structures. Objects in JavaScript are just mappings from strings to JSValues. Lookup, insertion, deletion, replacement, and iteration are all possible. We want to optimize those uses of objects that would have had a type if the language had given the programmer a way to say it.

Figure 16. Some JavaScript objects that have x and y properties. Some of them have exactly the same shape (only x and y in the same order).

Consider how to implement a property access like:

var tmp = o.x;

Or a property put like:

o.x = tmp;

One way to make this fast is to use hashtables. That’s certainly a necessary fallback mechanism when the JavaScript program uses objects more like hashtables than like objects (i.e. it frequently inserts and deletes properties). But we can do better.

This problem frequently arises in dynamic programming languages and it has a well-understood solution. The key insight of Chambers, Ungar, and Lee’s Self implementation is that property access sites in the program will typically only see objects of the same shape. Consider the objects in Figure 16 that have x and y properties. Of course it’s possible to insert x and y in two possible orders, but folks will tend to pick some order and stick to it (like x first). And of course it’s possible to also have objects that have a z property, but it’s less likely that a property access written in the part of the program that works with {x, y} objects will be reused for the part that uses {x, y, z}. It’s possible to have shared code for many different kinds of objects but unshared code is more common. Therefore, we split the object representation into two parts:

  • The object itself, which only contains the property values and a structure pointer.
  • The structure, which is a hashtable that maps property names (strings) to indices in the objects that have that structure.

Figure 17. The same objects as in Figure 16, but using structures.

Figure 17 shows objects represented using structures. Objects only contain object property values and a pointer to a structure. The structure tells the property names and their order. For example, if we wanted to ask the {1, 2} object in Figure 17 for the value of property x, we would load the pointer to its structure, {x, y}, and ask that structure for the index of x. The index is 0, and the value at index 0 in the {1, 2} object is 1.

A key feature of structures is that they are hash consed. If two objects have the same properties in the same order, they are likely to have the same structure. This means that checking if an object has a certain structure is O(1): just load the structure pointer from the object header and compare the pointer to a known value.

Structures can also indicate that objects are in dictionary or uncacheable dictionary mode, which are basically two levels of hashtable badness. In both cases, the structure stops being hash consed and is instead paired 1:1 with its object. Dictionary objects can have new properties added to them without the structure changing (the property is added to the structure in-place). Uncacheable dictionary objects can have properties deleted from them without the structure changing. We won’t go into these modes in too much detail in this post.

To summarize, structures are hashtables that map property names to indices in the object. Object property lookup uses the object’s structure to find the index of the property. Structures are hash consed to allow for fast structure checks.
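
As a rough illustration of the split between objects and structures, here is a small JavaScript model. It is a sketch under assumptions, not JavaScriptCore’s implementation: the Structure and ModelObject classes are hypothetical, and sharing is achieved with a per-structure transition table, which is just one simple way to get the hash-consing effect described above.

```javascript
// Hypothetical model: a Structure maps property names to indices and is
// shared through a transition table keyed by the name of the added property.
class Structure {
    constructor(indices = new Map()) {
        this.indices = indices;        // property name -> index in object storage
        this.transitions = new Map();  // property name -> Structure reached by adding it
    }
    // Returns the unique structure reached by adding `name` to this one.
    withProperty(name) {
        let next = this.transitions.get(name);
        if (!next) {
            const indices = new Map(this.indices);
            indices.set(name, indices.size);
            next = new Structure(indices);
            this.transitions.set(name, next);
        }
        return next;
    }
}

const emptyStructure = new Structure();

class ModelObject {
    constructor() {
        this.structure = emptyStructure;
        this.storage = [];             // property values only, no names
    }
    put(name, value) {
        let index = this.structure.indices.get(name);
        if (index === undefined) {
            this.structure = this.structure.withProperty(name);
            index = this.structure.indices.get(name);
        }
        this.storage[index] = value;
    }
    get(name) {
        const index = this.structure.indices.get(name);
        return index === undefined ? undefined : this.storage[index];
    }
}

function makePoint(x, y) {
    const o = new ModelObject();
    o.put("x", x);
    o.put("y", y);
    return o;
}

const p1 = makePoint(1, 2);
const p2 = makePoint(3, 4);
const flipped = new ModelObject();     // same names, inserted in the other order
flipped.put("y", 5);
flipped.put("x", 6);
```

In this model p1 and p2 end up pointing at the same Structure, so a structure check is a single pointer comparison, while flipped gets a different structure because its properties were added in a different order.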

Figure 18. The JavaScriptCore object model.

JavaScriptCore object model. JavaScriptCore uses objects with a 64-bit header that includes a 32-bit structure ID and 32 bits worth of extra state for GC, type checks, and arrays. Figure 18 shows the object model. Named object properties may end up either in the inline slots or the out-of-line slots. Objects get some number of inline slots based on simple static analysis around the allocation site. If a property is added that doesn’t fit in the inline slots, we allocate a butterfly to store additional properties out-of-line. Accessing out-of-line properties in the butterfly costs one extra load.

Figure 19 shows an example object that only has two inline properties. This is the kind of object you would get if you used the object literal {f:5, g:6} or if you assigned to the f and g properties reasonably close to the allocation.

Figure 19. Example JavaScriptCore object together with its structure.

Simple inline caches. Let’s consider the code:

var v = o.f;

Let’s assume that all of the objects that flow into this have structure 42 like the object in Figure 19. Inline caching this property access is all about emitting code like the following:

if (o->structureID == 42)
    v = o->inlineStorage[0]
else
    v = slowGet(o, "f")

But how do we know that o will have structure 42? JavaScript does not give us this information statically. Inline caches get this information by filling it in once the code runs. There are a number of techniques for this, all of which come down to self-modifying code. Let’s look at how the LLInt and Baseline do it.

In the LLInt, the metadata for get_by_id contains a cached structure ID and a cached offset. The cached structure ID is initialized to an absurd value that no structure can have. The fast path of get_by_id loads the property at the cached offset if the object has the cached structure. Otherwise, we take a slow path that does the full lookup. If that full lookup is cacheable, it stores the structure ID and offset in the metadata.
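
The metadata-based scheme can be modeled in a few lines of JavaScript. This is only a sketch of the idea: the metadata layout, the getById function, the slowPathCount counter, and the representation of objects as { structureID, storage } are assumptions made for the example.

```javascript
// An ID that no structure in this sketch can have.
const INVALID_STRUCTURE_ID = -1;

function makeGetByIdMetadata() {
    return { cachedStructureID: INVALID_STRUCTURE_ID, cachedOffset: 0, slowPathCount: 0 };
}

// Objects are { structureID, storage }. `structures` maps a structure ID to
// a Map from property name to offset.
function getById(metadata, structures, object, name) {
    // Fast path: one compare and one load.
    if (object.structureID === metadata.cachedStructureID)
        return object.storage[metadata.cachedOffset];

    // Slow path: full lookup, then fill in the cache if the property was found.
    metadata.slowPathCount++;
    const offset = structures.get(object.structureID).get(name);
    if (offset === undefined)
        return undefined;
    metadata.cachedStructureID = object.structureID;
    metadata.cachedOffset = offset;
    return object.storage[offset];
}

const structures = new Map([[42, new Map([["f", 0], ["g", 1]])]]);
const metadata = makeGetByIdMetadata();
const o1 = { structureID: 42, storage: [5, 6] };
const o2 = { structureID: 42, storage: [7, 8] };
const first = getById(metadata, structures, o1, "f");   // misses, then caches
const second = getById(metadata, structures, o2, "f");  // hits the fast path
```

After the first access, every object with structure 42 takes the fast path, and the slow path has run only once.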

The Baseline JIT does something more sophisticated. When emitting a get_by_id, it reserves a slab of machine code space that the inline caches will later fill in with real code. The only code in this slab initially is an unconditional jump to a slow path. The slow path does the fully dynamic lookup. If that is deemed cacheable, the reserved slab is replaced with code that does the right structure check and loads at the right offset. Here’s an example of a get_by_id initially compiled with Baseline:

0x46f8c30b9b0: mov 0x30(%rbp), %rax
0x46f8c30b9b4: test %rax, %r15
0x46f8c30b9b7: jnz 0x46f8c30ba2c
0x46f8c30b9bd: jmp 0x46f8c30ba2c
0x46f8c30b9c2: o16 nop %cs:0x200(%rax,%rax)
0x46f8c30b9d1: nop (%rax)
0x46f8c30b9d4: mov %rax, -0x38(%rbp)

The first thing that this code does is check that o (stored in %rax) is really an object (using a test and jnz). Then notice the unconditional jmp followed by two long nop instructions. This jump goes to the same slow path that we would have branched to if o was not an object. After the slow path runs, this is repatched to:

0x46f8c30b9b0: mov 0x30(%rbp), %rax
0x46f8c30b9b4: test %rax, %r15
0x46f8c30b9b7: jnz 0x46f8c30ba2c
0x46f8c30b9bd: cmp $0x125, (%rax)
0x46f8c30b9c3: jnz 0x46f8c30ba2c
0x46f8c30b9c9: mov 0x18(%rax), %rax
0x46f8c30b9cd: nop 0x200(%rax)
0x46f8c30b9d4: mov %rax, -0x38(%rbp)

Now, the is-object check is followed by a structure check (using cmp to check that the structure is 0x125) and a load at offset 0x18.
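
JavaScript cannot patch machine code, but the repatching pattern can be imitated by replacing a function. The sketch below is an analogy under stated assumptions: makeGetByIdSite, the lookup callback, and the layouts table are hypothetical, and the reserved slab is modeled as a mutable code field.

```javascript
// Objects are { structureID, storage }. `lookup(object)` stands in for the
// fully dynamic lookup and returns the property's offset or undefined.
function makeGetByIdSite(lookup) {
    const site = { repatchCount: 0, code: null };
    const slowPath = (object) => {
        const offset = lookup(object);
        if (offset === undefined)
            return undefined;                 // treated as not cacheable here
        const structureID = object.structureID;
        // "Repatch" the slab: a structure check, then a load at a known offset.
        site.code = (o) => o.structureID === structureID ? o.storage[offset] : slowPath(o);
        site.repatchCount++;
        return object.storage[offset];
    };
    site.code = slowPath;                     // initially just a jump to the slow path
    return site;
}

const layouts = new Map([[0x125, new Map([["f", 1]])]]);
const site = makeGetByIdSite((o) => layouts.get(o.structureID)?.get("f"));
const a = { structureID: 0x125, storage: ["unused", "first"] };
const b = { structureID: 0x125, storage: ["unused", "second"] };
const r1 = site.code(a);   // runs the slow path and repatches
const r2 = site.code(b);   // runs the repatched fast path
```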

Inline caches as a profiling source. The metadata we use to maintain inline caches makes for a fantastic profiling source. Let’s look closely at what this means.

Figure 20. Timeline of using an inline cache at each JIT tier. Note that we end up having to generate code for this get_by_id six times in the best case that each tier compiles this only once.

Figure 20 shows a naive use of inline caches in a multi-tier engine, where the DFG JIT forgets everything that we learned from the Baseline inline cache and just compiles a blank inline cache. This is reasonably efficient and we fall back on this approach when the inline caches from the LLInt and Baseline tell us that there is unmanageable polymorphism. Before we go into how polymorphism is profiled, let’s look at how a speculative compiler really wants to handle simple monomorphic inline caches like the one in Figure 20, where we only see one structure (S1) and the code that the IC emits is trivial (load at offset 10 from %rax).

When the DFG frontend (shared by DFG and FTL) sees an operation like get_by_id that can be implemented with ICs, it reads the state of all ICs generated for that get_by_id. By “all ICs” we mean all ICs that are currently in the heap. This usually just means reading the LLInt and Baseline ICs, but if there exists a DFG or FTL function that generated an IC for this get_by_id then we will also read that IC. This can happen if a function gets compiled multiple times due to inlining — we may be compiling function bar that inlines a call to function foo and foo already got compiled with FTL and the FTL emitted an IC for our get_by_id.

If all ICs for a get_by_id concur that the operation is monomorphic and they tell us the structure to use, then the DFG frontend converts the get_by_id into inline code that does not get repatched. This is shown in Figure 21. Note that this simple get_by_id is lowered to two DFG operations: CheckStructure, which OSR exits if the given object does not have the required structure, and GetByOffset, which is just a load with known offset and field name.

Figure 21. Inlining a simple monomorphic inline cache in DFG and FTL.

CheckStructure and GetByOffset are understood precisely in DFG IR:

  • CheckStructure is a load to get the structure ID of an object and a branch to compare that structure ID to a constant. The compiler knows what structures are. After a CheckStructure, the compiler knows that it’s safe to execute loads to any of the properties that the structure says that the object has.
  • GetByOffset is a load from either an inline or out-of-line property of a JavaScript object. The compiler knows what kind of property is being loaded, what its offset is, and what the name of the property would have been.

The DFG knows all about how to model these operations and the dependency between them:

  • The DFG knows that neither operation causes a side effect, but that the CheckStructure represents a conditional side exit, and both operations read the heap.
  • The DFG knows that two CheckStructures on the same structure are redundant unless some operation between them could have changed object structure. The DFG knows a lot about how to optimize away redundant structure checks, even in cases where there is a function call between two of them (more on this later).
  • The DFG knows that two GetByOffsets that speak of the same property and object are loading from the same memory location. The DFG knows how to do alias analysis on those properties, so it can precisely know when a GetByOffset’s memory location got clobbered.
  • The DFG knows that if it wants to hoist a GetByOffset then it has to ensure that the corresponding CheckStructure gets hoisted first. It does this using abstract interpretation, so there is no need to have a dependency edge between these operations.
  • The DFG knows how to generate either machine code (in the DFG tier) or B3 IR (in the FTL tier) for CheckStructure and GetByOffset. In B3, CheckStructure becomes a Load, NotEqual, and Check, while GetByOffset usually just becomes a Load.

Figure 22. Inlining two monomorphic inline caches, for different properties on the same object, in DFG and FTL. The DFG and FTL are able to eliminate the CheckStructure for the second IC.

The biggest upshot of lowering ICs to CheckStructure and GetByOffset is the redundancy elimination. The most common redundancy we eliminate is multiple CheckStructures. Lots of code will do multiple loads from the same object, like:

var f = o.f;
var g = o.g;

With ICs, we would check the structure twice. Figure 22 shows what happens when the speculative compilers inline these ICs. We are left with just a single CheckStructure instead of two thanks to the fact that:

  • CheckStructure is an OSR speculation.
  • CheckStructure is not an IC. The compiler knows exactly what it does, so that it can model it, so that it can eliminate it.
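
The effect on the o.f and o.g example can be sketched as a toy pass over a straight-line list of nodes. This is a deliberately simplified illustration, not the DFG’s algorithm (which works by abstract interpretation over a control flow graph): the node format, the eliminateRedundantChecks function, and the treatment of a Call node as clobbering every structure are assumptions of the sketch.

```javascript
// Hypothetical node format: { op: "CheckStructure", base, structure },
// { op: "GetByOffset", base, offset }, and { op: "Call" } for an operation
// that may change the structure of any object.
function eliminateRedundantChecks(nodes) {
    const proven = new Map();   // base -> structure known to hold at this point
    const result = [];
    for (const node of nodes) {
        if (node.op === "CheckStructure") {
            if (proven.get(node.base) === node.structure)
                continue;       // already proven, drop the check
            proven.set(node.base, node.structure);
        } else if (node.op === "Call") {
            proven.clear();     // conservatively forget everything
        }
        result.push(node);
    }
    return result;
}

const before = [
    { op: "CheckStructure", base: "@o", structure: "S1" },
    { op: "GetByOffset", base: "@o", offset: 0 },   // o.f
    { op: "CheckStructure", base: "@o", structure: "S1" },
    { op: "GetByOffset", base: "@o", offset: 1 },   // o.g
];
const after = eliminateRedundantChecks(before);
```

Because the first check either passes or exits, the second check on the same object and structure cannot fail and is dropped. A Call between the two checks would keep both in this sketch.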

Let’s pause to appreciate what this technique gives us so far. We started out with a language in which property accesses seem to need hashtable lookups. An o.f operation requires calling some procedure that is doing hashing and so forth. But by combining inline caches, structures, and speculative compilation we have landed on something where some o.f operations are nothing more than load-at-offset like they would have been in C++ or Java. But this assumes that the o.f operation was monomorphic. The rest of this section considers minimorphism, polymorphism, and polyvariance.

Minimorphism. Certain kinds of polymorphic accesses are easier to handle than others. Sometimes an access will see two or more structures but all of those structures have the property at the same offset. Other times an access will see multiple structures and those structures do not agree on the offset of the property. We say that an access is minimorphic if it sees more than one structure and all structures agree on the offset of the property.

Our inline caches handle all forms of polymorphism by generating a stub that switches on the structure. But in the DFG, minimorphic accesses are special because they still qualify for full inlining. Consider an access o.f that sees structures S1 and S2, and both agree that f is at offset 0. Then we would have:

CheckStructure(@o, S1, S2)
GetByOffset(@o, 0)

This minimorphic CheckStructure will OSR exit if @o has none of the listed structures. Our optimizations for CheckStructure generally work for both monomorphic and minimorphic variants. So, minimorphism usually doesn’t hurt performance much compared to monomorphism.

Polymorphism. But what if an access sees different structures, and those structures have the property at different offsets? Consider an access to o.f that sees structures S1 = {f, g}, S2 = {f, g, h}, and S3 = {g, f}. This would be a minimorphic access if it was just S1 or S2, but S3 has f at a different offset. In this case, the FTL will convert this to:

MultiGetByOffset(@o, [S1, S2] => 0, [S3] => 1)

in DFG IR and then lower it to something like:

if (o->structureID == S1 || o->structureID == S2)
    result = o->inlineStorage[0]
else
    result = o->inlineStorage[1]

in B3 IR. In fact, we would use B3’s Switch since that’s the canonical form for this code pattern in B3.

Note that we only do this optimization in the FTL. The reason is that we want polymorphic accesses to remain ICs in the DFG so that we can use them to collect refined profiling.
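
The difference between the minimorphic and polymorphic cases comes down to grouping the observed structures by offset. The following JavaScript sketch shows that grouping and prints the DFG IR forms used in the text. The lowerAccess function and its input format are hypothetical, and the sketch ignores the tier distinction (it does not model the polymorphic case staying an IC in the DFG).

```javascript
// `observed` maps a structure name to the offset at which that structure
// stores the property being accessed.
function lowerAccess(base, observed) {
    const byOffset = new Map();   // offset -> structures that use that offset
    for (const [structure, offset] of observed) {
        if (!byOffset.has(offset))
            byOffset.set(offset, []);
        byOffset.get(offset).push(structure);
    }
    if (byOffset.size === 1) {
        // Monomorphic or minimorphic: one check, one load.
        const [[offset, structures]] = byOffset;
        return [
            `CheckStructure(${base}, ${structures.join(", ")})`,
            `GetByOffset(${base}, ${offset})`,
        ];
    }
    // Polymorphic: one case per distinct offset.
    const cases = [...byOffset].map(
        ([offset, structures]) => `[${structures.join(", ")}] => ${offset}`);
    return [`MultiGetByOffset(${base}, ${cases.join(", ")})`];
}

const minimorphic = lowerAccess("@o", new Map([["S1", 0], ["S2", 0]]));
const polymorphic = lowerAccess("@o", new Map([["S1", 0], ["S2", 0], ["S3", 1]]));
```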

Figure 23. Polyvariant inlining of an inline cache. The FTL can inline the inline cache in foo-inlined-into-bar after DFG compiles bar and uses an IC to collect polyvariant profiling about the get_by_id.

Polyvariance. Polyvariance is when an analysis is able to reason about a function differently depending on where it is called from. We achieve this by inlining in the DFG tier and keeping polymorphic ICs as ICs. Consider the following example. Function foo has an access to o.f that is polymorphic and sees structures S1 = {f, g}, S2 = {f, g, h}, and S3 = {g, f}:

function foo(o)
{
    // o can have structure S1, S2, or S3.
    return o.f;
}

This function is small, so it will be inlined anytime our profiling tells us that we are calling it (or may be calling it, since call inlining supports inlining polymorphic calls). Say that we have another function bar that always passes objects with structure S1 = {f, g} to foo:

function bar(p)
{
    // p.g always happens to have structure S1.
    return foo(p.g);
}

Figure 23 shows what happens. When the DFG compiles bar (step 3), it will inline foo based on the profiling of its call opcode (in step 2). But it will leave foo’s get_by_id as an IC because foo’s Baseline version told us that it’s polymorphic (also step 2). But then, since the DFG’s IC for foo’s get_by_id is in the context of that call from bar, it only ever sees S1 (step 4). So, when the FTL compiles bar and inlines foo, it knows that this get_by_id can be inlined with a monomorphic structure check for just S1 (step 5).
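
The key point is that each compiled copy of foo’s get_by_id has its own IC, so each copy has its own profile. A tiny JavaScript model makes this visible. The context strings, the seen table, and both helper functions are hypothetical and exist only for the illustration.

```javascript
// Each compiled copy of the get_by_id owns its own IC, keyed here by a
// hypothetical context string.
const seen = new Map();   // context -> Set of structures observed by that IC

function recordStructure(context, structure) {
    if (!seen.has(context))
        seen.set(context, new Set());
    seen.get(context).add(structure);
}

function isMonomorphic(context) {
    return seen.has(context) && seen.get(context).size === 1;
}

// Baseline foo is called from everywhere, so its IC sees all three structures.
for (const structure of ["S1", "S2", "S3"])
    recordStructure("foo (Baseline)", structure);

// The copy of foo inlined into DFG-compiled bar only sees what bar passes.
for (let i = 0; i < 100; i++)
    recordStructure("foo inlined into bar (DFG)", "S1");
```

Reading only the Baseline profile would force a polymorphic access, while the profile from the inlined copy justifies a monomorphic structure check in that context.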

Inline caches also support more exotic forms of property access, like loading from objects in the prototype chain, calling accessors, adding/replacing properties, and even deleting properties.

Inline caches, structures, and garbage collection. Inline caching results in objects that are allocated and referenced only for inline caching. Structures are the most notorious example of these kinds of objects. Structures are particularly problematic because they need strong references to both the object’s prototype and its global object. In some cases, a structure will only be reachable from some inline cache, that inline cache will never run again (but we can’t prove it), and there is a large global object only referenced by that structure. It can be difficult to determine if that means that the structure has to be deleted or not. If it should be deleted, then the inline cache must be reset. If any optimized code inlined that inline cache, then that code must be jettisoned and recompiled. Fortunately, our garbage collector allows us to describe this case precisely. Since the garbage collector runs to fixpoint, we simply add the constraint that the pointer from an inline cache to a structure only marks the structure if the structure’s global object and prototype are already marked. Otherwise, the pointer behaves like a weak pointer. So, an inline cache will only be reset if the only way to reach the structure is through inline caches and the corresponding global object and prototype are dead. This is an example of how our garbage collector is engineered to make speculation easy.
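
The marking constraint can be expressed as a small fixpoint loop. The JavaScript below is a sketch over a made-up heap model, not the collector’s real interface: markHeap, the strong field, and the icEdges list are hypothetical names.

```javascript
// Hypothetical heap model: every object has a `strong` list of references.
// Structures also expose `globalObject` and `prototype`. `icEdges` lists the
// structures that inline caches point to.
function markHeap(roots, icEdges) {
    const marked = new Set();
    const visit = (object) => {
        if (marked.has(object))
            return;
        marked.add(object);
        for (const child of object.strong)
            visit(child);
    };
    for (const root of roots)
        visit(root);

    // Run the inline cache constraint to fixpoint: an IC marks its structure
    // only if the structure's global object and prototype are already marked.
    let changed = true;
    while (changed) {
        changed = false;
        for (const structure of icEdges) {
            if (!marked.has(structure)
                && marked.has(structure.globalObject)
                && marked.has(structure.prototype)) {
                visit(structure);
                changed = true;
            }
        }
    }
    // ICs whose structure stayed unmarked have to be reset.
    return { marked, toReset: icEdges.filter((s) => !marked.has(s)) };
}

const makeObject = (name) => ({ name, strong: [] });
const makeStructure = (name, globalObject, prototype) =>
    ({ name, globalObject, prototype, strong: [globalObject, prototype] });

const liveGlobal = makeObject("liveGlobal");
const liveProto = makeObject("liveProto");
const deadGlobal = makeObject("deadGlobal");
const deadProto = makeObject("deadProto");
const liveStructure = makeStructure("S1", liveGlobal, liveProto);
const deadStructure = makeStructure("S2", deadGlobal, deadProto);

const { marked, toReset } = markHeap(
    [liveGlobal, liveProto], [liveStructure, deadStructure]);
```

Here S1 survives because its global object and prototype are reachable from the roots, while S2 is left unmarked, so the large dead global object it references is not kept alive by the inline cache.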

To summarize, inline caching is an optimization employed by all of our tiers. In addition to making code run faster, inline caching is a high-precision profiling source that can tell us about the type cases that an operation saw. Combined with structures, inline caches allow us to turn dynamic property accesses into easy-to-optimize instructions.


Watchpoints

We allow inline caches and speculative compilers to set watchpoints on the heap. A watchpoint in JavaScriptCore is nothing more than a mechanism for registering for notification that something happened. Most watchpoints are engineered to trigger only the first time that something bad happens; after that, the watchpoint just remembers that the bad thing had ever happened. So, if an optimizing compiler wants to do something that is valid only if some bad thing never happened, and the bad thing has a watchpoint, the compiler just checks if the watchpoint is still valid (i.e. the bad thing hasn’t happened yet) and then associates its generated code with the watchpoint (so the code will only get installed if the watchpoint is still valid when the code is done getting compiled, and will be jettisoned as soon as the watchpoint is fired). The runtime allows for setting watchpoints on a large number of activities. The following stick out:

  • It’s possible to set a watchpoint on structures to get a notification whenever any object switches from that structure to another one. This only works for structures whose objects have never transitioned to any other structure. This is called a structure transition watchpoint. It establishes a structure as a leaf in the structure transition tree.
  • It’s possible to set a watchpoint on properties in a structure to get a notification whenever the property is overwritten. Overwriting a property is easy to detect because the first time this happens, it usually involves repatching a put_by_id inline cache so that it’s in the property replacement mode. This is called a property replacement watchpoint.
  • It’s possible to set a watchpoint on the mutability of global variables.

Putting these watchpoints together gives the speculative compiler the ability to constant-fold object properties that happen to be immutable. Let’s consider a simple example:

Math.pow(42, 2)

Here, Math is a global property lookup. The base object is known to the compiler: it’s the global object that the calling code belongs to. Then, Math.pow is a lookup of the pow property on the Math object. It’s extremely unlikely that the Math property of the global object or the pow property of the Math object had ever been overwritten. Both the global object and the Math object have structures that are unique to them (both because those structures have special magic since those are special objects and because those objects have what is usually a globally unique set of properties), which guarantees that they have leaf structures, so the structure transition watchpoint can be set. Therefore, except for pathological programs, the expression Math.pow is compiled to a constant by the speculative compiler. This makes lots of stuff fast:

  • It’s common to have named and scoped enumerations using objects and object properties, like TypeScript.NodeType.Error in the typescript compiler benchmark in JetStream 2. Watchpoints make those look like a constant to the speculative compiler.
  • Method calls like o.foo(things) are usually turned just into a structure check on o and a direct call. Once the structure is checked, watchpoints establish that the object’s prototype has a property called foo and that this property has some constant value.
  • Inline caches use watchpoints to remove some checks in their generated stubs.
  • The DFG can use watchpoints to remove redundant CheckStructures even when there is a side effect between them. If we set the structure transition watchpoint then we know that no effect can change the structure of any object that has this structure.
  • Watchpoints are used for lots of miscellaneous corner cases of JavaScript, like having a bad time.

To summarize, watchpoints let inline caches and the speculative compilers fold certain parts of the heap’s state to constants by getting a notification when things change.
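
A fire-once watchpoint and its use for constant folding can be modeled in JavaScript as follows. This is a minimal sketch with hypothetical names (WatchpointSet, WatchedGlobal, compileRead); it models jettisoning as setting a flag on the compiled code object.

```javascript
// Hypothetical fire-once watchpoint set.
class WatchpointSet {
    constructor() {
        this.valid = true;      // false once the "bad thing" has happened
        this.dependents = [];   // compiled code that assumed it never would
    }
    isStillValid() { return this.valid; }
    add(code) { this.dependents.push(code); }
    fire() {
        if (!this.valid)
            return;             // later firings only need the remembered state
        this.valid = false;
        for (const code of this.dependents)
            code.jettisoned = true;
        this.dependents = [];
    }
}

// A global variable that fires its watchpoint when it is overwritten.
class WatchedGlobal {
    constructor(value) {
        this.value = value;
        this.replacement = new WatchpointSet();
    }
    set(value) {
        this.value = value;
        this.replacement.fire();
    }
}

// "Compile" a read of the global: constant-fold it if the watchpoint allows.
function compileRead(global) {
    if (!global.replacement.isStillValid())
        return { jettisoned: false, folded: false, run: () => global.value };
    const constant = global.value;
    const code = { jettisoned: false, folded: true, run: () => constant };
    global.replacement.add(code);
    return code;
}

const answer = new WatchedGlobal(42);
const fast = compileRead(answer);        // folds 42 into the code
const valueBefore = fast.run();
answer.set(43);                          // fires the watchpoint, jettisons `fast`
const recompiled = compileRead(answer);  // no longer folds
```

The folded code never checks the variable at run time. Its correctness rests entirely on being jettisoned when the watchpoint fires.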

Exit Flags

All of the profiling sources in our engine have a chance of getting things wrong. Profiling sources get things wrong because:

  • The program may change behavior between when we collected the profiling and when we speculated on it.
  • The profiling has some stochastic element and the program is getting unlucky, leading to wrong profiling.
  • The profiling source has a logic bug that makes it not able to see that something happened.
  • We neglected to implement a profiler for something and instead just speculated blind.

The first of these issues – behavior change over time – is inevitable and is sure to happen for some functions in any sufficiently large program. Big programs tend to experience phase changes, like some subroutine going from being called from one part of a larger library that uses one set of types, to being called from a different part with different types. Those things inevitably cause exits. The other three issues are all variants of the profiling being broken. We don’t want our profiling to be broken, but we’re only human. Recall that for speculation to have good EV, the probability of being right has to be about 1. So, it’s not enough to rely on profiling that was written by imperfect lifeforms. Exit flags are a check on the rest of the profiling and are there to ensure that we get things right eventually for all programs.

In JavaScriptCore, every OSR exit is tagged with an exit kind. When a DFG or FTL function exits enough times to get jettisoned, we record all of the exit kinds that happened along with the bytecode locations that semantically caused the exits (for example if we do a type check for add at bytecode #63 but then hoist the check so that it ends up exiting to bytecode #45, then we will blame #63 not #45). Whenever the DFG or FTL decide whether to perform a kind of speculation, they are expected to check whether there is an exit flag for that speculation at the bytecode that we’re compiling. Our exit flag checking discipline tends to be strictly better than our profiling discipline, and it’s way easier to get right — every phase of the DFG has fast access to exit flags.

Here’s an example of an actual OSR exit check in DFG:

    OutOfBounds, JSValueRegs(), 0,
        MacroAssembler::Address(storageReg, Butterfly::offsetOfPublicLength())));

Note that the first argument is OutOfBounds. That’s an example exit kind. Here’s another example, this time from the FTL:

speculate(NegativeZero, noValue(), nullptr, m_out.lessThan(left, m_out.int32Zero));

Again, the first argument is the exit kind. This time it’s NegativeZero. We have 26 exit kinds, most of which describe a type check condition (some are used for other uses of OSR, like exception handling).

We use the exit kinds by querying if an exit had happened at the bytecode location we are compiling when choosing whether to speculate. We typically use the presence of an exit flag as an excuse not to speculate at all for that bytecode. We effectively allow ourselves to overcompensate a bit. The exit flags are a check on the rest of the profiler. They are telling the compiler that the profiler had been wrong here before, and as such, shouldn’t be trusted anymore for this code location.
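
The discipline amounts to a set lookup keyed by bytecode location and exit kind. The JavaScript sketch below models it. The ExitFlags class and the shouldSpeculate helper are hypothetical; the exit kind strings reuse the two kinds named above.

```javascript
// Hypothetical exit flag table keyed by bytecode index and exit kind.
class ExitFlags {
    constructor() { this.sites = new Set(); }
    // Called when a jettisoned function reports the exits it took.
    record(bytecodeIndex, exitKind) {
        this.sites.add(`${bytecodeIndex}:${exitKind}`);
    }
    hasExitSite(bytecodeIndex, exitKind) {
        return this.sites.has(`${bytecodeIndex}:${exitKind}`);
    }
}

// Speculate only if the profile supports it and no exit flag contradicts it.
function shouldSpeculate(exitFlags, bytecodeIndex, exitKind, profileAgrees) {
    return profileAgrees && !exitFlags.hasExitSite(bytecodeIndex, exitKind);
}

const exitFlags = new ExitFlags();
// A check that belongs to bytecode #63 was hoisted and exited to #45.
// The flag blames #63, the location that semantically caused the exit.
exitFlags.record(63, "OutOfBounds");

const speculateAt63 = shouldSpeculate(exitFlags, 63, "OutOfBounds", true);
const speculateAt45 = shouldSpeculate(exitFlags, 45, "OutOfBounds", true);
const otherKindAt63 = shouldSpeculate(exitFlags, 63, "NegativeZero", true);
```

In this model the flag overrides a profile that still claims the speculation is safe at #63, while other locations and other exit kinds are unaffected.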

Summary of Profiling

JavaScriptCore’s profiling is designed to be cheap and useful. Our best profiling sources tend to either involve minimal instrumentation (like just setting a flag or storing a value to a known location) or be intertwined with optimizations (like inline caching). Our profilers gather lots of rich information and in some cases we even collect information redundantly. Our profiling is designed to help us avoid making speculative bets that turn out to be wrong even once.

Compilation and OSR

Now that we have covered bytecode, control, and profiling, we can get to the really fun part: how to build a great speculative optimizing compiler. We will discuss the OSR aspect of speculation in tandem with our descriptions of the two optimizing compilers.

This section is organized into three parts. First we give a quick and gentle introduction to DFG IR, the intermediate representation used by both the DFG and FTL tiers. Then we describe the DFG tier in detail, including how it handles OSR. Finally we describe how the FTL tier works.


DFG IR

The most important component of a powerful optimizing compiler is the IR. We want to have the best possible speculative optimizing compiler for JavaScript, so we have the following goals for our IR:

  • The IR has to describe all of the parts of the program that are interesting to the optimizer. Like other high quality optimizing IRs, DFG IR has good support for talking about data flow, aliasing, effects, control flow, and debug information. Additionally, it’s also good at talking about profiling data, speculation decisions, and OSR.
  • The IR has to be mutable. Anything that is possible to express when first lowering a program to the IR should also be expressible during some later optimization. We prefer that decisions made during lowering to the IR can be refined by optimizations later.
  • The IR has to have some validation support. It’s got to be possible to catch common mistakes in a validator instead of debugging generated code.
  • The IR has to be purpose-built. If there exists an optimization whose most comprehensive implementation requires a change to the IR or one of its core data structures, then we need to be able to make that change without asking anyone for permission.

Note that IR mutability is closely tied to how much it describes and how easy it is to validate. Any optimization that tries to transform one piece of code into a different, better, piece of code needs to be able to determine if the new code is a valid replacement for the old code. Generally, the more information the IR carries and the easier it is to validate, the easier it is to write the analyses that guard optimizations.

Let’s look at what the DFG IR looks like using a simple example:

function foo(a, b)
{
    return a + b;
}

This results in bytecode like:

[   0] enter             
[   1] get_scope         loc3
[   3] mov               loc4, loc3
[   6] check_traps       
[   7] add               loc6, arg1, arg2
[  12] ret               loc6

Note that only the last two lines (add and ret) are important. Let’s look at the DFG IR that we get from lowering those two bytecode instructions:

  23:  GetLocal(Untyped:@1, arg1(B<Int32>/FlushedInt32), R:Stack(6), bc#7)
  24:  GetLocal(Untyped:@2, arg2(C<BoolInt32>/FlushedInt32), R:Stack(7), bc#7)
  25:  ArithAdd(Int32:@23, Int32:@24, CheckOverflow, Exits, bc#7)
  26:  MovHint(Untyped:@25, loc6, W:SideState, ClobbersExit, bc#7, ExitInvalid)
  28:  Return(Untyped:@25, W:SideState, Exits, bc#12)

In this example, we’ve lowered the add opcode to four operations: two GetLocals to get the argument values from the stack (we load them lazily and this is the first operation that needs them), a speculative ArithAdd instruction, and a MovHint to tell the OSR part of the compiler about the ArithAdd. The ret opcode is just lowered to a Return.

In DFG jargon, the instructions are usually called nodes, but we use the terms node, instruction, and operation interchangeably. DFG nodes are simultaneously nodes in a data flow graph and instructions inside of a control flow graph, with semantics defined “as if” they executed in a particular order.

Figure 24. Explanation of an example ArithAdd DFG instruction.

Let’s consider the ArithAdd in greater detail (Figure 24). This instruction is interesting because it’s exactly the sort of thing that the DFG is designed to optimize: it represents a JavaScript operation that is dynamic and impure (it may call functions) but here we have inferred it to be free of side effects using the Int32: type speculations. These indicate that before doing anything else, this instruction will check that its inputs are Int32s. Note that the type speculations of DFG instructions should be understood like function overloads. ArithAdd also allows for both operands to be double or other kinds of integer. It’s as if ArithAdd was a C++ function that had overloads that took a pair of integers, a pair of doubles, etc. It’s not possible to add any type speculation to any operand, since that may result in an instruction overload that isn’t supported.
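To make the overload analogy concrete, here is a minimal C++ sketch. It is not how JavaScriptCore implements ArithAdd. The function name arithAdd and the use of an empty std::optional to stand for "the speculation failed, take the OSR exit" are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical model: an empty optional means "speculation failed, exit".

// Int32 overload, in the spirit of ArithAdd(Int32:, Int32:, CheckOverflow).
std::optional<int32_t> arithAdd(int32_t left, int32_t right)
{
    // Add in 64 bits so that the overflow check itself cannot overflow.
    int64_t wide = static_cast<int64_t>(left) + static_cast<int64_t>(right);
    if (wide < std::numeric_limits<int32_t>::min()
        || wide > std::numeric_limits<int32_t>::max())
        return std::nullopt; // Would exit to the profiling tier.
    return static_cast<int32_t>(wide);
}

// Double overload: there is no overflow check, so this version never exits.
std::optional<double> arithAdd(double left, double right)
{
    return left + right;
}
```

In this sketch a mixed call like arithAdd(1, 2.5) is rejected by the C++ compiler as ambiguous, which loosely mirrors the rule that not every combination of operand speculations corresponds to a supported overload.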

Another interesting feature of this ArithAdd is that it knows exactly which bytecode instruction it originated from and where it will exit to. These are separate fields in the IR (the semantic and forExit origins) but when they are equal we dump them as one, bc#7 in the case of this instruction.

Any DFG node that may exit will have the Exits flag. Note that we set this flag conservatively. For example, the Return in our example has it set not because Return exits but because we haven’t found a need to make the exit analysis any more precise for that instruction.

Figure 25. Example data flow graph.

DFG IR can be simultaneously understood as a sequence of operations that should be performed as if in the given order and as a data flow graph with backwards pointers. The data flow graph view of our running example is shown in Figure 25. This view is useful since lots of optimizations are concerned with asking questions like: “what instructions produce the values consumed by this instruction?” These data flow edges are the main way that values move around in DFG IR. Also, representing programs this way makes it natural to add SSA form, which we do in the FTL.

Figure 26. DFG and FTL compiler architecture. The pass pipeline depicted above the dotted line is shared between the DFG and FTL compilers. Everything below the dotted line is specialized for DFG or FTL.

DFG IR, in both non-SSA and SSA forms, forms the bulk of the DFG and FTL compilers. As shown in Figure 26, both JITs share the same frontend for parsing bytecode and doing some optimizations. The difference is what happens after the DFG optimizer. In the DFG tier, we emit machine code directly. In the FTL tier, we convert to DFG SSA IR (which is almost identical to DFG IR but uses SSA to represent data flow) and do more optimizations, and then lower through two additional optimizers (B3 and Assembly IR or Air). The remaining sections talk about the DFG and FTL compilers. The section on the DFG compiler covers the parts of DFG and FTL that are common.

DFG Compiler

The point of the DFG compiler is to remove lots of type checks quickly. Fast compilation is the DFG feature that differentiates it from the FTL. To get fast compilation, the DFG lacks SSA, can only do very limited code motion, and uses block-local versions of most optimizations (common subexpression elimination, register allocation, etc). The DFG has two focus areas where it does a great job despite compiling quickly: how it handles OSR and how it uses static analysis.

This section explains the DFG by going into these three concepts in greater detail:

  • OSR exit as a first-class concept in the compiler.
  • Static analysis as the main driver of optimization.
  • Fast compilation so that we get the benefits of optimization as soon as possible.

OSR Exit

OSR is all about flattening control flow by making failing checks exit sideways. OSR is a difficult optimization to get right. It’s especially difficult to reason about at a conceptual level. This section tries to demystify OSR exit. We’re going to explain the DFG compiler’s approach to OSR, which includes both parts that are specific to the DFG tier and parts that are shared with the FTL. The FTL section explains extensions to this approach that we use to do more aggressive optimizations.

Our discussion proceeds as follows. First we use a high-level example to illustrate what OSR exit is all about. Then we describe what OSR exit means at the machine level, which will take us into the details of how optimizing compilers handle OSR. We will show a simple OSR exit IR idea based on stackmaps to give a sense of what we’re trying to achieve and then we describe how DFG IR compresses stackmaps. Finally we talk about how OSR exit is integrated with watchpoints and invalidation.

High-level OSR example. To start to demystify DFG exit, let’s think of it as if it was an optimization we were doing to a C program. Say we had written code like:

int foo(int* ptr)
{
    int w, x, y, z;
    w = ... // lots of stuff
    x = is_ok(ptr) ? *ptr : slow_path(ptr);
    y = ... // lots of stuff
    z = is_ok(ptr) ? *ptr : slow_path(ptr);
    return w + x + y + z;
}

Let’s say we wanted to optimize out the second is_ok check. We could do that by duplicating all of the code after the first is_ok check, and having one copy statically assume that is_ok is true while another copy either assumes it’s false or makes no assumptions. This might make the fast path look like:

int foo(int* ptr)
{
    int w, x, y, z;
    w = ... // lots of stuff
    if (!is_ok(ptr))
        return foo_base1(ptr, w);
    x = *ptr;
    y = ... // lots of stuff
    z = *ptr;
    return w + x + y + z;
}

Where foo_base1 is the original foo function after the first is_ok check. It takes the live state at that point as an argument and looks like this:

int foo_base1(int* ptr, int w)
{
    int x, y, z;
    x = is_ok(ptr) ? *ptr : slow_path(ptr);
    y = ... // lots of stuff
    z = is_ok(ptr) ? *ptr : slow_path(ptr);
    return w + x + y + z;
}

What we’ve done here is OSR exit. We’re optimizing control flow on the fast path (removing one is_ok check) by exiting (tail-calling foo_base1) if !is_ok. OSR exit requires:

  • Somewhere to exit, like foo_base1 in this case. It should be a thing that can complete execution of the current function without getting stuck on the same speculation.
  • The live state at exit, like ptr and w in this case. Without that, the exit target can’t pick up where we left off.

That’s OSR exit at a high level. We’re trying to allow an optimizing compiler to emit checks that exit out of the function on failure so that the compiler can assume that the same check won’t be needed later.

OSR at the machine level. Now let’s look at what OSR exit looks like at a lower level. Figure 27 shows an example of OSR at a particular bytecode index.

Figure 27. OSR exit at the machine level for an example bytecode instruction.

OSR is all about replacing the current stack frame and register state, which correspond to some bytecode index in the optimizing tier, with a different frame and register state, which correspond to the same point in the profiling tier. This is all about shuffling live data from one format to another and jumping to the right place.

Knowing where to jump to is easy: each DFG node (aka instruction or operation) has a forExit, or just exit, origin that tells us which bytecode location to exit to. This may even be a bytecode stack in case of inlining.

The live data takes a bit more effort. We have to know what the set of live data is and what its format is in both the profiling and optimizing tiers. It turns out that knowing what the set of live data is and how to represent it for the profiling tiers is easy, but extracting that data from the optimizing tier is hard.

First let’s consider what’s live. The example in Figure 27 says that we’re exiting at an add and it has loc3, loc4, and loc8 live before. We can solve for what’s live at any bytecode instruction by doing a liveness analysis. JavaScriptCore has an optimized bytecode liveness analysis for this purpose.

Note that the frame layout in the profiling tier is an orderly representation of the bytecode state. In particular, locN just means framePointer - 8 * N and argN just means framePointer + FRAME_HEADER_SIZE + 8 * N, where FRAME_HEADER_SIZE is usually 40. The only difference between frame layouts between functions in the profiling tier is the frame size, which is determined by a constant in each bytecode function. Given the frame pointer and the bytecode virtual register name, it’s always possible to find out where on the stack the profiling tiers would store that variable. This makes it easy to figure out how to convert any bytecode live state to what the Baseline JIT or LLInt would expect.
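The address arithmetic above is simple enough to write down directly. The following sketch assumes 8-byte slots and the 40-byte frame header mentioned in the text. The function names are hypothetical and are not taken from JavaScriptCore.

```cpp
#include <cassert>
#include <cstdint>

// Assumed constants, taken from the description above.
constexpr std::uintptr_t slotSize = 8;
constexpr std::uintptr_t frameHeaderSize = 40;

// Address of bytecode local locN in a profiling-tier frame.
constexpr std::uintptr_t addressOfLocal(std::uintptr_t framePointer, std::uintptr_t n)
{
    return framePointer - slotSize * n;
}

// Address of bytecode argument argN in a profiling-tier frame.
constexpr std::uintptr_t addressOfArgument(std::uintptr_t framePointer, std::uintptr_t n)
{
    return framePointer + frameHeaderSize + slotSize * n;
}
```

For a frame pointer of 0x1000, this places loc3 at 0xFE8 and arg1 at 0x1030. The point is that nothing about the function being compiled is needed to compute these addresses, only the frame pointer and the virtual register name.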

The hard part is the optimizing tier’s state. The optimizing compiler might:

  • Allocate the stack in any order. Even if a variable is on the stack, it may be anywhere.
  • Register-allocate a variable. In that case there may not be any location on the stack that contains the value of that variable.
  • Constant-fold a variable. In that case there may not be any location on the stack or in the register file that contains the value of that variable.
  • Represent a variable’s value in some creative way. For example, your program might have had a statement like x = y + z but the compiler chose to never actually emit the add except lazily at points of use. This can easily happen because of pattern-matching instruction selection on x86 or ARM, where some instructions (like memory accesses) can do some adds for free as part of address computation. We do an even more aggressive version of this for object allocations: some program variable semantically points to an object, but because our compiler is smart, we never actually allocated any object and the object’s fields may be register-allocated, constant-folded, or represented creatively.
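One way to picture the problem is as a per-variable "recovery" description that says where the optimized code left a value. The sketch below is a simplified model under our own assumptions: MachineState, the four recovery kinds, and recover are all hypothetical names chosen to mirror the four cases in the list above.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <variant>

// Hypothetical snapshot of the optimized code's machine state at the exit.
struct MachineState {
    std::map<int, int64_t> registers;  // register number -> value
    std::map<int, int64_t> stackSlots; // optimized-frame slot -> value
};

// Hypothetical descriptions of where the optimizing compiler left a value.
struct InRegister { int reg; };
struct OnStack { int slot; };
struct Constant { int64_t value; };
struct LazyAdd { int leftReg; int rightReg; }; // the add was never emitted

using Recovery = std::variant<InRegister, OnStack, Constant, LazyAdd>;

// Helper for std::visit with one lambda per alternative.
template<class... Ts> struct Overloaded : Ts... { using Ts::operator()...; };
template<class... Ts> Overloaded(Ts...) -> Overloaded<Ts...>;

// Materialize the value that a bytecode variable should have after exit.
int64_t recover(const Recovery& recovery, const MachineState& state)
{
    return std::visit(Overloaded {
        [&](InRegister r) { return state.registers.at(r.reg); },
        [&](OnStack s) { return state.stackSlots.at(s.slot); },
        [&](Constant c) { return c.value; },
        [&](LazyAdd a) {
            // Perform the add that the optimized code skipped.
            return state.registers.at(a.leftReg) + state.registers.at(a.rightReg);
        },
    }, recovery);
}
```

An exit would run something like recover once per live bytecode variable and write each result into the slot that the profiling tier expects.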

We want to allow the optimizing compiler to do things like this, since we want OSR exit to be an enabler of optimization rather than an inhibitor. This turns out to be tricky: how do we let the optimizing compiler do all of the optimizations that it likes to do while still being able to tell us how to recover the bytecode state?

The trick to extracting the optimized-state-to-bytecode-state shuffle from the optimizing compiler is to leverage the original bytecode→IR conversion. The main difference between an SSA-like IR (like DFG IR) and bytecode is that the former represents data flow relationships instead of variables. While bytecode says add x, y, z, DFG IR would have an Add node that points to the nodes that produced y and z (like in Figure 25). The conversion from bytecode to DFG IR looks like this pseudocode:

case op_add: {
    VirtualRegister result = instruction->result();
    VirtualRegister left   = instruction->left();
    VirtualRegister right  = instruction->right();

    stackMap[result] = createAdd(
        stackMap[left], stackMap[right]);
    break;
}

This uses a standard technique for converting variable-based IRs to data-flow-based IRs: the converter maintains a mapping from variables in the source IR to data flow nodes in the target IR. We’re going to call this the stackMap for now. Each bytecode instruction is handled by modeling the bytecode’s data flow: we load the left and right operands from the stackMap, which gives us the DFG nodes for those locals’ values. Then we create an ArithAdd node and store it into the result local in the stackMap to model the fact that the bytecode wanted to store the result to that local. Figure 28 shows the before-and-after of running this on the add bytecode in our running example.
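Here is a self-contained version of that conversion step, as a rough sketch. Node and Converter are hypothetical, heavily simplified types (real DFG nodes carry far more information), and bytecode variables are named by strings just to keep the example readable.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified data flow node.
struct Node {
    std::string op;
    std::vector<const Node*> children; // backwards pointers to the producers
};

// Hypothetical converter state: owns the nodes and tracks, for every bytecode
// variable, the node that currently holds its value.
struct Converter {
    std::vector<std::unique_ptr<Node>> nodes;
    std::map<std::string, const Node*> stackMap;

    const Node* create(std::string op, std::vector<const Node*> children = {})
    {
        nodes.push_back(std::make_unique<Node>(Node { std::move(op), std::move(children) }));
        return nodes.back().get();
    }

    // Models the bytecode instruction "add result, left, right".
    void convertAdd(const std::string& result, const std::string& left, const std::string& right)
    {
        // Load the operands' nodes, then model the store to the result local.
        const Node* add = create("ArithAdd", { stackMap.at(left), stackMap.at(right) });
        stackMap[result] = add;
    }
};
```

Seeding stackMap with nodes for loc3, loc4, and loc8 and then calling convertAdd("loc7", "loc4", "loc8") leaves loc7 mapped to a new ArithAdd whose children are the nodes for loc4 and loc8. No variable names survive inside the node itself.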

Figure 28. Example of stackMap before and after running the SSA conversion on add at bc#42 along with an illustration of the data flow graph around the resulting ArithAdd.

The stackMap, pruned to bytecode liveness as we are doing in these examples, represents the set of live state that would need to be recovered at any point in bytecode execution. It tells us, for each live bytecode local, what DFG node to use to recover the value of that local. A simple way to support OSR would be to give each DFG node that could possibly exit a data flow edge to each node in the liveness-pruned stackMap.

This isn’t what the DFG actually does; DFG nodes do not have data flow edges for the stackmap. Doing literally that would be too costly in terms of memory usage since basically every DFG node may exit and stackmaps have O(live state) entries. The DFG’s actual approach is based on delta-compression of stackmaps. But it’s worth considering exactly how this uncompressed stackmap approach would work because it forms part of the FTL’s strategy and it gives a good mental model for understanding the DFG’s more sophisticated approach. So, we will spend some time describing the DFG IR as if it really did have stackmaps. Then we will show how the stackmap is expressed using delta compression.

OSR exit with uncompressed stackmaps. Imagine that DFG nodes really had extra operands for the stackmap. Then we would have an ArithAdd like the following, assuming that bc#42 is the exit origin and that loc3, loc4, and loc8 are live, as they are in Figures 27 and 28:

c: ArithAdd(@a, @b, loc3->@s, loc4->@a, loc8->@b, bc#42)

In this kind of IR, we’d let the first two operands of ArithAdd behave the expected way (they are the actual operands to the add), and we’d treat all of the other operands as the stackmap. The exit origin, bc#42, is a control flow label. Together, this tells the ArithAdd where to exit (bc#42) and the stackmap (@s, @a, and @b). The compiler treats the ArithAdd, and the stackmap operands, as if the ArithAdd had a side exit from the function the compiler was compiling.

One way to think about it is in terms of C pseudocode. We are saying that the semantics of ArithAdd and any other instruction that may exit are as if they did the following before any of their effects:

if (some conditions)
    return OSRExit(bc#42, {loc3: @s, loc4: @a, loc8: @b});

Where the return statement is an early return from the compiled function. So, this terminates the execution of the compiled function by tail-calling (jumping to) the OSRExit. That operation will transfer control to bc#42 and pass it the given stackmap.

Figure 29. Example of control flow in a compiler with OSR exit. OSR exit means having an additional implicit set of control flow edges that come out of almost every instruction and represent a side exit from the control flow graph.

This is easy to model in a compiler. We don’t allocate any kind of control flow constructs to represent the condition check and side exit but we assume it to exist implicitly when analyzing ArithAdd or any other node that may exit. Note that in JavaScript, basically every instruction is going to possibly exit, and JavaScriptCore’s may exit analysis defaults to true for most operations. Figure 29 illustrates what this looks like. We are going to have three kinds of control flow edges instead of the usual two:

  1. The normal control flow edges between basic blocks. This is what you normally think of as “control flow”. These edges are explicitly represented in the IR, as in, there is an actual data structure (usually vector of successors and vector of predecessors) that each block uses to tell what control flow edges it participates in.
  2. The implicit fall-through control flow for instructions within blocks. This is standard for compilers with basic blocks.
  3. A new kind of control flow edge due to OSR, which goes from instructions in blocks to OSR exit landing sites. This means changing the definition of basic blocks slightly. Normally the only successors of basic blocks are the ones in the control flow graph. Our basic blocks have a bunch of OSR exit successors as well. Those successors don’t exist in the control flow graph, but we have names for them thanks to the exit origins found in the exiting instructions. The edges to those exit origins exit out of the middle of blocks, so they may terminate the execution of blocks before the block terminal.
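The three kinds of edges can be sketched as a data structure. This is an illustrative model only. Instruction, BasicBlock, and osrExitSuccessors are hypothetical names, and exit origins are reduced to plain bytecode indices.

```cpp
#include <cassert>
#include <set>
#include <vector>

struct Instruction {
    bool mayExit;   // the Exits flag
    int exitOrigin; // bytecode index to exit to
};

struct BasicBlock {
    // (2) Fall-through between instructions is implicit in the vector order.
    std::vector<Instruction> instructions;
    // (1) Normal control flow edges are stored explicitly.
    std::vector<BasicBlock*> successors;
    std::vector<BasicBlock*> predecessors;
};

// (3) OSR exit edges are not stored anywhere. They are implied by the exit
// origins of the instructions that may exit.
std::set<int> osrExitSuccessors(const BasicBlock& block)
{
    std::set<int> result;
    for (const Instruction& instruction : block.instructions) {
        if (instruction.mayExit)
            result.insert(instruction.exitOrigin);
    }
    return result;
}
```

A block can have an empty successors vector and still have several OSR exit successors, which is the sense in which the definition of a basic block changes slightly.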

The OSR landing site is understood by the compiler as having the following behaviors:

  • It ends execution of this function in DFG IR. This is key, since it means that there is no merge point in our control flow graph that has to consider the consequences of exit.
  • It possibly reads and writes the whole world. The DFG has to care about the reads (since they may observe whatever happened just before the exit) but not the writes (since they affect execution after execution exited DFG).
  • It reads some set of values, namely those passed as the stackmap.

This understanding is abstract, so the compiler will just assume the worst case (after exit every location in memory is read and written and all of the bits in all of the values in the stackmap are etched into stone).

This approach is great because it allows precise reconstruction of baseline state when compiling OSR exit and it mostly doesn’t inhibit optimization because it “only” involves adding a new kind of implicit control flow edge to the control flow graph.

This approach allows for simple reconstruction of state at exit because the backend that compiles the DFG nodes would have treated the stackmap data flow edges (things like loc3->@s in our example) the same way it would have treated all other edges. So, at the ArithAdd, the backend would know which registers, stack slots, or constant values to use to materialize the stackmap values. It would know how to do this for the same reason that it would know how to materialize the two actual add operands.

If we survey the most common optimizations that we want the compiler to do, we find that only one major optimization is severely inhibited by this approach to OSR exit. Let’s first review the optimizations this doesn’t break. It’s still possible to perform common subexpression elimination (CSE) on ArithAdd. It’s still possible to hoist it out of loops, though if we do that then we have to edit the exit metadata (the exit destination and stackmap will have to be overwritten to be whatever they are at the loop pre-header). It’s still possible to model the ArithAdd to be pure in lots of the ways that matter, like that if there are two loads, one before and one after the ArithAdd, then we can assume them to be redundant. The ArithAdd could only cause effects on the exit path, in which case the second load doesn’t matter. It’s still possible to eliminate the ArithAdd if it’s unreachable.

The only thing we cannot easily do is what compilers call dead code elimination, i.e. the elimination of instructions if their results are not used. Note that the compiler terminology is confusing here. Outside the compiler field we use the term dead code to mean something that compilers call unreachable code. Code is unreachable if control flow doesn’t reach it and so it doesn’t execute. Outside the compiler field, we would say that such code is dead. It’s important that compilers be able to eliminate unreachable code. Happily, our approach to OSR has no impact on unreachable code elimination. What compilers call dead code is code that is reached by control flow (so live in the non-compiler sense) but that produces a result that no subsequent code uses. Here’s an example of dead code in the compiler sense:

int tmp = a + b;
// nobody uses tmp.

Dead code elimination (DCE) is the part of a compiler that removes this kind of code. Dead code elimination doesn’t quite work for the ArithAdd because:

  • ArithAdd’s speculation checks must be assumed live even if the result of the add is unused. We may do some optimization to a later check because we find that it is subsumed by checks done by this ArithAdd. That’s a pretty fundamental optimization that we do for OSR checks and it’s the reason why OSR ultimately flattens control flow. But we don’t bother recording whenever this ArithAdd’s check is used to unlock a later optimization, so we have to assume that some later operation is already depending on the ArithAdd doing all of its checks. For example, say that the result of some operation A is used by a dead operation B. B will still have to do whatever checks it was doing on its inputs, which will keep A alive even though B is dead. This is particularly devastating for ArithAdd, since ArithAdd usually does an overflow check. You have to do the add to check overflow. So, ArithAdd is never really dead. Consider the alternative: if we did not consider the ArithAdd’s overflow check’s effect on abstract state, then we wouldn’t be able to do our range analysis, which uses the information inferred from overflow checks to remove array bounds checks and vice versa.
  • The ArithAdd is almost sure to end up in the stackmap of some later operation, as is basically every node in the DFG program, unless the node represents something that was dead in bytecode. Being dead in bytecode is particularly unlikely because in bytecode we must assume that everything is polymorphic and possibly effectful. Then the add is really not dead: it might be a loop with function calls, after all.

The DFG and FTL still do DCE, but it’s hard and usually only worth the effort for the most expensive constructs. We support decaying an operation just to its checks, for those rare cases where we can prove that the result is not used. We also support sinking to OSR, where an operation is replaced by a phantom version of itself that exists only to tell OSR how to perform the operation for us. We mainly use this complex feature for eliminating object allocations.

To summarize the effect on optimizations: we can still do most of the optimizations. The optimization most severely impacted is DCE, but even there, we have found ways to make it work for the most important cases.

The only real downside of this simple approach is repetition: almost every DFG operation may exit and the state at exit may easily have tens or hundreds of variables, especially if we have done significant inlining. Storing the stackmap in each DFG node would create a case of O(n²) explosion in memory usage and processing time within the compiler. Note that the fact that this explosion happens is somewhat of a JavaScript-specific problem, since JavaScript is unusual in the sheer number of speculations we have to make per operation (even simple ones like add or get_by_id). If the speculations were something we did seldom, like in Java where they are mostly used for virtual calls, then the simple approach would be fine.

Stackmap compression in DFG IR. Our solution to the size explosion of repeated stackmaps is to use a delta encoding. The stackmaps don’t change much. In our running example, the add just kills loc8 and defines loc7. The kill can be discovered by analyzing bytecode, so there’s no need to record it. All we have to record about this operation is that it defines loc7 to be the ArithAdd node.

We use an operation called MovHint as our delta encoding. It tells which bytecode variable is defined by which DFG node. For example, let’s look at the MovHint we would emit for the add in Figure 28:

c: ArithAdd(@a, @b, bc#42)
   MovHint(@c, loc7, bc#42)

We need to put some care into how we represent MovHints so that they are easy to preserve and modify. Our approach is two-fold:

  • We treat MovHint as a store effect.
  • We explicitly label the points in the IR where we expect it to be valid to exit based on the state constructed out of the MovHint deltas.

Let’s first look at how we use the idea of store effects to teach the compiler about MovHint. Imagine a hypothetical DFG IR interpreter and how it would do OSR exit. The key idea is that in that interpreter, the state of the DFG program comprises not just the mapping from DFG nodes to their values, but also an OSR exit state buffer containing values indexed by bytecode variable name. That OSR exit state buffer contains exactly the stack frame that the profiling tiers would use. MovHint’s interpreter semantics are to store the value of its operand into some slot in the OSR exit state buffer. This way, the DFG interpreter is able to always maintain an up-to-date bytecode stack frame in tandem with the optimized representation of program state. Although no such interpreter exists, we make sure that the way we compile MovHint produces something with semantics consistent with what this interpreter would have done.
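Since no such interpreter exists, the following is only a toy model of what one might look like. It assumes a three-operation IR (Constant, ArithAdd, MovHint), and every name in it is hypothetical. The important part is that the interpreter carries two pieces of state: the node values and the OSR exit state buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

// Hypothetical mini IR with just enough operations for the running example.
enum class Op { Constant, ArithAdd, MovHint };

struct Node {
    Op op;
    int64_t immediate; // Constant: the value. MovHint: the bytecode local number.
    int left;          // index of the first operand node, or -1
    int right;         // index of the second operand node, or -1
    int exitOrigin;    // bytecode index to exit to
};

struct Exit {
    int bytecodeIndex;
    std::map<int, int64_t> frame; // OSR exit state buffer: local -> value
};

// Runs the nodes in order. Returns an Exit if a speculation check fails.
std::optional<Exit> interpret(const std::vector<Node>& nodes)
{
    std::vector<int64_t> values(nodes.size(), 0); // DFG node -> value
    std::map<int, int64_t> exitState;             // bytecode local -> value
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        const Node& node = nodes[i];
        switch (node.op) {
        case Op::Constant:
            values[i] = node.immediate;
            break;
        case Op::ArithAdd: {
            int64_t sum = values[node.left] + values[node.right];
            // The Int32 overflow check fails before the add has any effect.
            if (sum < INT32_MIN || sum > INT32_MAX)
                return Exit { node.exitOrigin, exitState };
            values[i] = sum;
            break;
        }
        case Op::MovHint:
            // A store into the exit state buffer and nothing else.
            exitState[static_cast<int>(node.immediate)] = values[node.left];
            break;
        }
    }
    return std::nullopt;
}
```

When the ArithAdd overflows, the returned frame contains whatever the preceding MovHints stored, and it does not yet contain the local that the add was about to define.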

MovHint is not compiled to a store. But any phase operating on MovHints or encountering MovHints just needs to understand it as a store to some abstract location. The fact that it’s a store means that it’s not dead code. The fact that it’s a store means that it may need to be ordered with other stores or loads. Lots of desirable properties we need for soundly preserving MovHints across compiler optimizations fall out naturally from the fact that we tell all the phases that it’s just a store.

The compiler emits zero code for MovHint. Instead, we use a reaching defs analysis of MovHints combined with a bytecode liveness analysis to rebuild the stackmaps that we would have had if each node carried a stackmap. We perform this analysis in the backend and as part of any optimization that needs to know what OSR is doing. In the DFG tier, the reaching defs analysis happens lazily (when the OSR exit actually occurs, which could be long after the DFG compiled the code), which ensures that the DFG never experiences the O(n²) blow-up of stackmaps. OSR exit analysis is not magical: in the “it’s just a store” model of MovHint, this analysis reduces to load elimination.
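For straight-line code, rebuilding a stackmap from MovHint deltas amounts to replaying the hints up to the exit and then pruning by liveness. The sketch below handles only a single basic block and takes the liveness set as an input. The Node layout and the function stackmapAt are hypothetical simplifications, not the DFG's actual analysis.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Hypothetical node record. Only MovHints matter to this analysis.
struct Node {
    int id;           // DFG node number, like the c in @c
    int movHintLocal; // bytecode local defined by this MovHint, or -1 otherwise
    int movHintValue; // id of the node whose value the MovHint stores
};

// Rebuild the stackmap for an exit at nodes[exitIndex]: replay the MovHints
// that reach the exit, then keep only the locals that are live in bytecode.
std::map<int, int> stackmapAt(
    const std::map<int, int>& stackmapAtHead, // local -> node id at block start
    const std::vector<Node>& nodes,
    std::size_t exitIndex,
    const std::set<int>& liveLocals)
{
    std::map<int, int> current = stackmapAtHead;
    for (std::size_t i = 0; i < exitIndex; ++i) {
        if (nodes[i].movHintLocal >= 0)
            current[nodes[i].movHintLocal] = nodes[i].movHintValue; // last def wins
    }
    std::map<int, int> pruned;
    for (const auto& entry : current) {
        if (liveLocals.count(entry.first))
            pruned.insert(entry);
    }
    return pruned;
}
```

Because the function is only called for an exit that actually happens, the cost of materializing a full stackmap is paid per exit rather than per node.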

DFG IR’s approach to OSR means that OSR exit is possible at some points in DFG IR and not at others. Consider some examples:

  • A bytecode instruction may define multiple bytecode variables. When lowered to DFG IR, we would have two or more MovHints. It’s not possible to have an exit between those MovHints, since the OSR exit state is only partly updated at that point.
  • It’s not possible to exit after a DFG operation that does an observable effect (like storing to a JS object property) but before its corresponding MovHint. If we exit to the current exit origin, we’ll execute the effect again (which is wrong), but if we exit to the next exit origin, we’ll neglect to store the result into the right bytecode variable.

We need to make it easy for DFG transformations to know if it’s legal to insert operations that may exit at any point in the code. For example, we may want to write instrumentation that adds a check before every use of @x. If that use is a MovHint, then we need to know that it may not be OK to add that check right before that MovHint. Our approach to this is based on the observation that the lowering of a bytecode instruction produces two phases of execution in DFG IR of that instruction:

  • The speculation phase: at the start of execution of a bytecode, it’s both necessary and possible to speculate. It’s necessary to speculate since those speculations guard the optimizations that we do in the subsequent DFG nodes for that bytecode instruction. It’s possible to speculate because we haven’t done any of the instruction’s effects, so we can safely exit to the start of that bytecode instruction.
  • The effects phase: as soon as we perform any effect, we are no longer able to do any more speculations. That effect could be an actual effect (like storing to a property or making a call) or an OSR effect (like MovHint).

To help validate this, all nodes in DFG IR have an exitOK flag that they use to record whether they think that they are in the speculative phase (exitOK is true) or if they think that they might be in the effects phase (exitOK is false). It’s fine to say that exitOK is false if we’re not sure, but to say exitOK is true, we have to be completely sure. The IR validation checks that exitOK must become false after operations that do effects, that it becomes true again only at prescribed points (like a change in exit origin suggesting that we’ve ended the effects phase of one instruction and begun the speculation phase of the next one), and that no node that may exit has exitOK set to false. This validator helps prevent errors, like when dealing with bytecode operations that can be lowered to multiple effectful DFG nodes. One example is when put_by_id (i.e. something like o.f = v) is inferred to be a transition (the property f doesn’t exist on o so we need to add it), which results in two effects:

  • Storing a value v into the memory location for property o.f.
  • Changing o’s structure to indicate that it now has an f.

The DFG IR for this will look something like:

CheckStructure(@o, S1)
PutByOffset(@o, @v, f)
PutStructure(@o, S2, ExitInvalid)

Note that PutStructure will be flagged with ExitInvalid, which is the way we say that exitOK is false in IR dumps. Failing to set exitOK to false for PutStructure would cause a validation error since PutByOffset (right before it) is an effect. This prevents us from making mistakes like replacing all uses of @o with some operation that could speculate, like:

a: FooBar(@o, Exits)
   CheckStructure(@a, S1)
b: FooBar(@o, Exits)
   PutByOffset(@b, @v, f)
c: FooBar(@o, Exits)
   PutStructure(@c, S2, ExitInvalid)

In this example, we’ve used some new FooBar operation, which may exit, as a filter on @o. It may seem absurd to instrument code this way, but it is a goal of DFG IR to:

  • Allow replacing uses of nodes with uses of other nodes that produce an equivalent value. Let’s assume that FooBar is an identity that also does some checks that may exit.
  • Allow inserting new nodes anywhere.

Therefore, the only bug here is that @c is right after the PutByOffset. The validator will complain that it is not marked ExitInvalid. It should be marked ExitInvalid because the previous node (PutByOffset) has an effect. But if you add ExitInvalid to @c, then the validator will complain that a node may exit with ExitInvalid. Any phase that tries to insert such a FooBar would have all the API it needs to realize that it will run into these failures. For example, it could ask the node that it’s inserting itself in front of (the PutStructure) whether it has ExitInvalid. Since it is ExitInvalid, we could do either of these things instead of inserting @c just before the PutStructure:

  1. We could use some other node that does almost what FooBar does but without the exit.
  2. We could insert @c earlier, so it can still exit.

Let’s look at what the second option would look like:

a: FooBar(@o, Exits)
   CheckStructure(@a, S1)
b: FooBar(@o, Exits)
c: FooBar(@o, Exits)
   PutByOffset(@b, @v, f)
   PutStructure(@c, S2, ExitInvalid)

Usually this is all it takes to deal with regions of code with !exitOK.
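
The exitOK rules above can be mimicked with a toy checker. This is a minimal Python sketch, not JavaScriptCore's validator: the Node fields, the new_origin flag, and the transition helper are names made up for illustration, and it only handles a single straight-line sequence of nodes.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    may_exit: bool = False    # performs a check that can OSR exit
    has_effect: bool = False  # writes state that bytecode can observe
    exit_ok: bool = True      # False is what IR dumps show as ExitInvalid
    new_origin: bool = False  # hypothetical flag: first node of a new exit origin

def validate(nodes):
    """Return validation errors for one straight-line sequence of nodes."""
    errors = []
    in_speculation_phase = True
    for node in nodes:
        if node.new_origin:
            in_speculation_phase = True  # a new exit origin restarts speculation
        if node.exit_ok and not in_speculation_phase:
            errors.append(f"{node.name}: exitOK is true after an effect")
        if node.may_exit and not node.exit_ok:
            errors.append(f"{node.name}: may exit but is ExitInvalid")
        if node.has_effect:
            in_speculation_phase = False  # everything after this is effects phase
    return errors

def transition(c_position, c_exit_ok=True):
    """Build the put_by_id transition with @c placed 'early' or 'late'."""
    a = Node("a: FooBar", may_exit=True, new_origin=True)
    check = Node("CheckStructure", may_exit=True)
    b = Node("b: FooBar", may_exit=True)
    c = Node("c: FooBar", may_exit=True, exit_ok=c_exit_ok)
    put_offset = Node("PutByOffset", has_effect=True)
    put_structure = Node("PutStructure", has_effect=True, exit_ok=False)
    if c_position == "early":
        return [a, check, b, c, put_offset, put_structure]
    return [a, check, b, put_offset, c, put_structure]

late_errors = validate(transition("late"))
late_invalid_errors = validate(transition("late", c_exit_ok=False))
early_errors = validate(transition("early"))
```

Placing @c after the PutByOffset produces one complaint or the other depending on how @c is flagged, while hoisting it above the PutByOffset produces none.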

Note that in cases where something like FooBar absolutely needs to do a check after an effect, DFG IR does support exiting into the middle of a bytecode instruction. In some cases, we have no choice but to use that feature. This involves introducing extra non-bytecode state that can be passed down OSR exit, issuing OSR exit state updates before/after effects, using an exit origin that indicates that we’re exiting to some checkpoint in the middle of a bytecode instruction’s execution, and implementing a way to execute a bytecode starting at a checkpoint during OSR exit. It’s definitely possible, but not the sort of thing we want to have to do every time that some DFG node needs to do an effect. For this reason, canonical DFG IR use implies having !exitOK phases (aka effect phases) during some bytecode instructions’ execution.

Watchpoints and Invalidation. So far we have considered OSR exit for checks that the compiler emits. But the DFG compiler is also allowed to speculate by setting watchpoints in the JavaScript heap. If it finds something desirable — like that Math.sqrt points to the sqrt intrinsic function — it can often incorporate it into optimization without emitting checks. All that is needed is for the compiler to set a watchpoint on what it wants to prove (that the Math and sqrt won’t change). When the watchpoint fires, we want to invalidate the compiled code. That means making it so that the code never runs again:

  • no new calls to that function go to the optimized version and
  • all returns into that optimized function are redirected to go to baseline code instead.

Ensuring that new calls avoid optimized code is easy: we just patch all calls to the function to call the profiled code (Baseline, if available, or LLInt) instead. Handling returns is the interesting part.
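
To make the watchpoint idea concrete, here is a small Python sketch of the "new calls" half only. It is an illustration under assumptions, not how JavaScriptCore is structured: WatchedSlot, OptimizedCode, and call are hypothetical names, and redirecting returns is not modeled at all.

```python
import math

class WatchedSlot:
    """A heap location that compiled code can set watchpoints on."""
    def __init__(self, value):
        self.value = value
        self._watchpoints = []

    def watch(self, callback):
        self._watchpoints.append(callback)

    def write(self, value):
        self.value = value
        # Fire every watchpoint once, then forget them.
        fired, self._watchpoints = self._watchpoints, []
        for callback in fired:
            callback()

class OptimizedCode:
    def __init__(self, body):
        self.body = body
        self.valid = True

    def invalidate(self):
        self.valid = False

def call(optimized, baseline, *args):
    """New calls go to optimized code only while it is still valid."""
    target = optimized.body if optimized.valid else baseline
    return target(*args)

math_sqrt = WatchedSlot(math.sqrt)  # stands in for the Math.sqrt property

def baseline_root(x):
    return math_sqrt.value(x)  # looks the property up on every call

optimized_root = OptimizedCode(math.sqrt)  # intrinsic baked in, no check emitted
math_sqrt.watch(optimized_root.invalidate)

before = call(optimized_root, baseline_root, 9.0)
math_sqrt.write(lambda x: -1.0)  # someone replaces Math.sqrt
after = call(optimized_root, baseline_root, 9.0)
```

The optimized body never checks that sqrt is still the intrinsic. Correctness comes entirely from the write path firing the watchpoint.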

One approach to handling invalidation is to walk the stack to find all returns to the invalidated code, and repoint those returns to an OSR exit. This would be troublesome for us due to our use of effects phases: it’s possible for multiple effects to happen in a row in a phase of DFG IR execution where it is not possible to exit. So, the DFG approach to invalidation involves letting the remaining effects of the current bytecode instruction finish executing in optimized code and then triggering an OSR exit right before the start of the next bytecode instruction.

Figure 30. How OSR exit and invalidation might work for hypothetical bytecodes.

Invalidation in DFG IR is enabled by the InvalidationPoint instruction, which is automatically inserted by the DFG frontend at the start of every exit origin that is preceded by effects that could cause a watchpoint to fire. InvalidationPoint is modeled as if it was a conditional OSR exit, and is given an OSR exit jump label as if there was a branch to link it to. But, InvalidationPoint emits no code. Instead, it records the location in the machine code where the InvalidationPoint would have been emitted. When a function is invalidated, all of those labels are overwritten with unconditional jumps to the OSR exit.

Figure 30 shows how OSR exit concepts like speculation and effect phases combine with InvalidationPoint for three hypothetical bytecode instructions. We make up intentionally absurd instructions because we want to show the range of possibilities. Let’s consider wat in detail. The first DFG IR node for wat is an InvalidationPoint, automatically inserted because the previous bytecode (foo) had an effect. Then wat does a CheckArray, which may exit but has no effects. So, the next DFG node, Wat, is still in the speculation phase. Wat is in a sort of perfect position in DFG IR: it is allowed to perform speculations and effects. It can perform speculations because no previous node for wat’s exit origin has performed effects. It can also perform effects, but then the nodes after it (Stuff and Derp) cannot speculate anymore. But, they can perform more effects. Since wat has effects, an InvalidationPoint is immediately inserted at the start of the next bytecode (bar). Note that in this example, Foo, Wat, and StartBar are all in the perfect position (they can exit and have effects). Since Stuff, Derp, and FinishBar are in the effects region, the compiler will assert if they try to speculate.

Note that InvalidationPoint makes code layout tricky. On x86, the unconditional jump used by invalidation is five bytes. So, we must ensure that there are no other jump labels in the five bytes after an invalidation label. Otherwise, it would be possible for invalidation to cause one of those labels to point into the middle of a 5-byte invalidation jump. We solve this by adding nop padding to create at least a 5-byte gap between a label used for invalidation and any other kind of label.
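
The padding rule can be sketched with a toy assembler buffer. This is an illustrative Python model, not the JavaScriptCore assembler: ToyAssembler and its methods are hypothetical, and the only x86 facts it relies on are that 0x90 is a one-byte nop and 0xE9 is the opcode of the five-byte jmp rel32, whose displacement is relative to the end of the jump.

```python
JUMP_SIZE = 5  # size of the x86 unconditional jump, as stated in the text
NOP = 0x90     # single-byte x86 nop
JMP_REL32 = 0xE9

class ToyAssembler:
    def __init__(self):
        self.code = bytearray()
        self.invalidation_labels = []
        self._last_invalidation = None

    def emit(self, data):
        self.code += data

    def _pad(self):
        # Keep at least JUMP_SIZE bytes between the last invalidation
        # label and whatever label is about to be placed.
        if self._last_invalidation is not None:
            while len(self.code) - self._last_invalidation < JUMP_SIZE:
                self.code.append(NOP)

    def invalidation_label(self):
        self._pad()
        self._last_invalidation = len(self.code)
        self.invalidation_labels.append(len(self.code))

    def label(self):
        self._pad()
        return len(self.code)

    def invalidate(self, exit_target):
        """Overwrite every invalidation label with a jump to exit_target."""
        self._pad()  # make sure the final label has room for a full jump
        for at in self.invalidation_labels:
            rel = exit_target - (at + JUMP_SIZE)
            jump = bytes([JMP_REL32]) + rel.to_bytes(4, "little", signed=True)
            self.code[at:at + JUMP_SIZE] = jump

asm = ToyAssembler()
asm.emit(b"\x01\x02")      # two bytes of ordinary code
asm.invalidation_label()   # recorded at offset 2, emits nothing
asm.emit(b"\x03")          # only one byte before the next label...
other = asm.label()        # ...so nops are added before placing it
asm.emit(b"\xC3")
asm.invalidate(exit_target=100)
```

Because the other label was pushed out to offset 7, the jump written at offsets 2 through 6 cannot swallow it.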

To summarize, DFG IR has extensive support for OSR exit. We have a compact delta encoding of changes to OSR exit state. Exit destinations are encoded as an exit origin field in every DFG node. OSR exit due to invalidation is handled by automatic InvalidationPoint insertion.

Static Analysis

The DFG uses lots of static analysis to complement how it does speculation. This section covers three static analyses in the DFG that have particularly high impact:

  • We use prediction propagation to fill in predicted types for all values based on value profiling of some values. This helps us figure out where to speculate on type.
  • We use the abstract interpreter (or just AI for short in JavaScriptCore jargon) to find redundant OSR speculations. This helps us emit fewer OSR checks. Both the DFG and FTL include multiple optimization passes in their pipelines that can find and remove redundant checks but the abstract interpreter is the most powerful one. The abstract interpreter is the DFG tier’s primary optimization and it is reused with small enhancements in the FTL.
  • We use clobberize to get aliasing information about DFG operations. Given a DFG instruction, clobberize can describe the aliasing properties. In almost all cases that description is O(1) in time and space. That description implicitly describes a rich dependency graph.

Both the prediction propagator and the abstract interpreter work by forward-propagating type information. They’re both built on the principles of abstract interpretation. It’s useful to understand at least some of that theory, so let’s do a tiny review. Abstract interpreters are like normal interpreters, except that they describe program state abstractly rather than considering exact values. A classic example due to Kildall involves just remembering which variables have known constant values and forgetting any variable that may have more than one value. Abstract interpreters are run to fixpoint: we keep executing every instruction until we no longer observe any changes. We can execute forward (like Kildall) or backward (like liveness analysis). We can either have sets that shrink as we learn new things (like Kildall, where variables get removed if we learn that they may have more than one value) or we can have sets that grow (like liveness analysis, where we keep adding variables to the live set).
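
As a refresher, a Kildall-style constant propagation fits in a few lines. The sketch below is a generic textbook-flavored Python toy, unrelated to the DFG's actual data structures: the instruction tuples, block names, and helper functions are all invented for the example.

```python
def meet(a, b):
    """Merge two states; None means 'block not reached yet'."""
    if a is None:
        return b
    if b is None:
        return a
    # Keep only variables that both paths agree are the same constant.
    return {v: c for v, c in a.items() if v in b and b[v] == c}

def transfer(state, block):
    """Abstractly execute one basic block."""
    state = dict(state)
    for dst, op, *args in block:
        if op == "const":
            state[dst] = args[0]
        elif op == "add" and all(arg in state for arg in args):
            state[dst] = state[args[0]] + state[args[1]]
        else:
            state.pop(dst, None)  # result is not a known constant
    return state

def analyze(blocks, successors, entry):
    """Run forward to fixpoint; returns the state at the head of each block."""
    in_state = {name: None for name in blocks}
    in_state[entry] = {}
    worklist = [entry]
    while worklist:
        name = worklist.pop()
        out = transfer(in_state[name], blocks[name])
        for succ in successors.get(name, []):
            merged = meet(in_state[succ], out)
            if merged != in_state[succ]:
                in_state[succ] = merged
                worklist.append(succ)
    return in_state

blocks = {
    "entry": [("x", "const", 1), ("i", "const", 0)],
    "loop": [("i", "add", "i", "x")],
    "exit": [],
}
successors = {"entry": ["loop"], "loop": ["loop", "exit"]}
result = analyze(blocks, successors, "entry")
```

At the head of the loop, x survives as the constant 1 while i is dropped, because the entry edge and the back edge disagree about its value. That is the shrinking-sets behavior described above.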

Now let’s go into more details about the two abstract interpreters and the alias analysis.

Prediction propagation. The prediction propagator’s abstract state comprises variable to speculated type (Figure 13) mappings. The speculated type is a set of fundamental types. The sets tell which types a value is predicted to have. The prediction propagator is not flow sensitive; it has one copy of the abstract state for all program statements. So, each execution of a statement considers the whole set of input types (even from program statements that can’t reach us) and joins the result with the speculated type of the result variable. Note that the input to the prediction propagator is a data flow IR, so multiple assignments to the same variable aren’t necessarily joined.
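
A flow-insensitive propagator with growing sets might look like the following Python sketch. Everything specific here is an assumption for illustration: the three type bits, the node tuples, and especially the predict_add rule are made up and are far simpler than the DFG's real prediction rules.

```python
INT32, DOUBLE, STRING = 1, 2, 4  # hypothetical speculated-type bits

def predict_add(left, right):
    """Made-up rule for predicting the result type of an add."""
    if not left or not right:
        return 0  # no information about an input yet
    result = 0
    if (left | right) & STRING:
        result |= STRING
    if left & INT32 and right & INT32:
        result |= INT32
    if (left | right) & DOUBLE:
        result |= DOUBLE
    return result

def propagate(nodes, profiled):
    """nodes: list of (result, op, inputs). One shared map, no per-block state."""
    prediction = dict(profiled)
    changed = True
    while changed:
        changed = False
        for dst, op, inputs in nodes:
            ins = [prediction.get(name, 0) for name in inputs]
            if op == "add":
                result = predict_add(ins[0], ins[1])
            else:  # "phi": union of everything flowing in
                result = 0
                for mask in ins:
                    result |= mask
            merged = prediction.get(dst, 0) | result  # sets only grow
            if merged != prediction.get(dst, 0):
                prediction[dst] = merged
                changed = True
    return prediction

# i = phi(zero, next); next = i + step, with profiling for zero and step only.
nodes = [("i", "phi", ["zero", "next"]), ("next", "add", ["i", "step"])]
pred = propagate(nodes, {"zero": INT32, "step": DOUBLE})
```

Only zero and step have profiled types, yet the fixpoint fills in predictions for i and next as well.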

The prediction propagator doesn’t have to be sound. The worst case outcome of the prediction propagator being wrong is that we either:

  • do speculations that are too strong, and so we exit too much and then recompile.
  • do speculations that are too weak, so we run slower than we could forever.

Note that the second of those outcomes is generally worse. Recompiling and then speculating less at least means that the program eventually runs with the optimal set of speculations. Speculating too weakly and never recompiling means that we never get to optimal. Therefore, the prediction propagator is engineered to sometimes be unsound instead of conservative, since unsoundness can be less harmful.

The abstract interpreter. The DFG AI is the DFG tier’s most significant optimization. While there are many abstract interpreters throughout JavaScriptCore, this one is the biggest in terms of total code and the number of clients — hence to us it is the abstract interpreter.

The DFG AI’s abstract state comprises variable to abstract value mappings where each abstract value represents a set of possible JSValues that the variable could have. Those sets describe what type information we have proved from past checks. We join abstract states at control flow merge points. The solution after the fixpoint is a minimal solution (smallest possible sets that have a fixpoint). The DFG AI is flow-sensitive: it maintains a separate abstract state per instruction boundary. AI looks at the whole control flow graph at once but does not look outside the currently compiled function and whatever we inlined into it. AI is also sparse conditional.

The DFG abstract value representation has four sub-values:

  • Whether the value is known to be a constant, and if so, what that constant is.
  • The set of possible types (i.e. a SpeculatedType bitmap, shown in Figure 13).
  • The set of possible indexing types (also known as array modes) that the object pointed to by this value can have.
  • The set of possible structures that the object pointed to by this value can have. This set has special infinite set powers.

The last two sub-values can be mutated by effects. DFG AI assumes that all objects have escaped, so if an effect happens that can change indexing types and structures, then we have to clobber those parts of all live abstract values.

We interpret the four sub-values as follows: the abstract value represents the set of JSValues that reside in the intersection of the four sub-value sets. This means that when interpreting abstract values, we have the option of just looking at whichever sub-value is interesting to us. For example, an optimization that removes structure checks only needs to look at the structure set field.

Figure 31. Examples of check elimination with abstract interpretation.

The DFG AI gives us constant and type propagation simultaneously. The type propagation is used to remove checks, simplify checks, and replace dynamic operations with faster versions.

Figure 31 shows examples of checks that the DFG AI lets us remove. Note that in addition to eliminating obvious same-basic-block check redundancies (Figure 31(a)), AI lets us remove redundancies that span multiple blocks (like Figure 31(b) and (c)). For example, in Figure 31(c), the AI is able to prove that @x is an Int32 at the top of basic block #7 because it merges the Int32 states of @x from BB#5 and #6. Check elimination is usually performed by mutating the IR so that later phases know which checks are really necessary without having to ask the AI.

The DFG AI has many clients, including the DFG backend and the FTL-to-B3 lowering. Being an AI client means having access to its estimate of the set of JSValues that any variable or DFG node can have at any program point. The backends use this to simplify checks that were not removed. For example, the backend may see an Object-or-Undefined check, ask AI about it, and find that AI already proved that we must have either an object or a string. The backend will be able to combine those two pieces of information to only emit an is-object check and ignore the possibility of the value being undefined.
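
The Object-or-Undefined example boils down to set intersection on type bitmaps. A minimal Python sketch, assuming made-up bit names (the real SpeculatedType bits differ) and a hypothetical simplify_check helper:

```python
OBJECT, STRING, UNDEFINED, INT32 = 1, 2, 4, 8  # hypothetical type bits

def simplify_check(accepted, proven):
    """Return the type mask the backend still has to test for.

    accepted: types the check lets through.
    proven:   types the abstract interpreter says the value can have.
    Returns None when the check is redundant, and 0 when it can never pass.
    """
    if proven & ~accepted == 0:
        return None  # every possible value already passes
    return accepted & proven

residual = simplify_check(OBJECT | UNDEFINED, OBJECT | STRING)
redundant = simplify_check(OBJECT | UNDEFINED, OBJECT)
impossible = simplify_check(OBJECT | UNDEFINED, INT32)
```

Here residual is just the OBJECT bit, matching the is-object check in the paragraph above.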

Type propagation also allows us to replace dynamic heap accesses with inlined ones. Most fast property accesses in DFG IR arise from inline cache feedback telling us that we should speculate, but sometimes the AI is able to prove something stronger than the profiler told us. This is especially likely in inlined code.

Clobberize. Clobberize is the alias analysis that the DFG uses to describe what parts of the program’s state an instruction could read and write. This allows us to see additional dependency edges between instructions beyond just the ones expressed as data flow. Dependency information tells the compiler what kinds of instruction reorderings are legal. Clobberize has many clients in both the DFG and FTL. In the DFG, it’s used for common subexpression elimination, for example.

To understand clobberize, it’s worth considering what it is about a program’s control flow that a compiler needs to remember. The control flow graph shows us one possible ordering of the program and we know that this ordering is legal. But both the DFG and FTL tiers want to move code around. The DFG tier mostly only moves code around within basic blocks rather than between them while the FTL tier can also move code between basic blocks. Even with the DFG’s block-local code motion, it’s necessary to know more than just the current ordering of the program. It’s also necessary to know how that ordering can be changed.

Some of this is already solved by the data flow graph. DFG IR provides a data flow graph that shows some of the dependencies between instructions. It’s obvious that if one instruction has a data flow edge to another, then only one possible ordering (source executes before sink) is valid. But what about:

  • Stores to memory.
  • Loads from memory.
  • Calls that can cause any effects.
  • OSR effects (like MovHint).

Data flow edges don’t talk about those dependencies. Data flow also cannot tell which instructions have effects at all. So, the data flow graph cannot tell us anything about the valid ordering of instructions if those instructions have effects.

The issue of how to handle dependencies that arise from effects is particularly relevant to JavaScript compilation — and speculative compilation in general — because of the precision about aliasing that speculation gives us. For example, although the JavaScript o.f operation could have any effect, after speculation we often know that it can only affect properties named f. Additionally, JavaScript causes us to have to emit lots of loads to fields that are internal to our object model and it’s good to know exactly when those loads are redundant so that we can remove as many of them as possible. So, we need to have the power to ask, for any operation that may access internal VM state, whether that state could be modified by any other operation, and we want that answer to be as precise as it can while being O(1)-ish.

Clobberize is a static analysis that augments the data flow and control flow graphs by telling us constraints on how instructions can be reordered. The neat thing about clobberize is that it avoids storing dependency information in the instructions themselves. So, while the compiler is free to query dependency information anytime it likes by running the analysis, it doesn’t have to do anything to maintain it.

Figure 32. Some of the abstract heap hierarchy. All heaps are subsets of World, which is subdivided into Heap, Stack and SideState. For example, JS function calls say that they write(Heap) and read(World). Subheaps of Heap include things like JSObject_butterfly, which refer to fields that are internal to the JSC object model and are not directly user-visible, and things like NamedProperties, a heap that contains subheaps for every named property the function accesses.

For each DFG instruction, clobberize reports zero or more reads or writes. Each read or write says which abstract heaps it is accessing. Abstract heaps are sets of memory locations. A read (or write) of an abstract heap means that the program will read (or write) from zero or more actual locations in that abstract heap. Abstract heaps form a hierarchy with World at the top (Figure 32). A write to World means that the effect could write to anything, so any read might see that write. The hierarchy can get very specific. For example, fully inferred, direct forms of property access like GetByOffset and PutByOffset report that they read and write (respectively) an abstract heap that names the property. So, accesses to properties of different names are known not to alias. The heaps are known to alias if either one is a descendant of the other.
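
The aliasing rule is easy to state in code. Below is a small Python sketch using the heap names from the Figure 32 caption. The two per-property entries and the function names are assumptions for illustration, and the real implementation does not store the hierarchy as a dictionary of strings.

```python
# child -> parent; a tiny slice of the hierarchy described in Figure 32.
PARENT = {
    "World": None,
    "Heap": "World",
    "Stack": "World",
    "SideState": "World",
    "JSObject_butterfly": "Heap",
    "NamedProperties": "Heap",
    "NamedProperties(f)": "NamedProperties",  # hypothetical notation
    "NamedProperties(g)": "NamedProperties",  # hypothetical notation
}

def ancestors_and_self(heap):
    chain = []
    while heap is not None:
        chain.append(heap)
        heap = PARENT[heap]
    return chain

def overlaps(a, b):
    """Two abstract heaps alias if one is the other or a descendant of it."""
    return a in ancestors_and_self(b) or b in ancestors_and_self(a)

same_name = overlaps("NamedProperties(f)", "NamedProperties(f)")
different_names = overlaps("NamedProperties(f)", "NamedProperties(g)")
call_vs_property = overlaps("Heap", "NamedProperties(f)")
stack_vs_heap = overlaps("Stack", "Heap")
```

Walking parent links costs time proportional to the depth of the hierarchy, which is small, so the query stays cheap.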

It’s worth appreciating how clobberize combined with control flow is just a way of encoding a dependence graph. To build a dependence graph from clobberize information, we apply the following rule. If instruction B appears after instruction A in control flow, then we treat B as having a dependence edge to A (B depends on A) if:

  • any heap read by B overlaps any heap written by A, or
  • any heap written by B overlaps any heap read or written by A.
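
Applied to a straight-line sequence, the rule can be sketched as follows. This Python toy assumes a hypothetical four-heap hierarchy and invented instruction tuples. "After in control flow" is simplified to "later in the list".

```python
# Hypothetical slice of the abstract heap hierarchy: child -> parent.
PARENT = {"World": None, "Heap": "World", "NamedProperties": "Heap",
          "f": "NamedProperties", "g": "NamedProperties"}

def chain(heap):
    out = []
    while heap is not None:
        out.append(heap)
        heap = PARENT[heap]
    return out

def overlaps(a, b):
    return a in chain(b) or b in chain(a)

def any_overlap(xs, ys):
    return any(overlaps(x, y) for x in xs for y in ys)

def dependence_edges(instructions):
    """instructions: (name, reads, writes) tuples in execution order.
    Returns (B, A) pairs meaning that B depends on A."""
    edges = []
    for index, (b_name, b_reads, b_writes) in enumerate(instructions):
        for a_name, a_reads, a_writes in instructions[:index]:
            if (any_overlap(b_reads, a_writes)
                    or any_overlap(b_writes, a_reads + a_writes)):
                edges.append((b_name, a_name))
    return edges

program = [
    ("GetByOffset f #1", ["f"], []),
    ("PutByOffset g", [], ["g"]),
    ("Call", ["World"], ["Heap"]),  # calls read(World) and write(Heap)
    ("GetByOffset f #2", ["f"], []),
]
edges = dependence_edges(program)
```

The two loads of f are both ordered against the Call, but neither depends on the store to g, so only the Call stands in the way of treating the second load as redundant.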

Conversely, any dependence graph can be expressed using clobberize. An absurd but correct representation would involve giving each edge in the dependence graph its own abstract heap and having the source of the edge write the heap while the sink reads it. But what makes clobberize such an efficient representation of dependence graphs is that every dependence that we’ve tried to represent can be intuitively described by reads and writes to a small collection of abstract heaps.

Those abstract heaps are either collections of concrete memory locations (for example the "foo" abstract heap is the set of memory locations used to represent the values of properties named “foo”) or they are metaphorical. Let’s explore some metaphorical uses of abstract heaps:

  • MovHint wants to say that it is not dead code, that it must be ordered with other MovHints, and that it must be ordered with any real program effects. We say this in clobberize by having MovHint write SideState. SideState is a subheap of World but disjoint from other things, and we have any operation that wants to be ordered with OSR exit state either read or write something that overlaps SideState. Note that DFG assumes that operations that may exit implicitly read(World) even if clobberize doesn’t say this, so MovHint’s write of SideState ensures ordering with exits.
  • NewObject wants to say that it’s not valid to hoist it out of loops because two successive executions of NewObject may produce different results. But it’s not like NewObject clobbers the world; for example if we had two accesses to the same property on either side of a NewObject then we’d want the second one to be eliminated. DFG IR has many NewObject-like operations that also have this behavior. So, we introduce a new abstract heap called HeapObjectCount and we say that NewObject is metaphorically incrementing (reading and writing) the HeapObjectCount. HeapObjectCount is treated as a subheap of Heap but it’s disjoint from the subheaps that describe any state visible from JS. This is sufficient to block hoisting of NewObject while still allowing interesting optimizations to happen around it.

Figure 33. Sample sequence of DFG IR instructions and their dependence graph. DFG IR never stores the dependence graph in memory because we get the information implicitly by running clobberize.

The combination of clobberize and the control flow graph gives a scalable and intuitive way of expressing the dependence graph. It’s scalable because we don’t actually have to express any of the edges. Consider for example a dynamic access instruction that could read any named JavaScript property, like the Call instruction in Figure 33. Clobberize can say this in O(1) space and time. But a dependence graph would have to create an edge from that instruction to any instruction that accesses any named property before or after it. In short, clobberize gives us the benefit of a dependence graph without the cost of allocating memory to represent the edges.

The abstract heaps can also be efficiently collected into a set, which we use to summarize the aliasing effects of basic blocks and loops.

To summarize, the DFG puts a big emphasis on static analysis. Speculation decisions are made using a combination of profiling and an abstract interpreter called prediction propagation. Additionally, we have an abstract interpreter for optimization, simply called the DFG abstract interpreter, which serves as the main engine for redundant check removal. Abstract interpreters are a natural fit for the DFG because they give us a way to forward-propagate information about types. Finally, the DFG uses the clobberize analysis to describe dependencies and aliasing.

Fast Compilation

The DFG is engineered to compile quickly so that the benefits of OSR speculations can be realized quickly. To help reduce compile times, the DFG is selective about which optimizations it does and how it does them. The static analysis and OSR exit optimizations discussed so far represent the most powerful things that the DFG is capable of. The DFG does a quick and dirty job with everything else, like instruction selection, register allocation, and removal of redundant code that isn’t checks. Functions that benefit from the compiler doing a good job on those optimizations will get them if they run long enough to tier up into the FTL.

The DFG’s focus on fast compilation happened organically, as a result of many separate throughput-latency trade-offs. Initially, JavaScriptCore just had the Baseline JIT and then later Baseline as the profiling tier and DFG as the optimizing tier. The DFG experienced significant evolution during this time, and then experienced additional evolution after the FTL was introduced. While no single decision led to the DFG’s current design, we believe that it was most significantly shaped by tuning for short-running benchmarks and the introduction of the FTL.

The DFG was tuned for a diverse set of workloads. On the one hand, it was tuned for long-running tests in which one full second of warm-up was given to the speculative compiler for free (like the old V8 benchmarks, which live on in the form of Octane and JetStream, albeit without the freebie warmup), but on the other hand, it was also tuned for shorter-running benchmarks like SunSpider and page load tests. SunSpider focused on smallish programs running for very short bursts of time with little opportunity for warm-up. Compilers that do more optimizations than the DFG tend to lose to it on SunSpider because they fail to complete their optimizations before SunSpider finishes running. We continue to use tests that are in the spirit of SunSpider, like Speedometer and JetStream. Speedometer has a similar code-size-to-running-time ratio, so like SunSpider, it benefits a lot from DFG. JetStream includes a subset of SunSpider and puts a big emphasis on short-running code in all of its other tests. That’s not to say that we don’t also care about long-running code. It’s just that our methodology for improving the DFG was to try to get speed-ups on both short-running things and long-running things with the same engine. Since any long-running optimization would regress the short-running tests, we often avoided adding any long-running optimizations to the DFG. But we did add cheap versions of many sophisticated optimizations, giving respectable speed-ups on both short-running and long-running workloads.

The introduction of the FTL solidified the DFG’s position as the compiler that optimizes less. So long as the DFG generates reasonably good code quickly, we can get away with putting lots of expensive optimizations into the FTL. The FTL’s long compile times mean that many programs do not run long enough to benefit from the FTL. So, the DFG is there to give those programs a speculative optimization boost in way less time than an FTL-like compiler could do. Imagine a VM that only had one optimizing compiler. Unless that one compiler compiled as fast as the DFG and generated code that was as good as the FTL, it would end up being reliably slower than JavaScriptCore on some workloads. If that compiler compiled as fast as the DFG but didn’t have the FTL’s throughput then any program that ran long enough would run faster in JavaScriptCore. If that compiler generated code that was as good as the FTL but compiled slower than the DFG then any program that ran short enough to tier up into the DFG but not that compiler would run faster in JavaScriptCore. JavaScriptCore has multiple compiler tiers because we believe that it is not possible to build a compiler that compiles as fast as the DFG while generating code that is as good as the FTL.

To summarize, the DFG focuses on fast compilation because of the combination of the history of how it was tuned and the fact that it sits as the tier below the FTL JIT.

Figure 34. Illustration of a sample DFG IR program with all three graphs: local data flow, global data flow, and control flow.

The DFG compiler’s speed comes down to an emphasis on block-locality in the IR. The DFG IR used by the DFG tier has a two-level data flow graph:

  • Local data flow graph. The local data flow graph is used within basic blocks. This graph is a first-class citizen in the IR; when working with data flow in the DFG’s C++ code, it sometimes seems like this is the only data flow graph. DFG IR inside a basic block resembles SSA form in the sense that there’s a 1:1 mapping between instructions and the variables they assign and data flow is represented by having users of values point at the instructions (nodes) that produce those values. This representation does not allow you to use a value produced by an instruction in a different block except through tedious escape hatches.
  • Global data flow graph. We say global to mean the entire compilation unit, so some JS function and whatever the DFG inlined into it. So, global just means spanning basic blocks. DFG IR maintains a secondary data flow graph that spans basic blocks. DFG IR’s approach to global data flow is based on spilling: to pass a value to a successor block, you store it to a spill slot on the stack, and then that block loads it. But in DFG IR, we also thread data flow relationships through those loads and stores. This means that if you are willing to perform the tedious task of traversing this secondary data flow graph, you can get a global view of data flow.

Figure 34 shows an example of how this works. The compilation unit is represented as three graphs: a control flow graph, local data flow, and global data flow. Data flow graphs are represented with edges going from the user to the value being used. The local data flow graphs work like they do in SSA, so any SSA optimization can be run in a block-local manner on this IR. The global data flow graph is made of SetLocal/GetLocal nodes that store/load values into the stack. The data flow between SetLocal and GetLocal is represented completely in DFG IR, by threading data flow edges through special Phi nodes in each basic block where a local is live.
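
To get a feel for what traversing the global data flow graph involves, here is a simplified Python sketch. It assumes a shape where a GetLocal points at a Phi and Phis point at SetLocals or other Phis. The Node class, the stored_values helper, and the collapsed self-edge for the loop are all illustrative inventions.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # identity-based hashing so nodes can live in sets
class Node:
    op: str
    name: str = ""
    children: list = field(default_factory=list)

def stored_values(node, seen=None):
    """Names of the values that can flow into a GetLocal or Phi."""
    seen = set() if seen is None else seen
    if node in seen:
        return set()  # already visited, e.g. via a loop back edge
    seen.add(node)
    if node.op == "SetLocal":
        return {node.children[0].name}
    result = set()
    for child in node.children:
        result |= stored_values(child, seen)
    return result

# Two predecessor blocks each store a different value into local x.
a = Node("JSConstant", "a")
b = Node("JSConstant", "b")
set_in_1 = Node("SetLocal", "x", [a])
set_in_2 = Node("SetLocal", "x", [b])
# The successor is a loop header: its Phi sees both stores, plus itself
# through a back edge (collapsed here to a self-edge).
phi = Node("Phi", "x", [set_in_1, set_in_2])
phi.children.append(phi)
get = Node("GetLocal", "x", [phi])
sources = stored_values(get)
```

A static analysis can merge the abstract values of a and b at the GetLocal this way, even though neither node is directly usable from that block in the local graph.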

From the standpoint of writing outstanding high-throughput optimizations, this approach to IR design is like kneecapping the compiler. Compilers thrive on having actual SSA form, where there is a single data flow graph, and you don’t have to think about an instruction’s position in control flow when traversing data flow. The emphasis on locality is all about excellent compile times. We believe that locality gives us compile time improvements that we can’t get any other way:

  • Instruction selection and register allocation for a basic block can be implemented as a single pass over that basic block. The instruction selector can make impromptu register allocation decisions during that pass, like deciding that it needs any number of scratch registers to emit code for some DFG node. The combined instruction selector and register allocator (aka the DFG backend) compiles basic blocks independently of one another. This kind of code generation is good at register allocating large basic blocks but bad for small ones. For functions that only have a single basic block, the DFG often generates code that is as good as the FTL.
  • We never have to decompress the delta encoding of OSR exit. We just have the backend record a log of its register allocation decisions (the variable event stream). While the DFG IR for a function is thrown out after compilation, this log along with a minified version of the DFG IR (that only includes MovHints and the things they reference) is saved so that we can replay what the backend did whenever an OSR exit happens. This makes OSR exit handling super cheap in the DFG – we totally avoid the O(n²) complexity explosion of OSR stackmaps despite the fact that we speculate like crazy.
  • There is no need to enter or exit SSA. On the one hand, SSA conversion performance is a solved problem: it’s a nearly-linear-time operation. Even so, the constant factors are high enough that avoiding it entirely is profitable. Converting out of SSA is worse. If we wanted to combine SSA with our block-local backend, we’d have to add some sort of transformation that discovers how to load/store live state across basic blocks. DFG IR plays tricks where the same store that passes data flow to another block doubles as the OSR exit state update. It’s not obvious that exiting out of SSA would discover all of the cases where the same store can be reused for both OSR exit state update and the data flow edge. This suggests that any version of exiting out of SSA would make the DFG compiler either generate worse code or run slower. So, not having SSA makes the compiler run faster because entering SSA is not free and exiting SSA is awful.
  • Every optimization is faster if it is block-local. Of course, you could write block-local optimizations in an SSA IR. But having an IR that emphasizes locality is like a way to statically guarantee that we won’t accidentally introduce expensive compiler passes to the DFG.
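
The replay idea behind the variable event stream in the list above can be sketched roughly like this. The event kinds, tuple layout, and replay function are hypothetical and much simpler than whatever the real log records. The point is only that an append-only log is enough to reconstruct where values live at any exit.

```python
def replay(events, exit_index):
    """Replay the log up to an exit and report where each bytecode local lives."""
    location = {}  # DFG node -> ("reg", name) or ("stack", slot)
    source = {}    # bytecode local -> DFG node that currently represents it
    for kind, *args in events[:exit_index]:
        if kind == "fill":
            node, register = args
            location[node] = ("reg", register)
        elif kind == "spill":
            node, slot = args
            location[node] = ("stack", slot)
        elif kind == "death":
            location.pop(args[0], None)
        elif kind == "movhint":
            node, local = args
            source[local] = node
    return {local: location.get(node) for local, node in source.items()}

events = [
    ("fill", "@a", "rax"),
    ("movhint", "@a", "loc1"),
    ("spill", "@a", 3),
    ("fill", "@b", "rcx"),
    ("movhint", "@b", "loc2"),
]
at_first_exit = replay(events, 2)
at_later_exit = replay(events, 5)
```

The backend pays one append per decision while compiling. The cost of reconstructing exit state is paid only when an exit actually happens.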

The one case where global data flow is essential to the DFG’s mission is static analysis. This comes up in the prediction propagator and the abstract interpreter. Both of them use the global data flow graph in addition to the local data flow graphs, so that they can see how type information flows through the whole compilation unit. Fortunately, as shown in Figure 34, the global data flow graph is available. It’s in a format that makes it hard to edit but relatively easy to analyze. For example, it implicitly reports the set of live variables at each basic block boundary, which makes merging state in the abstract interpreter relatively cheap.

Figure 35. The DFG pipeline.

Figure 35 shows the complete DFG optimization pipeline. This is a fairly complete pipeline: it has classics like constant folding, control flow simplification, CSE, and DCE. It also has lots of JavaScript-specifics like deciding where to put checks (unification, prediction injection, prediction propagation, and fixup), a pass just to optimize common patterns of varargs, some passes for GC barriers, and passes that help OSR (CPS rethreading and phantom insertion). We can afford to do a lot of optimizations in the DFG so long as those optimizations are block-local and don’t try too hard. Still, this pipeline is way smaller than the FTL’s and runs much faster.

To summarize, the DFG compiler uses OSR exit and static analysis to emit an optimal set of type checks. This greatly reduces the number of type checks compared to running JavaScript in either of the profiled tiers. Because the benefit of type check removal is so big, the DFG compiler tries to limit how much time it spends doing other optimizations by restricting itself to a mostly block-local view of the program. This is a trade-off that the DFG makes to get fast compile times. Functions that run long enough that they’d rather pay the compile time to get those optimizations end up tiering up to the FTL, which just goes all out for throughput.
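As a rough illustration of what a block-local view of type check removal can look like, the sketch below drops a check when the same variable has already been checked for the same type earlier in the block. The instruction shape and the `removeRedundantChecks` name are hypothetical, and the real DFG derives this from its abstract interpreter rather than from a simple filter like this one.

```javascript
// Each instruction is either { op: "Check", variable, type } or an ordinary
// operation { op, dst, args }. This IR is invented for the example.
function removeRedundantChecks(block) {
    const proven = new Map(); // variable -> type already checked in this block
    return block.filter(inst => {
        if (inst.op !== "Check") {
            // Redefining a variable invalidates what we proved about it.
            if (inst.dst)
                proven.delete(inst.dst);
            return true;
        }
        if (proven.get(inst.variable) === inst.type)
            return false; // same check already passed earlier in the block
        proven.set(inst.variable, inst.type);
        return true;
    });
}

const checkedBlock = removeRedundantChecks([
    { op: "Check", variable: "a", type: "Int32" },
    { op: "Add", dst: "t", args: ["a", "b"] },
    { op: "Check", variable: "a", type: "Int32" }, // redundant
    { op: "Check", variable: "b", type: "Int32" },
]);
console.log(checkedBlock.length);
```

Because the `proven` map is reset for every block, the pass never needs to look at control flow, which keeps it linear in the size of the block.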

FTL Compiler

We’ve previously documented some aspects of the FTL’s architecture in the original blog post and when we introduced B3. This section provides an updated description of this JIT’s capabilities as well as a deep dive into how FTL does OSR. We will structure our discussion of the FTL as follows. First we will enumerate what optimizations it is capable of. Then we will describe how it does OSR exit in detail. Finally we will talk about patchpoints — an IR operation based on a lambda.

All The Optimizations

The point of the FTL compiler is to run all the optimizations. This is a compiler where we never compromise on peak throughput. All of the DFG’s decisions that were known trade-offs in favor of compile time at the expense of throughput are reversed in the FTL. There is no upper limit on the number of cycles that a function compiled with the FTL will run for, so it’s the kind of compiler where even esoteric optimizations have a chance to pay off eventually. The FTL combines multiple optimization strategies:

  • We reuse the DFG pipeline, including the weird IR. This ensures that any good thing that the DFG tier ever does is also available in the FTL.
  • We add a new DFG SSA IR and DFG SSA pipeline. We adapt lots of DFG phases to DFG SSA (which usually makes them become global rather than local). We add lots of new phases that are only possible in SSA (like loop invariant code motion).
  • We lower DFG SSA IR to B3 IR. B3 is an SSA-based optimizing JIT compiler that operates at the abstraction level of C. B3 has lots of optimizations, including global instruction selection and graph coloring register allocation. The FTL was B3’s first customer, so B3 is tailored for optimizing at the abstraction level where DFG SSA IR can’t.

Having multiple ways of looking at the program gives the FTL maximum opportunities to optimize. Some of the compiler’s functionality, particularly in the part that decides where to put checks, thrives on the DFG’s weird IR. Other parts of the compiler work best in DFG SSA, like the DFG’s loop-invariant code motion. Lots of things work best in B3, like most reasoning about how to simplify arithmetic. B3 is the first IR that doesn’t know anything about JavaScript, so it’s a natural place to implement textbook optimizations that would have difficulties with JavaScript’s semantics. Some optimizations, like CSE, work best when executed in every IR because they find unique opportunities in each IR. In fact, all of the IRs have the same fundamental optimization capabilities in addition to their specialized optimizations: CSE, DCE, constant folding, CFG simplification, and strength reductions (sometimes called peephole optimizations or instruction combining).
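For readers who have not written a CSE before, here is a minimal sketch of the block-local flavor, implemented as value numbering over a made-up three-address IR. It assumes every operation is pure and every name is assigned only once; the `localCSE` function and the instruction shape are illustrative and do not correspond to any of the IRs described here.

```javascript
// Each instruction is { dst, op, args }. All ops are assumed to be pure and
// every dst is assumed to be assigned exactly once.
function localCSE(block) {
    const available = new Map();   // "op(arg,arg)" -> name that holds the value
    const replacement = new Map(); // eliminated name -> surviving name
    const result = [];
    for (const inst of block) {
        // Rewrite operands to point at the surviving copy of each value.
        const args = inst.args.map(arg => replacement.get(arg) || arg);
        const key = inst.op + "(" + args.join(",") + ")";
        if (available.has(key)) {
            replacement.set(inst.dst, available.get(key));
            continue; // same value already computed: drop this instruction
        }
        available.set(key, inst.dst);
        result.push({ dst: inst.dst, op: inst.op, args });
    }
    return result;
}

const optimizedBlock = localCSE([
    { dst: "t1", op: "Add", args: ["a", "b"] },
    { dst: "t2", op: "Add", args: ["a", "b"] }, // same as t1
    { dst: "t3", op: "Mul", args: ["t2", "c"] },
    { dst: "t4", op: "Mul", args: ["t1", "c"] }, // same as t3 once t2 -> t1
]);
console.log(optimizedBlock.map(inst => inst.dst));
```

A global CSE needs dominance information to decide whether an earlier computation is available; the local version only needs a hash table that is reset at each block, which is why it is so much cheaper.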

Figure 36. The FTL pipeline. Note that Lower DFG to B3 is in bold because it’s FTL’s biggest phase; sometimes when we say “FTL” we are just referring to this phase.

The no-compromise approach is probably best appreciated by looking at the FTL optimization pipeline in Figure 36. The FTL runs 93 phases on the code it encounters. This includes all phases from Figure 35 (the DFG pipeline), except Varargs Forwarding, only because it’s completely subsumed by the FTL’s Arguments Elimination. Let’s review some of the FTL’s most important optimizations:

  • DFG AI. This is one of the most important optimizations in the FTL. It’s mostly identical to the AI we run in the DFG tier. Making it work with SSA makes it slightly more precise and slightly more expensive. We run the AI a total of six times.
  • CSE (common subexpression elimination). We run this in DFG IR (Local Common Subexpression Elimination), DFG SSA IR (Global Common Subexpression Elimination), B3 IR (Reduce Strength and the dedicated Eliminate Common Subexpressions), and even in Air (Fix Obvious Spills, a CSE focused on spill code). Our CSEs can do value numbering and load/store elimination.
  • Object Allocation Sinking is a must-points-to analysis that we use to eliminate object allocations or sink them to slow paths. It can eliminate graphs of object allocations, including cyclic graphs.
  • Integer Range Optimization is a forward flow-sensitive abstract interpreter in which the state is a system of equations and inequalities that describe known relationships between variables. It can eliminate integer overflow checks and array bounds checks.
  • The B3 Reduce Strength phase runs a fixpoint that includes CFG simplification, constant folding, reassociation, SSA clean-up, dead code elimination, a light CSE, and lots of miscellaneous strength reductions.
  • Duplicate Tails, aka tail duplication, flattens some control flow diamonds, unswitches small loops, and undoes some cases of relooping. We duplicate small tails blindly over a CFG with critical edges broken. This allows us to achieve some of what splitting achieved for the original speculative compilers.
  • Lower B3 to Air is a global pattern matching instruction selector.
  • Allocate Registers By Graph Coloring implements the IRC and Briggs register allocators. We use IRC on x86 and Briggs on arm64. The difference is that IRC can find more opportunities for coalescing assignments into a single register in cases where there is high register pressure. Our register allocators have special optimizations for OSR exit, especially the OSR exits we emit for integer overflow checks.
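To make the idea of a strength reduction fixpoint (as in the B3 Reduce Strength phase listed above) more concrete, here is a toy sketch over expression trees. It only knows `Add` and `Mul`, and the rule set, the `reduceOnce` and `reduceStrength` names, and the tree representation are all assumptions made for this example. The point is only to show why the rules are run until nothing changes: reassociation exposes a constant fold that is picked up on the next round.

```javascript
// Expressions are numbers, variable names (strings), or { op, left, right }
// where op is "Add" or "Mul".
function reduceOnce(expr) {
    if (typeof expr !== "object")
        return expr;
    const left = reduceOnce(expr.left);
    const right = reduceOnce(expr.right);
    // Constant folding.
    if (typeof left === "number" && typeof right === "number")
        return expr.op === "Add" ? left + right : left * right;
    // Identities.
    if (expr.op === "Add" && right === 0)
        return left;
    if (expr.op === "Mul" && right === 1)
        return left;
    // Reassociation: (x + c1) + c2 => x + (c1 + c2), folded on the next round.
    if (expr.op === "Add" && typeof right === "number"
        && typeof left === "object" && left.op === "Add"
        && typeof left.right === "number")
        return { op: "Add", left: left.left, right: { op: "Add", left: left.right, right } };
    return { op: expr.op, left, right };
}

// Keep applying the rules until a round makes no change.
function reduceStrength(expr) {
    for (;;) {
        const next = reduceOnce(expr);
        if (JSON.stringify(next) === JSON.stringify(expr))
            return next;
        expr = next;
    }
}

// ((x * 1) + 1) + 2
const reduced = reduceStrength({
    op: "Add",
    left: { op: "Add", left: { op: "Mul", left: "x", right: 1 }, right: 1 },
    right: 2,
});
console.log(JSON.stringify(reduced));
```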

OSR Exit in the FTL

Now that we have enumerated some of the optimizations that the FTL is capable of, let’s take a deep dive into how the FTL works by looking at how it compiles and does OSR. Let’s start with this example:

function foo(a, b, c)
{
    return a + b + c;
}

The relevant part of the bytecode sequence is:

[   7] add loc6, arg1, arg2
[  12] add loc6, loc6, arg3
[  17] ret loc6

Which results in the following DFG IR:

  24:  GetLocal(Untyped:@1, arg1(B<Int32>/FlushedInt32), R:Stack(6), bc#7)
  25:  GetLocal(Untyped:@2, arg2(C<BoolInt32>/FlushedInt32), R:Stack(7), bc#7)
  26:  ArithAdd(Int32:@24, Int32:@25, CheckOverflow, Exits, bc#7)
  27:  MovHint(Untyped:@26, loc6, W:SideState, ClobbersExit, bc#7, ExitInvalid)
  29:  GetLocal(Untyped:@3, arg3(D<Int32>/FlushedInt32), R:Stack(8), bc#12)
  30:  ArithAdd(Int32:@26, Int32:@29, CheckOverflow, Exits, bc#12)
  31:  MovHint(Untyped:@30, loc6, W:SideState, ClobbersExit, bc#12, ExitInvalid)
  33:  Return(Untyped:@3, W:SideState, Exits, bc#17)

The DFG data flow from the snippet above is illustrated in Figure 37 and the OSR exit sites are illustrated in Figure 38.

Figure 37. Data flow graph for FTL code generation example.

Figure 38. DFG IR example with the two exiting nodes highlighted along with where they exit and what state is live when they exit.

We want to focus our discussion on the MovHint @27 and how it impacts the code generation for the ArithAdd @30. That ArithAdd is going to exit to the second add in the bytecode, which requires restoring loc6 (i.e. the result of the first add), since it is live at that point in bytecode (it also happens to be directly used by that add).
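The relationship between MovHints and exit state can be sketched as a single forward walk over the block. The node list below is a hand-simplified model of the DFG IR dump above, and `decompressMovHints` is a hypothetical name. The sketch ignores liveness filtering and values that are already on the stack (such as the flushed arguments), both of which a real implementation has to handle.

```javascript
// Simplified model of the interesting nodes from the DFG IR above.
const dfgNodes = [
    { id: 26, op: "ArithAdd", exits: true },
    { id: 27, op: "MovHint", child: 26, local: "loc6" },
    { id: 30, op: "ArithAdd", exits: true },
    { id: 31, op: "MovHint", child: 30, local: "loc6" },
];

// Apply each MovHint as a delta to a running "bytecode local -> DFG node"
// map, and snapshot that map at every node that can exit.
function decompressMovHints(nodes) {
    const current = new Map();
    const stackmaps = new Map();
    for (const node of nodes) {
        if (node.op === "MovHint")
            current.set(node.local, node.child);
        else if (node.exits)
            stackmaps.set(node.id, new Map(current));
    }
    return stackmaps;
}

const dfgStackmaps = decompressMovHints(dfgNodes);
console.log([...dfgStackmaps.get(30)]);
```

In this model the first add (@26) exits with no MovHinted locals, while the second add (@30) exits with loc6 mapped to @26, matching the discussion above.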

This DFG IR is lowered to the following in B3:

Int32 @42 = Trunc(@32, DFG:@26)
Int32 @43 = Trunc(@27, DFG:@26)
Int32 @44 = CheckAdd(@42:WarmAny, @43:WarmAny, generator = 0x1052c5cd0,
                     earlyClobbered = [], lateClobbered = [], usedRegisters = [],
                     ExitsSideways|Reads:Top, DFG:@26)
Int32 @45 = Trunc(@22, DFG:@30)
Int32 @46 = CheckAdd(@44:WarmAny, @45:WarmAny, @44:ColdAny, generator = 0x1052c5d70,
                     earlyClobbered = [], lateClobbered = [], usedRegisters = [],
                     ExitsSideways|Reads:Top, DFG:@30)
Int64 @47 = ZExt32(@46, DFG:@32)
Int64 @48 = Add(@47, $-281474976710656(@13), DFG:@32)
Void @49 = Return(@48, Terminal, DFG:@32)

CheckAdd is the B3 way of saying: do an integer addition, check for overflow, and if it overflows, execute an OSR exit governed by a generator. The generator is a lambda that is given a JIT generator object (that it can use to emit code at the jump destination of the OSR exit) and a stackmap generation parameters object that tells it the B3 value representation for each stackmap argument. The B3 value reps tell you which register, stack slot, or constant to use to get the value. B3 doesn’t know anything about how exit works except that it involves having a stackmap and a generator lambda. So, CheckAdd can take more than 2 arguments; the first two arguments are the actual add operands and the rest are the stackmap. It’s up to the client to decide how many arguments to pass to the stackmap and only the generator will ever get to see their values. In this example, only the second CheckAdd (@46) is using the stackmap. It passes one extra argument, @44, which is the result of the first add, just as we would expect based on MovHint @27 and the fact that loc6 is live at bc#12. This is the result of the FTL decompressing the delta encoding given by MovHints into full stackmaps for B3.
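The semantics of the two CheckAdds can be modeled in plain JavaScript. This is only a behavioral sketch: in B3 the generator runs at compile time and emits machine code, whereas here the callback simply runs when the overflow happens. The `OSRExit` class, `checkAdd`, and `fooOptimized` are hypothetical names, and the inputs are assumed to already be int32 values.

```javascript
// Thrown to model leaving optimized code for the given bytecode index.
class OSRExit extends Error {
    constructor(bytecodeIndex, locals) {
        super("OSR exit to bc#" + bytecodeIndex);
        this.bytecodeIndex = bytecodeIndex;
        this.locals = locals;
    }
}

// The first two arguments are the add operands. stackmapArgs is opaque to
// checkAdd itself; only the onExit callback looks at it.
function checkAdd(left, right, stackmapArgs, onExit) {
    const sum = left + right;
    if (sum > 2147483647 || sum < -2147483648)
        onExit(stackmapArgs); // int32 overflow: take the exit
    return sum;
}

function fooOptimized(a, b, c) {
    // First add: no extra state is needed, so the stackmap is empty.
    const t44 = checkAdd(a, b, [], () => { throw new OSRExit(7, {}); });
    // Second add: loc6 (the first result) is live at bc#12, so pass it along.
    return checkAdd(t44, c, [t44], ([loc6]) => { throw new OSRExit(12, { loc6 }); });
}

console.log(fooOptimized(1, 2, 3));
```

Calling `fooOptimized(2147483647, 0, 1)` overflows in the second add and throws an `OSRExit` for bc#12 whose `locals.loc6` holds the first add's result, which is exactly the value the baseline code needs in order to resume.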

Figure 39. The stackmaps and stackmap-like mappings maintained by the FTL to enable OSR.

FTL OSR exit means tracking what happens with the values of bytecode locals through multiple stages of lowering. The various stages don’t know a whole lot about each other. For example, the final IRs, B3 and Air, know nothing about bytecode, bytecode locals, or any JavaScript concepts. We implement OSR exit by tracking multiple stackmap-like mappings per exit site that give us the complete picture when we glue them together (Figure 39):

  • The DFG IR stackmaps that we get by decompressing MovHint deltas. This gives a mapping from bytecode local to either a DFG node or a stack location. In some cases, DFG IR has to store some values to the stack to support dynamic variable introspection like via function.arguments. DFG OSR exit analysis is smart enough to recognize those cases, since it’s more optimal to handle those cases by having OSR exit extract the value from the stack. Hence, OSR exit analysis may report that a bytecode local is available through a DFG node or a stack location.
  • The B3 value reps array inside the stackmap generation parameters that B3 gives to the generator lambdas of Check instructions like CheckAdd. This is a mapping from B3 argument index to a B3 value representation, which is either a register, a constant, or a stack location. By argument index we mean index in the stackmap arguments to a Check. This is three pieces of information: some user value (like @46 = CheckAdd(@44, @45, @44)), some index within its argument list (like 2), and the value that index references (@44). Note that since this CheckAdd has two argument indices for @44, that means that they may end up having different value representations. It’s not impossible for one to be a constant and another to be a register or spill slot, for example (though this would be highly unlikely; if it happened then it would probably be the result of some sound-but-inefficient antipattern in the compiler). B3’s client gets to decide how many stackmap arguments it will pass and B3 guarantees that it will give the generator a value representation for each argument index in the stackmap (so starting with argument index 2 for CheckAdd).
  • The FTL OSR exit descriptor objects, which the FTL’s DFG→B3 lowering creates at each exit site and holds onto inside the generator lambda it passes to the B3 check. Exit descriptors are based on DFG IR stackmaps and provide a mapping from bytecode local to B3 argument index, constant, stack slot, or materialization. If the DFG IR stackmap said that a bytecode local is a Node that has a constant value, then the OSR exit descriptor will just tell us that value. If the DFG stackmap said that a local is already on the stack, then the OSR exit descriptor will just tell us that stack slot. It could be that the DFG stackmap tells us that the node is a phantom object allocation (an object allocation we optimized out but that needs to be rematerialized on OSR exit). If it is none of those things, the OSR exit descriptor will tell us which B3 argument index has the value of that bytecode local.
  • The FTL’s DFG→B3 lowering already maintains a mapping from DFG node to B3 value.
  • The FTL OSR Exit object, which is a mapping from bytecode local to register, constant, stack slot, or materialization. This is the final product of the FTL’s OSR exit handling and is computed lazily from the B3 value reps and FTL OSR exit descriptor.

These pieces fit together as follows. First we compute the DFG IR stackmap and the FTL’s DFG node to B3 value mapping. We get the DFG IR stackmap from the DFG OSR exit analysis, which the FTL runs in tandem with lowering. We get the DFG to B3 mapping implicitly from lowering. Then we use that to compute the FTL OSR exit descriptor along with the set of B3 values to pass to the stackmap. The DFG IR stackmap tells us which DFG nodes are live, so we turn that into B3 values using the DFG to B3 mapping. Some nodes will be excluded from the B3 stackmap, like object materializations and constants. Then the FTL creates the Check value in B3, passes it the stackmap arguments, and gives it a generator lambda that closes over the OSR exit descriptor. B3’s Check implementation figures out which value representations to use for each stackmap argument index (as a result of B3’s register allocator doing this for every data flow edge), and reports this to the generator as an array of B3 value reps. The generator then creates an FTL::OSRExit object that refers to the FTL OSR exit descriptor and value reps. Users of the FTL OSR exit object can figure out which register, stack slot, constant value, or materialization to use for any bytecode local by asking the OSR exit descriptor. That can tell the constant, spill slot, or materialization script to use. It can also give a stackmap argument index, in which case we load the value rep at that index, and that tells us the register, spill slot, or constant.
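The final gluing step described above can be sketched in a few lines. The object shapes, register names, slot numbers, and the `computeOSRExit` function are all invented for illustration; the real FTL::OSRExit computes this lazily and also handles materializations, which this sketch leaves out.

```javascript
// Hypothetical FTL OSR exit descriptor for one exit site: each bytecode local
// maps to a constant, a stack slot, or a stackmap argument index.
const exitDescriptor = new Map([
    ["loc6", { kind: "argument", index: 2 }],
    ["arg1", { kind: "stack", slot: 6 }],
    ["loc7", { kind: "constant", value: 42 }],
]);

// Hypothetical B3 value reps, indexed by argument index. Indices 0 and 1 are
// the add operands; index 2 is the first stackmap argument.
const valueReps = [
    { kind: "register", name: "rax" },
    { kind: "register", name: "rcx" },
    { kind: "stack", slot: 12 },
];

// Combine the two mappings into "bytecode local -> where the value lives".
function computeOSRExit(descriptor, reps) {
    const exit = new Map();
    for (const [local, entry] of descriptor) {
        // Constants and stack slots are fully described by the descriptor;
        // only argument indices need the register allocator's answer.
        exit.set(local, entry.kind === "argument" ? reps[entry.index] : entry);
    }
    return exit;
}

const osrExit = computeOSRExit(exitDescriptor, valueReps);
console.log(osrExit.get("loc6"));
```

Keeping the descriptor separate from the value reps means the descriptor can be built before register allocation has run, and the same descriptor can be resolved against more than one array of value reps.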

This approach to OSR exit gives us two useful properties. First, it empowers OSR-specific optimization. Second, it empowers optimizations that don’t care about OSR. Let’s go into these in more detail.

FTL OSR empowers OSR-specific optimizations. This happens in DFG IR and B3 IR. In DFG IR, OSR exit is a mutable part of the IR. Any operation can be optimized by adding more OSR exits and we even have the ability to move checks around. The FTL does sophisticated OSR-aware optimizations using DFG IR, like object allocation sinking. In B3 IR, OSR exit gets special register allocation treatment. The stackmap arguments of Check are understood by B3 to be cold uses, which means that it’s not expensive if those uses are spilled. This is powerful information for a register allocator. Additionally, B3 does special register allocation tricks for addition and subtraction with overflow checks (for example we can precisely identify when the result register can reuse a stackmap register and when we can coalesce the result register with one of the input registers to produce optimal two-operand form on x86).

FTL OSR also empowers optimizations that don’t care about OSR exit. In B3 IR, OSR exit decisions get frozen into stackmaps. This is the easiest representation of OSR exit because it requires no knowledge of OSR exit semantics to get right. It’s natural for compiler phases to treat extra arguments to an instruction opaquely. Explicit stackmaps are particularly affordable in B3 because of a combination of factors:

  1. the FTL is a more expensive compiler anyway so the DFG OSR delta encoding optimizations matter less,
  2. we only create stackmaps in B3 for exits that DFG didn’t optimize out, and
  3. B3 stackmaps only include a subset of live state (the rest may be completely described in the FTL OSR exit descriptor).

We have found that some optimizations are annoying, sometimes to the point of being impractical, to write in DFG IR because of explicit OSR exit (like MovHint deltas and exit origins). It’s not necessary to worry about those issues in B3. So far we have found that every textbook optimization for SSA is practical to do in B3. This means that we only end up having a bad time with OSR exit in our compiler when we are writing phases that benefit from DFG’s high-level knowledge; otherwise we write the phases in B3 and have a great time.

This has some surprising outcomes. Anytime FTL emits a Check value in B3, B3 may duplicate the Check. B3 IR semantics allow any code to be duplicated during optimization and this usually happens due to tail duplication. Not allowing code duplication would restrict B3 more than we’re comfortable doing. So, when the duplication happens, we handle it by having multiple FTL OSR exits share the same OSR exit descriptor but get separate value reps. It’s also possible for B3 to prove that some Check is either unnecessary (always succeeds) or is never reached. In that case, we will have one FTL OSR exit descriptor but zero FTL OSR exits. This works in such a way that DFG IR never knows that the code was duplicated and B3’s tail duplication and unreachable code elimination know nothing about OSR exit.

Patchpoints: Lambdas in the IR

This brings us to the final point about the FTL. We think that what is most novel about this compiler is its use of lambdas in its IRs. Check is one example of this. The DFG has some knowledge about what a Check would do at the machine code level, but that knowledge is incomplete until we fill in some blanks about how B3 register-allocated some arguments to the Check. The FTL handles this by having one of the operands to a B3 Check be a lambda that takes a JIT code generator object and value representations for all of the arguments. We like this approach so much that we also have B3 support Patchpoint. A Patchpoint is like an inline assembly snippet in a C compiler, except that instead of a string containing assembly, we pass a lambda that will generate that assembly if told how to get its arguments and produce its result. The FTL uses this for a bunch of cases:

  • Anytime the B3 IR generated by the FTL interacts with JavaScriptCore’s internal ABI. This includes all calls and call-like instructions.
  • Inline caches. If the FTL wants to emit an inline cache, it uses the same inline cache code generation logic that the DFG and baseline use. Instead of teaching B3 how to do this, we just tell B3 that it’s a patchpoint.
  • Lazy slow paths. The FTL has the ability to only emit code for a slow path if that slow path executes. We implement that using patchpoints.
  • Instructions we haven’t added to B3 yet. If we find some JavaScript-specific CPU instruction, we don’t have to thread it through B3 as a new opcode. We can just emit it directly using a Patchpoint. (Of course, threading it through B3 is a bit better, but it’s great that it’s not strictly necessary.)

Here’s an example of the FTL using a patchpoint to emit a fast double-to-int conversion:

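if (MacroAssemblerARM64::
    supportsDoubleToInt32ConversionUsingJavaScriptSemantics()) {
    PatchpointValue* patchpoint = m_out.patchpoint(Int32);
    // The Double input may live in any register B3 chooses.
    patchpoint->appendSomeRegister(doubleValue);
    patchpoint->setGenerator(
        [=] (CCallHelpers& jit,
             const StackmapGenerationParams& params) {
            // params[0] is the Int32 result, params[1] the Double input.
            jit.convertDoubleToInt32UsingJavaScriptSemantics(
                params[1].fpr(), params[0].gpr());
        });
    patchpoint->effects = Effects::none();
    return patchpoint;
}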

This tells B3 that it’s a Patchpoint that returns Int32 and takes a Double. Both are assumed to go in any register of B3’s choice. Then the generator uses a C++ lambda to emit the actual instruction using our JIT API. Finally, the patchpoint tells B3 that the operation has no effects (so it can be hoisted, killed, etc).
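To show the shape of the idea without any of the JIT machinery, here is a toy model of a patchpoint in JavaScript. The `makePatchpoint` and `emitPatchpoint` functions, the register naming scheme, and the string-based "assembly" are all hypothetical; the `fjcvtzs` mnemonic is the ARMv8.3 JavaScript conversion instruction, used here only as a label.

```javascript
// A toy IR value that carries a generator lambda instead of an opcode that
// the compiler understands.
function makePatchpoint(resultType, inputs, generator) {
    return { resultType, inputs, generator, effects: "none" };
}

// Toy back end: it picks registers however it likes, then asks the lambda to
// emit code. params[0] is the result; params[1] onward are the inputs.
function emitPatchpoint(patchpoint, allocateRegister, assembly) {
    const params = [allocateRegister(patchpoint.resultType)];
    for (const input of patchpoint.inputs)
        params.push(allocateRegister(input.type));
    patchpoint.generator(assembly, params);
}

const doubleToInt32 = makePatchpoint("Int32", [{ type: "Double" }],
    (assembly, params) => assembly.push(`fjcvtzs ${params[0]}, ${params[1]}`));

let nextGPR = 0;
let nextFPR = 0;
const allocate = type => type === "Double" ? `d${nextFPR++}` : `w${nextGPR++}`;

const assembly = [];
emitPatchpoint(doubleToInt32, allocate, assembly);
console.log(assembly);
```

The back end never learns what the instruction does. It only knows the types, the register constraints, and the declared effects, which is what lets it schedule and allocate around the patchpoint like any other value.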

This concludes our discussion of the FTL. The FTL is our high throughput compiler that does every optimization we can think of. Because it is a speculative compiler, a lot of its design is centered around having a balanced handling of OSR exit, which involves a separation of concerns between IRs that know different amounts of things about OSR. A key to the FTL’s power is the use of lambdas in B3 IR, which allows B3 clients to configure how B3 emits machine code for some operations.

Summary of Compilation and OSR

To summarize, JavaScriptCore has two optimizing compilers, the DFG and FTL. They are based on the same IR (DFG IR), but the FTL extends this with lots of additional compiler technology (SSA and multiple IRs). The DFG is a fast compiler: it’s meant to compile faster than typical optimizing compilers. But, it generates code that is usually not quite optimal. If that code runs long enough, then it will also get compiled with the FTL, which tries to emit the best code possible.

Related Work

The idea of using feedback from cheap profiling to speculate was pioneered by the Hölzle, Chambers, and Ungar paper on polymorphic inline caches, which calls this adaptive compilation. That work used a speculation strategy based on splitting, which means having the compiler emit many copies of code, one for each possible type. The same three authors later invented OSR exit, though they called it dynamic deoptimization and only used it to enhance debugging. Our approach to speculative compilation means using OSR exit as our primary speculation strategy. We do use splitting in a very limited sense: we emit diamond speculations in those cases where we are not sure enough to use OSR and then we allow tail duplication to split the in-between code paths if they are small enough.

This speculative compilation technique, with OSR or diamond speculations but not so much splitting, first received extraordinary attention during the Java performance wars. Many wonderful Java VMs used combinations of interpreters and JITs with varied optimization strategies to profile virtual calls and speculatively devirtualize them, with the best implementations using inline caches, OSR exit, and watchpoints. Java implementations that used variants of this technique include (but are not limited to):

  • the IBM JIT, which combined an interpreter and an optimizing JIT and did diamond speculations for devirtualization.
  • HotSpot and HotSpot server, which combined an interpreter and an optimizing JIT and used diamond speculations, OSR exit, and lots of other techniques that JavaScriptCore uses. JavaScriptCore’s FTL JIT is similar to HotSpot server in the sense that both compilers put a big emphasis on great OSR support, comprehensive low-level optimizations, and graph coloring register allocation.
  • Eclipse J9, a major competitor to HotSpot that also uses speculative compilation.
  • Jikes RVM, a research VM that used OSR exit but combined a baseline JIT and an optimizing JIT. I learned most of what I know about this technique from working on Jikes RVM.

Like Java, JavaScript has turned out to be a great use case for speculative compilation. Early instigators in the JavaScript performance war included the Squirrelfish interpreter (predecessor to LLInt), the Squirrelfish Extreme JIT (what we now call the Baseline JIT), the early V8 engine that combined a baseline JIT with inline caches, and TraceMonkey. TraceMonkey used a cheap optimizing JIT strategy called tracing, which compiles lots of speculative paths. This JIT sometimes outperformed the baseline JITs, but often lost to them due to overspeculation. V8 upped the ante by introducing the speculative compilation approach to JavaScript, using the template that had worked so well in Java: a lower tier that does inline caches, then an optimizing JIT (called Crankshaft) that speculates based on the inline caches and exits to the lower tier. This version of V8 used a pair of JITs (baseline JIT and optimizing JIT), much like Jikes RVM did for Java. JavaScriptCore soon followed by hooking up the DFG JIT as an optimizing tier for the baseline JIT, then adding the LLInt and FTL JIT. During about the same time, TraceMonkey got replaced with IonMonkey, which uses similar techniques to Crankshaft and DFG. The ChakraCore JavaScript implementation also used speculative compilation. JavaScriptCore and V8 have continued to expand their optimizations with innovative compiler technology like B3 (a CFG SSA compiler) and TurboFan (a sea-of-nodes SSA compiler). Much like for Java, the top implementations have at least two tiers, with the lower one used to collect profiling that the upper one uses to speculate. And, like for Java, the fastest implementations are built around OSR speculation.


JavaScriptCore includes some exciting speculative compiler technology. Speculative compilation is all about speeding up dynamically typed programs by placing bets on what types the program would have had if it could have types. Speculation uses OSR exit, which is expensive, so we engineer JavaScriptCore to make speculative bets only if they are a sure thing. Speculation involves using multiple execution tiers, some for profiling, and some to optimize based on that profiling. JavaScriptCore includes four tiers to also get an ideal latency/throughput trade-off on a per-function basis. A control system chooses when to optimize code based on whether it’s hot enough and how many times we’ve tried to optimize it in the past. All of the tiers use a common IR (bytecode in JavaScriptCore’s case) as input and provide independent implementation strategies with different throughput/latency and speculation trade-offs.

This post is an attempt to demystify our take on speculative compilation. We hope that it’s a useful resource for those interested in JavaScriptCore and for those interested in building their own fast language implementations (especially the ones with really weird and funny features).

July 29, 2020 05:00 PM

July 16, 2020

Release Notes for Safari Technology Preview 110

Surfin’ Safari

Safari Technology Preview Release 110 is now available for download for macOS Big Sur and macOS Catalina. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 263214-263988.


  • Added a functional WebRTC VP9 codec (r263734, r263820)
  • Allowed registering VP9 as a VT decoder (r263894)
  • Added support for freeze and pause receiver stats (r263351)
  • Added MediaRecorder.onstart support (r263671, r263896)
  • Changed MediaRecorder to support peer connection remote video tracks (r263928)
  • Enabled VTB required low latency code path (r263931)
  • Fixed MediaRecorder stopRecorder() returning an empty Blob after first use (r263511, r263633, r263891)
  • Fixed MediaRecorder.start() Method ignoring the timeslice parameter (r263565, r263651, r263892)
  • Fixed RTCDataChannel.bufferedAmount to stay the same even if channel is closed (r263655)
  • Updated the max width and height for mock sources (r263844)

Web Authentication

  • Improved UI for PIN entry for security keys

Web Animations

  • Fixed keyframe animations with an infinite iteration count not showing up in the Animations timeline (r263400)


  • Changed to require a <form> to be connected before it can be submitted (r263624)
  • Fixed window.location.replace with invalid URLs to throw (r263647)
  • Fixed the behavior when setting url.search="??" (two question marks) (r263637)
  • Changed to allow selecting HEIF images if the ‘accept’ attribute includes an image MIME type that the platform can transcode (r263949)
  • Added referrerpolicy attribute support for <link> (r263356, r263442)
  • Allowed setting empty host/hostname on URLs if they use file scheme (r263971)
  • Allowed the async clipboard API to write data when copying via menu action or key binding (r263480)


  • Changed to check for mode=“showing” to consider a text track as selected in the tracks panel (r263802)


  • Changed to allow indefinite size flex items to be definite with respect to resolving percentages inside them (r263399)
  • Changed to not include scrollbar extents when computing sizes for percentage resolution (r263794)
  • Fixed pointer events (click/hover/etc) passing through flex items, if they have negative margin (r263659)


  • Changed to resolve viewport units against the preferred content size (r263311)


  • Fixed overlapping content when margin-right is present (r263550)
  • Fixed content sometimes missing in nested scrollers with border-radius (r263578)


  • Fixed honoring aria-modal nodes wrapped in aria-hidden (r263673)
  • Implemented relevant simulated key presses for custom ARIA widgets for increment and decrement (r263823)

Bug Fixes

  • Fixed the indeterminate progress bar animation periodically jumping in macOS Big Sur (r263952)


  • Enabled RelativeTimeFormat and Locale by default (r263227)
  • Configured option-offered numberingSystem in Intl.NumberFormat through locale (r263837)
  • Changed Intl.Collator to set usage:“search” option through ICU locale (r263833)
  • Fixed Promise built-in functions to be anonymous non-constructors (r263222)
  • Fixed incorrect TypedArray.prototype.set with primitives (r263216)

Storage Access API

  • Added the capability to call the Storage Access API as a quirk, on behalf of websites that should be doing it themselves (r263383)

Text Manipulation

  • Updated text manipulation to exclude text rendered using icon-only fonts (r263527)
  • Added a new text manipulation heuristic to decide paragraph boundary (r263958)


  • Enabled referrer policy attribute support by default (r263274)
  • Changed image crossorigin mutations to be considered “relevant mutations” (r263345, r263350)

Web Inspector

  • Added a tooltip to the icon of resources replaced by a local override explaining what happened (r263429)
  • Allowed selecting text of Response (DOM Tree) in Network tab (r263872)
  • Adjusted the height of title area when Web Inspector is undocked to match macOS Big Sur (r263377, r263402)

July 16, 2020 05:35 PM

July 12, 2020

Manuel Rego: Open Prioritization and CSS Containment

Igalia WebKit

Igalia is a major contributor to all the open source web rendering engines (Blink, Gecko, Servo and WebKit). We have been making different kinds of contributions for years, which has given us an important position in the different communities. This allows us to help our customers solve their problems through upstream contributions that also benefit the whole web community.

Implementing a feature in a rendering engine (or in several) might look very simple at first sight, but contributing it upstream can take a while depending on the standardization status, the related bugs, the browser architecture, and many other factors. You can find examples of things implemented by Igalia in the past in my previous blog posts, and you will realize how much work is behind some of those features.

There’s a common pattern everywhere: people often get really angry because a bug they reported years ago is still not fixed in a given browser. That can be for a variety of reasons, and not simply because the developers of that browser are lazy or not paying attention to that particular bug. In many cases the answer to why it hasn’t been solved yet is pretty simple: priorities. The different companies and individuals contributing to the projects have their own interests; they prioritize the different issues and tasks and put their focus and effort on the ones that have a higher priority for them. A possible solution, now that the major browsers are all open source, would be to look for a consulting company like Igalia that can fix that bug for you; but as an individual, or even as a company, you may not have the budget to make that happen.

What would happen if we allowed several parties to contribute together to the development of some features? Individuals and organizations that don’t have the means to fund an implementation alone could each contribute their piece of the cake in order to add support for those features to the web platform.

Open Prioritization

Igalia is launching Open Prioritization, a crowd-funding campaign for the web platform. We believe this can open the door for many different people and organizations to prioritize the development of some features in the different web engines. Initially we have defined 6 tasks that can be found on the website, together with a FAQ explaining all the details of the campaign. 🚀

Let’s hope we can make this happen. If this is a success and some of these items get funded and implemented, probably there’ll be more in the future, including new things or ideas that you can share with us.

Open Prioritization by Igalia. An experiment in crowd-funding prioritization.

One of the tasks of the Open Prioritization campaign we’re starting this week is about adding CSS Containment support in WebKit, and we have experience working on that in Chromium.

Why CSS Containment in WebKit?

Briefly speaking, CSS Containment is a standard focused on improving the rendering performance of web pages. It allows authors to isolate DOM subtrees from the rest of the document, so any change that happens in the “contained” subtree doesn’t affect anything outside that element.

This is the spec behind the contain property, which can take a few values defining the “type of containment”: layout, paint, size and style. I’m not going to go deeper into this; I’ll refer you to my introductory post or my CSSconf EU talk if you’re interested in more details about this specification.

So why do we think this is important? Currently we have an issue with CSS Containment: it’s supported in Chromium and Firefox (except style containment) but not in WebKit. This might not seem like a big deal since it’s a performance-oriented feature, so without support you’d simply get worse performance and that’s all. But that’s not completely true, as the different types of containment impose some restrictions on the contained element (e.g. layout containment makes the element become the containing block of positioned descendants), which might cause interoperability issues if you start to use the contain property in your websites.

The main goal of this task would be to add CSS Containment support in WebKit, at least to the level that it’s spec compliant and on par with the other implementations, and, if time permits, to implement some optimizations based on it. Once we have interoperability you can start using it without any concern in your web pages, as the behavior won’t change between the different browsers and you might get some perf improvements (which will vary depending on each browser implementation).

In addition, this will allow WebKit to implement further optimizations thanks to the information that web authors provide through the contain property. On top of that, this initial support is a requirement for implementing new features that are based on it, like the new CSS properties content-visibility and contain-intrinsic-size, which are part of the Display Locking feature.

If you think this is an important feature for you, please go ahead and make your pledge so it can get prioritized and implemented in WebKit upstream.

Really looking forward to seeing how this Open Prioritization campaign goes in the coming weeks. 🤞

July 12, 2020 10:00 PM

July 05, 2020

Frédéric Wang: Contributions to Web Platform Interoperability (First Half of 2020)

Igalia WebKit

Note: This blog post was co-authored by AMP and Igalia teams.

Web developers continue to face challenges with web interoperability issues and a lack of implementation of important features. As an open-source project, the AMP Project can help represent developers and aid in addressing these challenges. In the last few years, we have partnered with Igalia to collaborate on helping advance predictability and interoperability among browsers. Reaching standards and the degree of interoperability that we want can be a long process. New features frequently require experimentation to get things rolling and course corrections along the way; then, ultimately, as more implementations and users begin exploring the space, doing really interesting things and finding issues at the edges, we continue to advance interoperability.

Both AMP and Igalia are very pleased to have been able to play important roles at all stages of this process and help drive things forward. During the first half of this year, here’s what we’ve been up to…

Default Aspect Ratio of Images

In our previous blog post we mentioned our experiment to implement the intrinsic size attribute in WebKit. Although this was a useful prototype for standardization discussions, in the end there was a consensus to switch to an alternative approach. This new approach addresses the same use case without the need for a new attribute. The idea is pretty simple: use the specified width and height attributes of an image to determine the default aspect ratio. If additional CSS is used, e.g. “width: 100%; height: auto;”, browsers can then compute the final size of the image without waiting for it to be downloaded. This avoids any relayout that could cause a bad user experience. This was implemented in Firefox and Chromium and we did the same in WebKit. We implemented this under a flag which is currently on by default in Safari Tech Preview and the latest iOS 14 beta.
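To make the arithmetic concrete, here is a minimal sketch of the calculation (not WebKit's actual code). It assumes an image that declares width and height attributes and is styled with “width: 100%; height: auto;”. The function name and the 800px container are made up for the example.

```python
def reserved_height(attr_width: int, attr_height: int, used_width: float) -> float:
    """Height that can be reserved before the image has been downloaded.

    attr_width and attr_height are the values of the <img> width and height
    attributes; used_width is the width resolved from CSS (for example the
    container width when `width: 100%` applies).
    """
    if attr_width <= 0 or attr_height <= 0:
        raise ValueError("both attributes must be positive to define a ratio")
    # The default aspect ratio is attr_width / attr_height, so the height
    # follows from the used width without knowing the real image data.
    return used_width * attr_height / attr_width


# <img width="1600" height="900"> in a hypothetical 800px-wide container
print(reserved_height(1600, 900, 800))
```

Because the height is known up front, the surrounding content does not need to move once the image data arrives.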


We continued our efforts to enhance scroll features. In WebKit, we began with scroll-behavior, which provides the ability to do smooth scrolling. Based on our previous patch, it has landed and is guarded by an experimental flag “CSSOM View Smooth Scrolling” which is disabled by default. Smooth scrolling currently has a generic platform-independent implementation controlled by a timer in the web process, and we continue working on a more efficient alternative relying on the native iOS UI interfaces to perform scrolling.

We have also started to work on overscroll and overscroll customization, especially for the scrollend event. The scrollend event, as you might expect, is fired when the scroll is finished, but it lacked interoperability and required some additional tests. We added web platform tests for programmatic scroll and user scroll including scrollbar, dragging selection and keyboard scrolling. With these in place, we are now working on a patch in WebKit which supports scrollend for programmatic scroll and Mac user scroll.

On the Chrome side, we continue working on the standard scroll values in non-default writing modes. This is an interesting set of challenges surrounding the scroll API and how it works with writing modes which was previously not entirely interoperable or well defined. Gaining interoperability requires changes, and we have to be sure that those changes are safe. Our current changes are implemented and guarded by a runtime flag “CSSOM View Scroll Coordinates”. With the help of Google engineers, we are trying to collect user data to decide whether it is safe to enable it by default.

Another minor interoperability fix that we were involved in was to ensure that the scrolling attribute of frames recognizes values “noscroll” or “off”. That was already the case in Firefox and this is now the case in Chromium and WebKit too.
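As a small illustration of that rule, the sketch below decides whether a frame should scroll based on its scrolling attribute. The function name is hypothetical, the case-insensitive comparison is an assumption, and the value “no” is included as an assumption about the long-standing behavior rather than something stated above.

```python
from typing import Optional

# Values that switch scrolling off. "noscroll" and "off" are the ones named
# above; "no" is included here as an assumption.
SCROLLING_DISABLED_VALUES = {"no", "noscroll", "off"}


def frame_scrolling_enabled(scrolling_attr: Optional[str]) -> bool:
    """Return False when the frame's scrolling attribute disables scrolling."""
    if scrolling_attr is None:  # attribute absent: default behaviour
        return True
    # Assumed to be a case-insensitive comparison.
    return scrolling_attr.lower() not in SCROLLING_DISABLED_VALUES
```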

Intersection and Resize Observers

As mentioned in our previous blog post, we drove the implementation of IntersectionObserver (enabled in iOS 12.2) and ResizeObserver (enabled in iOS 14 beta) in WebKit. We have made a few enhancements to these useful developer APIs this year.

Users reported difficulties observing the root of an inner iframe, and the specification was modified to accept an explicit document as the root parameter. This was implemented in Chromium and we implemented the same change in WebKit and Firefox. It is currently available in Safari Tech Preview, iOS 14 beta and Firefox 75.

A bug was also reported with ResizeObserver incorrectly computing sizes at non-default zoom levels, which in particular caused a bug on Twitter feeds. We landed a patch last April and the fix is available in the latest Safari Tech Preview and iOS 14 beta.

Resource Loading

Another thing that we have been concerned with is how we can give more control and power to authors to more effectively tell the browser how to manage the loading of resources and improve performance.

The work that we started in 2019 on lazy loading has matured a lot along with the specification.

The lazy image loading implementation in WebKit therefore passes the related WPT tests and is functional and comparable to the Firefox and Chrome implementations. However, as you might expect, as we compare uses and implementation notes it becomes apparent that determining the moment when the lazy image load should start is not defined well enough. Before this can be enabled in releases some more work has to be done on improving that. The related frame lazy loading work has not started yet since the specification is not in place.

We also added an implementation for stale-while-revalidate. The stale-while-revalidate Cache-Control directive allows a grace period in which the browser is permitted to serve a stale asset while the browser is checking for a newer version. This is useful for non-critical resources where some degree of staleness is acceptable, like fonts. The feature has been enabled recently in WebKit trunk, but it is still disabled in the latest iOS 14 beta.
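The grace period can be pictured with a simplified cache decision. This is only a sketch of the idea, not WebKit's implementation: it looks at max-age and stale-while-revalidate alone, ignores every other caching rule (no-cache, must-revalidate, heuristics and so on), and all names are made up for the example.

```python
from enum import Enum


class CacheDecision(Enum):
    FRESH = "serve from cache"
    STALE_REVALIDATE = "serve stale copy and revalidate in the background"
    MUST_FETCH = "fetch from the network before responding"


def parse_cache_control(header: str) -> dict:
    """Parse 'name=seconds' directives; directives without a number map to None."""
    directives = {}
    for part in header.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = int(value) if value.isdigit() else None
    return directives


def decide(header: str, age_seconds: int) -> CacheDecision:
    """Classify a cached response of the given age."""
    directives = parse_cache_control(header)
    max_age = directives.get("max-age") or 0
    grace = directives.get("stale-while-revalidate") or 0
    if age_seconds < max_age:
        return CacheDecision.FRESH
    if age_seconds < max_age + grace:
        # Inside the grace period: the stale asset may be used right away
        # while a newer version is checked for.
        return CacheDecision.STALE_REVALIDATE
    return CacheDecision.MUST_FETCH


header = "max-age=600, stale-while-revalidate=30"
for age in (100, 610, 700):
    print(age, decide(header, age).value)
```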

Contributions were made to improve prefetching in WebKit taking into account its cache partitioning mechanism. Before this work can be enabled some more patches have to be landed and possibly specified (for example, prenavigate) in more detail. Finally, various general Fetch improvements have been done, improving the fetch WPT score. Examples are:

What’s next

There is still a lot to do in scrolling and resource loading improvements and we will continue to focus on the features mentioned such as scrollend event, overscroll behavior and scroll behavior, lazy loading, stale-while-revalidate and prefetching.

As a continuation of the work done for aspect ratio calculation of images, we will consider the more general CSS aspect-ratio property. Performance metrics such as the ones provided by the Web Vitals project are also critical for web developers to ensure that their websites provide a good user experience, and we are willing to investigate support for these in Safari.

We love doing this work to improve the platform and we’re happy to be able to collaborate in ways that contribute to bettering the web commons for all of us.

July 05, 2020 10:00 PM

July 02, 2020

Philippe Normand: Web-augmented graphics overlay broadcasting with WPE and GStreamer

Igalia WebKit

Graphics overlays are everywhere nowadays in the live video broadcasting industry. In this post I introduce a new demo relying on GStreamer and WPEWebKit to deliver low-latency web-augmented video broadcasts.

Readers of this blog might remember a few posts about WPEWebKit and a GStreamer element we at Igalia worked on …

By Philippe Normand at July 02, 2020 01:00 PM

June 30, 2020

Enrique Ocaña: Developing on WebKitGTK with Qt Creator 4.12.2

Igalia WebKit

After the latest migration of WebKitGTK test bots to use the new SDK based on Flatpak, the old development environment based on jhbuild became deprecated. It can still be used with export WEBKIT_JHBUILD=1, but support for this way of working will gradually fade out.

I used to work in a chroot because I love the advantages of having an isolated and self-contained environment, but an issue in the way bubblewrap manages mountpoints basically made it impossible to use the new SDK from a chroot. It was time for me to bring my development environment up to date and have it working in my main Kubuntu 18.04 distro.

My main goal was to have a comfortable IDE that follows standard GUI conventions (that is, no emacs nor vim) and has code indexing features that (more or less) work with the WebKit codebase. Qt Creator was providing all that to me in the old chroot environment thanks to some configuration tricks by Alicia, so it should be good for the new one.

I preferred to use the Qt Creator 4.12.2 offline installer for Linux, so I can download exactly the same version in the future in case I need it, but other platforms and versions are also available.

The WebKit source code can be downloaded as always using git:

git clone git://git.webkit.org/WebKit.git

It’s useful to add WebKit/Tools/Scripts and WebKit/Tools/gtk to your PATH, as well as any other custom tools you may have. You can customize your $HOME/.bashrc for that, but I prefer to have an env.sh environment script to be sourced from the current shell when I want to enter my development environment (by running webkit). If you’re going to use it too, remember to adjust the paths used there to your needs.

Even if you have a pretty recent distro, it’s still worth having the latest Flatpak tools. Add Alex Larsson’s PPA to your apt sources:

sudo add-apt-repository ppa:alexlarsson/flatpak

In order to ensure that your distro has all the packages that webkit requires and to install the WebKit SDK, you have to run these commands (I omit the full path). Downloading the Flatpak modules will take a while, but at least you won’t need to build everything from scratch. You will need to do this again from time to time, every time the WebKit base dependencies change:


Now just build WebKit and check that MiniBrowser works:

build-webkit --gtk
run-minibrowser --gtk

I have automated the previous steps as go full-rebuild and runtest.sh.

This build process should have generated a WebKit/WebKitBuild/GTK/Release/compile_commands.json file with the right parameters and paths used to build each compilation unit in the project. This file can be leveraged by Qt Creator to get the right include paths and build flags after some preprocessing to translate the paths that make sense from inside Flatpak to paths that make sense from the perspective of your main distro. I wrote compile_commands.sh to take care of those transformations. It can be run manually or automatically when calling go full-rebuild or go update.
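To give an idea of the kind of path translation involved, here is a rough Python sketch. It is not the compile_commands.sh script itself: the sandbox prefix /app/webkit is an assumption about where the checkout is mounted inside Flatpak, the host prefix is taken from the example paths used in this post, and a plain string replacement is a deliberately naive strategy.

```python
import json

# Both prefixes are illustrative; adjust them to your own setup.
SANDBOX_PREFIX = "/app/webkit"  # assumed checkout path inside the sandbox
HOST_PREFIX = "/home/enrique/work/webkit/WebKit"  # checkout path on the host


def translate_entry(entry, sandbox=SANDBOX_PREFIX, host=HOST_PREFIX):
    """Return a copy of one compilation database entry using host paths."""
    translated = {}
    for key, value in entry.items():
        if isinstance(value, str):  # "directory", "file", "command"
            translated[key] = value.replace(sandbox, host)
        elif isinstance(value, list):  # the optional "arguments" form
            translated[key] = [arg.replace(sandbox, host) for arg in value]
        else:
            translated[key] = value
    return translated


def translate_database(text, sandbox=SANDBOX_PREFIX, host=HOST_PREFIX):
    """Translate every entry of a compile_commands.json document."""
    entries = json.loads(text)
    return json.dumps([translate_entry(e, sandbox, host) for e in entries], indent=2)
```

Reading the generated file, passing its text through translate_database() and writing the result next to the sources would give Qt Creator paths it can resolve from the host.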

The WebKit way of managing includes is a bit weird. Most of the cpp files include config.h and, only after that, they include the header file related to the cpp file. Those header files depend on defines declared transitively when including config.h, but that file isn’t directly included by the header file. This breaks the intuitive rule of “headers should include any other header they depend on” and, among other things, completely confuses code indexers. So, in order to give the Qt Creator code indexer a hand, the compile_commands.sh script pre-includes WebKit.config for every file and includes config.h from it.

With all the needed pieces in place, it’s time to import the project into Qt Creator. To do that, click File → Open File or Project, and then select the compile_commands.json file that compile_commands.sh should have generated in the WebKit main directory.

Now make sure that Qt Creator has the right plugins enabled in Help → About Plugins…. Specifically: GenericProjectManager, ClangCodeModel, ClassView, CppEditor, CppTools, ClangTools, TextEditor and LanguageClient (more on that later).

With this setup, after a brief initial indexing time, you will have support for features like Switch header/source (F4), Follow symbol under cursor (F2), shading of disabled if-endif blocks, auto variable type resolving and code outline. There are some oddities of compile_commands.json based projects, though. There are no compilation units in that file for header files, so indexing features for them only work sometimes. For instance, you can switch from a method implementation in the cpp file to its declaration in the header file, but not the opposite. Also, you won’t see all the source files under the Projects view, only the compilation units, which are often just a bunch of UnifiedSource-*.cpp files. That’s why I prefer to use the File System view.

Additional features like Open Type Hierarchy (Ctrl+Shift+T) and Find References to Symbol Under Cursor (Ctrl+Shift+U) are only available when a Language Client for Language Server Protocol is configured. Fortunately, the new WebKit SDK comes with the ccls C/C++/Objective-C language server included. To configure it, open Tools → Options… → Language Client and add a new item with the following properties:

  • Name: ccls
  • Language: *.c;*.cpp;*.h
  • Startup behaviour: Always On
  • Executable: /home/enrique/work/webkit/WebKit/Tools/Scripts/webkit-flatpak
  • Arguments: --gtk -c ccls --index=/home/enrique/work/webkit/WebKit

Some “LanguageClient ccls: Unexpectedly finished. Restarting in 5 seconds.” errors will appear in the General Messages panel after configuring the language client and every time you launch Qt Creator. It’s just ccls taking its time to index the whole source code. It’s “normal”, don’t worry about it. Things will get stable and start to work after some minutes.

Due to the way the Locator file indexer works in Qt Creator, it can become confused, run out of memory and die if it finds cycles in the project file tree. This is common when using Flatpak and running the MiniBrowser or the tests, since /proc and other large filesystems are accessible from inside WebKit/WebKitBuild. To avoid that, open Tools → Options… → Environment → Locator and set Refresh interval to 0 min.

I also prefer to call my own custom build and run scripts (go and runtest.sh) instead of letting Qt Creator build the project with the default builders and mess everything up. To do that, from the Projects mode (Ctrl+5), click on Build & Run → Desktop → Build and edit the build configuration to be like this:

  • Build directory: /home/enrique/work/webkit/WebKit
  • Add build step → Custom process step
    • Command: go (no absolute path because I have it in my PATH)
    • Arguments:
    • Working directory: /home/enrique/work/webkit/WebKit

Then, for Build & Run → Desktop → Run, use these options:

  • Deployment: No deploy steps
  • Run:
    • Run configuration: Custom Executable → Add
      • Executable: runtest.sh
      • Command line arguments:
      • Working directory:

With this configuration you can build the project with Ctrl+B and run it with Ctrl+R.

I think I’m not forgetting anything more regarding environment setup. With the instructions in this post you can end up with a pretty complete IDE. Here’s a screenshot of it working in its full glory:

Anyway, to be honest, nothing will ever reach the level of code indexing features I got with Eclipse some years ago. I could find usages of a variable/attribute and know where it was being read, written or read-written. Unfortunately, that environment stopped working for me long ago, so Qt Creator has been the best I’ve managed to get for a while.

Properly configured web based indexers such as the Searchfox instance configured in Igalia can also be useful alternatives to a local setup, although they lack features such as type hierarchy.

I hope you’ve found this post useful in case you try to set up an environment similar to the one described here. Enjoy!

By eocanha at June 30, 2020 03:47 PM

June 26, 2020

App-Bound Domains

Surfin’ Safari

Many applications use WKWebView as a convenient way to display websites without requiring users to leave the app, referred to as in-app browsing. Although this can provide a great user experience, the powerful features available to developers using WKWebView allow a hosting app to monitor users across all of the sites they visit within the app.

Powerful WKWebView features, such as JavaScript injection, event handlers, and other APIs can be used by applications or utility frameworks in intrusive ways to communicate with known trackers seeking to collect and aggregate personal information about users. These tactics can reveal which images a user pauses on, what content they copy/paste, and which sections of pages they reach while scrolling.

For iOS 14.0 and iPadOS 14.0, we want to make it possible for developers to continue offering an in-app browsing experience without exposing users to tracking risks. Today we are introducing App-Bound Domains, a new, opt-in WKWebView technology to improve in-app browsing by offering greater privacy to users.

App-Bound Domains

The App-Bound Domains feature takes steps to preserve user privacy by limiting the domains on which an app can utilize powerful APIs to track users during in-app browsing. Applications that opt in to this new feature can specify up to 10 “app-bound” domains using a new Info.plist key, WKAppBoundDomains. Note that content supplied by the app through local files, data URLs, and HTML strings is always treated as app-bound, and does not need to be listed.

<plist version="1.0">

Once the WKAppBoundDomains key is added to the Info.plist, all WKWebView instances in the application default to a mode where JavaScript injection, custom style sheets, cookie manipulation, and message handler use are denied. To gain back access to these APIs, a WKWebView can set the limitsNavigationsToAppBoundDomains flag in its WKWebView configuration, like so:

webViewConfiguration.limitsNavigationsToAppBoundDomains = YES;

Setting this flag indicates to WebKit that a WKWebView will only navigate to app-bound domains. Once set, any attempt to navigate away from an app-bound domain will fail with the error: “App-bound domain failure.” A web view which has this configuration flag and loads an app-bound domain from the WKAppBoundDomains list, or from local resources like file URLs, data URLs, and strings, will have access to the following APIs:

  • (void)evaluateJavaScript:(NSString *)javaScriptString completionHandler:(void (^ _Nullable)(_Nullable id, NSError * _Nullable error))completionHandler
  • (void)addUserScript:(WKUserScript *)userScript;
  • window.webkit.messageHandlers

Additionally, an application will have access to the following WKHTTPCookieStore APIs for accessing cookies for app-bound domains:

  • (void)setCookie:(NSHTTPCookie *)cookie completionHandler:(nullable void (^)(void))completionHandler;
  • (void)getAllCookies:(void (^)(NSArray<NSHTTPCookie *> *))completionHandler;

All other WKWebView instances are prevented from using these APIs, since they are capable of leaking private data. This makes a WKWebView navigating to domains outside of the small set of app-bound domains work more like SafariViewController, which has built-in privacy protections like this already.

We will talk more about specific examples and benefits of App-Bound Domains below.

Example Use Cases

We will use five examples to illustrate the ways App-Bound Domains can be adopted for different types of applications:

  • UnchangedApp, an application which does not opt-in to App-Bound Domains.
  • ShopApp, an application with only self-hosted content.
  • SocialApp, an application with an in-app browser.
  • BrowserApp, a full web browser application.
  • HybridApp, an application with both self-hosted content and an in-app browser.


First let’s consider UnchangedApp, which does not opt-in to App-Bound Domains or change its behavior in any way. UnchangedApp will experience pre-iOS 14.0 WKWebView behavior, with no restricted APIs on any domains. However, the decision to not adopt App-Bound Domains in UnchangedApp could expose its users to tracking risks, for example if UnchangedApp includes third party code that surreptitiously injects script into web views.

ShopApp (self-hosted content)

Let’s look at a simple example of an application, ShopApp, which only serves web content from its own domain, shop.example. ShopApp can opt-in to App-Bound Domains by creating a new array entry in its Info.plist with the key WKAppBoundDomains. This kind of app can add up to 10 “app-bound” domains as strings in the array. This might be an example of ShopApp’s Info.plist entry:

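<plist version="1.0">
<dict>
    <key>WKAppBoundDomains</key>
    <array>
        <string>shop.example</string>
        <string>shop-cdn.example</string>
    </array>
</dict>
</plist>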

In order for a WKWebView to use restricted APIs on these domains, ShopApp will also have to initialize the WKWebView with the following configuration argument:

webViewConfiguration.limitsNavigationsToAppBoundDomains = YES;

ShopApp will now have access to the full spectrum of APIs listed above when browsing on shop.example and shop-cdn.example. Note that the check for app-bound domains only occurs for the top-level frame, so ShopApp will still be able to display third party iframes from domains outside the app-bound set on shop.example.
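For readers who generate their Info.plist from a build script, the entry described above can be sketched with Python's standard plistlib module. This is only an illustration under stated assumptions: the helper name is hypothetical, the limit of 10 comes from the text above, and most projects would simply edit Info.plist in Xcode instead.

```python
import plistlib

MAX_APP_BOUND_DOMAINS = 10  # limit stated in the post


def app_bound_plist(domains):
    """Return an XML property list holding a WKAppBoundDomains array."""
    domains = list(domains)
    if len(domains) > MAX_APP_BOUND_DOMAINS:
        raise ValueError("at most 10 app-bound domains can be listed")
    # plistlib serializes a Python list as <array> and each str as <string>.
    return plistlib.dumps({"WKAppBoundDomains": domains}).decode("utf-8")


print(app_bound_plist(["shop.example", "shop-cdn.example"]))
```

Passing an empty list produces the opt-in with no app-bound domains at all.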

SocialApp (with in-app browser)

Now let’s consider a social media application, SocialApp, which is used largely as an in-app browser. A SocialApp user might navigate to many different websites using the app, possibly encountering a tracker, tracker.example, during in-app browsing.

Without the protections offered by App-Bound Domains, it is possible that SocialApp is intrusively using in-app browsing to track users by communicating with tracker.example.

If the developers of SocialApp want a better user privacy experience they have two paths forward:

  1. Use SafariViewController instead of WKWebView for in-app browsing. SafariViewController protects user data from SocialApp by loading pages outside of SocialApp’s process space. SocialApp can guarantee it is giving its users the best available user privacy experience while using SafariViewController.
  2. Opt-in to App-Bound Domains. The additional WKWebView restrictions from App-Bound Domains ensure that SocialApp is not able to track users using the APIs outlined above.

To opt-in to App-Bound Domains, SocialApp only needs to add an empty WKAppBoundDomains key to their Info.plist.

<plist version="1.0">
<dict>
    <key>WKAppBoundDomains</key>
    <array/>
</dict>
</plist>

Since SocialApp does not need any restricted APIs, no WKWebViewConfiguration arguments are necessary.

Due to the asynchronous nature of the web, a SocialApp developer could see different errors if trying to use restricted APIs and navigate to non app-bound domains in the same WKWebView. If a WKWebView in SocialApp uses a restricted API before any navigations occur, and then tries to navigate to a domain outside of the set of “app-bound” domains, the navigation will fail with the error “App-bound domain failure.” Conversely, if SocialApp first navigates to a non-app-bound domain then tries to use a restricted API, the API call will fail.

BrowserApp (exclusively browsing the web)

Another application, BrowserApp, is used exclusively for browsing the web. BrowserApp has previously received permission to take the managed entitlement com.apple.developer.web-browser, which signifies its purpose as a full web-browser. All WKWebView instances for BrowserApp will therefore have unrestricted API access on all domains. BrowserApp will not need to add a WKAppBoundDomains value to their Info.plist or make any changes to the way they initialize WKWebView.

HybridApp (both self-hosted + in-app browser)

Finally, let’s look at a more complex example. HybridApp is an application which offers in-app browsing to its users, but also requires restricted API use for WKWebView instances on its own domain, hybrid.example. HybridApp is a combination of ShopApp and SocialApp, and you should read and fully understand those examples first before considering HybridApp.

HybridApp’s Info.plist might look like this:

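<plist version="1.0">
<dict>
    <key>WKAppBoundDomains</key>
    <array>
        <string>hybrid.example</string>
    </array>
</dict>
</plist>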

HybridApp needs to inject JavaScript on hybrid.example, so it might create a WKWebViewConfiguration with the specific argument: webViewConfiguration.limitsNavigationsToAppBoundDomains = YES;.

HybridApp can now navigate to hybrid.example and successfully inject JavaScript:

[webView loadRequest:[NSURLRequest requestWithURL:[NSURL URLWithString:@"https://hybrid.example"]]];
[webView evaluateJavaScript:script completionHandler:^(id value, NSError *error) {
    if ([value isEqual:@"Successfully injected JavaScript"]) {
        // …
    }
}];

Say HybridApp tried to use this WKWebView to navigate to shop.example. Since shop.example is not an app-bound domain, this will result in a failed navigation.

Instead, HybridApp can create a different WKWebView with no limitsNavigationsToAppBoundDomains configuration flag. HybridApp can use this new WKWebView to navigate to any domain, including app-bound domains. However, any attempts to call restricted APIs will fail.
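The two kinds of web view in the HybridApp example can be summarized with a small decision model. This is a sketch of the rules as described in this post, not of WebKit's implementation: it assumes the app has already opted in, matches hosts by exact string comparison (a simplification), leaves out local content such as file and data URLs, and uses made-up Python names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WebViewConfig:
    """Hypothetical stand-in for a web view in an app that has opted in."""
    app_bound_domains: frozenset
    limits_navigations_to_app_bound_domains: bool = False


def is_app_bound(config: WebViewConfig, host: str) -> bool:
    # Exact host match only; a simplification made for this sketch.
    return host in config.app_bound_domains


def navigation_allowed(config: WebViewConfig, host: str) -> bool:
    """A flagged web view may only navigate to app-bound domains."""
    if config.limits_navigations_to_app_bound_domains:
        return is_app_bound(config, host)
    return True


def restricted_apis_allowed(config: WebViewConfig, host: str) -> bool:
    """Restricted APIs need both the flag and an app-bound domain."""
    return config.limits_navigations_to_app_bound_domains and is_app_bound(config, host)


domains = frozenset({"hybrid.example"})
bound_view = WebViewConfig(domains, limits_navigations_to_app_bound_domains=True)
browsing_view = WebViewConfig(domains)

print(navigation_allowed(bound_view, "shop.example"))         # navigation fails
print(restricted_apis_allowed(bound_view, "hybrid.example"))  # injection works
print(restricted_apis_allowed(browsing_view, "hybrid.example"))  # API call fails
```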

Other Options

The App-Bound Domains feature was created to allow for in-app browsing without sacrificing user privacy.

Prior to iOS 14, the only way to protect web content inside applications was to use SFSafariViewController for general web content, or ASWebAuthenticationSession for authentication purposes. We still think SafariViewController and ASWebAuthenticationSession represent the best way to protect user data, because they are views hosted outside of the application, making it impossible for applications to view or interact with the content of those views. If the developer only wishes to display web content as a convenience to the user, or if they only wish to support a web-based authentication flow, SafariViewController and ASWebAuthenticationSession continue to be the best choices.

Intelligent Tracking Prevention in WKWebView

Additionally, in iOS 14.0 and macOS Big Sur, Intelligent Tracking Prevention (ITP) is enabled by default in all WKWebView applications. To learn more about how ITP protects users against web tracking, check out this documentation on the topic.

In some extreme cases, users might need to disable ITP protections, for example when relying on web content outside of the app developer’s control. Applications can signal the need to allow users to disable ITP by adding a Purpose String for the key NSCrossWebsiteTrackingUsageDescription to the app’s Info.plist. When present, this key causes the application’s Settings screen to display a user control to disable ITP. The setting cannot be read or changed through API calls.

Note that applications taking the new Default Web Browser entitlement always have a user control in Settings to disable ITP, and don’t need to specify the NSCrossWebsiteTrackingUsageDescription key in their Info.plist.

Feedback and Bug Reports

If you find that this feature in any way doesn’t work as explained, please file a WebKit bug at https://bugs.webkit.org and CC Brent Fulgham and Kate Cheney. For feedback, please contact our web evangelist Jon Davis.

June 26, 2020 05:00 PM

June 25, 2020

Release Notes for Safari Technology Preview 109 with Safari 14 Features

Surfin’ Safari

Safari Technology Preview Release 109 is now available for download for macOS Catalina. With this release, Safari Technology Preview is now available for betas of macOS Big Sur. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS. Safari Technology Preview is currently only available for Intel-based Macs.

This release includes new Safari and WebKit features that will be present in Safari 14. The following Safari 14 features are new in Safari Technology Preview 109:

Safari Web Extensions. Extensions written for Chrome, Firefox, and Edge that use the WebExtension APIs can be converted to Safari Web Extensions using Xcode 12.

Privacy Report. See the trackers that Intelligent Tracking Prevention prevented from accessing identifying information.

Improved tab management with tab previews. Tabs feature a new space-efficient design that lets you view more tabs on-screen and preview tabs to find the one you’re looking for.

Website icons in tabs. Icons in tabs are turned on by default in Safari 14.

Password breach notifications. On macOS Big Sur, Safari will notify users when one of their saved passwords in iCloud Keychain has shown up in a data breach; requesting a password change uses the well-known URL for changing passwords (https://example.com/.well-known/change-password), enabling websites to specify the page to open for updating a password.

Domain-bound codes. On macOS Big Sur, added support to Security Code AutoFill for domain-bound, one-time codes sent over SMS; in the following 2FA SMS, Safari only offers to fill the code on example.com, and no other domain.

Your Example code is 123446.

@example.com #123446
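To make the message format above more concrete, here is a minimal sketch of a parser for the last line of such an SMS. The function names are hypothetical, and the exact-host comparison is a simplifying assumption; Safari's actual matching rules are not described in this post.

```javascript
// Hypothetical parser for the last line of a domain-bound code SMS,
// for example "@example.com #123446". Returns null if the line does not match.
function parseDomainBoundCode(message) {
    const lines = message.trim().split(/\r?\n/);
    const lastLine = lines[lines.length - 1].trim();
    const match = /^@(\S+) #(\S+)$/.exec(lastLine);
    if (!match)
        return null;
    return { domain: match[1], code: match[2] };
}

// Illustrative check (an assumption): only offer the code when the page's
// host is exactly the domain named in the message.
function shouldOfferCode(message, pageHost) {
    const parsed = parseDomainBoundCode(message);
    return parsed !== null && parsed.domain === pageHost;
}
```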

Web Authentication. Added a Web Authentication platform authenticator using Touch ID, if that capability is present (macOS Big Sur-only). Added support for PIN entry and account selection on external FIDO2 security keys.

Adobe Flash is no longer supported in Safari.

In addition to these new Safari 14 features, this release covers WebKit revisions 262502-263214 and Password Manager Resources version 10e3fca9.


  • Changed the file picker of the <input> file element to show the selection filter (r262895)
  • Changed to disallow XHR+Fetch responses when a response contains invalid header values (r262511)
  • Changed image referrerpolicy mutations to be considered “relevant mutations” (r263167)
  • Fixed empty dataTransfer.types when handling the dragstart event (r262507)
  • Fixed a case of being unable to select an item from dropdown (r263179)
  • Made ReadableStream robust against user code (r263141)


  • Fixed align-content to apply for a single line (r262716)
  • Fixed pseudo-elements (::after) in shadow roots to animate (r262711)
  • Fixed CSS custom properties for specific border properties (r262627)

Web Animations

  • Fixed animating font-size values with em units (r262946)


  • Fixed Document.currentScript to work for SVGScriptElements (r262945)
  • Fixed multiple SVG filters unexpectedly lightening the image using linearRGB (r262893)


  • Added support for IDBFactory databases method (r263157)


  • Fixed horizontally scrolling elements that are broken when revealed by toggling visibility (r262774)


  • Changed to not apply the special anchor handling when the anchor content is visible after clamping (r262892)
  • Fixed inserted text placeholder to vertically align to top and behave like a block-level element when it has 0 width (r262525)


  • Fixed a YouTube video that gets stuck after rapidly tapping on touchbar’s picture-in-picture button (r262599)
  • Added a quirk to allow starting AudioContext if document was interacted (r263025)


  • Improved SCTP cookie generation (r263154)

Back-forward Cache

  • Stopped allowing pages served over HTTPS with Cache-Control: no-store into the back-forward cache (r262978)


  • Added support for private class fields (r262613)
  • Added “el” (Greek) to our maintained available locales list (r262992)
  • Changed Logical Assignment to perform NamedEvaluation of anonymous functions (r262638)
  • Changed JSON.stringify to throw stack overflow error (r262727)
  • Changed RegExp.prototype getters to throw on cross-realm access (r262908)
  • Changed super to not depend on proto (r263035)
  • Fixed AsyncGenerator to await return completions (r262979)
  • Made errors an own property of AggregateError instead of a prototype accessor (r263006)
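Two of the language changes listed above can be seen in a few lines. This is only an illustrative sketch; the class and variable names are made up for the example.

```javascript
// Private class field: #count is only reachable from inside the class body.
class Counter {
    #count = 0;
    increment() {
        this.#count += 1;
        return this.#count;
    }
}

const counter = new Counter();
counter.increment();
const visibleKeys = Object.keys(counter); // private fields are not listed

// Logical assignment with NamedEvaluation: the anonymous function
// takes its name from the assignment target.
let handler;
handler ??= function () {};
const inferredName = handler.name;
```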


  • Fixed text form controls to prevent scrolling by a pixel when the value is the same length as the size (r263073)
  • Fixed observing a newly displayed element inside previously observed content (r263044)
  • Fixed text manipulation to exclude characters outside of the unicode private use area (r262645)
  • Fixed editing to handle nested item boundary elements (r263101)
  • Fixed to not re-extract elements whose children have been manipulated (r263132)
  • Fixed first and last unit in a paragraph to not contain only excluded tokens (r262601)


  • Changed <address> element to no longer map to ARIA contentinfo role (r263096)

Apple Pay

  • Added new values for -apple-pay-button-type (r262528)

Web Inspector

  • Changed text inputs to not spellcheck or autocomplete (r262848)
  • Fixed an issue where XHRs with the same URL as the main resource were not shown in the Sources Tab (r262842)
  • Improved the performance of resizing the Scope Chain panel in the details sidebar of the Sources Tab (r263115)

Web Driver

  • Fixed Automation.computeElementLayout to return iframe-relative element rects when the coordinate system is “Page” (r262997)
  • Fixed WebDriver on non-iOS ports that cannot perform ActionChain which has scrolling down to the element and click it (r262861)

June 25, 2020 06:35 PM

June 23, 2020

Async Clipboard API

Surfin’ Safari

Safari 13.1 adds support for the async Clipboard API. This API allows you, as web developers, to write and read data to and from the system clipboard, and offers several advantages over the current state of the art for reading and writing clipboard data via DataTransfer. Let’s take a look at how this new API works.


The Clipboard API introduces two new objects:

  • Clipboard, which is accessible through navigator.clipboard and contains methods for reading from and writing to the system clipboard.
  • ClipboardItem, which represents a single item on the system clipboard that may have multiple representations. Currently, when reading and writing data, WebKit supports four MIME type representations: "text/plain", "text/html", "text/uri-list", and "image/png".

Conceptually, the Clipboard is represented by an ordered list of one or more ClipboardItem, each of which may have multiple type representations. For instance, when writing several PNG images to the clipboard, you should create a ClipboardItem for each image, with a single "image/png" type representation for each item. When writing a single PNG image with some alt text, you would instead write a single ClipboardItem with two representations: "image/png" and "text/plain".
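The two groupings described above can be sketched as follows. The helper names are hypothetical, and plain objects stand in for the mappings that would be handed to the ClipboardItem constructor, so that the grouping logic can be read on its own.

```javascript
// Build the type-to-data mappings that would be passed to `new ClipboardItem(...)`.

// Several images: one item per image, each with a single "image/png" representation.
function itemsForImages(pngBlobs) {
    return pngBlobs.map(blob => ({ "image/png": Promise.resolve(blob) }));
}

// One image with alt text: a single item with two representations.
function itemForImageWithAltText(pngBlob, altText) {
    return {
        "image/png": Promise.resolve(pngBlob),
        "text/plain": Promise.resolve(altText),
    };
}

// In a browser, inside a user gesture, the mappings could then be written with:
//   navigator.clipboard.write(itemsForImages(blobs).map(m => new ClipboardItem(m)));
```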

You can use clipboard.read to extract data from the system clipboard; this asynchronously retrieves an array of ClipboardItem, each containing a mapping of MIME type to Blob. Similarly, clipboard.write can be used to write a given array of ClipboardItem to the system clipboard. However, if you are only trying to read or write plain text, you may find the methods clipboard.readText and clipboard.writeText to be more ergonomic.
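For the plain-text case, a minimal sketch might look like the following. The helper name is hypothetical, and the clipboard object is passed in as a parameter (it is expected to be navigator.clipboard) purely so the function does not depend on a global.

```javascript
// `clipboard` is expected to be navigator.clipboard.
async function copyThenReadText(clipboard, text) {
    await clipboard.writeText(text); // resolves once the text has been written
    return clipboard.readText();     // resolves to the clipboard's plain text
}

// In a page, this would be called from a user gesture, for example:
//   button.addEventListener("click", () => copyThenReadText(navigator.clipboard, "Hi"));
```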

Each ClipboardItem also has a presentationStyle, which may indicate whether the item is best represented as inline data or an “attachment” (that is, a file-like entity). This distinction may be useful in order to tell a copied text selection on a webpage apart from a copied HTML file.
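As an illustration, the sketch below sorts clipboard items by their presentationStyle. The function name is hypothetical; it assumes the property takes the values "inline", "attachment" or "unspecified", and treats anything else as unspecified.

```javascript
// Split clipboard items by presentationStyle. Items are expected to look like
// ClipboardItem objects; only the presentationStyle property is consulted.
function partitionByPresentationStyle(items) {
    const result = { inline: [], attachment: [], unspecified: [] };
    for (const item of items) {
        const style = item.presentationStyle;
        if (style === "inline" || style === "attachment")
            result[style].push(item);
        else
            result.unspecified.push(item);
    }
    return result;
}
```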

Let’s dive into some examples below to see how to programmatically read and write data.

Writing Data

Consider this basic example, which implements a button that copies plain text when clicked:

<button id="new-copy">Copy text</button>
document.getElementById("new-copy").addEventListener("click", event => {
    navigator.clipboard.writeText("This text was copied programmatically.");
});

This is much simpler than the current method of programmatically copying text, which requires us to select a text field and execute the “copy” command:

<button id="old-copy">Copy text</button>
document.getElementById("old-copy").addEventListener("click", event => {
    let input = document.createElement("input");
    input.style.opacity = "0";
    input.style.position = "fixed";
    input.value = "This text was also copied programmatically.";
    document.body.appendChild(input);

    input.focus();
    input.setSelectionRange(0, input.value.length);
    document.execCommand("copy");

    input.remove();
});


When using clipboard.write to copy data, you need to create an array of ClipboardItem. Each ClipboardItem is initialized with a mapping of MIME type to Promise which may resolve either to a string or a Blob of the same MIME type. The following example uses clipboard.write to copy a single item with both plain text and HTML representations.

<button id="copy-html">Copy text and markup</button>
<div>Then paste in the box below:</div>
<div contenteditable spellcheck="false" style="width: 200px; height: 100px; overflow: hidden; border: 1px solid black;"></div>
document.getElementById("copy-html").addEventListener("click", event => {
    navigator.clipboard.write([
        new ClipboardItem({
            "text/plain": Promise.resolve("This text was copied using `Clipboard.prototype.write`."),
            "text/html": Promise.resolve("<p style='color: red; font-style: oblique;'>This text was copied using <code>Clipboard.prototype.write</code>.</p>"),
        }),
    ]);
});

A similar implementation using existing DataTransfer API would require us to create a hidden text field, install a copy event handler on the text field, focus it, trigger a programmatic copy, set data on the DataTransfer (within the copy event handler), and finally call preventDefault.
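For comparison, the DataTransfer-based steps described above could be sketched roughly as follows. The function name legacyCopy is hypothetical, the document is passed in as a parameter, and whether the copy event fires in exactly this way may vary between browsers, so treat this as an outline rather than a tested recipe.

```javascript
// Legacy approach: copy several representations through a "copy" event handler.
// `doc` is expected to be a Document; `representations` maps MIME type to string.
function legacyCopy(doc, representations) {
    const input = doc.createElement("input");
    input.style.opacity = "0";
    input.style.position = "fixed";
    input.value = "placeholder"; // gives the field something to select
    input.addEventListener("copy", event => {
        for (const [type, data] of Object.entries(representations))
            event.clipboardData.setData(type, data);
        // Keep the browser from replacing our data with the selected text.
        event.preventDefault();
    });
    doc.body.appendChild(input);
    input.focus();
    input.select();
    const succeeded = doc.execCommand("copy");
    input.remove();
    return succeeded;
}
```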

Note that both clipboard.write and clipboard.writeText are asynchronous. If you attempt to write to the clipboard while a prior clipboard writing invocation is still pending, the previous invocation will immediately reject, and the new content will be written to the clipboard.

On both iOS and macOS, the order in which types are written to the clipboard is also important. WebKit writes data to the system pasteboard in the order specified — this means types that come before other types are considered by the system to have “higher fidelity” (that is, preserve more of the original content). Native apps on macOS and iOS may use this fidelity order as a hint when choosing an appropriate UTI (universal type identifier) to read.
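Since the order of types matters, it can help to build the mapping in an explicit fidelity order before creating a ClipboardItem. The sketch below relies on JavaScript objects preserving insertion order for string keys; the helper name is hypothetical, and it assumes the key order of the mapping is the order in which the types end up being written.

```javascript
// Return a new mapping whose keys follow `fidelityOrder` (highest fidelity first).
// Types not named in `fidelityOrder` keep their relative order at the end.
function orderByFidelity(representations, fidelityOrder) {
    const ordered = {};
    for (const type of fidelityOrder) {
        if (type in representations)
            ordered[type] = representations[type];
    }
    for (const type of Object.keys(representations)) {
        if (!(type in ordered))
            ordered[type] = representations[type];
    }
    return ordered;
}

const orderedTypes = Object.keys(orderByFidelity(
    { "text/plain": "Hello", "text/html": "<b>Hello</b>" },
    ["text/html", "text/plain"]
));
console.log(orderedTypes); // text/html first, then text/plain
```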

Reading Data

Data extraction follows a similar flow. In the following example, we:

  1. Use clipboard.read to obtain a list of clipboard items.
  2. Resolve the first item’s "text/html" data to a Blob using clipboardItem.getType.
  3. Use the FileReader API to read the contents of the Blob as text.
  4. Display the pasted markup by setting the innerHTML of a container <div>.

<span style="font-weight: bold; background-color: black; color: white;">Select this text and copy</span>
<div><button id="read-html">Paste HTML below</button></div>
<div id="html-output"></div>
document.getElementById("read-html").addEventListener("click", async clickEvent => {
    let items = await navigator.clipboard.read();
    for (let item of items) {
        if (!item.types.includes("text/html"))
            continue;

        let reader = new FileReader;
        reader.addEventListener("load", loadEvent => {
            document.getElementById("html-output").innerHTML = reader.result;
        });
        reader.readAsText(await item.getType("text/html"));
        break;
    }
});

There are a couple of interesting things to note here:

  • Like writing data, reading data is also asynchronous; the processes of both fetching every ClipboardItem and extracting a Blob from a ClipboardItem return promises.
  • Type fidelities are preserved when reading data. This means the order in which types were written (either using system API on iOS and macOS, or the async clipboard API) is the same as the order in which they are exposed upon reading from the clipboard.
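Because both steps return promises, the same flow can also be written without FileReader. The sketch below is one possible variant, assuming Blob.prototype.text is available (it may not be in older Safari versions); the helper name is hypothetical.

```javascript
// Return the text of the first item that offers `type`, or null if none does.
// `items` is expected to be the array resolved by navigator.clipboard.read().
async function readFirstOfType(items, type) {
    for (const item of items) {
        if (!item.types.includes(type))
            continue;
        const blob = await item.getType(type);
        return blob.text(); // Promise-based alternative to FileReader
    }
    return null;
}
```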

Security and Privacy

The async clipboard API is a powerful web API, capable of both writing arbitrary data to the clipboard, as well as reading from the system clipboard. As such, there are serious security ramifications when allowing pages to write data to the clipboard, and privacy ramifications when allowing pages to read from the clipboard. Clearly, untrusted web content shouldn’t be capable of extracting sensitive data — such as passwords or addresses — without explicit consent from the user. Other vulnerabilities are less obvious; for instance, consider a page that writes "text/html" to the pasteboard that contains malicious script. When pasted in another website, this could result in a cross-site scripting attack!

WebKit’s implementation of the async clipboard API mitigates these issues through several mechanisms.

  • The API is limited to secure contexts, which means that navigator.clipboard is not present for http:// websites.
  • The request to write to the clipboard must be triggered during a user gesture. A call to clipboard.write or clipboard.writeText outside the scope of a user gesture (such as "click" or "touch" event handlers) will result in the immediate rejection of the promise returned by the API call.
  • Both "text/html" and "image/png" data is sanitized before writing to the pasteboard. Markup is loaded in a separate document where JavaScript is disabled, and only visible content is then extracted from this page. Content such as <script> elements, comment nodes, display: none; elements and event handler attributes are all stripped away. For PNG images, the image data is first decoded into a platform image representation, before being re-encoded and sent to the platform pasteboard for writing. This ensures that a website cannot write corrupted or broken images to the pasteboard. If the image data cannot be decoded, the writing promise will reject. Additional information about WebKit’s sanitization mechanisms is available in the Clipboard API Improvements blog post.
  • Since users may not always be aware that sensitive content has been copied to the pasteboard, restrictions on the ability to read are more strict than the restrictions on the ability to write. If a page attempts to programmatically read from the pasteboard outside of a user gesture, the promise will immediately reject. If the user is explicitly triggering a paste during the gesture (for instance, using a keyboard shortcut on macOS such as ⌘V or pasting using the “Paste” action on the callout bar on iOS), WebKit will allow the page to programmatically read the contents of the clipboard. Programmatic clipboard access is also automatically granted in the case where the contents of the system clipboard were written by a page with the same security origin. If neither of the above are true, WebKit will show platform-specific UI which the user may interact with to proceed with a paste. On iOS, this takes the form of a callout bar with a single option to paste; on macOS, it is a context menu item. Tapping or clicking anywhere in the page (or performing any other actions, such as switching tabs or hiding Safari) will cause the promise to be rejected; the page is granted programmatic access to the clipboard only if the user manually chooses to paste by interacting with the platform-specific UI.
  • Similar to writing data, reading data from the system clipboard involves sanitization to prevent users from unknowingly exposing sensitive information. Image data read from the clipboard is stripped of EXIF data, which may contain details such as location information and names. Likewise, markup that is read from the clipboard is stripped of hidden content, such as comment nodes.
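In practice, these restrictions mean that a page should check for the API and be prepared for a rejection. A minimal sketch, with hypothetical helper names and the window object passed in as a parameter, might look like this:

```javascript
// Decide whether it is worth attempting an async clipboard call.
// `env` is expected to be `window`; only the two properties below are consulted.
function canUseAsyncClipboard(env) {
    const secure = env.isSecureContext === true;
    const clipboard = env.navigator && env.navigator.clipboard;
    return secure && Boolean(clipboard) && typeof clipboard.read === "function";
}

// Attempt a read and report failure instead of throwing; a rejection may mean
// the call happened outside a user gesture or the user dismissed the paste UI.
async function tryReadClipboard(env) {
    if (!canUseAsyncClipboard(env))
        return { ok: false, reason: "unavailable" };
    try {
        return { ok: true, items: await env.navigator.clipboard.read() };
    } catch (error) {
        return { ok: false, reason: error.name };
    }
}
```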

These policies ensure that the async clipboard API allows developers to deliver great experiences without the potential to be abused in a way that compromises security or privacy for users.

Future Work

As we continue to iterate on the async clipboard API, we’ll be adding support for custom pasteboard types, and will also consider support for additional MIME types, such as "image/jpeg" or "image/svg+xml". As always, please let us know if you encounter any bugs (or if you have ideas for future enhancements) by filing bugs on bugs.webkit.org.

June 23, 2020 11:35 PM

June 16, 2020

Víctor Jáquez: WebKit Flatpak SDK and gst-build

Igalia WebKit

This post is an annex of Phil’s Introducing the WebKit Flatpak SDK. Please make sure to read it, if you haven’t already.

To recap, WebKitGtk/WPE developers (and their CI infrastructure) are moving towards a Flatpak-based environment for their workflow. This Flatpak-based environment, or Flatpak SDK for short, can be thought of as a sandboxed software container that bundles all the dependencies required to compile, run and debug WebKitGtk/WPE.

In day-to-day work, this approach removes the need to potentially compile the world in order to obtain reproducible builds, which improves the development and testing workflow.

But what if you are also involved in the development of one dependency?

This is the case of Igalia’s multimedia team where, besides developing the multimedia features for WebKitGtk and WPE, we also participate in the development of GStreamer, the framework used for multimedia.

Because of this, in our workflow we usually need to build WebKit with a fix, hack or new feature in GStreamer. Is it possible to add our custom GStreamer build to Flatpak without messing up its own GStreamer setup? Yes, it’s possible.

gst-build is a set of Python scripts that clone the GStreamer repositories, compile them and set up an uninstalled environment. This uninstalled environment allows transient use of the compiled framework from its build tree, avoiding installation and any further mess in our system.

The WebKit scripts that wrap Flatpak operations are also capable of driving the gst-build scripts to build GStreamer inside the container and, when running WebKit’s artifacts, they enable the uninstalled environment mentioned above, overriding Flatpak’s GStreamer.

How do we unveil all this magic?

First of all, set up a gst-build installation as documented. This installation is where the GStreamer plumbing is done.

Then, gst-build operations through the WebKit compilation scripts are enabled when the environment variable GST_BUILD_PATH is exported. This variable should point to the directory where the gst-build tree is placed.

And that’s all!

But let’s put these words into actual commands. The following workflow assumes that the WebKit repository is cloned in ~/WebKit and the gst-build tree is in ~/gst-build (please excuse my bashisms).

Compiling WebKitGtk with symbols, using LLVM as toolchain (this command will also compile GStreamer):

$ cd ~/WebKit
$ CC=clang CXX=clang++ GST_BUILD_PATH=/home/vjaquez/gst-build Tools/Scripts/build-webkit --gtk --debug

Running the generated minibrowser (remember that GST_BUILD_PATH is required again for correct linking):

$ GST_BUILD_PATH=/home/vjaquez/gst-build Tools/Scripts/run-minibrowser --gtk --debug

Running media layout tests:

$ GST_BUILD_PATH=/home/vjaquez/gst-build ./Tools/Scripts/run-webkit-tests --gtk --debug media

But wait! There’s more...

What if I want to parametrize the GStreamer compilation? Say, I would like to enable a GStreamer module or disable the build of a specific element.

gst-build, like the rest of the GStreamer modules, uses the Meson build system, so it’s possible to pass arguments to Meson through the environment variable GST_BUILD_ARGS.

For example, I would like to enable gstreamer-vaapi 😇

$ cd ~/WebKit
$ CC=clang CXX=clang++ GST_BUILD_PATH=/home/vjaquez/gst-build GST_BUILD_ARGS="-Dvaapi=enabled" Tools/Scripts/build-webkit --gtk --debug

By vjaquez at June 16, 2020 11:49 AM

June 11, 2020

Release Notes for Safari Technology Preview 108

Surfin’ Safari

Safari Technology Preview Release 108 is now available for download for macOS Catalina and macOS Mojave. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 262002-262502.

Web Inspector

  • Network
    • Fixed updating statistics when filtering (r262263)
    • Fixed gaps around the “An error occurred trying to load this resource” message (r262162)
  • Storage
    • Prevented requesting the list of IndexedDB database names multiple times for the same security origin (r262077)
  • Graphics
    • Added support for the id (name) of the animation if it exists (r262404)
    • Fixed populating text editors in the Keyframes section when the Animation panel sidebar in the details sidebar is first shown (r262307)
  • Miscellaneous
    • Fixed ⌘G to not override the current query of the find banner if it’s visible (r262173)


  • Fixed SVG text nodes with content getting described as “empty group” even if they’re not empty (r262500)
  • Fixed ignoring images with an empty alt attribute (r262224)


  • Fixed <area> to require being connected in order to navigate (r262359)
  • Fixed the pageshow event only firing the first time the back button is pressed (r262221)
  • Fixed Array.prototype.splice not setting the length of the returned object if not an Array (r262088)
  • Fixed incorrect location.origin in blob workers (r262026)
  • Implemented ParentNode.prototype.replaceChildren (r262381)


  • Changed the calculation to compute the hypothetical cross size of each item in flexbox to use fit-content, not max-content (r262411)
  • Changed to allow indefinite size flex items to be definite with respect to resolving percentages inside them (r262124)
  • Fixed dynamically setting position: absolute in a grid item to trigger a relayout of that element (r262481)
  • Fixed tables as flex items to obey the flex container sizing (r262378)
  • Fixed styling ::selection for a flex container (r262049)
  • Prevented grid-template-rows from serializing adjacent <line-names> (r262130)
  • Prevented putting out-of-flow boxes in anonymous flex items or grid items (r262061)


  • Fixed BigInt operations to handle exceptions correctly (r262386)


  • Fixed scrolling on a mercurynews.com article (r262127)
  • Fixed stuttery overflow scrolling in slow-scrolling regions (r262094)
  • Fixed rendering artifacts when scrolling overlays (r262177)


  • Fixed incorrect clipping of absolute and fixed elements inside stacking-context composited overflow: hidden (r262237)

Async Clipboard API

  • Added support for reading "image/png" on ClipboardItem (r262209)
  • Fixed DataTransfer.prototype.files containing multiple files when pasting a single image with multiple representations (r262047)

Web Animations

  • Avoided starting CSS Transitions for a property when a CSS Animations or JavaScript-originated animation is running for the same property (r262154)
  • Fixed SVG animations to not stop when other animators are still running (r262175)


  • Fixed Picture-in-Picture API issues under stress tests (r262038)
  • Fixed scrubbing video on www.judiciary.senate.gov (r262169)
  • Fixed fullscreen animation missing a few frames at beginning (r262322)
  • Fixed transition between encrypted and clear codecs throwing an error (r262364)
  • Fixed video freezing when attaching a local MediaStream to multiple elements (r262189)
  • Made setting fullscreen mode more robust under stress tests (r262456)

June 11, 2020 05:30 PM

Diego Pino: Renderization of Conic gradients

Igalia WebKit

The CSS Images Module Level 4 introduced a new type of gradient: conic-gradient. Until then, there were only two other types of gradients available on the Web: linear-gradient and radial-gradient.

The first browser to ship conic-gradient support was Google Chrome, around March 2018. A few months later, in September 2018, the feature became available in Safari. Firefox has lacked support until now, although an implementation is on the way and will ship soon. In the case of WebKitGTK (Epiphany) and WPE (Web Platform for Embedded), support landed in October 2019, which I implemented as part of my work at Igalia. The feature has been officially available in WebKitGTK and WPE since version 2.28 (March 2020).

Before native browser support, conic-gradient was available as a JavaScript polyfill created by Lea Verou.

Gradients in the Web

Generally speaking, a gradient is a smooth transition of colors defined by two or more stop-colors. In the case of a linear gradient, this transition is defined by a straight line (which may or may not be at an angle).

div.linear-gradient {
  width: 400px;
  height: 100px;
  background: linear-gradient(to right, red, yellow, lime, aqua, blue, magenta, red);
}

Linear gradient

In the case of a radial gradient, the transition is defined by a center and a radius. Colors expand evenly in all directions from the center of the circle to outside.

div.radial-gradient {
  width: 300px;
  height: 300px;
  border-radius: 50%;
  background: radial-gradient(red, yellow, lime, aqua, blue, magenta, red);
}

Radial gradient

A conical gradient, although also defined by a center and a radius, isn’t the same as a radial gradient. In a conical gradient colors spin around the circle.

div.conic-gradient {
  width: 300px;
  height: 300px;
  border-radius: 50%;
  background: conic-gradient(red, yellow, lime, aqua, blue, magenta, red);
}

Conic gradient

Implementation in WebKitGTK and WPE

At the time of implementing support in WebKitGTK and WPE, the feature had already shipped in Safari. That meant WebKit already had support for parsing the conic-gradient specification as defined in CSS Images Module Level 4 and the data structures to store relevant information were already created. The only piece missing in WebKitGTK and WPE was painting.

Safari relies on the CoreGraphics library for many of its graphical painting operations, and CoreGraphics has a primitive for conic gradient painting (CGContextDrawConicGradient). Something similar happens in Google Chrome, although in this case the graphics library underneath is Skia (CreateTwoPointConicalGradient). WebKitGTK and WPE use Cairo for many of their graphical operations. In the case of linear and radial gradients, there’s native support in Cairo. However, there isn’t a function for conical gradient painting. This doesn’t mean Cairo cannot be used to paint conical gradients; it just means that it is a little more complicated.

Mesh gradients

The Cairo documentation states that it is possible to paint a conical gradient using a mesh gradient. A mesh gradient is defined by a set of colors and control points. The most basic type of mesh gradient is a Gouraud-shading triangle mesh.

cairo_mesh_pattern_begin_patch (pattern);

cairo_mesh_pattern_move_to (pattern, 100, 100);
cairo_mesh_pattern_line_to (pattern, 130, 130);
cairo_mesh_pattern_line_to (pattern, 130,  70);

cairo_mesh_pattern_set_corner_color_rgb (pattern, 0, 1, 0, 0);
cairo_mesh_pattern_set_corner_color_rgb (pattern, 1, 0, 1, 0);
cairo_mesh_pattern_set_corner_color_rgb (pattern, 2, 0, 0, 1);

cairo_mesh_pattern_end_patch (pattern);

Gouraud-shaded triangle mesh

A more sophisticated kind of mesh gradient patch is a Coons patch. A Coons patch is a quadrilateral defined by 4 cubic Bézier curves and 4 colors, one for each vertex. A Bézier curve is defined by 4 points, and adjacent curves share their end points, so we have a total of 12 control points (and 4 colors) in a Coons patch.

cairo_mesh_pattern_begin_patch (pattern);

cairo_mesh_pattern_move_to (pattern, 45, 12);
cairo_mesh_pattern_curve_to(pattern, 69, 24, 173, -15, 115, 50);
cairo_mesh_pattern_curve_to(pattern, 127, 66, 174, 47, 148, 104);
cairo_mesh_pattern_curve_to(pattern, 65, 58, 70, 69, 18, 103);
cairo_mesh_pattern_curve_to(pattern, 42, 43, 63, 45, 45, 12);

cairo_mesh_pattern_set_corner_color_rgb (pattern, 0, 1, 0, 0); // red
cairo_mesh_pattern_set_corner_color_rgb (pattern, 1, 0, 1, 0); // green
cairo_mesh_pattern_set_corner_color_rgb (pattern, 2, 0, 0, 1); // blue
cairo_mesh_pattern_set_corner_color_rgb (pattern, 3, 1, 1, 0); // yellow

cairo_mesh_pattern_end_patch (pattern);

Coons patch gradient

A Coons patch comes in very handy for painting a conical gradient. Consider the first quadrant of a circle; such a quadrant can easily be defined with a Bézier curve.

cairo_mesh_pattern_begin_patch (pattern);

cairo_mesh_pattern_move_to (pattern, 0, 200);
cairo_mesh_pattern_line_to (pattern, 0, 0);
cairo_mesh_pattern_curve_to (pattern, 133, 0, 200, 133, 200, 200);
cairo_mesh_pattern_line_to (pattern, 0, 200);

cairo_mesh_pattern_set_corner_color_rgb (pattern, 0, 1, 0, 0); // red
cairo_mesh_pattern_set_corner_color_rgb (pattern, 1, 0, 1, 0); // green
cairo_mesh_pattern_set_corner_color_rgb (pattern, 2, 0, 0, 1); // blue
cairo_mesh_pattern_set_corner_color_rgb (pattern, 3, 1, 1, 0); // yellow

cairo_mesh_pattern_end_patch (pattern);

Coons patch of the first quadrant of a circle

If we simply use two colors instead, the final result looks more like a conical gradient.

cairo_mesh_pattern_set_corner_color_rgb (pattern, 0, 1, 0, 0); // red
cairo_mesh_pattern_set_corner_color_rgb (pattern, 1, 1, 0, 0); // red
cairo_mesh_pattern_set_corner_color_rgb (pattern, 2, 1, 1, 0); // yellow
cairo_mesh_pattern_set_corner_color_rgb (pattern, 3, 1, 1, 0); // yellow

Coons patch of the first quadrant of a circle (2 colors)

Repeat this step three more times, with a few more stop colors, and you have a nice conical gradient.

A conic gradient made by composing mesh patches

Bézier curves as arcs

At this point the difficulty of painting a conical gradient has been reduced to calculating the shape of the Bézier curve of each mesh patch.

Computing the starting and ending points is straightforward; calculating the position of the other two control points of the Bézier curve is a bit harder.

Bézier curve approximation to a circle quadrant

Mozillian Michiel Kamermans (pomax) has a beautifully written essay on Bézier curves. The section “Circles and cubic Bézier curves” of that essay discusses how to approximate an arc with a Bézier curve. The case of a circular quadrant is particularly interesting because it allows painting a circle with 4 Bézier curves with minimal error. In the case of the quadrant above, the values for each point would be the following:

S = (0, r), CP1 = (0.552 * r, r), CP2 = (r, 0.552 * r), E = (r, 0) 
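To get a feeling for how small the error is, the sketch below samples the Bézier curve defined by these four points and measures how far it drifts from the true circle. The function names are made up for the example, and the constant is computed as 4/3 * tan(π/8) rather than hard-coded as 0.552.

```javascript
// Constant for a quarter circle: 4/3 * tan(pi/8), about 0.552.
const KAPPA = (4 / 3) * Math.tan(Math.PI / 8);

// Evaluate one coordinate of a cubic Bézier curve at parameter t.
function cubicBezier(p0, p1, p2, p3, t) {
    const u = 1 - t;
    return u * u * u * p0 + 3 * u * u * t * p1 + 3 * u * t * t * p2 + t * t * t * p3;
}

// Largest relative deviation from the radius along the approximated quadrant,
// using S = (0, r), CP1 = (KAPPA * r, r), CP2 = (r, KAPPA * r), E = (r, 0).
function maxQuadrantError(r, samples = 1000) {
    let worst = 0;
    for (let i = 0; i <= samples; i++) {
        const t = i / samples;
        const x = cubicBezier(0, KAPPA * r, r, r, t);
        const y = cubicBezier(r, r, KAPPA * r, 0, t);
        worst = Math.max(worst, Math.abs(Math.hypot(x, y) - r) / r);
    }
    return worst;
}
```

For a quadrant, the value returned by maxQuadrantError stays well below 0.1% of the radius.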

Even though in its most basic form a conic gradient is defined by one starting and one ending color, two Bézier curves, each covering a semicircle, do not approximate a circle well (check the interactive examples in pomax’s Bézier curve essay). In that case, the conic gradient is split into four Coons patches with the middle colors interpolated.
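The split into four patches can be sketched as follows. This is only an illustration: the function names are hypothetical, and linear interpolation of RGB components and the choice of angle origin are assumptions, not necessarily what WebKit does.

```javascript
// Linear interpolation between two RGB colors (components in [0, 1]).
function mixColor(from, to, t) {
    return from.map((component, i) => component + (to[i] - component) * t);
}

// Split a start/end color pair into four quadrant patches. Each patch carries
// its start and end angle (radians) and the colors at those angles.
function splitIntoQuadrants(startColor, endColor) {
    const patches = [];
    for (let i = 0; i < 4; i++) {
        patches.push({
            angleStart: (i * Math.PI) / 2,
            angleEnd: ((i + 1) * Math.PI) / 2,
            colorStart: mixColor(startColor, endColor, i / 4),
            colorEnd: mixColor(startColor, endColor, (i + 1) / 4),
        });
    }
    return patches;
}

// Red to yellow: the boundary between the second and third patch is halfway.
const redToYellow = splitIntoQuadrants([1, 0, 0], [1, 1, 0]);
```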

Also, in cases where there are more than 4 colors, each Coons patch will be smaller than a quadrant. A general formula is needed that can compute the control points for each section of the circle, given an angle and a radius. After some math, the following formula can be derived (check the section “Circles and cubic Bézier curves” in pomax’s essay):

cp1 = {
   x: cx + (r * cos(angleStart)) - f * (r * sin(angleStart)),
   y: cy + (r * sin(angleStart)) + f * (r * cos(angleStart))
}
cp2 = {
   x: cx + (r * cos(angleEnd)) + f * (r * sin(angleEnd)),
   y: cy + (r * sin(angleEnd)) - f * (r * cos(angleEnd))
}

where f is a variable computed as:

f = 4 * tan((angleEnd - angleStart) / 4) / 3;

For a 90-degree angle the value of f is approximately 0.552. Thus, if the quadrant above had a radius of 100px, the control points would be CP1(155.2, 0) and CP2(200, 44.8) (taking the top-left corner as the point (0, 0)).
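The formula can be checked with a few lines of JavaScript. The function name is hypothetical. To reproduce the numbers quoted above, the sketch assumes a circle centered at (100, 100) with y growing downwards and a quadrant running from the top of the circle (-90°) to its right side (0°).

```javascript
// Control points of the cubic Bézier curve approximating the arc from
// angleStart to angleEnd (radians) on a circle centered at (cx, cy).
function arcControlPoints(cx, cy, r, angleStart, angleEnd) {
    const f = (4 * Math.tan((angleEnd - angleStart) / 4)) / 3;
    return {
        f,
        cp1: {
            x: cx + r * Math.cos(angleStart) - f * r * Math.sin(angleStart),
            y: cy + r * Math.sin(angleStart) + f * r * Math.cos(angleStart),
        },
        cp2: {
            x: cx + r * Math.cos(angleEnd) + f * r * Math.sin(angleEnd),
            y: cy + r * Math.sin(angleEnd) - f * r * Math.cos(angleEnd),
        },
    };
}

// Assumed setup: center (100, 100), radius 100, from -90° to 0°.
const quadrant = arcControlPoints(100, 100, 100, -Math.PI / 2, 0);
const round1 = value => Math.round(value * 10) / 10;
console.log(round1(quadrant.cp1.x), round1(quadrant.cp1.y)); // 155.2 0
console.log(round1(quadrant.cp2.x), round1(quadrant.cp2.y)); // 200 44.8
```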

And that’s basically all that is needed. The formula above allows us to describe the arc of a circular sector as a Bézier curve, which, when set up as a Coons patch, creates a section of a conical gradient. Adding several Coons patches together creates the final conical gradient.

Wrapping up

It has been a long time since conic gradients for the Web were first drafted. For instance, the current bug in Firefox’s Bugzilla was created by Lea Verou five years ago. Fortunately, browsers have started shipping native support, and conical gradients have been available in Chrome and Safari for two years. In this post I discussed the implementation, mainly the rendering, of conic gradients in WebKitGTK and WPE. And since both are WebKit-based, they can build on the implementation efforts led by Apple when bringing support for this feature to Safari. With Firefox shipping conic gradient support soon, this feature will be safe to use in the Web Platform.

June 11, 2020 12:00 AM

June 09, 2020

Philippe Normand: WebKitGTK and WPE now supporting videos in the img tag

Igalia WebKit

Using videos in the <img> HTML tag can lead to more responsive web-page loads in most cases. Colin Bendell blogged about this topic; make sure to read his post on the cloudinary website. As it turns out, this feature has been supported for more than 2 years in Safari, but …

By Philippe Normand at June 09, 2020 04:00 PM

June 08, 2020

Philippe Normand: Introducing the WebKit Flatpak SDK

Igalia WebKit

Working on a web-engine often requires a complex build infrastructure. This post documents our transition from JHBuild to Flatpak for the WebKitGTK and WPEWebKit development builds.

For the last 10 years, WebKitGTK has been relying on a custom JHBuild moduleset to handle its dependencies and (try to) ensure a reproducible …

By Philippe Normand at June 08, 2020 04:50 PM

June 05, 2020

Paulo Matos: JSC: what are my options?

Igalia WebKit

Compilers tend to be large pieces of software that provide an enormous number of options. We take a quick look at how to find what JavaScriptCore (JSC) provides.


By Paulo Matos at June 05, 2020 12:23 PM

May 28, 2020

Release Notes for Safari Technology Preview 107

Surfin’ Safari

Safari Technology Preview Release 107 is now available for download for macOS Catalina and macOS Mojave. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 261057-262002.

Web Inspector

  • Network
    • Adjusted the spacing of the navigation items so that none are hidden when previewing a resource (r261497)
  • Sources
    • Added showing the name of the Worker if it exists as the title of its main resource (r261104)
    • Fixed source mapping issue when combining single-child directory chains (r261200)
    • Fixed restoring global DOM, URL, or event breakpoints incorrectly enabling all breakpoints (r261340)
    • Supported CSS pretty printing when a url() is nested inside a url() (r261772)
  • Timelines
    • Fixed the memory stacked area graph to not extend beyond the “stopping time” marker (r261197)
  • Storage
    • Fixed double-clicking on a cookie field to start editing that cookie (r261339)
  • Layers
    • Ensured that the text at the bottom of the details sidebar doesn’t overlap (r261237)
  • Console
    • Added showing EventTarget listeners as an internal property (r261670)
    • Added showing Worker name as an internal property (r261499)
  • Miscellaneous
    • Accessibility
      • Fixed Left/Right arrow keys to collapse/expand details sections (r261962)
    • Remote Inspection
      • Provided a way to turn on or off ITP debug mode and AdClickAttribution debug mode (r261103)
      • Dropped support for iOS 8.x, iOS 9.x, and iOS 10.x (r261108, r261109, r261105)


  • Changed the initial value of transform-box to be view-box to fix some SVG animations (r261752)
  • Fixed computing the correct perspective matrix on a box with borders and overflow: hidden (r261619)
  • Fixed Object.prototype.toString to match standards (r261159)
  • XML external entity resources should only be loaded from XML MIME types (r261443)


  • Changed the cursor to update during the rendering steps, rather than on a 20ms timer (r261741)
  • Fixed the computed min-width and min-height for auto depending on the box (r261974)
  • Fixed disappearing content with CSS parallax on an overflow scroll (r261837)


  • Fixed repaint issues when the login field collapses on music.apple.com (r261979)
  • Fixed text clipped when rendered with fonts which have a negative line gap metric (r261573)
  • Fixed table sizing when max-width is used (r261924)


  • Fixed tapping on the trackpad in a <select> to flash the scrollers (r261368)
  • Fixed a <select> that sometimes becomes non-scrollable (r261427)
  • Fixed overflow scrollbars not growing when hovered (r261132)
  • Fixed find not always scrolling search results into view (r261819)
  • Fixed composited scrolling interfering with the propagation of perspective (r261632)
  • Fixed scrollbars flickering in RTL scrollable regions (r261535)


  • Changed to ignore a poster set after playback begins (r261341, r261576)
  • Fixed media controls tracks menu showing “Auto” selected instead of the track selected via the JavaScript API (r261084)


  • Improved the accuracy of IndexedDB estimated write size computation (r261533)
  • Fixed a bug that could cause IndexedDB log files to grow without bound (r261533)


  • Implemented Intl.Locale (r261215)
  • Implemented BigInt.asIntN and BigInt.asUintN (r261156, r261199)
  • Enabled logical assignment operators (r261728)
  • Ensured Intl.Collator.prototype.resolvedOptions returns relevant locale extension keys in alphabetical order (r261182)
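
As a brief illustration of two items in the list above, here is a minimal sketch of the logical assignment operators and of BigInt.asIntN and BigInt.asUintN. The `options` object is made up for the example; it assumes an engine that supports these features.

```javascript
// Logical assignment: assign only when the left side is nullish (??=),
// falsy (||=), or truthy (&&=).
const options = { retries: 0, label: null, verbose: true };
options.label ??= "default"; // label was null, so it becomes "default"
options.retries ||= 3;       // 0 is falsy, so retries becomes 3
options.verbose &&= "yes";   // true is truthy, so verbose becomes "yes"
console.log(options.label, options.retries, options.verbose);

// BigInt.asIntN / BigInt.asUintN wrap a BigInt to a fixed bit width.
console.log(BigInt.asIntN(8, 255n));  // -1n
console.log(BigInt.asUintN(8, 256n)); // 0n
console.log(BigInt.asUintN(8, -1n));  // 255n
```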

Web Animations

  • Fixed the animation engine to not wake up every tick for steps() timing functions (r261926)
  • Fixed animations with a single keyframe not getting accelerated (r261756)
  • Fixed calling reverse() on an accelerated animation having no effect (r261637)
  • Coordinated “update animations and send events” procedure across multiple timelines (r261218)
  • Fixed Document.getAnimations() to only consider document connection and not timeline association (r261488)
  • Fixed the animation of font-size using rem values (r261861)

Async Clipboard API

  • Enabled clipboard API access when pasting from a menu item or key binding (r261825)
  • Fixed cut and paste from Google Docs to Notes in several (non-Latin) languages (r261247)
  • Preserved character set information when writing to the pasteboard when copying rich text (r261395)


  • Implemented accessibility of HTML 5.1 Drag & Drop (r261248)

CSS Grid

  • Cleared the override width for computing percent margins (r261841)
  • Changed to treat percentages as auto for the minimum contribution (r261767)
  • Fixed auto repeat with multiple tracks and gutters (r261949)

Bug Fixes

  • Added quirk for cookie blocking latch mode aolmail.com redirecting to aol.com under aol.com (r261724)
  • Changed to enforce a URL cannot have a username, password, or port if its host is null (r261173)
  • Changed XML external entities to require an XML MIME type to be loaded (r261451)
  • Fixed the playhead in Touch Bar continuing when loading stalls (r261342)
  • Fixed the search field on mayoclinic.org clipping the submit button (r261450)
  • Fixed setting a host on a URL when no port is specified (r261212)
  • Limited the HTTP referer to 4KB (r261402)

May 28, 2020 05:30 PM

May 14, 2020

Release Notes for Safari Technology Preview 106

Surfin’ Safari

Safari Technology Preview Release 106 is now available for download for macOS Catalina and macOS Mojave. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 260266-261057.

Web Inspector

  • Sources
    • Ensured “Step Over” only steps through comma expressions if they are not nested (r260520)
  • Storage
    • Fixed third-party cookie display (r260807)
    • Added support for selecting multiple local storage entries (r260613)
  • Miscellaneous
    • Updated find dialog to populate the search string from the system find pasteboard (r260847, r260887, r260895)
    • Fixed the filter bar in the navigation sidebar to respect the global search settings (r260386)

Async Scrolling

  • Enabled async frame and overflow scrolling by default on macOS (r260276)
  • Fixed an overflow that’s hidden on one axis to be scrollable on that axis (r260450)

Web Animations

  • Fixed applying keyframe easings to transforms (r260360)
  • Changed to guarantee assigning an element to effect.target keeps the element alive, even without other references to it (r260705)
  • Implemented jump-* functions for steps() timing functions (r261046)


  • Added support for :where() pseudo class (r260319)
  • Fixed :is() and :where() to not allow pseudo-elements when parsing (r260338)
  • Fixed border-radius failing to clip composited iframe contents (r260950)


  • Enabled BigInt (r260345)
  • Changed BigInt constructor to accept larger integers than safe-integers (r260863)
  • Added support for Intl.RelativeTimeFormat (r260349)
  • Redesigned for-of iteration for arrays (r260323)


  • Updated getDisplayMedia to respect aspect ratio with max constraints (r260561, r260638)


  • Fixed the visibilitychange event to bubble per spec (r260483)


  • Changed to ensure a remote track event gets unmuted after the track event is fired (r260813)
  • Fixed the audio session category being set incorrectly after changing video source with MSE (r261004)
  • Fixed video elements returning to an incorrect position when exiting fullscreen (r260150)


  • Fixed flickering header when scrolling articles with fixed position elements (r260828)
  • Fixed content disappearing in a CSS-based parallax implementation (r260371)
  • Fixed a blank header on a site by changing to not use stale containing block width value while computing preferred width (r260905)
  • Fixed oversized caret and selection rects in text fields (r260367)

Bug Fix

  • Enabled using credentials for same-origin CSS mask images (r260598)

May 14, 2020 05:15 PM

April 23, 2020

Release Notes for Safari Technology Preview 105

Surfin’ Safari

Safari Technology Preview Release 105 is now available for download for macOS Catalina and macOS Mojave. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 259476-260266.


  • Added Selectors Level 4 specificity calculation for pseudo classes (r260024, r260069)
  • Added support for the font-relative lh and rlh units from the CSS Values Level 4 specification (r259703)
  • Corrected the computed style for outline-offset when outline-style is none (r259562)
  • Fixed bad style sharing between sibling elements with different part attributes for CSS Shadow Parts (r259877)
  • Implemented the CSS Color Level 4 behavior for inheritance of currentColor (r259532)
  • Prevented caching definite height against perpendicular flex items (r260055)


  • Fixed Intl.DateTimeFormat patterns and fields (r260145)
  • Implemented BigInt.prototype.toLocaleString (r259919)
  • Updated Intl to allow calendar and numberingSystem options (r259941)
  • Implemented logical assignment operators (r260119)
  • Updated canonicalizeLocaleList to gracefully throw an OOM error if the input and error message are too large (r259481)
  • Updated the default cross-origin value for modules to be “anonymous” (r260003, r260038)


  • Made a change to update ScreenTime as playback state changes (r260182, r260201)
  • Filtered some capture device names (r259477)
  • Added support for applying a frameRate limit when the request stream is from Camera (r260245)

Web Animations

  • Added support for pseudoElement on KeyframeEffect and KeyframeEffectOptions (r260139)
  • Fixed computing transition-property correctly when transition-duration is set to inherit (r259720)


  • Fixed smart invert to handle the picture elements on foxnews.com (r260092)


  • Fixed drawing an image srcRect and imageRect to be in the same orientation of destRect (r260016)
  • Fixed a missing gradient banner on fastclick.com (r259701)


  • Fixed querySelector("#\u0000") to match an element with ID U+FFFD (r259773)
  • Fixed scroll snap in subframes when async overflow scroll is enabled (r260086)
  • Fixed zoom changes to not affect ResizeObserverSize (r259578)
  • Updated CanvasRenderingContext2D.drawImage to ignore the EXIF orientation if the image-orientation is none (r259567)
  • Updated documentFragment.getElementById() to not work for empty-string IDs (r259651)
  • Updated baseURL for a module script to be the response URL, not the request URL (r260131)

Web Inspector

  • Elements Tab
    • De-indented items in the Variables section in the Computed sidebar panel so that wrapped content doesn’t line up with -- (r260096)
  • Sources Tab
    • Added support for copying selected call frame(s) in the Call Stack section (r259738)
    • Added a “Step” button that continues execution to the next expression in the current call frame (r260113)
    • Treated comma sub-expressions as separate statements to provide more intuitive formatting, additional breakpoint opportunities, and better stepping functionality (r259781, r259810)
  • Storage Tab
    • Provided a way to delete multiple localStorage or sessionStorage entries (r259744)
    • Allowed cookies to be set with no value (r259842)
    • Fixed an issue where cookies weren’t shown on pages that have subframes that have been denied access to cookies (r259649)
  • Console Tab
    • Ensured that long strings are not truncated when passed to console functions (r260091)
  • Search Tab
    • Added a setting that controls whether search field is populated with the current selection when using the global search shortcut ⇧⌘F (r259748)
  • Miscellaneous
    • Increased the auto-inspect debugger timeout delay to account for slower networks/devices (r259479)

April 23, 2020 08:00 PM

April 10, 2020

A Tour of Inline Caching with Delete

Surfin’ Safari

If you search for any JavaScript performance advice, a very popular recommendation is to avoid the delete operator. Today, this seems to be good advice, but why should it be vastly more expensive to delete a property than to add it?

The goal of my internship at Apple this winter was to improve the performance of the delete operator in JavaScriptCore. This has given me the opportunity to learn about how the various pieces of JavaScriptCore’s inline caching optimizations work, and we will take a quick tour of these optimizations today.

First, we will look at how expensive deletion really is in the major engines, and discuss why we might want to optimize it. Then, we will learn what inline caching is by implementing it for property deletion. Finally, we will look at the performance difference these optimizations make on benchmarks and microbenchmarks.

How expensive is Deletion?

First of all, why should we even bother optimizing deletion? Deletion should be fairly rare, and many JavaScript developers already know to avoid it when possible. At the same time, we generally try to avoid having large hidden performance cliffs. To demonstrate how a simple delete statement can have a surprising effect on performance, I wrote a small benchmark that renders a scene progressively, and measures the time it takes to render each frame. This benchmark is not designed to precisely measure performance, but rather to make it easy to see large performance differences.

You can run the program yourself by pressing the run button below, which will calculate a new color value for every pixel in the image. It will then display how long it took to render, in milliseconds.

Next, we can try adding a single delete statement to a hot part of the code. You can do this by checking the “Use Delete” checkbox above, and clicking run again. This is what the code looks like:

class Point {
    constructor(x, y, z) {
        this.x = x
        this.y = y
        this.z = z
        this.a = 0
        if (useDelete)
            delete this.a
    }
}

The following results were measured on my personal computer running Fedora 30, comparing tip of tree WebKit (r259643) with all of the delete optimizations both enabled and disabled. I also used Firefox 74 and Chromium 77 from the Fedora repositories.

Here, we can see that the addition of a delete statement to a hot section of code can send the performance off a cliff! The primary reason for this is that deletion in JavaScriptCore used to disable all of the inline caching optimizations for an object, including when putting and getting properties of the object. Let’s see why.

Structures and Transitions

JavaScript code can be extremely dynamic, making it tricky to optimize. While many other languages have fixed-size structs or classes to help the compiler make optimizations, objects in JavaScript behave like dictionaries that allow associating arbitrary values and keys. In addition, objects frequently change their shape over time. For performance, JavaScriptCore uses multiple internal representations for objects, choosing between them at runtime based on how a program uses them. The default representation of objects uses something called a Structure to hold the general shape of an object, allowing many instances that have the same shape to share a structure. If two objects have the same structure ID, we can quickly tell that they have the same shape.

class A {
  constructor() {
    this.x = 5
    this.y = 10
  }
}

let a = new A() // Structure: { x, y }
let b = new A() // same as a

Structures store the names, attributes (such as read-only), and offsets of all of the properties owned by the object, and often the object’s prototype too. While the structure stores the shape, the object maintains its property storage, and the offset gives a location inside this storage.

At a high level, adding a property to an object using this representation goes as follows:

1) Construct a new structure with the property added.
2) Place the value into the property storage.
3) Change the object’s structure ID to point to the new structure.

One important optimization is to cache these structure changes in something called a structure transition table. This is the reason why in the example above, both objects share the same structure ID rather than having separate but equivalent structure objects. Instead of creating a new structure for b, we can simply transition to the structure we already created for a.
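
The three steps and the transition table can be sketched with a toy model. This is an illustration only: the names `Structure`, `ToyObject`, and `addPropertyTransition` are hypothetical and do not mirror JavaScriptCore's actual implementation.

```javascript
// Toy model of structures with a transition table (hypothetical names).
class Structure {
    constructor(offsets) {
        this.offsets = offsets;       // Map: property name -> offset in storage
        this.transitions = new Map(); // Map: added property name -> next Structure
    }

    // Return the cached structure for "this shape plus name", creating it once.
    addPropertyTransition(name) {
        let next = this.transitions.get(name);
        if (!next) {
            const offsets = new Map(this.offsets);
            offsets.set(name, offsets.size);
            next = new Structure(offsets);
            this.transitions.set(name, next);
        }
        return next;
    }
}

const emptyStructure = new Structure(new Map());

class ToyObject {
    constructor() {
        this.structure = emptyStructure;
        this.storage = [];
    }

    put(name, value) {
        if (this.structure.offsets.has(name)) {
            // Existing property: the shape does not change.
            this.storage[this.structure.offsets.get(name)] = value;
            return;
        }
        // 1) Find or construct the structure with the property added.
        const next = this.structure.addPropertyTransition(name);
        // 2) Place the value into the property storage.
        this.storage[next.offsets.get(name)] = value;
        // 3) Point the object at the new structure.
        this.structure = next;
    }
}

const a = new ToyObject();
a.put("x", 5);
a.put("y", 10);
const b = new ToyObject();
b.put("x", 5);
b.put("y", 10);
console.log(a.structure === b.structure); // true
```

Because the second object follows transitions that the first one already created, both end up pointing at the very same structure.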

This representation improves performance for the case when you have a small number of object shapes that change predictably, but it is not always the best representation. For example, if there are many properties that are unlikely to be shared with other objects, then it is faster to create a new structure for every instance of this object. Since we know that there is a 1:1 relationship between this object and its structure, we no longer have to update the structure ID when making changes to the shape of the object. This also means that we can no longer cache most property accesses, since these optimizations rely on checking the structure ID.

In previous versions of JavaScriptCore, this is the representation that was chosen for any object that had a deleted property. This is why we see such a large performance difference when a delete is performed on a hot object.

The first optimization of my internship was to instead cache deletion transitions, allowing these objects to continue to be cached.

class A {
  constructor() {
    this.x = 5
    this.y = 10
    delete this.y
  }
}

let a = new A() // Structure: { x } from { x, y }
let b = new A() // same as a

Before this change, both a and b had distinct structures. Now they are the same. This not only removes our performance cliff, but will enable further optimizations to the deletion statement itself.
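
The difference can be illustrated with a small toy model, which is a sketch rather than JavaScriptCore's real data structure. The `Shape` class, the `makeA` helper, and the `cacheDeletes` flag are hypothetical names: with the flag off, every deletion produces a fresh shape, and with it on, deletions are cached in the transition table like additions.

```javascript
// Toy model only; names are hypothetical and do not mirror JavaScriptCore.
class Shape {
    constructor(names) {
        this.names = names;           // ordered property names
        this.transitions = new Map(); // "add:x" or "delete:x" -> cached Shape
    }

    transition(kind, name, cacheDeletes) {
        const key = `${kind}:${name}`;
        const cacheable = kind === "add" || cacheDeletes;
        if (cacheable && this.transitions.has(key))
            return this.transitions.get(key);
        const names = kind === "add"
            ? [...this.names, name]
            : this.names.filter((n) => n !== name);
        const next = new Shape(names);
        if (cacheable)
            this.transitions.set(key, next);
        return next;
    }
}

// Build { x, y } and then delete y, as the constructor of class A does.
function makeA(root, cacheDeletes) {
    return root
        .transition("add", "x", cacheDeletes)
        .transition("add", "y", cacheDeletes)
        .transition("delete", "y", cacheDeletes);
}

const oldRoot = new Shape([]);
console.log(makeA(oldRoot, false) === makeA(oldRoot, false)); // false

const newRoot = new Shape([]);
console.log(makeA(newRoot, true) === makeA(newRoot, true));   // true
```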

In the path tracing example above, we get the following performance results:

Inline Caching

Now that we can cache deletion transitions, we can further optimize the act of property deletion itself. Getting and putting properties both use something called an inline cache to do this, and now deletion does too. The way this works is by emitting a generic version of these operations that modifies itself over time to handle frequent cases faster.

I implemented this optimization in all three of our JIT compilers for delete, and our interpreter also supports inline caching. To learn more about how JavaScriptCore uses multiple compilers to trade off latency and throughput, see this blog post. The summary is that code is first executed by the interpreter. Once it is sufficiently hot, it gets compiled into increasingly optimized pieces of machine code by our compilers.

An inline cache uses structures to keep track of the object shapes it has seen. We will see how inline caching now works for delete, which will give us some insight into how it works for getting and putting properties too.

When we see a delete property statement, we will first emit something called a patchable jump. This is a jump instruction that can be changed later on. At first, it will simply jump to some code that calls the slow path for property deletion. This slow path will store a record of this call, however, and after we have performed a few accesses, we will attempt to emit our first inline cache. Let’s walk through an example:

function test() {
    for (let i = 0; i < 50; ++i) {
        let o = { }
        if (i < 10)
            o.a = 1
        else if (i < 20)
            o.b = 1
        else
            o.c = 1

        delete o.a
    }
}

for (let i = 0; i < 100; ++i)
    test()

First, we run test a few times causing it to tier up to the baseline compiler. Inside test(), we see that the code deletes properties from objects with three different structures, while giving enough time for us to emit a new inline cache for each. Let’s see what the baseline code looks like:

          jmp [slow path] ; This jump is patchable
          [package up boolean result]
          ; falls through to rest of program
slow path:
          [call into C++ slow path]
          jmp [next bytecode]

This code performs a few checks, then jumps either to the slow path call or the patchable jump target. The slow path call is placed outside the main portion of the generated code to improve cache locality. Right now, we see that the patchable jump target also points to the slow path, but that will change.

At this point in time, deletion is a jump to the slow path plus a call into C++, which is quite expensive.

Every time the slow path call is made, it collects and saves information about its arguments. In this example, the first case that it decides to cache is the delete miss for the structure with shape { c }. It generates the following code, and repatches the patchable jump target to go here instead:

      0x7f3fa96ff6e0: cmp [structure ID for { c }], (%rsi)
      0x7f3fa96ff6e6: jnz [slow path]
      0x7f3fa96ff6ec: mov $0x1, %eax ; return value true
      0x7f3fa96ff6f1: jmp [continuation]
      0x7f3fa96ff6f6: jmp [slow path]

We see that if the structure ID matches, we simply return true and jump back to the continuation which is responsible for packaging up the boolean result and continuing execution. We save ourselves an expensive call into C++, running just a few instructions in its place.

Next, we see the following inline cache get generated as the engine decides to cache the next case:

      0x7f3fa96ff740: mov (%rsi), %edx
      0x7f3fa96ff742: cmp [structure ID for { c }], %edx
      0x7f3fa96ff748: jnz [case 2]
      0x7f3fa96ff74e: mov $0x1, %eax
      0x7f3fa96ff753: jmp [continuation]
case 2:
      0x7f3fa96ff758: cmp [structure ID for { a }], %edx
      0x7f3fa96ff75e: jnz [slow path]
      0x7f3fa96ff764: xor %rax, %rax; Zero out %rax
      0x7f3fa96ff767: mov %rax, 0x10(%rsi); Store to the property storage
      0x7f3fa96ff76b: mov [structure ID for { } from { a }], (%rsi)
      0x7f3fa96ff771: mov $0x1, %eax
      0x7f3fa96ff776: jmp [continuation]

Here, we see what happens if the property exists. In this case, we zero the property storage, change the structure ID, and return true (0x1), to the continuation for our result to be packaged up.
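
The dispatch performed by these generated stubs can be modeled in plain JavaScript. The sketch below is a simplification under stated assumptions: shapes are identified by a string of property names instead of a structure ID, a case is cached on the very first slow path call instead of after a few accesses, and `makeDeleteSite` and `shapeOf` are hypothetical names.

```javascript
// Toy model of a polymorphic inline cache for `delete o.a`.
// Shapes are represented by a string key; real engines compare structure IDs.
function shapeOf(object) {
    return Object.keys(object).join(",");
}

function makeDeleteSite(propertyName) {
    const cases = [];      // cached [shape, handler] pairs, checked in order
    let slowPathCalls = 0;

    function slowPath(object) {
        slowPathCalls++;
        const shape = shapeOf(object);
        const hit = Object.prototype.hasOwnProperty.call(object, propertyName);
        // Cache this shape: a miss just returns true, a hit also deletes.
        cases.push([shape, hit
            ? (o) => { delete o[propertyName]; return true; }
            : () => true]);
        return delete object[propertyName];
    }

    function run(object) {
        const shape = shapeOf(object);
        for (const [cachedShape, handler] of cases) {
            if (cachedShape === shape)
                return handler(object);
        }
        return slowPath(object);
    }

    return { run, slowPathCalls: () => slowPathCalls };
}

const site = makeDeleteSite("a");
for (let i = 0; i < 50; ++i) {
    const o = {};
    if (i < 10) o.a = 1;
    else if (i < 20) o.b = 1;
    else o.c = 1;
    site.run(o);
}
console.log(site.slowPathCalls()); // 3
```

Only the first object of each shape reaches the slow path; the remaining 47 deletions are served by the cached cases.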

Finally, we see the complete inline cache:

      0x7f3fa96ff780: mov (%rsi), %edx
      0x7f3fa96ff782: cmp [structure ID for { c }], %edx
      0x7f3fa96ff788: jnz [case 2]
      0x7f3fa96ff78e: mov $0x1, %eax
      0x7f3fa96ff793: jmp [continuation]
case 2:
      0x7f3fa96ff798: cmp [structure ID for { a }], %edx
      0x7f3fa96ff79e: jnz [case 3]
      0x7f3fa96ff7a4: xor %rax, %rax
      0x7f3fa96ff7a7: mov %rax, 0x10(%rsi)
      0x7f3fa96ff7ab: mov $0x37d4, (%rsi)
      0x7f3fa96ff7b1: mov $0x1, %eax
      0x7f3fa96ff7b6: jmp [continuation]
case 3:
      0x7f3fa96ff7bb: cmp [structure ID for { b }], %edx
      0x7f3fa96ff7c1: jnz [slow path]
      0x7f3fa96ff7c7: mov $0x1, %eax
      0x7f3fa96ff7cc: jmp [continuation]

When we run the code above in different browsers, we get the following performance numbers with and without delete optimizations:

Inlining Delete

Now that we have seen how an inline cache is generated, we would like to see how we can feed back this profiling information into the compiler. We will see how our first optimizing compiler, the DFG, can attempt to inline delete statements like it already does for puts and gets. This will then allow all of the other optimizations we have to see inside the delete statement, rather than seeing a black box.

We will demonstrate this by looking at just one of these optimizations, our object allocation sinking and elimination phase. This phase attempts to prevent allocations of objects when they do not escape their scope, but it previously saw property deletion as an escape. Consider the following code:

function noEscape(i) {
    let foo = { i }
    delete foo.x
    return foo.i
}

for (let i = 0; i < 1000000; ++i)
    noEscape(i)

As the delete is run, it will first get an inline cache. As the code continues to tier up, the DFG tier will eventually look at the cases covered by the inline cache and see only one structure. It will decide to try inlining the delete.

It first speculates that the structure ID of the target object is for the structure with shape { i }. That is, we check that the structure of the object matches, and if it does not, we exit to lower-tier code. This means that our engine can assume that the structures match for any subsequent code. If this assumption turns out to be false, we will eventually recompile the code again without it.

D@25:   NewObject()
D@30:   CheckStructure(D@25, Structure for { })
D@31:   PutByOffset(D@25, D@25, D@36, id0{i})
D@32:   PutStructure(D@25, Structure for { i })
D@36:   CheckStructure(D@25, Structure for { i })
D@37:   JSConstant(True)
D@42:   CheckStructure(D@25, Structure for { i })
D@43:   GetByOffset(D@25, 0)
D@34:   Return(D@43)

We see here that we make a new object, and then see inlined versions of put, delete and get. Finally, we return.

If the delete statement were compiled into a DeleteByID node, later optimizations could not do much to optimize this code. They don’t understand what effects the delete has on the heap or the rest of the program. We see that once inlined however, the delete statement actually becomes very simple:

D@36:   CheckStructure(D@25, Structure for { i })
D@37:   JSConstant(True)

That is, the delete statement becomes simply a check and a constant! Next, while it often isn’t possible to prove every check is redundant, in this example our compiler will be able to prove that it can safely remove them all. Finally, our object allocation elimination phase can look at this code and remove the object allocation. The final code for noEscape() looks like this:

D@36:   GetStack(arg1)
D@25:   PhantomNewObject()
D@34:   Return(D@36)
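
In source terms, this graph behaves as if the function simply returned its argument. The sketch below only checks that equivalence at the JavaScript level; `noEscapeOptimized` is a hypothetical name for illustration, not something the engine produces.

```javascript
// The function from the text.
function noEscape(i) {
    let foo = { i }
    delete foo.x
    return foo.i
}

// What the optimized graph amounts to: read the argument and return it.
function noEscapeOptimized(i) {
    return i
}

let mismatches = 0
for (let i = 0; i < 1000; ++i) {
    if (noEscape(i) !== noEscapeOptimized(i))
        mismatches++
}
console.log(mismatches) // 0
```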

PhantomNewObject does not cause any code to be emitted, making our final code trivial! We get the following performance results:


The caching of delete transitions caused a 1% improvement overall on the Speedometer benchmark. The primary reason for this is that the EmberJS Debug subtest uses the delete statement frequently, and delete transition caching progresses this subtest by 6%. This subtest was included in the benchmark because it was discovered that many websites do not ship the release version of EmberJS. The other optimizations we discussed were all performance-neutral on the macro-benchmarks we track.

In conclusion, we have seen how inline caching works first-hand, and even progressed a few benchmarks in the process! The order in which we learned about these optimizations (and indeed the order in which I learned about them this term) closely follows the order in which they were discovered. If you would like to learn more, check out these papers on implementations of inline caching in Smalltalk and Self, two languages that inspired the design of JavaScript. We can see the evolution from monomorphic inline caches in Smalltalk to polymorphic inline caches with inlining support in Self, just like we saw on our tour today. Implementing inline caching for delete was an extremely educational experience for me, and I hope you enjoyed reading about it.

While deletion will remain rare, it still feels great to make JSC even more robust. You can get in touch by filing a bug or by joining the WebKit Slack workspace. You can also consider downloading the code and hacking on it yourself!

April 10, 2020 05:11 PM

April 08, 2020

Web Animations in Safari 13.1

Surfin’ Safari

With the release of iOS 13.4, iPadOS 13.4, and Safari 13.1 in macOS Catalina 10.15.4, web developers have a new API at their disposal: Web Animations. We have been working on this feature for well over 2 years and it’s now available to all Safari users, providing a great programmatic API to create and control animations using JavaScript.

In this article we will discuss the benefits of this new API, how best to detect its availability, how it integrates with existing features such as CSS Animations and CSS Transitions, and the road ahead for animation technology in WebKit.

A Little History

The WebKit team came up with original proposals for CSS Animations and CSS Transitions back in 2007 and announced them on this blog. Over the years these specifications have matured and become W3C standards and an integral part of the web platform.

With these technologies, integrating animations in web content became simple, removing the requirement for developers to write JavaScript while providing better performance and power consumption by allowing browsers to use hardware acceleration when available and integrate animations in the layout and rendering pipeline.

As a web developer, I’ve enjoyed the simplicity and great performance of CSS Animations and CSS Transitions. I believe it is those virtues that have allowed animations to become a powerful tool for web developers. However, in my day-to-day work, I also found a few areas where these technologies were frustrating: dynamic creation, playback control, and monitoring an animation’s lifecycle.

The great news is that these issues are all taken care of by the new Web Animations API. Let’s see how to leverage this new API to improve everyday code in these areas.

Part I – Animation Creation

While CSS allows you to very easily animate a state change (the appearance of a button, for instance) it will be a lot trickier if the start and end values of a given animation are not known ahead of time. Typically, web developers would deal with those cases with CSS Transitions:

// Set the transition properties and start value.
element.style.transitionProperty = "transform";
element.style.transitionDuration = "1s";
element.style.transform = "translateX(0)";

// Force a style invalidation such that the start value is recorded.
window.getComputedStyle(element).transform;

// Now, set the end value.
element.style.transform = "translateX(100px)";

While this may look like a reasonable amount of code, there are additional factors to consider. Forcing a style invalidation will not let the browser perform that task at the time it will judge most appropriate. And this is just one single animation; what if another part of the page, possibly even a completely different JavaScript library, also needed to create an animation? This would multiply forced style invalidations and degrade performance.

And if you consider using CSS Animations instead, you would have to first generate a dedicated @keyframes rule and insert it inside a <style> element, failing to encapsulate what is really a targeted style change for a single element and causing a costly style invalidation.

The value of the Web Animations API lies in having a JavaScript API that preserves the ability to let the browser engine do the heavy lifting of running animations efficiently while enabling more advanced control of your animations. Using the Web Animations API, we can rewrite the code above with a single method call using Element.animate():

element.animate({ transform: ["translateX(0)", "translateX(100px)"] }, 1000);

While this example is very simple, the single Element.animate() method is a veritable Swiss Army knife and can express much more advanced features. The first argument specifies CSS values while the second argument specifies the animation’s timing. We won’t go into all possible timing properties, but all of the features of CSS Animations can be expressed using the Web Animations API. For instance:

element.animate({
    transform: ["translateX(500px)"], // move by 500px
    color: ["red", "blue", "green"]   // go through three colors
}, {
    delay: 500,            // start with a 500ms delay
    easing: "ease-in-out", // use a fancy timing function
    duration: 1000,        // run for 1000ms
    iterations: 2,         // repeat once
    direction: "alternate" // run the animation forwards and then backwards
});

Now we know how to create an animation using the Web Animations API, but how is it better than the code snippet using CSS Transitions? Well, that code tells the browser what to animate and how to animate it, but does not specify when. Now the browser will be able to process all new animations at the next most opportune moment with no need to force a style invalidation. This means that animations you author yourself as well as animations that may originate from a third-party JavaScript library – or even in a different document (for instance, via an <iframe>) – will all be started and progress in sync.

Part II – Playback Control

Another shortcoming with existing technologies was the lack of playback control: the ability to pause, resume, and seek animations and control their speed. While the animation-play-state property allows control of whether a CSS Animation is paused or playing, there is no equivalent for CSS Transitions and it only controls one aspect of playback control. If you want to set the current time of an animation, you can only resort to roundabout techniques such as clever manipulations of negative animation-delay values, and if you want to change the speed at which an animation plays, the only option is to manipulate the timing values.

With the Web Animations API, all these concerns are handled by dedicated API. For instance, we can manipulate playback state using the play() and pause() methods, query and set time using the read-write currentTime property, and control speed using playbackRate without modifying duration:

// Create an animation and keep a reference to it.
const animation = element.animate(…);

// Pause the animation.
animation.pause();

// Change its current time to move forward by 500ms.
animation.currentTime += 500;

// Slow the animation down to play at half-speed.
animation.playbackRate = 0.5;

This gives developers control over the behavior of animations after they have been created. It is now trivial to perform tasks which would have been previously daunting. To toggle the playback state of an animation at the press of a button:

button.addEventListener("click", event => {
    if (animation.playState === "paused")
        animation.play();
    else
        animation.pause();
});

To connect the progress of an animation to an <input type="range"> element:

input.addEventListener("input", event => {
    animation.currentTime = event.target.value * animation.effect.getTiming().duration;
});

Thanks to the Web Animations API making playback control a core concept for animations, these simple tasks are trivial and more complex control over an animation’s state can be achieved.
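
The slider snippet above multiplies the input value by the duration, which only works when the range input goes from 0 to 1. As a minimal sketch of a more general variant, the hypothetical helper below (scrubAnimation is not part of the API) normalizes an arbitrary min/max range first. It assumes the animation's duration was specified as a number of milliseconds, and it pauses the animation while scrubbing, which is a design choice rather than a requirement.

```javascript
// Hypothetical helper: maps a slider position to the animation's current time.
// Assumes `duration` was given as a number of milliseconds (not "auto").
function scrubAnimation(animation, value, min = 0, max = 100) {
    const duration = animation.effect.getTiming().duration;
    // Range inputs report their value as a string, so convert it explicitly.
    const fraction = (Number(value) - min) / (max - min);
    // Pause so the animation stays where the user dragged it.
    animation.pause();
    animation.currentTime = fraction * duration;
}
```

In a page, this could be called from an "input" event listener with event.target.value, event.target.min, and event.target.max converted to numbers.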

Part III – Animation Lifecycle

While the transition* and animation* family of DOM events provide information about when CSS-originated animations start and end, it is difficult to use them correctly. Consider fading out an element prior to removing it from the DOM. Typically, this would be written this way using CSS Animations:

@keyframes fade-out {
    to { opacity: 0 }
}

element.style.animationName = "fade-out";
element.addEventListener("animationend", event => {
    element.remove();
});

Seems correct, but on further inspection there are problems. This code will remove the element as soon as an animationend event is dispatched on the element, but since animation events bubble, the event could come from an animation completing in a child element in the DOM hierarchy, and the animations could even be named the same way. There are measures you can take to make this kind of code safer, but using the Web Animations API, writing this kind of code is not just easier but safer because you have a direct reference to an Animation object rather than working through animation events scoped to an element’s hierarchy. And on top of this, the Web Animations API uses promises to monitor the ready and finished state of animations:

let animation = element.animate({ opacity: 0 }, 1000);
animation.finished.then(() => {
    element.remove();
});
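
For comparison, the "measures you can take" to make the event-based CSS Animations version safer might look like the sketch below. The helper name removeAfterOwnFadeOut is hypothetical, and the code assumes the fade-out keyframes and an animation duration are already defined in CSS. It checks both the event target and the animation name before removing the element.

```javascript
// Hypothetical helper: removes `element` only when its own "fade-out"
// CSS Animation ends, ignoring events that bubble up from descendants.
function removeAfterOwnFadeOut(element, animationName = "fade-out") {
    function onAnimationEnd(event) {
        // Skip events dispatched on child elements and unrelated animations.
        if (event.target !== element || event.animationName !== animationName)
            return;
        element.removeEventListener("animationend", onAnimationEnd);
        element.remove();
    }
    element.addEventListener("animationend", onAnimationEnd);
    element.style.animationName = animationName;
}
```

Even with these checks, the promise-based version stays shorter because it never has to filter events at all.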

Consider how complex the same task would have been if you wanted to monitor the completion of a number of CSS Animations targeting several elements prior to removing a shared container. With the Web Animations API and its support for promises this is now expressed concisely:

// Wait until all animations have finished before removing the container.
let animations = container.getAnimations();
Promise.all(animations.map(animation => animation.finished)).then(() => {
    container.remove();
});

Integration with CSS

Web Animations are not designed to replace existing technologies but rather to tightly integrate with them. You are free to use whichever technology you feel fits your use case and preferences best.

The Web Animations specification does not just define an API but also aims to provide a shared model for animations on the web; other specifications dealing with animations are defined with the same model and terminology. As such, it’s best to understand Web Animations as the foundation for animations on the web, and think of its API as well as CSS Transitions and CSS Animations as layers above that shared foundation.

What does this mean in practice?

To make a great implementation of the Web Animations API, we had to start off fresh with a brand new and shared animation engine for CSS Animations, CSS Transitions, and the new Web Animations API. Even if you don’t use the Web Animations API, the CSS Animations and CSS Transitions you’ve authored are now running in the new Web Animations engine. No matter which technology you choose, the animations will all run and update in sync, events dispatched by CSS-originated animation and the Web Animations API will be delivered together, etc.

But what may matter even more to authors is that the entire Web Animations API is available to query and control CSS-originated animations! You can specify animations in pure CSS but also control them with the Web Animations APIs using Document.getAnimations() and Element.getAnimations(). You can pause all animations running for a given document this way:

document.getAnimations().forEach(animation => animation.pause());

What about SVG? At this stage, SVG Animations remain distinct from the Web Animations model, and there is no integration between the Web Animations API and SVG. This remains an area of improvement for the Web platform.

Feature Detection

But before you start adopting this new technology in your projects, there are some further practical considerations that you need to be aware of.

Since this is new technology, it is important to use feature detection as users gradually update their browsers to newer versions with support for Web Animations. Detecting the availability of the various Web Animations APIs is simple. Here is one correct way to detect the availability of Element.animate():

if (element.animate) {
    element.animate(…); // Use the Web Animations API.
} else {
    // Fall back to other technologies.
}

While Safari is shipping the entire Web Animations API as a whole, other browsers, such as Firefox and Chrome, have been shipping Element.animate() for a long time already, so it’s critical to test individual features separately. So, if you want to use Document.getAnimations() to query all running animations for a given document, make sure to feature detect that feature’s availability. As such the snippet further above would be better written this way:

if (document.getAnimations) {
    document.getAnimations().forEach(animation => animation.pause());
} else {
    // Fall back to another approach.
}

There are parts of the API that aren’t yet implemented in Safari. Notably, effect composition is not supported yet. Before trying to set the composite property when defining the timing of your animation, you can check whether it is supported this way:

const isEffectCompositionSupported = !!(new KeyframeEffect(null, {})).composite;

Animations in Web Inspector

Also new in the latest Safari release: CSS Animations and CSS Transitions can be seen in Web Inspector in the new Media & Animations timeline in the Timelines Tab, organized by animation-name or transition-property properties for each element target. When used alongside other timelines, it can be helpful to correlate how that particular CSS Animation or CSS Transition was created, such as by looking at script entries in the JavaScript & Events timeline.

Web Inspector Media and Animations Timeline

Starting in Safari Technology Preview 100, Web Inspector shows all animations, whether they are created by CSS or using the JavaScript API, in the Graphics Tab. It visualizes each individual animation object with lines for delays and curves for keyframes, and provides an in-depth view of exactly what the animation will do, as well as information about how it was created and some useful actions, such as logging the animation object in the Console. These are the first examples of how Web Animations allows us to improve Web Inspector for working with animations, and we’re looking forward to improving our tools further.

Web Inspector Graphics Tab in Light Mode

The Road Ahead

Shipping support for Web Animations is an important milestone for animations in WebKit. Our new animation engine provides a more spec-compliant and forward-looking codebase to improve on. This is where developers like you come in: it’s important we hear about any compatibility woes you may run into with existing content using CSS Animations and CSS Transitions, but also when adopting the new Web Animations API in your content.

The transition to the new Web Animations engine allowed us to address known regressions and led to numerous progressions with Web Platform Tests, improving cross-browser compatibility. If you find your animations running differently in Safari, please file a bug report on bugs.webkit.org so that we can diagnose the issue and establish if it is an intentional change in behavior or a regression that we should address.

We’re already working on improving on this initial release and you can keep an eye out for future improvements by monitoring this blog and the release notes for each new Safari Technology Preview release.

You can also send a tweet to @webkit or @jonathandavis to share your thoughts on our new support for Web Animations.

April 08, 2020 08:00 PM

Release Notes for Safari Technology Preview 104

Surfin’ Safari

Safari Technology Preview Release 104 is now available for download for macOS Catalina and macOS Mojave. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 258409-259472.

Web Inspector

  • Elements
    • Created a visual editor for box-shadow (r259170)
  • Network
    • Changed “Preserve Log” to be the last navigation item to be hidden at small widths (r258622)
    • Ensured that the method is escaped when using “Copy as cURL” (r259141)
  • Sources
    • If the hovered object is a DOM node, highlight it when hovering the title in the object preview popup (r258621)
  • Storage
    • Added support for editing cookies (r259173)
  • Console
    • Added logs for Intelligent Tracking Prevention Debug Mode and Ad Click Attribution Debug Mode in the Console (r259236)
    • Added a console message when legacy TLS is used (r258890, r258957)
  • Miscellaneous
    • Added a new WebSocket icon (r259329)
    • Added the keyboard shortcut for showing the Search Tab and Settings Tab to the titles of their respective tab items (r259101)
    • Fixed a bug where the tab bar thought it was too wide causing a tab bar item to be hidden (r258623)
    • Fixed a bug where the currently focused node was changed when detaching into a separate window (r259277)
    • Prevented disabled buttons from being focusable (r258730)


  • Avoided querying pasteboard strings while dragging content over a potential drop target (r258980)
  • Added label text to suggested values for a <datalist> element (r259330)
  • Fixed <datalist> dropdown suggestions table able to be scrolled too far (r259198)
  • Fixed a change event getting dispatched when a <textarea> gets changed without focus (r258532)
  • Fixed event listeners registered with the once option that get garbage collected too soon (r259009)
  • Fixed the name of X-Content-Type HTTP header in console logging (r258789)
  • Fixed a bug that could cause elements to disappear with combinations of transforms and overflow (r259015)
  • Fixed a bug where a function passed to addEventListener could get garbage collected before the event listener was even added (r258959)
  • Prevented Force Touch preview from working on a file:/// URL while clicking on the URL is blocked (r259056)
  • Removed synchronous termination of service workers (r259383)
  • Sanitized the suggested download filename (r258741)
  • Updated Intl.NumberFormat.prototype.format to preserve a sign of -0 (r259370)
  • Updated to make sure a preflight fails if response headers are invalid (r258631)
  • Updated to consider the referrer-policy in the append Origin header algorithm (r259036)


  • Added support for :is() (r259261)
  • Fixed changes in grid or elements inside the grid affecting margin on other elements in the grid (r258735)

Web Animations

  • Marked promises as handled when rejected (r258702)
  • Fixed onwebkit{animation, transition}XX handlers missing from Document (r258697)

Intersection Observer

  • Fixed Intersection Observer intersections when zoomed in (r258787, r258791)


  • Changed HTMLTrackElement to be pending while it is waiting for LoadableTextTrack request (r259138)
  • Fixed animated PNG issue where it would play the frames one time more than the image loopCount (r258817)


  • Added initial support for WebRTC HEVC (r259452)
  • Applied video rotation at the source level if the WebRTC sink asks for it (r258504)
  • Fixed RTCRtpSender of kind video to have a null dtmf attribute (r258502)
  • Fixed audio failing to capture stream if the AudioSession gets interrupted (r258977)
  • Replaced the host candidate IP address in SDP with the corresponding mDNS name (r258545)
  • Supported inserting text or dictation alternative by simulating keyboard input (r258873)
  • Supported resolution of IPv6 STUN/TURN addresses (r259338)


  • Improved title and text used in prompts (r258961)

Bug Fixes

  • Fixed getting stuck in a loading state when seeking on hulu.com (r259404)

Safari extensions

  • Added support for restoring extension tabs across launches of Safari

April 08, 2020 05:00 PM

April 03, 2020

New WebKit Features in Safari 13.1

Surfin’ Safari

This year’s spring releases of Safari 13.1 for macOS Catalina, iPadOS, iOS, and watchOS bring a tremendous number of WebKit improvements for the web across Apple’s platforms. All of this with many more updates for improved privacy, performance, and a host of new tools for web developers.

Here’s a quick look at the new WebKit enhancements available with these releases.

Pointer and Mouse Events on iPadOS

The latest iPadOS 13.4 brings desktop-class pointer and mouse event support to Safari and WebKit. To ensure the best experience, web developers can use feature detection and adopt Pointer Events. Since a mouse or trackpad won’t send touch events, web content should not depend on touch events. Pointer Events will specify whether a mouse/trackpad or touch generated the event.
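
As a rough sketch of what adopting Pointer Events can look like, the snippet below branches on the standard pointerType property of a PointerEvent. The helper name describePointer and the returned labels are illustrative only, and the sketch assumes that a trackpad reports itself as "mouse".

```javascript
// Hypothetical handler: branches on the standard PointerEvent.pointerType
// value ("mouse", "pen" or "touch") instead of relying on touch events.
function describePointer(event) {
    switch (event.pointerType) {
    case "mouse":
        return "mouse or trackpad";
    case "touch":
        return "touch";
    case "pen":
        return "pen";
    default:
        return "unknown";
    }
}

// Feature-detect Pointer Events before adopting them (browser only).
if (typeof window !== "undefined" && window.PointerEvent) {
    window.addEventListener("pointerdown", event => {
        console.log(`pointerdown from ${describePointer(event)}`);
    });
}
```

A single pointerdown listener like this covers all input types, so no separate mouse and touch code paths are needed.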

Web Animations API

These releases ship with support for the Web Animations API, a web standard offering developers a JavaScript API to create, query, and control animations, including direct control of CSS Animations and CSS Transitions. It offers a convenient unified model for programmatic animations, CSS Animations and Transitions. They can all now be directly controlled to pause, resume, seek, or change speed and direction, with less manual calculation. In addition, Web Inspector has been updated to show entries for them in the Media and Animations timeline.

Web Inspector Media and Animations Timeline

Read more about Web Animations in WebKit and Web Animations in Safari 13.1.

Async Clipboard API

WebKit brings the Async Clipboard API to this release of Safari. It provides access to the system clipboard and clipboard operations while keeping the webpage responsive. This API is much more flexible than DataTransfer, allowing developers to write multiple items with multiple types per item. Additionally, it brings programmatic paste to all websites on macOS and iOS.

The implementation is available through the navigator.clipboard API which must be called within user gesture event handlers like pointerdown or pointerup, and only works for content served in a secure context (e.g. https://). Instead of a permissions-based model for reading from the clipboard, a native UI is displayed when the page calls into the clipboard API; the clipboard can only be accessed if the user then explicitly interacts with the platform UI.
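
A minimal sketch of writing text to the clipboard from a user gesture handler could look like this. The helper copyOnPointerUp, the "#copy" selector, and the copied string are placeholders for illustration; only navigator.clipboard.writeText() comes from the API itself.

```javascript
// Hypothetical helper: copies text when the user lifts the pointer on `button`.
// `clipboard` is expected to behave like navigator.clipboard.
function copyOnPointerUp(button, clipboard, getText) {
    button.addEventListener("pointerup", () => {
        // writeText() returns a promise that rejects if the write is not allowed.
        clipboard.writeText(getText()).catch(error => {
            console.warn("Copy failed:", error);
        });
    });
}

// Browser-only wiring; "#copy" is a placeholder selector.
if (typeof document !== "undefined" && typeof navigator !== "undefined" && navigator.clipboard) {
    const button = document.querySelector("#copy");
    if (button)
        copyOnPointerUp(button, navigator.clipboard, () => "Hello from WebKit");
}
```

The call to writeText() happens directly inside the pointerup handler so that it stays within the user gesture.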

For more details see the original API specifications.

JavaScript Improvements

These releases include new JavaScript support for the replaceAll() method for strings and the new nullish coalescing operator (??).

The String.prototype.replaceAll() method does exactly what it suggests, replacing all occurrences of a given value in the string with a replacement string.

"too good to be true".replaceAll(" ", "-");
// too-good-to-be-true

Learn more from the String.prototype.replaceAll Proposal.

The nullish coalescing operator (??) is a new operator that only evaluates and returns the expression on the right of the ?? if the result of the expression on the left of the ?? is null or undefined.

const nullValue = null;
const resultWithNull = nullValue ?? "default";        // "default"

const nonNullValue = 0;
const resultWithNonNull = nonNullValue ?? "default";  // 0
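
To see why this matters, it can help to compare ?? with the logical OR operator (||), which falls back on any falsy value. The variable names below are made up for illustration.

```javascript
const volume = 0;
const label = "";

// `||` falls back on any falsy value, including 0 and "".
const volumeWithOr = volume || 50;               // 50
const labelWithOr = label || "untitled";         // "untitled"

// `??` falls back only on null or undefined.
const volumeWithNullish = volume ?? 50;          // 0
const labelWithNullish = label ?? "untitled";    // ""
const missingWithNullish = undefined ?? "none";  // "none"
```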

For more details see the Nullish Coalescing for JavaScript proposal.


The addition of ResizeObserver in WebKit enables developers to design components that are responsive to the container instead of just the viewport. This allows more flexible responsive designs, where containers can react to window size changes, orientation changes, and additions of new elements to the layout. The JavaScript API avoids the circular dependencies of trying to use media queries for element sizes in CSS. ResizeObserver addresses the problem by providing a means to observe changes in the layout size of elements.
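
As a small sketch of this idea, the callback below switches a component to a compact layout based on its own width rather than the viewport's. The 400px threshold, the "compact" class name, and the ".card" selector are assumptions for illustration.

```javascript
// Hypothetical breakpoint; pick whatever width suits the component.
const COMPACT_MAX_WIDTH = 400;

// Toggles a "compact" class based on each observed element's own width.
function applyContainerLayout(entries) {
    for (const entry of entries) {
        const isCompact = entry.contentRect.width <= COMPACT_MAX_WIDTH;
        entry.target.classList.toggle("compact", isCompact);
    }
}

// Browser-only wiring; ".card" is a placeholder selector.
if (typeof ResizeObserver !== "undefined" && typeof document !== "undefined") {
    const observer = new ResizeObserver(applyContainerLayout);
    document.querySelectorAll(".card").forEach(card => observer.observe(card));
}
```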

For more, read about ResizeObserver in WebKit.

HTML enterkeyhint Attribute

On iOS, WebKit supports the enterkeyhint attribute that allows a content author to provide a label for the enter key on virtual keyboards with values for: enter, done, go, next, previous, search, and send.

See the HTML Standard for more information.

CSS Shadow Parts

New support for CSS Shadow Parts allows web authors to style parts of web components without the need to understand how they are constructed. This provides a mechanism to define author-defined style parts akin to input element’s pseudo elements in WebKit.

See the CSS Shadow Parts specification for more information.

More CSS Additions

There are a number of new CSS additions in WebKit. New font keywords are available for using platform-specific fonts including ui-serif, ui-sans-serif, ui-monospace, and ui-rounded. WebKit also supports the line-break: anywhere value that adds a soft wrap opportunity around each character unit, including around any punctuation or preserved white spaces, in the middle of words, even ignoring limits against line breaks. Finally, WebKit includes support for the dynamic-range media query allowing authors to create styles specific to display capabilities.

@media (dynamic-range: standard) {
    .example {
        /* Styles for displays not capable of HDR. */
        color: rgb(255, 0, 0);
    }
}

@media (dynamic-range: high) {
    .example {
        /* Styles for displays capable of HDR. */
        color: color(display-p3 1 0 0);
    }
}

Media APIs

Safari was the first to ship a picture-in-picture feature and has long supported the ability to specify a playback target for AirPlay. Safari for iOS and macOS now supports the standardizations of these features with the Picture-in-Picture API and Remote Playback API. There is also new support for HLS date-range metadata in DataCue.
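
A minimal sketch of toggling Picture-in-Picture with the standardized API might look like the following. The helper togglePictureInPicture and the "#pip" selector are hypothetical; the properties and methods it calls on the document and the video element are the ones defined by the Picture-in-Picture API.

```javascript
// Hypothetical helper: toggles Picture-in-Picture for `video`.
// Returns a promise, or null when the API is unavailable.
function togglePictureInPicture(doc, video) {
    if (!doc.pictureInPictureEnabled || typeof video.requestPictureInPicture !== "function")
        return null;
    if (doc.pictureInPictureElement === video)
        return doc.exitPictureInPicture();
    return video.requestPictureInPicture();
}

// Browser-only wiring; "#pip" is a placeholder selector.
if (typeof document !== "undefined") {
    const video = document.querySelector("video");
    const button = document.querySelector("#pip");
    if (video && button) {
        button.addEventListener("click", () => {
            const result = togglePictureInPicture(document, video);
            if (result)
                result.catch(error => console.warn("Picture-in-Picture failed:", error));
        });
    }
}
```

The request is made from a click handler because entering Picture-in-Picture generally requires a user gesture.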

Subtitles and Captions

WebKit is introducing enhancements to TextTrackCue for programmatic subtitle and caption presentation. This enables video publishers to continue storing captions in legacy or custom formats, and deliver them programmatically and still maintain the ability for users to control the presence and style of captions with system accessibility settings.

For more detail, see the WebKit TextTracks Explainer.

WebRTC Legacy Audio and Proxy Support

WebRTC support in WebKit has been updated so it can work in more places, with more systems. Support for DTMF allows WebKit to interact with legacy audio services. WebRTC Proxy support allows WebRTC to work in enterprise networks where firewalls may forbid UDP and require TCP to go through a dedicated proxy.

Performance Improvements

WebKit continues to deliver performance gains on benchmarks in these releases while also optimizing memory use. This release includes an 8-10% improvement on the JetStream 2 benchmark. JavaScript Promises in particular showed a 2× improvement in the async-fs benchmark on JetStream 2. IndexedDB showed an improvement of 1.3× to 5× faster than before for most operations. There’s also faster Service Worker startup and more efficient CSS media query updates. Improved back-forward responsiveness helps history navigations feel snappier. Plus, a new WebAssembly interpreter dramatically improves startup time by around 8× for large WASM apps.

Security Improvements

WebKit has continued to harden security by fixing a number of bugs found through a process known as fuzzing. Following our announcement of deprecating TLS 1.0 and TLS 1.1 connections, this release now adds a “Not Secure” warning when connecting to a site where any resource is using either of these deprecated encryption protocols.

Intelligent Tracking Prevention Updates

There are several new enhancements to Intelligent Tracking Prevention including full third-party cookie blocking, cross-site document.referrers downgraded to their origins, and an expiry on non-cookie website data after seven days of Safari use and no user interaction on the website.

Read the “Full Third-Party Cookie Blocking and More” blog post for details.

Web Platform Quality Improvements

Areas of improved standards compliance and browser interoperability include more compatible gradient and position parsing, color component rounding, new support for the Q unit, and better calc() computed styles.

Web Inspector Updates

Web Inspector in Safari 13.1 includes new debugging experiences and adds several new tools to help web developers test functionality or identify issues.

Sources Tab

A new Sources Tab combines the Resources Tab and Debugger Tab into a single view, keeping more critical information in one place without the need to switch back and forth. Among the improvements, it includes improved support for debugging workers and has new JavaScript breakpoints, such as pausing on All Events or on All Microtasks.

Also new in the Sources Tab, developers can use the “+” button in the lower left of the navigation sidebar to add an Inspector Bootstrap Script or Local Override. An Inspector Bootstrap Script is a snippet of JavaScript that is guaranteed to be the first script evaluated after any new global object is created in any page or sub-frame, regardless of URL, so long as Web Inspector is open. A Local Override can be added to override any resource loaded into the page, giving developers the ability to change files and preview those changes on pages that they might ordinarily not be able to change.

Both the Sources Tab and the Network Tab also benefit from improved displaying of HTML and XML content, including being able to pretty print or view any request/response data as a simulated DOM tree.

Layers Tab

The Layers Tab is also newly available in this release. It provides a 3D visualization and complete list of the rendering layers used to display the page. It also includes information like layer count and the memory cost of all the layers, both of which can help point developers to potential performance problems.

Read the “Visualizing Layers in Web Inspector” blog post for details.

Script Blackboxing

Script Blackboxing is another powerful tool, focused on helping developers debug behaviors built on top of a JavaScript library or framework. By setting up a blackbox for any library or framework script, the debugger will ignore any pauses that would have happened in that script, instead deferring the pause until JavaScript execution continues to a statement outside of the blackboxed script.

Redesigned Color Picker

Other additions to Web Inspector give content authors more insight for design and user experience. A redesigned color picker uses a square design for more precise color selection and includes support for wide-gamut colors with a white guide line that shows the edge of sRGB to Display-P3 color space.

Learn more from the “Wide Gamut Color in CSS with Display-P3” blog post.

Customized AR QuickLook

In Safari on iOS 13.3 or later, users can launch an AR experience from the web where content authors can customize a banner that overlays the AR view. It’s possible to customize:

  • Apple Pay button styles
  • Action button label
  • An item title
  • Item subtitle
  • Price

Or, authors can create an entirely custom banner with HTML:


For more information, read about Adding an Apple Pay Button or a Custom Action in AR Quick Look.


These improvements are available to users running watchOS 6.2, iOS and iPadOS 13.4, macOS Catalina 10.15.4, macOS Mojave 10.14.6 and macOS High Sierra 10.13.6. These features were also available to web developers with Safari Technology Preview releases. Changes in this release of Safari were included in the following Safari Technology Preview releases: 90, 91, 92, 93, 94, 95, 96, 97, 98. Download the latest Safari Technology Preview release to stay on the forefront of future web platform and Web Inspector features. You can also use the WebKit Feature Status page to watch for changes to your favorite web platform features.

Send a tweet to @webkit or @jonathandavis to share your thoughts on this release. If you run into any issues, we welcome your bug reports for Safari, or WebKit bugs for web content issues.

April 03, 2020 05:00 PM

March 26, 2020

Release Notes for Safari Technology Preview 103

Surfin’ Safari

Safari Technology Preview Release 103 is now available for download for macOS Catalina and macOS Mojave. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 257162-258409.

Web Inspector

  • Merged the toolbar and tab bar to save vertical space (r257759, r257765, r257810)
  • Redesigned resource and action identifier icons (r257753, r257757, r257791, r258039)
  • Allowed the use of dark mode theme independently from the system-wide theme (r257620, r257801, r257833)
  • Annotated tabs so that they are properly recognized as such (r257965)
  • Changed to not re-cycle through items in the Styles or Computed details sidebar panel when pressing tab (r257959)
  • Fixed clicking a button navigation item to focus it, allowing for subsequent keyboard navigation (r257411)
  • Supported expanding and collapsing details sections with the spacebar or “enter” key (r258058)
  • Supported cycling through scope bar items by pressing tab (r258057)


  • Aligned garbage collection for XMLHttpRequest objects with the specification (r258159)
  • Aligned Fetch ‘request Origin header’ behavior with the specification (r258194)
  • Changed the case of an activating service worker getting terminated to go to an activated state (r257929)
  • Changed to load async scripts with a low priority (r257566)
  • Changed to accept a Document as an explicit root (r257976)
  • Implemented wildcard behavior for Access-Control-Expose-Headers (r258330)


  • Made the style invalidation accurate for user-action pseudo classes (r258321)
  • Changed to avoid full style resolution on Element::focus() (r257839, r257846)

Page loading

  • Changed fixed sized SVG content to be taken into account when computing visually not empty status (r257952)
  • Changed layers going from visually empty to non-empty to immediately trigger layer unfreezing (r257840)

Back-Forward Cache

  • Added quirk to disable the back-forward cache on docs.google.com (r257714)


  • Updated custom element caching to be aware of different worlds (r257414)

Bug Fixes

  • Fixed leaking DocumentTimeline and CSSTransition objects on CNN.com (r257417)
  • Fixed distorted text in titles and headings in Notes on icloud.com (r258282)
  • Fixed maps.google.com not loading properly with screen flickering when zooming (r257716)

March 26, 2020 05:00 PM

March 24, 2020

Full Third-Party Cookie Blocking and More

Surfin’ Safari

This blog post covers several enhancements to Intelligent Tracking Prevention (ITP) in iOS and iPadOS 13.4 and Safari 13.1 on macOS to address our latest discoveries in the industry around tracking.

Full Third-Party Cookie Blocking

Cookies for cross-site resources are now blocked by default across the board. This is a significant improvement for privacy since it removes any sense of exceptions or “a little bit of cross-site tracking is allowed.”

It might seem like a bigger change than it is. But we’ve added so many restrictions to ITP since its initial release in 2017 that we are now at a place where most third-party cookies are already blocked in Safari. To keep supporting cross-site integration, we shipped the Storage Access API two years ago to provide the means for authenticated embeds to get cookie access with mandatory user control. It is going through the standards process in the W3C Privacy Community Group right now.

Regardless of the size of this change, there are further benefits, as explored below.

Paves the Way For Other Browsers

Safari continues to pave the way for privacy on the web, this time as the first mainstream browser to fully block third-party cookies by default. As far as we know, only the Tor Browser has featured full third-party cookie blocking by default before Safari, but Brave just has a few exceptions left in its blocking so in practice they are in the same good place. We know Chrome wants this behavior too and they announced that they’ll be shipping it by 2022.

We will report on our experiences of full third-party cookie blocking to the privacy groups in W3C to help other browsers take the leap.

Removes Statefulness From Cookie Blocking

Full third-party cookie blocking removes statefulness in cookie blocking. As discussed in our December 2019 blog post, the internal state of tracking prevention could be turned into a tracking vector. Full third-party cookie blocking makes sure there’s no ITP state that can be detected through cookie blocking behavior. We’d like to again thank Google for initiating this analysis through their report.

Disables Login Fingerprinting

As discussed by Jeremiah Grossman back in 2008 and Tom Anthony in 2012, and set up by Robin Linus in 2016 as a live demo with which you can test your browser, this technique allows a website to invisibly detect where you are logged in and is viable in any browser without full third-party cookie blocking.

Since “global browser state” has been top of mind in the web privacy community as of late, we’d like to point out that cookies themselves are global state and unless the browser blocks or partitions them in third-party contexts, they allow for cross-site leakage of user information such as login fingerprinting.

Additional Benefits

In addition, there are further benefits to full third-party cookie blocking:

  • Disables cross-site request forgery attacks against websites through third-party requests. Note that you still need to protect against forged requests that come in through top frame navigations (see SameSite cookies for guidance).
  • Removes the ability to use an auxiliary third-party domain to identify users. Such a setup could otherwise persist IDs even when users delete website data for the first party.
  • Simplifies things for developers. Now it’s as easy as possible: If you need cookie access as third-party, use the Storage Access API.

What About the Classifier?

ITP’s classifier keeps working to detect bounce trackers, tracker collusion, and link decoration tracking.

Developer Guidance

If yours is among the few websites that still rely on third-party cookies in Safari and have not been affected by ITP in its previous iterations, here’s how you can make things work for your users:

Option 1: OAuth 2.0 Authorization with which the authenticating domain (in your case, the third-party that expects cookies) forwards an authorization token to your website which you consume and use to establish a first-party login session with a server-set Secure and HttpOnly cookie.

Option 2: The Storage Access API with which the third-party can request permission to get access to its first-party cookies.

Option 3: The temporary compatibility fix for popups, see section “Temporary Compatibility Fix: Automatic Storage Access for Popups” in our ITP 2.0 blog post. This compatibility fix allows the third-party to open a popup from your website and upon a tap or click in that popup gain temporary cookie access under the opener page on your website. Note that this compatibility fix will go away in a future version of Safari so only go this route if it saves you time and allows for a graceful transition period.
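
For Option 2, here is a minimal sketch of how an embedded third-party frame might ask for cookie access. document.hasStorageAccess() and document.requestStorageAccess() are the Storage Access API entry points; the helper name ensureStorageAccess and its doc parameter (passed in so the logic can be exercised outside a browser) are assumptions made for illustration.

```javascript
// Hypothetical helper: resolves to true when the frame ends up with
// access to its first-party cookies, and to false otherwise.
async function ensureStorageAccess(doc) {
  // Browsers without the Storage Access API expose neither method.
  if (typeof doc.hasStorageAccess !== "function" ||
      typeof doc.requestStorageAccess !== "function") {
    return false;
  }
  if (await doc.hasStorageAccess()) {
    return true; // Access was already granted.
  }
  try {
    // Expected to run inside a user gesture (tap or click) in the iframe;
    // the promise rejects when access is not granted.
    await doc.requestStorageAccess();
    return true;
  } catch (error) {
    return false;
  }
}
```

Inside the iframe this would typically be called from a click handler with the real document, for example ensureStorageAccess(document), and the embed would fall back to a logged-out view when it resolves to false.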

Cookie Blocking Latch Mode

The original release of ITP featured what we call “cookie blocking latch mode.” It means once a request is blocked from using cookies, all redirects of that request are also blocked from using cookies. Back in 2017 we got a request to allow cookie blocking to open and close on redirects and implemented that behavior. But with full third-party cookie blocking in place, latch mode is back.
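
To make the latch behavior concrete, here is a simplified model of it. This is only a sketch of the rule as described above, not WebKit’s implementation; the function name, the shouldBlock predicate, and the example URLs are hypothetical.

```javascript
// Simplified model. `hops` lists the URLs a request visits (the original
// request followed by its redirects). `shouldBlock` is a hypothetical
// predicate saying whether a single hop would be blocked on its own.
function cookieAccessPerHop(hops, shouldBlock) {
  let latched = false;
  return hops.map((url) => {
    // Once any hop is blocked, the latch stays closed for the rest.
    latched = latched || shouldBlock(url);
    return { url, cookiesAllowed: !latched };
  });
}

const chain = [
  "https://first.example/start",
  "https://third.example/redirect",
  "https://first.example/landing",
];
const isThirdParty = (url) => new URL(url).hostname !== "first.example";
console.log(cookieAccessPerHop(chain, isThirdParty).map((hop) => hop.cookiesAllowed));
// logs [ true, false, false ]
```

Without the latch, the third hop in this model would get cookies again because it is first-party on its own.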

7-Day Cap on All Script-Writeable Storage

Back in February 2019, we announced that ITP would cap the expiry of client-side cookies to seven days. That change curbed third-party scripts’ use of first-party cookies for the purposes of cross-site tracking.

However, as many anticipated, third-party scripts moved to other means of first-party storage such as LocalStorage. If you have a look at what’s stored in the first-party space on many websites today, it’s littered with data keyed as various forms of “tracker brand user ID.” To make matters worse, APIs like LocalStorage have no expiry function at all, i.e. websites cannot even ask browsers to put a limit on how long such storage should stay around.

Now ITP has aligned the remaining script-writable storage forms with the existing client-side cookie restriction, deleting all of a website’s script-writable storage after seven days of Safari use without user interaction on the site. These are the script-writable storage forms affected (excluding some legacy website data types):

  • Indexed DB
  • LocalStorage
  • Media keys
  • SessionStorage
  • Service Worker registrations and cache
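
The “seven days of Safari use” wording matters: days on which the browser is not used do not count toward the cap. The sketch below models that counting rule only. It is not ITP’s implementation; the function name, the date format, and the exact boundary (seven or more days of use) are assumptions.

```javascript
// Simplified model of the counting rule.
// `daysOfUse` holds ISO dates (YYYY-MM-DD) on which the browser was used;
// `lastInteraction` is the ISO date of the last user interaction with the site.
const CAP_IN_DAYS_OF_USE = 7;

function isStorageDueForDeletion(daysOfUse, lastInteraction) {
  // ISO dates compare correctly as strings; the Set drops repeated days.
  const daysSince = new Set(daysOfUse.filter((day) => day > lastInteraction));
  return daysSince.size >= CAP_IN_DAYS_OF_USE;
}
```

In this model, a user who opens the browser on only six days during a month after their last interaction with the site keeps that site’s storage, while seven consecutive days of use without interaction would clear it.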

A Note On Web Applications Added to the Home Screen

As mentioned, the seven-day cap on script-writable storage is gated on “after seven days of Safari use without user interaction on the site.” That is the case in Safari. Web applications added to the home screen are not part of Safari and thus have their own counter of days of use. Their days of use will match actual use of the web application which resets the timer. We do not expect the first-party in such a web application to have its website data deleted.

If your web application does experience website data deletion, please let us know since we would consider it a serious bug. It is not the intention of Intelligent Tracking Prevention to delete website data for first parties in web applications.

Cross-Site document.referrer Downgraded to Origin

All cross-site document.referrers are downgraded to their origin. This matches the already downgraded cross-site referrer request headers.
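
As an illustration of what a destination page can expect to read, the sketch below reduces a full referrer URL to its origin. The helper name and the example URL are hypothetical; the trailing slash reflects how an origin-only referrer is normally serialized.

```javascript
// Hypothetical helper mirroring the value a page can expect to find in
// document.referrer after a cross-site navigation.
function downgradedReferrer(fullReferrer) {
  // URL.origin drops the path, query string, and fragment.
  return new URL(fullReferrer).origin + "/";
}

console.log(downgradedReferrer("https://news.example/2020/03/story?utm_id=abc123#comments"));
// logs https://news.example/
```

Code that parses paths or query parameters out of a cross-site document.referrer should therefore be prepared to find only the origin.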

Detection of Delayed Bounce Tracking

Some trackers have started to delay their navigational redirects, probably to evade ITP’s bounce tracking detection. This manifests as the webpage disappearing and reloading shortly after you land on it. We’ve added logic to cover such delayed bounce tracking and detect it just like instant bounces.

Testing Your Website

We encourage all developers to regularly test their websites with Safari Technology Preview (STP) and our betas of iOS, iPadOS, and macOS. Major changes to ITP and WebKit in general are included in the betas and STP, typically months before shipping. An easy way to stay ahead of the changes is to use STP as a daily development browser. This gives you access to the latest developer tools and helps you discover unexpected behavior in your websites with each release. If you come across bugs or breakage, please file an open source bug report.

March 24, 2020 05:59 PM

March 16, 2020

Víctor Jáquez: Review of the Igalia Multimedia team Activities (2019/H2)

Igalia WebKit

This blog post is a review of the various activities the Igalia Multimedia team was involved in during the second half of 2019.

Here are the previous 2018/H2 and 2019/H1 reports.


Succinctly, GstWPE is a GStreamer plugin that renders web pages as a video stream whose frames are GL textures.

Phil, its main author, wrote a blog post explaining in detail what GstWPE is and its possible use cases. He wrote a demo too, which grabs and previews a live stream from a webcam session and blends it with an overlay from wpesrc, which displays HTML content. This composited live stream can be broadcast through YouTube or Twitch.

These concepts are better explained by Phil himself in the following lightning talk, presented at the last GStreamer Conference in Lyon:

Video Editing

After implementing a deep integration of the GStreamer Editing Services (a.k.a GES) into Pixar’s OpenTimelineIO during the first half of 2019, we decided to implement an important missing feature for the professional video editing industry: nested timelines.

Toward that goal, Thibault worked with the GSoC student Swayamjeet Swain to implement a flexible API to support nested timelines in GES. This means that users of GES can now decouple each scene into different projects when editing long videos. This work is going to be released in the upcoming GStreamer 1.18 version.

Henry Wilkes also implemented support for nested timelines in OpenTimelineIO, making the GES integration one of the most advanced, as you can see in this table:

Single Track of Clips ✔ ✔ ✔ ✔ ✔ W-O ✔ ✔
Multiple Video Tracks ✔ ✖ ✔ ✔ ✔ W-O ✔ ✔
Audio Tracks & Clips ✔ ✔ ✔ ✔ ✔ W-O ✔ ✔
Gap/Filler ✔ ✔ ✔ ✔ ✔ ✔ ✖ ✔
Markers ✔ ✔ ✔ ✔ ✖ N/A ✖ ✔
Nesting ✔ ✖ ✔ ✔ ✔ W-O ✔ ✔
Transitions ✔ ✔ ✖ ✖ ✔ W-O ✖ ✔
Audio/Video Effects ✖ ✖ ✖ ✖ ✖ N/A ✖ ✔
Linear Speed Effects ✔ ✔ ✖ ✖ R-O ✖ ✖ ✖
Fancy Speed Effects ✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖
Color Decision List ✔ ✔ ✖ ✖ ✖ ✖ N/A ✖

Along these lines, Thibault delivered a 15-minute talk, also at the GStreamer Conference 2019:

After detecting a few regressions and issues in GStreamer related to frame accuracy, we decided to make sure that we can seek in a perfectly frame-accurate way using GStreamer and the GStreamer Editing Services. To ensure that, an extensive integration test suite has been developed, mainly targeting the most important container formats and codecs (namely mxf, quicktime, h264, h265, prores, jpeg), and issues have been fixed in different places. On top of that, new APIs are being added to GES to allow expressing times in frame numbers instead of nanoseconds. This work is still ongoing but should be merged in time for GStreamer 1.18.

GStreamer Validate Flow

GstValidate has been turning into one of the most important GStreamer testing tools for checking that elements behave as they are supposed to in the framework.

During our MSE work, we found that we needed another way to specify tests, one based on the buffers and events produced through specific pads. Thus, Alicia developed a new plugin for GstValidate: Validate Flow.

Alicia gave an informative 30-minute talk about GstValidate and the new plugin at the last GStreamer Conference too:

GStreamer VAAPI

Most of the work during the second half of 2019 consisted of maintenance tasks and code reviews.

We worked mainly on memory restrictions per backend driver, and we reviewed a big refactor: internal encoders now use GstObject instead of the custom GstVaapiObject. We also reviewed patches for new features such as video rotation and cropping in vaapipostproc.

Servo multimedia

Last year we worked on integrating media playback in Servo. We finally delivered hardware-accelerated video playback on Linux and Android. We also worked on the Windows and Mac ports, but they were not finished. Naturally, most of the work was in the servo/media crate, pushing code and reviewing contributions. The major tasks were rewriting the media player example and the internal source element so that playbin’s download flag is handled properly.

We also added WebGL integration support with <video> elements, so web pages can use video frames as WebGL textures.

Finally we explored how to isolate the multimedia processing in a dedicated thread or process, but that task remains pending.

WebKit Media Source Extension

We did a lot of downstream and upstream bug fixing and patch review, both in WebKit and GStreamer, for our MSE GStreamer-based backend.

Along this line, we improved WebKitMediaSource to use playbin3 while also adding compatibility with older GStreamer versions.

WebKit WebRTC

Most of the work in this area was maintenance and fixing regressions uncovered by the layout tests. In addition, support for the Raspberry Pi was improved by handling encoded streams from v4l2 video sources, with some explorations on the Minnowboard on top of that.


GStreamer Conference

Igalia was a Gold sponsor of the last GStreamer Conference, held in Lyon, France.

The whole team attended and five talks were delivered. Besides the video editing talk already mentioned, Thibault presented two more: one about the GstTranscoder API and the other about the new documentation infrastructure based on Hotdoc:

We also had a productive hackfest after the conference, where we worked on an AV1 Rust decoder, an HLS Rust demuxer, a hardware decoder flag in playbin, and other things.

Linaro Connect

Phil attended the Linaro Connect conference in San Diego, USA. He delivered a talk about WPE/Multimedia which you can enjoy here:


Charlie attended Demuxed in San Francisco. The conference is heavily focused on streaming and codec engineering and validation. Sadly, there is not much interest in GStreamer, as the main focus is on FFmpeg.


Phil and I attended the last RustFest in Barcelona. Basically we went to meet with the Rust community and we attended the “WebRTC with GStreamer-rs” workshop presented by Sebastian Dröge.

By vjaquez at March 16, 2020 03:20 PM

March 05, 2020

Release Notes for Safari Technology Preview 102

Surfin’ Safari

Safari Technology Preview Release 102 is now available for download for macOS Catalina and macOS Mojave. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 256576-257162.

Web Inspector

  • Fixed VoiceOver to read the selected panel tab (r256652)
  • Updated resource, type, and timeline icons for both light and dark modes (r256774, r257043)


  • Changed the disk cache policy to allow resources larger than 10MB to be cached (r257041)
  • Deferred execution of async scripts until the document is loaded (r256808)
  • Fixed value sanitization for input[type=text] to not truncate the value at a control character (r257132)
  • Fixed new FontFace() to not throw when failing to parse arguments (r256659)
  • Implemented EventTarget constructor (r256716)
  • Set User-Agent in preconnect requests (r256912)


  • Improved the speed of index cursor iteration when there are a lot of index records from different object stores (r256738)
  • Changed to prefetch cursor records on client side (r256621)

Apple Pay

  • Added support for Apple Pay buttons with custom corner radii (r256648)

Web Animations

  • Ensured CSS Transition and CSS Animation events are queued, sorted and dispatched by their timeline (r256619)
  • Ensured animations that lose their effect don’t schedule an animation update (r256623)
  • Fixed repeated animations on pseudo elements failing to run after a while (r257138)
  • Fixed style changes due to Web Animations to not trigger CSS Transitions (r256627)


  • Improved performance of track sizing algorithm for spanning items (r256826)


  • Changed to not fire timers when there is a pending rendering update (r256853)
  • Fixed a white flash that can occur if JavaScript forces an early layout (r256577)

Web Driver

  • Fixed Automation.setWindowFrameOfBrowsingContext to accept negative origin values (r257042)

March 05, 2020 06:15 PM

March 02, 2020

Wide Gamut Color in CSS with Display-P3

Surfin’ Safari

Display-P3 color space includes vivid colors that aren’t available in sRGB.

sRGB versus Display-P3

CSS Color Module Level 4 introduced syntax to use Display-P3 color space on the web:

color: color(display-p3 1 0.5 0)

The previously available syntax defined colors in sRGB color space. hsl(42, 70%, 50%), rgb(3, 5, 11), #abc — all of these colors are in the sRGB color space.

Display-P3 is a superset of sRGB. It’s around 35% larger:

sRGB outline

The white line shows the edge of sRGB. Everything to its top right is a Display-P3 color not available in sRGB. Note how greens are greatly expanded while blues aren’t nearly as much.

Browser support

WebKit has had support for Display-P3 color since 2016 (r207442). The following browsers support Display-P3 color:

  • Safari on macOS Mojave and newer
  • Safari on iOS 11 and newer

WebKit is the only browser engine that supports Display-P3 color as of January 2020.

Graceful degradation

One way to provide a fallback is to include the same property with the sRGB color before:

header {
    color: rgb(0, 255, 0);
    color: color(display-p3 0 1 0);
}

Browsers other than WebKit currently parse color(...) as an invalid value, and browsers ignore CSS declarations with invalid values, so the earlier sRGB declaration stays in effect.

Alternatively, you can use an @supports feature query. This is particularly useful when defining variables with colors:

/* sRGB color. */
:root {
    --bright-green: rgb(0, 255, 0);
}

/* Display-P3 color, when supported. */
@supports (color: color(display-p3 1 1 1)) {
    :root {
        --bright-green: color(display-p3 0 1 0);
    }
}

header {
    color: var(--bright-green);
}

Hardware support

  • iPhone 7 and newer
  • MacBook Pro (since 2016)
  • iMac (since 2015)
  • iPad Pro (since 2016)
  • LG UltraFine 5K Display

There are also numerous devices that support Display-P3 color space but currently have no browsers that support Display-P3 in CSS:

  • Google Pixel 2 XL
  • Google Pixel 3
  • HTC U11+
  • OnePlus 6

More devices that support Display-P3 are listed on Wikipedia.

Hardware support can be detected with a media query in CSS:

@media (color-gamut: p3) {
    /* Do colorful stuff. */
}

And JavaScript:

if (window.matchMedia("(color-gamut: p3)").matches) {
    // Do colorful stuff.
}
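
Browser support and hardware support are separate questions, so a script may want to check both before picking a color. The following is a minimal sketch that combines CSS.supports() with the media query above. The helper name preferredGreen and its win parameter (normally the global window, passed in so the logic can be tried outside a browser) are assumptions made for illustration.

```javascript
// Hypothetical helper: returns a Display-P3 color string only when the
// browser can parse it and the display reports a P3 gamut.
function preferredGreen(win) {
  const cssHasP3 = Boolean(win.CSS && win.CSS.supports &&
    win.CSS.supports("color", "color(display-p3 0 1 0)"));
  const displayHasP3 = Boolean(win.matchMedia &&
    win.matchMedia("(color-gamut: p3)").matches);
  // Fall back to the sRGB color unless both checks pass.
  return cssHasP3 && displayHasP3 ? "color(display-p3 0 1 0)" : "rgb(0, 255, 0)";
}
```

In a page this could be used as element.style.color = preferredGreen(window). The plain CSS fallbacks shown earlier remain the simpler choice when no script is involved.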

Web Inspector

Starting with Safari Technology Preview 97, Web Inspector includes a P3-capable color picker:

The white line marks the edge of the sRGB color space. All colors to the top right of it are only available in the Display-P3 color space.

Right-clicking the color square shows an option to convert to sRGB color space:

Clamp to sRGB

When the color is within the sRGB color space, the “Convert to sRGB” menu item is displayed. When it is outside, the item is “Clamp to sRGB” instead.

Web Inspector also includes context menus to convert sRGB colors to Display-P3:

Convert to Display-P3

Closing thoughts

CSS has syntax to define colors in the Display-P3 color space, which includes vivid colors previously not available in sRGB. Many modern displays cover 100% of the P3 color standard. Web Inspector now includes a P3-capable color picker.

You can start using Display-P3 colors on your websites and web views today. It only takes a couple of lines of code to provide a backward compatible sRGB color.

If you have any feedback, reach out to me on Twitter. You can also send general comments to the @webkit Twitter account.

Further reading

Note: Learn more about Web Inspector from the Web Inspector Reference documentation.

March 02, 2020 06:36 PM

Žan Doberšek: Flatpak repository for WPE

Igalia WebKit

To let developers play with the WPE stack, we have set up a Flatpak repository containing all the necessary bits to start working with it. To install applications (like Cog, the very simple WPE launcher), first add the remote repository, and proceed with the following instructions:

$ flatpak --user remote-add wpe-releases --from https://software.igalia.com/flatpak-refs/wpe-releases.flatpakrepo
$ flatpak --user install org.wpe.Cog
$ flatpak run org.wpe.Cog -P fdo <url>

Currently the 2.26 release of the WPE port is used, along with libwpe 1.4.0, WPEBackend-fdo 1.4.0 and Cog 0.4.0. Upgrades to the newer releases (due in the next few weeks) will be done within the next month or two. Builds are provided for x86_64, arm and aarch64 architectures.

If you need ideas or inspiration on how to use WPE, this repository also contains GstWPEBroadcastDemo, an application that showcases both GStreamer and WPE, enabling you to mix live video input with HTML content that can be updated on-the-fly. You can read more about this in the blog post made by Philippe Normand.

The current Cog/WPE stack is still limited to Wayland, with Mesa-based graphics stacks most likely to work well. In future releases, we plan to add support for new platforms, graphics stacks and methods of integration.

All of this is still in very early stages. If you find an issue with the applications or libraries in the repository, please do not hesitate to report it to our issue tracker. The issues will be rerouted to the trackers of the problematic component if necessary.

By Žan Doberšek at March 02, 2020 09:00 AM

February 19, 2020

Release Notes for Safari Technology Preview 101

Surfin’ Safari

Safari Technology Preview Release 101 is now available for download for macOS Catalina and macOS Mojave. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 255473-256576.

Web Inspector

  • Added a special breakpoint for controlling whether debugger statements pause in the Sources tab (r255887)
  • Changed to encode binary web socket frames using base64 (r256497)
  • Fixed elements closing tag showing reversed in RTL mode (r256374)
  • Fixed the bezier editor popover to be strictly LTR (r255886)
  • Fixed dragging handles in the easing popover selecting sidebar text (r255888)
  • Updated some cookie table column headers to not be localizable (r255896)


  • Corrected TextTrack sorting with invalid BCP47 language (r255997)
  • Fixed AirPlay sometimes stopping after 60 minutes of playback (r255581)

Apple Pay

  • Redacted billing contact during payment method selection (r256071)


  • Added support for BigInt literal as PropertyName (r256541)

Web Animations

  • Fixed accelerated animations freezing on a render tree rebuild (r255663)
  • Fixed an event loop cycle between an animation finishing and it being removed from GraphicsLayerCA (r256181)
  • Fixed an issue where out-of-view transitions could trigger high memory use (r256095)
  • Prevented playing an accelerated animation that was canceled before it was committed (r255810)


  • Changed authenticatorGetAssertion to be sent without pinAuth if user verification is discouraged (r256001)


  • Aligned getDisplayMedia() with standards specifications (r256034)
  • Fixed not processing newly gathered ICE candidates if the document is suspended (r256009)


  • Fixed CSS rules with the same selector from several large stylesheets getting applied in the wrong order (r255671)


  • Fixed pages that trigger a redirect sometimes getting left blank (r256452)


  • Disallowed setting base URL to a data or JavaScript URL (r256191)
  • Fixed highlight text decorations to work with all decoration types and colors (r256451)
  • Implemented OffscreenCanvas.copiedImage (r256505)
  • Added standard gamepad mapping for GameControllerGamepads (r256215)
  • Tightened up stylesheet loading (r255693)
  • Fixed quantifiers after lookahead assertions to be syntax errors in Unicode patterns only (r255689)
  • Fixed \0 identity escapes to be syntax errors in Unicode patterns only (r255584)


  • Fixed iteration of cursors skipping records if deleted (r256414)

Back-forward Cache

  • Updated to remember if legacy TLS was used in the back-forward cache (r256073)

February 19, 2020 09:10 PM

February 05, 2020

Release Notes for Safari Technology Preview 💯

Surfin’ Safari

Safari Technology Preview Release 100 is now available for download for macOS Catalina and macOS Mojave. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 254696-255473.

Web Inspector

  • Added links to Web Inspector Reference documentation (r254730)
  • Renamed the Canvas Tab to be the Graphics Tab, and included basic information and graphical representations of all Web Animation objects that exist in the inspected page (r255396)
  • Allowed developers to evaluate arbitrary JavaScript in isolated worlds created by Safari App Extensions via the execution context picker in the Console (r255191)

Web Animations

  • Added support for the options parameter to getAnimations() (r255149)
  • Changed animations to run accelerated even if other animations targeting the same element are not accelerated (r255383)
  • Fixed changing the delay of an accelerated animation to correctly seek (r255422)
  • Fixed a leak of CSS Animations when removing its animation-name property (r255371)
  • Separated setting a timeline’s current time from updating its animations (r255260)
  • Updated all DocumentTimeline objects when updating animations (r255141)


  • Fixed User Verification (UV) option present on a CTAP2 authenticatorMakeCredential while the authenticator has not advertised support for it (r254710)


  • Added support for allow="fullscreen" feature policy (r255162)
  • Changed EME to only emit an array of persistent-usage-records when more than one record is discovered (r254896)
  • Corrected VTT Cue Style handling to match the specification (r255151, r255227)
  • Fixed decoder glitches when watching videos on CNN.com (r254761)
  • Fixed AirPlay placard not visible when AirPlay is entered in fullscreen mode (r255103)
  • Fixed video sound sometimes continuing to play in page cache (r254814)
  • Fixed HTMLMediaElement to not remove the media session at DOM suspension time (r255116)


  • Added finite timeout when synchronously terminating a service worker (r254706)
  • Fixed :matches() to correctly combine with pseudo elements (r255059)
  • Fixed automatic link replacement via “Smart links” to emit insertLink input events (r254945)
  • Disabled Service Workers before terminating an unresponsive service worker process (r255438)
  • Implemented “create a potential-CORS request” (r254821)
  • Implemented transferable property of OffscreenCanvas (r255315)
  • Improved the speed of index record deletion in IndexedDB (r255318)
  • Made pasteboard markup sanitization more robust (r254800)
  • Used Visible Position to calculate Positions for highlights (r254785)


  • Fixed EXIF orientation ignored for some CSS images (r254841)
  • Fixed elements no longer staying fixed with elastic overscroll (r255037)


  • Added support for MediaRecorder.requestData (r255085)


  • Fixed DateMath to accept more ISO-8601 timezone designators, even those not included in ECMA262, so that code in the wild produces the expected results (r254939)


  • Implemented sub-source texImage2D and texSubImage2D (r255316)

February 05, 2020 09:10 PM

January 24, 2020

ResizeObserver in WebKit

Surfin’ Safari

For years now, web developers have desired the ability to design components that are responsive to their container instead of the viewport. Developers are used to using media queries against viewport width for responsive designs, but having media queries based on element sizes is not possible in CSS because it could result in circular dependencies. Thus, a JavaScript solution was required.

ResizeObserver was introduced to solve this problem, allowing authors to observe changes to the layout size of elements. It was first made available in Chrome 64 in January 2018, and it’s now in Safari Technology Preview releases (and Epiphany Technology Preview). ResizeObserver was enabled by default as of Safari Technology Preview 97.

API Overview

A script creates a ResizeObserver with a callback which will be called with ‘observations’, and registers/unregisters callbacks using .observe(element), and .unobserve(element). Each call to observe(element) adds that element to the set of elements observed by this ResizeObserver instance.

The callback provided to the constructor is called with a collection of observerEntries which contain data about the state of CSS boxes being observed, if those boxes actually changed size. The observer itself also has a .disconnect() method which stops the active delivery of observed changes to the callback. Here’s a simple example:

const callback = (entries) => {
  console.log(`${entries.length} resize observations happened`)
  Array.from(entries).forEach((entry) => {
    let rect = entry.contentRect;
    console.log(entry.target, `size is now ${rect.width}w x ${rect.height}h`)
  })
}

const myObserver = new ResizeObserver(callback)

myObserver.observe(targetElementA)
myObserver.observe(targetElementB)

What we are observing with ResizeObserver is changes to the size of CSS Boxes that we have observed. Since we previously had no information on these boxes before observing, and now we do, this creates an observable effect. Assuming that targetElementA and targetElementB are in the DOM, we will see a log saying that 2 resize observations happened, and providing some information about the elements and sizes of each. It will look something like:

"2 resize observations happened"
"<div class='a'>a</div>" "size is now 1385w x 27h"
"<div class='b'>b</div>" "size is now 1385w x 27h"

Similarly, this means that while it is not an error to observe an element that isn’t in the DOM tree, no observations will occur until a box is actually laid out (when it is inserted, and creates a box). Removing an observed element from the DOM tree (which wasn’t hidden) also causes an observation.

How Observations are Delivered

ResizeObserver strictly specifies when and how things happen and attempts to ensure that calculation and observation always happen “downward” in the tree, and to help authors avoid circularity. Here’s how that happens:

  1. Boxes are created.
  2. Layout happens.
  3. The browser starts a rendering update, and runs the steps up to and including the Intersection Observer steps.
  4. The system gathers the box sizes of observed elements and compares them with their previously recorded sizes.
  5. The ResizeObserver callback is called with ResizeObserverEntry objects containing information about the new sizes.
  6. If any changes are incurred during the callback, then layout happens again, but here, the system finds the shallowest depth at which a change occurred (measured in simple node depth from the root). Any changes that are related to something deeper down in the tree are delivered at once, while any that are not are queued up and delivered in the next frame, and an error message will be sent to the Web Inspector console: (ResizeObserver loop completed with undelivered notifications).
  7. Subsequent steps in the rendering updates are executed (i.e. painting happens).
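
The depth rule in step 6 can be sketched as a small partitioning function. This is a simplified model rather than WebKit’s implementation: the observation shape ({ target, depth, changed }) and the function name are assumptions made for illustration.

```javascript
// Simplified model of one gathering pass.
// `depth` is the node depth from the root and `changed` says whether the
// box size differs from the last recorded one. `shallowestDelivered` is
// the shallowest depth handled by the previous pass (0 before the first).
function gatherObservations(observations, shallowestDelivered) {
  const deliverNow = [];
  const nextFrame = [];
  for (const observation of observations) {
    if (!observation.changed) continue;
    if (observation.depth > shallowestDelivered) {
      deliverNow.push(observation); // deeper in the tree: deliver in this frame
    } else {
      nextFrame.push(observation);  // same depth or shallower: wait a frame
    }
  }
  return { deliverNow, nextFrame };
}
```

For example, if callbacks for elements at depth 2 resize one element at depth 3 and another at depth 1, a pass with shallowestDelivered set to 2 delivers the depth 3 change right away and leaves the depth 1 change for the next frame, which is when the console message from step 6 appears.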


In Safari Technology Preview, entries contain a .contentRect property reflecting the size of the Content Box. After early feedback, the spec is being iterated on in backward compatible ways which will also provide a way to get the measure of the Border Box. Future versions of this API will also allow an optional second argument to .observe which allows you to specify which boxes (Content or Border) you want to receive information about.
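
Code that wants to benefit from the Border Box information once it is available, while still working with entries that only carry .contentRect, can feature-detect it. The sketch below assumes the borderBoxSize property and its inlineSize field from later drafts of the Resize Observer specification; the helper name is hypothetical.

```javascript
// Hypothetical helper. Returns the Border Box inline size when the entry
// provides it, and otherwise falls back to the Content Box width.
function inlineSizeOf(entry) {
  const size = entry.borderBoxSize;
  if (size) {
    // Depending on the implementation this is a single object or a list.
    const first = Array.isArray(size) ? size[0] : size;
    if (first && typeof first.inlineSize === "number") {
      return first.inlineSize;
    }
  }
  return entry.contentRect.width; // The Content Box is always reported.
}
```

Note that the two values differ by padding and border, and that inlineSize only corresponds to width in horizontal writing modes, so breakpoints should be chosen with the box in mind.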

Useful Example

Suppose that we have a component containing an author’s profile. It might be used on devices with screens of many sizes, and in many layout contexts. It might even be provided for reuse as a custom element somehow. Further, these sizes can change at runtime for any number of reasons:

  • On a desktop, the user resizes their window
  • On a mobile device, the user changes their orientation
  • A new element comes into being, or is removed from the DOM tree causing a re-layout
  • Some other element in the DOM changes size for any reason (some elements are even user resizable)

Depending on the amount of space available to us at any given point in time, we’d like to apply some different CSS—laying things out differently, changing some font sizes, perhaps even using different colors.

For this, let’s assume that we follow a ‘responsive first’ philosophy and make our initial design for the smallest screen size. As available space gets bigger, we have another design that should take effect when there are 768px available, and still another when there are at least 1024px. We’ll make these designs with our page using classes “.container-medium” and “.container-large”. Now all we have to do is add or remove those classes automatically.

/* Tell the observer how to manage the attributes */
const callback = (entries) => {
  entries.forEach((entry) => {
    let w = entry.contentRect.width
    let container = entry.target

    // clear out any old ones
    container.classList.remove('container-medium', 'container-large')

    // add one if a 'breakpoint' is true
    if (w >= 1024) {
      container.classList.add('container-large')
    } else if (w >= 768) {
      container.classList.add('container-medium')
    }
  })
}

/* Create the instance **/
const myObserver = new ResizeObserver(callback)

/* Find the elements to observe */
const profileEls = [...document.querySelectorAll('.profile')]

/* .observe each **/
profileEls.forEach(el => myObserver.observe(el))

Now, each .profile element will gain the class .container-medium or .container-large if its available size meets our specified criteria, and our designs will always be applied appropriately based on that size. You can, of course, combine this with a MutationObserver or a Custom Element in order to account for elements which might come into existence later.


We’re excited to have ResizeObserver available in Safari Technology Preview! Please try it out and file bugs for any issues you run into.

January 24, 2020 06:00 PM

January 22, 2020

Release Notes for Safari Technology Preview 99

Surfin’ Safari

Safari Technology Preview Release 99 is now available for download for macOS Catalina and macOS Mojave. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 253789-254696.

Legacy Plug-Ins

  • Removed support for Adobe Flash

Web Inspector

  • Elements
    • Enabled the P3 color picker (r253802)
    • Added RGBA input fields for the P3 color picker (r254243)
    • Added support for manipulating the value with the arrow keys in the color picker (r254094)
    • Added color() suggestion when editing a CSS property that accepts color values (r254316)
  • Sources
    • Allowed editing of style sheets injected by Safari App Extensions (r254186)
  • Console
    • Ensured that the clear button is always visible, even at smaller widths (r253800)


  • Added support for using valid non-zero width and height attributes to become the default aspect ratio of <img> (r254669)
  • Added a check to ensure Service Workers terminate after a period of time when thread blocking (r253898)
  • Aligned Range.intersectsNode() with the DOM specification (r254018)
  • Changed <iframe> attributes to be processed on srcdoc attribute removal (r254498)
  • Changed <img>.naturalWidth to return the density-corrected intrinsic width (r254229)
  • Changed <link> with non-CSS type to not be retrieved (r253992)
  • Changed Object.keys to throw if called on a module namespace object with uninitialized binding (r254390)
  • Changed Object.preventExtensions to throw if not successful (r254626)
  • Changed Document.createAttribute() to take in a localName, not a qualifiedName (r254021)
  • Changed the supported MIME types for image encoding to be supported image MIME types (r254541)
  • Denied Notification API access for non-secure contexts (r253899)
  • Fixed dispatchEvent() to not clear the event’s isTrusted flag when it returns early (r254016)
  • Fixed String.prototype.replace() incorrectly handling named references on regular expressions without named groups (r254088)
  • Fixed URL parser in Fetch not always using UTF-8 (r254672)
  • Fixed encoding entities correctly in <style> element during XML serialization of text (r253988)
  • Removed the low priority resource load for sendBeacon to reduce failure rates (r253847)
  • Updated Fetch to handle empty Location value (r253814)


  • Fixed document.cookie to not do a sync IPC to the network process for iframes that do not have storage access (r254556)


  • Added support for image-set() standard syntax (r254406)
  • Added support for rendering highlights specified in CSS Highlight API (r253857)
  • Implemented a network error when fetching a linked stylesheet resource fails (r254043)
  • Improved performance by invalidating only affected elements after media query evaluation changes (r253875)
  • Fixed rejected changes between similar unprefixed and prefixed gradient syntax (r254164)
  • Excluded implicit CSS grid tracks from the resolved value (r254561)


  • Enabled HDR Media Capabilities by default (r253853)
  • Fixed specification violation in Font Loading API (r254220)
  • Ignored URL host for schemes that are not using host information (r253946)
  • Implemented “create a potential-CORS request” (r254000)
  • Implemented transceiver setCodecPreferences (r253966)
  • Made text track loading set same-origin fallback flag (r254031)
  • Fixed MediaKeySession.load() failing (r253852)


  • Removed the certificate info checks related to getUserMedia (r253827)

Payment Request

  • Converted the payment method data IDL in the PaymentRequest constructor (r253986)

Web Animations

  • Stopped creating CSS Animations for <noscript> elements (r254201)


  • Fixed invalid date parsing for ISO 8601 strings when no timezone given (r254038)
  • Fixed RegExp.prototype[Symbol.replace] to support named capture groups (r254195)

Web Share API

  • Added support for a user gesture to allow using the Web Share API even when preceded by an XHR call (r254178)


  • Reimplemented the “Execute Async Script” command with Promises to match the specification (r254329)
  • Fixed handling of session timeouts for values higher than MAX_INT (r253883)
  • Fixed scripts being executed in the wrong page context after a history navigation (r254328)


  • Improved performance by removing the timer for pending operations in IDBTransaction (r253807)

January 22, 2020 06:00 PM

January 16, 2020

Paulo Matos: Cross-Arch Reproducibility using Containers

Igalia WebKit

I present the use of containers for cross architecture reproducibility using docker and podman, which I then go on to apply to JSC. If you are trying to understand how to create cross-arch reproducible environments for your software, this might help you!


By Paulo Matos at January 16, 2020 04:00 PM

January 08, 2020

Release Notes for Safari Technology Preview 98

Surfin’ Safari

Safari Technology Preview Release 98 is now available for download for macOS Catalina and macOS Mojave. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 252823-253789.

Web Inspector

  • Elements
    • Removed the “Show/Hide Shadow DOM” navigation item (r253706)
    • Restricted showing paint flashing and compositing borders to the Web Inspector session (r253739)
    • Ensured that a bezier swatch is shown for CSS timing function keywords (r253758)
    • Fixed hovering over an invalid value while holding ⌘ to change the color of the text (r253405)
    • Fixed the Classes input to not display on top of other content (r253167)
  • Network
    • Fixed pressing ⌘F when no network item is selected to focus the filter bar (r253160)
  • Sources
    • Fixed non-regex local overrides to not apply to resources that only contain the URL instead of completely matching the URL (r253246)
  • Storage
    • Added support for filtering IndexedDB stores and indexes (r253161)
  • Audit
    • Fixed selected item before entering edit mode not being reselected after exiting edit mode (r253759)
    • Fixed importing a result with DOM nodes that don’t match the inspected page appearing as empty lines (r253757)
  • Console
    • Ensured copying an evaluation result does not include the saved variable index (r253169)
  • Search
    • Added basic “No Search Results” text with a clickable help navigation item that reveals and focuses the navigation sidebar search input when there is no active search (r253165)

Web Animations

  • Enabled Web Animations CSS Integration, a new implementation of CSS Animations and CSS Transitions, by default (r252945)
  • Fixed layout of element children with forwards-filling opacity animation that can be incorrect after removal (r252879)
  • Implemented Animation.commitStyles() (r252966)


  • Enabled the Generic Text Track Cue API (r253695)


  • Ensured transparency layers are properly ended when only painting root background (r253692)
  • Fixed an issue where elements could jump to the wrong position after some compositing-related style changes (r252935)


  • Implemented OffscreenCanvas.convertToBlob (r253474)
  • Changed setting toString or valueOf on a cross-origin Location object to throw a SecurityError (r253418)
  • Fixed an incorrect association of the URL object with the port value (r252998)
  • Prevented synchronous XHR in beforeunload and unload event handlers (r253213)


  • Changed to not perform range checking for calc() at parse time (r252983)
  • Changed media queries in img sizes attribute to evaluate dynamically (r252828)
  • Implemented the clamp() function (r253105)
  • Improved computed values of calc() functions to match the specification (r253079)


  • Changed Object.prototype.isPrototypeOf() to check if the passed in value is a non-object first (r253264)


  • Added protection against WebRTC network monitoring waiting forever in edge cases (r253203)
  • Fixed audio elements that resumed playback after getUserMedia (r253742)

Clipboard API

  • Added sanitization for HTML and image data written using clipboard.write (r253486)

Browser Changes

  • Changed to issue the load sooner on swipe back/forward navigation (r253360)
  • Re-disabled TLS 1.0 and TLS 1.1 by default (r253292)


  • Changed to validate and generate bytecode in a single pass (r253140)

January 08, 2020 09:15 PM

Angelos Oikonomopoulos: A Dive Into JavaScriptCore

Igalia WebKit

Recently, the compiler team at Igalia was discussing the available resources for the WebKit project, both for the purpose of onboarding new Igalians and for lowering the bar for third-party contributors. As compiler people, we are mainly concerned with JavaScriptCore (JSC), WebKit’s JavaScript engine implementation. There are many high quality blog posts on the WebKit blog that describe various phases in the evolution of JSC, but finding one’s bearings in the actual source can be a daunting task.

The aim of this post is twofold: first, document some aspects of JavaScriptCore at the source level; second, show how one can figure out what a piece of code actually does in a large and complex source base (which JSC’s certainly is).

In medias res

As an exercise, we’re going to arbitrarily use a commit I had open in a web browser tab. Specifically, we will be looking at this snippet:

Operands<Optional<JSValue>> mustHandleValues(codeBlock->numParameters(), numVarsWithValues);
int localsUsedForCalleeSaves = static_cast<int>(CodeBlock::llintBaselineCalleeSaveSpaceAsVirtualRegisters());
for (size_t i = 0; i < mustHandleValues.size(); ++i) {
    int operand = mustHandleValues.operandForIndex(i);
    if (operandIsLocal(operand) && VirtualRegister(operand).toLocal() < localsUsedForCalleeSaves)
        continue;
    mustHandleValues[i] = callFrame->uncheckedR(operand).jsValue();
}

This seems like a good starting point for taking a dive into the low-level details of JSC internals. Virtual registers look like a concept that’s good to know about. And what are those “locals used for callee saves” anyway? How do locals differ from vars? What are “vars with values”? Let’s find out!


Recall that JSC is a multi-tiered execution engine. Most JavaScript code is only executed once; compiling takes longer than simply interpreting the code, so JavaScript code is always interpreted the first time through. If it turns out that a piece of code is executed frequently, though [1], compiling it becomes a more attractive proposition.

Initially, the tier up happens to the baseline JIT, a simple and fast non-optimizing compiler that produces native code for a JavaScript function. If the code continues to see much use, it will be recompiled with DFG, an optimizing compiler that is geared towards low compilation times and decent performance of the produced native code. Eventually, the code might end up being compiled with the FTL backend too, but the upper tiers won’t be making an appearance in our story here.

What do tier up and tier down mean? In short, tier up is when code execution switches to a more optimized version, whereas tier down is the reverse operation. So the code might tier up from the baseline JIT to the DFG, but later tier down (under conditions we’ll briefly touch on later) back to the baseline JIT. You can read a more extensive overview here.

Diving in

With this context now in place, we can revisit the snippet above. The code is part of operationOptimize. Just looking at the two sites it’s referenced in, we can see that it’s only ever used if the DFG_JIT option is enabled. This is where the baseline JIT ➞ DFG tier up happens!

The sites that make use of operationOptimize both run during the generation of native code by the baseline JIT. The first one runs in response to the op_enter bytecode opcode, i.e. the opcode that marks entry to the function. The second one runs when encountering an op_loop_hint opcode (an opcode that only appears at the beginning of a basic block marking the entry to a loop). Those are the two kinds of program points at which execution might tier up to the DFG.

Notice that calls to operationOptimize only occur during execution of the native code produced by the baseline JIT. In fact, if you look at the emitted code surrounding the call to operationOptimize for the function entry case, you’ll see that the call is conditional and only happens if the function has been executed enough times that it’s worth making a C++ call to consider it for optimization.
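
As a rough mental model of that conditional call, the check emitted by the baseline JIT behaves like a counter compared against a threshold. The sketch below is not JSC code: ExecutionCounter, kTierUpThreshold, onEntryOrLoopHint and considerForOptimization are hypothetical names, and the threshold value is invented for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical threshold; the value is made up for this sketch.
constexpr int32_t kTierUpThreshold = 1000;

// Toy stand-in for the per-function execution statistics.
struct ExecutionCounter {
    int32_t count = 0;
    int slowPathCalls = 0; // how many times the "expensive C++ call" was made
};

// Stand-in for the out-of-line call; returns true if we would try to tier up.
inline bool considerForOptimization(ExecutionCounter& counter)
{
    ++counter.slowPathCalls;
    counter.count = 0; // start counting again after each attempt
    return true;
}

// What the check at function entry or at a loop hint amounts to:
// bump the counter and only call out once it reaches the threshold.
inline bool onEntryOrLoopHint(ExecutionCounter& counter)
{
    if (++counter.count < kTierUpThreshold)
        return false; // cheap path: keep running baseline code
    return considerForOptimization(counter);
}
```

The point of the design is that the common case costs one increment and one compare, and the C++ call is only paid for code that has proven to be hot.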

The function accepts two arguments: a vmPointer which is, umm, a pointer to a VM structure (i.e. the “state of the world” as far as this function is concerned) and the bytecodeIndex. Remember that the bytecode is the intermediate representation (IR) that all higher tiers start compiling from. In operationOptimize, the bytecodeIndex is used for

Again, the bytecodeIndex is a parameter that has already been set in stone during generation of the native code by the baseline JIT.

The other parameter, the VM, is used in a number of things. The part that’s relevant to the snippet we started out to understand is that the VM is (sometimes) used to give us access to the current CallFrame. CallFrame inherits from Register, which is a thin wrapper around a (maximally) 64-bit value.

The CodeBlock

In this case, the various accessors defined by CallFrame effectively treat the (pointer) value that CallFrame consists of as a pointer to an array of Register values. Specifically, a set of constant expressions

struct CallFrameSlot {
    static constexpr int codeBlock = CallerFrameAndPC::sizeInRegisters;
    static constexpr int callee = codeBlock + 1;
    static constexpr int argumentCount = callee + 1;
    static constexpr int thisArgument = argumentCount + 1;
    static constexpr int firstArgument = thisArgument + 1;
};

give the offset (relative to the callframe) of the pointer to the codeblock, the callee, the argument count and the this pointer. Note that the first CallFrameSlot is the CallerFrameAndPC, i.e. a pointer to the CallFrame of the caller and the returnPC.
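
To make the “pointer to an array of Register values” idea concrete, here is a small self-contained sketch. It assumes that CallerFrameAndPC occupies two registers (one for the caller’s frame pointer, one for the return PC), which is my assumption for a 64-bit build; the toy Register type and the slot and argumentCountOf helpers are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: caller frame pointer and return PC take one register each.
struct CallerFrameAndPC {
    static constexpr int sizeInRegisters = 2;
};

struct CallFrameSlot {
    static constexpr int codeBlock = CallerFrameAndPC::sizeInRegisters;
    static constexpr int callee = codeBlock + 1;
    static constexpr int argumentCount = callee + 1;
    static constexpr int thisArgument = argumentCount + 1;
    static constexpr int firstArgument = thisArgument + 1;
};

// Toy register: a thin wrapper around a 64-bit value.
struct Register {
    uint64_t bits = 0;
};

// Treat a frame pointer as the base of an array of Register values.
inline Register& slot(Register* frame, int offset)
{
    return frame[offset];
}

// Hypothetical accessor: read the argument count stored in the frame header.
inline uint64_t argumentCountOf(Register* frame)
{
    return slot(frame, CallFrameSlot::argumentCount).bits;
}
```

Under that assumption the `this` argument ends up in slot 5 and the first real argument in slot 6.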

The CodeBlock is definitely something we’ll need to understand better, as it appears in our motivational code snippet. However, it’s a large class that is intertwined with a number of other interesting code paths. For the purposes of this discussion, we need to know that it

  • is associated with a code block (i.e. a function, eval, program or module code block)
  • holds data relevant to tier up/down decisions and operations for the associated code block

We’ll focus on three of its data members:

int m_numCalleeLocals;
int m_numVars;
int m_numParameters;

So, it seems that a CodeBlock can have at least some parameters (makes sense, right?) but also has both variables and callee locals.

First things first: what’s the difference between callee locals and vars? Well, it turns out that m_numCalleeLocals is only incremented in BytecodeGeneratorBase<Traits>::newRegister whereas m_numVars is only incremented in BytecodeGeneratorBase<Traits>::addVar(). Except, addVar calls into newRegister, so vars are a subset of callee locals (and therefore m_numVars ≤ m_numCalleeLocals).

Somewhat surprisingly, newRegister is only called in 3 places:

So there you have it. Callee locals

  1. are allocated by a function called newRegister
  2. are either a var or a temporary.

Let’s start with the second point. What is a var? Well, let’s look at where vars are created (via addVar):

There is definitely a var for every lexical variable (VarKind::Stack), i.e. a non-local variable accessible from the current scope. Vars are also generated (via BytecodeGenerator::createVariable) for

So, intuitively, vars are allocated more or less for “every JS construct that could be called a variable”. Conversely, temporaries are storage locations that have been allocated as part of bytecode generation (i.e. there is no corresponding storage location in the JS source). They can store intermediate calculation results and what not.
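
The relationship between vars, temporaries and callee locals can be captured in a few lines. This is a toy model, not the real BytecodeGeneratorBase: ToyGenerator and its members are hypothetical, and only the “addVar goes through newRegister” structure is taken from the discussion above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical miniature of the bookkeeping described above.
class ToyGenerator {
public:
    // Every callee local, var or temporary, is created here.
    int newRegister()
    {
        m_isTemporary.push_back(true);
        return static_cast<int>(m_isTemporary.size()) - 1;
    }

    // A var is a callee local that is additionally counted as a var.
    int addVar()
    {
        ++m_numVars;
        int index = newRegister();
        m_isTemporary[index] = false;
        return index;
    }

    bool isTemporary(int index) const { return m_isTemporary[index]; }
    int numVars() const { return m_numVars; }
    int numCalleeLocals() const { return static_cast<int>(m_isTemporary.size()); }

private:
    std::vector<bool> m_isTemporary; // one entry per callee local
    int m_numVars = 0;
};
```

Because the only way to bump the var count is to also allocate a register, numVars() can never exceed numCalleeLocals() in this model.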

Coming back to the first point regarding callee locals, how come they’re allocated by a function called newRegister? Why, because JSC’s bytecode operates on a register VM! The RegisterID returned by newRegister wraps the VirtualRegister that our register VM is all about.

Virtual registers, locals and arguments, oh my!

A virtual register (of type VirtualRegister) consists simply of an int (which is also called its offset). Each virtual register corresponds to one of

There is no differentiation between locals and arguments at the type level (everything is a (positive) int); however, virtual registers that map to locals are negative and those that map to arguments are nonnegative. In the context of bytecode generation, the int

It feels like JSC is underusing C++ here.

In all cases, what we get after indexing with a local, argument or constant is a RegisterID. As explained, the RegisterID wraps a VirtualRegister. Why do we need this indirection?

Well, there are two extra bits of info in the RegisterID: the m_refcount and an m_isTemporary flag. The reference count is always greater than zero for a variable, but the rules under which a RegisterID is ref’d and unref’d are too complicated to go into here.

When you have an argument, you get the VirtualRegister for it by directly adding it to CallFrame::thisArgumentOffset.

When you have a local, you map it to (-1 - local) to get the corresponding VirtualRegister. So

local   vreg
0       -1
1       -2
2       -3

(remember, virtual registers that correspond to locals are negative).

For an argument, you map it to (arg + CallFrame::thisArgumentOffset()):

argument   vreg
0          this
1          this + 1
2          this + 2

Which makes all the sense in the world when you remember what the CallFrameSlot looks like. So argument 0 is always the `this` pointer.

If the vreg is greater than some large offset (s_firstConstantRegisterIndex), then it is an index into the CodeBlock's constant pool (after subtracting the offset).
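
Putting the three mappings side by side, a minimal sketch might look like the following. The function names echo the ones mentioned in this post, but the bodies are my own simplification, and both constants are assumptions: kThisArgumentOffset stands in for CallFrame::thisArgumentOffset() and kFirstConstantRegisterIndex for s_firstConstantRegisterIndex.

```cpp
#include <cassert>

// Assumptions for illustration: `this` lives in slot 5 of the frame, and
// constants start at a large offset chosen here for the sketch.
constexpr int kThisArgumentOffset = 5;
constexpr int kFirstConstantRegisterIndex = 0x40000000;

// Locals grow downwards from -1.
constexpr int localToOperand(int local) { return -1 - local; }
constexpr int operandToLocal(int operand) { return -1 - operand; }

// Arguments grow upwards from the slot that holds `this`.
constexpr int argumentToOperand(int argument) { return argument + kThisArgumentOffset; }
constexpr int operandToArgument(int operand) { return operand - kThisArgumentOffset; }

// Classification by value alone, since everything is just an int.
constexpr bool operandIsLocal(int operand) { return operand < 0; }
constexpr bool operandIsConstant(int operand) { return operand >= kFirstConstantRegisterIndex; }

// Index into the constant pool, after subtracting the large offset.
constexpr int constantIndex(int operand) { return operand - kFirstConstantRegisterIndex; }
```

Note that nothing but the numeric range tells the three kinds apart, which is exactly the “underusing C++” complaint from earlier.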

Bytecode operands

If you’ve followed any of the links to the functions doing the actual mapping of locals and arguments to a virtual register, you may have noticed that the functions are called localToOperand and argumentToOperand. Yet they’re only ever used in virtualRegisterForLocal and virtualRegisterForArgument respectively. This raises the obvious question: what are those virtual registers operands of?

Well, of the bytecode instructions in our register VM of course. Instead of recreating the pictures, I’ll simply encourage you to take a look at a recent blog post describing it at a high level.

How do we know that’s what “operand” refers to? Well, let’s look at a use of virtualRegisterForLocal in the bytecode generator. BytecodeGenerator::createVariable will allocate [2] the next available local index (using the size of m_calleeLocals to keep track of it). This calls into virtualRegisterForLocal, which maps the local to a virtual register by calling localToOperand.

The newly allocated local is inserted into the function symbol table, along with its offset (i.e. the ID of the virtual register).

The SymbolTableEntry is looked up when we generate bytecode for a variable reference. A variable reference is represented by a ResolveNode [3].

So looking into ResolveNode::emitBytecode, we dive into BytecodeGenerator::variable and there’s our symbolTable->get() call. And then the symbolTableEntry is passed to BytecodeGenerator::variableForLocalEntry which uses entry.varOffset() to initialize the returned Variable with offset. It also uses registerFor to retrieve the RegisterID from m_calleeLocals.

ResolveNode::emitBytecode will then pass the local RegisterID to move which calls into emitMove, which just calls OpMov::emit (a function generated by the JavaScriptCore/generator code). Note that the compiler implicitly converts the RegisterID arguments to VirtualRegister type at this step. Eventually, we end up in the (generated) function

template<OpcodeSize __size, bool recordOpcode, typename BytecodeGenerator>
static bool emitImpl(BytecodeGenerator* gen, VirtualRegister dst, VirtualRegister src)
{
    if (__size == OpcodeSize::Wide16)
        gen->alignWideOpcode16();
    else if (__size == OpcodeSize::Wide32)
        gen->alignWideOpcode32();
    if (checkImpl<__size>(gen, dst, src)) {
        if (recordOpcode)
            gen->recordOpcode(opcodeID);
        if (__size == OpcodeSize::Wide16)
            gen->write(Fits<OpcodeID, OpcodeSize::Narrow>::convert(op_wide16));
        else if (__size == OpcodeSize::Wide32)
            gen->write(Fits<OpcodeID, OpcodeSize::Narrow>::convert(op_wide32));
        gen->write(Fits<OpcodeID, __size>::convert(opcodeID));
        gen->write(Fits<VirtualRegister, __size>::convert(dst));
        gen->write(Fits<VirtualRegister, __size>::convert(src));
        return true;
    }
    return false;
}

where Fits::convert(VirtualRegister) will trivially encode the VirtualRegister into the target type. Specifically the mapping is nicely summed up in the following comment

// Narrow:
// -128..-1  local variables
//    0..15  arguments
//   16..127 constants
// Wide16:
// -2**15..-1  local variables
//      0..64  arguments
//     64..2**15-1 constants
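
To see what the Narrow ranges in that comment imply, here is a toy encoder for the one-byte case. It is a sketch under assumptions, not the real Fits specialization: encodeNarrow is a hypothetical name, kFirstConstantRegisterIndex is an assumed value, and the rule that constants are rebased to start at 16 is read off the comment rather than taken from the implementation.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kFirstConstantRegisterIndex = 0x40000000; // assumed value
constexpr int kNarrowFirstConstant = 16;                // from the comment

// Try to squeeze a virtual register offset into one signed byte.
// Returns false when a wider encoding would be required.
inline bool encodeNarrow(int vreg, int8_t& out)
{
    if (vreg >= kFirstConstantRegisterIndex) {
        // Constants are rebased so that constant 0 becomes 16.
        int encoded = (vreg - kFirstConstantRegisterIndex) + kNarrowFirstConstant;
        if (encoded > 127)
            return false;
        out = static_cast<int8_t>(encoded);
        return true;
    }
    // Locals (-128..-1) and the low nonnegative slots (0..15) pass through.
    if (vreg < -128 || vreg >= kNarrowFirstConstant)
        return false;
    out = static_cast<int8_t>(vreg);
    return true;
}
```

When this returns false, a generator along the lines of emitImpl would fall back to the Wide16 or Wide32 form of the instruction.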

You may have noticed that the Variable returned by BytecodeGenerator::variableForLocalEntry already has been initialized with the virtual register offset we set when inserting the SymbolTableEntry for the local variable. And yet we use registerFor to look up the RegisterID for the local and then use the offset of the VirtualRegister contained therein. Surely those are the same? Oh well, something for a runtime assert to check.

Variables with values

Whew! Quite the detour there. Time to get back to our original snippet:

Operands<Optional<JSValue>> mustHandleValues(codeBlock->numParameters(), numVarsWithValues);
int localsUsedForCalleeSaves = static_cast<int>(CodeBlock::llintBaselineCalleeSaveSpaceAsVirtualRegisters());
for (size_t i = 0; i < mustHandleValues.size(); ++i) {
    int operand = mustHandleValues.operandForIndex(i);
    if (operandIsLocal(operand) && VirtualRegister(operand).toLocal() < localsUsedForCalleeSaves)
        continue;
    mustHandleValues[i] = callFrame->uncheckedR(operand).jsValue();
}

What are those numVarsWithValues then? Well, the definition is right before our snippet:

unsigned numVarsWithValues;
if (bytecodeIndex)
    numVarsWithValues = codeBlock->numCalleeLocals();
else
    numVarsWithValues = 0;

OK, so this looks straightforward for a change. If the bytecodeIndex is not zero, we’re doing the tier up from JIT to DFG in the body of a function (i.e. at a loop entry). In that case, we consider all our callee locals to have values. Conversely, when we’re running for the function entry (i.e. bytecodeIndex == 0), none of the callee locals are live yet. Do note that the variable is incorrectly named. Vars are not the same as callee locals; we’re dealing with the latter here.

A second gotcha is that, whereas vars are always live, temporaries might not be. The DFG compiler will do liveness analysis at compile time to make sure it’s only looking at live values. That must have been a fun bug to track down!

Values that must be handled

Back to our snippet, numVarsWithValues is used as an argument to the constructor of mustHandleValues which is of type Operands<Optional<JSValue>>. Right, so what are the Operands? They simply hold a number of T objects (here T is Optional<JSValue>) of which the first m_numArguments correspond to, well, arguments whereas the remaining correspond to locals.

What we’re doing here is recording all the live (non-heap, obviously) values when we try to do the tier up. The idea is to be able to mix those values in with the previously observed values that DFG’s Control Flow Analysis will use to emit code which will bail us out of the optimized version (i.e. do a tier down). According to the comments and commit logs, this is in order to increase the chances of a successful OSR entry (tier up), even if the resulting optimized code may be slightly less conservative.

Remember that the optimized code that we tier up to makes assumptions with regard to the types of the incoming values (based on what we’ve observed when executing at lower tiers) and will bail out if those assumptions are not met. Taking the values of the current execution at the time of the tier up attempt ensures we won’t be doing all this work only to immediately have to tier down again.

Operands provides an operandForIndex method which will directly give you a virtual reg for every kind of element. For example, if you had called Operands<T> opnds(2, 1), then the first iteration of the loop would give you

-> virtualRegisterForArgument(0).offset()
  -> VirtualRegister(argumentToOperand(0)).offset()
    -> VirtualRegister(CallFrame::thisArgumentOffset).offset()
      -> CallFrame::thisArgumentOffset

The second iteration would similarly give you CallFrame::thisArgumentOffset + 1.

In the third iteration, we’re now dealing with a local, so we’d get

-> virtualRegisterForLocal(2 - 2).offset()
  -> VirtualRegister(localToOperand(0)).offset()
    -> VirtualRegister(-1).offset()
      -> -1
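
The index-to-operand mapping traced above fits in a tiny class. ToyOperands below is a hypothetical stand-in for Operands<T>, with kThisArgumentOffset assumed to be 5 in place of CallFrame::thisArgumentOffset; it only models the “arguments first, then locals” layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kThisArgumentOffset = 5; // assumed frame slot of `this`

// Toy Operands: arguments first, then locals, in one flat vector.
template<typename T>
class ToyOperands {
public:
    ToyOperands(std::size_t numArguments, std::size_t numLocals)
        : m_numArguments(numArguments)
        , m_values(numArguments + numLocals)
    {
    }

    std::size_t size() const { return m_values.size(); }
    T& operator[](std::size_t index) { return m_values[index]; }

    // Map a flat index to the virtual register offset it stands for.
    int operandForIndex(std::size_t index) const
    {
        assert(index < size());
        if (index < m_numArguments)
            return static_cast<int>(index) + kThisArgumentOffset;
        return -1 - static_cast<int>(index - m_numArguments);
    }

private:
    std::size_t m_numArguments;
    std::vector<T> m_values;
};
```

With ToyOperands<int> opnds(2, 1), indices 0 and 1 map to the two argument slots and index 2 maps to local 0, i.e. virtual register -1, matching the walkthrough.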

Callee save space as virtual registers

So, finally, what is our snippet doing here? It’s iterating over the values that are likely to be live at this program point and storing them in mustHandleValues. It will first iterate over the arguments (if any) and then over the locals. However, it will use the “operand” (remember, everything is an int…) to get the index of the respective local and then skip the first locals up to localsUsedForCalleeSaves. So, in fact, even though we allocated space for (arguments + callee locals), we skip some slots and only store (arguments + callee locals - localsUsedForCalleeSaves). This is OK, as the Optional<JSValue> values in the Operands will have been initialized by the default constructor of Optional<> which gives us an object without a value (i.e. an object that will later be ignored).
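
Here is a stand-alone sketch of that loop, handy for checking which slots end up empty. Everything in it is a simplification: MaybeValue plays the role of Optional<JSValue>, readFrame is a hypothetical stand-in for callFrame->uncheckedR(operand).jsValue() that returns a recognizable fake value, and kThisArgumentOffset is assumed to be 5.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kThisArgumentOffset = 5; // assumed frame slot of `this`

// Minimal stand-in for Optional<JSValue>: empty unless explicitly set.
struct MaybeValue {
    bool hasValue = false;
    long value = 0;
};

// Hypothetical frame reader: returns a recognizable fake value per operand.
inline long readFrame(int operand) { return 1000 + operand; }

// Collect argument and local values, skipping the callee-save slots.
inline std::vector<MaybeValue> collectMustHandleValues(
    std::size_t numArguments, std::size_t numLocals, int localsUsedForCalleeSaves)
{
    std::vector<MaybeValue> values(numArguments + numLocals);
    for (std::size_t i = 0; i < values.size(); ++i) {
        bool isLocal = i >= numArguments;
        int local = isLocal ? static_cast<int>(i - numArguments) : -1;
        if (isLocal && local < localsUsedForCalleeSaves)
            continue; // reserved for callee saves: leave the slot empty
        int operand = isLocal ? -1 - local : static_cast<int>(i) + kThisArgumentOffset;
        values[i].hasValue = true;
        values[i].value = readFrame(operand);
    }
    return values;
}
```

Calling collectMustHandleValues(2, 4, 3) fills both argument slots and the last local, and leaves the first three locals without a value, which is the “allocated but skipped” behaviour described above.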

Here, callee-saved register (csr) refers to a register that is available for use to the LLInt and/or the baseline JIT. This is described a bit in LowLevelInterpreter.asm, but is more apparent when one looks at what csr sets are used on each platform (or, in C++).

platform        metadataTable  PC-base (PB)  numberTag  notCellMask
X86_64          csr1           csr2          csr3       csr4
x86_64_win      csr3           csr4          csr5       csr6
ARM64/ARM64E    csr6           csr7          csr8       csr9
C_LOOP 64b      csr0           csr1          csr2       csr3
C_LOOP 32b      csr3           -             -          -
ARMv7           csr0           -             -          -
MIPS            csr0           -             -          -
X86             -              -             -          -
On 64-bit platforms, offlineasm (JSC’s portable assembler) makes a range of callee-saved registers available to .asm files. Those are properly saved and restored. For example, for X86_64 on non-Windows platforms, the returned RegisterSet contains registers r12-r15 (inclusive), i.e. the callee-saved registers as defined in the System V AMD64 ABI. The mapping from symbolic names to architecture registers can be found in GPRInfo.

On 32-bit platforms, the assembler doesn’t make any csr regs available, so there’s nothing to save except if the platform makes special use of some register (like C_LOOP does for the metadataTable [4]).

What are the numberTag and notCellMask registers? Out of scope, that’s what they are!


Well, that wraps it up. Hopefully now you have a better understanding of what the original snippet does. In the process, we learned about a few concepts by reading through the source and, importantly, we added lots of links to JSC’s source code. This way, not only can you check that the textual explanations are still valid when you read this blog post, you can use the links as spring boards for further source code exploration to your heart’s delight!


[1] Both the interpreter – better known as LLInt – and the baseline JIT keep track of execution statistics, so that JSC can make informed decisions on when to tier up.

[2] Remarkably, no RegisterID has been allocated at this point – we used the size of m_calleeLocals but never modified it. Instead, later in the function (after adding the new local to the symbol table!) the code will call addVar which will allocate a new “anonymous” local. But then the code asserts that the index of the newly allocated local (i.e. the offset of the virtual register it contains) is the same as the offset we previously used to create the virtual register, so it’s all good.

[3] How did we know to look for the ResolveNode? Well, the emitBytecode method needs to be implemented by subclasses of ExpressionNode. If we look at how a simple binary expression is parsed (and given that ASTBuilder defines BinaryOperand as std::pair<ExpressionNode*, BinaryOpInfo>), it’s clear that any variable reference has already been lifted to an ExpressionNode.

So instead, we take the bottom up approach. We find the lexer/parser token definitions, one of which is the IDENT token. Then it’s simply a matter of going over its uses in Parser.cpp, until we find our smoking gun. This gets us into createResolve aaaaand

return new (m_parserArena) ResolveNode(location, ident, start);

That’s the node we’re looking for!

[4] C_LOOP is a special backend for JSC’s portable assembler. What is special about it is that it generates C++ code, so that it can be used on otherwise unsupported architectures. Remember that the portable assembler (offlineasm) runs at compilation time.

January 08, 2020 12:00 PM
