September 09, 2020

Release Notes for Safari Technology Preview 113

Surfin’ Safari

Safari Technology Preview Release 113 is now available for download for macOS Big Sur and macOS Catalina. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 265179-265893.

Web Inspector

  • Timelines Tab
    • Fixed background colors for odd and even items in Dark Mode in the Timelines tab (r265498)
    • Fixed the Media & Animations timeline shifting when sorting (r265356)
  • Adapted Web Inspector’s user interface and styling to better match macOS Big Sur (r265237, r265507)

Web Audio

  • Added constructor for GainNode (r265227)
  • Added constructor for BiquadFilterNode (r265290)
  • Added constructor for ConvolverNode (r265298)
  • Added constructor for DelayNode (r265221)
  • Added constructor for AudioBuffer (r265210)
  • Added constructor for AnalyserNode (r265196)
  • Added support for suspending and resuming an OfflineAudioContext (r265701)
  • Added constructor for the MediaElementAudioSourceNode interface (r265330)
  • Aligned AudioListener with the W3C specification (r265266)
  • Aligned BiquadFilterNode.getFrequencyResponse() with the specification (r265291)
  • Fixed BiquadFilterNode’s lowpass filter (r265517)
  • Fixed missing length attribute on OfflineAudioContext (r265388)
  • Fixed missing baseLatency attribute on the AudioContext interface (r265393)
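
Taken together, these constructors let a Web Audio graph be built directly instead of through factory methods. A minimal sketch (the option values here are illustrative, not from the release notes):

const context = new AudioContext();
const gain = new GainNode(context, { gain: 0.5 });
const filter = new BiquadFilterNode(context, { type: "lowpass", frequency: 1000 });
const delay = new DelayNode(context, { delayTime: 0.25 });
// Wire the nodes into a chain ending at the speakers.
gain.connect(filter).connect(delay).connect(context.destination);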

WebRTC

  • Added support for MediaRecorder bitrate options (r265328)
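
These are the standard bitrate members of the MediaRecorder options dictionary; a sketch of how a page might use them (stream is assumed to come from getUserMedia or a canvas capture):

// Ask the recorder to target specific audio and video bitrates.
const recorder = new MediaRecorder(stream, {
    videoBitsPerSecond: 2500000,
    audioBitsPerSecond: 128000
});
recorder.start();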

Rendering

  • Updated to avoid triggering redundant compositing updates when trying to run a steps() animation on transform (r265358)
  • Fixed inconsistent spacing of Chinese characters in Safari for macOS Big Sur (r265488)

Media

  • Enabled H.264 low latency code path by default for macOS (r265547)
  • Fixed the picture-in-picture button disappearing in the fullscreen YouTube player after starting a new video in a playlist (r265690)

CSS

  • Changed to apply aspect ratios when computing flex-basis (r265855)
  • Fixed updating min-height: auto after an image loads when the image has a specified height and width (r265858)
  • Fixed @font-face font-weight descriptor to reject bolder and lighter (r265677)
  • Fixed the CSS specificity of :host() pseudo-classes (r265812)

WebDriver

  • Fixed window.print to not invoke native UI (r265207)

Accessibility

  • Added VoiceOver access to font styling at insertion point (r265259)

Loading

  • Fixed font loads quickly followed by navigations failing indefinitely (r265603)

Web API

  • Implemented Canvas.transferControlToOffscreen and OffscreenCanvasRenderingContext2D.commit (r265543)
  • Implemented createImageBitmap(ImageData) (r265360)
  • Implemented PerformanceObserverInit.buffered (r265390)
  • Fixed text input autocorrect="off" attribute getting ignored on macOS (r265509)
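
As an illustration of the second item, createImageBitmap can now be fed raw pixel data directly (a sketch; assumes an async context and an existing 2d canvas context named context):

// Promote raw RGBA pixels to an ImageBitmap, then draw it.
const pixels = new ImageData(128, 128);
const bitmap = await createImageBitmap(pixels);
context.drawImage(bitmap, 0, 0);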

Gamepad API

  • Added a special HID mapping for the Google Stadia controller (r265180)
  • Added HID mapping for the Logitech F310/F710 controllers (r265183)

Translation

  • Fixed table data incorrectly translated in some articles on wikipedia.org (r265188)
  • Fixed leading and trailing spaces to be ignored when comparing content (r265361)

September 09, 2020 05:32 PM

September 07, 2020

Víctor Jáquez: Review of Igalia Multimedia activities (2020/H1)

Igalia WebKit

This blog post is a review of the various activities the Igalia Multimedia team was involved in during the first half of 2020.


Just before a new virus turned into a pandemic, we could enjoy our traditional FOSDEM. There, our colleague Phil gave a talk about many of the topics covered in this report.

GstWPE

GstWPE’s wpesrc element produces a video texture representing a web page rendered off-screen by WPE.

We have worked on a new iteration of the GstWPE demo, focusing on one-to-many, web-augmented overlays, broadcasting with WebRTC and Janus.

Also, since the merge of the gstwpe plugin into gst-plugins-bad (the staging area for new elements), new users have come along, spotting rough areas and improving the element along the way.

Video Editing

GStreamer Editing Services (GES) is a library that simplifies the creation of multimedia editing applications. It is based on the GStreamer multimedia framework and is heavily used by the Pitivi video editor.

Implemented frame accuracy in the GStreamer Editing Services (GES)

As required by the industry, it is now possible to reference all times in frame numbers, providing a precise mapping between frame number and play time. Many issues were fixed in GStreamer to reach enough precision to make this work, and intensive regression tests were added.

Implemented time effects support in GES

Important refactoring inside GStreamer Editing Services has happened to allow cleanly and safely changing the playback speed of individual clips.

Implemented reverse playback in GES

Several issues have been fixed inside GStreamer core elements and base classes in order to support reverse playback. This allows us to implement reliable and frame-accurate reverse playback for individual clips.

Implemented ImageSequence support in GStreamer and GES

Since OpenTimelineIO implemented ImageSequence support, many users in the community had said it was really needed. We reviewed and finished up the imagesequencesrc element, which had been awaiting review for years.

This feature is now also supported in the OpenTimelineIO GES adapter.

Optimized nested timelines preroll time by an order of magnitude

Caps negotiation, done while the pipeline transitions from the paused state to the playing state to test the whole pipeline’s functionality, was the bottleneck for nested timelines, so pipelines were reworked to avoid useless negotiations. At the same time, other members of the GStreamer community have improved caps negotiation performance in general.

Last but not least, our colleague Thibault gave a talk in The Pipeline Conference about The Motion Picture Industry and Open Source Software: GStreamer as an Alternative, explaining how and why GStreamer could be leveraged in the motion picture industry to allow faster innovation, and solve issues by reusing all the multi-platform infrastructure the community has to offer.

WebKit multimedia

There has been a lot of work on WebKit multimedia, particularly for WebKitGTK and WPE ports which use GStreamer framework as backend.

WebKit Flatpak SDK

But first of all we would like to draw readers attention to the new WebKit Flatpak SDK. It was not a contribution only from the multimedia team, but rather a joint effort among different teams in Igalia.

Before the WebKit Flatpak SDK, JHBuild was used for setting up a WebKitGTK/WPE environment for testing and development. Its purpose is to provide a common set of well-defined dependencies instead of relying on the ones available in the different Linux distributions, which might produce different outputs. Nonetheless, Flatpak offers a much more coherent environment for testing and development, isolated from the rest of the build host, approaching reproducible outputs.

Another great advantage of the WebKit Flatpak SDK, at least for the multimedia team, is the possibility of using gst-build to set up a custom GStreamer environment, with the latest master, for example.

Now, for the sake of brevity, let us sketch a non-exhaustive list of activities and achievements related to WebKit multimedia.

General multimedia

Media Source Extensions (MSE)

Encrypted Media Extensions (EME)

One of the major results of this first half is the upstreaming of ThunderCDM, which is an implementation of a Content Decryption Module providing Widevine decryption support. Recently, our colleague Xabier published a blog post in this regard.

This work also enabled client-side video rendering support, which ensures video frames remain protected in GPU memory so they can’t be reached by third parties. This is a requirement for DRM/EME.

WebRTC

GStreamer

Though we normally contribute to GStreamer through the activities listed above, there are other tasks not related to WebKit. Among these we can enumerate the following:

GStreamer VAAPI

  • Reviewed a lot of patches.
  • Support for media-driver (iHD), the new VAAPI driver for Intel, mostly for Gen9 onwards. This driver brings a lot of new features.
  • A new vaapioverlay element.
  • Deep code cleanups. Among these we would like to mention:
    • Added quirk mechanism for different backends.
    • Changed the base classes to GstObject and GstMiniObject for most classes and buffer types.
  • Enhanced caps negotiation given the current driver’s constraints.

Conclusions

The multimedia team at Igalia has kept working, through the first half of this strange year, in our three main areas: browsers (mainly on WebKitGTK and WPE), video editing, and the GStreamer framework.

We worked on adding and enhancing WebKitGTK and WPE multimedia features in order to offer a solid platform for media providers.

We have enhanced the Video Editing support in GStreamer.

And, along with these tasks, we have contributed as much to the GStreamer framework, particularly in hardware-accelerated decoding and encoding and in VA-API.

By vjaquez at September 07, 2020 03:12 PM

September 02, 2020

Xabier Rodríguez Calvar: Serious Encrypted Media Extensions on GStreamer based WebKit ports

Igalia WebKit

Encrypted Media Extensions (a.k.a. EME) is the W3C standard for encrypted media on the web. This way, media providers such as Hulu, Netflix, HBO, Disney+, Prime Video, etc. can provide their contents with a reasonable amount of confidence that it will be very complicated for people to “save” their assets without their permission. Why do I use the word “serious” in the title? In WebKit there is already support for Clear Key, which is the W3C EME reference implementation, but EME supports more encryption systems, even proprietary ones (I have my opinion about this; you can ask me privately). No service provider (that I know of) supports Clear Key; they usually rely on Widevine, PlayReady or some other system.

Three years ago, my colleague Žan Doberšek finished the implementation of what was going to be the shell of WebKit’s modern EME implementation, following the latest W3C proposal. We implemented that downstream (at Web Platform for Embedded) as well, using Thunder, which includes as a plugin a fork of what was Open Content Decryption Module (a.k.a. OpenCDM). The OpenCDM API changed quite a lot during this journey. It works well and there are millions of set-top boxes using it currently.

The delta between downstream and the upstream GStreamer-based WebKit ports was quite big, testing was difficult and syncing was not always easy, so we decided to reverse the situation.

Our first step was done by my colleague Charlie Turner, who made Clear Key work upstream again while adapting some changes the Apple folks had made in the meantime. It was amazing to see Clear Key tests passing again, and his work with the CDMProxy-related classes was awesome. After having Clear Key working, I had to adapt them a bit to accommodate Thunder. To explain a bit about the WebKit EME architecture, I must say that there are two layers. The first is the cross-platform one, which implements the W3C API (MediaKeys, MediaKeySession, CDM…). These classes rely on the platform ones (CDMPrivate, CDMInstance, CDMInstanceSession) to handle the platform management, message exchange, etc., which would be the second layer. Apple’s playback system is fully integrated with their DRM system, so they don’t need anything else. We do, because we need to integrate our own decryptors to defer to Thunder for decryption, so in the GStreamer-based ports we also need the CDMProxy-related classes, which would be CDMProxy, CDMInstanceProxy, CDMInstanceSessionProxy… The last two extend CDMInstance and CDMInstanceSession respectively to be able to deal with the key management, which is abstracted into the KeyHandle and KeyStore.

Once the abstraction is there (let’s remember that the abstraction works both for Clear Key and Thunder), the Thunder implementation is quite simple: just gluing the CDMProxy, CDMInstanceProxy and CDMInstanceSessionProxy classes to the Thunder system and writing a GStreamer decryptor element for it. I might have made a mistake when selecting the files, but considering the Thunder classes plus the GStreamer common decryptor code, cloc says it is just 1198 lines of platform code. I think that is pretty low for what it does. Apart from that, obviously, there are 5760 lines of cross-platform code.

To build and run all this you need to do several things:

  1. Build the dependencies with WEBKIT_JHBUILD=1 JHBUILD_ENABLE_THUNDER="yes" to enable the old-fashioned JHBuild build and force it to build the Thunder dependencies. All dependencies are in JHBuild; even Widevine is referenced, but to download it you need the proper credentials, as it is closed source.
  2. Pass --thunder when calling build-webkit.
  3. Run MiniBrowser with WEBKIT_GST_EME_RANK_PRIORITY="Thunder" and pass the parameters --enable-mediasource=TRUE --enable-encrypted-media=TRUE --autoplay-policy=allow. The autoplay policy is usually optional, but in this case it is necessary for the YouTube TV tests. We need to give the Thunder decryptor a higher priority because of WebM, which does not specify a key system; without the higher rank, the Clear Key decryptor can be selected and fail. MP4 does not create trouble because the protection system is specified and the caps negotiation does its magic.

As you might guess from a closer look at the GStreamer JHBuild moduleset, only Widevine is supported so far. To support more key systems, you only have to make them build in the Thunder ecosystem and add them to CDMFactoryThunder::supportedKeySystems.

When I coded this, all YouTube TV tests for Widevine were green on the desktop. At the moment of writing this post they aren’t, because of some problem with the Widevine installation that will be sorted out quickly, I hope.

By calvaris at September 02, 2020 02:59 PM

August 27, 2020

Chris Lord: OffscreenCanvas, jobs, life

Igalia WebKit

Hoo boy, it’s been a long time since I last blogged… About 2 and a half years! So, what’s been happening in that time? This will be a long one, so if you’re only interested in a part of it (and who could blame you), I’ve titled each section.

Leaving Impossible

Well, unfortunately my work with Impossible ended, as we essentially ran out of funding. That’s really a shame; we worked on some really cool, open-source stuff, and we’ve definitely seen similar innovations in the field since we stopped working on it. We took a short break (during which we also, unsuccessfully, searched for further funding), after which Rob started working on a cool, related project of his own that you should check out, and I, being a bit less brave, started seeking out a new job. I did consider becoming a full-time musician, but business wasn’t picking up as quickly as I’d hoped it might in that down-time, and with hindsight, I’m glad I didn’t (Covid-19 and all).

I interviewed with a few places, which was certainly an eye-opening experience. The last ‘real’ job interview I did was for Mozilla in 2011, which consisted mainly of talking with engineers that worked there, and working through a few whiteboard problems. Being a young, eager coder at the time, this didn’t really faze me back then. Turns out either the questions have evolved or I’m just not quite as sharp as I used to be in that very particular environment. The one interview I had that involved whiteboard coding was a very mixed bag. It seemed a mix of two types of questions: those that are easy to answer (but unless you’re in the habit of writing very quickly on a whiteboard, slow to write down) and those that were pretty impossible to answer without specific preparation. Perhaps this was the fault of recruiters, but you might hope that interviews would be catered somewhat to the person you’re interviewing, or the work they might actually be doing, neither of which seemed to be the case? Unsurprisingly, I didn’t get past that interview, but in retrospect I’m also glad I didn’t. Igalia’s interview process was much more humane, and involved mostly discussions about actual work I’ve done, hypothetical situations and ethics. They were very long discussions, mind, but I’m very glad that they were happy to hire me, and that I didn’t entertain different possibilities. If you aren’t already familiar with Igalia, I’d highly recommend having a read about them/us. I’ve been there a year now, and the feeling is quite similar to when I first joined Mozilla, but I believe with Igalia’s structure, this is likely to stay a happier and safer environment. Not that I mean to knock Mozilla, especially now, but anyone that has worked there will likely admit that along with the giddy highs, there are also some unfortunate lows.

Igalia

I joined Igalia as part of the team that works on WebKit, and that’s what I’ve been doing since. It almost makes perfect sense in a way. Surprisingly, although I’ve spent overwhelmingly more time on Gecko, I did actually work with WebKit first while at OpenedHand, and for a short period at Intel. While celebrating my first commit to WebKit, I did actually discover it wasn’t my first commit at all; I’d contributed a small embedding-related fix-up in 2008. So it’s nice to have come full-circle! My first work at Igalia was fixing up some patches that Žan Doberšek had prototyped to allow direct display of YUV video data via pixel shaders. Later on, I was also pleased to extend that work somewhat by fixing some vc4 driver bugs and GStreamer bugs, to allow for hardware decoding of YUV video on the Raspberry Pi 3b (this, I believe, is all upstream at this point). WebKitGTK and WPE WebKit may be the only Linux browser backends that leverage this pipeline, allowing for 1080p30 video playback on a Pi3b. There are other issues making this less useful than you might think, but either way, it’s a nice first achievement.

OffscreenCanvas

After that introduction, I was pointed at what could be fairly described as my main project, OffscreenCanvas. This was also a continuation of Žan’s work (he’s prolific!), though there has been significant original work since. This might be the part of this post that people find most interesting or relevant, but having not blogged in over 2 years, I can’t be blamed for waffling just a little. OffscreenCanvas is a relatively new web standard that allows the use of canvas API disconnected from the DOM, and within Workers. It also makes some provisions for asynchronously updated rendering, allowing canvas updates in Workers to bypass the main thread entirely and thus not be blocked by long-running processes on that thread. The most obvious use-case for this, and I think the most practical, is essentially non-blocking rendering of generated content. This is extremely handy for maps, for example. There are some other nice use-cases for this as well – you can, for example, show loading indicators that don’t stop animating while performing complex DOM manipulation, or procedurally generate textures for games, asynchronously. Any situation where you might want to do some long-running image processing without blocking the main thread (image editing also springs to mind).
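
To make that concrete, here is a minimal sketch of the Worker use-case with a 2d context (the file name is illustrative):

// main.js: transfer control of a canvas to a worker.
const canvas = document.querySelector("canvas");
const offscreen = canvas.transferControlToOffscreen();
const worker = new Worker("render-worker.js");
worker.postMessage({ canvas: offscreen }, [offscreen]);

// render-worker.js: draw without ever touching the DOM or the main thread.
onmessage = (event) => {
    const context = event.data.canvas.getContext("2d");
    context.fillStyle = "rebeccapurple";
    context.fillRect(0, 0, 100, 100);
};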

Currently, the only complete implementation is within Blink. Gecko has a partial implementation that only supports WebGL contexts (and last time I tried, crashed the browser on creation…), but as far as I know, that’s it. I’ve been working on this, with encouragement and cooperation from Apple, on and off for the past year. In fact, as of August 12th, it’s even partially usable, though there is still a fair bit missing. I’ve been concentrating on the 2d context use-case, as I think it’s by far the most useful part of the standard. It’s at the point where it’s mostly usable, minus text rendering and minus some edge-case colour parsing. Asynchronous updates are also not yet supported, though I believe that’s fairly close for Linux. OffscreenCanvas is enabled with experimental features, for those that want to try it out.

My next goal, after asynchronous updates on Linux, is to enable WebGL context support. I believe these aren’t particularly tough goals, given where it is now, so hopefully they’ll happen by the end of the year. Text rendering is a much harder problem, but I hope that between us at Igalia and the excellent engineers at Apple, we can come up with a plan for it. The difficulty is that both styling and font loading/caching were written with the assumption that they’d run on just one thread, and that that thread would be the main thread. A very reasonable assumption in a pre-Worker and pre-many-core-CPU world of course, but increasingly less so now, and very awkward for this particular piece of work. Hopefully we’ll persevere though, this is a pretty cool technology, and I’d love to contribute to it being feasible to use widely, and lessen the gap between native and the web.

And that’s it from me. Lots of non-work related stuff has happened in the time since I last posted, but I’m keeping this post tech-related. If you want to hear more of my nonsense, I tend to post on Twitter a bit more often these days. See you in another couple of years 🙂

By Chris Lord at August 27, 2020 08:56 AM

August 18, 2020

Release Notes for Safari Technology Preview 112

Surfin’ Safari

Safari Technology Preview Release 112 is now available for download for macOS Big Sur and macOS Catalina. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 264601-265179.

Web Inspector

  • Changed the default tab order to display most commonly used tabs first (r264959)
  • Changed the background, text, and border colors to match the OS (r265120)
  • Changed to only show scrollbars when needed (r265118)
  • Fixed issue where a failed initial subresource load would break the Sources Tab (r264717)
  • Fixed the ability to save files that are base64 encoded (r264669)
  • Prevented blurring the add class input when a class is added in the Styles sidebar of the Elements tab (r264667)

Extensions

  • Fixed pop-up dialog sizing for percentage height values applied to <html> (r264960)
  • Added support for replacing a Safari App Extension with a Safari Web Extension by specifying the SFSafariAppExtensionBundleIdentifiersToReplace key in the NSExtension element in your Safari Web Extension Info.plist file. The value for the key should be an array of strings, each of which is the bundle identifier of a Safari App Extension you want to replace.

CSS

  • Fixed align-content in grid containers with small content area (r265020)
  • Fixed the CSS clip-path being applied to the view-box coordinates (r264622)
  • Fixed scroll snap when using RTL layout (r264908)

JavaScript

  • Implemented Intl.DisplayNames (r264639)
  • Changed eval?.() to be an indirect eval (r264633)
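
Both changes are visible directly from script; a quick sketch (results follow the ECMA-402 and ECMA-262 semantics):

const regionNames = new Intl.DisplayNames(["en"], { type: "region" });
regionNames.of("ES"); // "Spain"

globalThis.x = 10;
function probe() {
    const x = 20;
    // eval?.() is an indirect eval, so it evaluates in global scope.
    return eval?.("x"); // 10, not 20
}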

SVG

  • Added support for SVG <a> element’s rel and relList attributes (r264789)

Media

  • Added behaviors for YouTube to offer HDR variants to devices which support HDR (r265167)
  • Adopted AVPlayer.videoRangeOverride (r264710)
  • Added HDR decode support in software-decoded VP9 (r265073)
  • Fixed becoming unresponsive after playing a video from a YouTube playlist in picture-in-picture mode (r264684)

WebRTC

  • Added OfflineAudioContext constructor (r264657)
  • Fixed scaleResolutionDownBy on RTCRtpSender (r265047)

Web API

  • Added support for the type attribute to PerformanceObserver (r265001)
  • Changed date and time input types to have a textfield appearance (r265157)
  • Changed to propagate the user gesture through Fetch API (r264853)
  • Fixed highlight color to update after being set in System Preferences (r265072)
  • Fixed datalist dropdown scrollbar position to match the visible region (r264783)
  • Made mousemove event cancelable (r264658)
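
For the first item, a sketch of the single-type form of observe() that the type attribute enables:

const observer = new PerformanceObserver((list) => {
    for (const entry of list.getEntries())
        console.log(entry.name, entry.startTime);
});
// Observe one entry type instead of a list of entryTypes.
observer.observe({ type: "paint" });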

Text Manipulation

  • Changed text manipulation to not extract non-breaking spaces (r264947)
  • Fixed article headlines being split across multiple lines after translating (r264729)

Storage

  • Changed to allow IndexedDB in third-party frames (r264790)

August 18, 2020 06:20 PM

August 13, 2020

Javier Fernández: Improving CSS Custom Properties performance

Igalia WebKit

Chrome 84 reached the stable channel a few weeks ago, and there are already several great posts describing the many important additions, interesting new features, security fixes and improvements in privacy policies ([1], [2], [3], [4]) it contains. However, there is a change that I worked on in this release which might have passed unnoticed by most, but I think is very valuable: a change regarding CSS Custom Properties (variables) performance.

The design of CSS, in general, takes great care in considering how features are designed with respect to making it possible for them to perform well. However, implementations may not perform as well as they could, and it takes a considerable amount of time to understand how authors use the features and which cases are more relevant for them.

CSS Custom Properties are an interesting example to look at here: They are a wonderful feature that provides a lot of advantages for web authors. For a whole lot of cases, all of the implementations of CSS Custom Properties perform well enough that most people won’t notice. However, we at Igalia have been analyzing several use cases and looking at some reports around their performance in different implementations.

Let’s consider a fairly straightforward example in which an author sets a single property in a toggleable class in the body, and then uses that property several times deeper in the tree to change the foreground color of some text.


   .red { --prop: red; }
   .green { --prop: green; }

   /* Deeper in the tree, elements read the property to set their text color (illustrative selector): */
   .user { color: var(--prop); }

Only about 20% of the elements actually use this property, 5 elements deep into the tree, and only to change the foreground color.

To evaluate Chromium’s performance in a case like this, we can define a new perf test, using the perf tools the Chromium project has available for browser engineers. In this case, we want a huge tree so that we can better evaluate the impact of the different optimizations.


    .green { --prop: green; }
    .red { --prop: red; }

    /* Again, only a fraction of the huge tree reads the property (illustrative rule): */
    .user { color: var(--prop); }

These are the results obtained running the test in Chrome 83:

            avg         median      stdev     min         max
Chrome 83   163.74 ms   163.79 ms   3.69 ms   158.59 ms   163.74 ms

I admit that it’s difficult to evaluate the results, especially considering the number of nodes of such a huge DOM tree. Let’s compare the results of the same test on Firefox, using different numbers of nodes.

Nodes       50K         20K        10K       5K        1K       500
Chrome 83   163.74 ms   55.05 ms   25.12 ms  14.18 ms  2.74 ms  1.50 ms
FF 78       28.35 ms    12.05 ms   6.10 ms   3.50 ms   1.15 ms  0.55 ms
Ratio       1/6         1/5        1/4       1/4       1/2      1/3

As I commented before, the data are more accurate when the DOM tree has a lot of nodes; in any case, the difference is quite clear and shows there is plenty of room for improvement. WebKit-based browsers have results more similar to Chromium’s as well.

Performance tests like the one above can be added to browsers for tracking improvements and regressions over time, so we’ve added it (r763335) to Chromium’s tree: we’d like to see it get faster over time, and definitely cannot afford regressions (see the Chrome Performance Dashboard and the ChangeStyleCustomPropertyDeclaration test for details).

So… What can we do?

In Chrome 83 and lower, whenever the custom property declaration changed, the new declaration would be inherited by the whole tree. This inheritance implied executing the whole CSS cascade and recalculating the styles of all the nodes in the entire tree, since with this approach, all nodes may be affected.
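
In terms of the example above, a single class change on the body is enough to trigger that whole-tree recalculation (a sketch):

// Swapping the class changes the --prop declaration on the body; in
// Chrome 83 this invalidated the computed style of every descendant,
// even elements that never reference var(--prop).
document.body.classList.replace("green", "red");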

Chrome had already implemented an optimization in the CSS cascade implementation for regular CSS properties that don’t depend on any other property to resolve their value. This subset of CSS properties is defined as Independent Properties in the Chromium codebase. The optimization mentioned before affects how the inheritance mechanism is implemented for these Independent Properties. Whenever one of these properties changes, instead of recalculating the styles of the inherited properties, children can just copy the whole parent’s computed style. Blink’s style engine has a component known as the Matched Properties Cache, responsible for deciding when it is possible to avoid the style resolution of an element and instead perform an efficient copy of the matched computed style. I’ll get back to this concept in the last part of this post.

In the case of CSS Custom Properties, we could apply a similar approach as a good first step. We can consider that nodes with computed styles that don’t have references to custom property declarations shouldn’t be affected by the new declaration, and we can implement the inheritance directly by copying the parent’s computed style. The patch with the optimization I implemented, r765278, initially landed in Chrome 84.0.4137.0.

Let’s look at the result of this one action in the Chrome Performance Dashboard:

That’s a really good improvement!

However, it’s also just a first step. It’s clear that Chrome still has a wide margin for improvement in this case, as does any WebKit-based browser – Firefox is still, impressively, markedly faster, as has been described in the bug report filed to track this issue. The following table shows the results of the different browsers together; even disabling the multi-thread capabilities of Firefox’s Stylo engine (STYLO_THREAD=1), FF is much faster than Chrome with the optimization applied.

        Chrome 83   Chrome 84   FF 78      FF 78 th=1
avg     163.74 ms   117.37 ms   28.35 ms   38.25 ms
median  163.79 ms   117.52 ms   28.50 ms   38.50 ms
stdev   3.69 ms     1.98 ms     0.93 ms    1.86 ms
min     158.59 ms   113.66 ms   26.00 ms   35.00 ms
max     163.74 ms   120.87 ms   30.00 ms   41.00 ms

Before continuing, I want to get back to the Matched Properties Cache (MPC) concept, since it has an important role in these style optimizations. This cache is not a new concept in Chrome’s engine; as a matter of fact, it’s also used in WebKit, since it was implemented long ago, before the fork that created the new Blink engine. However, Google has been working a lot on this area in recent years, and some of the most recent changes in the MPC have had an important impact on style resolution performance. As a result of this work, elements with independent and non-independent properties using CSS Variables might produce cache hits in the MPC. The results of the Performance Dashboard show a considerable improvement in the mentioned ChangeStyleCustomPropertyDeclaration test (avg: 108.06 ms).

Additionally, there are several other cases where the use of CSS Variables has a considerable impact on performance, compared with using regular CSS properties. Obviously, resolving CSS Variables has a cost, so it’s clear that we could apply additional optimizations that reduce the impact of the variable resolution, especially for handling specific style changes that might not affect a substantial portion of the DOM tree. I’ve been experimenting with the MPC to explore the idea of an independent CSS Custom Properties cache; nodes with variables referencing the same custom property would produce cache hits in the MPC even though other properties don’t match. The preliminary approach I’ve been implementing consists of a new matching function, specific to custom properties, and a mechanism to transfer/copy the property’s data to avoid resolving the variable again, since the property’s declaration hasn’t changed. We would need to apply the CSS cascade again, but at least we could save the cost of the variable resolution.

Of course, at the end of the day, improving performance has costs and challenges – and it’s hard to keep performance even once you get it. But if we really want performant CSS Custom Properties, this means that we have to decide to prioritize this work. Currently there is reluctance to explore the concept of a new Custom Properties-specific cache – the challenge is big and the risks are not non-existent; cache invalidation can get complicated. But the point is that we have to understand that we aren’t all going to agree on what is important enough to warrant attention, or how much investment, or when. Web authors must convince vendors that these use cases are worth optimizing and that the cost and risks of such complex challenges should be assumed by them.

This work has been sponsored by Bloomberg, which I consider one of the most important contributors to the Web Platform. After several years, the vision of this company and its responsibility as a consumer of the platform have led to many important contributions that we all enjoy now. Although CSS Grid Layout might be the most remarkable one, there are many others not as big, like this work on CSS Custom Properties, or several other new features of the CSS Text specification. This is a perfect example of a company that tries to change priorities and adapt the web platform to its needs and the use cases it considers most aligned with its business strategy.

I understand that not every user of the web platform can make this kind of investment. This is why I believe that initiatives like Open Prioritization could help move the web platform in a positive direction, by providing a way for us to move past a lot of these conversations and focus on the needs that some web authors and users of the platform consider more important, or higher priority. Improving performance for CSS Custom Properties isn’t currently one of the projects we’ve listed, but perhaps it would be an interesting one we might try in the future if we are successful with these. If you haven’t already, have a look and see if there is something there that is interesting to you or your company – pledges of any size are good – ten thousand $1 donations are every bit as good as ten $1000 donations. Together, we can make a difference, and we all benefit.

Also, we would love to hear about your ideas. Is improving CSS Custom Properties performance important to you? What else is? Share your comments with us on Twitter, either me (@lajava77) or our developer advocate Brian Kardell (@briankardell), or email me at jfernandez@igalia.com. I’d be glad to answer any question about the Open Priorization experiment.

By jfernandez at August 13, 2020 06:16 PM

July 29, 2020

Release Notes for Safari Technology Preview 111

Surfin’ Safari

Safari Technology Preview Release 111 is now available for download for macOS Big Sur and macOS Catalina. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 263988-264601.

Web Inspector

  • Added an error message if unable to fetch shader source in the Canvas tab (r264045)
  • Fixed Heap Snapshot Object Graph view not getting populated in some cases when inspecting a JSContext (r264124)
  • Updated the tab bar colors of undocked Web Inspector to match Safari in macOS Big Sur (r264410)
  • Updated the title bar of undocked Web Inspector to be white in macOS Big Sur (r264204)

Web Extensions

  • Fixed chrome.tabs.update() so it does not open a new tab for safari-web-extension URLs
  • Fixed chrome.tabs.create() so it passes a valid tab object to the callback for relative extension URLs

Scrolling

  • Fixed content changes not triggering re-snapping with scroll snap after a scroll gesture (r264190)
  • Fixed scrolling pages with non-invertible transforms in children of an overflow: scroll element (r264031)
  • Fixed stuttery scrolling by ensuring a layout-triggered scroll snap does not happen if a user scroll is in progress on the scrolling thread (r264203)

Rendering

  • Fixed high CPU usage on Bitbucket search results pages (r264008)

CSS

  • Fixed line name positions after implicit grid track (r264465)

JavaScript

  • Made String.prototype.toLocaleLowerCase’s availableLocales HashSet more efficient (r264293)
  • Changed Intl.Locale maximize, minimize to return Intl.Locale instead of String (r264275)
  • Fixed Math.max() yielding the wrong result for max(0, -0) (r264507)
  • Fixed redefining a property that should not change its insertion index (Object.keys order) (r264574)
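
The second and third items are easy to check from script (a sketch; results per the specification):

new Intl.Locale("zh").maximize(); // an Intl.Locale for "zh-Hans-CN", no longer a String
Object.is(Math.max(0, -0), 0);    // true: the maximum of 0 and -0 is +0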

Web Authentication

  • Added a console message to indicate a user gesture is required to use the platform authenticator (r264490)
  • Relaxed the user gesture requirement to allow it to be propagated through XHR events (r264528)

WebRTC

  • Fixed the ability to pause playback of a MediaStream video track (r264312)
  • Added support for parsing VP-style codec strings (r264367)

Web API

  • Changed URL.host to not override the port (r264516)
  • Fixed autocapitalize="words" capitalizing every word’s second character (r264112)
  • Multiplexed the HID and GameController gamepad providers on macOS (r264207)
  • Removed the concept of “initial connected gamepads” (r264004)

Storage Access API

  • Added the capability to open a popup and get user interaction so we can call the Storage Access API as a quirk, on behalf of websites that should be doing it themselves (r263992)

Intelligent Tracking Prevention

  • Added an artificial delay to WebSocket connections to mitigate port scanning attacks (r264306)

Accessibility

  • Implemented user action specifications for Escape action (r264000)

Text Manipulation

  • Fixed text manipulation to observe manipulated text after update (r264305)
  • Fixed text manipulation to ignore white spaces between nodes (r264120)
  • Fixed the caret leaving trails behind when the editable content is subpixel positioned (r264386)

July 29, 2020 05:32 PM

Speculation in JavaScriptCore

Surfin’ Safari

This post is all about speculative compilation, or just speculation for short, in the context of the JavaScriptCore virtual machine. Speculative compilation is ideal for making dynamic languages, or any language with enough dynamic features, run faster. In this post, we will look at speculation for JavaScript. Historically, this technique, or closely related variants of it, has been applied successfully to Smalltalk, Self, Java, .NET, Python, and Ruby, among others. Starting in the 90’s, intense benchmark-driven competition between many Java implementations helped to create an understanding of how to build speculative compilers for languages with small amounts of dynamism. Despite being a lot more dynamic than Java, the JavaScript performance war that started in the naughts has generally favored increasingly aggressive applications of the same speculative compilation tricks that worked great for Java. It seems like speculation can be applied to any language implementation that uses runtime checks that are hard to reason about statically.

This is a long post that tries to demystify a complex topic. It’s based on a two hour compiler lecture (slides also available in PDF). We assume some familiarity with compiler concepts like intermediate representations (especially Static Single Assignment Form, or SSA for short), static analysis, and code generation. The intended audience is anyone wanting to understand JavaScriptCore better, or anyone thinking about using these techniques to speed up their own language implementation. Most of the concepts described in this post are not specific to JavaScript and this post doesn’t assume prior knowledge about JavaScriptCore.

Before going into the details of speculation, we’ll provide an overview of speculation and an overview of JavaScriptCore. This will help provide context for the main part of this post, which describes speculation by breaking it down into five parts: bytecode (the common IR), control, profiling, compilation, and OSR (on stack replacement). We conclude with a small review of related work.

Overview of Speculation

The intuition behind speculation is to leverage traditional compiler technology to make dynamic languages as fast as possible. Construction of high-performance compilers is a well-understood art, so we want to reuse as much of that as we can. But we cannot do this directly for a language like JavaScript because the lack of type information means that the compiler can’t do meaningful optimizations for any of the fundamental operations (even things like + or ==). Speculative compilers use profiling to infer types dynamically. The generated code uses dynamic type checks to validate the profiled types. If the program uses a type that is different from what we profiled, we throw out the optimized code and try again. This lets the optimizing compiler work with a statically typed representation of the dynamically typed program.

Types are a major theme of this post even though the techniques we are describing are for implementing dynamically typed languages. When languages include static types, it can be to provide safety properties for the programmer or to help give an optimizing compiler leverage. We are only interested in types for performance and the speculation strategy in JavaScriptCore can be thought of in broad strokes as inferring the kinds of types that a C program would have, but using an internal type system purpose built for our optimizing compiler. More generally, the techniques described in this post can be used to enable any kind of profile-guided optimizations, including ones that aren’t related to types. But both this post and JavaScriptCore focus on the kind of profiling and speculation that is most natural to think of as being about type (whether a variable is an integer, what object shapes a pointer points to, whether an operation has effects, etc.).

To dive into this a bit deeper, we first consider the impact of types. Then we look at how speculation gives us types.

Impact of Types

We want to give dynamically typed languages the kind of optimizing compiler pipeline that would usually be found in ahead-of-time compilers for high-performance statically typed languages like C. The input to such an optimizer is typically some kind of internal representation (IR) that is precise about the type of each operation, or at least a representation from which the type of each operation can be inferred.

To understand the impact of types and how speculative compilers deal with them, consider this C function:

int foo(int a, int b)
{
    return a + b;
}

In C, types like int are used to describe variables, arguments, return values, etc. Before the optimizing compiler has a chance to take a crack at the above function, a type checker fills in the blanks so that the + operation will be represented using an IR instruction that knows that it is adding 32-bit signed integers (i.e. ints). This knowledge is essential:

  • Type information tells the compiler’s code generator how to emit code for this instruction. We know to use integer addition instructions (not double addition or something else) because of the int type.
  • Type information tells the optimizer how to allocate registers for the inputs and outputs. Integers mean using general purpose registers. Floating point means using floating point registers.
  • Type information tells the optimizer what optimizations are possible for this instruction. Knowing exactly what it does allows us to know what other operations can be used in place of it, allows us to do some algebraic reasoning about the math the program is doing, and allows us to fold the instruction to a constant if the inputs are constants. If there are types for which + has effects (like in C++), then the fact that this is an integer + means that it’s pure. Lots of compiler optimizations that work for + would not work if it wasn’t pure.

Now consider the same program in JavaScript:

function foo(a, b)
{
    return a + b;
}

We no longer have the luxury of types. The program doesn’t tell us the types of a or b. There is no way that a type checker can label the + operation as being anything specific. It can do a bunch of different things based on the runtime types of a and b:

  • It might be a 32-bit integer addition.
  • It might be a double addition.
  • It might be a string concatenation.
  • It might be a loop with method calls. Those methods can be user-defined and may perform arbitrary effects. This’ll happen if a or b are objects.

Figure 1. The best that a nonspeculative compiler can do if given a JavaScript plus operation. This figure depicts a control flow graph as a compiler like JavaScriptCore’s DFG might see. The Branch operation is like an if and has outgoing edges for the then/else outcomes of the condition.

Based on this, it’s not possible for an optimizer to know what to do. Instruction selection means emitting either a function call for the whole thing or an expensive control flow subgraph to handle all of the various cases (Figure 1). We won’t know which register file is best for the inputs or results; we’re likely to go with general purpose registers and then do additional move instructions to get the data into floating point registers in case we have to do a double addition. It’s not possible to know if one addition produces the same results as another, since they have loops with effectful method calls. Anytime a + happens we have to allow for the possibility that the whole heap might have been mutated.
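
To make that concrete, here is how the earlier foo behaves for a few input types (a sketch; the results follow JavaScript’s + semantics):

foo(1, 2);      // 3: 32-bit integer addition
foo(0.5, 0.25); // 0.75: double addition
foo("a", "b");  // "ab": string concatenation
foo({}, []);    // "[object Object]": toPrimitive conversions, then concatenation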

In short, it’s not practical to use optimizing compilers for JavaScript unless we can somehow provide types for all of the values and operations. For those types to be useful, they need to help us avoid basic operations like + seeming like they require control flow or effects. They also need to help us understand which instructions or register files to use. Speculative compilers get speed-ups by applying this kind of reasoning to all of the dynamic operations in a language — ranging from those represented as fundamental operations (like + or memory accesses like o.f and o[i]) to those that involve intrinsics or recognizable code patterns (like calling Function.prototype.apply).

Speculated Types

This post focuses on those speculations where the collected information can be most naturally understood as type information, like whether or not a variable is an integer and what properties a pointed-to object has (and in what order). Let’s appreciate two aspects of this more deeply: when and how the profiling and optimization happen and what it means to speculate on type.

Figure 2. Optimizing compilers for C and JavaScript.

Let’s consider what we mean by speculative compilation for JavaScript. JavaScript implementations pretend to be interpreters; they accept JS source as input. But internally, these implementations use a combination of interpreters and compilers. Initially, code starts out running in an execution engine that does no speculative type-based optimizations but collects profiling about types. This is usually an interpreter, but not always. Once a function has a satisfactory amount of profiling, the engine will start an optimizing compiler for that function. The optimizing compiler is based on the same fundamentals as the one found in a C compiler, but instead of accepting types from a type checker and running as a command-line tool, here it accepts types from a profiler and runs in a thread in the same process as the program it’s compiling. Once that compiler finishes emitting optimized machine code, we switch execution of that function from the profiling tier to the optimized tier. Running JavaScript code has no way of observing this happening to itself except if it measures execution time. (However, the environment we use for testing JavaScriptCore includes many hooks for introspecting what has been compiled.) Figure 2 illustrates how and when profiling and optimization happens when running JavaScript.

Roughly, speculative compilation means that our example function will be transformed to look something like this:

function foo(a, b)
{
    speculate(isInt32(a));
    speculate(isInt32(b));
    return a + b;
}

The tricky thing is what exactly it means to speculate. One simple option is what we call diamond speculation. This means that every time that we perform an operation, we have a fast path specialized for what the profiler told us and a slow path to handle the generic case:

if (is int)
    int add
else
    Call(slow path)

To see how that plays out, let’s consider a slightly different example:

var tmp1 = x + 42;
... // things
var tmp2 = x + 100;

Here, we use x twice, both times adding it to a known integer. Let’s say that the profiler tells us that x is an integer but that we have no way of proving this statically. Let’s also say that x‘s value does not change between the two uses and we have proved that statically.

Figure 3. Diamond speculation that x is an integer.

Figure 3 shows what happens if we speculate on the fact that x is an integer using a diamond speculation: we get a fast path that does the integer addition and a slow path that bails out to a helper function. Speculations like this can produce modest speed-ups at modest cost. The cost is modest because if the speculation is wrong, only the operations on x pay the price. The trouble with this approach is that repeated uses of x must recheck whether it is an integer. The rechecking is necessary because of the control flow merge that happens at the things block and again at more things.

The original solution to this problem was splitting, where the region of the program between things and more things would get duplicated to avoid the branch. An extreme version of this is tracing, where the entire remainder of a function is duplicated after any branch. The trouble with these techniques is that duplicating code is expensive. We want to minimize the number of times that the same piece of code is compiled so that we can compile a lot of code quickly. The closest thing to splitting that JavaScriptCore does is tail duplication, which optimizes diamond speculations by duplicating the code between them if that code is tiny.

A better alternative to diamond speculations or splitting is OSR (on stack replacement). When using OSR, a failing type check exits out of the optimized function back to the equivalent point in the unoptimized code (i.e. the profiling tier’s version of the function).

Figure 4. OSR speculation that x is an integer.

Figure 4 shows what happens when we speculate that x is an integer using OSR. Because there is no control flow merge between the case where x is an int and the case where it isn’t, the second check becomes redundant and can be eliminated. The lack of a merge means that the only way to reach the second check is if the first check passed.
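
Reusing the speculate() notation from earlier, the effect on the two-use example is roughly this (a sketch, not actual compiler output):

speculate(isInt32(x)); // OSR exit to the profiling tier if this fails
var tmp1 = x + 42;     // plain int32 addition, no type check
... // things
var tmp2 = x + 100;    // no recheck: the only path here passed the speculation above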

OSR speculations are what gives our traditional optimizing compiler its static types. After any OSR-based type check, the compiler can assume that the property that was checked is now fact. Moreover, because OSR check failure does not affect semantics (we exit to the same point in the same code, just with fewer optimizations), we can hoist those checks as high as we want and infer that a variable always has some type simply by guarding all assignments to it with the corresponding type check.

Note that what we call OSR exit in this post and in JavaScriptCore is usually called deoptimization elsewhere. We prefer to use the term OSR exit in our codebase because it emphasizes that the point is to exit an optimized function using an exotic technique (OSR). The term deoptimization makes it seem like we are undoing optimization, which is only true in the narrow sense that a particular execution jumps from optimized code to unoptimized code. For this post we will follow the JavaScriptCore jargon.

JavaScriptCore uses OSR or diamond speculations depending on our confidence that the speculation will be right. OSR speculation has higher benefit and higher cost: the benefit is higher because repeated checks can be eliminated but the cost is also higher because OSR is more expensive than calling a helper function. However, the cost is only paid if the exit actually happens. The benefits of OSR speculation are so superior that we focus on that as our main speculation strategy, with diamond speculation being the fallback if our profiling indicates lack of confidence in the speculation.

Figure 5. Speculating with OSR and exiting to bytecode.

OSR-based speculation relies on the fact that traditional compilers are already good at reasoning about side exits. Trapping instructions (like for null check optimization in Java virtual machines), exceptions, and multiple return statements are all examples of how compilers already support exiting from a function.

Assuming that we use bytecode as the common language shared between the unoptimizing profiled tier of execution and the optimizing tier, the exit destinations can just be bytecode instruction boundaries. Figure 5 shows how this might work. The machine code generated by the optimizing compiler contains speculation checks against unlikely conditions. The idea is to do lots of speculations. For example, the prologue (the enter instruction in the figure) may speculate about the types of the arguments — that’s one speculation per argument. An add instruction may speculate about the types of its inputs and about the result not overflowing. Our type profiling may tell us that some variable tends to always have some type, so a mov instruction whose source is not proved to have that type may speculate that the value has that type at runtime. Accessing an array element (what we call get_by_val) may speculate that the array is really an array, that the index is an integer, that the index is in bounds, and that the value at the index is not a hole (in JavaScript, loading from a never assigned array element means walking the array’s prototype chain to see if the element can be found there — something we avoid doing most of the time by speculating that we don’t have to). Calling a function may speculate that the callee is the one we expected or at least that it has the appropriate type (that it’s something we can call).

While exiting out of a function is straightforward without breaking fundamental assumptions in optimizing compilers, entering turns out to be super hard. Entering into a function somewhere other than at its primary entrypoint pessimises optimizations at any merge points between entrypoints. If we allowed entering at every bytecode instruction boundary, this would negate the benefits of OSR exit by forcing every instruction boundary to make worst-case assumptions about type. Even allowing OSR entry just at loop headers would break lots of loop optimizations. This means that it’s generally not possible to reenter optimized execution after exiting. We only support entry in cases where the reward is high, like when our profiler tells us that a loop has not yet terminated at the time of compilation. Put simply, the fact that traditional compilers are designed for single-entry multiple-exit procedures means that OSR entry is hard but OSR exit is easy.

JavaScriptCore and most speculative compilers support OSR entry at hot loops, but since it’s not an essential feature for most applications, we’ll leave understanding how we do it as an exercise for the reader.

Figure 6. Speculation broken into the five topics of this post.

The main part of this post describes speculation in terms of its five components (Figure 6): the bytecode, or common IR, of the virtual machine that allows for a shared understanding about the meaning of profiling and exit sites between the unoptimized profiling tier and the optimizing tier; the unoptimized profiling tier that is used to execute functions at start-up, collect profiling about them, and to serve as an exit destination; the control system for deciding when to invoke the optimizing compiler; the optimizing tier that combines a traditional optimizing compiler with enhancements to support speculation based on profiling; and the OSR exit technology that allows the optimizing compiler to use the profiling tier as an exit destination when speculation checks fail.

Overview of JavaScriptCore

Figure 7. The tiers of JavaScriptCore.

JavaScriptCore embraces the idea of tiering and has four tiers for JavaScript (and three tiers for WebAssembly, but that’s outside the scope of this post). Tiering has two benefits: the primary benefit, described in the previous section, of enabling speculation; and a secondary benefit of allowing us to fine-tune the throughput-latency tradeoff on a per-function basis. Some functions run for so short — like straight-line run-once initialization code — that running any compiler on those functions would be more expensive than interpreting them. Some functions get invoked so frequently, or have such long loops, that their total execution time far exceeds the time to compile them with an aggressive optimizing compiler. But there are also lots of functions in the grey area in between: they run for not enough time to make an aggressive compiler profitable, but long enough that some intermediate compiler designs can provide speed-ups. JavaScriptCore has four tiers as shown in Figure 7:

  • The LLInt, or low-level interpreter, which is an interpreter that obeys JIT compiler ABI. It runs on the same stack as the JITs and uses a known set of registers and stack locations for its internal state.
  • The Baseline JIT, also known as a bytecode template JIT, which emits a template of machine code for each bytecode instruction without trying to reason about relationships between multiple instructions in the function. It compiles whole functions, which makes it a method JIT. Baseline does no OSR speculations but does have a handful of diamond speculations based on profiling from the LLInt.
  • The DFG JIT, or data flow graph JIT, which does OSR speculation based on profiling from the LLInt, Baseline, and in some rare cases even using profiling data collected by the DFG JIT and FTL JIT. It may OSR exit to either baseline or LLInt. The DFG has a compiler IR called DFG IR, which allows for sophisticated reasoning about speculation. The DFG avoids doing expensive optimizations and makes many compromises to enable fast code generation.
  • The FTL JIT, or faster than light JIT, which does comprehensive compiler optimizations. It’s designed for peak throughput. The FTL never compromises on throughput to improve compile times. This JIT reuses most of the DFG JIT’s optimizations and adds lots more. The FTL JIT uses multiple IRs (DFG IR, DFG SSA IR, B3 IR, and Assembly IR).

An ideal example of this in action is this program:

"use strict";

let result = 0;
for (let i = 0; i < 10000000; ++i) {
    let o = {f: i};
    result += o.f;
}

print(result);

Thanks to the object allocation inside the loop, it will run for a long time until the FTL JIT can compile it. The FTL JIT will kill that allocation, so then the loop finishes quickly. The long running time before optimization virtually guarantees that the FTL JIT will take a stab at this program’s global function. Additionally, because the function is clean and simple, all of our speculations are right and there are no OSR exits.

Figure 8. Example timeline of a simple long loop executing in JavaScriptCore. Execution times recorded on my computer one day.

Figure 8 shows the timeline of this benchmark executing in JavaScriptCore. The program starts executing in the LLInt. After about a thousand loop iterations, the loop trigger causes us to start a baseline compiler thread for this code. Once that finishes, we do an OSR entry into the baseline JITed code at the for loop’s header. The baseline JIT also counts loop iterations, and after about a thousand more, we spawn the DFG compiler. The process repeats until we are in the FTL. When I measured this, I found that the DFG compiler needs about 4× the time of the baseline compiler, and the FTL needs about 6× the time of the DFG. While this example is contrived and ideal, the basic idea holds for any JavaScript program that runs long enough since all tiers of JavaScriptCore support the full JavaScript language.

Figure 9. JavaScriptCore tier architecture.

JavaScriptCore is architected so that having many tiers is practical. Figure 9 illustrates this architecture. All tiers share the same bytecode as input. That bytecode is generated by a compiler pipeline that desugars many language features, such as generators and classes, among others. In many cases, it’s possible to add new language features just by modifying the bytecode generation frontend. Once linked, the bytecode can be understood by any of the tiers. The bytecode can be interpreted by the LLInt directly or compiled with the baseline JIT, which mostly just converts each bytecode instruction into a preset template of machine code. The LLInt and Baseline JIT share a lot of code, mostly in the slow paths of bytecode instruction execution. The DFG JIT converts bytecode to its own IR, the DFG IR, and optimizes it before emitting code. In many cases, operations that the DFG chooses not to speculate on are emitted using the same code generation helpers as the Baseline JIT. Even operations that the DFG does speculate on often share slow paths with the Baseline JIT. The FTL JIT reuses the DFG’s compiler pipeline and adds new optimizations to it, including multiple new IRs that have their own optimization pipelines. Despite being more sophisticated than the DFG or Baseline, the FTL JIT shares slow path implementations with those JITs and in some cases even shares code generation for operations that we choose not to speculate on.

Even though the various tiers try to share code whenever possible, they aren’t required to. Take the get_by_val (access an array element) instruction in bytecode. This has duplicate definitions in the bytecode liveness analysis (which knows the liveness rules for get_by_val), the LLInt (which has a very large implementation that switches on a bunch of the common array types and has good code for all of them), the Baseline (which uses a polymorphic inline cache), and the DFG bytecode parser. The DFG bytecode parser converts get_by_val to the DFG IR GetByVal operation, which has separate definitions in the DFG and FTL backends as well as in a bunch of phases that know how to optimize and model GetByVal. The only thing that keeps those implementations in agreement is good convention and extensive testing.

To give a feeling for the relative throughput of the various tiers, I’ll share some informal performance data that I’ve gathered over the years out of curiosity.

Figure 10. Relative performance of the four tiers on JetStream 2 on my computer at the time of that benchmark’s introduction.

We’re going to use the JetStream 2 benchmark suite since that’s the main suite that JavaScriptCore is tuned for. Let’s first consider an experiment where we run JetStream 2 with the tiers progressively enabled starting with the LLInt. Figure 10 shows the results: the Baseline and DFG are more than 2× better than the tier below them and the FTL is 1.1× better than the DFG.

The FTL’s benefits may be modest but they are unique. If we did not have the FTL, we would have no way of achieving the same peak throughput. A great example is the gaussian-blur subtest. This is the kind of compute test that the FTL is built for. I managed to measure the benchmark’s performance when we first introduced it and did not yet have a chance to tune for it. So, this gives a glimpse of the speed-ups that we expect to see from our tiers for code that hasn’t yet been through the benchmark tuning grind. Figure 11 shows the results. All of the JITs achieve spectacular speed-ups: Baseline is 3× faster than LLInt, DFG is 6× faster than Baseline, and FTL is 1.6× faster than DFG.

Figure 11. Relative performance of the four tiers on the gaussian-blur subtest of JetStream 2.

The DFG and FTL complement one another. The DFG is designed to be a fast-running compiler and it achieves this by excluding the most aggressive optimizations, like global register allocation, escape analysis, loop optimizations, or anything that needs SSA. This means that the DFG will always get crushed on peak throughput by compilers that have those features. It’s the FTL’s job to provide those optimizations if a function runs long enough to warrant it. This ensures that there is no scenario where a hypothetical competing implementation could outperform us unless they had the same number of tiers. If you wanted to make a compiler that compiles faster than the FTL then you’d lose on peak throughput, but if you wanted to make a compiler that generates better code than the DFG then you’d get crushed on start-up times. You need both to stay in the game.

Another way of looking at the performance of these tiers is to ask: how much time does a bytecode instruction take to execute in each of the tiers on average? This tells us just about the throughput that a tier achieves without considering start-up at all. This can be hard to estimate, but I made an attempt at it by repeatedly running each JetStream 2 benchmark and having it limit the maximum tier of each function at random. Then I employed a stochastic counting mechanism to get an estimate of the number of bytecode instructions executed at each tier in each run. Combined with the execution times of those runs, this gave a simple linear regression problem of the form:

ExecutionTime = (Latency of LLInt) * (Bytecodes in LLInt)
              + (Latency of Baseline) * (Bytecodes in Baseline)
              + (Latency of DFG) * (Bytecodes in DFG)
              + (Latency of FTL) * (Bytecodes in FTL)

Where the Latency of LLInt means the average amount of time it takes to execute a bytecode instruction in LLInt.

After excluding benchmarks that spent most of their time outside JavaScript execution (like regexp and wasm benchmarks) and fiddling with how to weight benchmarks (I settled on solving each benchmark separately and computing the geomean of the coefficients, since this matches JetStream 2’s weighting), the solution I arrived at was:

Execution Time = (3.97 ns) * (Bytecodes in LLInt)
               + (1.71 ns) * (Bytecodes in Baseline)
               + (.349 ns) * (Bytecodes in DFG)
               + (.225 ns) * (Bytecodes in FTL)

In other words, Baseline executes code about 2× faster than LLInt, DFG executes code about 5× faster than Baseline, and the FTL executes code about 1.5× faster than DFG. Note how this data is in the same ballpark as what we saw for gaussian-blur. That makes sense since that was a peak throughput benchmark.

Although this isn’t a garbage collection blog post, it’s worth understanding a bit about how the garbage collector works. JavaScriptCore picks a garbage collection strategy that makes the rest of the virtual machine, including all of the support for speculation, easier to implement. The garbage collector has the following features that make speculation easier:

  • The collector scans the stack conservatively. This means that compilers don’t have to worry about how to report pointers to the collector.
  • The collector doesn’t move objects. This means that if a data structure (like the compiler IR) has many possible ways of referencing some object, we only have to report one of them to the collector.
  • The collector runs to fixpoint. This makes it possible to invent precise rules for whether objects created by speculation should be kept alive.
  • The collector’s object model is expressed in C++. JavaScript objects look like C++ objects, and JS object pointers look like C++ pointers.

These features make the compiler and runtime easier to write, which is great, since speculation requires us to write a lot of compiler and runtime code. JavaScript is a slow enough language even with the optimizations we describe in this post that garbage collector performance is rarely the longest pole in the tent. Therefore, our garbage collector makes many tradeoffs to make it easier to work on the performance-critical parts of our engine (like speculation). It would be unwise, for example, to make it harder to implement some compiler optimization as a way of getting a small garbage collector optimization, since the compiler has a bigger impact on performance for typical JavaScript programs.

To summarize: JavaScriptCore has four tiers, two of which do speculative optimizations, and all of which participate in the collection of profiling. The first two tiers are an interpreter and bytecode template JIT while the last two are optimizing compilers tuned for different throughput-latency trade-offs.

Speculative Compilation

Now that we’ve established some basic background about speculation and JavaScriptCore, this section goes into the details. First we will discuss JavaScriptCore’s bytecode. Then we show the control system for launching the optimizing compiler. Next will be a detailed section about how JavaScriptCore’s profiling tiers work, which focuses mostly on how they collect profiling. Finally we discuss JavaScriptCore’s optimizing compilers and their approach to OSR.

Bytecode

Speculation requires having a profiling tier and an optimizing tier. When the profiling tier reports profiling, it needs to be able to say what part of the code that profiling is for. When the optimizing compiler wishes to compile an OSR exit, it needs to be able to identify the exit site in a way that both tiers understand. To solve both issues, we need a common IR that is:

  • Used by all tiers as input.
  • Persistent for as long as the function that it represents is still live.
  • Immutable (at least for those parts that all tiers are interested in).

In this post, we will use bytecode as the common IR. This isn’t required; abstract syntax trees or even SSA could work as a common IR. We offer some insights into how we designed our bytecode for JavaScriptCore. JavaScriptCore’s bytecode is register-based, compact, untyped, high-level, directly interpretable, and transformable.

Our bytecode is register-based in the sense that operations tend to be written as:

add result, left, right

Which is taken to mean:

result = left + right

Where result, left, and right are virtual registers. Virtual registers may refer to locals, arguments, or constants in the constant pool. Functions declare how many locals they need. Locals are used both for named variables (like var, let, or const variables) and temporaries arising from expression tree evaluation.
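For example, a function like function foo(a, b) { return a + b * 2; } might lower to a sequence along these lines (the mnemonics and operand names here are made up for illustration, not actual JavaScriptCore bytecode output):

mul loc0, arg2, const0
add loc1, arg1, loc0
ret loc1

Here arg1 and arg2 name the arguments, loc0 and loc1 are locals used as temporaries, and const0 refers to the constant 2 in the constant pool.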

Our bytecode is compact: each opcode and operand is usually encoded as one byte. We have wide prefixes to allow 16-bit or 32-bit operands. This is important since JavaScript programs can be large and the bytecode must persist for as long as the function it represents is still live.

Our bytecode is untyped. Virtual registers never have static type. Opcodes generally don’t have static type except for the few opcodes that have a meaningful type guarantee on their output (for example, the | operator always returns int32, so our bitor opcode returns int32). This is important since the bytecode is meant to be a common source of truth for all tiers. The profiling tier runs before we have done type inference, so the bytecode can’t have any more types than the JavaScript language.

Our bytecode is almost as high-level as JavaScript. While we use desugaring for many JavaScript features, we only do that when implementation by desugaring isn’t thought to cost performance. So, even the “fundamental” features of our bytecode are high level. For example, the add opcode has all of the power of the JavaScript + operator, including that it might mean a loop with effects.

Our bytecode is directly interpretable. The same bytecode stream that the interpreter executes is the bytecode stream that we will save in the cache (to skip parsing later) and feed to the compiler tiers.

Finally, our bytecode is transformable. Normally, intermediate representations use a control flow graph and make it easy to insert and remove instructions. That’s not how bytecode works: it’s an array of instructions encoded using a nontrivial variable-width encoding. But we do have a bytecode editing API and we use it for generatorification (our generator desugaring bytecode-to-bytecode pass). We can imagine this facility also being useful for other desugarings or for experimenting with bytecode instrumentation.

Compared to non-bytecode IRs, the main advantages of bytecode are that it’s easy to:

  • Identify targets for OSR exit. OSR exit in JavaScriptCore requires entering into an unoptimized bytecode execution engine (like an interpreter) at some arbitrary bytecode instruction. Using bytecode instruction index as a way of naming an exit target is intuitive since it’s just an integer.
  • Compute live state at exit. Register-based bytecode tends to have dense register numberings so it’s straightforward to analyze liveness using bitvectors. That tends to be fast and doesn’t require a lot of memory. It’s practical to cache the results of bytecode liveness analysis, for example. (A sketch of such a pass follows this list.)
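To make the second point concrete, here’s a minimal sketch of such a liveness pass over straight-line register-based bytecode, written in C++. The instruction representation is hypothetical, and real bytecode has control flow (which requires running this to fixpoint over basic blocks), but it shows why dense register numbering makes bitvectors a natural fit:

#include <bitset>
#include <vector>

constexpr size_t maxRegisters = 64; // assume a small, dense register numbering
using RegisterSet = std::bitset<maxRegisters>; // one bit per virtual register

struct Instruction {
    std::vector<unsigned> uses; // virtual registers this instruction reads
    std::vector<unsigned> defs; // virtual registers this instruction writes
};

// Computes, for each instruction, the set of registers live just before it,
// by walking backward: defs kill liveness, uses create it.
std::vector<RegisterSet> liveness(const std::vector<Instruction>& code)
{
    std::vector<RegisterSet> liveAtHead(code.size());
    RegisterSet live; // nothing is live after the last instruction
    for (size_t i = code.size(); i--;) {
        for (unsigned reg : code[i].defs)
            live.reset(reg);
        for (unsigned reg : code[i].uses)
            live.set(reg);
        liveAtHead[i] = live;
    }
    return liveAtHead;
}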

JavaScriptCore’s bytecode format is independently implemented by the execution tiers. For example, the baseline JIT doesn’t try to use the LLInt to create its machine code templates; it just emits those templates itself and doesn’t try to match the LLInt exactly (the behavior is identical but the implementation isn’t). The tiers do share a lot of code – particularly for inline caches and slow paths – but they aren’t required to. It’s common for bytecode instructions to have algorithmically different implementations in the four tiers. For example, the LLInt might implement some instruction with a large switch that handles all possible types, the Baseline might implement the same instruction with an inline cache that repatches based on type, and the DFG and FTL might try to do some combination of inline speculations, inline caches, and emitting a switch on all types. This exact scenario happens for add and other arithmetic ops as well as get_by_val/put_by_val. This independence lets each tier take advantage of its unique properties to make things run faster. Of course, this approach also means that adding new bytecodes or changing bytecode semantics requires changing all of the tiers. For that reason, we try to implement new language features by desugaring them to existing bytecode constructs.

It’s possible to use any sensible IR as the common IR for a speculative compiler, including abstract syntax trees or SSA, but JavaScriptCore uses bytecode so that’s what we’ll talk about in the rest of this post.

Control

Speculative compilation needs a control system to decide when to run the optimizing compiler. The control system has to balance competing concerns: compiling functions as soon as it’s profitable, avoiding compiling functions that aren’t going to run long enough to benefit from it, avoiding compiling functions that have inadequate type profiling, and recompiling functions if a prior compilation did speculations that turned out to be wrong. This section describes JavaScriptCore’s control system. Most of the heuristics we describe were necessary, in our experience, to make speculative compilation profitable. Otherwise the optimizing compiler would kick in too often, not often enough, or not at the right rate for the right functions. This section describes the full details of JavaScriptCore’s tier-up heuristics because we suspect that to reproduce our performance, one would need all of these heuristics.

JavaScriptCore counts executions of functions and loops to decide when to compile. Once a function is compiled, we count exits to decide when to throw away compiled functions. Finally, we count recompilations to decide how much to back off from recompiling a function in the future.

Execution Counting

JavaScriptCore maintains an execution counter for each function. This counter gets incremented as follows:

  • Each call to the function adds 15 points to the execution counter.
  • Each loop execution adds 1 point to the execution counter.

We trigger tier-up once the counter reaches some threshold. Thresholds are determined dynamically. To understand them, first consider their static versions; then we’ll look at how we modulate them based on other information.

  • LLInt→Baseline tier-up requires 500 points.
  • Baseline→DFG tier-up requires 1000 points.
  • DFG→FTL tier-up requires 100000 points.

Over the years we’ve found ways to dynamically adjust these thresholds based on other sources of information, like:

  • Whether the function got JITed the last time we encountered it (according to our cache). Let’s call this wasJITed.
  • How big the function is. Let’s call this S. We use the number of bytecode opcodes plus operands as the size.
  • How many times it has been recompiled. Let’s call this R.
  • How much executable memory is available. Let M be the total amount of executable memory we have, and let U be the amount we estimate we would have used (in total) if we compiled this function.
  • Whether profiling is “full” enough.

We select the LLInt→Baseline threshold based on wasJITed. If we don’t know (the function wasn’t in the cache), we use the basic threshold, 500. If the function wasJITed, we use 250 (to accelerate tier-up); otherwise we use 2000. This optimization is especially useful for improving page load times.

Baseline→DFG and DFG→FTL use the same scaling factor based on S, R, M, and U. The scaling factor is defined as follows:

(0.825914 + 0.061504 * sqrt(S + 1.02406)) * pow(2, R) * M / (M - U)

We multiply this by 1000 for Baseline→DFG and by 100000 for DFG→FTL. Let’s break down what this scaling factor does:

First we scale by the square root of the size. The expression 0.825914 + 0.061504 * sqrt(S + 1.02406) gives a scaling factor that is between 1 and 2 for functions smaller than about 350 bytecodes, which we consider to be “easy” functions to compile. The scaling factor uses square root so it grows somewhat gently. We’ve also tried having the scaling factor be linear, but that’s much worse. It is worth delaying compilation of large functions a bit, but it’s not worth delaying it too much. Note that the ideal delay doesn’t just have to do with the cost of compilation. It’s also about running long enough to get good profiling. Maybe there is some deep reason why square root works well here, but all we really care about is that scaling by this amount makes programs run faster.

Then we introduce exponential backoff based on the number of times that the function has been recompiled. The pow(2, R) expression means that each recompilation doubles the thresholds.

After that we introduce a hyperbolic scaling factor, M / (M - U), to help avoid cases where we run out of executable memory altogether. This is important since some configurations of JavaScriptCore run with a small available pool of executable memory. This expression means that if we use half of executable memory then the thresholds are doubled. If we use 3/4 of executable memory then the thresholds are quadrupled. This makes filling up executable memory a bit like going at the speed of light: the math makes it so that as you get closer to filling it up the thresholds get closer to infinity. However, it’s worth noting that this is imperfect for truly large programs, since those might have other reasons to allocate executable memory not covered by this heuristic. The heuristic is also imperfect in cases of multiple things being compiled in parallel. Using this factor increases the maximum program size we can handle with small pools of executable memory, but it’s not a silver bullet.

Finally, if the execution count does reach this dynamically computed threshold, we check that some kinds of profiling (specifically, value and array profiling, discussed in detail in the upcoming profiling section) are full enough. We say that profiling is full enough if more than 3/4 of the profiling sites in the function have data. If this threshold is not met, we reset the execution counters. We let this process repeat five times. The optimizing compilers tend to speculate that unprofiled code is unreachable. This is profitable if that code really won’t ever run, but we want to be extra sure before doing that, hence we give functions with partial profiling 5× the time to warm up.
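Putting this all together, here’s a sketch of the threshold computation described above, written in C++. This is not JSC’s actual code: the names are hypothetical, and it omits the wasJITed adjustment and the profiling-fullness check.

#include <cmath>

// baseThreshold is 1000 for Baseline→DFG and 100000 for DFG→FTL.
double tierUpThreshold(double baseThreshold,
                       double S,  // size: bytecode opcodes plus operands
                       double R,  // number of recompilations so far
                       double M,  // total executable memory
                       double U)  // estimated executable memory used if we compile
{
    double sizeFactor = 0.825914 + 0.061504 * std::sqrt(S + 1.02406);
    double backoff = std::pow(2, R);   // each recompilation doubles the threshold
    double memoryFactor = M / (M - U); // tends to infinity as memory fills up
    return baseThreshold * sizeFactor * backoff * memoryFactor;
}

// Calls add 15 points; loop executions add 1 point.
bool shouldTierUp(double& counter, bool isCall, double threshold)
{
    counter += isCall ? 15 : 1;
    return counter >= threshold;
}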

This is an exciting combination of heuristics! These heuristics were added early in the development of tiering in JSC. They were all added before we built the FTL, and the FTL inherited those heuristics just with a 100× multiplier. Each heuristic was added because it produced either a speed-up or a memory usage reduction or both. We try to remove heuristics that are not known to be speed-ups anymore, and to our knowledge, all of these still contribute to better performance on benchmarks we track.

Exit Counting

After we compile a function with the DFG or FTL, it’s possible that one of the speculations we made is wrong. This will cause the function to OSR exit back to LLInt or Baseline (we prefer Baseline, but may throw away Baseline code during GC, in which case exits from DFG and FTL will go to LLInt). We’ve found that the best way of dealing with a wrong speculation is to throw away the optimized code and try optimizing again later with better profiling. We detect if a DFG or FTL function should be recompiled by counting exits. The exit count thresholds are:

  • For a normal exit, we require 100 * pow(2, R) exits to recompile.
  • If the exit causes the Baseline JIT to enter its loop trigger (i.e. we got stuck in a hot loop after exit), then it’s counted specially. We only allow 5 * pow(2, R) of those kinds of exits before we recompile. Note that this can mean exiting five times and tripping the loop optimization trigger each time or it can mean exiting once and tripping the loop optimization trigger five times.

The first step to recompilation is to jettison the DFG or FTL function. That means that all future calls to the function will call the Baseline or LLInt function instead.

Recompilation

If a function is jettisoned, we increment the recompilation counter (R in our notation) and reset the tier-up functionality in the Baseline JIT. This means that the function will keep running in Baseline for a while (twice as long as it did before it was optimized last time). It will gather new profiling, which we will be able to combine with the profiling we collected before to get an even more accurate picture of how types behave in the function.
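Here’s a small sketch of the exit counting and backoff just described; the names and data structures are hypothetical, not JSC’s actual ones.

// One of these hangs off each DFG or FTL compilation.
struct OptimizedCode {
    unsigned normalExits = 0;      // plain OSR exits
    unsigned loopTriggerExits = 0; // exits that then tripped Baseline's loop trigger
};

bool shouldJettison(const OptimizedCode& code, unsigned R)
{
    return code.normalExits >= 100u << R     // 100 * pow(2, R)
        || code.loopTriggerExits >= 5u << R; // 5 * pow(2, R)
}

// On jettison: unlink the optimized code, increment R, and reset Baseline's
// tier-up counter so the function profiles for twice as long as last time.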

It’s worth looking at an example of this in action. We already showed an idealized case of tier-up in Figure 8, where a function gets compiled by each compiler exactly once and there are no OSR exits or recompilations. We will now show an example where things don’t go so well. This example is picked because it’s a particularly awful outlier. This isn’t how we expect our engine to behave normally. We expect amusingly bad cases like the following to happen occasionally since the success or failure of speculation is random and random behavior means having bad outliers.

_handlePropertyAccessExpression = function (result, node)
{
    result.possibleGetOverloads = node.possibleGetOverloads;
    result.possibleSetOverloads = node.possibleSetOverloads;
    result.possibleAndOverloads = node.possibleAndOverloads;
    result.baseType = Node.visit(node.baseType, this);
    result.callForGet = Node.visit(node.callForGet, this);
    result.resultTypeForGet = Node.visit(node.resultTypeForGet, this);
    result.callForAnd = Node.visit(node.callForAnd, this);
    result.resultTypeForAnd = Node.visit(node.resultTypeForAnd, this);
    result.callForSet = Node.visit(node.callForSet, this);
    result.errorForSet = node.errorForSet;
    result.updateCalls();
}

This function belongs to the WSL subtest of JetStream 2. It’s part of the WSL compiler’s AST walk. It ends up being a large function after inlining Node.visit. When I ran this on my computer, I found that JSC did 8 compilations before hitting equilibrium for this function:

  1. After running the function in LLInt for a bit, we compile this with Baseline. This is the easy part since Baseline doesn’t need to be recompiled.
  2. We compile with DFG. Unfortunately, the DFG compilation exits 101 times and gets jettisoned. The exit is due to a bad type check that the DFG emitted on this.
  3. We again compile with the DFG. This time, we exit twice due to a check on result. This isn’t enough times to trigger jettison and it doesn’t prevent tier-up to the FTL.
  4. We compile with the FTL. Unfortunately, this compilation gets jettisoned due to a failing watchpoint. Watchpoints (discussed in greater detail in later sections) are a way for the compiler to ask the runtime to notify it when bad things happen rather than emitting a check. Failing watchpoints cause immediate jettison. This puts us back in Baseline.
  5. We try the DFG again. We exit seven times due to a bad check on result, just like in step 3. This still isn’t enough times to trigger jettison and it doesn’t prevent tier-up to the FTL.
  6. We compile with the FTL. This time we exit 402 times due to a bad type check on node. We jettison and go back to Baseline.
  7. We compile with the DFG again. This time there are no exits.
  8. We compile with the FTL again. There are no further exits or recompilations.

This sequence of events has some intriguing quirks in addition to the number of compilations. Notice how in steps 3 and 5, we encounter exits due to a bad check on result, but none of the FTL compilations encounter those exits. This seems implausible since the FTL will do at least all of the speculations that the DFG did and a speculation that doesn’t cause jettison also cannot pessimise future speculations. It’s also surprising that the speculation that jettisons the FTL in step 6 wasn’t encountered by the DFG. It is possible that the FTL does more speculations than the DFG, but that usually only happens in inlined functions, and this speculation on node doesn’t seem to be in inlined code. A possible explanation for all of these surprising quirks is that the function is undergoing phase changes: during some parts of execution, it sees one set of types, and during another part of execution, it sees a somewhat different set. This is a common issue. Types are not random and they are often a function of time.

JavaScriptCore’s compiler control system is designed to get good outcomes both for functions where speculation “just works” and for functions like the one in this example that need some extra time. To summarize, control is all about counting executions, exits, and recompilations, and either launching a higher tier compiler (“tiering up”) or jettisoning optimized code and returning to Baseline.

Profiling

This section describes the profiling tiers of JavaScriptCore. The profiling tiers have the following responsibilities:

  • To provide a non-speculative execution engine. This is important for start-up (before we do any speculation) and for OSR exits. OSR exit needs to exit to something that does no speculation so that we don’t have chains of exits for the same operation.
  • To record useful profiling. Profiling is useful if it enables us to make profitable speculations. Speculations are profitable if doing them makes programs run faster.

In JavaScriptCore, the LLInt and Baseline are the profiling tiers while DFG and FTL are the optimizing tiers. However, DFG and FTL also collect some profiling, usually only when it’s free to do so and for the purpose of refining profiling collected by the profiling tiers.

This section is organized as follows. First we explain how JavaScriptCore’s profiling tiers execute code. Then we explain the philosophy of how to profile. Finally we go into the details of JavaScriptCore’s profiling implementation.

How Profiled Execution Works

JavaScriptCore profiles using the LLInt and Baseline tiers. LLInt interprets bytecode while Baseline compiles it. The two tiers share a nearly identical ABI so that it’s possible to jump from one to the other at any bytecode instruction boundary.

LLInt: The Low Level Interpreter

The LLInt is an interpreter that obeys JIT ABI (in the style of HotSpot’s interpreter). To that end, it is written in a portable assembly language called offlineasm. Offlineasm has a functional macro language (you can pass macro closures around) embedded in it. The offlineasm compiler is written in Ruby and can compile to multiple CPUs as well as C++. This section tells the story of why this crazy design produces a good outcome.

The LLInt simultaneously achieves multiple goals for JavaScriptCore:

  • LLInt is JIT-friendly. The LLInt runs on the same stack that the JITs run on (which happens to be the C stack). The LLInt even agrees on register conventions with the JITs. This makes it cheap for LLInt to call JITed functions and vice versa. It makes LLInt→Baseline and Baseline→LLInt OSR trivial and it makes any JIT→LLInt OSR possible.
  • LLInt allows us to execute JavaScript code even if we can’t JIT. JavaScriptCore in no-JIT mode (we call it “mini mode”) has some advantages: it’s harder to exploit and uses less memory. Some JavaScriptCore clients prefer the mini mode. JSC is also used on CPUs that we don’t have JIT support for. LLInt works great on those CPUs.
  • LLInt reduces memory usage. Any machine code you generate from JavaScript is going to be big. Remember, there’s a reason why they call JavaScript “high level” and machine code “low level”: it refers to the fact that when you lower JavaScript to machine code, you’re going to get many instructions for each JavaScript expression. Having the LLInt means that we don’t have to generate machine code for all JavaScript code, which saves us memory.
  • LLInt starts quickly. LLInt interprets our bytecode format directly. It’s designed so that we could map bytecode from disk and point the interpreter at it. The LLInt is essential for achieving great page load time in the browser.
  • LLInt is portable. It can be compiled to C++.

It would have been natural to write the LLInt in C++, since that’s what most of JavaScriptCore is written in. But that would have meant that the interpreter would have a C++ stack frame constructed and controlled by the C++ compiler. This would have introduced two big problems:

  1. It would be unclear how to OSR from the LLInt to the Baseline JIT or vice versa, since OSR would have to know how to decode and reencode a C++ stack frame. We don’t doubt that it’s possible to do this with enough cleverness, but it would create constraints on exactly how OSR works and it’s not an easy piece of machinery to maintain.
  2. JS functions running in the LLInt would have two stack frames instead of one. One of those stack frames would have to go onto the C++ stack (because it’s a C++ stack frame). We have multiple choices of how to manage the JS stack frame (we could try to alloca it on top of the C++ frame, or allocate it somewhere else) but this inevitably increases cost: calls into the interpreter would have to do twice the work. A common optimization to this approach is to have interpreter→interpreter calls reuse the same C++ stack frame by managing a separate JS stack on the side. Then you can have the JITs use that separate JS stack. This still leaves cost when calling out of interpreter to JIT or vice versa.

A natural way to avoid these problems is to write the interpreter in assembly. That’s basically what we did. But a JavaScript interpreter is a complex beast. It would be awful if porting JavaScriptCore to a new CPU meant rewriting the interpreter in another assembly language. Also, we want to use abstraction to write it. If we wrote it in C++, we’d probably have multiple functions, templates, and lambdas, and we would want all of them to be inlined. So we designed a new language, offlineasm, which has the following features:

  • Portable assembly with our own mnemonics and register names that match the way we do portable assembly in our JIT. Some high-level mnemonics require lowering. Offlineasm reserves some scratch registers to use for lowering.
  • The macro construct. It’s best to think of this as a lambda that takes some arguments and returns void. Then think of the portable assembly statements as print statements that output that assembly. So, the macros are executed for effect and that effect is to produce an assembly program. These are the execution semantics of offlineasm at compile time.

Macros allow us to write code with rich abstractions. Consider this example from the LLInt:

macro llintJumpTrueOrFalseOp(name, op, conditionOp)
    llintOpWithJump(op_%name%, op, macro (size, get, jump, dispatch)
        get(condition, t1)
        loadConstantOrVariable(size, t1, t0)
        btqnz t0, ~0xf, .slow
        conditionOp(t0, .target)
        dispatch()

    .target:
        jump(target)

    .slow:
        callSlowPath(_llint_slow_path_%name%)
        nextInstruction()
    end)
end

This is a macro that we use for implementing both the jtrue and jfalse opcodes. There are only three lines of actual assembly in this listing: the btqnz (branch test quad not zero) and the two labels (.target and .slow). This also shows the use of first-class macros: on the second line, we call llintOpWithJump and pass it a macro closure as the third argument. The great thing about having a lambda-like construct like macro is that we don’t need much else to have a pleasant programming experience. The LLInt is written in about 5000 lines of offlineasm (if you only count the 64-bit version).

To summarize, LLInt is an interpreter written in offlineasm. LLInt understands JIT ABI so calls and OSR between LLInt and JIT are cheap. The LLInt allows JavaScriptCore to load code more quickly, use less memory, and run on more platforms.

Baseline: The Bytecode Template JIT

The Baseline JIT achieves a speed-up over the LLInt at the cost of some memory and the time it takes to generate machine code. Baseline’s speed-up is thanks to two factors:

  • Removal of interpreter dispatch. Interpreter dispatch is the costliest part of interpretation, since the indirect branches used for selecting the implementation of an opcode are hard for the CPU to predict. This is the primary reason why Baseline is faster than LLInt.
  • Comprehensive support for polymorphic inline caching. It is possible to do sophisticated inline caching in an interpreter, but currently our best inline caching implementation is the one shared by the JITs.

The Baseline JIT compiles bytecode by turning each bytecode instruction into a template of machine code. For example, a bytecode instruction like:

add loc6, arg1, arg2

Is turned into something like:

0x2f8084601a65: mov 0x30(%rbp), %rsi      # load arg1 from the stack frame
0x2f8084601a69: mov 0x38(%rbp), %rdx      # load arg2
0x2f8084601a6d: cmp %r14, %rsi            # is arg1 an int32? (%r14 holds the number tag)
0x2f8084601a70: jb 0x2f8084601af2         # not an int32: take the slow path
0x2f8084601a76: cmp %r14, %rdx            # same check for arg2
0x2f8084601a79: jb 0x2f8084601af2
0x2f8084601a7f: mov %esi, %eax            # 32-bit add of the unboxed values
0x2f8084601a81: add %edx, %eax
0x2f8084601a83: jo 0x2f8084601af2         # overflow: take the slow path
0x2f8084601a89: or %r14, %rax             # rebox the int32 result
0x2f8084601a8c: mov %rax, -0x38(%rbp)     # store the result into loc6

The only parts of this code that would vary from one add instruction to another are the references to the operands. For example, 0x30(%rbp) (that’s x86 for the memory location at frame pointer plus 0x30) is the machine code representation of arg1 in bytecode.

The Baseline JIT does few optimizations beyond just emitting code templates. It does no register allocation between instruction boundaries, for example. The Baseline JIT does have some local optimizations, like special-casing a math operation when one of its operands is a constant, or using profiling information collected by the LLInt. Baseline also has good support for code repatching, which is essential for implementing inline caching. We discuss inline caching in detail later in this section.

To summarize, the Baseline JIT is a mostly unoptimized JIT compiler that focuses on removing interpreter dispatch overhead. This is enough to make it a ~2× speed-up over the LLInt.

Profiling Philosophy

Profiling in JSC is designed to be cheap and useful.

JavaScriptCore’s profiling aims to incur little or no cost in the common case. Running with profiling turned on but never using the results to do optimizations should result in throughput that is about as good as if all of the profiling was disabled. We want profiling to be cheap because even in a long running program, lots of functions will run only once or for too short a time to make an optimizing JIT profitable. Some functions might finish running in less time than it takes to optimize them. The profiling can’t be so expensive that it makes functions like that run slower.

Profiling is meant to help the compiler make the kinds of speculations that cause the program to run faster when we factor in both the speed-ups from speculations that are right and the slow-downs from speculations that are wrong. It’s possible to understand this formally by thinking of speculation as a bet. We say that profiling is useful if it turns the speculation into a value bet. A value bet is one where the expected value (EV) is positive. That’s another way of saying that the average outcome is profitable, so if we repeated the bet an infinite number of times, we’d be richer. Formally the expected value of a bet is:

p * B - (1 - p) * C

Where p is the probability of winning, B is the benefit of winning, and C is the cost of losing (both B and C are positive). A bet is a value bet iff:

p * B - (1 - p) * C > 0

Let’s view speculation using this formula. The scenario in which we have the choice to make a bet or not is that we are compiling a bytecode instruction, we have some profiling that implies that we should speculate, and we have to choose whether to speculate or not. Let’s say that B and C both have to do with the latency, in nanoseconds, of executing a bytecode instruction once. B is the improvement to that latency if we do some speculation and it turns out to be right. C is the regression to that latency if the speculation we make is wrong. Of course, after we have made a speculation, it will run many times and may be right sometimes and wrong sometimes. But B is just about the speed-up in the right cases, and C is just about the slow-down in the wrong cases. The baseline relative to which B and C are measured is the latency of the bytecode instruction if it was compiled with an optimizing JIT but without that particular OSR-exit-based speculation.

For example, we may have a less-than operation, and we are considering whether to speculate that neither input is double. We can of course compile less-than without making that speculation, so that’s the baseline. If we do choose to speculate, then B is the speed-up to the average execution latency of that bytecode in those cases when neither input is double. Meanwhile, C is the slow-down to the average execution latency of that bytecode in those cases when at least one input is a double.

For B, let’s just compute some bounds. The lower bound is zero, since some speculations are not profitable. A pretty good first-order upper bound for B is the difference in per-bytecode-instruction latency between the Baseline JIT and the FTL. Usually, the full speed-up of a bytecode instruction from Baseline to FTL is the result of multiple speculations as well as nonspeculative compiler optimizations. So, a single speculation being responsible for the full difference in performance between Baseline and FTL is a fairly conservative upper bound for B. Previously, we said that on average in the JetStream 2 benchmark on my computer, a bytecode instruction takes 1.71 ns to execute in Baseline and .225 ns to execute in FTL. So we can say:

B <= 1.71 ns - .225 ns = 1.48 ns

Now let’s estimate C. C is how many more nanoseconds it takes to execute the bytecode instruction if we have speculated and we experience speculation failure. Failure means executing an OSR exit stub and then reexecuting the same bytecode instruction in baseline or LLInt. Then, all subsequent bytecodes in the function will execute in baseline or LLInt rather than DFG or FTL. Every 100 exits or so, we jettison and eventually recompile. Compiling is concurrent, but running a concurrent compiler is sure to slow down the main thread even if there is no lock contention. To fully capture C, we have to account for the cost of the OSR exit itself and then amortize the cost of reduced execution speed of the remainder of the function and the cost of eventual recompilation. Fortunately, it’s pretty easy to measure this directly by hacking the DFG frontend to randomly insert pointless OSR exits with low probability and by having JSC report a count of the number of exits. I did an experiment with this hack for every JetStream 2 benchmark. Running without the synthetic exits, we get an execution time and a count of the number of exits. Running with synthetic exits, we get a longer execution time and a larger number of exits. The slope between these two points is an estimate of C. This is what I found, on the same machine that I used for running the experiments to compute B:

[DFG] C = 2499 ns
[FTL] C = 9998 ns

Notice how C is way bigger than B! This isn’t some slight difference. We are talking about three orders of magnitude for the DFG and four orders of magnitude for the FTL. This paints a clear picture: speculation is a bet with tiny benefit and enormous cost.
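Rearranging the value bet inequality p * B - (1 - p) * C > 0 gives p > C / (B + C). The following throwaway C++ computation plugs in the measurements above to make the break-even probabilities concrete:

#include <cstdio>

int main()
{
    double B = 1.48;    // ns: upper bound on the benefit of one speculation
    double dfgC = 2499; // ns: measured cost of a DFG exit
    double ftlC = 9998; // ns: measured cost of an FTL exit
    // A speculation is a value bet only if p > C / (B + C).
    std::printf("DFG break-even p: %.5f\n", dfgC / (B + dfgC)); // about 0.99941
    std::printf("FTL break-even p: %.5f\n", ftlC / (B + ftlC)); // about 0.99985
    return 0;
}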

For the DFG, this means that we need:

p > 0.9994

For speculation to be a value bet. p has to be even closer to 1 for FTL. Based on this, our philosophy for speculation is we won’t do it unless we think that:

p ~ 1

Since the cost of speculation failure is so enormous, we only want to speculate when we know that we won’t fail. The speed-up of speculation happens because we make lots of sure bets and only a tiny fraction of them ever fail.

It’s pretty clear what this means for profiling:

  • Profiling needs to focus on noting counterexamples to whatever speculations we want to do. We don’t want to speculate if profiling tells us that the counterexample ever happened, since if it ever happened, then the EV of this speculation is probably negative. This means that we are not interested in collecting probability distributions. We just want to know if the bad thing ever happened.
  • Profiling needs to run for a long time. It’s common to wish for JIT compilers to compile hot functions sooner. One reason why we don’t is that we need about 3-4 “nines” of confidence that the counterexamples didn’t happen. Recall that our threshold for tiering up into the DFG is about 1000 executions. That’s probably not a coincidence.

Finally, since profiling is a bet, it’s important to approach it with a healthy gambler’s philosophy: the fact that a speculation succeeded or failed in a particular program does not tell us if the speculation is good or bad. Speculations are good or bad only based on their average behavior. Focusing too much on whether profiling does a good job for a particular program may result in approaches that cause it to perform badly on average.

Profiling Sources in JavaScriptCore

JavaScriptCore gathers profiling from multiple different sources. These profiling sources use different designs. Sometimes, a profiling source is a unique source of data, but other times, profiling sources are able to provide some redundant data. We only speculate when all profiling sources concur that the speculation would always succeed. The following sections describe our profiling sources in detail.

Case Flags

Case flags are used for branch speculation. This applies anytime the best way to implement a JS operation involves branches and multiple paths, like a math operation having to handle either integers or doubles. The easiest way to profile and speculate is to have the profiling tiers implement both sides of the branch and set a different flag on each side. That way, the optimizing tier knows that it can profitably speculate that only one path is needed if the flags for the other paths are not set. In cases where there is clearly a preferred speculation — for example, speculating that an integer add did not overflow is clearly preferred over speculating that it did overflow — we only need flags on the paths that we don’t like (like the overflow path).

Let’s consider two examples of case flags in more detail: integer overflow and property accesses on non-object values.

Say that we are compiling an add operation that is known to take integers as inputs. Usually the way that the LLInt interpreter or Baseline compiler would “know” this is that the add operation we’ll talk about is actually the part of a larger add implementation after we’ve already checked that the inputs are integers. Here’s the logic that the profiling tier would use written as if it was C++ code to make it easy to parse:

int32_t left = ...;
int32_t right = ...;
ArithProfile* profile = ...; // This is the thing with the case flags.
int32_t intResult;
JSValue result; // This is a tagged JavaScript value that carries type.
if (UNLIKELY(addOverflowed(left, right, &intResult))) {
    result = jsNumber(static_cast<double>(left) +
                      static_cast<double>(right));

    // Set the case flag indicating that overflow happened.
    profile->setObservedInt32Overflow();
} else
    result = jsNumber(intResult);

When optimizing the code, we will inspect the ArithProfile object for this instruction. If !profile->didObserveInt32Overflow(), we will emit something like:

int32_t left = ...;
int32_t right = ...;
int32_t result;
speculate(!addOverflowed(left, right, &result));

I.e. we will add and branch to an exit on overflow. Otherwise we will just emit the double path:

double left = ...;
double right = ...;
double result = left + right;

Unconditionally doing double math is not that expensive; in fact on benchmarks that I’ve tried, it’s cheaper than doing integer math and checking overflow. The only reason why integers are profitable is that they are cheaper to use for bit operations and pointer arithmetic. Since CPUs don’t accept floats or doubles for bit and pointer math, we need to convert the double to an integer first if the JavaScript program uses it that way (pointer math arises when a number is used as an array index). Such conversions are relatively expensive even on CPUs that support them natively. Usually it’s hard to tell, using profiling or any static analysis, whether a number that a program computed will be used for bit or pointer math in the future. Therefore, it’s better to use integer math with overflow checks so that if the number ever flows into an operation that requires integers, we won’t have to pay for expensive conversions. But if we learn that any such operation overflows — even occasionally — we’ve found that it’s more efficient overall to unconditionally switch to double math. Perhaps the presence of overflows is strongly correlated with the result of those operations not being fed into bit math or pointer math.

A simpler example is how case flags are used in property accesses. As we will discuss in the inline caches section, property accesses have associated metadata that we use to track details about their behavior. That metadata also has flags, like the sawNonCell bit, which we set to true if the property access ever sees a non-object as the base. If the flag is set, the optimizing compilers know not to speculate that the property access will see objects. This typically forces all kinds of conservatism for that property access, but that’s better than speculating wrong and exiting in this case. Lots of case flags look like sawNonCell: they are casually added as a bit in some existing data structure to help the optimizing compiler know which paths were taken.

To summarize, case flags are used to record counterexamples to the speculations that we want to do. They are a natural way to implement profiling in those cases where the profiling tiers would have had to branch anyway.

Case Counts

A predecessor to case flags in JavaScriptCore is case counts. It’s the same idea as flags, but instead of just setting a bit to indicate that a bad thing happened, we would count. If the count never got above some threshold, we would speculate.

Case counts were written before we realized that the EV of speculation is awful unless the probability of success is basically 1. We thought that we could speculate in cases where we knew we’d be right a majority of the time, for example. Initial versions of case counts had variable thresholds — we would compute a ratio with the execution count to get a case rate. That didn’t work as well as fixed thresholds, so we switched to a fixed count threshold of 100. Over time, we lowered the threshold to 20 or 10, and then eventually found that the threshold should really be 1, at which point we switched to case flags.

Some functionality still uses case counts. We still have case counts for determining if the this argument is exotic (some values of this require the function to perform a possibly-effectful conversion in the prologue). We still have case counts as a backup for math operations overflowing, though that is almost certainly redundant with our case flags for math overflow. It’s likely that we will remove case counts from JavaScriptCore eventually.

Value Profiling

Value profiling is all about inferring the types of JavaScript values (JSValues). Since JS is a dynamic language, JSValues have a runtime type. We use a 64-bit JSValue representation that uses bit encoding tricks to hold either doubles, integers, booleans, null, undefined, or pointers to cell, which may be JavaScript objects, symbols, or strings. We refer to the act of encoding a value in a JSValue as boxing it and the act of decoding as unboxing (note that boxing is a term used in other engines to refer specifically to the act of allocating a box object in the heap to hold a value; our use of the term boxing is more like what others call tagging). In order to effectively optimize JavaScript, we need to have some way of inferring the type so that the compiler can assume things about it statically. Value profiling tracks the set of values that a particular program point saw so that we can predict what types that program point will see in the future.
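To give a feel for what boxing looks like, here is a simplified 64-bit tagging scheme in the spirit of the one described above. The bit patterns are made up for this sketch; JSC’s real encoding has more cases (doubles, cells, booleans, and so on):

#include <cstdint>

using EncodedJSValue = uint64_t;

// In this simplified scheme, any bit pattern at or above NumberTag is an int32.
constexpr uint64_t NumberTag = 0xFFFF000000000000ull;

EncodedJSValue boxInt32(int32_t i)
{
    return NumberTag | static_cast<uint32_t>(i); // zero-extend, then tag
}

bool isInt32(EncodedJSValue v)
{
    return (v & NumberTag) == NumberTag;
}

int32_t unboxInt32(EncodedJSValue v)
{
    return static_cast<int32_t>(v); // drop the tag by truncating
}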

Figure 12. Value profiling and prediction propagation for a sample data flow graph.

We combine value profiling with a static analysis called prediction propagation. The key insight is that prediction propagation can infer good guesses for the types for most operations if it is given a starting point for certain opaque operations:

  • Arguments incoming to the function.
  • Results of most load operations.
  • Results of most calls.

There’s no way that a static analysis running just on some function could guess the types of loads from plain JavaScript arrays or of calls to plain JavaScript functions. Value profiling is about trying to help the static analysis guess the types of those opaque operations. Figure 12 shows how this plays out for a sample data flow graph. There’s no way static analysis can tell the type of most GetByVal and GetById operations, since those are loads from dynamically typed locations in the heap. But if we did know what those operations return then we can infer types for this entire graph by using simple type rules for Add (like that if it takes integers as inputs and the case flags tell us there was no overflow then it will produce integers).

Let’s break down value profiling into the details of how exactly values are profiled, how prediction propagation works, and how the results of prediction propagation are used.

Recording value profiles. At its core, value profiling is all about having some program point (either a point in the interpreter or something emitted by the Baseline JIT) log the value that it saw. We log values into a single bucket so that each time the profiling point runs, it overwrites the last seen value. The code looks like this in the LLInt:

macro valueProfile(op, metadata, value)
    storeq value, %op%::Metadata::profile.m_buckets[metadata]
end

Let’s look at how value profiling works for the get_by_val bytecode instruction. Here’s part of the code for get_by_val in LLInt:

llintOpWithMetadata(
    op_get_by_val, OpGetByVal,
    macro (size, get, dispatch, metadata, return)
        macro finishGetByVal(result, scratch)
            get(dst, scratch)
            storeq result, [cfr, scratch, 8]
            valueProfile(OpGetByVal, t5, result)
            dispatch()
        end

        ... // more code for get_by_val

The implementation of get_by_val includes a finishGetByVal helper macro that stores the result in the right place on the stack and then dispatches to the next instruction. Note that it also calls valueProfile to log the result just before finishing.

Each ValueProfile object has a pair of buckets and a predicted type. One bucket is for normal execution. The valueProfile macro in the LLInt uses this bucket. The other bucket is for OSR exit: if we exit due to a speculation on a type that we got from value profiling, we feed the value that caused OSR exit back into the second bucket of the ValueProfile.

Each time that our execution counters (used for controlling when to invoke the next tier) count about 1000 points, the execution counting slow path updates all predicted types for the value profiles in that function. Updating value profiles means computing a predicted type for the value in the bucket and merging that type with the previously predicted type. Therefore, after repeated predicted type updates, the type will be broad enough to be valid for multiple different values that the code saw.

Predicted types use the SpeculatedType type system. A SpeculatedType is a 64-bit integer in which we use the low 40 bits to represent a set of 40 fundamental types. The fundamental types, shown in Figure 13, represent non-overlapping sets of possible JSValues. 2^40 SpeculatedTypes are possible by setting any combination of bits.

Figure 13. All of the fundamental SpeculatedTypes.

This allows us to invent whatever types are useful for optimization. For example, we distinguish between 32-bit integers whose value is either 0 or 1 (BoolInt32) versus whose value is anything else (NonBoolInt32). Together these form the Int32Only type, which just has both bits set. BoolInt32 is useful for cases where integers are converted to booleans.
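As a sketch, a few of these types and the operations on them could be written as follows; the bit assignments here are made up for illustration:

#include <cstdint>

typedef uint64_t SpeculatedType;

// Two of the 40 fundamental types; the actual bit values are illustrative.
static const SpeculatedType SpecBoolInt32    = 1ull << 0; // int32 that is 0 or 1
static const SpeculatedType SpecNonBoolInt32 = 1ull << 1; // any other int32
static const SpeculatedType SpecInt32Only    = SpecBoolInt32 | SpecNonBoolInt32;

// Merging two predictions is set union.
inline SpeculatedType mergeSpeculations(SpeculatedType a, SpeculatedType b)
{
    return a | b;
}

// Does 'value' only contain types that 'allowed' permits?
inline bool isSubsetSpeculation(SpeculatedType value, SpeculatedType allowed)
{
    return !(value & ~allowed);
}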

Prediction propagation. We use value profiling to fill in the blanks for the prediction propagation pass of the DFG compiler pipeline. Prediction propagation is an abstract interpreter that tracks the set of types that each variable in the program can have. It’s unsound since the types it produces are just predictions (it can produce any combination of types and at worst we will just OSR exit too much). However, we try to make it as sound as we can: the more sound it is, the fewer OSR exits we have. Prediction propagation fills in the things that the abstract interpreter can’t reason about (loads from the heap, results returned by calls, arguments to the function, etc.) using the results of value profiling. On the topic of soundness, we would consider it to be a bug if prediction propagation were unsound in a world where value profiling is never wrong. Of course, in reality, we know that value profiling will be wrong, so we know that prediction propagation is unsound.

Let’s consider some of the cases where prediction propagation can be sure about the result type of an operation based on the types of its inputs.

Figure 14. Some of the prediction propagation rules for Add. This figure doesn’t show the rules for string concatenation and objects.

Figure 15. Some of the prediction propagation rules for GetByVal (the DFG opcode for subscript access like array[index]). This figure only shows a small sample of the GetByVal rules.

Figure 14 shows some of the rules for the Add operation in DFG IR. Prediction propagation and case flags tell us everything we want to know about the output of Add. If the inputs are integers and the overflow flag isn’t set, the output is an integer. If the inputs are any other kinds of numbers or there are overflows, the output is a double. We don’t need anything else (like value profiling) to understand the output type of Add.
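Building on the sketch above, one of the Add rules might be coded like this; sawOverflow stands in for the case flags, and SpecDouble and SpecFullNumber are illustrative stand-ins for the relevant type sets:

// Assumed type sets for this sketch (values illustrative).
static const SpeculatedType SpecDouble     = 1ull << 2;
static const SpeculatedType SpecFullNumber = SpecInt32Only | SpecDouble;

SpeculatedType predictAdd(SpeculatedType left, SpeculatedType right, bool sawOverflow)
{
    if (isSubsetSpeculation(left, SpecInt32Only)
        && isSubsetSpeculation(right, SpecInt32Only)
        && !sawOverflow)
        return SpecInt32Only;  // int + int with no observed overflow stays int
    if (isSubsetSpeculation(left, SpecFullNumber)
        && isSubsetSpeculation(right, SpecFullNumber))
        return SpecDouble;     // other numeric combinations produce a double
    return SpecFullNumber;     // sketch stops here; real rules also cover strings and objects
}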

Figure 15 shows some of the rules for GetByVal, which is the DFG representation of array[index]. In this case, there are types of arrays that could hold any type of value. So, even knowing that it is a JSArray isn’t enough to know the types of values inside the array. Also, if the index is a string, then this could be accessing some named property on the array object or one of its prototypes and those could have any type. It’s in cases like GetByVal that we leverage value profiling to guess what the result type is.

Prediction propagation combined with value profiling allows the DFG to infer a predicted type at every point in the program where a variable is used. This allows operations that don’t do any profiling on their own to still perform type-based speculations. It’s of course possible to have bytecode instructions that speculate on type also collect case flags (or use some other mechanism) to drive those speculations, and that approach can be more precise, but value profiling means that we don’t have to do this for every operation that wants type-based speculation.

Using predicted types. Consider the CompareEq operation in DFG IR, which is used for the DFG lowering of the eq, eq_null, neq, neq_null, jeq, jeq_null, jneq, and jneq_null bytecodes. These bytecodes do no profiling of their own. But CompareEq is one of the most aggressive type speculators in all of the DFG. CompareEq can speculate on the types it sees without doing any profiling of its own because the values it uses will either have value profiling or will have a predicted type filled in by prediction propagation.

Type speculations in the DFG are written like:

CompareEq(Int32:@left, Int32:@right)

This example means that the CompareEq will speculate that both operands are Int32. CompareEq supports the following speculations, plus others we don’t list here:

CompareEq(Boolean:@left, Boolean:@right)
CompareEq(Int32:@left, Int32:@right)
CompareEq(Int32:BooleanToNumber(Boolean:@left), Int32:@right)
CompareEq(Int32:BooleanToNumber(Untyped:@left), Int32:@right)
CompareEq(Int32:@left, Int32:BooleanToNumber(Boolean:@right))
CompareEq(Int32:@left, Int32:BooleanToNumber(Untyped:@right))
CompareEq(Int52Rep:@left, Int52Rep:@right)
CompareEq(DoubleRep:DoubleRep(Int52:@left), DoubleRep:DoubleRep(Int52:@right))
CompareEq(DoubleRep:DoubleRep(Int52:@left), DoubleRep:DoubleRep(RealNumber:@right))
CompareEq(DoubleRep:DoubleRep(Int52:@left), DoubleRep:DoubleRep(Number:@right))
CompareEq(DoubleRep:DoubleRep(Int52:@left), DoubleRep:DoubleRep(NotCell:@right))
CompareEq(DoubleRep:DoubleRep(RealNumber:@left), DoubleRep:DoubleRep(RealNumber:@right))
CompareEq(DoubleRep:..., DoubleRep:...)
CompareEq(StringIdent:@left, StringIdent:@right)
CompareEq(String:@left, String:@right)
CompareEq(Symbol:@left, Symbol:@right)
CompareEq(Object:@left, Object:@right)
CompareEq(Other:@left, Untyped:@right)
CompareEq(Untyped:@left, Other:@right)
CompareEq(Object:@left, ObjectOrOther:@right)
CompareEq(ObjectOrOther:@left, Object:@right)
CompareEq(Untyped:@left, Untyped:@right)

Some of these speculations, like CompareEq(Int32:, Int32:) or CompareEq(Object:, Object:), allow the compiler to just emit an integer compare instruction. Others, like CompareEq(String:, String:), emit a string compare loop. We have lots of variants to optimally handle bizarre comparisons that are not only possible in JS but that we have seen happen frequently in the wild, like comparisons between numbers and booleans and comparisons between one value that is always a number and another that is either a number or a boolean. We provide additional optimizations for comparisons between doubles, comparisons between strings that have been hash-consed (so-called StringIdent, which can be compared using comparison of the string pointer), and comparisons where we don’t know how to speculate (CompareEq(Untyped:, Untyped:)).

The basic idea of value profiling — storing a last-seen value into a bucket and then using that to bootstrap a static analysis — is something that we also use for profiling the behavior of array accesses. Array profiles and array allocation profiles are like value profiles in that they save the last result in a bucket. Like value profiling, data from those profiles is incorporated into prediction propagation.

To summarize, value profiling allows us to predict the types of variables at all of their use sites by just collecting profiling at those bytecode instructions whose output cannot be predicted with abstract interpretation. This serves as the foundation for how the DFG (and FTL, since it reuses the DFG’s frontend) speculates on the types of JSValues.

Inline Caches

Property accesses and function calls are particularly difficult parts of JavaScript to optimize:

  • Objects behave as if they were just ordered mappings from strings to JSValues. Lookup, insertion, deletion, replacement, and iteration are possible. Programs do these operations a lot, so they have to be fast. In some cases, programs use objects the same way that programs in other languages would use hashtables. In other cases, programs use objects the same way that they would in Java or some sensibly-typed object-oriented language. Most programs do both.
  • Function calls are polymorphic. You can’t make static promises about what function will be called.

Both of these dynamic features are amenable to optimization with Deutsch and Schiffman’s inline caches (ICs). For dynamic property access, we combine this with structures, based on the idea of maps in Chambers, Ungar, and Lee’s Self implementation. We also follow Hölzle, Chambers, and Ungar: our inline caches are polymorphic and we use data from these caches as profiling of the types observed at a particular property access or call site.

It’s worth dwelling a bit on the power of inline caching. Inline caches are great optimizations separately from speculative compilation. They make the LLInt and Baseline run faster. Inline caches are our most powerful profiling source, since they can precisely collect information about every type encountered by an access or call. Note that we previously said that good profiling has to be cheap. We think of inline caches as negative cost profiling since inline caches make the LLInt and Baseline faster. It doesn’t get cheaper than that!

This section focuses on inline caching for dynamic property access, since it’s strictly more complex than for calls (accesses use structures, polymorphic inline caches (PICs), and speculative compilation; calls only use polymorphic inline caches and speculative compilation). We organize our discussion of inline caching for dynamic property access as follows. First we describe how structures work. Then we show the JavaScriptCore object model and how it incorporates structures. Next we show how inline caches work. Then we show how profiling from inline caches is used by the optimizing compilers. After that we show how inline caches support polymorphism and polyvariance. Finally we talk about how inline caches are integrated with the garbage collector.

Structures. Objects in JavaScript are just mappings from strings to JSValues. Lookup, insertion, deletion, replacement, and iteration are all possible. We want to optimize those uses of objects that would have had a type if the language had given the programmer a way to say it.

Figure 16. Some JavaScript objects that have x and y properties. Some of them have exactly the same shape (only x and y in the same order).

Consider how to implement a property access like:

var tmp = o.x;

Or:

o.x = tmp;

One way to make this fast is to use hashtables. That’s certainly a necessary fallback mechanism when the JavaScript program uses objects more like hashtables than like objects (i.e. it frequently inserts and deletes properties). But we can do better.

This problem frequently arises in dynamic programming languages and it has a well-understood solution. The key insight of Chambers, Ungar, and Lee’s Self implementation is that property access sites in the program will typically only see objects of the same shape. Consider the objects in Figure 16 that have x and y properties. Of course it’s possible to insert x and y in two possible orders, but folks will tend to pick some order and stick to it (like x first). And of course it’s possible to also have objects that have a z property, but it’s less likely that a property access written for the part of the program that works with {x, y} objects will be reused for the part that uses {x, y, z}. It’s possible to have shared code for many different kinds of objects but unshared code is more common. Therefore, we split the object representation into two parts:

  • The object itself, which only contains the property values and a structure pointer.
  • The structure, which is a hashtable that maps property names (strings) to indices in the objects that have that structure.

Figure 17. The same objects as in Figure 16, but using structures.

Figure 17 shows objects represented using structures. Objects only contain object property values and a pointer to a structure. The structure tells the property names and their order. For example, if we wanted to ask the {1, 2} object in Figure 17 for the value of property x, we would load the pointer to its structure, {x, y}, and ask that structure for the index of x. The index is 0, and the value at index 0 in the {1, 2} object is 1.
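A minimal sketch of that two-step lookup, with deliberately simplified types (the real object model appears in Figure 18), might look like:

#include <cstdint>
#include <string>
#include <unordered_map>

typedef uint64_t JSValue; // stand-in for a boxed value

struct Structure {
    // Shared by every object with this shape: property name -> slot index.
    std::unordered_map<std::string, unsigned> indexOfProperty;
};

struct JSObject {
    Structure* structure;
    JSValue slots[8]; // simplified: one flat slot array
};

JSValue getProperty(JSObject* object, const std::string& name)
{
    // One hashtable lookup in the shared structure, then a direct indexed
    // load from the object itself. This sketch assumes the property exists.
    unsigned index = object->structure->indexOfProperty.at(name);
    return object->slots[index];
}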

A key feature of structures is that they are hash consed. If two objects have the same properties in the same order, they are likely to have the same structure. This means that checking if an object has a certain structure is O(1): just load the structure pointer from the object header and compare the pointer to a known value.

Structures can also indicate that objects are in dictionary or uncacheable dictionary mode, which are basically two levels of hashtable badness. In both cases, the structure stops being hash consed and is instead paired 1:1 with its object. Dictionary objects can have new properties added to them without the structure changing (the property is added to the structure in-place). Uncacheable dictionary objects can have properties deleted from them without the structure changing. We won’t go into these modes in too much detail in this post.

To summarize, structures are hashtables that map property names to indices in the object. Object property lookup uses the object’s structure to find the index of the property. Structures are hash consed to allow for fast structure checks.

Figure 18. The JavaScriptCore object model.

JavaScriptCore object model. JavaScriptCore uses objects with a 64-bit header that includes a 32-bit structure ID and 32 bits worth of extra state for GC, type checks, and arrays. Figure 18 shows the object model. Named object properties may end up either in the inline slots or the out-of-line slots. Objects get some number of inline slots based on simple static analysis around the allocation site. If a property is added that doesn’t fit in the inline slots, we allocate a butterfly to store additional properties out-of-line. Accessing out-of-line properties in the butterfly costs one extra load.

Figure 19 shows an example object that only has two inline properties. This is the kind of object you would get if you used the object literal {f:5, g:6} or if you assigned to the f and g properties reasonably close to the allocation.

Figure 19. Example JavaScriptCore object together with its structure.

Simple inline caches. Let’s consider the code:

var v = o.f;

Let’s assume that all of the objects that flow into this have structure 42 like the object in Figure 19. Inline caching this property access is all about emitting code like the following:

if (o->structureID == 42)
    v = o->inlineStorage[0]
else
    v = slowGet(o, "f")

But how do we know that o will have structure 42? JavaScript does not give us this information statically. Inline caches get this information by filling it in once the code runs. There are a number of techniques for this, all of which come down to self-modifying code. Let’s look at how the LLInt and Baseline do it.

In the LLInt, the metadata for get_by_id contains a cached structure ID and a cached offset. The cached structure ID is initialized to an absurd value that no structure can have. The fast path of get_by_id loads the property at the cached offset if the object has the cached structure. Otherwise, we take a slow path that does the full lookup. If that full lookup is cacheable, it stores the structure ID and offset in the metadata.
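Here is that scheme in C++-flavored pseudocode; all of the names and interfaces below are illustrative rather than JavaScriptCore’s actual ones:

#include <cstdint>
#include <string>

typedef uint64_t JSValue;

// Assumed object interface for this sketch.
struct JSObject {
    uint32_t structureID() const;
    JSValue propertyAt(unsigned offset) const;
};

struct PropertyLookupResult {
    bool isCacheable;
    unsigned offset;
    JSValue value;
};

// Assumed: the fully dynamic lookup that the slow path performs.
PropertyLookupResult fullDynamicLookup(JSObject*, const std::string& name);

struct GetByIdMetadata {
    uint32_t cachedStructureID { 0xffffffff }; // absurd value no structure can have
    unsigned cachedOffset { 0 };
};

JSValue getById(JSObject* object, const std::string& name, GetByIdMetadata& metadata)
{
    // Fast path: the object has the structure we saw last time.
    if (object->structureID() == metadata.cachedStructureID)
        return object->propertyAt(metadata.cachedOffset);

    // Slow path: full dynamic lookup; remember it if it is cacheable.
    PropertyLookupResult result = fullDynamicLookup(object, name);
    if (result.isCacheable) {
        metadata.cachedStructureID = object->structureID();
        metadata.cachedOffset = result.offset;
    }
    return result.value;
}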

The Baseline JIT does something more sophisticated. When emitting a get_by_id, it reserves a slab of machine code space that the inline caches will later fill in with real code. The only code in this slab initially is an unconditional jump to a slow path. The slow path does the fully dynamic lookup. If that is deemed cacheable, the reserved slab is replaced with code that does the right structure check and loads at the right offset. Here’s an example of a get_by_id initially compiled with Baseline:

0x46f8c30b9b0: mov 0x30(%rbp), %rax
0x46f8c30b9b4: test %rax, %r15
0x46f8c30b9b7: jnz 0x46f8c30ba2c
0x46f8c30b9bd: jmp 0x46f8c30ba2c
0x46f8c30b9c2: o16 nop %cs:0x200(%rax,%rax)
0x46f8c30b9d1: nop (%rax)
0x46f8c30b9d4: mov %rax, -0x38(%rbp)

The first thing that this code does is check that o (stored in %rax) is really an object (using a test and jnz). Then notice the unconditional jmp followed by two long nop instructions. This jump goes to the same slow path that we would have branched to if o was not an object. After the slow path runs, this is repatched to:

0x46f8c30b9b0: mov 0x30(%rbp), %rax
0x46f8c30b9b4: test %rax, %r15
0x46f8c30b9b7: jnz 0x46f8c30ba2c
0x46f8c30b9bd: cmp $0x125, (%rax)
0x46f8c30b9c3: jnz 0x46f8c30ba2c
0x46f8c30b9c9: mov 0x18(%rax), %rax
0x46f8c30b9cd: nop 0x200(%rax)
0x46f8c30b9d4: mov %rax, -0x38(%rbp)

Now, the is-object check is followed by a structure check (using cmp to check that the structure is 0x125) and a load at offset 0x18.

Inline caches as a profiling source. The metadata we use to maintain inline caches makes for a fantastic profiling source. Let’s look closely at what this means.

Figure 20. Timeline of using an inline cache at each JIT tier. Note that we end up having to generate code for this get_by_id six times in the best case that each tier compiles this only once.

Figure 20 shows a naive use of inline caches in a multi-tier engine, where the DFG JIT forgets everything that we learned from the Baseline inline cache and just compiles a blank inline cache. This is reasonably efficient and we fall back on this approach when the inline caches from the LLInt and Baseline tell us that there is unmanageable polymorphism. Before we go into how polymorphism is profiled, let’s look at how a speculative compiler really wants to handle simple monomorphic inline caches like the one in Figure 20, where we only see one structure (S1) and the code that the IC emits is trivial (load at offset 10 from %rax).

When the DFG frontend (shared by DFG and FTL) sees an operation like get_by_id that can be implemented with ICs, it reads the state of all ICs generated for that get_by_id. By “all ICs” we mean all ICs that are currently in the heap. This usually just means reading the LLInt and Baseline ICs, but if there exists a DFG or FTL function that generated an IC for this get_by_id then we will also read that IC. This can happen if a function gets compiled multiple times due to inlining — we may be compiling function bar that inlines a call to function foo and foo already got compiled with FTL and the FTL emitted an IC for our get_by_id.

If all ICs for a get_by_id concur that the operation is monomorphic and they tell us the structure to use, then the DFG frontend converts the get_by_id into inline code that does not get repatched. This is shown in Figure 21. Note that this simple get_by_id is lowered to two DFG operations: CheckStructure, which OSR exits if the given object does not have the required structure, and GetByOffset, which is just a load with known offset and field name.

Figure 21. Inlining a simple monomorphic inline cache in DFG and FTL.

CheckStructure and GetByOffset are understood precisely in DFG IR:

  • CheckStructure is a load to get the structure ID of an object and a branch to compare that structure ID to a constant. The compiler knows what structures are. After a CheckStructure, the compiler knows that it’s safe to execute loads to any of the properties that the structure says that the object has.
  • GetByOffset is a load from either an inline or out-of-line property of a JavaScript object. The compiler knows what kind of property is being loaded, what its offset is, and what the name of the property would have been.

The DFG knows all about how to model these operations and the dependency between them:

  • The DFG knows that neither operation causes a side effect, but that the CheckStructure represents a conditional side exit, and both operations read the heap.
  • The DFG knows that two CheckStructures on the same structure are redundant unless some operation between them could have changed object structure. The DFG knows a lot about how to optimize away redundant structure checks, even in cases where there is a function call between two of them (more on this later).
  • The DFG knows that two GetByOffsets that speak of the same property and object are loading from the same memory location. The DFG knows how to do alias analysis on those properties, so it can precisely know when a GetByOffset’s memory location got clobbered.
  • The DFG knows that if it wants to hoist a GetByOffset then it has to ensure that the corresponding CheckStructure gets hoisted first. It does this using abstract interpretation, so there is no need to have a dependency edge between these operations.
  • The DFG knows how to generate either machine code (in the DFG tier) or B3 IR (in the FTL tier) for CheckStructure and GetByOffset. In B3, CheckStructure becomes a Load, NotEqual, and Check, while GetByOffset usually just becomes a Load.

Figure 22. Inlining two monomorphic inline caches, for different properties on the same object, in DFG and FTL. The DFG and FTL are able to eliminate the CheckStructure for the second IC.

The biggest upshot of lowering ICs to CheckStructure and GetByOffset is the redundancy elimination. The most common redundancy we eliminate is multiple CheckStructures. Lots of code will do multiple loads from the same object, like:

var f = o.f;
var g = o.g;

With ICs, we would check the structure twice. Figure 22 shows what happens when the speculative compilers inline these ICs. We are left with just a single CheckStructure instead of two thanks to the fact that:

  • CheckStructure is an OSR speculation.
  • CheckStructure is not an IC. The compiler knows exactly what it does, so that it can model it, so that it can eliminate it.

Let’s pause to appreciate what this technique gives us so far. We started out with a language in which property accesses seem to need hashtable lookups. A o.f operation requires calling some procedure that is doing hashing and so forth. But by combining inline caches, structures, and speculative compilation we have landed on something where some o.f operations are nothing more than load-at-offset like they would have been in C++ or Java. But this assumes that the o.f operation was monomorphic. The rest of this section considers minimorphism, polymorphism, and polyvariance.

Minimorphism. Certain kinds of polymorphic accesses are easier to handle than others. Sometimes an access will see two or more structures but all of those structures have the property at the same offset. Other times an access will see multiple structures and those structures do not agree on the offset of the property. We say that an access is minimorphic if it sees more than one structure and all structures agree on the offset of the property.

Our inline caches handle all forms of polymorphism by generating a stub that switches on the structure. But in the DFG, minimorphic accesses are special because they still qualify for full inlining. Consider an access o.f that sees structures S1 and S2, and both agree that f is at offset 0. Then we would have:

CheckStructure(@o, S1, S2)
GetByOffset(@o, 0)

This minimorphic CheckStructure will OSR exit if @o has none of the listed structures. Our optimizations for CheckStructure generally work for both monomorphic and minimorphic variants. So, minimorphism usually doesn’t hurt performance much compared to monomorphism.

Polymorphism. But what if an access sees different structures, and those structures have the property at different offsets? Consider an access to o.f that sees structures S1 = {f, g}, S2 = {f, g, h}, and S3 = {g, f}. This would be a minimorphic access if it saw just S1 and S2, but S3 has f at a different offset. In this case, the FTL will convert this to:

MultiGetByOffset(@o, [S1, S2] => 0, [S3] => 1)

in DFG IR and then lower it to something like:

if (o->structureID == S1 || o->structureID == S2)
    result = o->inlineStorage[0]
else
    result = o->inlineStorage[1]

in B3 IR. In fact, we would use B3’s Switch since that’s the canonical form for this code pattern in B3.

Note that we only do this optimization in the FTL. The reason is that we want polymorphic accesses to remain ICs in the DFG so that we can use them to collect refined profiling.

Figure 23. Polyvariant inlining of an inline cache. The FTL can inline the inline cache in foo-inlined-into-bar after DFG compiles bar and uses an IC to collect polyvariant profiling about the get_by_id.

Polyvariance. Polyvariance is when an analysis is able to reason about a function differently depending on where it is called from. We achieve this by inlining in the DFG tier and keeping polymorphic ICs as ICs. Consider the following example. Function foo has an access to o.f that is polymorphic and sees structures S1 = {f, g}, S2 = {f, g, h}, and S3 = {g, f}:

function foo(o)
{
    // o can have structure S1, S2, or S3.
    return o.f;
}

This function is small, so it will be inlined anytime our profiling tells us that we are calling it (or may be calling it, since call inlining supports inlining polymorphic calls). Say that we have another function bar that always passes objects with structure S1 = {f, g} to foo:

function bar(p)
{
    // p.g always happens to have structure S1.
    return foo(p.g);
}

Figure 23 shows what happens. When the DFG compiles bar (step 3), it will inline foo based on the profiling of its call opcode (in step 2). But it will leave foo’s get_by_id as an IC because foo’s Baseline version told us that it’s polymorphic (also step 2). But then, since the DFG’s IC for foo’s get_by_id is in the context of that call from bar, it only ever sees S1 (step 4). So, when the FTL compiles bar and inlines foo, it knows that this get_by_id can be inlined with a monomorphic structure check for just S1 (step 5).

Inline caches also support more exotic forms of property access, like loading from objects in the prototype chain, calling accessors, adding/replacing properties, and even deleting properties.

Inline caches, structures, and garbage collection. Inline caching results in objects that are allocated and referenced only for inline caching. Structures are the most notorious example of these kinds of objects. Structures are particularly problematic because they need strong references to both the object’s prototype and its global object. In some cases, a structure will only be reachable from some inline cache, that inline cache will never run again (but we can’t prove it), and there is a large global object only referenced by that structure. It can be difficult to determine if that means that the structure has to be deleted or not. If it should be deleted, then the inline cache must be reset. If any optimized code inlined that inline cache, then that code must be jettisoned and recompiled. Fortunately, our garbage collector allows us to describe this case precisely. Since the garbage collector runs to fixpoint, we simply add the constraint that the pointer from an inline cache to a structure only marks the structure if the structure’s global object and prototype are already marked. Otherwise, the pointer behaves like a weak pointer. So, an inline cache will only be reset if the only way to reach the structure is through inline caches and the corresponding global object and prototype are dead. This is an example of how our garbage collector is engineered to make speculation easy.
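Here is a sketch of that marking constraint; the names and interfaces are illustrative, and the real constraint machinery is more general:

#include <vector>

// Assumed interfaces for this sketch.
struct JSCell;
struct Structure {
    JSCell* globalObject();
    JSCell* storedPrototype();
};
struct InlineCache { Structure* cachedStructure(); };
struct Heap {
    std::vector<InlineCache*>& inlineCaches();
    bool isMarked(void*);
    void mark(Structure*);
};

// Runs during marking, which iterates to a fixpoint, so this constraint is
// re-executed until it stops marking anything new.
void markStructuresReachableFromInlineCaches(Heap& heap)
{
    for (InlineCache* cache : heap.inlineCaches()) {
        Structure* structure = cache->cachedStructure();
        if (!structure || heap.isMarked(structure))
            continue;
        // The IC's reference is strong only if the structure's global object
        // and prototype are already live; otherwise it acts as a weak pointer
        // and the IC gets reset if the structure ends up dead.
        if (heap.isMarked(structure->globalObject()) && heap.isMarked(structure->storedPrototype()))
            heap.mark(structure);
    }
}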

To summarize, inline caching is an optimization employed by all of our tiers. In addition to making code run faster, inline caching is a high-precision profiling source that can tell us about the type cases that an operation saw. Combined with structures, inline caches allow us to turn dynamic property accesses into easy-to-optimize instructions.

Watchpoints

We allow inline caches and speculative compilers to set watchpoints on the heap. A watchpoint in JavaScriptCore is nothing more than a mechanism for registering for notification that something happened. Most watchpoints are engineered to trigger only the first time that something bad happens; after that, the watchpoint just remembers that the bad thing had ever happened. So, if an optimizing compiler wants to do something that is valid only if some bad thing never happened, and the bad thing has a watchpoint, the compiler just checks if the watchpoint is still valid (i.e. the bad thing hasn’t happened yet) and then associates its generated code with the watchpoint (so the code will only get installed if the watchpoint is still valid when the code is done getting compiled, and will be jettisoned as soon as the watchpoint is fired). The runtime allows for setting watchpoints on a large number of activities. The following stick out:

  • It’s possible to set a watchpoint on structures to get a notification whenever any object switches from that structure to another one. This only works for structures whose objects have never transitioned to any other structure. This is called a structure transition watchpoint. It establishes a structure as a leaf in the structure transition tree.
  • It’s possible to set a watchpoint on properties in a structure to get a notification whenever the property is overwritten. Overwriting a property is easy to detect because the first time this happens, it usually involves repatching a put_by_id inline cache so that it’s in the property replacement mode. This is called a property replacement watchpoint.
  • It’s possible to set a watchpoint on the mutability of global variables.
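Mechanically, a watchpoint can be sketched as follows; the names are illustrative and the real class supports more states than shown here:

#include <vector>

struct CompiledCode { void jettison(); };

class WatchpointSet {
public:
    bool isStillValid() const { return m_state != Invalidated; }

    // Compiler side: only call while isStillValid(); the code is thrown
    // away if the set fires later.
    void addDependentCode(CompiledCode* code) { m_dependents.push_back(code); }

    // Runtime side: called the first time the bad thing happens. After this,
    // the set just remembers that the bad thing ever happened.
    void fireAll()
    {
        if (m_state == Invalidated)
            return;
        m_state = Invalidated;
        for (CompiledCode* code : m_dependents)
            code->jettison();
        m_dependents.clear();
    }

private:
    enum State { Valid, Invalidated };
    State m_state { Valid };
    std::vector<CompiledCode*> m_dependents;
};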

Putting these watchpoints together gives the speculative compiler the ability to constant-fold object properties that happen to be immutable. Let’s consider a simple example:

Math.pow(42, 2)

Here, Math is a global property lookup. The base object is known to the compiler: it’s the global object that the calling code belongs to. Then, Math.pow is a lookup of the pow propery on the Math object. It’s extremely unlikely that the Math property of the global object or the pow property of the Math object had ever been overwritten. Both the global object and the Math object have structures that are unique to them (both because those structures have special magic since those are special objects and because those objects have what is usually a globally unique set of properties), which guarantees that they have leaf structures, so the structure transition watchpoint can be set. Therefore, except for pathological programs, the expression Math.pow is compiled to a constant by the speculative compiler. This makes lots of stuff fast:

  • It’s common to have named and scoped enumerations using objects and object properties, like TypeScript.NodeType.Error in the typescript compiler benchmark in JetStream 2. Watchpoints make those look like a constant to the speculative compiler.
  • Method calls like o.foo(things) are usually turned just into a structure check on o and a direct call. Once the structure is checked, watchpoints establish that the object’s prototype has a property called foo and that this property has some constant value.
  • Inline caches use watchpoints to remove some checks in their generated stubs.
  • The DFG can use watchpoints to remove redundant CheckStructures even when there is a side effect between them. If we set the structure transition watchpoint then we know that no effect can change the structure of any object that has this structure.
  • Watchpoints are used for lots of miscellaneous corner cases of JavaScript, like having a bad time.

To summarize, watchpoints let inline caches and the speculative compilers fold certain parts of the heap’s state to constants by getting a notification when things change.

Exit Flags

All of the profiling sources in our engine have a chance of getting things wrong. Profiling sources get things wrong because:

  • The program may change behavior between when we collected the profiling and when we speculated on it.
  • The profiling has some stochastic element and the program is getting unlucky, leading to wrong profiling.
  • The profiling source has a logic bug that makes it not able to see that something happened.
  • We neglected to implement a profiler for something and instead just speculated blind.

The first of these issues – behavior change over time – is inevitable and is sure to happen for some functions in any sufficiently large program. Big programs tend to experience phase changes, like some subroutine going from being called from one part of a larger library that uses one set of types, to being called from a different part with different types. Those things inevitably cause exits. The other three issues are all variants of the profiling being broken. We don’t want our profiling to be broken, but we’re only human. Recall that for speculation to have good EV, the probability of being right has to be about 1. So, it’s not enough to rely on profiling that was written by imperfect lifeforms. Exit flags are a check on the rest of the profiling and are there to ensure that we get things right eventually for all programs.

In JavaScriptCore, every OSR exit is tagged with an exit kind. When a DFG or FTL function exits enough times to get jettisoned, we record all of the exit kinds that happened along with the bytecode locations that semantically caused the exits (for example if we do a type check for add at bytecode #63 but then hoist the check so that it ends up exiting to bytecode #45, then we will blame #63 not #45). Whenever the DFG or FTL decide whether to perform a kind of speculation, they are expected to check whether there is an exit flag for that speculation at the bytecode that we’re compiling. Our exit flag checking discipline tends to be strictly better than our profiling discipline, and it’s way easier to get right — every phase of the DFG has fast access to exit flags.

Here’s an example of an actual OSR exit check in DFG:

speculationCheck(
    OutOfBounds, JSValueRegs(), 0,
    m_jit.branch32(
        MacroAssembler::AboveOrEqual,
        propertyReg,
        MacroAssembler::Address(storageReg, Butterfly::offsetOfPublicLength())));

Note that the first argument is OutOfBounds. That’s an example exit kind. Here’s another example, this time from the FTL:

speculate(NegativeZero, noValue(), nullptr, m_out.lessThan(left, m_out.int32Zero));

Again, the first argument is the exit kind. This time it’s NegativeZero. We have 26 exit kinds, most of which describe a type check condition (some are used for other uses of OSR, like exception handling).

We use the exit kinds by querying if an exit had happened at the bytecode location we are compiling when choosing whether to speculate. We typically use the presence of an exit flag as an excuse not to speculate at all for that bytecode. We effectively allow ourselves to overcompensate a bit. The exit flags are a check on the rest of the profiler. They are telling the compiler that the profiler had been wrong here before, and as such, shouldn’t be trusted anymore for this code location.
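As an illustration, a compiler phase deciding whether to speculate int32 for an add might consult the flags like this; the interfaces shown are assumed for the sketch:

// Assumed interfaces for this sketch of exit-flag querying.
enum ExitKind { Overflow, NegativeZero /* ...and two dozen more */ };

struct CodeOrigin;
struct Node {
    struct { CodeOrigin* semantic; } origin;
    Node* child1();
    Node* child2();
    bool shouldSpeculateInt32();
};
struct Graph {
    bool hasExitSite(CodeOrigin*, ExitKind);
};

// Don't bet on int32 math at a location where an Overflow exit has fired:
// the profiler was wrong here before, so it shouldn't be trusted again.
bool shouldSpeculateInt32Add(Graph& graph, Node* node)
{
    if (graph.hasExitSite(node->origin.semantic, Overflow))
        return false;
    return node->child1()->shouldSpeculateInt32()
        && node->child2()->shouldSpeculateInt32();
}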

Summary of Profiling

JavaScriptCore’s profiling is designed to be cheap and useful. Our best profiling sources tend to either involve minimal instrumentation (like just setting a flag or storing a value to a known location) or be intertwined with optimizations (like inline caching). Our profilers gather lots of rich information and in some cases we even collect information redundantly. Our profiling is designed to help us avoid making speculative bets that turn out to be wrong even once.

Compilation and OSR

Now that we have covered bytecode, control, and profiling, we can get to the really fun part: how to build a great speculative optimizing compiler. We will discuss the OSR aspect of speculation in tandem with our descriptions of the two optimizing compilers.

This section is organized into three parts. First we give a quick and gentle introduction to DFG IR, the intermediate representation used by both the DFG and FTL tiers. Then we describe the DFG tier in detail, including how it handles OSR. Finally we describe how the FTL tier works.

DFG IR

The most important component of a powerful optimizing compiler is the IR. We want to have the best possible speculative optimizing compiler for JavaScript, so we have the following goals for our IR:

  • The IR has to describe all of the parts of the program that are interesting to the optimizer. Like other high quality optimizing IRs, DFG IR has good support for talking about data flow, aliasing, effects, control flow, and debug information. Additionally, it’s also good at talking about profiling data, speculation decisions, and OSR.
  • The IR has to be mutable. Anything that is possible to express when first lowering a program to the IR should also be expressible during some later optimization. We prefer that decisions made during lowering to the IR can be refined by optimizations later.
  • The IR has to have some validation support. It’s got to be possible to catch common mistakes in a validator instead of debugging generated code.
  • The IR has to be purpose-built. If there exists an optimization whose most comprehensive implementation requires a change to the IR or one of its core data structures, then we need to be able to make that change without asking anyone for permission.

Note that IR mutability is closely tied to how much it describes and how easy it is to validate. Any optimization that tries to transform one piece of code into a different, better, piece of code needs to be able to determine if the new code is a valid replacement for the old code. Generally, the more information the IR carries and the easier it is to validate, the easier it is to write the analyses that guard optimizations.

Let’s look at what the DFG IR looks like using a simple example:

function foo(a, b)
{
    return a + b;
}

This results in bytecode like:

[   0] enter             
[   1] get_scope         loc3
[   3] mov               loc4, loc3
[   6] check_traps       
[   7] add               loc6, arg1, arg2
[  12] ret               loc6

Note that only the last two lines (add and ret) are important. Let’s look at the DFG IR that we get from lowering those two bytecode instructions:

  23:  GetLocal(Untyped:@1, arg1(B<Int32>/FlushedInt32), R:Stack(6), bc#7)
  24:  GetLocal(Untyped:@2, arg2(C<BoolInt32>/FlushedInt32), R:Stack(7), bc#7)
  25:  ArithAdd(Int32:@23, Int32:@24, CheckOverflow, Exits, bc#7)
  26:  MovHint(Untyped:@25, loc6, W:SideState, ClobbersExit, bc#7, ExitInvalid)
  28:  Return(Untyped:@25, W:SideState, Exits, bc#12)

In this example, we’ve lowered the add opcode to four operations: two GetLocals to get the argument values from the stack (we load them lazily and this is the first operation that needs them), a speculative ArithAdd instruction, and a MovHint to tell the OSR part of the compiler about the ArithAdd. The ret opcode is just lowered to a Return.

In DFG jargon, the instructions are usually called nodes, but we use the terms node, instruction, and operation interchangeably. DFG nodes are simultaneously nodes in a data flow graph and instructions inside of a control flow graph, with semantics defined “as if” they executed in a particular order.

Figure 24. Explanation of an example ArithAdd DFG instruction.

Let’s consider the ArithAdd in greater detail (Figure 24). This instruction is interesting because it’s exactly the sort of thing that the DFG is designed to optimize: it represents a JavaScript operation that is dynamic and impure (it may call functions) but here we have inferred it to be free of side effects using the Int32: type speculations. These indicate that, before doing anything else, this instruction will check that its inputs are Int32’s. Note that the type speculations of DFG instructions should be understood like function overloads. ArithAdd also allows for both operands to be double or other kinds of integer. It’s as if ArithAdd was a C++ function that had overloads that took a pair of integers, a pair of doubles, etc. It’s not possible to add just any type speculation to any operand, since that may result in an instruction overload that isn’t supported.
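Spelling that overload analogy out, purely as an analogy rather than engine code:

#include <cstdint>

// ArithAdd's type speculations behave like C++ overloads: each speculation
// selects one of these, and a combination with no overload is not allowed.
int32_t arithAdd(int32_t left, int32_t right); // Int32:@a, Int32:@b (with overflow check)
int64_t arithAdd(int64_t left, int64_t right); // Int52Rep:@a, Int52Rep:@b
double  arithAdd(double left, double right);   // DoubleRep:@a, DoubleRep:@b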

Another interesting feature of this ArithAdd is that it knows exactly which bytecode instruction it originated from and where it will exit to. These are separate fields in the IR (the semantic and forExit origins) but when they are equal we dump them as one, bc#7 in the case of this instruction.

Any DFG node that may exit will have the Exits flag. Note that we set this flag conservatively. For example, the Return in our example has it set not because Return exits but because we haven’t found a need to make the exit analysis any more precise for that instruction.

Figure 25. Example data flow graph.

DFG IR can be simultaneously understood as a sequence of operations that should be performed as if in the given order and as a data flow graph with backwards pointers. The data flow graph view of our running example is shown in Figure 25. This view is useful since lots of optimizations are concerned with asking questions like: “what instructions produce the values consumed by this instruction?” These data flow edges are the main way that values move around in DFG IR. Also, representing programs this way makes it natural to add SSA form, which we do in the FTL.

Figure 26. DFG and FTL compiler architecture. The pass pipeline depicted above the dotted line is shared between the DFG and FTL compilers. Everything below the dotted line is specialized for DFG or FTL.

DFG IR, in both its non-SSA and SSA variants, forms the bulk of the DFG and FTL compilers. As shown in Figure 26, both JITs share the same frontend for parsing bytecode and doing some optimizations. The difference is what happens after the DFG optimizer. In the DFG tier, we emit machine code directly. In the FTL tier, we convert to DFG SSA IR (which is almost identical to DFG IR but uses SSA to represent data flow) and do more optimizations, and then lower through two additional optimizers (B3 and Assembly IR or Air). The remaining sections talk about the DFG and FTL compilers. The section on the DFG compiler covers the parts of DFG and FTL that are common.

DFG Compiler

The point of the DFG compiler is to remove lots of type checks quickly. Fast compilation is the DFG feature that differentiates it from the FTL. To get fast compilation, the DFG lacks SSA, can only do very limited code motion, and uses block-local versions of most optimizations (common subexpression elimination, register allocation, etc). The DFG has two focus areas where it does a great job despite compiling quickly: how it handles OSR and how it uses static analysis.

This section explains the DFG by going into these three concepts in greater detail:

  • OSR exit as a first-class concept in the compiler.
  • Static analysis as the main driver of optimization.
  • Fast compilation so that we get the benefits of optimization as soon as possible.

OSR Exit

OSR is all about flattening control flow by making failing checks exit sideways. OSR is a difficult optimization to get right. It’s especially difficult to reason about at a conceptual level. This section tries to demystify OSR exit. We’re going to explain the DFG compiler’s approach to OSR, which includes both parts that are specific to the DFG tier and parts that are shared with the FTL. The FTL section explains extensions to this approach that we use to do more aggressive optimizations.

Our discussion proceeds as follows. First we use a high-level example to illustrate what OSR exit is all about. Then we describe what OSR exit means at the machine level, which will take us into the details of how optimizing compilers handle OSR. We will show a simple OSR exit IR idea based on stackmaps to give a sense of what we’re trying to achieve and then we describe how DFG IR compresses stackmaps. Finally we talk about how OSR exit is integrated with watchpoints and invalidation.

High-level OSR example. To start to demystify DFG exit, let’s think of it as if it was an optimization we were doing to a C program. Say we had written code like:

int foo(int* ptr)
{
    int w, x, y, z;
    w = ... // lots of stuff
    x = is_ok(ptr) ? *ptr : slow_path(ptr);
    y = ... // lots of stuff
    z = is_ok(ptr) ? *ptr : slow_path(ptr);
    return w + x + y + z;
}

Let’s say we wanted to optimize out the second is_ok check. We could do that by duplicating all of the code after the first is_ok check, and having one copy statically assume that is_ok is true while another copy either assumes it’s false or makes no assumptions. This might make the fast path look like:

int foo(int* ptr)
{
    int w, x, y, z;
    w = ... // lots of stuff
    if (!is_ok(ptr))
        return foo_base1(ptr, w);
    x = *ptr;
    y = ... // lots of stuff
    z = *ptr;
    return w + x + y + z;
}

Where foo_base1 is the original foo function after the first is_ok check. It takes the live state at that point as an argument and looks like this:

int foo_base1(int* ptr, int w)
{
    int x, y, z;
    x = is_ok(ptr) ? *ptr : slow_path(ptr);
    y = ... // lots of stuff
    z = is_ok(ptr) ? *ptr : slow_path(ptr);
    return w + x + y + z;
}

What we’ve done here is OSR exit. We’re optimizing control flow on the fast path (removing one is_ok check) by exiting (tail-calling foo_base1) if !is_ok. OSR exit requires:

  • Somewhere to exit, like foo_base1 in this case. It should be a thing that can complete execution of the current function without getting stuck on the same speculation.
  • The live state at exit, like ptr and w in this case. Without that, the exit target can’t pick up where we left off.

That’s OSR exit at a high level. We’re trying to allow an optimizing compiler to emit checks that exit out of the function on failure so that the compiler can assume that the same check won’t be needed later.

OSR at the machine level. Now let’s look at what OSR exit looks like at a lower level. Figure 27 shows an example of OSR at a particular bytecode index.

Figure 27. OSR exit at the machine level for an example bytecode instruction.

OSR is all about replacing the current stack frame and register state, which correspond to some bytecode index in the optimizing tier, with a different frame and register state, which correspond to the same point in the profiling tier. This is all about shuffling live data from one format to another and jumping to the right place.

Knowing where to jump to is easy: each DFG node (aka instruction or operation) has a forExit, or just exit, origin that tells us which bytecode location to exit to. This may even be a bytecode stack in case of inlining.

The live data takes a bit more effort. We have to know what the set of live data is and what its format is in both the profiling and optimizing tiers. It turns out that knowing what the set of live data is and how to represent it for the profiling tiers is easy, but extracting that data from the optimizing tier is hard.

First let’s consider what’s live. The example in Figure 27 says that we’re exiting at an add and it has loc3, loc4, and loc8 live before. We can solve for what’s live at any bytecode instruction by doing a liveness analysis. JavaScriptCore has an optimized bytecode liveness analysis for this purpose.

Note that the frame layout in the profiling tier is an orderly representation of the bytecode state. In particular, locN just means framePointer - 8 * N and argN just means framePointer + FRAME_HEADER_SIZE + 8 * N, where FRAME_HEADER_SIZE is usually 40. The only difference between frame layouts between functions in the profiling tier is the frame size, which is determined by a constant in each bytecode function. Given the frame pointer and the bytecode virtual register name, it’s always possible to find out where on the stack the profiling tiers would store that variable. This makes it easy to figure out how to convert any bytecode live state to what the Baseline JIT or LLInt would expect.
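Those layout rules can be written down directly; this sketch assumes 64-bit slots and the usual FRAME_HEADER_SIZE of 40:

#include <cstdint>

static const intptr_t FRAME_HEADER_SIZE = 40; // usual value per the text

// locN lives below the frame pointer...
uint64_t* addressOfLocal(uint8_t* framePointer, intptr_t n)
{
    return reinterpret_cast<uint64_t*>(framePointer - 8 * n);
}

// ...and argN lives above the frame header.
uint64_t* addressOfArgument(uint8_t* framePointer, intptr_t n)
{
    return reinterpret_cast<uint64_t*>(framePointer + FRAME_HEADER_SIZE + 8 * n);
}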

The hard part is the optimizing tier’s state. The optimizing compiler might:

  • Allocate the stack in any order. Even if a variable is on the stack, it may be anywhere.
  • Register-allocate a variable. In that case there may not be any location on the stack that contains the value of that variable.
  • Constant-fold a variable. In that case there may not be any location on the stack or in the register file that contains the value of that variable.
  • Represent a variable’s value in some creative way. For example, your program might have had a statement like x = y + z but the compiler chose to never actually emit the add except lazily at points of use. This can easily happen because of pattern-matching instruction selection on x86 or ARM, where some instructions (like memory accesses) can do some adds for free as part of address computation. We do an even more aggressive version of this for object allocations: some program variable semantically points to an object, but because our compiler is smart, we never actually allocated any object and the object’s fields may be register-allocated, constant-folded, or represented creatively.

We want to allow the optimizing compiler to do things like this, since we want OSR exit to be an enabler of optimization rather than an inhibitor. This turns out to be tricky: how do we let the optimizing compiler do all of the optimizations that it likes to do while still being able to tell us how to recover the bytecode state?

The trick to extracting the optimized-state-to-bytecode-state shuffle from the optimizing compiler is to leverage the original bytecode→IR conversion. The main difference between an SSA-like IR (like DFG IR) and bytecode is that it represents data flow relationships instead of variables. While bytecode says add x, y, z, DFG IR would have an Add node that points to the nodes that produced y and z (like in Figure 25). The conversion from bytecode to DFG IR looks like this pseudocode:

case op_add: {
    VirtualRegister result = instruction->result();
    VirtualRegister left   = instruction->left();
    VirtualRegister right  = instruction->right();

    stackMap[result] = createAdd(
        stackMap[left], stackMap[right]);
    break;
}

This uses a standard technique for converting variable-based IRs to data-flow-based IRs: the converter maintains a mapping from variables in the source IR to data flow nodes in the target IR. We’re going to call this the stackMap for now. Each bytecode instruction is handled by modeling the bytecode’s data flow: we load the left and right operands from the stackMap, which gives us the DFG nodes for those locals’ values. Then we create an ArithAdd node and store it into the result local in the stackMap to model the fact that the bytecode wanted to store the result to that local. Figure 28 shows the before-and-after of running this on the add bytecode in our running example.

Figure 28. Example of stackMap before and after running the SSA conversion on add at bc#42 along with an illustration of the data flow graph around the resulting ArithAdd.

The stackMap, pruned to bytecode liveness as we are doing in these examples, represents the set of live state that would be needed to be recovered at any point in bytecode execution. It tells us, for each live bytecode local, what DFG node to use to recover the value of that local. A simple way to support OSR would be to give each DFG node that could possibly exit a data flow edge to each node in the liveness-pruned stackMap.

This isn’t what the DFG actually does; DFG nodes do not have data flow edges for the stackmap. Doing literally that would be too costly in terms of memory usage since basically every DFG node may exit and stackmaps have O(live state) entries. The DFG’s actual approach is based on delta-compression of stackmaps. But it’s worth considering exactly how this uncompressed stackmap approach would work because it forms part of the FTL’s strategy and it gives a good mental model for understanding the DFG’s more sophisticated approach. So, we will spend some time describing the DFG IR as if it really did have stackmaps. Then we will show how the stackmap is expressed using delta compression.

OSR exit with uncompressed stackmaps. Imagine that DFG nodes really had extra operands for the stackmap. Then we would have an ArithAdd like the following, assuming that bc#42 is the exit origin and that loc3, loc4, and loc8 are live, as they are in Figures 27 and 28:

c: ArithAdd(@a, @b, loc3->@s, loc4->@a, loc8->@b, bc#42)

In this kind of IR, we’d let the first two operands of ArithAdd behave the expected way (they are the actual operands to the add), and we’d treat all of the other operands as the stackmap. The exit origin, bc#42, is a control flow label. Together, this tells the ArithAdd where to exit (bc#42) and the stackmap (@s, @a, and @b). The compiler treats the ArithAdd, and the stackmap operands, as if the ArithAdd had a side exit from the function the compiler was compiling.

One way to think about it is in terms of C pseudocode. We are saying that the semantics of ArithAdd and any other instruction that may exit are as if they did the following before any of their effects:

if (some conditions)
    return OSRExit(bc#42, {loc3: @s, loc4: @a, loc8: @b});

Where the return statement is an early return from the compiled function. So, this terminates the execution of the compiled function by tail-calling (jumping to) the OSRExit. That operation will transfer control to bc#42 and pass it the given stackmap.

Figure 29. Example of control flow in a compiler with OSR exit. OSR exit means having an additional implicit set of control flow edges that come out of almost every instruction and represent a side exit from the control flow graph.

This is easy to model in a compiler. We don’t allocate any kind of control flow constructs to represent the condition check and side exit but we assume it to exist implicitly when analyzing ArithAdd or any other node that may exit. Note that in JavaScript, basically every instruction is going to possibly exit, and JavaScriptCore’s may exit analysis defaults to true for most operations. Figure 29 illustrates what this looks like. We are going to have three kinds of control flow edges instead of the usual two:

  1. The normal control flow edges between basic blocks. This is what you normally think of as “control flow”. These edges are explicitly represented in the IR, as in, there is an actual data structure (usually vector of successors and vector of predecessors) that each block uses to tell what control flow edges it participates in.
  2. The implicit fall-through control flow for instructions within blocks. This is standard for compilers with basic blocks.
  3. A new kind of control flow edge due to OSR, which goes from instructions in blocks to OSR exit landing sites. This means changing the definition of basic blocks slightly. Normally the only successors of basic blocks are the ones in the control flow graph. Our basic blocks have a bunch of OSR exit successors as well. Those successors don’t exist in the control flow graph, but we have names for them thanks to the exit origins found in the exiting instructions. The edges to those exit origins exit out of the middle of blocks, so they may terminate the execution of blocks before the block terminal.

The OSR landing site is understood by the compiler as having the following behaviors:

  • It ends execution of this function in DFG IR. This is key, since it means that there is no merge point in our control flow graph that has to consider the consequences of exit.
  • It possibly reads and writes the whole world. The DFG has to care about the reads (since they may observe whatever happened just before the exit) but not the writes (since they affect execution after execution exited DFG).
  • It reads some set of values, namely those passed as the stackmap.

This understanding is abstract, so the compiler will just assume the worst case (after exit every location in memory is read and written and all of the bits in all of the values in the stackmap are etched into stone).

This approach is great because it allows precise reconstruction of baseline state when compiling OSR exit and it mostly doesn’t inhibit optimization because it “only” involves adding a new kind of implicit control flow edge to the control flow graph.

This approach allows for simple reconstruction of state at exit because the backend that compiles the DFG nodes would have treated the stackmap data flow edges (things like loc3->@s in our example) the same way it would have treated all other edges. So, at the ArithAdd, the backend would know which registers, stack slots, or constant values to use to materialize the stackmap values. It would know how to do this for the same reason that it would know how to materialize the two actual add operands.

If we survey the most common optimizations that we want the compiler to do, we find that only one major optimization is severely inhibited by this approach to OSR exit. Let’s first review the optimizations this doesn’t break. It’s still possible to perform common subexpression elimination (CSE) on ArithAdd. It’s still possible to hoist it out of loops, though if we do that then we have to edit the exit metadata (the exit destination and stackmap will have to be overwritten to be whatever they are at the loop pre-header). It’s still possible to model the ArithAdd to be pure in lots of the ways that matter, like that if there are two loads, one before and one after the ArithAdd, then we can assume them to be redundant. The ArithAdd could only cause effects on the exit path, in which case the second load doesn’t matter. It’s still possible to eliminate the ArithAdd if it’s unreachable.

The only thing we cannot easily do is what compilers call dead code elimination, i.e. the elimination of instructions if their results are not used. Note that the compiler terminology is confusing here. Outside the compiler field we use the term dead code to mean something that compilers call unreachable code. Code is unreachable if control flow doesn’t reach it and so it doesn’t execute. Outside the compiler field, we would say that such code is dead. It’s important that compilers be able to eliminate unreachable code. Happily, our approach to OSR has no impact on unreachable code elimination. What compilers call dead code is code that is reached by control flow (so live in the not-compiler sense) but that produces a result that no subsequent code uses. Here’s an example of dead code in the compiler sense:

int tmp = a + b;
// nobody uses tmp.

Dead code elimination (DCE) is the part of a compiler that removes this kind of code. Dead code elimination doesn’t quite work for the ArithAdd because:

  • ArithAdd’s speculation checks must be assumed live even if the result of the add is unused. We may do some optimization to a later check because we find that it is subsumed by checks done by this ArithAdd. That’s a pretty fundamental optimization that we do for OSR checks and it’s the reason why OSR ultimately flattens control flow. But we don’t bother recording whenever this ArithAdd’s check is used to unlock a later optimization, so we have to assume that some later operation is already depending on the ArithAdd doing all of its checks. This means that if the result of some operation A is used by a dead operation B, then B still has to do whatever checks it was doing on its inputs, which keeps A alive even though B is dead. This is particularly devastating for ArithAdd, since ArithAdd usually does an overflow check. You have to do the add to check overflow. So, ArithAdd is never really dead. Consider the alternative: if we did not consider the ArithAdd’s overflow check’s effect on abstract state, then we wouldn’t be able to do our range analysis, which uses the information inferred from overflow checks to remove array bounds checks and vice versa.
  • The ArithAdd is almost sure to end up in the stackmap of some later operation, as is basically every node in the DFG program, unless the node represents something that was dead in bytecode. Being dead in bytecode is particularly unlikely because in bytecode we must assume that everything is polymorphic and possibly effectful. Under those assumptions the add is really not dead: it might be a loop with function calls, after all.

The DFG and FTL still do DCE, but it’s hard and usually only worth the effort for the most expensive constructs. We support decaying an operation just to its checks, for those rare cases where we can prove that the result is not used. We also support sinking to OSR, where an operation is replaced by a phantom version of itself that exists only to tell OSR how to perform the operation for us. We mainly use this complex feature for eliminating object allocations.

To summarize the effect on optimizations: we can still do most of the optimizations. The optimization most severely impacted is DCE, but even there, we have found ways to make it work for the most important cases.

The only real downside of this simple approach is repetition: almost every DFG operation may exit and the state at exit may easily have tens or hundreds of variables, especially if we have done significant inlining. Storing the stackmap in each DFG node would create a case of O(n²) explosion in memory usage and processing time within the compiler. Note that the fact that this explosion happens is somewhat of a JavaScript-specific problem, since JavaScript is unusual in the sheer number of speculations we have to make per operation (even simple ones like add or get_by_id). If the speculations were something we did seldom, like in Java where they are mostly used for virtual calls, then the simple approach would be fine.

Stackmap compression in DFG IR. Our solution to the size explosion of repeated stackmaps is to use a delta encoding. The stackmaps don’t change much. In our running example, the add just kills loc8 and defines loc7. The kill can be discovered by analyzing bytecode, so there’s no need to record it. All we have to record about this operation is that it defines loc7 to be the ArithAdd node.

We use an operation called MovHint as our delta encoding. It tells which bytecode variable is defined by which DFG node. For example, let’s look at the MovHint we would emit for the add in Figure 28:

c: ArithAdd(@a, @b, bc#42)
   MovHint(@c, loc7, bc#42)

We need to put some care into how we represent MovHints so that they are easy to preserve and modify. Our approach is two-fold:

  • We treat MovHint as a store effect.
  • We explicitly label the points in the IR where we expect it to be valid to exit based on the state constructed out of the MovHint deltas.

Let’s first look at how we use the idea of store effects to teach the compiler about MovHint. Imagine a hypothetical DFG IR interpreter and how it would do OSR exit. The key idea is that in that interpreter, the state of the DFG program comprises not just the mapping from DFG nodes to their values, but also an OSR exit state buffer containing values indexed by bytecode variable name. That OSR exit state buffer contains exactly the stack frame that the profiling tiers would use. MovHint’s interpreter semantics are to store the value of its operand into some slot in the OSR exit state buffer. This way, the DFG interpreter is able to always maintain an up-to-date bytecode stack frame in tandem with the optimized representation of program state. Although no such interpreter exists, we make sure that the way we compile MovHint produces something with semantics consistent with what this interpreter would have done.
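
As a thought experiment, MovHint’s semantics in that hypothetical interpreter could be sketched as follows (hypothetical types; as noted, no such interpreter actually exists):

#include <cstdint>
#include <unordered_map>

using NodeId = uint32_t;
using VirtualRegister = int32_t; // a bytecode variable like loc7
using JSValueBits = uint64_t;    // stand-in for a real JSValue

struct DFGInterpreterState {
    std::unordered_map<NodeId, JSValueBits> nodeValues;             // DFG node -> value
    std::unordered_map<VirtualRegister, JSValueBits> osrExitBuffer; // bytecode frame
};

// MovHint(@node, reg): store the node's current value into the OSR exit
// state buffer, keeping the bytecode stack frame up to date in tandem
// with the optimized program state.
void interpretMovHint(DFGInterpreterState& state, NodeId node, VirtualRegister reg)
{
    state.osrExitBuffer[reg] = state.nodeValues.at(node);
}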

MovHint is not compiled to a store. But any phase operating on MovHints or encountering MovHints just needs to understand it as a store to some abstract location. The fact that it’s a store means that it’s not dead code. The fact that it’s a store means that it may need to be ordered with other stores or loads. Lots of desirable properties we need for soundly preserving MovHints across compiler optimizations fall out naturally from the fact that we tell all the phases that it’s just a store.

The compiler emits zero code for MovHint. Instead, we use a reaching defs analysis of MovHints combined with a bytecode liveness analysis to rebuild the stackmaps that we would have had if each node carried a stackmap. We perform this analysis in the backend and as part of any optimization that needs to know what OSR is doing. In the DFG tier, the reaching defs analysis happens lazily (when the OSR exit actually occurs — so could be long after the DFG compiled the code), which ensures that the DFG never experiences the O(n²) blow-up of stackmaps. OSR exit analysis is not magical: in the “it’s just a store” model of MovHint, this analysis reduces to load elimination.
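
The shape of that lazy reconstruction might look something like this sketch: scan backward from the exit, taking the most recent MovHint for each live bytecode variable. The names are hypothetical, and the real analysis also follows the data flow across blocks.

#include <cstddef>
#include <unordered_map>
#include <unordered_set>
#include <vector>

struct Node {
    bool isMovHint;
    int operandReg; // the bytecode variable this MovHint writes, e.g. loc7
    Node* source;   // the DFG node whose value it records
};

// Reconstruct the stackmap for an exit at position exitIndex in a block:
// for each live bytecode variable, the reaching definition is the most
// recent preceding MovHint to it.
std::unordered_map<int, Node*> reconstructStackmap(
    const std::vector<Node*>& block, size_t exitIndex,
    const std::unordered_set<int>& liveAtExit)
{
    std::unordered_map<int, Node*> stackmap; // bytecode variable -> DFG node
    for (size_t i = exitIndex; i-- > 0;) {
        Node* node = block[i];
        if (node->isMovHint && liveAtExit.count(node->operandReg)
            && !stackmap.count(node->operandReg))
            stackmap[node->operandReg] = node->source;
    }
    // Variables still missing are defined in predecessor blocks; the real
    // analysis continues through the global data flow (omitted here).
    return stackmap;
}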

DFG IR’s approach to OSR means that OSR exit is possible at some points in DFG IR and not at others. Consider some examples:

  • A bytecode instruction may define multiple bytecode variables. When lowered to DFG IR, we would have two or more MovHints. It’s not possible to have an exit between those MovHints, since the OSR exit state is only partly updated at that point.
  • It’s not possible to exit after a DFG operation that does an observable effect (like storing to a JS object property) but before its corresponding MovHint. If we exit to the current exit origin, we’ll execute the effect again (which is wrong), but if we exit to the next exit origin, we’ll neglect to store the result into the right bytecode variable.

We need to make it easy for DFG transformations to know if it’s legal to insert operations that may exit at any point in the code. For example, we may want to write instrumentation that adds a check before every use of @x. If that use is a MovHint, then we need to know that it may not be OK to add that check right before that MovHint. Our approach to this is based on the observation that the lowering of a bytecode instruction produces two phases of execution in DFG IR of that instruction:

  • The speculation phase: at the start of execution of a bytecode, it’s both necessary and possible to speculate. It’s necessary to speculate since those speculations guard the optimizations that we do in the subsequent DFG nodes for that bytecode instruction. It’s possible to speculate because we haven’t done any of the instruction’s effects, so we can safely exit to the start of that bytecode instruction.
  • The effects phase: as soon as we perform any effect, we are no longer able to do any more speculations. That effect could be an actual effect (like storing to a property or making a call) or an OSR effect (like MovHint).

To help validate this, all nodes in DFG IR have an exitOK flag that they use to record whether they think that they are in the speculative phase (exitOK is true) or if they think that they might be in the effects phase (exitOK is false). It’s fine to say that exitOK is false if we’re not sure, but to say exitOK is true, we have to be completely sure. The IR validation checks that exitOK must become false after operations that do effects, that it becomes true again only at prescribed points (like a change in exit origin suggesting that we’ve ended the effects phase of one instruction and begun the speculation phase of the next one), and that no node that may exit has exitOK set to false. This validator helps prevent errors, like when dealing with bytecode operations that can be lowered to multiple effectful DFG nodes. One example is when put_by_id (i.e. something like o.f = v) is inferred to be a transition (the property f doesn’t exist on o so we need to add it), which results in two effects:

  • Storing a value v into the memory location for property o.f.
  • Changing o’s structure to indicate that it now has an f.

The DFG IR for this will look something like:

CheckStructure(@o, S1)
PutByOffset(@o, @v, f)
PutStructure(@o, S2, ExitInvalid)

Note that PutStructure will be flagged with ExitInvalid, which is the way we say that exitOK is false in IR dumps. Failing to set exitOK to false for PutStructure would cause a validation error since PutByOffset (right before it) is an effect. This prevents us from making mistakes like replacing all uses of @o with some operation that could speculate, like:

a: FooBar(@o, Exits)
   CheckStructure(@a, S1)
b: FooBar(@o, Exits)
   PutByOffset(@b, @v, f)
c: FooBar(@o, Exits)
   PutStructure(@c, S2, ExitInvalid)

In this example, we’ve used some new FooBar operation, which may exit, as a filter on @o. It may seem absurd to instrument code this way, but it is a goal of DFG IR to:

  • Allow replacing uses of nodes with uses of other nodes that produce an equivalent value. Let’s assume that FooBar is an identity that also does some checks that may exit.
  • Allow inserting new nodes anywhere.

Therefore, the only bug here is that @c is right after the PutByOffset. The validator will complain that it is not marked ExitInvalid. It should be marked ExitInvalid because the previous node (PutByOffset) has an effect. But if you add ExitInvalid to @c, then the validator will complain that a node may exit with ExitInvalid. Any phase that tries to insert such a FooBar would have all the API it needs to realize that it will run into these failures. For example, it could ask the node that it’s inserting itself in front of (the PutStructure) whether it has ExitInvalid. Since it is ExitInvalid, we could do either of these things instead of inserting @c just before the PutStructure:

  1. We could use some other node that does almost what FooBar does but without the exit.
  2. We could insert @c earlier, so it can still exit.

Let’s look at what the second option would look like:

a: FooBar(@o, Exits)
   CheckStructure(@a, S1)
b: FooBar(@o, Exits)
c: FooBar(@o, Exits)
   PutByOffset(@b, @v, f)
   PutStructure(@c, S2, ExitInvalid)

Usually this is all it takes to deal with regions of code with !exitOK.
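
A simplified sketch of these validation rules, with hypothetical flags standing in for what the real IR validation tracks:

#include <cassert>
#include <vector>

struct Node {
    unsigned exitOrigin; // which bytecode instruction this node exits to
    bool exitOK;         // node claims we are still in the speculation phase
    bool mayExit;        // node might take an OSR exit
    bool hasEffects;     // node performs a real or OSR effect (e.g. MovHint)
};

void validateBlock(const std::vector<Node>& block)
{
    bool exitOK = true; // each exit origin starts in the speculation phase
    unsigned currentOrigin = block.empty() ? 0 : block.front().exitOrigin;
    for (const Node& node : block) {
        if (node.exitOrigin != currentOrigin) {
            // New bytecode instruction: effects phase ends, speculation
            // phase begins again.
            currentOrigin = node.exitOrigin;
            exitOK = true;
        }
        // exitOK may only be claimed while it is actually true.
        assert(!node.exitOK || exitOK);
        // A node that may exit must not be in the effects phase.
        assert(!node.mayExit || node.exitOK);
        if (node.hasEffects)
            exitOK = false; // anything after this cannot speculate
    }
}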

Note that in cases where something like FooBar absolutely needs to do a check after an effect, DFG IR does support exiting into the middle of a bytecode instruction. In some cases, we have no choice but to use that feature. This involves introducing extra non-bytecode state that can be passed down OSR exit, issuing OSR exit state updates before/after effects, using an exit origin that indicates that we’re exiting to some checkpoint in the middle of a bytecode instruction’s execution, and implementing a way to execute a bytecode starting at a checkpoint during OSR exit. It’s definitely possible, but not the sort of thing we want to have to do every time that some DFG node needs to do an effect. For this reason, canonical DFG IR use implies having !exitOK phases (aka effect phases) during some bytecode instructions’ execution.

Watchpoints and Invalidation. So far we have considered OSR exit for checks that the compiler emits. But the DFG compiler is also allowed to speculate by setting watchpoints in the JavaScript heap. If it finds something desirable — like that Math.sqrt points to the sqrt intrinsic function — it can often incorporate it into optimization without emitting checks. All that is needed is for the compiler to set a watchpoint on what it wants to prove (that the Math and sqrt won’t change). When the watchpoint fires, we want to invalidate the compiled code. That means making it so that the code never runs again:

  • no new calls to that function go to the optimized version and
  • all returns into that optimized function are redirected to go to baseline code instead.

Ensuring that new calls avoid optimized code is easy: we just patch all calls to the function to call the profiled code (Baseline, if available, or LLInt) instead. Handling returns is the interesting part.

One approach to handling invalidation is to walk the stack to find all returns to the invalidated code, and repoint those returns to an OSR exit. This would be troublesome for us due to our use of effects phases: it’s possible for multiple effects to happen in a row in a phase of DFG IR execution where it is not possible to exit. So, the DFG approach to invalidation involves letting the remaining effects of the current bytecode instruction finish executing in optimized code and then triggering an OSR exit right before the start of the next bytecode instruction.

Figure 30. How OSR exit and invalidation might work for hypothetical bytecodes.

Invalidation in DFG IR is enabled by the InvalidationPoint instruction, which is automatically inserted by the DFG frontend at the start of every exit origin that is preceded by effects that could cause a watchpoint to fire. InvalidationPoint is modeled as if it was a conditional OSR exit, and is given an OSR exit jump label as if there was a branch to link it to. But, InvalidationPoint emits no code. Instead, it records the location in the machine code where the InvalidationPoint would have been emitted. When a function is invalidated, all of those labels are overwritten with unconditional jumps to the OSR exit.

Figure 30 shows how OSR exit concepts like speculation and effect phases combine with InvalidationPoint for three hypothetical bytecode instructions. We make up intentionally absurd instructions because we want to show the range of possibilities. Let’s consider wat in detail. The first DFG IR node for wat is an InvalidationPoint, automatically inserted because the previous bytecode (foo) had an effect. Then wat does a CheckArray, which may exit but has no effects. So, the next DFG node, Wat, is still in the speculation phase. Wat is in a sort of perfect position in DFG IR: it is allowed to perform speculations and effects. It can perform speculations because no previous node for wat’s exit origin has performed effects. It can also perform effects, but then the nodes after it (Stuff and Derp) cannot speculate anymore. But, they can perform more effects. Since wat has effects, an InvalidationPoint is immediately inserted at the start of the next bytecode (bar). Note that in this example, Foo, Wat, and StartBar are all in the perfect position (they can exit and have effects). Since Stuff, Derp, and FinishBar are in the effects region, the compiler will assert if they try to speculate.

Note that InvalidationPoint makes code layout tricky. On x86, the unconditional jump used by invalidation is five bytes. So, we must ensure that there are no other jump labels in the five bytes after an invalidation label. Otherwise, it would be possible for invalidation to cause one of those labels to point into the middle of a 5-byte invalidation jump. We solve this by adding nop padding to create at least a 5-byte gap between a label used for invalidation and any other kind of label.
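
A rough sketch of the label bookkeeping, using an invented toy assembler (the real JIT has its own assembler and linking abstractions, and only pads when another label would otherwise land within the 5-byte window):

#include <cstdint>
#include <cstring>
#include <vector>

struct Assembler {
    uint8_t* cursor;
    uint8_t* label() const { return cursor; }
    void nop() { *cursor++ = 0x90; }
};

// Overwrite 5 bytes at `at` with an unconditional x86 jump to `target`.
void patchJump(uint8_t* at, uint8_t* target)
{
    at[0] = 0xE9; // jmp rel32
    int32_t rel = static_cast<int32_t>(target - (at + 5));
    std::memcpy(at + 1, &rel, sizeof(rel));
}

struct CompiledCode {
    std::vector<uint8_t*> invalidationLabels;
    uint8_t* osrExitTarget;
};

// InvalidationPoint: emit no real code, just record where the jump would
// go. This simplified sketch always pads with nops.
void emitInvalidationPoint(Assembler& jit, CompiledCode& code)
{
    uint8_t* label = jit.label();
    code.invalidationLabels.push_back(label);
    while (jit.cursor < label + 5)
        jit.nop();
}

// Watchpoint fired: turn every recorded label into a jump to OSR exit.
void invalidate(CompiledCode& code)
{
    for (uint8_t* label : code.invalidationLabels)
        patchJump(label, code.osrExitTarget);
}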

To summarize, DFG IR has extensive support for OSR exit. We have a compact delta encoding of changes to OSR exit state. Exit destinations are encoded as an exit origin field in every DFG node. OSR exit due to invalidation is handled by automatic InvalidationPoint insertion.

Static Analysis

The DFG uses lots of static analysis to complement how it does speculation. This section covers three static analyses in the DFG that have particularly high impact:

  • We use prediction propagation to fill in predicted types for all values based on value profiling of some values. This helps us figure out where to speculate on type.
  • We use the abstract interpreter (or just AI for short in JavaScriptCore jargon) to find redundant OSR speculations. This helps us emit fewer OSR checks. Both the DFG and FTL include multiple optimization passes in their pipelines that can find and remove redundant checks but the abstract interpreter is the most powerful one. The abstract interpreter is the DFG tier’s primary optimization and it is reused with small enhancements in the FTL.
  • We use clobberize to get aliasing information about DFG operations. Given a DFG instruction, clobberize can describe the aliasing properties. In almost all cases that description is O(1) in time and space. That description implicitly describes a rich dependency graph.

Both the prediction propagator and the abstract interpreter work by forward-propagating type information. They’re both built on the principles of abstract interpretation. It’s useful to understand at least some of that theory, so let’s do a tiny review. Abstract interpreters are like normal interpreters, except that they describe program state abstractly rather than considering exact values. A classic example due to Kildall involves just remembering which variables have known constant values and forgetting any variable that may have more than one value. Abstract interpreters are run to fixpoint: we keep executing every instruction until we no longer observe any changes. We can execute forward (like Kildall) or backward (like liveness analysis). We can either have sets that shrink as we learn new things (like Kildall, where variables get removed if we learn that they may have more than one value) or we can have sets that grow (like liveness analysis, where we keep adding variables to the live set).
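
To make the review concrete, here is a tiny Kildall-style sketch in C++. It is hypothetical and simplified; the driver loop that re-executes blocks until the state stops changing is omitted.

#include <cstdint>
#include <map>
#include <string>

// Kildall-style abstract state: variables with known constant values.
// A variable absent from the map may have more than one value.
using AbstractState = std::map<std::string, int64_t>;

// Join at a control flow merge: keep only the facts both paths agree on.
AbstractState join(const AbstractState& a, const AbstractState& b)
{
    AbstractState result;
    for (const auto& [var, value] : a) {
        auto it = b.find(var);
        if (it != b.end() && it->second == value)
            result[var] = value; // both predecessors agree: still constant
    }
    return result; // shrinking sets: we forget facts, we never invent them
}

// Transfer function for "dst = src + imm": dst is constant only if src is.
void executeAdd(AbstractState& state, const std::string& dst,
                const std::string& src, int64_t imm)
{
    auto it = state.find(src);
    if (it != state.end())
        state[dst] = it->second + imm;
    else
        state.erase(dst);
}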

Now let’s go into more details about the two abstract interpreters and the alias analysis.

Prediction propagation. The prediction propagator’s abstract state comprises variable to speculated type (Figure 13) mappings. The speculated type is a set of fundamental types. The sets tell which types a value is predicted to have. The prediction propagator is not flow sensitive; it has one copy of the abstract state for all program statements. So, each execution of a statement considers the whole set of input types (even from program statements that can’t reach us) and joins the result with the speculated type of the result variable. Note that the input to the prediction propagator is a data flow IR, so multiple assignments to the same variable aren’t necessarily joined.
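
The core of such a propagator can be sketched as little more than OR-ing type bitmaps until fixpoint. This is a hypothetical simplification: SpecInt32 and SpecDouble stand in for the real SpeculatedType bits of Figure 13, and, as discussed next, the real propagation rules need not even be sound.

#include <cstdint>

using SpeculatedType = uint64_t; // bitmap of fundamental types (Figure 13)

// Hypothetical stand-ins for a couple of type bits.
constexpr SpeculatedType SpecInt32  = 1ull << 0;
constexpr SpeculatedType SpecDouble = 1ull << 1;

struct Prediction {
    SpeculatedType type = 0;
    // Returns true if this changed, so the fixpoint loop knows to keep going.
    bool merge(SpeculatedType other)
    {
        SpeculatedType newType = type | other; // join = set union
        if (newType == type)
            return false;
        type = newType;
        return true;
    }
};

// One flow-insensitive step: an Add's result may be Int32 or Double
// depending on its inputs. We keep re-running steps until nothing changes.
bool propagateAdd(Prediction& result, const Prediction& left, const Prediction& right)
{
    SpeculatedType out = 0;
    if ((left.type & SpecInt32) && (right.type & SpecInt32))
        out |= SpecInt32;
    if ((left.type | right.type) & SpecDouble)
        out |= SpecDouble;
    return result.merge(out);
}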

The prediction propagator doesn’t have to be sound. The worst case outcome of the prediction propagator being wrong is that we either:

  • do speculations that are too strong, and so we exit too much and then recompile.
  • do speculations that are too weak, so we run slower than we could forever.

Note that the second of those outcomes is generally worse. Recompiling and then speculating less at least means that the program eventually runs with the optimal set of speculations. Speculating too weakly and never recompiling means that we never get to optimal. Therefore, the prediction propagator is engineered to sometimes be unsound instead of conservative, since unsoundness can be less harmful.

The abstract interpreter. The DFG AI is the DFG tier’s most significant optimization. While there are many abstract interpreters throughout JavaScriptCore, this one is the biggest in terms of total code and the number of clients — hence to us it is the abstract interpreter.

The DFG AI’s abstract state comprises variable to abstract value mappings where each abstract value represents a set of possible JSValues that the variable could have. Those sets describe what type information we have proved from past checks. We join abstract states at control flow merge points. The solution after the fixpoint is a minimal solution (smallest possible sets that have a fixpoint). The DFG AI is flow-sensitive: it maintains a separate abstract state per instruction boundary. AI looks at the whole control flow graph at once but does not look outside the currently compiled function and whatever we inlined into it. AI is also sparse conditional.

The DFG abstract value representation has four sub-values:

  • Whether the value is known to be a constant, and if so, what that constant is.
  • The set of possible types (i.e. a SpeculatedType bitmap, shown in Figure 13).
  • The set of possible indexing types (also known as array modes) that the object pointed to by this value can have.
  • The set of possible structures that the object pointed to by this value can have. This set has special infinite set powers.

The last two sub-values can be mutated by effects. DFG AI assumes that all objects have escaped, so if an effect happens that can change indexing types and structures, then we have to clobber those parts of all live abstract values.

We interpret the four sub-values as follows: the abstract value represents the set of JSValues that reside in the intersection of the four sub-value sets. This means that when interpreting abstract values, we have the option of just looking at whichever sub-value is interesting to us. For example, an optimization that removes structure checks only needs to look at the structure set field.
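
A sketch of these four sub-values as a struct, with hypothetical field types (the real structure set, with its infinite set powers, is more involved):

#include <cstdint>
#include <optional>
#include <set>

struct Structure; // a JSC object layout descriptor

using JSValueBits    = uint64_t; // stand-in for a boxed JSValue
using SpeculatedType = uint64_t; // type bitmap, as in Figure 13
using ArrayModes     = uint32_t; // bitmap of indexing types

// The value is in the intersection of the four sub-value sets.
struct AbstractValue {
    std::optional<JSValueBits> constant;   // known constant, if any
    SpeculatedType types = 0;              // set of possible types
    ArrayModes arrayModes = 0;             // set of possible indexing types
    std::set<const Structure*> structures; // set of possible structures
    bool structuresAreTop = false;         // the "special infinite set powers"

    // Effects may change the structure and indexing type of any escaped
    // object, so they clobber only those two sub-values.
    void clobberByEffect()
    {
        arrayModes = ~0u;
        structures.clear();
        structuresAreTop = true;
    }
};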

Figure 31. Examples of check elimination with abstract interpretation.

The DFG AI gives us constant and type propagation simultaneously. The type propagation is used to remove checks, simplify checks, and replace dynamic operations with faster versions.

Figure 31 shows examples of checks that the DFG AI lets us remove. Note that in addition to eliminating obvious same-basic-block check redundancies (Figure 31(a)), AI lets us remove redundancies that span multiple blocks (like Figure 31(b) and (c)). For example, in Figure 31(c), the AI is able to prove that @x is an Int32 at the top of basic block #7 because it merges the Int32 states of @x from BB#5 and #6. Check elimination is usually performed by mutating the IR so that later phases know which checks are really necessary without having to ask the AI.

The DFG AI has many clients, including the DFG backend and the FTL-to-B3 lowering. Being an AI client means having access to its estimate of the set of JSValues that any variable or DFG node can have at any program point. The backends use this to simplify checks that were not removed. For example, the backend may see an Object-or-Undefined, ask AI about it, and find that AI already proved that we must have either an object or a string. The backend will be able to combine those two pieces of information to only emit an is-object check and ignore the possibility of the value being undefined.

Type propagation also allows us to replace dynamic heap accesses with inlined ones. Most fast property accesses in DFG IR arise from inline cache feedback telling us that we should speculate, but sometimes the AI is able to prove something stronger than the profiler told us. This is especially likely in inlined code.

Clobberize. Clobberize is the alias analysis that the DFG uses to describe what parts of the program’s state an instruction could read and write. This allows us to see additional dependency edges between instructions beyond just the ones expressed as data flow. Dependency information tells the compiler what kinds of instruction reorderings are legal. Clobberize has many clients in both the DFG and FTL. In the DFG, it’s used for common subexpression elimination, for example.

To understand clobberize, it’s worth considering what it is about a program’s control flow that a compiler needs to remember. The control flow graph shows us one possible ordering of the program and we know that this ordering is legal. But both the DFG and FTL tiers want to move code around. The DFG tier mostly only moves code around within basic blocks rather than between them while the FTL tier can also move code between basic blocks. Even with the DFG’s block-local code motion, it’s necessary to know more than just the current ordering of the program. It’s also necessary to know how that ordering can be changed.

Some of this is already solved by the data flow graph. DFG IR provides a data flow graph that shows some of the dependencies between instructions. It’s obvious that if one instruction has a data flow edge to another, then only one possible ordering (source executes before sink) is valid. But what about:

  • Stores to memory.
  • Loads from memory.
  • Calls that can cause any effects.
  • OSR effects (like MovHint).

Data flow edges don’t talk about those dependencies. Data flow also cannot tell which instructions have effects at all. So, the data flow graph cannot tell us anything about the valid ordering of instructions if those instructions have effects.

The issue of how to handle dependencies that arise from effects is particularly relevant to JavaScript compilation — and speculative compilation in general — because of the precision about aliasing that speculation gives us. For example, although the JavaScript o.f operation could have any effect, after speculation we often know that it can only affect properties named f. Additionally, JavaScript causes us to have to emit lots of loads to fields that are internal to our object model and it’s good to know exactly when those loads are redundant so that we can remove as many of them as possible. So, we need to have the power to ask, for any operation that may access internal VM state, whether that state could be modified by any other operation, and we want that answer to be as precise as it can while being O(1)-ish.

Clobberize is a static analysis that augments the data flow and control flow graphs by telling us constraints on how instructions can be reordered. The neat thing about clobberize is that it avoids storing dependency information in the instructions themselves. So, while the compiler is free to query dependency information anytime it likes by running the analysis, it doesn’t have to do anything to maintain it.

Figure 32. Some of the abstract heap hierarchy. All heaps are subsets of World, which is subdivided into Heap, Stack and SideState. For example, JS function calls say that they write(Heap) and read(World). Subheaps of Heap include things like JSObject_butterfly, which refer to fields that are internal to the JSC object model and are not directly user-visible, and things like NamedProperties, a heap that contains subheaps for every named property the function accesses.

For each DFG instruction, clobberize reports zero or more reads or writes. Each read or write says which abstract heaps it is accessing. Abstract heaps are sets of memory locations. A read (or write) of an abstract heap means that the program will read (or write) from zero or more actual locations in that abstract heap. Abstract heaps form a hierarchy with World at the top (Figure 32). A write to World means that the effect could write to anything, so any read might see that write. The hierarchy can get very specific. For example, fully inferred, direct forms of property access like GetByOffset and PutByOffset report that they read and write (respectively) an abstract heap that names the property. So, accesses to properties of different names are known not to alias. The heaps are known to alias if either one is a descendant of the other.
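
The ancestor/descendant rule can be sketched by representing each abstract heap as its path from World. This encoding is hypothetical and the real implementation is more compact, but it shows the idea: NamedProperties/foo and NamedProperties/bar would be disjoint here, while Heap would overlap both.

#include <algorithm>
#include <string>
#include <vector>

// An abstract heap, identified by its path from the root, e.g.
// {"World", "Heap", "NamedProperties", "foo"}.
struct AbstractHeap {
    std::vector<std::string> path;
};

// Two heaps alias if one is an ancestor of the other, i.e. one path
// is a prefix of the other.
bool overlaps(const AbstractHeap& a, const AbstractHeap& b)
{
    size_t n = std::min(a.path.size(), b.path.size());
    for (size_t i = 0; i < n; ++i) {
        if (a.path[i] != b.path[i])
            return false; // they diverge: disjoint subtrees
    }
    return true; // one is a prefix of the other: they overlap
}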

It’s worth appreciating how clobberize combined with control flow is just a way of encoding a dependence graph. To build a dependence graph from clobberize information, we apply the following rule. If instruction B appears after instruction A in control flow, then we treat B as having a dependence edge to A (B depends on A) if:

  • any heap read by B overlaps any heap written by A, or
  • any heap written by B overlaps any heap read or written by A.
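
Here is that rule as a sketch, reusing the hypothetical AbstractHeap from above and assuming a clobberize() query that reports read and written heaps:

#include <string>
#include <vector>

struct AbstractHeap { std::vector<std::string> path; }; // as sketched above
bool overlaps(const AbstractHeap&, const AbstractHeap&); // defined above

struct Effects {
    std::vector<AbstractHeap> reads;
    std::vector<AbstractHeap> writes;
};

struct Node;
Effects clobberize(const Node&); // hypothetical O(1)-ish query

static bool anyOverlap(const std::vector<AbstractHeap>& a,
                       const std::vector<AbstractHeap>& b)
{
    for (const AbstractHeap& x : a)
        for (const AbstractHeap& y : b)
            if (overlaps(x, y))
                return true;
    return false;
}

// B appears after A in control flow: does B depend on A?
bool dependsOn(const Node& b, const Node& a)
{
    Effects eb = clobberize(b);
    Effects ea = clobberize(a);
    return anyOverlap(eb.reads, ea.writes)  // B reads what A wrote
        || anyOverlap(eb.writes, ea.writes) // write-after-write
        || anyOverlap(eb.writes, ea.reads); // write-after-read
}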

Conversely, any dependence graph can be expressed using clobberize. An absurd but correct representation would involve giving each edge in the dependence graph its own abstract heap and having the source of the edge write the heap while the sink reads it. But what makes clobberize such an efficient representation of dependence graphs is that every dependence that we’ve tried to represent can be intuitively described by reads and writes to a small collection of abstract heaps.

Those abstract heaps are either collections of concrete memory locations (for example the "foo" abstract heap is the set of memory locations used to represent the values of properties named “foo”) or they are metaphorical. Let’s explore some metaphorical uses of abstract heaps:

  • MovHint wants to say that it is not dead code, that it must be ordered with other MovHints, and that it must be ordered with any real program effects. We say this in clobberize by having MovHint write SideState. SideState is a subheap of World but disjoint from other things, and we have any operation that wants to be ordered with OSR exit state either read or write something that overlaps SideState. Note that DFG assumes that operations that may exit implicitly read(World) even if clobberize doesn’t say this, so MovHint’s write of SideState ensures ordering with exits.
  • NewObject wants to say that it’s not valid to hoist it out of loops because two successive executions of NewObject may produce different results. But it’s not like NewObject clobbers the world; for example if we had two accesses to the same property on either sides of a NewObject then we’d want the second one to be eliminated. DFG IR has many NewObject-like operations that also have this behavior. So, we introduce a new abstract heap called HeapObjectCount and we say that NewObject is metaphorically incrementing (reading and writing) the HeapObjectCount. HeapObjectCount is treated as a subheap of Heap but it’s disjoint from the subheaps that describe any state visible from JS. This is sufficient to block hoisting of NewObject while still allowing interesting optimizations to happen around it.

Figure 33. Sample sequence of DFG IR instructions and their dependence graph. DFG IR never stores the dependence graph in memory because we get the information implicitly by running clobberize.

The combination of clobberize and the control flow graph gives a scalable and intuitive way of expressing the dependence graph. It’s scalable because we don’t actually have to express any of the edges. Consider for example a dynamic access instruction that could read any named JavaScript property, like the Call instruction in Figure 33. Clobberize can say this in O(1) space and time. But a dependence graph would have to create an edge from that instruction to any instruction that accesses any named property before or after it. In short, clobberize gives us the benefit of a dependence graph without the cost of allocating memory to represent the edges.

The abstract heaps can also be efficiently collected into a set, which we use to summarize the aliasing effects of basic blocks and loops.

To summarize, the DFG puts a big emphasis on static analysis. Speculation decisions are made using a combination of profiling and an abstract interpreter called prediction propagation. Additionally, we have an abstract interpreter for optimization, simply called the DFG abstract interpreter, which serves as the main engine for redundant check removal. Abstract interpreters are a natural fit for the DFG because they give us a way to forward-propagate information about types. Finally, the DFG uses the clobberize analysis to describe dependencies and aliasing.

Fast Compilation

The DFG is engineered to compile quickly so that the benefits of OSR speculations can be realized quickly. To help reduce compile times, the DFG is selective about which optimizations it does and how it does them. The static analysis and OSR exit optimizations discussed so far represent the most powerful things that the DFG is capable of. The DFG does a quick and dirty job with everything else, like instruction selection, register allocation, and removal of redundant code that isn’t checks. Functions that benefit from the compiler doing a good job on those optimizations will get them if they run long enough to tier up into the FTL.

The DFG’s focus on fast compilation happened organically, as a result of many separate throughput-latency trade-offs. Initially, JavaScriptCore just had the Baseline JIT and then later Baseline as the profiling tier and DFG as the optimizing tier. The DFG experienced significant evolution during this time, and then experienced additional evolution after the FTL was introduced. While no single decision led to the DFG’s current design, we believe that it was most significantly shaped by tuning for short-running benchmarks and the introduction of the FTL.

The DFG was tuned for a diverse set of workloads. On the one hand, it was tuned for long-running tests in which one full second of warm-up was given to the speculative compiler for free (like the old V8 benchmarks, which live on in the form of Octane and JetStream, albeit without the freebie warmup), but on the other hand, it was also tuned for shorter-running benchmarks like SunSpider and page load tests. SunSpider focused on smallish programs running for very short bursts of time with little opportunity for warm-up. Compilers that do more optimizations than the DFG tend to lose to it on SunSpider because they fail to complete their optimizations before SunSpider finishes running. We continue to use tests that are in the spirit of SunSpider, like Speedometer and JetStream. Speedometer has a similar code-size-to-running-time ratio, so like SunSpider, it benefits a lot from DFG. JetStream includes a subset of SunSpider and puts a big emphasis on short-running code in all of its other tests. That’s not to say that we don’t also care about long-running code. It’s just that our methodology for improving the DFG was to try to get speed-ups on both short-running things and long-running things with the same engine. Since any long-running optimization would regress the short-running tests, we often avoided adding any long-running optimizations to the DFG. But we did add cheap versions of many sophisticated optimizations, giving respectable speed-ups on both short-running and long-running workloads.

The introduction of the FTL solidified the DFG’s position as the compiler that optimizes less. So long as the DFG generates reasonably good code quickly, we can get away with putting lots of expensive optimizations into the FTL. The FTL’s long compile times mean that many programs do not run long enough to benefit from the FTL. So, the DFG is there to give those programs a speculative optimization boost in way less time than an FTL-like compiler could do. Imagine a VM that only had one optimizing compiler. Unless that one compiler compiled as fast as the DFG and generated code that was as good as the FTL, it would end up being reliably slower than JavaScriptCore on some workloads. If that compiler compiled as fast as the DFG but didn’t have the FTL’s throughput then any program that ran long enough would run faster in JavaScriptCore. If that compiler generated code that was as good as the FTL but compiled slower than the DFG then any program that ran short enough to tier up into the DFG but not that compiler would run faster in JavaScriptCore. JavaScriptCore has multiple compiler tiers because we believe that it is not possible to build a compiler that compiles as fast as the DFG while generating code that is as good as the FTL.

To summarize, the DFG focuses on fast compilation because of the combination of the history of how it was tuned and the fact that it sits as the tier below the FTL JIT.

Figure 34. Illustration of a sample DFG IR program with all three graphs: local data flow, global data flow, and control flow.

The DFG compiler’s speed comes down to an emphasis on block-locality in the IR. The DFG IR used by the DFG tier has a two-level data flow graph:

  • Local data flow graph. The local data flow graph is used within basic blocks. This graph is a first-class citizen in the IR; when working with data flow in the DFG’s C++ code, it sometimes seems like this is the only data flow graph. DFG IR inside a basic block resembles SSA form in the sense that there’s a 1:1 mapping between instructions and the variables they assign and data flow is represented by having users of values point at the instructions (nodes) that produce those values. This representation does not allow you to use a value produced by an instruction in a different block except through tedious escape hatches.
  • Global data flow graph. We say global to mean the entire compilation unit, so some JS function and whatever the DFG inlined into it. So, global just means spanning basic blocks. DFG IR maintains a secondary data flow graph that spans basic blocks. DFG IR’s approach to global data flow is based on spilling: to pass a value to a successor block, you store it to a spill slot on the stack, and then that block loads it. But in DFG IR, we also thread data flow relationships through those loads and stores. This means that if you are willing to perform the tedious task of traversing this secondary data flow graph, you can get a global view of data flow.

Figure 34 shows an example of how this works. The compilation unit is represented as three graphs: a control flow graph, local data flow, and global data flow. Data flow graphs are represented with edges going from the user to the value being used. The local data flow graphs work like they do in SSA, so any SSA optimization can be run in a block-local manner on this IR. The global data flow graph is made of SetLocal/GetLocal nodes that store/load values into the stack. The data flow between SetLocal and GetLocal is represented completely in DFG IR, by threading data flow edges through special Phi nodes in each basic block where a local is live.

From the standpoint of writing outstanding high-throughput optimizations, this approach to IR design is like kneecapping the compiler. Compilers thrive on having actual SSA form, where there is a single data flow graph, and you don’t have to think about an instruction’s position in control flow when traversing data flow. The emphasis on locality is all about excellent compile times. We believe that locality gives us compile time improvements that we can’t get any other way:

  • Instruction selection and register allocation for a basic block can be implemented as a single pass over that basic block. The instruction selector can make impromptu register allocation decisions during that pass, like deciding that it needs any number of scratch registers to emit code for some DFG node. The combined instruction selector and register allocator (aka the DFG backend) compiles basic blocks independently of one another. This kind of code generation is good at register allocating large basic blocks but bad for small ones. For functions that only have a single basic block, the DFG often generates code that is as good as the FTL.
  • We never have to decompress the delta encoding of OSR exit. We just have the backend record a log of its register allocation decisions (the variable event stream). While the DFG IR for a function is thrown out after compilation, this log along with a minified version of the DFG IR (that only includes MovHints and the things they reference) is saved so that we can replay what the backend did whenever an OSR exit happens. This makes OSR exit handling super cheap in the DFG – we totally avoid the O(n²) complexity explosion of OSR stackmaps despite the fact that we speculate like crazy.
  • There is no need to enter or exit SSA. On the one hand, SSA conversion performance is a solved problem: it’s a nearly-linear-time operation. Even so, the constant factors are high enough that avoiding it entirely is profitable. Converting out of SSA is worse. If we wanted to combine SSA with our block-local backend, we’d have to add some sort of transformation that discovers how to load/store live state across basic blocks. DFG IR plays tricks where the same store that passes data flow to another block doubles as the OSR exit state update. It’s not obvious that exiting out of SSA would discover all of the cases where the same store can be reused for both OSR exit state update and the data flow edge. This suggests that any version of exiting out of SSA would make the DFG compiler either generate worse code or run slower. So, not having SSA makes the compiler run faster because entering SSA is not free and exiting SSA is awful.
  • Every optimization is faster if it is block-local. Of course, you could write block-local optimizations in an SSA IR. But having an IR that emphasizes locality is like a way to statically guarantee that we won’t accidentally introduce expensive compiler passes to the DFG.

The one case where global data flow is essential to the DFG’s mission is static analysis. This comes up in the prediction propagator and the abstract interpreter. Both of them use the global data flow graph in addition to the local data flow graphs, so that they can see how type information flows through the whole compilation unit. Fortunately, as shown in Figure 34, the global data flow graph is available. It’s in a format that makes it hard to edit but relatively easy to analyze. For example, it implicitly reports the set of live variables at each basic block boundary, which makes merging state in the abstract interpreter relatively cheap.

Figure 35. The DFG pipeline.

Figure 35 shows the complete DFG optimization pipeline. This is a fairly complete pipeline: it has classics like constant folding, control flow simplification, CSE, and DCE. It also has lots of JavaScript-specifics like deciding where to put checks (unification, prediction injection and propagation, prediction propagation, and fixup), a pass just to optimize common patterns of varargs, some passes for GC barriers, and passes that help OSR (CPS rethreading and phantom insertion). We can afford to do a lot of optimizations in the DFG so long as those optimizations are block-local and don’t try too hard. Still, this pipeline is way smaller than the FTL’s and runs much faster.

To summarize, the DFG compiler uses OSR exit and static analysis to emit an optimal set of type checks. This greatly reduces the number of type checks compared to running JavaScript in either of the profiled tiers. Because the benefit of type check removal is so big, the DFG compiler tries to limit how much time it spends doing other optimizations by restricting itself to a mostly block-local view of the program. This is a trade off that the DFG makes to get fast compile times. Functions that run long enough that they’d rather pay the compile time to get those optimizations end up tiering up to the FTL, which just goes all out for throughput.

FTL Compiler

We’ve previously documented some aspects of the FTL’s architecture in the original blog post and when we introduced B3. This section provides an updated description of this JIT’s capabilities as well as a deep dive into how FTL does OSR. We will structure our discussion of the FTL as follows. First we will enumerate what optimizations it is capable of. Then we will describe how it does OSR exit in detail. Finally we will talk about patchpoints — an IR operation based on a lambda.

All The Optimizations

The point of the FTL compiler is to run all the optimizations. This is a compiler where we never compromise on peak throughput. All of the DFG’s decisions that were known trade-offs in favor of compile time at the expense of throughput are reversed in the FTL. There is no upper limit on the number of cycles that a function compiled with the FTL will run for, so it’s the kind of compiler where even esoteric optimizations have a chance to pay off eventually. The FTL combines multiple optimization strategies:

  • We reuse the DFG pipeline, including the weird IR. This ensures that any good thing that the DFG tier ever does is also available in the FTL.
  • We add a new DFG SSA IR and DFG SSA pipeline. We adapt lots of DFG phases to DFG SSA (which usually makes them become global rather than local). We add lots of new phases that are only possible in SSA (like loop invariant code motion).
  • We lower DFG SSA IR to B3 IR. B3 is an SSA-based optimizing JIT compiler that operates at the abstraction level of C. B3 has lots of optimizations, including global instruction selection and graph coloring register allocation. The FTL was B3’s first customer, so B3 is tailored for optimizing at the abstraction level where DFG SSA IR can’t.

Having multiple ways of looking at the program gives the FTL maximum opportunities to optimize. Some of the compiler’s functionality, particularly in the part that decides where to put checks, thrives on the DFG’s weird IR. Other parts of the compiler work best in DFG SSA, like the DFG’s loop-invariant code motion. Lots of things work best in B3, like most reasoning about how to simplify arithmetic. B3 is the first IR that doesn’t know anything about JavaScript, so it’s a natural place to implement textbook optimization that would have difficulties with JavaScript’s semantics. Some optimizations, like CSE, work best when executed in every IR because they find unique opportunities in each IR. In fact, all of the IRs have the same fundamental optimization capabilities in addition to their specialized optimizations: CSE, DCE, constant folding, CFG simplification, and strength reductions (sometimes called peephole optimizations or instruction combining).

Figure 36. The FTL pipeline. Note that Lower DFG to B3 is in bold because it’s FTL’s biggest phase; sometimes when we say “FTL” we are just referring to this phase.

The no-compromise approach is probably best appreciated by looking at the FTL optimization pipeline in Figure 36. The FTL runs 93 phases on the code it encounters. This includes all phases from Figure 35 (the DFG pipeline) except Varargs Forwarding, which is omitted only because it’s completely subsumed by the FTL’s Arguments Elimination. Let’s review some of the FTL’s most important optimizations:

  • DFG AI. This is one of the most important optimizations in the FTL. It’s mostly identical to the AI we run in the DFG tier. Making it work with SSA makes it slightly more precise and slightly more expensive. We run the AI a total of six times.
  • CSE (common subexpression elimination). We run this in DFG IR (Local Common Subexpression Elimination), DFG SSA IR (Global Common Subexpression Elimination), B3 IR (Reduce Strength and the dedicated Eliminate Common Subexpressions), and even in Air (Fix Obvious Spills, a CSE focused on spill code). Our CSEs can do value numbering and load/store elimination.
  • Object Allocation Sinking is a must-points-to analysis that we use to eliminate object allocations or sink them to slow paths. It can eliminate graphs of object allocations, including cyclic graphs.
  • Integer Range Optimization is a forward flow-sensitive abstract interpreter in which the state is a system of equations and inequalities that describe known relationships between variables. It can eliminate integer overflow checks and array bounds checks.
  • The B3 Reduce Strength phase runs a fixpoint that includes CFG simplification, constant folding, reassociation, SSA clean-up, dead code elimination, a light CSE, and lots of miscellaneous strength reductions.
  • Duplicate Tails, aka tail duplication, flattens some control flow diamonds, unswitches small loops, and undoes some cases of relooping. We duplicate small tails blindly over a CFG with critical edges broken. This allows us to achieve some of what splitting achieved for the original speculative compilers.
  • Lower B3 to Air is a global pattern matching instruction selector.
  • Allocate Registers By Graph Coloring implements the IRC and Briggs register allocators. We use IRC on x86 and Briggs on arm64. The difference is that IRC can find more opportunities for coalescing assignments into a single register in cases where there is high register pressure. Our register allocators have special optimizations for OSR exit, especially the OSR exits we emit for integer overflow checks.

OSR Exit in the FTL

Now that we have enumerated some of the optimizations that the FTL is capable of, let’s take a deep dive into how the FTL works by looking at how it compiles and does OSR. Let’s start with this example:

function foo(a, b, c)
{
    return a + b + c;
}

The relevant part of the bytecode sequence is:

[   7] add loc6, arg1, arg2
[  12] add loc6, loc6, arg3
[  17] ret loc6

Which results in the following DFG IR:

  24:  GetLocal(Untyped:@1, arg1(B<Int32>/FlushedInt32), R:Stack(6), bc#7)
  25:  GetLocal(Untyped:@2, arg2(C<BoolInt32>/FlushedInt32), R:Stack(7), bc#7)
  26:  ArithAdd(Int32:@24, Int32:@25, CheckOverflow, Exits, bc#7)
  27:  MovHint(Untyped:@26, loc6, W:SideState, ClobbersExit, bc#7, ExitInvalid)
  29:  GetLocal(Untyped:@3, arg3(D<Int32>/FlushedInt32), R:Stack(8), bc#12)
  30:  ArithAdd(Int32:@26, Int32:@29, CheckOverflow, Exits, bc#12)
  31:  MovHint(Untyped:@30, loc6, W:SideState, ClobbersExit, bc#12, ExitInvalid)
  33:  Return(Untyped:@30, W:SideState, Exits, bc#17)

The DFG data flow from the snippet above is illustrated in Figure 37 and the OSR exit sites are illustrated in Figure 38.

Figure 37. Data flow graph for FTL code generation example.
Figure 38. DFG IR example with the two exiting nodes highlighted along with where they exit and what state is live when they exit.

We want to focus our discussion on the MovHint @27 and how it impacts the code generation for the ArithAdd @30. That ArithAdd is going to exit to the second add in the bytecode, which requires restoring loc6 (i.e. the result of the first add), since it is live at that point in bytecode (it also happens to be directly used by that add).

This DFG IR is lowered to the following in B3:

Int32 @42 = Trunc(@32, DFG:@26)
Int32 @43 = Trunc(@27, DFG:@26)
Int32 @44 = CheckAdd(@42:WarmAny, @43:WarmAny, generator = 0x1052c5cd0,
                     earlyClobbered = [], lateClobbered = [], usedRegisters = [],
                     ExitsSideways|Reads:Top, DFG:@26)
Int32 @45 = Trunc(@22, DFG:@30)
Int32 @46 = CheckAdd(@44:WarmAny, @45:WarmAny, @44:ColdAny, generator = 0x1052c5d70,
                     earlyClobbered = [], lateClobbered = [], usedRegisters = [],
                     ExitsSideways|Reads:Top, DFG:@30)
Int64 @47 = ZExt32(@46, DFG:@32)
Int64 @48 = Add(@47, $-281474976710656(@13), DFG:@32)
Void @49 = Return(@48, Terminal, DFG:@32)

CheckAdd is the B3 way of saying: do an integer addition, check for overflow, and if it overflows, execute an OSR exit governed by a generator. The generator is a lambda that is given a JIT generator object (that it can use to emit code at the jump destination of the OSR exit) and a stackmap generation parameters object that tells it the B3 value representation for each stackmap argument. The B3 value reps tell you which register, stack slot, or constant to use to get the value. B3 doesn’t know anything about how exit works except that it involves having a stackmap and a generator lambda. So, CheckAdd can take more than 2 arguments; the first two arguments are the actual add operands and the rest are the stackmap. It’s up to the client to decide how many arguments to pass to the stackmap and only the generator will ever get to see their values. In this example, only the second CheckAdd (@46) is using the stackmap. It passes one extra argument, @44, which is the result of the first add — just as we would expect based on MovHint @27 and the fact that loc6 is live at bc#12. This is the result of the FTL decompressing the delta encoding given by MovHints into full stackmaps for B3.
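
A sketch of what the lowering of @46 might look like on the FTL side, written in the style of B3’s published patchpoint examples. The exact API details may differ from shipping code, and emitExitFor, firstAdd, arg3, and exitDescriptor are placeholders invented for this illustration:

CheckValue* check = block->appendNew<CheckValue>(proc, CheckAdd, origin, firstAdd, arg3);
check->append(firstAdd); // extra stackmap argument: loc6's value, per MovHint @27
check->setGenerator(
    [=] (CCallHelpers& jit, const StackmapGenerationParams& params) {
        // The value rep for our appended argument says which register,
        // stack slot, or constant holds loc6 at this exit. The generator
        // emits the exit thunk that rebuilds the baseline frame for bc#12.
        emitExitFor(jit, params, exitDescriptor); // hypothetical helper
    });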

Figure 39. The stackmaps and stackmap-like mappings maintained by the FTL to enable OSR.

FTL OSR exit means tracking what happens with the values of bytecode locals through multiple stages of lowering. The various stages don’t know a whole lot about each other. For example, the final IRs, B3 and Air, know nothing about bytecode, bytecode locals, or any JavaScript concepts. We implement OSR exit by tracking multiple stackmap-like mappings per exit site that give us the complete picture when we glue them together (Figure 39):

  • The DFG IR stackmaps that we get by decompressing MovHint deltas. This gives a mapping from bytecode local to either a DFG node or a stack location. In some cases, DFG IR has to store some values to the stack to support dynamic variable introspection like via function.arguments. DFG OSR exit analysis is smart enough to recognize those cases, since it’s more optimal to handle them by having OSR exit extract the value from the stack. Hence, OSR exit analysis may report that a bytecode local is available through a DFG node or a stack location.
  • The B3 value reps array inside the stackmap generation parameters that B3 gives to the generator lambdas of Check instructions like CheckAdd. This is a mapping from B3 argument index to a B3 value representation, which is either a register, a constant, or a stack location. By argument index we mean index in the stackmap arguments to a Check. This is three pieces of information: some user value (like @46 = CheckAdd(@44, @45, @44)), some index within its argument list (like 2), and the value that index references (@44). Note that since this CheckAdd has two argument indices for @44, that means that they may end up having different value representations. It’s not impossible for one to be a constant and another to be a register or spill slot, for example (though this would be highly unlikely; if it happened then it would probably be the result of some sound-but-inefficient antipattern in the compiler). B3’s client gets to decide how many stackmap arguments it will pass and B3 guarantees that it will give the generator a value representation for each argument index in the stackmap (so starting with argument index 2 for CheckAdd).
  • The FTL OSR exit descriptor objects, which the FTL’s DFG→B3 lowering creates at each exit site and holds onto inside the generator lambda it passes to the B3 check. Exit descriptors are based on DFG IR stackmaps and provide a mapping from bytecode local to B3 argument index, constant, stack slot, or materialization. If the DFG IR stackmap said that a bytecode local is a Node that has a constant value, then the OSR exit descriptor will just tell us that value. If the DFG stackmap said that a local is already on the stack, then the OSR exit descriptor will just tell us that stack slot. It could be that the DFG stackmap tells us that the node is a phantom object allocation — an object allocation we optimized out but that needs to be rematerialized on OSR exit. If it is none of those things, the OSR exit descriptor will tell us which B3 argument index has the value of that bytecode local.
  • The FTL’s DFG→B3 lowering already maintains a mapping from DFG node to B3 value.
  • The FTL OSR Exit object, which is a mapping from bytecode local to register, constant, stack slot, or materialization. This is the final product of the FTL’s OSR exit handling and is computed lazily from the B3 value reps and FTL OSR exit descriptor.

These pieces fit together as follows. First we compute the DFG IR stackmap and the FTL’s DFG node to B3 value mapping. We get the DFG IR stackmap from the DFG OSR exit analysis, which the FTL runs in tandem with lowering. We get the DFG to B3 mapping implicitly from lowering. Then we use that to compute the FTL OSR exit descriptor along with the set of B3 values to pass to the stackmap. The DFG IR stackmap tells us which DFG nodes are live, so we turn that into B3 values using the DFG to B3 mapping. Some nodes will be excluded from the B3 stackmap, like object materializations and constants. Then the FTL creates the Check value in B3, passes it the stackmap arguments, and gives it a generator lambda that closes over the OSR exit descriptor. B3’s Check implementation figures out which value representations to use for each stackmap argument index (as a result of B3’s register allocator doing this for every data flow edge), and reports this to the generator as an array of B3 value reps. The generator then creates a FTL::OSRExit object that refers to the FTL OSR exit descriptor and value reps. Users of the FTL OSR exit object can figure out which register, stack slot, constant value, or materialization to use for any bytecode local by asking the OSR exit descriptor. That can tell us the constant, spill slot, or materialization script to use. It can also give a stackmap argument index, in which case we load the value rep at that index, and that tells us the register, spill slot, or constant.
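
Gluing these mappings together might be sketched like this, with hypothetical types mirroring the bullets above:

#include <cstdint>
#include <variant>

struct Materialization; // script for rebuilding an optimized-out allocation
struct B3ValueRep {};   // register, stack slot, or constant, chosen by B3

// One entry of a hypothetical FTL OSR exit descriptor: how to recover a
// single bytecode local.
struct DescriptorEntry {
    std::variant<
        int64_t,                // it's a known constant
        int,                    // it's already at this stack offset
        const Materialization*, // it's a phantom allocation to rebuild
        unsigned>               // it's the B3 stackmap argument at this index
        value;
};

// Resolve a bytecode local at exit time. Only the last case needs B3's
// value reps; the others are answered by the descriptor alone.
const B3ValueRep* resolveLocal(const DescriptorEntry& entry,
                               const B3ValueRep* valueReps)
{
    if (const unsigned* argIndex = std::get_if<unsigned>(&entry.value))
        return &valueReps[*argIndex];
    return nullptr; // constant / stack slot / materialization path (omitted)
}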

This approach to OSR exit gives us two useful properties. First, it empowers OSR-specific optimization. Second, it empowers optimizations that don’t care about OSR. Let’s go into these in more detail.

FTL OSR empowers OSR-specific optimizations. This happens in DFG IR and B3 IR. In DFG IR, OSR exit is a mutable part of the IR. Any operation can be optimized by adding more OSR exits and we even have the ability to move checks around. The FTL does sophisticated OSR-aware optimizations using DFG IR, like object allocation sinking. In B3 IR, OSR exit gets special register allocation treatment. The stackmap arguments of Check are understood by B3 to be cold uses, which means that it’s not expensive if those uses are spilled. This is powerful information for a register allocator. Additionally, B3 does special register allocation tricks for addition and subtraction with overflow checks (for example we can precisely identify when the result register can reuse a stackmap register and when we can coalesce the result register with one of the input registers to produce optimal two-operand form on x86).

FTL OSR also empowers optimizations that don’t care about OSR exit. In B3 IR, OSR exit decisions get frozen into stackmaps. This is the easiest representation of OSR exit because it requires no knowledge of OSR exit semantics to get right. It’s natural for compiler phases to treat extra arguments to an instruction opaquely. Explicit stackmaps are particularly affordable in B3 because of a combination of factors:

  1. the FTL is a more expensive compiler anyway so the DFG OSR delta encoding optimizations matter less,
  2. we only create stackmaps in B3 for exits that DFG didn’t optimize out, and
  3. B3 stackmaps only include a subset of live state (the rest may be completely described in the FTL OSR exit descriptor).

We have found that some optimizations are annoying, sometimes to the point of being impractical, to write in DFG IR because of explicit OSR exit (like MovHint deltas and exit origins). It’s not necessary to worry about those issues in B3. So far we have found that every textbook optimization for SSA is practical to do in B3. This means that we only end up having a bad time with OSR exit in our compiler when we are writing phases that benefit from DFG’s high-level knowledge; otherwise we write the phases in B3 and have a great time.

This has some surprising outcomes. Anytime FTL emits a Check value in B3, B3 may duplicate the Check. B3 IR semantics allow any code to be duplicated during optimization and this usually happens due to tail duplication. Not allowing code duplication would restrict B3 more than we’re comfortable doing. So, when the duplication happens, we handle it by having multiple FTL OSR exits share the same OSR exit descriptor but get separate value reps. It’s also possible for B3 to prove that some Check is either unnecessary (always succeeds) or is never reached. In that case, we will have one FTL OSR exit descriptor but zero FTL OSR exits. This works in such a way that DFG IR never knows that the code was duplicated and B3’s tail duplication and unreachable code elimination know nothing about OSR exit.

Patchpoints: Lambdas in the IR

This brings us to the final point about the FTL. We think that what is most novel about this compiler is its use of lambdas in its IRs. Check is one example of this. The DFG has some knowledge about what a Check would do at the machine code level, but that knowledge is incomplete until we fill in some blanks about how B3 register-allocated some arguments to the Check. The FTL handles this by having one of the operands to a B3 Check be a lambda that takes a JIT code generator object and value representations for all of the arguments. We like this approach so much that we also have B3 support Patchpoint. A Patchpoint is like an inline assembly snippet in a C compiler, except that instead of a string containing assembly, we pass a lambda that will generate that assembly if told how to get its arguments and produce its result. The FTL uses this for a bunch of cases:

  • Anytime the B3 IR generated by the FTL interacts with JavaScriptCore’s internal ABI. This includes all calls and call-like instructions.
  • Inline caches. If the FTL wants to emit an inline cache, it uses the same inline cache code generation logic that the DFG and baseline use. Instead of teaching B3 how to do this, we just tell B3 that it’s a patchpoint.
  • Lazy slow paths. The FTL has the ability to only emit code for a slow path if that slow path executes. We implement that using patchpoints.
  • Instructions we haven’t added to B3 yet. If we find some JavaScript-specific CPU instruction, we don’t have to thread it through B3 as a new opcode. We can just emit it directly using a Patchpoint. (Of course, threading it through B3 is a bit better, but it’s great that it’s not strictly necessary.)

Here’s an example of the FTL using a patchpoint to emit a fast double-to-int conversion:

if (MacroAssemblerARM64::
    supportsDoubleToInt32ConversionUsingJavaScriptSemantics()) {
    // Create a patchpoint that produces an Int32; the input double may
    // live in any register of B3's choice.
    PatchpointValue* patchpoint = m_out.patchpoint(Int32);
    patchpoint->appendSomeRegister(doubleValue);
    patchpoint->setGenerator(
        [=] (CCallHelpers& jit,
             const StackmapGenerationParams& params) {
            // params[0] is the result GPR, params[1] the input FPR.
            jit.convertDoubleToInt32UsingJavaScriptSemantics(
                params[1].fpr(), params[0].gpr());
        });
    // No effects: B3 is free to hoist or kill this value.
    patchpoint->effects = Effects::none();
    return patchpoint;
}

This tells B3 that it’s a Patchpoint that returns Int32 and takes a Double. Both are assumed to go in any register of B3’s choice. Then the generator uses a C++ lambda to emit the actual instruction using our JIT API. Finally, the patchpoint tells B3 that the operation has no effects (so it can be hoisted, killed, etc).

This concludes our discussion of the FTL. The FTL is our high throughput compiler that does every optimization we can think of. Because it is a speculative compiler, a lot of its design is centered around having a balanced handling of OSR exit, which involves a separation of concerns between IRs that know different amounts of things about OSR. A key to the FTL’s power is the use of lambdas in B3 IR, which allows B3 clients to configure how B3 emits machine code for some operations.

Summary of Compilation and OSR

To summarize, JavaScriptCore has two optimizing compilers, the DFG and FTL. They are based on the same IR (DFG IR), but the FTL extends this with lots of additional compiler technology (SSA and multiple IRs). The DFG is a fast compiler: it’s meant to compile faster than typical optimizing compilers. But, it generates code that is usually not quite optimal. If that code runs long enough, then it will also get compiled with the FTL, which tries to emit the best code possible.

Related Work

The idea of using feedback from cheap profiling to speculate was pioneered by the Hölzle, Chambers, and Ungar paper on polymorphic inline caches, which calls this adaptive compilation. That work used a speculation strategy based on splitting, which means having the compiler emit many copies of code, one for each possible type. The same three authors later invented OSR exit, though they called it dynamic deoptimization and only used it to enhance debugging. Our approach to speculative compilation means using OSR exit as our primary speculation strategy. We do use splitting in a very limited sense: we emit diamond speculations in those cases where we are not sure enough to use OSR and then we allow tail duplication to split the in-between code paths if they are small enough.

This speculative compilation technique, with OSR or diamond speculations but not so much splitting, first received extraordinary attention during the Java performance wars. Many wonderful Java VMs used combinations of interpreters and JITs with varied optimization strategies to profile virtual calls and speculatively devirtualize them, with the best implementations using inline caches, OSR exit, and watchpoints. Java implementations that used variants of this technique include (but are not limited to):

  • the IBM JIT, which combined an interpreter and an optimizing JIT and did diamond speculations for devirtualization.
  • HotSpot and HotSpot server, which combined an interpreter and an optimizing JIT and used diamond speculations, OSR exit, and lots of other techniques that JavaScriptCore uses. JavaScriptCore’s FTL JIT is similar to HotSpot server in the sense that both compilers put a big emphasis on great OSR support, comprehensive low-level optimizations, and graph coloring register allocation.
  • Eclipse J9, a major competitor to HotSpot that also uses speculative compilation.
  • Jikes RVM, a research VM that used OSR exit but combined a baseline JIT and an optimizing JIT. I learned most of what I know about this technique from working on Jikes RVM.

Like Java, JavaScript has turned out to be a great use case for speculative compilation. Early instigators in the JavaScript performance war included the Squirrelfish interpreter (predecessor to LLInt), the Squirrelfish Extreme JIT (what we now call the Baseline JIT), the early V8 engine that combined a baseline JIT with inline caches, and TraceMonkey. TraceMonkey used a cheap optimizing JIT strategy called tracing, which compiles lots of speculative paths. This JIT sometimes outperformed the baseline JITs, but often lost to them due to overspeculation. V8 upped the ante by introducing the speculative compilation approach to JavaScript, using the template that had worked so well in Java: a lower tier that does inline caches, then an optimizing JIT (called Crankshaft) that speculates based on the inline caches and exits to the lower tier. This version of V8 used a pair of JITs (baseline JIT and optimizing JIT), much like Jikes RVM did for Java. JavaScriptCore soon followed by hooking up the DFG JIT as an optimizing tier for the baseline JIT, then adding the LLInt and FTL JIT. During about the same time, TraceMonkey got replaced with IonMonkey, which uses similar techniques to Crankshaft and DFG. The ChakraCore JavaScript implementation also used speculative compilation. JavaScriptCore and V8 have continued to expand their optimizations with innovative compiler technology like B3 (a CFG SSA compiler) and TurboFan (a sea-of-nodes SSA compiler). Much like for Java, the top implementations have at least two tiers, with the lower one used to collect profiling that the upper one uses to speculate. And, like for Java, the fastest implementations are built around OSR speculation.

Conclusion

JavaScriptCore includes some exciting speculative compiler technology. Speculative compilation is all about speeding up dynamically typed programs by placing bets on what types the program would have had if it could have types. Speculation uses OSR exit, which is expensive, so we engineer JavaScriptCore to make speculative bets only if they are a sure thing. Speculation involves using multiple execution tiers, some for profiling, and some to optimize based on that profiling. JavaScriptCore includes four tiers to also get an ideal latency/throughput trade-off on a per-function basis. A control system chooses when to optimize code based on whether it’s hot enough and how many times we’ve tried to optimize it in the past. All of the tiers use a common IR (bytecode in JavaScriptCore’s case) as input and provide independent implementation strategies with different throughput/latency and speculation trade-offs.

This post is an attempt to demystify our take on speculative compilation. We hope that it’s a useful resource for those interested in JavaScriptCore and for those interested in building their own fast language implementations (especially the ones with really weird and funny features).

July 29, 2020 05:00 PM

July 16, 2020

Release Notes for Safari Technology Preview 110

Surfin’ Safari

Safari Technology Preview Release 110 is now available for download for macOS Big Sur and macOS Catalina. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 263214-263988.

WebRTC

  • Added a functional WebRTC VP9 codec (r263734, r263820)
  • Allowed registering VP9 as a VT decoder (r263894)
  • Added support for freeze and pause receiver stats (r263351)
  • Added MediaRecorder.onstart support (r263671, r263896)
  • Changed MediaRecorder to support peer connection remote video tracks (r263928)
  • Enabled VTB required low latency code path (r263931)
  • Fixed MediaRecorder stopRecorder() returning an empty Blob after first use (r263511, r263633, r263891)
  • Fixed MediaRecorder.start() method ignoring the timeslice parameter (r263565, r263651, r263892)
  • Fixed RTCDataChannel.bufferedAmount to stay the same even if channel is closed (r263655)
  • Updated the max width and height for mock sources (r263844)

Web Authentication

  • Improved UI for PIN entry for security keys

Web Animations

  • Fixed keyframe animations with an infinite iteration count not showing up in the Animations timeline (r263400)

Web API

  • Changed to require a <form> to be connected before it can be submitted (r263624)
  • Fixed window.location.replace with invalid URLs to throw (r263647)
  • Fixed the behavior when setting url.search="??" (two question marks) (r263637)
  • Changed to allow selecting HEIF images if the ‘accept’ attribute includes an image MIME type that the platform can transcode (r263949)
  • Added referrerpolicy attribute support for <link> (r263356, r263442)
  • Allowed setting an empty host/hostname on URLs if they use the file scheme (r263971)
  • Allowed the async clipboard API to write data when copying via menu action or key binding (r263480)

Media

  • Changed to check for mode="showing" to consider a text track as selected in the tracks panel (r263802)

CSS

  • Changed to allow indefinite size flex items to be definite with respect to resolving percentages inside them (r263399)
  • Changed to not include scrollbar extents when computing sizes for percentage resolution (r263794)
  • Fixed pointer events (click/hover/etc) passing through flex items, if they have negative margin (r263659)

Layout

  • Changed to resolve viewport units against the preferred content size (r263311)

Rendering

  • Fixed overlapping content when margin-right is present (r263550)
  • Fixed content sometimes missing in nested scrollers with border-radius (r263578)

Accessibility

  • Fixed honoring aria-modal nodes wrapped in aria-hidden (r263673)
  • Implemented relevant simulated key presses for custom ARIA widgets for increment and decrement (r263823)

Bug Fixes

  • Fixed the indeterminate progress bar animation periodically jumping in macOS Big Sur (r263952)

JavaScript

  • Enabled RelativeTimeFormat and Locale by default (r263227)
  • Configured option-offered numberingSystem in Intl.NumberFormat through locale (r263837)
  • Changed Intl.Collator to set usage:"search" option through ICU locale (r263833)
  • Fixed Promise built-in functions to be anonymous non-constructors (r263222)
  • Fixed incorrect TypedArray.prototype.set with primitives (r263216)

Storage Access API

  • Added the capability to call the Storage Access API as a quirk, on behalf of websites that should be doing it themselves (r263383)

Text Manipulation

  • Updated text manipulation to exclude text rendered using icon-only fonts (r263527)
  • Added a new text manipulation heuristic to decide paragraph boundary (r263958)

Security

  • Enabled referrer policy attribute support by default (r263274)
  • Changed image crossorigin mutations to be considered “relevant mutations” (r263345, r263350)

Web Inspector

  • Added a tooltip to the icon of resources replaced by a local override explaining what happened (r263429)
  • Allow selecting text of Response (DOM Tree) in Network tab (r263872)
  • Adjusted the height of title area when Web Inspector is undocked to match macOS Big Sur (r263377, r263402)

July 16, 2020 05:35 PM

July 12, 2020

Manuel Rego: Open Prioritization and CSS Containment

Igalia WebKit

Igalia is a major contributor to all the open source web rendering engines (Blink, Gecko, Servo and WebKit). We have been making different kinds of contributions for years, which has earned us an important position in the different communities. This allows us to help our customers solve their problems through upstream contributions that also benefit the whole web community.

Implementing a feature in a rendering engine (or in several) might look very simple at first sight, but contributing it upstream can take a while depending on the standardization status, the related bugs, the browser architecture, and many other factors. You can find examples of things implemented by Igalia in the past in my previous blog posts, and you will get a sense of all the work behind some of those features.

There’s a common complaint everywhere: people get really angry because a bug they reported years ago is still not fixed in a given browser. That can be for a variety of reasons, and not simply because the developers of that browser are lazy and ignoring that particular bug. In many cases the answer to why it hasn’t been solved yet is pretty simple: priorities. The different companies and individuals contributing to these projects have their own interests and priorities; they prioritize the different issues and tasks and put their focus and effort on the ones that matter most to them. A possible solution, now that major browsers are all open source, would be to hire a consulting company like Igalia to fix that bug for you; but as an individual, or even as a company, you may not have the budget to make that happen.

What would happen if we allowed several parties to contribute together to the development of some features? That would make it possible for individuals and organizations that don’t have the means to implement them alone to each contribute their piece of the cake in order to add support for those features on the web platform.

Open Prioritization

Igalia is launching Open Prioritization, a crowd-funding campaign for the web platform. We believe this can open the door for many different people and organizations to prioritize the development of some features on the different web engines. Initially we have defined 6 tasks that can be found on the website, together with a FAQ explaining all the details of the campaign. 🚀

Let’s hope we can make this happen. If this is a success and some of these items get funded and implemented, there will probably be more in the future, including new things or ideas that you can share with us.

Open Prioritization by Igalia. An experiment in crowd-funding prioritization.

One of the tasks of the Open Prioritization campaign we’re starting this week is about adding CSS Containment support in WebKit, and we have experience working on that in Chromium.

Why CSS Containment in WebKit?

Briefly speaking, CSS Containment is a standard focused on improving the rendering performance of web pages. It allows authors to isolate DOM subtrees from the rest of the document, so any change that happens in the “contained” subtree doesn’t affect anything outside that element.

This is the spec behind the contain property, that can have a few values defining the “type of containment”: layout, paint, size and style. I’m not going to go deeper into this and I’ll refer to my introductory post or my CSSconf EU talk if you’re interested in getting more details about this specification.
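
As a quick illustration (a sketch only; the class name and value combination are just for the example), containment is applied through that single property:

.isolated-widget {
  /* Layout and paint inside this subtree are isolated from the
     rest of the document. */
  contain: layout paint;
}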

So why do we think this is important? Currently we have an issue with CSS Containment: it’s supported in Chromium and Firefox (except style containment) but not in WebKit. This might not seem like a big deal, as it’s a performance-oriented feature, so without support you’d simply get worse performance and that’s all. But that’s not completely true, as the different types of containment impose some restrictions on the contained element (e.g. layout containment makes the element become the containing block of positioned descendants), which might cause interoperability issues if you start to use the contain property on your websites.

The main goal of this task would be to add CSS Containment support in WebKit, at least to the level that it’s spec compliant with the other implementations, and, if time permits, to implement some optimizations based on it. Once we have interoperability you can start using it without any concern in your web pages, as the behavior won’t change between the different browsers and you might get some perf improvements (that will vary depending on each browser implementation).

In addition this will allow WebKit to implement further optimizations thanks to the information that web authors provide through the contain property. On top of that, this initial support is a requirement for implementing new features that are based on it, like the new CSS properties content-visibility and contain-intrinsic-size, which are part of the Display Locking feature.

If you think this is an important feature for you, please go ahead and do your pledge so it can get prioritized and implemented in WebKit upstream.

Really looking forward to seeing how this Open Prioritization campaign goes in the coming weeks. 🤞

July 12, 2020 10:00 PM

July 05, 2020

Frédéric Wang: Contributions to Web Platform Interoperability (First Half of 2020)

Igalia WebKit

Note: This blog post was co-authored by AMP and Igalia teams.

Web developers continue to face challenges with web interoperability issues and a lack of implementation of important features. As an open-source project, the AMP Project can help represent developers and aid in addressing these challenges. In the last few years, we have partnered with Igalia to collaborate on helping advance predictability and interoperability among browsers. Reaching the degree of standards-backed interoperability that we want can be a long process: new features frequently require experimentation to get things rolling and course corrections along the way; then, ultimately, as more implementations and users begin exploring the space, doing really interesting things and finding issues at the edges, we continue to advance interoperability.

Both AMP and Igalia are very pleased to have been able to play important roles at all stages of this process and help drive things forward. During the first half of this year, here’s what we’ve been up to…

Default Aspect Ratio of Images

In our previous blog post we mentioned our experiment to implement the intrinsic size attribute in WebKit. Although this was a useful prototype for standardization discussions, at the end there was a consensus to switch to an alternative approach. This new approach addresses the same use case without the need of a new attribute. The idea is pretty simple: use specified width and height attributes of an image to determine the default aspect ratio. If additional CSS is used e.g. “width: 100%; height: auto;”, browsers can then compute the final size of the image, without waiting for it to be downloaded. This avoids any relayout that could cause bad user experience. This was implemented in Firefox and Chromium and we did the same in WebKit. We implemented this under a flag which is currently on by default in Safari Tech Preview and the latest iOS 14 beta.
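
For example, with markup like the following (the image URL is illustrative), a browser can reserve a 4:3 box for the image before a single byte of it has been downloaded:

<img src="photo.jpg" width="400" height="300"
     style="width: 100%; height: auto;">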

Scrolling

We continued our efforts to enhance scroll features. In WebKit, we began with scroll-behavior, which provides the ability to do smooth scrolling. Based on our previous patch, it has landed and is guarded by an experimental flag “CSSOM View Smooth Scrolling” which is disabled by default. Smooth scrolling currently has a generic platform-independent implementation controlled by a timer in the web process, and we continue working on a more efficient alternative relying on the native iOS UI interfaces to perform scrolling.
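
As a sketch of what the feature enables once the flag is on (the selector is illustrative), a page opts a scroll container into smooth scrolling from CSS:

.scroller {
  scroll-behavior: smooth;
}

Programmatic scrolls on that container, such as scroller.scrollTo({ top: 0, behavior: "smooth" }), then animate instead of jumping.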

We have also started to work on overscroll and overscroll customization, especially for the scrollend event. The scrollend event, as you might expect, is fired when the scroll is finished, but it lacked interoperability and required some additional tests. We added web platform tests for programmatic scroll and user scroll including scrollbar, dragging selection and keyboard scrolling. With these in place, we are now working on a patch in WebKit which supports scrollend for programmatic scroll and Mac user scroll.
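
To illustrate the event shape (a sketch; scroller is an assumed element, and at the time of writing the event was still being specified):

scroller.addEventListener("scrollend", () => {
    // Fired once an ongoing scroll (user or programmatic) has finished.
    console.log("scroll finished at", scroller.scrollTop);
});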

On the Chrome side, we continue working on the standard scroll values in non-default writing modes. This is an interesting set of challenges surrounding the scroll API and how it works with writing modes which was previously not entirely interoperable or well defined. Gaining interoperability requires changes, and we have to be sure that those changes are safe. Our current changes are implemented and guarded by a runtime flag “CSSOM View Scroll Coordinates”. With the help of Google engineers, we are trying to collect user data to decide whether it is safe to enable it by default.

Another minor interoperability fix that we were involved in was to ensure that the scrolling attribute of frames recognizes values “noscroll” or “off”. That was already the case in Firefox and this is now the case in Chromium and WebKit too.

Intersection and Resize Observers

As mentioned in our previous blog post, we drove the implementation of IntersectionObserver (enabled in iOS 12.2) and ResizeObserver (enabled in iOS 14 beta) in WebKit. We have made a few enhancements to these useful developer APIs this year.

Users reported difficulties observing the root of an inner iframe, and the specification was modified to accept an explicit document as a root parameter. This was implemented in Chromium, and we implemented the same change in WebKit and Firefox. It is currently available in Safari Tech Preview, iOS 14 beta and Firefox 75.

A bug was also reported with ResizeObserver incorrectly computing size for non-default zoom levels, which was in particular causing a bug on twitter feeds. We landed a patch last April and the fix is available in the latest Safari Tech Preview and iOS 14 beta.

Resource Loading

Another thing that we have been concerned with is how we can give more control and power to authors to more effectively tell the browser how to manage the loading of resources and improve performance.

The work that we started in 2019 on lazy loading has matured a lot along with the specification.

The lazy image loading implementation in WebKit therefore passes the related WPT tests and is functional and comparable to the Firefox and Chrome implementations. However, as you might expect, as we compare uses and implementation notes it becomes apparent that determining the moment when the lazy image load should start is not defined well enough. Before this can be enabled in releases some more work has to be done on improving that. The related frame lazy loading work has not started yet since the specification is not in place.

We also added an implementation for stale-while-revalidate. The stale-while-revalidate Cache-Control directive allows a grace period in which the browser is permitted to serve a stale asset while the browser is checking for a newer version. This is useful for non-critical resources where some degree of staleness is acceptable, like fonts. The feature has been enabled recently in WebKit trunk, but it is still disabled in the latest iOS 14 beta.
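
As a reminder of the syntax (the values are illustrative), a response might carry:

Cache-Control: max-age=3600, stale-while-revalidate=86400

Here the asset is considered fresh for an hour, and for a day after that the browser may serve the stale copy while revalidating it in the background.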

Contributions were made to improve prefetching in WebKit, taking into account its cache partitioning mechanism. Before this work can be enabled, some more patches have to land and some pieces (for example, prenavigate) may need to be specified in more detail. Finally, various general Fetch improvements have been made, improving the fetch WPT score.

What’s next

There is still a lot to do in scrolling and resource loading improvements and we will continue to focus on the features mentioned such as scrollend event, overscroll behavior and scroll behavior, lazy loading, stale-while-revalidate and prefetching.

As a continuation of the work done for aspect ratio calculation of images, we will consider the more general CSS aspect-ratio property. Performance metrics such as the ones provided by the Web Vitals project are also critical for web developers to ensure that their websites provide a good user experience, and we are willing to investigate support for these in Safari.

We love doing this work to improve the platform and we’re happy to be able to collaborate in ways that contribute to bettering the web commons for all of us.

July 05, 2020 10:00 PM

July 02, 2020

Philippe Normand: Web-augmented graphics overlay broadcasting with WPE and GStreamer

Igalia WebKit

Graphics overlays are everywhere nowadays in the live video broadcasting industry. In this post I introduce a new demo relying on GStreamer and WPEWebKit to deliver low-latency web-augmented video broadcasts.

Readers of this blog might remember a few posts about WPEWebKit and a GStreamer element we at Igalia worked on …

By Philippe Normand at July 02, 2020 01:00 PM

June 30, 2020

Enrique Ocaña: Developing on WebKitGTK with Qt Creator 4.12.2

Igalia WebKit

After the latest migration of the WebKitGTK test bots to the new SDK based on Flatpak, the old development environment based on jhbuild became deprecated. It can still be used by setting export WEBKIT_JHBUILD=1, but support for this way of working will gradually fade out.

I used to work on a chroot because I love the advantages of having an isolated and self-contained environment, but an issue in the way bubblewrap manages mountpoints basically made it impossible to use the new SDK from a chroot. It was time for me to update my development environment to the new ages and have it working in my main Kubuntu 18.04 distro.

My main goal was to have a comfortable IDE that follows standard GUI conventions (that is, neither emacs nor vim) and has code indexing features that (more or less) work with the WebKit codebase. Qt Creator was providing all that to me in the old chroot environment thanks to some configuration tricks by Alicia, so it should be good for the new one.

I preferred to use the Qt Creator 4.12.2 offline installer for Linux, so I can download exactly the same version in the future in case I need it, but other platforms and versions are also available.

The WebKit source code can be downloaded as always using git:

git clone git://git.webkit.org/WebKit.git

It’s useful to add WebKit/Tools/Scripts and WebKit/Tools/gtk to your PATH, as well as any other custom tools you may have. You can customize your $HOME/.bashrc for that, but I prefer to have an env.sh environment script to be sourced from the current shell when I want to enter into my development environment (by running webkit). If you’re going to use it too, remember to adjust to your needs the paths used there.
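
As a sketch, such an env.sh could look like this (the checkout path is illustrative; adjust it to your setup):

# env.sh: sourced from the current shell to enter the WebKit environment
export WEBKIT_DIR="$HOME/work/webkit/WebKit"
export PATH="$WEBKIT_DIR/Tools/Scripts:$WEBKIT_DIR/Tools/gtk:$PATH"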

Even if you have a pretty recent distro, it’s still interesting to have the latest Flatpak tools. Add Alex Larsson’s PPA to your apt sources:

sudo add-apt-repository ppa:alexlarsson/flatpak

In order to ensure that your distro has all the packages that webkit requires and to install the WebKit SDK, you have to run these commands (I omit the full path). Downloading the Flatpak modules will take a while, but at least you won’t need to build everything from scratch. You will need to do this again from time to time, every time the WebKit base dependencies change:

install-dependencies
update-webkitgtk-libs

Now just build WebKit and check that MiniBrowser works:

build-webkit --gtk
run-minibrowser --gtk

I have automated the previous steps as go full-rebuild and runtest.sh.

This build process should have generated a WebKit/WebKitBuild/GTK/Release/compile_commands.json
file with the right parameters and paths used to build each compilation unit in the project. This file can be leveraged by Qt Creator to get the right include paths and build flags after some preprocessing to translate the paths that make sense from inside Flatpak to paths that make sense from the perspective of your main distro. I wrote compile_commands.sh to take care of those transformations. It can be run manually or automatically when calling go full-rebuild or go update.

The WebKit way of managing includes is a bit weird. Most of the cpp files include config.h and, only after that, they include the header file related to the cpp file. Those header files depend on defines declared transitively when including config.h, but that file isn’t directly included by the header file. This breaks the intuitive rule of “headers should include any other header they depend on” and, among other things, completely confuses code indexers. So, in order to give the Qt Creator code indexer a hand, the compile_commands.sh script pre-includes WebKit.config for every file and includes config.h from it.

With all the needed pieces in place, it’s time to import the project into Qt Creator. To do that, click File → Open File or Project, and then select the compile_commands.json file that compile_commands.sh should have generated in the WebKit main directory.

Now make sure that Qt Creator has the right plugins enabled in Help → About Plugins…. Specifically: GenericProjectManager, ClangCodeModel, ClassView, CppEditor, CppTools, ClangTools, TextEditor and LanguageClient (more on that later).

With this setup, after a brief initial indexing time, you will have support for features like Switch header/source (F4), Follow symbol under cursor (F2), shading of disabled if-endif blocks, auto variable type resolving and code outline. There are some oddities of compile_commands.json based projects, though. There are no compilation units in that file for header files, so indexing features for them only work sometimes. For instance, you can switch from a method implementation in the cpp file to its declaration in the header file, but not the opposite. Also, you won’t see all the source files under the Projects view, only the compilation units, which are often just a bunch of UnifiedSource-*.cpp files. That’s why I prefer to use the File System view.

Additional features like Open Type Hierarchy (Ctrl+Shift+T) and Find References to Symbol Under Cursor (Ctrl+Shift+U) are only available when a Language Client for Language Server Protocol is configured. Fortunately, the new WebKit SDK comes with the ccls C/C++/Objective-C language server included. To configure it, open Tools → Options… → Language Client and add a new item with the following properties:

  • Name: ccls
  • Language: *.c;*.cpp;*.h
  • Startup behaviour: Always On
  • Executable: /home/enrique/work/webkit/WebKit/Tools/Scripts/webkit-flatpak
  • Arguments: --gtk -c ccls --index=/home/enrique/work/webkit/WebKit

Some “LanguageClient ccls: Unexpectedly finished. Restarting in 5 seconds.” errors will appear in the General Messages panel after configuring the language client and every time you launch Qt Creator. It’s just ccls taking its time to index the whole source code. It’s “normal”, don’t worry about it. Things will get stable and start to work after some minutes.

Due to the way the Locator file indexer works in Qt Creator, it can become confused, run out of memory and die if it finds cycles in the project file tree. This is common when using Flatpak and running the MiniBrowser or the tests, since /proc and other large filesystems are accessible from inside WebKit/WebKitBuild. To avoid that, open Tools → Options… → Environment → Locator and set Refresh interval to 0 min.

I also prefer to call my own custom build and run scripts (go and runtest.sh) instead of letting Qt Creator build the project with the default builders and mess everything. To do that, from the Projects mode (Ctrl+5), click on Build & Run → Desktop → Build and edit the build configuration to be like this:

  • Build directory: /home/enrique/work/webkit/WebKit
  • Add build step → Custom process step
    • Command: go (no absolute route because I have it in my PATH)
    • Arguments:
    • Working directory: /home/enrique/work/webkit/WebKit

Then, for Build & Run → Desktop → Run, use these options:

  • Deployment: No deploy steps
  • Run:
    • Run configuration: Custom Executable → Add
      • Executable: runtest.sh
      • Command line arguments:
      • Working directory:

With this configuration you can build the project with Ctrl+B and run it with Ctrl+R.

I think I’m not forgetting anything more regarding environment setup. With the instructions in this post you can end up with a pretty complete IDE. Here’s a screenshot of it working in its full glory:

Anyway, to be honest, nothing will ever reach the level of code indexing features I got with Eclipse some years ago. I could find usages of a variable/attribute and know where it was being read, written or read-written. Unfortunately, that environment stopped working for me long ago, so Qt Creator has been the best I’ve managed to get for a while.

Properly configured web based indexers such as the Searchfox instance configured in Igalia can also be useful alternatives to a local setup, although they lack features such as type hierarchy.

I hope you’ve found this post useful in case you try to setup an environment similar to the one described here. Enjoy!

By eocanha at June 30, 2020 03:47 PM

June 26, 2020

App-Bound Domains

Surfin’ Safari

Many applications use WKWebView as a convenient way to display websites without requiring users to leave the app, referred to as in-app browsing. Although this can provide a great user experience, the powerful features available to developers using WKWebView allow a hosting app to monitor users across all of the sites they visit within the app.

Powerful WKWebView features, such as JavaScript injection, event handlers, and other APIs can be used by applications or utility frameworks in intrusive ways to communicate with known trackers seeking to collect and aggregate personal information about users. These tactics can reveal which images a user pauses on, what content they copy/paste, and which sections of pages they reach while scrolling.

For iOS 14.0 and iPadOS 14.0, we want to make it possible for developers to continue offering an in-app browsing experience without exposing users to tracking risks. Today we are introducing App-Bound Domains, a new, opt-in WKWebView technology to improve in-app browsing by offering greater privacy to users.

App-Bound Domains

The App-Bound Domains feature takes steps to preserve user privacy by limiting the domains on which an app can utilize powerful APIs to track users during in-app browsing. Applications that opt-in to this new feature can specify up to 10 “app-bound” domains using a new Info.plist key — WKAppBoundDomains. Note that content supplied by the app through local files, data URLs, and HTML strings are always treated as app bound domains, and do not need to be listed.

<plist version="1.0">
<dict>
<key>WKAppBoundDomains</key>
<array>
    <string>example1.com</string>
    <string>example2.org</string>
    ...
</array>
</dict>
</plist>

Once the WKAppBoundDomains key is added to the Info.plist, all WKWebView instances in the application default to a mode where JavaScript injection, custom style sheets, cookie manipulation, and message handler use is denied. To gain back access to these APIs, a WKWebView can set the limitsNavigationsToAppBoundDomains flag in their WKWebView configuration, like so:

webViewConfiguration.limitsNavigationsToAppBoundDomains = YES;
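
Put together, creating such a web view might look like this (a sketch; the zero frame is just for illustration):

WKWebViewConfiguration *webViewConfiguration = [[WKWebViewConfiguration alloc] init];
webViewConfiguration.limitsNavigationsToAppBoundDomains = YES;
WKWebView *webView = [[WKWebView alloc] initWithFrame:CGRectZero
                                        configuration:webViewConfiguration];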

Setting this flag indicates to WebKit that a WKWebView will only navigate to app-bound domains. Once set, any attempt to navigate away from an app-bound domain will fail with the error: “App-bound domain failure.” A web view which has this configuration flag and loads an app-bound domain from the WKAppBoundDomains list, or from local resources like file URLs, data URLs, and strings, will have access to the following APIs:

  • (void)evaluateJavaScript:(NSString *)javaScriptString completionHandler:(void (^ _Nullable)(_Nullable id, NSError * _Nullable error))completionHandler
  • (void)addUserScript:(WKUserScript *)userScript;
  • window.webkit.messageHandlers

Additionally, an application will have access to the following WKHTTPCookieStore APIs for accessing cookies for app-bound domains:

  • (void)setCookie:(NSHTTPCookie *)cookie completionHandler:(nullable void (^)(void))completionHandler;
  • (void)getAllCookies:(void (^)(NSArray<NSHTTPCookie *> *))completionHandler;

All other WKWebView instances are prevented from using these APIs, since they are capable of leaking private data. This makes a WKWebView navigating to domains outside of the small set of app-bound domains work more like SafariViewController, which has built-in privacy protections like this already.

We will talk more about specific examples and benefits of App-Bound Domains below.

Example Use Cases

We will use five examples to illustrate the ways App-Bound Domains can be adopted for different types of applications:

  • UnchangedApp, an application which does not opt-in to App-Bound Domains.
  • ShopApp, an application with only self-hosted content.
  • SocialApp, an application with an in-app browser.
  • BrowserApp, a full web browser application.
  • HybridApp, an application with both self-hosted content and an in-app browser.

UnchangedApp

First let’s consider UnchangedApp, which does not opt-in to App-Bound Domains or change its behavior in any way. UnchangedApp will experience pre-iOS 14.0 WKWebView behavior, with no restricted APIs on any domains. However, the decision to not adopt App-Bound Domains in UnchangedApp could expose its users to tracking risks, for example if UnchangedApp includes third party code that surreptitiously injects script into web views.

ShopApp (self-hosted content)

Let’s look at a simple example of an application, ShopApp, which only serves web content from its own domain, shop.example. ShopApp can opt-in to App-Bound Domains by creating a new array entry in its Info.plist with the key WKAppBoundDomains. This kind of app can add up to 10 “app-bound” domains as strings in the array. This might be an example of ShopApp’s Info.plist entry:

<plist version="1.0">
<dict>
<key>WKAppBoundDomains</key>
<array>
    <string>shop.example</string>
    <string>shop-cdn.example</string>
</array>
</dict>
</plist>

In order for a WKWebView to use restricted APIs on these domains, ShopApp will also have to initialize the WKWebView with the following configuration argument:

webViewConfiguration.limitsNavigationsToAppBoundDomains = YES;

ShopApp will now have access to the full spectrum of APIs listed above when browsing on shop.example and shop-cdn.example. Note that the check for app-bound domains only occurs for the top-level frame, so ShopApp will still be able to display third party iframes from domains outside the app-bound set on shop.example.

SocialApp (with in-app browser)

Now let’s consider a social media application, SocialApp, which is used largely as an in-app browser. A SocialApp user might navigate to many different websites using the app, possibly encountering a tracker, tracker.example, during in-app browsing.

Without the protections offered by App-Bound Domains, it is possible that SocialApp is intrusively using in-app browsing to track users by communicating with tracker.example.

If the developers of SocialApp want a better user privacy experience they have two paths forward:

  1. Use SafariViewController instead of WKWebView for in-app browsing. SafariViewController protects user data from SocialApp by loading pages outside of SocialApp’s process space. SocialApp can guarantee it is giving its users the best available user privacy experience while using SafariViewController.
  2. Opt-in to App-Bound Domains. The additional WKWebView restrictions from App-Bound Domains ensure that SocialApp is not able to track users using the APIs outlined above.

To opt-in to App-Bound Domains, SocialApp only needs to add an empty WKAppBoundDomains key to their Info.plist.

<plist version="1.0">
<dict>
<key>WKAppBoundDomains</key>
</dict>
</plist>

Since SocialApp does not need any restricted APIs, no WKWebViewConfiguration arguments are necessary.

Due to the asynchronous nature of the web, a SocialApp developer could see different errors if trying to use restricted APIs and navigate to non app-bound domains in the same WKWebView. If a WKWebView in SocialApp uses a restricted API before any navigations occur, and then tries to navigate to a domain outside of the set of “app-bound” domains, the navigation will fail with the error “App-bound domain failure.” Conversely, if SocialApp first navigates to a non-app-bound domain then tries to use a restricted API, the API call will fail.

BrowserApp (exclusively browsing the web)

Another application, BrowserApp, is used exclusively for browsing the web. BrowserApp has previously received permission to take the managed entitlement com.apple.developer.web-browser, which signifies its purpose as a full web-browser. All WKWebView instances for BrowserApp will therefore have unrestricted API access on all domains. BrowserApp will not need to add a WKAppBoundDomains value to their Info.plist or make any changes to the way they initialize WKWebView.

HybridApp (both self-hosted + in-app browser)

Finally, let’s look at a more complex example. HybridApp is an application which offers in-app browsing to its users, but also requires restricted API use for WKWebView instances on its own domain, hybrid.example. HybridApp is a combination of ShopApp and SocialApp, and you should read and fully understand those examples first before considering HybridApp.

HybridApp’s Info.plist might look like this:

<plist version="1.0">
<dict>
<key>WKAppBoundDomains</key>
<array>
    <string>hybrid.example</string>
</array>
</dict>
</plist>

HybridApp needs to inject JavaScript on hybrid.example, so it might create a WKWebViewConfiguration with the specific argument: webViewConfiguration.limitsNavigationsToAppBoundDomains = YES;.

HybridApp can now navigate to hybrid.example and successfully inject JavaScript:

[webView loadRequest:[NSURLRequest requestWithURL:[NSURL URLWithString:@"https://hybrid.example"]]];
[webView evaluateJavaScript:script completionHandler:^(id value, NSError *error) {
    if ([value isEqual:@"Successfully injected JavaScript"])
        // …
}];

Say HybridApp tried to use this WKWebView to navigate to shop.example. Since shop.example is not an app-bound domain, this will result in a failed navigation.

Instead, HybridApp can create a different WKWebView with no limitsNavigationsToAppBoundDomains configuration flag. HybridApp can use this new WKWebView to navigate to any domain, including app-bound domains. However, any attempts to call restricted APIs will fail.

Other Options

The App-Bound Domains feature was created to allow for in-app browsing without sacrificing user privacy.

Prior to iOS 14, the only way to protect web content inside applications was to use SFSafariViewController for general web content, or ASWebAuthenticationSession for authentication purposes. We still think SafariViewController and ASWebAuthenticationSession represent the best way to protect user data, because they are views hosted outside of the application, making it impossible for applications to view or interact with the content of those views. If the developer only wishes to display web content as a convenience to the user, or if they only wish to support a web-based authentication flow, SafariViewController and ASWebAuthenticationSession continue to be the best choices.

Intelligent Tracking Prevention in WKWebView

Additionally, in iOS 14.0 and macOS Big Sur, Intelligent Tracking Prevention (ITP) is enabled by default in all WKWebView applications. To learn more about how ITP protects users against web tracking, check out this documentation on the topic.

In some extreme cases, users might need to disable ITP protections, for example when relying on web content outside of the app developer’s control. Applications can signal the need to allow users to disable ITP by adding a Purpose String for the key NSCrossWebsiteTrackingUsageDescription to the app’s Info.plist. When present, this key causes the application’s Settings screen to display a user control to disable ITP. The setting cannot be read or changed through API calls.
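
For example, an app’s Info.plist might include the following entry (the wording of the purpose string is illustrative):

<key>NSCrossWebsiteTrackingUsageDescription</key>
<string>Parts of this app rely on web content outside the developer's control.</string>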

Note that applications taking the new Default Web Browser entitlement always have a user control in Settings to disable ITP, and don’t need to specify the NSCrossWebsiteTrackingUsageDescription key in their Info.plist.

Feedback and Bug Reports

If you find that this feature in any way doesn’t work as explained, please file a WebKit bug at https://bugs.webkit.org and CC Brent Fulgham and Kate Cheney. For feedback, please contact our web evangelist Jon Davis.

June 26, 2020 05:00 PM

June 25, 2020

Release Notes for Safari Technology Preview 109 with Safari 14 Features

Surfin’ Safari

Safari Technology Preview Release 109 is now available for download for macOS Catalina. With this release, Safari Technology Preview is now available for betas of macOS Big Sur. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS. Safari Technology Preview is currently only available for Intel-based Macs.

This release includes new Safari and WebKit features that will be present in Safari 14. The following Safari 14 features are new in Safari Technology Preview 109:

Safari Web Extensions. Extensions written for Chrome, Firefox, and Edge that use the WebExtension APIs can be converted to Safari Web Extensions using Xcode 12.

Privacy Report. See the trackers that Intelligent Tracking Prevention prevented from accessing identifying information.

Improved tab management with tab previews. Tabs feature a new space-efficient design that lets you view more tabs on-screen and preview tabs to find the one you’re looking for.

Website icons in tabs. Icons in tabs are turned on by default in Safari 14.

Password breach notifications. On macOS Big Sur, Safari will notify users when one of their saved passwords in iCloud Keychain has shown up in a data breach; requesting a password change uses the well-known URL for changing passwords (https://example.com/.well-known/change-password), enabling websites to specify the page to open for updating a password.

Domain-bound codes. On macOS Big Sur, added support to Security Code AutoFill for domain-bound, one-time codes sent over SMS; in the following 2FA SMS, Safari only offers to fill the code on example.com, and no other domain.

Your Example code is 123446.

@example.com #123446

Web Authentication. Added a Web Authentication platform authenticator using Touch ID, if that capability is present (macOS Big Sur-only). Added support for PIN entry and account selection on external FIDO2 security keys.

Adobe Flash is no longer supported in Safari.

In addition to these new Safari 14 features, this release covers WebKit revisions 262502-263214 and Password Manager Resources version 10e3fca9.

Web API

  • Changed the file picker of the <input> file element to show the selection filter (r262895)
  • Changed to disallow XHR+Fetch responses when a response contains invalid header values (r262511)
  • Changed image referrerpolicy mutations to be considered “relevant mutations” (r263167)
  • Fixed empty dataTransfer.types when handling the dragstart event (r262507)
  • Fixed a case of being unable to select an item from dropdown (r263179)
  • Made ReadableStream robust against user code (r263141)

CSS

  • Fixed align-content to apply for a single line (r262716)
  • Fixed pseudo-elements (::after) in shadow roots to animate (r262711)
  • Fixed CSS custom properties for specific border properties (r262627)

Web Animations

  • Fixed animating font-size values with em units (r262946)

SVG

  • Fixed Document.currentScript to work for SVGScriptElements (r262945)
  • Fixed multiple SVG filters unexpectedly lightening the image using linearRGB (r262893)

IndexedDB

  • Added support for IDBFactory databases method (r263157)

Scrolling

  • Fixed horizontally scrolling elements that are broken when revealed by toggling visibility (r262774)

Layout

  • Changed to not apply the special anchor handling when the anchor content is visible after clamping (r262892)
  • Fixed inserted text placeholder to vertically align to top and behave like a block-level element when it has 0 width (r262525)

Media

  • Fixed a YouTube video that gets stuck after rapidly tapping on touchbar’s picture-in-picture button (r262599)
  • Added a quirk to allow starting AudioContext if the document was interacted with (r263025)

WebRTC

  • Improved SCTP cookie generation (r263154)

Back-forward Cache

  • Stopped allowing pages served over HTTPS with Cache-Control: no-store into the back-forward cache (r262978)

JavaScript

  • Added support for private class fields (r262613)
  • Added “el” (Greek) to our maintained available locales list (r262992)
  • Changed Logical Assignment to perform NamedEvaluation of anonymous functions (r262638)
  • Changed JSON.stringify to throw stack overflow error (r262727)
  • Changed RegExp.prototype getters to throw on cross-realm access (r262908)
  • Changed super to not depend on __proto__ (r263035)
  • Fixed AsyncGenerator to await return completions (r262979)
  • Made errors an own property of AggregateError instead of a prototype accessor (r263006)

Editing

  • Fixed text form controls to prevent scrolling by a pixel when the value is the same length as the size (r263073)
  • Fixed observing a newly displayed element inside previously observed content (r263044)
  • Fixed text manipulation to exclude characters outside of the unicode private use area (r262645)
  • Fixed editing to handle nested item boundary elements (r263101)
  • Fixed to not re-extract elements whose children have been manipulated (r263132)
  • Fixed first and last unit in a paragraph to not contain only excluded tokens (r262601)

Accessibility

  • Changed <address> element to no longer map to ARIA contentinfo role (r263096)

Apple Pay

  • Added new values for -apple-pay-button-type (r262528)

Web Inspector

  • Changed text inputs to not spellcheck or autocomplete (r262848)
  • Fixed an issue where XHRs with the same URL as the main resource were not shown in the Sources Tab (r262842)
  • Improved the performance of resizing the Scope Chain panel in the details sidebar of the Sources Tab (r263115)

Web Driver

  • Fixed Automation.computeElementLayout to return iframe-relative element rects when the coordinate system is “Page” (r262997)
  • Fixed WebDriver on non-iOS ports that cannot perform ActionChain which has scrolling down to the element and click it (r262861)

June 25, 2020 06:35 PM

June 23, 2020

Async Clipboard API

Surfin’ Safari

Safari 13.1 adds support for the async Clipboard API. This API allows you, as web developers, to write and read data to and from the system clipboard, and offers several advantages over the current state of the art for reading and writing clipboard data via DataTransfer. Let’s take a look at how this new API works.

The API

The Clipboard API introduces two new objects:

  • Clipboard, which is accessible through navigator.clipboard and contains methods for reading from and writing to the system clipboard.
  • ClipboardItem, which represents a single item on the system clipboard that may have multiple representations. Currently, when reading and writing data, WebKit supports four MIME type representations: "text/plain", "text/html", "text/uri-list", and "image/png".

Conceptually, the Clipboard is represented by an ordered list of one or more ClipboardItem, each of which may have multiple type representations. For instance, when writing several PNG images to the clipboard, you should create a ClipboardItem for each image, with a single "image/png" type representation for each item. When writing a single PNG image with some alt text, you would instead write a single ClipboardItem with two representations: "image/png" and "text/plain".
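
For instance, here is a sketch of the multiple-image case (pngBlob1 and pngBlob2 are assumed to be Blobs of type "image/png"):

navigator.clipboard.write([
    new ClipboardItem({ "image/png": Promise.resolve(pngBlob1) }),
    new ClipboardItem({ "image/png": Promise.resolve(pngBlob2) }),
]);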

You can use clipboard.read to extract data from the system clipboard; this asynchronously retrieves an array of ClipboardItem, each containing a mapping of MIME type to Blob. Similarly, clipboard.write can be used to write a given array of ClipboardItem to the system clipboard. However, if you are only trying to read or write plain text, you may find the methods clipboard.readText and clipboard.writeText to be more ergonomic.

Each ClipboardItem also has a presentationStyle, which may indicate whether the item is best represented as inline data or an “attachment” (that is, a file-like entity). This distinction may be useful in order to tell a copied text selection on a webpage apart from a copied HTML file.

Let’s dive into some examples below to see how to programmatically read and write data.

Writing Data

Consider this basic example, which implements a button that copies plain text when clicked:

<button id="new-copy">Copy text</button>
<script>
document.getElementById("new-copy").addEventListener("click", event => {
    navigator.clipboard.writeText("This text was copied programmatically.");
});
</script>

This is much simpler than the current method of programmatically copying text, which requires us to select a text field and execute the “copy” command:

<button id="old-copy">Copy text</button>
<script>
document.getElementById("old-copy").addEventListener("click", event => {
    let input = document.createElement("input");
    input.style.opacity = "0";
    input.style.position = "fixed";
    input.value = "This text was also copied programmatically.";
    document.body.appendChild(input);

    input.focus();
    input.setSelectionRange(0, input.value.length);
    document.execCommand("Copy");

    input.remove();
});
</script>

When using clipboard.write to copy data, you need to create an array of ClipboardItem. Each ClipboardItem is initialized with a mapping of MIME type to Promise which may resolve either to a string or a Blob of the same MIME type. The following example uses clipboard.write to copy a single item with both plain text and HTML representations.

<button id="copy-html">Copy text and markup</button>
<div>Then paste in the box below:</div>
<div contenteditable spellcheck="false" style="width: 200px; height: 100px; overflow: hidden; border: 1px solid black;"></div>
<script>
document.getElementById("copy-html").addEventListener("click", event => {
    navigator.clipboard.write([
        new ClipboardItem({
            "text/plain": Promise.resolve("This text was copied using `Clipboard.prototype.write`."),
            "text/html": Promise.resolve("<p style='color: red; font-style: oblique;'>This text was copied using <code>Clipboard.prototype.write</code>.</p>"),
        }),
    ]);
});
</script>

A similar implementation using the existing DataTransfer API would require us to create a hidden text field, install a copy event handler on the text field, focus it, trigger a programmatic copy, set data on the DataTransfer (within the copy event handler), and finally call preventDefault.

Note that both clipboard.write and clipboard.writeText are asynchronous. If you attempt to write to the clipboard while a prior clipboard writing invocation is still pending, the previous invocation will immediately reject, and the new content will be written to the clipboard.
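
A minimal sketch of this behavior, with hypothetical strings:

async function copyTwice() {
    const first = navigator.clipboard.writeText("first");
    const second = navigator.clipboard.writeText("second");
    try {
        await first;
    } catch {
        // The first write was preempted by the second and rejected.
        console.log("Pending write was rejected; the clipboard now holds 'second'.");
    }
    await second;
}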

On both iOS and macOS, the order in which types are written to the clipboard is also important. WebKit writes data to the system pasteboard in the order specified — this means types that come before other types are considered by the system to have “higher fidelity” (that is, preserve more of the original content). Native apps on macOS and iOS may use this fidelity order as a hint when choosing an appropriate UTI (universal type identifier) to read.

Reading Data

Data extraction follows a similar flow. In the following example, we:

  1. Use clipboard.read to obtain a list of clipboard items.
  2. Resolve the first item’s "text/html" data to a Blob using clipboardItem.getType.
  3. Use the FileReader API to read the contents of the Blob as text.
  4. Display the pasted markup by setting the innerHTML of a container <div>.

<span style="font-weight: bold; background-color: black; color: white;">Select this text and copy</span>
<div><button id="read-html">Paste HTML below</button></div>
<div id="html-output"></div>
<script>
document.getElementById("read-html").addEventListener("click", async clickEvent => {
    let items = await navigator.clipboard.read();
    for (let item of items) {
        if (!item.types.includes("text/html"))
            continue;

        let reader = new FileReader;
        reader.addEventListener("load", loadEvent => {
            document.getElementById("html-output").innerHTML = reader.result;
        });
        reader.readAsText(await item.getType("text/html"));
        break;
    }
});
</script>

There are a couple of interesting things to note here:

  • Like writing data, reading data is also asynchronous; the processes of both fetching every ClipboardItem and extracting a Blob from a ClipboardItem return promises.
  • Type fidelities are preserved when reading data. This means the order in which types were written (either using system API on iOS and macOS, or the async clipboard API) is the same as the order in which they are exposed upon reading from the clipboard, as the sketch below shows.
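
For example, a sketch that logs item.types after a read; the types appear in the order the representations were written (assuming the clipboard holds an item with multiple representations):

async function logFidelityOrder() {
    const items = await navigator.clipboard.read();
    for (const item of items) {
        // Types appear in fidelity order, e.g. ["text/html", "text/plain"].
        console.log(item.types);
    }
}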

Security and Privacy

The async clipboard API is a powerful web API, capable of both writing arbitrary data to the system clipboard and reading from it. As such, there are serious security ramifications when allowing pages to write data to the clipboard, and privacy ramifications when allowing pages to read from the clipboard. Clearly, untrusted web content shouldn’t be capable of extracting sensitive data — such as passwords or addresses — without explicit consent from the user. Other vulnerabilities are less obvious; for instance, consider a page that writes "text/html" to the pasteboard that contains malicious script. When pasted in another website, this could result in a cross-site scripting attack!

WebKit’s implementation of the async clipboard API mitigates these issues through several mechanisms.

  • The API is limited to secure contexts, which means that navigator.clipboard is not present for http:// websites.
  • The request to write to the clipboard must be triggered during a user gesture. A call to clipboard.write or clipboard.writeText outside the scope of a user gesture (such as "click" or "touch" event handlers) will result in the immediate rejection of the promise returned by the API call.
  • Both "text/html" and "image/png" data are sanitized before writing to the pasteboard. Markup is loaded in a separate document where JavaScript is disabled, and only visible content is then extracted from this page. Content such as <script> elements, comment nodes, display: none; elements and event handler attributes are all stripped away. For PNG images, the image data is first decoded into a platform image representation, before being re-encoded and sent to the platform pasteboard for writing. This ensures that a website cannot write corrupted or broken images to the pasteboard. If the image data cannot be decoded, the writing promise will reject. Additional information about WebKit’s sanitization mechanisms is available in the Clipboard API Improvements blog post.
  • Since users may not always be aware that sensitive content has been copied to the pasteboard, restrictions on the ability to read are more strict than the restrictions on the ability to write. If a page attempts to programmatically read from the pasteboard outside of a user gesture, the promise will immediately reject. If the user is explicitly triggering a paste during the gesture (for instance, using a keyboard shortcut on macOS such as ⌘V or pasting using the “Paste” action on the callout bar on iOS), WebKit will allow the page to programmatically read the contents of the clipboard. Programmatic clipboard access is also automatically granted in the case where the contents of the system clipboard were written by a page with the same security origin. If neither of the above are true, WebKit will show platform-specific UI which the user may interact with to proceed with a paste. On iOS, this takes the form of a callout bar with a single option to paste; on macOS, it is a context menu item. Tapping or clicking anywhere in the page (or performing any other actions, such as switching tabs or hiding Safari) will cause the promise to be rejected; the page is granted programmatic access to the clipboard only if the user manually chooses to paste by interacting with the platform-specific UI.
  • Similar to writing data, reading data from the system clipboard involves sanitization to prevent users from unknowingly exposing sensitive information. Image data read from the clipboard is stripped of EXIF data, which may contain details such as location information and names. Likewise, markup that is read from the clipboard is stripped of hidden content, such as comment nodes.

These policies ensure that the async clipboard API allows developers to deliver great experiences without the potential to be abused in a way that compromises security or privacy for users.
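
In practice, this means clipboard reads should be written defensively. A minimal sketch of a paste flow that tolerates denial:

async function tryPaste() {
    try {
        // Rejects if there is no user gesture, or if the user dismisses
        // the platform paste UI instead of confirming.
        return await navigator.clipboard.readText();
    } catch {
        console.log("Clipboard access was not granted.");
        return null;
    }
}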

Future Work

As we continue to iterate on the async clipboard API, we’ll be adding support for custom pasteboard types, and will also consider support for additional MIME types, such as "image/jpeg" or "image/svg+xml". As always, please let us know if you encounter any bugs (or if you have ideas for future enhancements) by filing bugs on bugs.webkit.org.

June 23, 2020 11:35 PM

June 16, 2020

Víctor Jáquez: WebKit Flatpak SDK and gst-build

Igalia WebKit

This post is an annex of Phil’s Introducing the WebKit Flatpak SDK. Please make sure to read it, if you haven’t already.

Recapitulating: nowadays WebKitGtk/WPE developers —and their CI infrastructure— are moving towards a Flatpak-based environment for their workflow. This Flatpak-based environment, or Flatpak SDK for short, can be visualized as a sandboxed software container which bundles all the dependencies required to compile, run and debug WebKitGtk/WPE.

In day-to-day work, this approach avoids having to compile the world in order to obtain reproducible builds, improving the development and testing workflow.

But what if you are also involved in the development of one dependency?

This is the case of Igalia’s multimedia team where, besides developing the multimedia features for WebKitGtk and WPE, we also participate in the GStreamer development, the framework used for multimedia.

Because of this, our workflow usually requires building WebKit with a fix, hack, or new feature in GStreamer. Is it possible to use our custom GStreamer build inside Flatpak without messing up its own GStreamer setup? Yes, it is.

gst-build is a set of Python scripts which clone the GStreamer repositories, compile them, and set up an uninstalled environment. This uninstalled environment allows transient use of the compiled framework directly from its build tree, avoiding installation and keeping our system clean.

The WebKit scripts that wrap Flatpak operations are also capable of handling the gst-build scripts to build GStreamer inside the container and, when running WebKit’s artifacts, of enabling the mentioned uninstalled environment, overriding Flatpak’s GStreamer.

How do we unveil all this magic?

First of all, set up a gst-build checkout as documented. This checkout is where the GStreamer plumbing is done.

Then, gst-build operations through the WebKit compilation scripts are enabled by exporting the environment variable GST_BUILD_PATH, which should point to the directory where the gst-build tree is placed.

And that’s all!

But let’s put these words in actual commands. The following workflow assumes that WebKit repository is cloned in ~/WebKit and the gst-build tree is in ~/gst-build (please, excuse my bashisms).

Compiling WebKitGtk with symbols, using LLVM as toolchain (this command will also compile GStreamer):

$ cd ~/WebKit
$ CC=clang CXX=clang++ GST_BUILD_PATH=/home/vjaquez/gst-build Tools/Scripts/build-webkit --gtk --debug
...

Running the generated minibrowser (note that GST_BUILD_PATH is required again for correct linking):

$ GST_BUILD_PATH=/home/vjaquez/gst-build Tools/Scripts/run-minibrowser --gtk --debug
...

Running media layout tests:

$ GST_BUILD_PATH=/home/vjaquez/gst-build ./Tools/Scripts/run-webkit-tests --gtk --debug media

But wait! There’s more...

What if I want to parameterize the GStreamer compilation? Say I would like to enable a GStreamer module or disable the build of a specific element.

gst-build, like the rest of the GStreamer modules, uses the Meson build system, so it’s possible to pass arguments to Meson through the environment variable GST_BUILD_ARGS.

For example, I would like to enable gstreamer-vaapi 😇

$ cd ~/WebKit
$ CC=clang CXX=clang++ GST_BUILD_PATH=/home/vjaquez/gst-build GST_BUILD_ARGS="-Dvaapi=enabled" Tools/Scripts/build-webkit --gtk --debug
...

By vjaquez at June 16, 2020 11:49 AM

June 11, 2020

Release Notes for Safari Technology Preview 108

Surfin’ Safari

Safari Technology Preview Release 108 is now available for download for macOS Catalina and macOS Mojave. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 262002-262502.

Web Inspector

  • Network
    • Fixed updating statistics when filtering (r262263)
    • Fixed gaps around the “An error occurred trying to load this resource” message (r262162)
  • Storage
    • Prevented requesting the list of IndexedDB database names multiple times for the same security origin (r262077)
  • Graphics
    • Added support for the id (name) of the animation if it exists (r262404)
    • Fixed populating text editors in the Keyframes section when the Animation panel sidebar in the details sidebar is first shown (r262307)
  • Miscellaneous
    • Fixed ⌘G to not override the current query of the find banner if it’s visible (r262173)

Accessibility

  • Fixed SVG text nodes with content getting described as “empty group” even when not empty (r262500)
  • Fixed ignoring images with an empty alt attribute (r262224)

Web API

  • Fixed <area> to require being connected in order to navigate (r262359)
  • Fixed the pageshow event only firing the first time the back button is pressed (r262221)
  • Fixed Array.prototype.splice not setting the length of the returned object if not an Array (r262088)
  • Fixed incorrect location.origin in blob workers (r262026)
  • Implemented ParentNode.prototype.replaceChildren (r262381)

CSS

  • Changed the calculation to compute the hypothetical cross size of each item in flexbox to use fit-content, not max-content (r262411)
  • Changed to allow indefinite size flex items to be definite with respect to resolving percentages inside them (r262124)
  • Fixed dynamically setting position: absolute in a grid item to trigger a relayout of that element (r262481)
  • Fixed tables as flex items to obey the flex container sizing (r262378)
  • Fixed styling ::selection for a flex container (r262049)
  • Prevented grid-template-rows from serializing adjacent <line-names> (r262130)
  • Prevented putting out-of-flow boxes in anonymous flex items or grid items (r262061)

JavaScript

  • Fixed BigInt operations to handle exceptions correctly (r262386)

Scrolling

  • Fixed scrolling on a mercurynews.com article (r262127)
  • Fixed stuttery overflow scrolling in slow-scrolling regions (r262094)
  • Fixed rendering artifacts when scrolling overlays (r262177)

Rendering

  • Fixed incorrect clipping of absolute and fixed elements inside stacking-context composited overflow: hidden (r262237)

Async Clipboard API

  • Added support for reading "image/png" on ClipboardItem (r262209)
  • Fixed DataTransfer.prototype.files containing multiple files when pasting a single image with multiple representations (r262047)

Web Animations

  • Avoided starting CSS Transitions for a property when a CSS Animations or JavaScript-originated animation is running for the same property (r262154)
  • Fixed SVG animations to not stop when other animators are still running (r262175)

Media

  • Fixed Picture-in-Picture API issues under stress tests (r262038)
  • Fixed scrubbing video on www.judiciary.senate.gov (r262169)
  • Fixed fullscreen animation missing a few frames at beginning (r262322)
  • Fixed transition between encrypted and clear codecs throwing an error (r262364)
  • Fixed video freezing when attaching a local MediaStream to multiple elements (r262189)
  • Made setting fullscreen mode more robust under stress tests (r262456)

June 11, 2020 05:30 PM

Diego Pino: Renderization of Conic gradients

Igalia WebKit

The CSS Images Module Level 4 introduced a new type of gradient: conic-gradient. Until then, there were only two other type of gradients available on the Web: linear-gradient and radial-gradient.

The first browser to ship conic-gradient support was Google Chrome, around March 2018. A few months later, in September 2018, the feature was available in Safari. Firefox has been missing support until now, although an implementation is on the way and will ship soon. In the case of WebKitGTK (Epiphany) and WPE (Web Platform for Embedded), support landed in October 2019, which I implemented as part of my work at Igalia. The feature has been officially available in WebKitGTK and WPE since version 2.28 (March 2020).

Before native browser support, conic-gradient was available as a JavaScript polyfill created by Lea Verou.

Gradients in the Web

Generally speaking, a gradient is a smooth transition of colors defined by two or more stop colors. In the case of a linear gradient, this transition is defined by a straight line (which may or may not be at an angle).

div.linear-gradient {
  width: 400px;
  height: 100px;
  background: linear-gradient(to right, red, yellow, lime, aqua, blue, magenta, red);
}
Linear gradient

In the case of a radial gradient, the transition is defined by a center and a radius. Colors expand evenly in all directions from the center of the circle to outside.

div.radial-gradient {
  width: 300px;
  height: 300px;
  border-radius: 50%;
  background: radial-gradient(red, yellow, lime, aqua, blue, magenta, red);
}
Radial gradient

A conical gradient, although also defined by a center and a radius, isn’t the same as a radial gradient. In a conical gradient colors spin around the circle.

div.conic-gradient {
  width: 300px;
  height: 300px;
  border-radius: 50%;
  background: conic-gradient(red, yellow, lime, aqua, blue, magenta, red);
}
Conic gradient

Implementation in WebKitGTK and WPE

At the time of implementing support in WebKitGTK and WPE, the feature had already shipped in Safari. That meant WebKit already had support for parsing the conic-gradient specification as defined in CSS Images Module Level 4 and the data structures to store relevant information were already created. The only piece missing in WebKitGTK and WPE was painting.

Safari implements many of its graphical painting operations on top of the CoreGraphics library, which provides a primitive for conic gradient painting (CGContextDrawConicGradient). Something similar happens in Google Chrome, although in this case the graphics library underneath is Skia (CreateTwoPointConicalGradient). WebKitGTK and WPE use Cairo for many of their graphical operations. In the case of linear and radial gradients, there’s native support in Cairo. However, there isn’t a function for conical gradient painting. This doesn’t mean Cairo cannot be used to paint conical gradients, it’s just a little bit more complicated.

Mesh gradients

The Cairo documentation states it is possible to paint a conical gradient using a mesh gradient. A mesh gradient is defined by a set of colors and control points. The most basic type of mesh gradient is a Gouraud-shading triangle mesh.

cairo_pattern_t *pattern = cairo_pattern_create_mesh ();

cairo_mesh_pattern_begin_patch (pattern);

cairo_mesh_pattern_move_to (pattern, 100, 100);
cairo_mesh_pattern_line_to (pattern, 130, 130);
cairo_mesh_pattern_line_to (pattern, 130,  70);

cairo_mesh_pattern_set_corner_color_rgb (pattern, 0, 1, 0, 0);
cairo_mesh_pattern_set_corner_color_rgb (pattern, 1, 0, 1, 0);
cairo_mesh_pattern_set_corner_color_rgb (pattern, 2, 0, 0, 1);

cairo_mesh_pattern_end_patch (pattern);
Gouraud-shaded triangle mesh

A more sophisticated type of mesh gradient patch is a Coons patch. A Coons patch is a quadrilateral defined by 4 cubic Bézier curves and 4 colors, one for each vertex. A Bézier curve is defined by 4 points, so we have a total of 12 control points (and 4 colors) in a Coons patch.

cairo_mesh_pattern_begin_patch (pattern);

cairo_mesh_pattern_move_to (pattern, 45, 12);
cairo_mesh_pattern_curve_to(pattern, 69, 24, 173, -15, 115, 50);
cairo_mesh_pattern_curve_to(pattern, 127, 66, 174, 47, 148, 104);
cairo_mesh_pattern_curve_to(pattern, 65, 58, 70, 69, 18, 103);
cairo_mesh_pattern_curve_to(pattern, 42, 43, 63, 45, 45, 12);

cairo_mesh_pattern_set_corner_color_rgb (pattern, 0, 1, 0, 0); // red
cairo_mesh_pattern_set_corner_color_rgb (pattern, 1, 0, 1, 0); // green
cairo_mesh_pattern_set_corner_color_rgb (pattern, 2, 0, 0, 1); // blue
cairo_mesh_pattern_set_corner_color_rgb (pattern, 3, 1, 1, 0); // yellow

cairo_mesh_pattern_end_patch (pattern);
Coons patch gradient

A Coons patch comes in very handy for painting a conical gradient. Consider the first quadrant of a circle; such a quadrant can be easily defined with a Bézier curve.

cairo_mesh_pattern_begin_patch (pattern);

cairo_mesh_pattern_move_to (pattern, 0, 200);
cairo_mesh_pattern_line_to (pattern, 0, 0);
cairo_mesh_pattern_curve_to (pattern, 133, 0, 200, 133, 200, 200);
cairo_mesh_pattern_line_to (pattern, 0, 200);

cairo_mesh_pattern_set_corner_color_rgb (pattern, 0, 1, 0, 0); // red
cairo_mesh_pattern_set_corner_color_rgb (pattern, 1, 0, 1, 0); // green
cairo_mesh_pattern_set_corner_color_rgb (pattern, 2, 0, 0, 1); // blue
cairo_mesh_pattern_set_corner_color_rgb (pattern, 3, 1, 1, 0); // yellow

cairo_mesh_pattern_end_patch(pattern);
Coons patch of the first quadrant of a circle

If we simply use two colors instead, the final result more closely resembles how a conical gradient looks.

cairo_mesh_pattern_set_corner_color_rgb (pattern, 0, 1, 0, 0); // red
cairo_mesh_pattern_set_corner_color_rgb (pattern, 1, 1, 0, 0); // red
cairo_mesh_pattern_set_corner_color_rgb (pattern, 2, 1, 1, 0); // yellow
cairo_mesh_pattern_set_corner_color_rgb (pattern, 3, 1, 1, 0); // yellow
Coons patch of the first quadrant of a circle (2 colors)

Repeat this step three more times, with a few more stop colors, and you have a nice conical gradient.

A conic gradient made by composing mesh patches

Bézier curve as arcs

At this point the difficulty of painting a conical gradient has been reduced to calculating the shape of the Bézier curve of each mesh patch.

Computing the starting and ending points is straightforward; however, calculating the positions of the other two control points of the Bézier curve is a bit harder.

Bézier curve approximation to a circle quadrant

Mozillian Michiel Kamermans (pomax) has a beautifully written essay on Bézier curves. The section “Circles and cubic Bézier curves” of that essay discusses how to approximate an arc with a Bézier curve. The case of a circular quadrant is particularly interesting because it allows painting a circle with 4 Bézier curves with minimal error. In the case of the quadrant above, the values for each point would be the following:

S = (0, r), CP1 = (0.552 * r, r), CP2 = (r, 0.552 * r), E = (r, 0) 

Even though in its most basic form a conic gradient is defined by just one starting and one ending color, painting the circle as two halves is not an option: a single Bézier curve is a poor approximation of a semicircle (check the interactive examples of pomax’s Bézier curve essay). In that case, the conic gradient is still split into four Coons patches, with the middle colors interpolated.

Also, in cases where there are more than 4 colors, each Coons patch will be smaller than a quadrant. What’s needed is a general formula that can compute the control points for each section of the circle, given an angle and a radius. After some math, the following formula can be inferred (check the section “Circles and cubic Bézier curves” in pomax’s essay):

cp1 = {
   x: cx + (r * cos(angleStart)) - f * (r * sin(angleStart)),
   y: cy + (r * sin(angleStart)) + f * (r * cos(angleStart))
}
cp2 = {
   x: cx + (r * cos(angleEnd)) + f * (r * sin(angleEnd)),
   y: cy + (r * sin(angleEnd)) - f * (r * cos(angleEnd))
}

where f is a variable computed as:

f = 4 * tan((angleEnd - angleStart) / 4) / 3;

For a 90-degree angle, the value of f is 0.552. Thus, if the quadrant above had a radius of 100px, the values of the control points would be CP1(155.2, 0) and CP2(200, 44.8) (considering the top-left corner as point (0,0)).

And that’s basically all that is needed. The formula above allows us to compute a circular sector as a Bézier curve, which, when set up as a Coons patch, creates a section of a conical gradient. Adding several Coons patches together creates the final conical gradient.
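
Translated into code, here is a small helper (illustrative names, not from the WebKit source) that computes both control points for an arbitrary arc:

// Compute the two cubic Bézier control points approximating a circular
// arc from angleStart to angleEnd (in radians), centered at (cx, cy).
function arcControlPoints(cx, cy, r, angleStart, angleEnd) {
    const f = 4 * Math.tan((angleEnd - angleStart) / 4) / 3;
    return {
        cp1: {
            x: cx + r * Math.cos(angleStart) - f * r * Math.sin(angleStart),
            y: cy + r * Math.sin(angleStart) + f * r * Math.cos(angleStart)
        },
        cp2: {
            x: cx + r * Math.cos(angleEnd) + f * r * Math.sin(angleEnd),
            y: cy + r * Math.sin(angleEnd) - f * r * Math.cos(angleEnd)
        }
    };
}

// For a 90° quadrant of radius 100 centered at (100, 100), starting at -90°:
// arcControlPoints(100, 100, 100, -Math.PI / 2, 0)
// gives cp1 ≈ (155.2, 0) and cp2 ≈ (200, 44.8), matching the values above.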

Wrapping up

It has been a long time since conic gradients for the Web were first drafted. For instance, the current bug in Firefox’s Bugzilla was created by Lea Verou five years ago. Fortunately, browsers have started shipping native support, and conical gradients have been available in Chrome and Safari for two years now. In this post I discussed the implementation, mainly the rendering, of conic gradients in WebKitGTK and WPE. And since both browsers are WebKit based, they can leverage the implementation efforts led by Apple when it brought support for this feature to Safari. With Firefox shipping conic gradient support soon, this feature will be safe to use on the Web Platform.

June 11, 2020 12:00 AM

June 09, 2020

Philippe Normand: WebKitGTK and WPE now supporting videos in the img tag

Igalia WebKit

Using videos in the <img> HTML tag can lead to more responsive web-page loads in most cases. Colin Bendell blogged about this topic, make sure to read his post on the cloudinary website. As it turns out, this feature has been supported for more than 2 years in Safari, but …

By Philippe Normand at June 09, 2020 04:00 PM

June 08, 2020

Philippe Normand: Introducing the WebKit Flatpak SDK

Igalia WebKit

Working on a web-engine often requires a complex build infrastructure. This post documents our transition from JHBuild to Flatpak for the WebKitGTK and WPEWebKit development builds.

For the last 10 years, WebKitGTK has been relying on a custom JHBuild moduleset to handle its dependencies and (try to) ensure a reproducible …

By Philippe Normand at June 08, 2020 04:50 PM

June 05, 2020

Paulo Matos: JSC: what are my options?

Igalia WebKit

Compilers tend to be large pieces of software that provide an enormous amount of options. We take a quick look at how to find what JavaScriptCore (JSC) provides.

More…

By Paulo Matos at June 05, 2020 12:23 PM

May 28, 2020

Release Notes for Safari Technology Preview 107

Surfin’ Safari

Safari Technology Preview Release 107 is now available for download for macOS Catalina and macOS Mojave. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 261057-262002.

Web Inspector

  • Network
    • Adjusted the spacing of the navigation items so that none are hidden when previewing a resource (r261497)
  • Sources
    • Added showing the name of the Worker if it exists as the title of its main resource (r261104)
    • Fixed source mapping issue when combining single-child directory chains (r261200)
    • Fixed restoring global DOM, URL, or event breakpoints incorrectly enabling all breakpoints (r261340)
    • Supported CSS pretty printing when a url() is nested inside a url() (r261772)
  • Timelines
    • Fixed the memory stacked area graph to not extend beyond the “stopping time” marker (r261197)
  • Storage
    • Fixed double-clicking on a cookie field to start editing that cookie (r261339)
  • Layers
    • Ensured that the text at the bottom of the details sidebar doesn’t overlap (r261237)
  • Console
    • Added showing EventTarget listeners as an internal property (r261670)
    • Added showing Worker name as an internal property (r261499)
  • Miscellaneous
    • Accessibility
      • Fixed Left/Right arrow keys to collapse/expand details sections (r261962)
    • Remote Inspection
      • Provided a way to turn on or off ITP debug mode and AdClickAttribution debug mode (r261103)
      • Dropped support for iOS 8.x, iOS 9.x, and iOS 10.x (r261108, r261109, r261105)

Web API

  • Changed the initial value of transform-box to be view-box to fix some SVG animations (r261752)
  • Fixed computing the correct perspective matrix on a box with borders and overflow: hidden (r261619)
  • Fixed Object.prototype.toString to match standards (r261159)
  • XML external entity resources should only be loaded from XML MIME types (r261443)

CSS

  • Changed the cursor to update during the rendering steps, rather than on a 20ms timer (r261741)
  • Fixed the computed min-width and min-height for auto depending on the box (r261974)
  • Fixed disappearing content with CSS parallax on an overflow scroll (r261837)

Rendering

  • Fixed repaint issues when the login field collapses on music.apple.com (r261979)
  • Fixed text clipped when rendered with fonts which have a negative line gap metric (r261573)
  • Fixed table sizing when max-width is used (r261924)

Scrolling

  • Fixed tapping on the trackpad in a <select> to flash the scrollers (r261368)
  • Fixed a <select> that sometimes becomes non-scrollable (r261427)
  • Fixed overflow scrollbars not growing when hovered (r261132)
  • Fixed find not always scrolling search results into view (r261819)
  • Fixed composited scrolling interfering with the propagation of perspective (r261632)
  • Fixed scrollbars flickering in RTL scrollable regions (r261535)

Media

  • Changed to ignore a poster set after playback begins (r261341, r261576)
  • Fixed media controls tracks menu showing “Auto” selected instead of the track selected via the JavaScript API (r261084)

IndexedDB

  • Improved the accuracy of IndexedDB estimated write size computation (r261533)
  • Fixed a bug that could cause IndexedDB log files to grow without bound (r261533)

JavaScript

  • Implemented Intl.Locale (r261215)
  • Implemented BigInt.asIntN and BigInt.asUintN (r261156, r261199)
  • Enabled logical assignment operators (r261728)
  • Ensured IntlCollator.prototype.resolvedOptions returns relevant locale extension keys in alphabetical order (r261182)

Web Animations

  • Fixed the animation engine to not wake up every tick for steps() timing functions (r261926)
  • Fixed animations with a single keyframe not getting accelerated (r261756)
  • Fixed calling reverse() on an accelerated animation having no effect (r261637)
  • Coordinated “update animations and send events” procedure across multiple timelines (r261218)
  • Fixed Document.getAnimations() to only consider document connection and not timeline association (r261488)
  • Fixed the animation of font-size using rem values (r261861)

Async Clipboard API

  • Enabled clipboard API access when pasting from a menu item or key binding (r261825)
  • Fixed cut and paste from Google Doc to Notes in several (non-Latin) languages (r261247)
  • Preserved character set information when writing to the pasteboard when copying rich text (r261395)

Accessibility

  • Implemented accessibility of HTML 5.1 Drag & Drop (r261248)

CSS Grid

  • Cleared the override width for computing percent margins (r261841)
  • Changed to treat percentages as auto for the minimum contribution (r261767)
  • Fixed auto repeat with multiple tracks and gutters (r261949)

Bug Fixes

  • Added a quirk for cookie blocking latch mode for aolmail.com redirecting to aol.com under aol.com (r261724)
  • Changed to enforce a URL cannot have a username, password, or port if its host is null (r261173)
  • Changed XML external entities to require an XML MIME type to be loaded (r261451)
  • Fixed the playhead in Touch Bar continuing when loading stalls (r261342)
  • Fixed the search field on mayoclinic.org clipping the submit button (r261450)
  • Fixed setting a host on a URL when no port is specified (r261212)
  • Limited the HTTP referer to 4KB (r261402)

May 28, 2020 05:30 PM

May 14, 2020

Release Notes for Safari Technology Preview 106

Surfin’ Safari

Safari Technology Preview Release 106 is now available for download for macOS Catalina and macOS Mojave. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 260266-261057.

Web Inspector

  • Sources
    • Ensured “Step Over” only steps through comma expressions if they are not nested (r260520)
  • Storage
    • Fixed third-party cookie display (r260807)
    • Added support for selecting multiple local storage entries (r260613)
  • Miscellaneous
    • Updated find dialog to populate the search string from the system find pasteboard (r260847, r260887, r260895)
    • Fixed the filter bar in the navigation sidebar to respect the global search settings (r260386)

Async Scrolling

  • Enabled async frame and overflow scrolling by default on macOS (r260276)
  • Fixed an overflow that’s hidden on one axis to be scrollable on that axis (r260450)

Web Animations

  • Fixed applying keyframe easings to transforms (r260360)
  • Changed to guarantee assigning an element to effect.target keeps the element alive, even without other references to it (r260705)
  • Implemented jump-* functions for steps() timing functions (r261046)

CSS

  • Added support for :where() pseudo class (r260319)
  • Fixed :is() and :where() to not allow pseudo-elements when parsing (r260338)
  • Fixed border-radius failing to clip composited iframe contents (r260950)

JavaScript

  • Enabled BigInt (r260345)
  • Changed BigInt constructor to accept larger integers than safe-integers (r260863)
  • Added support for Intl.RelativeTimeFormat (r260349)
  • Redesigned for-of iteration for arrays (r260323)

WebRTC

  • Updated getDisplayMedia to respect aspect ratio with max constraints (r260561, r260638)

Web API

  • Fixed the visibilitychange event to bubble per spec (r260483)

Media

  • Changed to ensure a remote track event gets unmuted after the track event is fired (r260813)
  • Fixed the audio session category being set incorrectly after changing the video source with MSE (r261004)
  • Fixed video elements returning to an incorrect position when exiting fullscreen (r260150)

Rendering

  • Fixed flickering header when scrolling articles with fixed position elements (r260828)
  • Fixed content disappearing in a CSS-based parallax implementation (r260371)
  • Fixed a blank header on a site by changing to not use stale containing block width value while computing preferred width (r260905)
  • Fixed oversized caret and selection rects in text fields (r260367)

Bug Fix

  • Enabled using credentials for same-origin CSS mask images (r260598)

May 14, 2020 05:15 PM

April 23, 2020

Release Notes for Safari Technology Preview 105

Surfin’ Safari

Safari Technology Preview Release 105 is now available for download for macOS Catalina and macOS Mojave. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 259476-260266.

CSS

  • Added Selectors Level 4 specificity calculation for pseudo classes (r260024, r260069)
  • Added support for the font-relative lh and rlh units from the CSS Values Level 4 specification (r259703)
  • Corrected the computed style for outline-offset when outline-style is none (r259562)
  • Fixed bad style sharing between sibling elements with different part attributes for CSS Shadow Parts (r259877)
  • Implemented the CSS Color Level 4 behavior for inheritance of currentColor (r259532)
  • Prevented caching definite height against perpendicular flex items (r260055)

JavaScript

  • Fixed Intl.DateTimeFormat patterns and fields (r260145)
  • Implemented BigInt.prototype.toLocaleString (r259919)
  • Updated Intl to allow calendar and numberingSystem options (r259941)
  • Implemented logical assignment operators (r260119)
  • Updated canonicalizeLocaleList to gracefully throw OOM error if the input and error message is too large (r259481)
  • Updated a module script’s default cross-origin value to be “anonymous” (r260003, r260038)

Media

  • Made a change to update ScreenTime as playback state changes (r260182, r260201)
  • Filtered some capture device names (r259477)
  • Added support for applying a frameRate limit when the request stream is from Camera (r260245)

Web Animations

  • Added support for pseudoElement on KeyframeEffect and KeyframeEffectOptions (r260139)
  • Fixed computing transition-property correctly when transition-duration is set to inherit (r259720)

Accessibility

  • Fixed smart invert to handle the picture elements on foxnews.com (r260092)

Rendering

  • Fixed drawing an image srcRect and imageRect to be in the same orientation of destRect (r260016)
  • Fixed a missing gradient banner on fastclick.com (r259701)

Web API

  • Fixed querySelector("#\u0000") to match an element with ID U+FFFD (r259773)
  • Fixed scroll snap in subframes when async overflow scroll is enabled (r260086)
  • Fixed zoom changes to not affect ResizeObserverSize (r259578)
  • Updated CanvasRenderingContext2D.drawImage to ignore the EXIF orientation if the image-orientation is none (r259567)
  • Updated documentFragment.getElementById() to not work for empty-string IDs (r259651)
  • Updated baseURL for a module script to be the response URL, not the request URL (r260131)

Web Inspector

  • Elements Tab
    • De-indented items in the Variables section in the Computed sidebar panel so that wrapped content doesn’t line up with -- (r260096)
  • Sources Tab
    • Added support for copying selected call frame(s) in the Call Stack section (r259738)
    • Added a “Step” button that continues execution to the next expression in the current call frame (r260113)
    • Treated comma sub-expressions as separate statements to provide more intuitive formatting, additional breakpoint opportunities, and better stepping functionality (r259781, r259810)
  • Storage Tab
    • Provided a way to delete multiple localStorage or sessionStorage entries (r259744)
    • Allowed cookies to be set with no value (r259842)
    • Fixed an issue where cookies weren’t shown on pages that have subframes that have been denied access to cookies (r259649)
  • Console Tab
    • Ensured that long strings are not truncated when passed to console functions (r260091)
  • Search Tab
    • Added a setting that controls whether search field is populated with the current selection when using the global search shortcut ⇧⌘F (r259748)
  • Miscellaneous
    • Increased the auto-inspect debugger timeout delay to account for slower networks/devices (r259479)

April 23, 2020 08:00 PM

April 10, 2020

A Tour of Inline Caching with Delete

Surfin’ Safari

If you search for any JavaScript performance advice, a very popular recommendation is to avoid the delete operator. Today, this seems to be good advice, but why should it be vastly more expensive to delete a property than to add it?

The goal of my internship at Apple this winter was to improve the performance of the delete operator in JavaScriptCore. This has given me the opportunity to learn about how the various pieces of JavaScriptCore’s inline caching optimizations work, and we will take a quick tour of these optimizations today.

First, we will look at how expensive deletion really is in the major engines, and discuss why we might want to optimize it. Then, we will learn what inline caching is by implementing it for property deletion. Finally, we will look at the performance difference these optimizations make on benchmarks and microbenchmarks.

How expensive is Deletion?

First of all, why should we even bother optimizing deletion? Deletion should be fairly rare, and many JavaScript developers already know not to use it when possible. At the same time, we generally try to avoid having large hidden performance cliffs. To demonstrate how a simple delete statement can have a surprising effect on performance, I wrote a small benchmark that renders a scene progressively, and measures the time it takes to render each frame. This benchmark is not designed to precisely measure performance, but rather to make it easy to see large performance differences.

You can run the program yourself by pressing the run button below, which will calculate a new color value for every pixel in the image. It will then display how long it took to render, in milliseconds.

Next, we can try executing a single delete statement to a hot part of the code. You can do this by checking the “Use Delete” checkbox above, and clicking run again. This is what the code looks like:

class Point {
    constructor(x, y, z) {
        this.x = x
        this.y = y
        this.z = z
        this.a = 0
        if (useDelete)
            delete this.a
    }
}

The following results were measured on my personal computer running Fedora 30, comparing tip of tree WebKit (r259643) with all of the delete optimizations both enabled and disabled. I also used Firefox 74 and Chromium 77 from the Fedora repositories.

Here, we can see that the addition of a delete statement to a hot section of code can send the performance off a cliff! The primary reason for this is that deletion in JavaScriptCore used to disable all of the inline caching optimizations for an object, including when putting and getting properties of the object. Let’s see why.

Structures and Transitions

JavaScript code can be extremely dynamic, making it tricky to optimize. While many other languages have fixed-size structs or classes to help the compiler make optimizations, objects in JavaScript behave like dictionaries that allow associating arbitrary values and keys. In addition, objects frequently change their shape over time. For performance, JavaScriptCore uses multiple internal representations for objects, choosing between them at runtime based on how a program uses them. The default representation of objects uses something called a Structure to hold the general shape of an object, allowing many instances that have the same shape to share a structure. If two objects have the same structure ID, we can quickly tell that they have the same shape.

class A {
  constructor() {
    this.x = 5
    this.y = 10
  }
}
let a = new A() // Structure: { x, y }
let b = new A() // same as a

Structures store the names, attributes (such as read-only), and offsets of all of the properties owned by the object, and often the object’s prototype too. While the structure stores the shape, the object maintains its property storage, and the offset gives a location inside this storage.

At a high level, adding a property to an object using this representation goes as follows:

1) Construct a new structure with the property added.
2) Place the value into the property storage.
3) Change the object’s structure ID to point to the new structure.

One important optimization is to cache these structure changes in something called a structure transition table. This is the reason why in the example above, both objects share the same structure ID rather than having separate but equivalent structure objects. Instead of creating a new structure for b, we can simply transition to the structure we already created for a.
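
To make the idea concrete, here is a minimal, self-contained sketch of structures and transition caching; the names are illustrative and this is not JSC’s actual implementation:

class Structure {
    constructor(shape = []) {
        this.shape = shape;           // ordered property names
        this.transitions = new Map(); // property name -> next Structure
    }
    addProperty(key) {
        // Reuse a cached transition if this shape change was seen before.
        if (!this.transitions.has(key))
            this.transitions.set(key, new Structure([...this.shape, key]));
        return this.transitions.get(key);
    }
    offsetOf(key) { return this.shape.indexOf(key); }
}

const emptyStructure = new Structure();

function putNewProperty(object, key, value) {
    object.structure = object.structure.addProperty(key);   // transition
    object.storage[object.structure.offsetOf(key)] = value; // store value
}

// Two objects built the same way end up sharing the same structure:
const a = { structure: emptyStructure, storage: [] };
const b = { structure: emptyStructure, storage: [] };
putNewProperty(a, "x", 5);
putNewProperty(b, "x", 7);
console.log(a.structure === b.structure); // true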

This representation improves performance for the case when you have a small number of object shapes that change predictably, but it is not always the best representation. For example, if there are many properties that are unlikely to be shared with other objects, then it is faster to create a new structure for every instance of this object. Since we know that there is a 1:1 relationship between this object and its structure, we no longer have to update the structure ID when making changes to the shape of the object. This also means that we can no longer cache most property accesses, since these optimizations rely on checking the structure ID.

In previous versions of JavaScriptCore, this is the representation that was chosen for any object that had a deleted property. This is why we see such a large performance difference when a delete is performed on a hot object.

The first optimization of my internship was to instead cache deletion transitions, allowing these objects to continue to be cached.

class A {
  constructor() {
    this.x = 5
    this.y = 10
    delete this.y
  }
}
let a = new A() // Structure: { x } from { x, y }
let b = new A() // same as a

Before this change, both a and b had distinct structures. Now they are the same. This not only removes our performance cliff, but will enable further optimizations to the deletion statement itself.

In the path tracing example above, we get the following performance results:

Inline Caching

Now that we can cache deletion transitions, we can further optimize the act of property deletion itself. Getting and putting properties both use something called an inline cache to do this, and now deletion does too. The way this works is by emitting a generic version of these operations that modifies itself over time to handle frequent cases faster.

I implemented this optimization in all three of our JIT compilers for delete, and our interpreter also supports inline caching. To learn more about how JavaScriptCore uses multiple compilers to trade-off latency and throughput, see this blog post. The summary is that code is first executed by the interpreter. Once it is sufficiently hot, it gets compiled into increasingly optimized pieces of machine code by our compilers.

An inline cache uses structures to keep track of the object shapes it has seen. We will see how inline caching now works for delete, which will give us some insight into how it works for getting and putting properties too.

When we see a delete property statement, we will first emit something called a patchable jump. This is a jump instruction that can be changed later on. At first, it will simply jump to some code that calls the slow path for property deletion. This slow path will store a record of this call, however, and after we have performed a few accesses, we will attempt to emit our first inline cache. Let’s walk through an example:

function test() {
    for (let i = 0; i < 50; ++i) {
        let o = { }
        if (i < 10)
            o.a = 1
        else if (i < 20)
            o.b = 1
        else
            o.c = 1

        delete o.a
    }
}

for (let i = 0; i < 100; ++i)
    test()

First, we run test a few times causing it to tier up to the baseline compiler. Inside test(), we see that the code deletes properties from objects with three different structures, while giving enough time for us to emit a new inline cache for each. Let’s see what the baseline code looks like:

          jmp [slow path] ; This jump is patchable
continuation:
          [package up boolean result]
          ; falls through to rest of program
...
slow path:
          [call into C++ slow path]
          jmp [next bytecode]

This code performs a few checks, then jumps either to the slow path call or the patchable jump target. The slow path call is placed outside the main portion of the generated code to improve cache locality. Right now, we see that the patchable jump target also points to the slow path, but that will change.

At this point in time, deletion is a jump to the slow path plus a call into C++, which is quite expensive.

Every time the slow path call is made, it collects and saves information about its arguments. In this example, the first case that it decided to cache is the delete miss for the structure with shape { c }. It generates the following code, and repatches the patchable jump target to go here instead:

      0x7f3fa96ff6e0: cmp [structure ID for { c }], (%rsi)
      0x7f3fa96ff6e6: jnz [slow path]
      0x7f3fa96ff6ec: mov $0x1, %eax ; return value true
      0x7f3fa96ff6f1: jmp [continuation]
      0x7f3fa96ff6f6: jmp [slow path]

We see that if the structure ID matches, we simply return true and jump back to the continuation which is responsible for packaging up the boolean result and continuing execution. We save ourselves an expensive call into C++, running just a few instructions in its place.

Next, we see the following inline cache get generated as the engine decides to cache the next case:

      0x7f3fa96ff740: mov (%rsi), %edx
      0x7f3fa96ff742: cmp [structure ID for { c }], %edx
      0x7f3fa96ff748: jnz [case 2]
      0x7f3fa96ff74e: mov $0x1, %eax
      0x7f3fa96ff753: jmp [continuation]
case 2:
      0x7f3fa96ff758: cmp [structure ID for { a }], %edx
      0x7f3fa96ff75e: jnz [slow path]
      0x7f3fa96ff764: xor %rax, %rax; Zero out %rax
      0x7f3fa96ff767: mov %rax, 0x10(%rsi); Store to the property storage
      0x7f3fa96ff76b: mov [structure ID for { } from { a }], (%rsi)
      0x7f3fa96ff771: mov $0x1, %eax
      0x7f3fa96ff776: jmp [continuation]

Here, we see what happens if the property exists. In this case, we zero the property storage, change the structure ID, and return true (0x1), to the continuation for our result to be packaged up.

Finally, we see the complete inline cache:

      0x7f3fa96ff780: mov (%rsi), %edx
      0x7f3fa96ff782: cmp [structure ID for { c }], %edx
      0x7f3fa96ff788: jnz [case 2]
      0x7f3fa96ff78e: mov $0x1, %eax
      0x7f3fa96ff793: jmp [continuation]
case 2:
      0x7f3fa96ff798: cmp [structure ID for { a }], %edx
      0x7f3fa96ff79e: jnz [case 3]
      0x7f3fa96ff7a4: xor %rax, %rax
      0x7f3fa96ff7a7: mov %rax, 0x10(%rsi)
      0x7f3fa96ff7ab: mov $0x37d4, (%rsi)
      0x7f3fa96ff7b1: mov $0x1, %eax
      0x7f3fa96ff7b6: jmp [continuation]
case 3:
      0x7f3fa96ff7bb: cmp [structure ID for { b }], %edx
      0x7f3fa96ff7c1: jnz [slow path]
      0x7f3fa96ff7c7: mov $0x1, %eax
      0x7f3fa96ff7cc: jmp [continuation]

When we run the code above in different browsers, we get the following performance numbers with and without delete optimizations:

Inlining Delete

Now that we have seen how an inline cache is generated, we would like to see how we can feed back this profiling information into the compiler. We will see how our first optimizing compiler, the DFG, can attempt to inline delete statements like it already does for puts and gets. This will then allow all of the other optimizations we have to see inside the delete statement, rather than seeing a black box.

We will demonstrate this by looking at just one of these optimizations, our object allocation sinking and elimination phase. This phase attempts to prevent allocations of objects when they do not escape their scope, but it previously saw property deletion as an escape. Consider the following code:

function noEscape(i) {
    let foo = { i }
    delete foo.x
    return foo.i
}

for (let i = 0; i < 1000000; ++i)
    noEscape(5)

As the delete is run, it will first get an inline cache. As the code continues to tier up, the DFG tier will eventually look at the cases covered by the inline cache and see only one structure. It will decide to try inlining the delete.

It first speculates that the structure ID of the target object is for the structure with shape { i }. That is, we check that the structure of the object matches, and if it does not, we exit to lower-tier code. This means that our engine can assume that the structures match for any subsequent code. If this assumption turns out to be false, we will eventually recompile the code again without it.

D@25:   NewObject()
D@30:   CheckStructure(D@25, Structure for { })
D@31:   PutByOffset(D@25, D@25, D@36, id0{i})
D@32:   PutStructure(D@25, Structure for { i })
D@36:   CheckStructure(D@25, Structure for { i })
D@37:   JSConstant(True)
D@42:   CheckStructure(D@25, Structure for { i })
D@43:   GetByOffset(D@25, 0)
D@34:   Return(D@43)

We see here that we make a new object, and then see inlined versions of put, delete and get. Finally, we return.

If the delete statement were compiled into a DeleteByID node, later optimizations could not do much to optimize this code. They don’t understand what effects the delete has on the heap or the rest of the program. We see that once inlined, however, the delete statement actually becomes very simple:

D@36:   CheckStructure(D@25, Structure for { i })
D@37:   JSConstant(True)

That is, the delete statement becomes simply a check and a constant! Next, while it often isn’t possible to prove every check is redundant, in this example our compiler will be able to prove that it can safely remove them all. Finally, our object allocation elimination phase can look at this code and remove the object allocation. The final code for noEscape() looks like this:

D@36:   GetStack(arg1)
D@25:   PhantomNewObject()
D@34:   Return(D@36)

PhantomNewObject does not cause any code to be emitted, making our final code trivial! We get the following performance results:

Results

The caching of delete transitions caused a 1% improvement overall on the Speedometer benchmark. The primary reason for this is that the EmberJS Debug subtest uses the delete statement frequently, and delete transition caching progresses this benchmark by 6%. This subtest was included in the benchmark because it was discovered that many websites do not ship the release version of EmberJS. The other optimizations we discussed were all performance-neutral on the macro-benchmarks we track.

In conclusion, we have seen how inline caching works first-hand, and even progressed a few benchmarks in the process! The order in which we learned about these optimizations (and indeed the order that I learned about them this term) closely follows the order in which they were discovered too. If you would like to learn more, check out these papers on implementations of inline caching in Smalltalk and Self, two languages that inspired the design of JavaScript. We can see the evolution from monomorphic inline caches in Smalltalk to polymorphic inline caches with inlining support in Self, just like we saw on our tour today. Implementing inline caching for delete was an extremely educational experience for me, and I hope you enjoyed reading about it.

While deletion will still be rare, it still feels great to make JSC even more robust. You can get in touch by filing a bug or by joining the WebKit slack workspace. You can also consider downloading the code and hacking on it yourself!

April 10, 2020 05:11 PM

April 08, 2020

Web Animations in Safari 13.1

Surfin’ Safari

With the release of iOS 13.4, iPadOS 13.4, and Safari 13.1 in macOS Catalina 10.15.4, web developers have a new API at their disposal: Web Animations. We have been working on this feature for well over 2 years and it’s now available to all Safari users, providing a great programmatic API to create and control animations using JavaScript.

In this article we will discuss the benefits of this new API, how best to detect its availability, how it integrates with existing features such as CSS Animations and CSS Transitions, and the road ahead for animation technology in WebKit.

A Little History

The WebKit team came up with original proposals for CSS Animations and CSS Transitions back in 2007 and announced them on this blog. Over the years these specifications have matured and become W3C standards and an integral part of the web platform.

With these technologies, integrating animations in web content became simple, removing the requirement for developers to write JavaScript while providing better performance and power consumption by allowing browsers to use hardware acceleration when available and integrate animations in the layout and rendering pipeline.

As a web developer, I’ve enjoyed the simplicity and great performance of CSS Animations and CSS Transitions. I believe it is those virtues that have allowed animations to become a powerful tool for web developers. However, in my day-to-day work, I also found a few areas where these technologies were frustrating: dynamic creation, playback control, and monitoring an animation’s lifecycle.

The great news is that these issues are all taken care of by the new Web Animations API. Let’s see how to leverage this new API to improve everyday code in these areas.

Part I – Animation Creation

While CSS allows you to very easily animate a state change (the appearance of a button, for instance) it will be a lot trickier if the start and end values of a given animation are not known ahead of time. Typically, web developers would deal with those cases with CSS Transitions:

// Set the transition properties and start value.
element.style.transitionProperty = "transform";
element.style.transitionDuration = "1s";
element.style.transform = "translateX(0)";

// Force a style invalidation such that the start value is recorded.
window.getComputedStyle(element);

// Now, set the end value.
element.style.transform = "translateX(100px)";

While this may look like a reasonable amount of code, there are additional factors to consider. Forcing a style invalidation will not let the browser perform that task at the time it will judge most appropriate. And this is just one single animation; what if another part of the page, possibly even a completely different JavaScript library, also needed to create an animation? This would multiply forced style invalidations and degrade performance.

And if you consider using CSS Animations instead, you would have to first generate a dedicated @keyframes rule and insert it inside a <style> element, failing to encapsulate what is really a targeted style change for a single element and causing a costly style invalidation.

The value of the Web Animations API lies in having a JavaScript API that preserves the ability to let the browser engine do the heavy lifting of running animations efficiently while enabling more advanced control of your animations. Using the Web Animations API, we can rewrite the code above with a single method call using Element.animate():

element.animate({ transform: ["translateX(0)", "translateX(100px)"] }, 1000);

While this example is very simple, the single Element.animate() method is a veritable Swiss Army knife and can express much more advanced features. The first argument specifies CSS values while the second argument specifies the animation’s timing. We won’t go into all possible timing properties, but all of the features of CSS Animations can be expressed using the Web Animations API. For instance:

element.animate({
    transform: ["translateX(500px)"], // move by 500px 
    color: ["red", "blue", "green"]   // go through three colors
}, {
    delay: 500,            // start with a 500ms delay
    easing: "ease-in-out", // use a fancy timing function 
    duration: 1000,        // run for 1000ms
    iterations: 2,         // run the animation twice (i.e. repeat once)
    direction: "alternate" // run the animation forwards and then backwards
});

Now we know how to create an animation using the Web Animations API, but how is it better than the code snippet using CSS Transitions? Well, that code tells the browser what to animate and how to animate it, but does not specify when. Now the browser will be able to process all new animations at the next most opportune moment with no need to force a style invalidation. This means that animations you author yourself as well as animations that may originate from a third-party JavaScript library – or even in a different document (for instance, via an <iframe>) – will all be started and progress in sync.

Part II – Playback Control

Another shortcoming with existing technologies was the lack of playback control: the ability to pause, resume, and seek animations and control their speed. While the animation-play-state property allows control of whether a CSS Animation is paused or playing, there is no equivalent for CSS Transitions and it only controls one aspect of playback control. If you want to set the current time of an animation, you can only resort to roundabout techniques such as clever manipulations of negative animation-delay values, and if you want to change the speed at which an animation plays, the only option is to manipulate the timing values.

With the Web Animations API, all these concerns are handled by dedicated APIs. For instance, we can manipulate playback state using the play() and pause() methods, query and set time using the read-write currentTime property, and control speed using playbackRate without modifying duration:

// Create an animation and keep a reference to it.
const animation = element.animate(…);

// Pause the animation.
animation.pause();

// Change its current time to move forward by 500ms.
animation.currentTime += 500;

// Slow the animation down to play at half-speed.
animation.playbackRate = 0.5;

This gives developers control over the behavior of animations after they have been created. It is now trivial to perform tasks which would have been previously daunting. To toggle the playback state of an animation at the press of a button:

button.addEventListener("click", event => {
    if (animation.playState === "paused")
        animation.play();
    else
        animation.pause(); 
});

To connect the progress of an animation to an <input type="range"> element:

input.addEventListener("input", event => {
    animation.currentTime = event.target.value * animation.effect.getTiming().duration;
});

Thanks to the Web Animations API making playback control a core concept for animations, these simple tasks are trivial and more complex control over an animation’s state can be achieved.
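
As a minimal sketch of that more advanced control (cancelButton and doneButton are hypothetical elements, and animation is the object from earlier), the reverse() method flips an animation's playback direction mid-flight, while finish() jumps straight to its end state:

// Smoothly rewind toward the start instead of snapping back.
cancelButton.addEventListener("click", event => {
    animation.reverse();
});

// Jump directly to the animation's end state.
doneButton.addEventListener("click", event => {
    animation.finish();
});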

Part III – Animation Lifecycle

While the transition* and animation* family of DOM events provide information about when CSS-originated animations start and end, it is difficult to use them correctly. Consider fading out an element prior to removing it from the DOM. Typically, this would be written this way using CSS Animations:

@keyframes fade-out {
    to { opacity: 0 }
}

element.style.animationName = "fade-out";
element.addEventListener("animationend", event => {
    element.remove();
});

Seems correct, but on further inspection there are problems. This code will remove the element as soon as an animationend event is dispatched on the element, but since animation events bubble, the event could come from an animation completing in a child element in the DOM hierarchy, and the animations could even be named the same way. There are measures you can take to make this kind of code safer, but using the Web Animations API, writing this kind of code is not just easier but safer because you have a direct reference to an Animation object rather than working through animation events scoped to an element’s hierarchy. And on top of this, the Web Animations API uses promises to monitor the ready and finished state of animations:

let animation = element.animate({ opacity: 0 }, 1000);
animation.finished.then(() => {
    element.remove();
});

Consider how complex the same task would have been if you wanted to monitor the completion of a number of CSS Animations targeting several elements prior to removing a shared container. With the Web Animations API and its support for promises this is now expressed concisely:

// Wait until all animations have finished before removing the container.
let animations = container.getAnimations();
Promise.all(animations.map(animation => animation.finished)).then(() => {
    container.remove();
});

Integration with CSS

Web Animations are not designed to replace existing technologies but rather to tightly integrate with them. You are free to use whichever technology you feel fits your use case and preferences best.

The Web Animations specification does not just define an API but also aims to provide a shared model for animations on the web; other specifications dealing with animations are defined with the same model and terminology. As such, it’s best to understand Web Animations as the foundation for animations on the web, and think of its API as well as CSS Transitions and CSS Animations as layers above that shared foundation.

What does this mean in practice?

To make a great implementation of the Web Animations API, we had to start off fresh with a brand new and shared animation engine for CSS Animations, CSS Transitions, and the new Web Animations API. Even if you don’t use the Web Animations API, the CSS Animations and CSS Transitions you’ve authored are now running in the new Web Animations engine. No matter which technology you choose, the animations will all run and update in sync, events dispatched by CSS-originated animation and the Web Animations API will be delivered together, etc.

But what may matter even more to authors is that the entire Web Animations API is available to query and control CSS-originated animations! You can specify animations in pure CSS but also control them with the Web Animations APIs using Document.getAnimations() and Element.getAnimations(). You can pause all animations running for a given document this way:

document.getAnimations().forEach(animation => animation.pause());
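
Similarly, here is a small sketch scoping control to a single element with Element.getAnimations(), slowing down every animation (CSS-originated or not) that currently targets it:

// Play all animations targeting `element` at half speed.
element.getAnimations().forEach(animation => animation.playbackRate = 0.5);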

What about SVG? At this stage, SVG Animations remain distinct from the Web Animations model, and there is no integration between the Web Animations API and SVG. This remains an area of improvement for the Web platform.

Feature Detection

But before you start adopting this new technology in your projects, there are some further practical considerations that you need to be aware of.

Since this is new technology, it is important to use feature detection as users gradually update their browsers to newer versions with support for Web Animations. Detecting the availability of the various parts of the Web Animations API is simple. Here is one correct way to detect the availability of Element.animate():

if (element.animate)
    element.animate(…); // Use the Web Animations API.
else
    … // Fall back to other technologies.

While Safari is shipping the entire Web Animations API as a whole, other browsers, such as Firefox and Chrome, have been shipping Element.animate() for a long time already, so it’s critical to test individual features separately. For instance, if you want to use Document.getAnimations() to query all running animations for a given document, make sure to first detect that feature’s availability. As such, the snippet from further above would be better written this way:

if (document.getAnimations)
    document.getAnimations().forEach(animation => animation.pause());
else
    … // Fall back to another approach.

There are parts of the API that aren’t yet implemented in Safari. Notably, effect composition is not supported yet. Before trying to set the composite property in your animation’s options, you can check whether it is supported this way:

const isEffectCompositionSupported = !!(new KeyframeEffect(null, {})).composite;
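
As a sketch of putting that check to use (the rotation keyframes are purely illustrative), composite: "add" makes a new transform animation combine with, rather than replace, a transform animation already running on the element:

if (isEffectCompositionSupported) {
    element.animate({ transform: ["rotate(360deg)"] }, {
        duration: 1000,
        composite: "add" // combine with, not replace, existing transforms
    });
}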

Animations in Web Inspector

Also new in the latest Safari release: CSS Animations and CSS Transitions can be seen in Web Inspector in the new Media & Animations timeline in the Timelines Tab, organized by animation-name or transition-property properties for each element target. When used alongside other timelines, it can help correlate how that particular CSS Animation or CSS Transition was created, such as by looking at script entries in the JavaScript & Events timeline.

Web Inspector Media and Animations Timeline

Starting in Safari Technology Preview 100, Web Inspector shows all animations, whether they are created by CSS or using the JavaScript API, in the Graphics Tab. It visualizes each individual animation object with lines for delays and curves for keyframes, and provides an in-depth view of exactly what the animation will do, as well as information about how it was created and some useful actions, such as logging the animation object in the Console. These are the first examples of how Web Animations allow us to improve Web Inspector for working with animations, and we’re looking forward to improving our tools further.

Web Inspector Graphics Tab in Light Mode

The Road Ahead

Shipping support for Web Animations is an important milestone for animations in WebKit. Our new animation engine provides a more spec-compliant and forward-looking codebase to improve on. This is where developers like you come in: it’s important we hear about any compatibility woes you may run into with existing content using CSS Animations and CSS Transitions, but also when adopting the new Web Animations API in your content.

The transition to the new Web Animations engine allowed us to address known regressions and led to numerous progressions with Web Platform Tests, improving cross-browser compatibility. If you find your animations running differently in Safari, please file a bug report on bugs.webkit.org so that we can diagnose the issue and establish if it is an intentional change in behavior or a regression that we should address.

We’re already working on improving on this initial release and you can keep an eye out for future improvements by monitoring this blog and the release notes for each new Safari Technology Preview release.

You can also send a tweet to @webkit or @jonathandavis to share your thoughts on our new support for Web Animations.

April 08, 2020 08:00 PM

Release Notes for Safari Technology Preview 104

Surfin’ Safari

Safari Technology Preview Release 104 is now available for download for macOS Catalina and macOS Mojave. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 258409-259472.

Web Inspector

  • Elements
    • Created a visual editor for box-shadow (r259170)
  • Network
    • Changed “Preserve Log” to be the last navigation item to be hidden at small widths (r258622)
    • Ensured that the method is escaped when using “Copy as cURL” (r259141)
  • Sources
    • If the hovered object is a DOM node, highlight it when hovering the title in the object preview popup (r258621)
  • Storage
    • Added support for editing cookies (r259173)
  • Console
    • Added logs for Intelligent Tracking Prevention Debug Mode and Ad Click Attribution Debug Mode in the Console (r259236)
    • Added a console message when legacy TLS is used (r258890, r258957)
  • Miscellaneous
    • Added a new WebSocket icon (r259329)
    • Added the keyboard shortcut for showing the Search Tab and Settings Tab to the titles of their respective tab bar items (r259101)
    • Fixed a bug where the tab bar thought it was too wide causing a tab bar item to be hidden (r258623)
    • Fixed a bug where the currently focused node was changed when detaching into a separate window (r259277)
    • Prevented disabled buttons from being focusable (r258730)

Web API

  • Avoided querying pasteboard strings while dragging content over a potential drop target (r258980)
  • Added label text to suggested values for a <datalist> element (r259330)
  • Fixed the <datalist> dropdown suggestions table being able to scroll too far (r259198)
  • Fixed a change event getting dispatched when a <textarea> gets changed without focus (r258532)
  • Fixed event listeners registered with the once option that get garbage collected too soon (r259009)
  • Fixed the name of X-Content-Type HTTP header in console logging (r258789)
  • Fixed a bug that could cause elements to disappear with combinations of transforms and overflow (r259015)
  • Fixed function passed to addEventListener may get garbage collected before the event listener is even added (r258959)
  • Prevented Force Touch preview from working on a file:/// URL while clicking on the URL is blocked (r259056)
  • Removed synchronous termination of service workers (r259383)
  • Sanitized the suggested download filename (r258741)
  • Updated Intl.NumberFormat.prototype.format to preserve the sign of -0 (r259370)
  • Updated to make sure a preflight fails if response headers are invalid (r258631)
  • Updated to consider the referrer-policy in the append Origin header algorithm (r259036)

CSS

  • Added support for :is() (r259261)
  • Fixed changes in grid or elements inside the grid affecting margin on other elements in the grid (r258735)

Web Animations

  • Marked promises as handled when rejected (r258702)
  • Fixed onwebkit{animation, transition}XX handlers missing from Document (r258697)

Intersection Observer

  • Fixed Intersection Observer intersections when zoomed in (r258787, r258791)

Media

  • Changed HTMLTrackElement to be pending while it is waiting for LoadableTextTrack request (r259138)
  • Fixed an animated PNG issue where the frames would play one more time than the image loopCount (r258817)

WebRTC

  • Added initial support for WebRTC HEVC (r259452)
  • Applied video rotation at the source level if the WebRTC sink asks for it (r258504)
  • Fixed RTCRtpSender of kind video to have a null dtmf attribute (r258502)
  • Fixed audio stream capture failing if the AudioSession gets interrupted (r258977)
  • Replaced the host candidate IP address in SDP with the corresponding mDNS name (r258545)
  • Supported inserting text or dictation alternative by simulating keyboard input (r258873)
  • Supported resolution of IPv6 STUN/TURN addresses (r259338)

WebAuthn

  • Improved title and text used in prompts (r258961)

Bug Fixes

  • Fixed getting stuck in a loading state when seeking on hulu.com (r259404)

Safari extensions

  • Added support for restoring extension tabs across launches of Safari

April 08, 2020 05:00 PM

April 03, 2020

New WebKit Features in Safari 13.1

Surfin’ Safari

This year’s spring releases of Safari 13.1 for macOS Catalina, iPadOS, iOS, and watchOS bring a tremendous number of WebKit improvements for the web across Apple’s platforms. All of this comes with many more updates for improved privacy and performance, and a host of new tools for web developers.

Here’s a quick look at the new WebKit enhancements available with these releases.

Pointer and Mouse Events on iPadOS

The latest iPadOS 13.4 brings desktop-class pointer and mouse event support to Safari and WebKit. To ensure the best experience, web developers can use feature detection and adopt Pointer Events. Since a mouse or trackpad won’t send touch events, web content should not depend on touch events. Pointer Events will specify whether a mouse/trackpad or touch generated the event.
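
For example, a minimal sketch of that approach: a single pointerdown listener covers mouse, trackpad, and touch, event.pointerType tells them apart, and window.PointerEvent serves as the feature check:

if (window.PointerEvent) {
    element.addEventListener("pointerdown", event => {
        if (event.pointerType === "touch") {
            // Direct touch input.
        } else {
            // "mouse" covers both mouse and trackpad input.
        }
    });
} else {
    // Fall back to touch and mouse events.
}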

Web Animations API

These releases ship with support for the Web Animations API, a web standard offering developers a JavaScript API to create, query, and control animations, including direct control of CSS Animations and CSS Transitions. It offers a convenient unified model for programmatic animations, CSS Animations and Transitions. They can all now be directly controlled to pause, resume, seek, or change speed and direction, with less manual calculation. In addition, Web Inspector has been updated to show entries for them in the Media and Animations timeline.

Web Inspector Media and Animations Timeline

Read more about Web Animations in WebKit and Web Animations in Safari 13.1.

Async Clipboard API

WebKit brings the Async Clipboard API to this release of Safari. It provides access to the system clipboard and clipboard operations while keeping the webpage responsive. This API is much more flexible than DataTransfer, allowing developers to write multiple items with multiple types per item. Additionally, it brings programmatic paste to all websites on macOS and iOS.

The implementation is available through the navigator.clipboard API which must be called within user gesture event handlers like pointerdown or pointerup, and only works for content served in a secure context (e.g. https://). Instead of a permissions-based model for reading from the clipboard, a native UI is displayed when the page calls into the clipboard API; the clipboard can only be accessed if the user then explicitly interacts with the platform UI.
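
As a rough sketch (copyButton is a hypothetical element), writing and then reading back text from within a user gesture handler looks like this:

copyButton.addEventListener("pointerup", async event => {
    try {
        await navigator.clipboard.writeText("Hello, clipboard!");
        const text = await navigator.clipboard.readText();
        console.log(text); // "Hello, clipboard!"
    } catch (error) {
        // The user declined via the platform UI, or the context is not secure.
    }
});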

For more details see the original API specifications.

JavaScript Improvements

These releases include new JavaScript support for the replaceAll() method for strings and the new nullish coalescing operator (??).

The String.prototype.replaceAll() method does exactly what it suggests, replacing all occurrences of a given value in the string with a replacement string.

"too good to be true".replaceAll(" ", "-");
// too-good-to-be-true

Learn more from the String.prototype.replaceAll Proposal.

The nullish coalescing operator (??) is a new operator that only evaluates and returns the expression on the right of the ?? if the result of the expression on the left of the ?? is null or undefined.

const nullValue = null;
const resultWithNull = nullValue ?? "default";        // "default"

const nonNullValue = 0;
const resultWithNonNull = nonNullValue ?? "default";  // 0

For more details see the Nullish Coalescing for JavaScript proposal.

ResizeObserver

The addition of ResizeObserver in WebKit enables developers to design components that are responsive to the container instead of just the viewport. This allows more flexible responsive designs, where containers can react to window size changes, orientation changes, and additions of new elements to the layout. The JavaScript API avoids the circular dependencies of trying to use media queries for element sizes in CSS. ResizeObserver addresses the problem by providing a means to observe changes in the layout size of elements.

For more read about ResizeObserver in WebKit.

HTML enterkeyhint Attribute

On iOS, WebKit supports the enterkeyhint attribute that allows a content author to provide a label for the enter key on virtual keyboards with values for: enter, done, go, next, previous, search, and send.
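
For instance, a minimal example labeling the enter key of the virtual keyboard for a search field:

<input type="search" enterkeyhint="search">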

See the HTML Standard for more information.

CSS Shadow Parts

New support for CSS Shadow Parts allows web authors to style parts of web components without the need to understand how they are constructed. This provides a mechanism for authors of web components to expose named, styleable parts, akin to the pseudo-elements of input elements in WebKit.
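
As a sketch, assuming a hypothetical <fancy-input> custom element that marks an element in its shadow tree with part="label", a page stylesheet can then style that part directly:

/* Styles the element exposed as part="label" inside the shadow tree. */
fancy-input::part(label) {
    color: rebeccapurple;
    font-weight: bold;
}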

See the CSS Shadow Parts specification for more information.

More CSS Additions

There are a number of new CSS additions in WebKit. New font keywords are available for using platform-specific fonts including ui-serif, ui-sans-serif, ui-monospace, and ui-rounded. WebKit also supports the line-break: anywhere value that adds a soft wrap opportunity around each character unit, including around any punctuation or preserved white spaces, in the middle of words, even ignoring limits against line breaks. Finally, WebKit includes support for the dynamic-range media query allowing authors to create styles specific to display capabilities.

@media (dynamic-range: standard) {
    .example {
        /* Styles for displays not capable of HDR. */
        color: rgb(255, 0, 0);
    }
}

@media (dynamic-range: high) {
    .example {
        /* Styles for displays capable of HDR. */
        color: color(display-p3 1 0 0);
    }
}

Media APIs

Safari was the first to ship a picture-in-picture feature and has long supported the ability to specify a playback target for AirPlay. Safari for iOS and macOS now supports the standardizations of these features with the Picture-in-Picture API and Remote Playback API. There is also new support for HLS date-range metadata in DataCue.
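
As a minimal sketch of the standard Picture-in-Picture API (video and pipButton are hypothetical elements):

pipButton.addEventListener("click", async event => {
    if (document.pictureInPictureElement) {
        await document.exitPictureInPicture();
    } else {
        await video.requestPictureInPicture();
    }
});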

Subtitles and Captions

WebKit is introducing enhancements to TextTrackCue for programmatic subtitle and caption presentation. This enables video publishers to continue storing captions in legacy or custom formats, and deliver them programmatically and still maintain the ability for users to control the presence and style of captions with system accessibility settings.

For more detail, see the WebKit TextTracks Explainer.

WebRTC Legacy Audio and Proxy Support

WebRTC support in WebKit has been updated so it can work in more places, with more systems. Support for DTMF allows WebKit to interact with legacy audio services. WebRTC Proxy support allows WebRTC to work in enterprise networks where firewalls may forbid UDP and require TCP to go through a dedicated proxy.
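
As a rough sketch of the DTMF support (assuming an already-established RTCPeerConnection named pc with an audio track), digits can be sent to a legacy audio service like this:

const audioSender = pc.getSenders().find(sender =>
    sender.track && sender.track.kind === "audio");
if (audioSender && audioSender.dtmf) {
    audioSender.dtmf.insertDTMF("1234#"); // tones are transmitted out-of-band
}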

Performance Improvements

WebKit continues to deliver performance gains on benchmarks in these releases while also optimizing memory use. This release includes an 8-10% improvement on the JetStream 2 benchmark. JavaScript Promises in particular showed a 2× improvement in the async-fs benchmark on JetStream 2. IndexedDB showed an improvement of 1.3× to 5× faster than before for most operations. There’s also faster Service Worker startup and more efficient CSS media query updates. Improved back-forward responsiveness helps history navigations feel snappier. Plus, a new WebAssembly interpreter dramatically improves startup time by around 8× for large WASM apps.

Security Improvements

WebKit has continued to harden security by fixing a number of bugs found through a process known as fuzzing. Following our announcement of deprecating TLS 1.0 and TLS 1.1 connections, this release now adds a “Not Secure” warning when connecting to a site where any resource is using either of these deprecated encryption protocols.

Intelligent Tracking Prevention Updates

There are several new enhancements to Intelligent Tracking Prevention including full third-party cookie blocking, cross-site document.referrers downgraded to their origins, and an expiry on non-cookie website data after seven days of Safari use and no user interaction on the website.

Read the “Full Third-Party Cookie Blocking and More” blog post for details.

Web Platform Quality Improvements

Areas of improved standards compliance and browser interoperability include more compatible gradient and position parsing, color component rounding, new support for the Q unit, and better calc() computed styles.

Web Inspector Updates

Web Inspector in Safari 13.1 includes new debugging experiences and adds several new tools to help web developers test functionality or identify issues.

Sources Tab

A new Sources Tab combines the Resources Tab and Debugger Tab into a single view, keeping more critical information in one place without the need to switch back and forth. Among the improvements, it includes improved support for debugging workers and has new JavaScript breakpoints, such as pausing on All Events or on All Microtasks.

Also new in the Sources Tab, developers can use the “+” button in the lower left of the navigation sidebar to add an Inspector Bootstrap Script or Local Override. An Inspector Bootstrap Script is a snippet of JavaScript that is guaranteed to be the first script evaluated after any new global object is created in any page or sub-frame, regardless of URL, so long as Web Inspector is open. A Local Override can be added to override any resource loaded into the page, giving developers the ability to change files and preview those changes on pages that they might ordinarily not be able to change.

Both the Sources Tab and the Network Tab also benefit from improved display of HTML and XML content, including the ability to pretty print it or view any request/response data as a simulated DOM tree.

Layers Tab

The Layers Tab is also newly available in this release. It provides a 3D visualization and complete list of the rendering layers used to display the page. It also includes information like layer count and the memory cost of all the layers, both of which can help point developers to potential performance problems.

Read the “Visualizing Layers in Web Inspector” blog post for details.

Script Blackboxing

Script Blackboxing is another powerful tool, focused on helping developers debug behaviors built on top of a JavaScript library or framework. By setting up a blackbox for any library or framework script, the debugger will ignore any pauses that would have happened in that script, instead deferring the pause until JavaScript execution continues to a statement outside of the blackboxed script.

Redesigned Color Picker

Other additions to Web Inspector give content authors more insight for design and user experience. A redesigned color picker uses a square design for more precise color selection and includes support for wide-gamut colors, with a white guide line that shows the edge of sRGB within the Display-P3 color space.

Learn more from the “Wide Gamut Color in CSS with Display-P3” blog post.

Customized AR QuickLook

AR QuickLook Custom HTML Banner

In Safari on iOS 13.3 or later, users can launch an AR experience from the web where content authors can customize a banner that overlays the AR view. It’s possible to customize:

  • Apple Pay button styles
  • Action button label
  • Item title
  • Item subtitle
  • Price

Or, authors can create an entirely custom banner with HTML:

https://example.com/example.usdz#custom=https://example.com/customBanner.html

For more information, read about Adding an Apple Pay Button or a Custom Action in AR Quick Look.

Feedback

These improvements are available to users running watchOS 6.2, iOS and iPadOS 13.4, macOS Catalina 10.15.4, macOS Mojave 10.14.6 and macOS High Sierra 10.13.6. These features were also available to web developers with Safari Technology Preview releases. Changes in this release of Safari were included in the following Safari Technology Preview releases: 90, 91, 92, 93, 94, 95, 96, 97, 98. Download the latest Safari Technology Preview release to stay on the forefront of future web platform and Web Inspector features. You can also use the WebKit Feature Status page to watch for changes to your favorite web platform features.

Send a tweet to @webkit or @jonathandavis to share your thoughts on this release. If you run into any issues, we welcome your bug reports for Safari, or WebKit bugs for web content issues.

April 03, 2020 05:00 PM

March 26, 2020

Release Notes for Safari Technology Preview 103

Surfin’ Safari

Safari Technology Preview Release 103 is now available for download for macOS Catalina and macOS Mojave. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 257162-258409.

Web Inspector

  • Merged the toolbar and tab bar to save vertical space (r257759, r257765, r257810)
  • Redesigned resource and action identifier icons (r257753, r257757, r257791, r258039)
  • Allowed the use of dark mode theme independently from the system-wide theme (r257620, r257801, r257833)
  • Annotated tabs so that they are properly recognized as such (r257965)
  • Changed to not re-cycle through items in the Styles or Computed details sidebar panel when pressing tab (r257959)
  • Fixed clicking a button navigation item to focus it, allowing for subsequent keyboard navigation (r257411)
  • Supported expanding and collapsing details sections with the spacebar or “enter” key (r258058)
  • Supported cycling through scope bar items by pressing tab (r258057)

Web API

  • Aligned garbage collection for XMLHttpRequest objects with the specification (r258159)
  • Aligned Fetch ‘request Origin header’ behavior with the specification (r258194)
  • Changed an activating service worker that gets terminated to proceed to the activated state (r257929)
  • Changed to load async scripts with a low priority (r257566)
  • Changed to accept a Document as an explicit root (r257976)
  • Implemented wildcard behavior for Access-Control-Expose-Headers (r258330)

CSS

  • Made the style invalidation accurate for user-action pseudo classes (r258321)
  • Changed to avoid full style resolution on Element::focus() (r257839, r257846)

Page loading

  • Changed fixed-size SVG content to be taken into account when computing the visually-non-empty status (r257952)
  • Changed layers going from visually empty to non-empty to immediately trigger layer unfreezing (r257840)

Back-Forward Cache

  • Added a quirk to disable the back-forward cache on docs.google.com (r257714)

JavaScript

  • Updated custom element caching to be aware of different worlds (r257414)

Bug Fixes

  • Fixed leaking DocumentTimeline and CSSTransition objects on CNN.com (r257417)
  • Fixed distorted text in titles and headings of icloud.com notes (r258282)
  • Fixed maps.google.com not loading properly with screen flickering when zooming (r257716)

March 26, 2020 05:00 PM

March 24, 2020

Full Third-Party Cookie Blocking and More

Surfin’ Safari

This blog post covers several enhancements to Intelligent Tracking Prevention (ITP) in iOS and iPadOS 13.4 and Safari 13.1 on macOS to address our latest discoveries in the industry around tracking.

Full Third-Party Cookie Blocking

Cookies for cross-site resources are now blocked by default across the board. This is a significant improvement for privacy since it removes any sense of exceptions or “a little bit of cross-site tracking is allowed.”

It might seem like a bigger change than it is. But we’ve added so many restrictions to ITP since its initial release in 2017 that we are now at a place where most third-party cookies are already blocked in Safari. To keep supporting cross-site integration, we shipped the Storage Access API two years ago to provide the means for authenticated embeds to get cookie access with mandatory user control. It is going through the standards process in the W3C Privacy Community Group right now.

Regardless of the size of this change, there are further benefits, as explored below.

Paves the Way For Other Browsers

Safari continues to pave the way for privacy on the web, this time as the first mainstream browser to fully block third-party cookies by default. As far as we know, only the Tor Browser has featured full third-party cookie blocking by default before Safari, but Brave just has a few exceptions left in its blocking so in practice they are in the same good place. We know Chrome wants this behavior too and they announced that they’ll be shipping it by 2022.

We will report on our experiences of full third-party cookie blocking to the privacy groups in W3C to help other browsers take the leap.

Removes Statefulness From Cookie Blocking

Full third-party cookie blocking removes statefulness in cookie blocking. As discussed in our December 2019 blog post, the internal state of tracking prevention could be turned into a tracking vector. Full third-party cookie blocking makes sure there’s no ITP state that can be detected through cookie blocking behavior. We’d like to again thank Google for initiating this analysis through their report.

Disables Login Fingerprinting

As discussed by Jeremiah Grossman back in 2008 and Tom Anthony in 2012, and set up by Robin Linus in 2016 as a live demo with which you can test your browser, this technique allows a website to invisibly detect where you are logged in and is viable in any browser without full third-party cookie blocking.

Since “global browser state” has been top of mind in the web privacy community as of late, we’d like to point out that cookies themselves are global state and unless the browser blocks or partitions them in third-party contexts, they allow for cross-site leakage of user information such as login fingerprinting.

Additional Benefits

In addition, there are further benefits to full third-party cookie blocking:

  • Disables cross-site request forgery attacks against websites through third-party requests. Note that you still need to protect against forged requests that come in through top frame navigations (see SameSite cookies for guidance).
  • Removes the ability to use an auxiliary third-party domain to identify users. Such a setup could otherwise persist IDs even when users delete website data for the first party.
  • Simplifies things for developers. Now it’s as easy as possible: If you need cookie access as third-party, use the Storage Access API.

What About the Classifier?

ITP’s classifier keeps working to detect bounce trackers, tracker collusion, and link decoration tracking.

Developer Guidance

If yours is among the few websites that still rely on third-party cookies in Safari and have not been affected by ITP in its previous iterations, here’s how you can make things work for your users:

Option 1: OAuth 2.0 Authorization with which the authenticating domain (in your case, the third-party that expects cookies) forwards an authorization token to your website which you consume and use to establish a first-party login session with a server-set Secure and HttpOnly cookie.

Option 2: The Storage Access API with which the third-party can request permission to get access to its first-party cookies; a short sketch follows these options.

Option 3: The temporary compatibility fix for popups, see section “Temporary Compatibility Fix: Automatic Storage Access for Popups” in our ITP 2.0 blog post. This compatibility fix allows the third-party to open a popup from your website and upon a tap or click in that popup gain temporary cookie access under the opener page on your website. Note that this compatibility fix will go away in a future version of Safari so only go this route if it saves you time and allows for a graceful transition period.
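
Here is the sketch for Option 2: inside the third-party <iframe>, cookie access is requested in response to a user gesture (the function name is illustrative):

async function ensureCookieAccess() {
    // Must run in a user gesture handler, e.g. a click.
    if (!(await document.hasStorageAccess())) {
        await document.requestStorageAccess(); // rejects if the user declines
    }
    // The frame can now use its first-party cookies.
}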

Cookie Blocking Latch Mode

The original release of ITP featured what we call “cookie blocking latch mode.” It means once a request is blocked from using cookies, all redirects of that request are also blocked from using cookies. Back in 2017 we got a request to allow cookie blocking to open and close on redirects and implemented that behavior. But with full third-party cookie blocking in place, latch mode is back.

7-Day Cap on All Script-Writeable Storage

Back in February 2019, we announced that ITP would cap the expiry of client-side cookies to seven days. That change curbed third-party scripts’ use of first-party cookies for the purposes of cross-site tracking.

However, as many anticipated, third-party scripts moved to other means of first-party storage such as LocalStorage. If you have a look at what’s stored in the first-party space on many websites today, it’s littered with data keyed as various forms of “tracker brand user ID.” To make matters worse, APIs like LocalStorage have no expiry function at all, i.e. websites cannot even ask browsers to put a limit on how long such storage should stay around.

Now ITP has aligned the remaining script-writable storage forms with the existing client-side cookie restriction, deleting all of a website’s script-writable storage after seven days of Safari use without user interaction on the site. These are the script-writable storage forms affected (excluding some legacy website data types):

  • Indexed DB
  • LocalStorage
  • Media keys
  • SessionStorage
  • Service Worker registrations and cache

A Note On Web Applications Added to the Home Screen

As mentioned, the seven-day cap on script-writable storage is gated on “after seven days of Safari use without user interaction on the site.” That is the case in Safari. Web applications added to the home screen are not part of Safari and thus have their own counter of days of use. Their days of use will match actual use of the web application which resets the timer. We do not expect the first-party in such a web application to have its website data deleted.

If your web application does experience website data deletion, please let us know since we would consider it a serious bug. It is not the intention of Intelligent Tracking Prevention to delete website data for first parties in web applications.

Cross-Site document.referrer Downgraded to Origin

All cross-site document.referrers are downgraded to their origin. This matches the already downgraded cross-site referrer request headers.

Detection of Delayed Bounce Tracking

Some trackers have started to delay their navigational redirects, probably to evade ITP’s bounce tracking detection. This manifests as the webpage disappearing and reloading shortly after you land on it. We’ve added logic to cover such delayed bounce tracking and detect these bounces just like instant ones.

Testing Your Website

We encourage all developers to regularly test their websites with Safari Technology Preview (STP) and our betas of iOS, iPadOS, and macOS. Major changes to ITP and WebKit in general are included in the betas and STP, typically months before shipping. An easy way to stay ahead of the changes is to use STP as a daily development browser. This gives you access to the latest developer tools and helps you discover unexpected behavior in your websites with each release. If you come across bugs or breakage, please file an open source bug report.

March 24, 2020 05:59 PM

March 16, 2020

Víctor Jáquez: Review of the Igalia Multimedia team Activities (2019/H2)

Igalia WebKit

This blog post is a review of the various activities the Igalia Multimedia team was involved in during the second half of 2019.

Here are the previous 2018/H2 and 2019/H1 reports.

GstWPE

Succinctly, GstWPE is a GStreamer plugin which allows rendering web pages as a video stream whose frames are GL textures.

Phil, its main author, wrote a blog post explaining in detail what GstWPE is and its possible use-cases. He wrote a demo too, which grabs and previews a live stream from a webcam session and blends it with an overlay from wpesrc, which displays HTML content. This composited live stream can be broadcasted through YouTube or Twitch.

These concepts are better explained by Phil himself in the following lightning talk, presented at the last GStreamer Conference in Lyon:

Video Editing

After implementing a deep integration of the GStreamer Editing Services (a.k.a GES) into Pixar’s OpenTimelineIO during the first half of 2019, we decided to implement an important missing feature for the professional video editing industry: nested timelines.

Toward that goal, Thibault worked with the GSoC student Swayamjeet Swain to implement a flexible API to support nested timelines in GES. This means that users of GES can now decouple each scene into different projects when editing long videos. This work is going to be released in the upcoming GStreamer 1.18 version.

Henry Wilkes also implemented support for nested timelines in OpenTimelineIO, making the GES integration one of the most advanced ones, as you can see in the following table:

Feature               | OTIO | EDL | FCP7 XML | FCP X | AAF | RV  | ALE | GES
----------------------|------|-----|----------|-------|-----|-----|-----|-----
Single Track of Clips | ✔    | ✔   | ✔        | ✔     | ✔   | W-O | ✔   | ✔
Multiple Video Tracks | ✔    | ✖   | ✔        | ✔     | ✔   | W-O | ✔   | ✔
Audio Tracks & Clips  | ✔    | ✔   | ✔        | ✔     | ✔   | W-O | ✔   | ✔
Gap/Filler            | ✔    | ✔   | ✔        | ✔     | ✔   | ✔   | ✖   | ✔
Markers               | ✔    | ✔   | ✔        | ✔     | ✖   | N/A | ✖   | ✔
Nesting               | ✔    | ✖   | ✔        | ✔     | ✔   | W-O | ✔   | ✔
Transitions           | ✔    | ✔   | ✖        | ✖     | ✔   | W-O | ✖   | ✔
Audio/Video Effects   | ✖    | ✖   | ✖        | ✖     | ✖   | N/A | ✖   | ✔
Linear Speed Effects  | ✔    | ✔   | ✖        | ✖     | R-O | ✖   | ✖   | ✖
Fancy Speed Effects   | ✖    | ✖   | ✖        | ✖     | ✖   | ✖   | ✖   | ✖
Color Decision List   | ✔    | ✔   | ✖        | ✖     | ✖   | ✖   | N/A | ✖

Along these lines, Thibault delivered a 15 minutes talk, also in the GStreamer Conference 2019:

After detecting a few regressions and issues in GStreamer related to frame accuracy, we decided to make sure that we can seek in a perfectly frame-accurate way using GStreamer and the GStreamer Editing Services. In order to ensure that, an extensive integration testsuite has been developed, mostly targeting the most important container formats and codecs (namely mxf, quicktime, h264, h265, prores, jpeg), and issues have been fixed in different places. On top of that, new APIs are being added to GES to allow expressing times in frame numbers instead of nanoseconds. This work is still ongoing but should be merged in time for GStreamer 1.18.

GStreamer Validate Flow

GstValidate has been turning into one of the most important GStreamer testing tools to check that elements behave as they are supposed to do in the framework.

Along with our MSE work, we found that another way to specify tests was needed: one based on the buffers and events produced through specific pads. Thus, Alicia developed a new plugin for GstValidate: Validate Flow.

Alicia gave an informative 30 minutes talk about GstValidate and the new plugin in the last GStreamer Conference too:

GStreamer VAAPI

Most of the work during the second half of 2019 consisted of maintenance tasks and code reviews.

We worked mainly on memory restrictions per backend driver, and we reviewed a big refactor: internal encoders now use GstObject, instead of the custom GstVaapiObject. We also reviewed patches for new features such as video rotation and cropping in vaapipostproc.

Servo multimedia

Last year we worked on integrating media playback in Servo, and we finally delivered hardware-accelerated video playback on Linux and Android. We also worked on the Windows and Mac ports, but they were not finished. Naturally, most of the work was in the servo/media crate, pushing code and reviewing contributions. The major tasks were rewriting the media player example and reworking the internal source element to handle playbin’s download flag properly.

We also added WebGL integration support with <video> elements, thus webpages can use video frames as WebGL textures.

Finally we explored how to isolate the multimedia processing in a dedicated thread or process, but that task remains pending.

WebKit Media Source Extension

We did a lot of downstream and upstream bug fixing and patch review, both in WebKit and GStreamer, for our MSE GStreamer-based backend.

Along this line, we improved WebKitMediaSource to use playbin3, and compatibility with older GStreamer versions was also added.

WebKit WebRTC

Most of the work in this area was maintenance and fixing regressions uncovered by the layout tests. Besides that, support for the Raspberry Pi was improved by handling encoded streams from v4l2 video sources, with some explorations on the Minnowboard on top of that.

Conferences

GStreamer Conference

Igalia was a Gold sponsor of this last GStreamer Conference, held in Lyon, France.

The whole team attended and five talks were delivered. Besides the video editing talk we already mentioned, Thibault presented two more: one about the GstTranscoder API and the other about the new documentation infrastructure based on Hotdoc:

We also had a productive hackfest, after the conference, where we worked on AV1 Rust decoder, HLS Rust demuxer, hardware decoder flag in playbin, and other stuff.

Linaro Connect

Phil attended the Linaro Connect conference in San Diego, USA. He delivered a talk about WPE/Multimedia which you can enjoy here:

Demuxed

Charlie attended Demuxed, in San Francisco. The conference is heavily focused on streaming and codec engineering and validation. Sadly there was not much interest in GStreamer, as the main focus is on FFmpeg.

RustFest

Phil and I attended the last RustFest in Barcelona. Basically we went to meet with the Rust community and we attended the “WebRTC with GStreamer-rs” workshop presented by Sebastian Dröge.

By vjaquez at March 16, 2020 03:20 PM

March 05, 2020

Release Notes for Safari Technology Preview 102

Surfin’ Safari

Safari Technology Preview Release 102 is now available for download for macOS Catalina and macOS Mojave. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 256576-257162.

Web Inspector

  • Fixed VoiceOver to read the selected panel tab (r256652)
  • Updated resource, type, and timeline icons for both light and dark modes (r256774, r257043)

Web API

  • Changed the disk cache policy to allow resources larger than 10MB to be cached (r257041)
  • Deferred execution of async scripts until the document is loaded (r256808)
  • Fixed value sanitization for input[type=text] to not truncate the value at a control character (r257132)
  • Fixed new FontFace() to not throw when failing to parse arguments (r256659)
  • Implemented EventTarget constructor (r256716)
  • Set User-Agent in preconnect requests (r256912)

IndexedDB

  • Improved the speed of index cursor iteration when there are a lot of index records from different object stores (r256738)
  • Changed to prefetch cursor records on client side (r256621)

Apple Pay

  • Added support for Apple Pay buttons with custom corner radii (r256648)

Web Animations

  • Ensured CSS Transition and CSS Animation events are queued, sorted and dispatched by their timeline (r256619)
  • Ensured animations that lose their effect don’t schedule an animation update (r256623)
  • Fixed repeated animations on pseudo elements failing to run after a while (r257138)
  • Fixed style changes due to Web Animations to not trigger CSS Transitions (r256627)

CSS

  • Improved performance of track sizing algorithm for spanning items (r256826)

Rendering

  • Changed to not fire timers when there is a pending rendering update (r256853)
  • Fixed a white flash that can occur if JavaScript forces an early layout (r256577)

Web Driver

  • Fixed Automation.setWindowFrameOfBrowsingContext to accept negative origin values (r257042)

March 05, 2020 06:15 PM

March 02, 2020

Wide Gamut Color in CSS with Display-P3

Surfin’ Safari

Display-P3 color space includes vivid colors that aren’t available in sRGB.

sRGB versus Display-P3

CSS Color Module Level 4 introduced syntax to use Display-P3 color space on the web:

color: color(display-p3 1 0.5 0)

The previously available syntax defined colors in sRGB color space. hsl(42, 70%, 50%), rgb(3, 5, 11), #abc — all of these colors are in the sRGB color space.

Display-P3 is a superset of sRGB. It’s around 35% larger:

sRGB outline

The white line shows the edge of sRGB. Everything to its top right is Display-P3 colors not available in sRGB. Note how greens are greatly expanded while blues aren’t nearly as much.

Browser support

WebKit has had support for Display-P3 color since 2016 (r207442). The following browsers support Display-P3 color:

  • Safari on macOS Mojave and newer
  • Safari on iOS 11 and newer

WebKit is the only browser engine that supports Display-P3 color as of January 2020.

Graceful degradation

One way to provide a fallback is to include the same property with the sRGB color before:

header {
    color: rgb(0, 255, 0);
    color: color(display-p3 0 1 0);
}

Browsers other than WebKit currently parse color(...) as an invalid value. CSS properties with invalid values are ignored by browsers.

Alternatively, you can use a @supports feature query. This is particularly useful when defining variables with colors:

/* sRGB color. */
:root {
    --bright-green: rgb(0, 255, 0);
}

/* Display-P3 color, when supported. */
@supports (color: color(display-p3 1 1 1)) {
    :root {
        --bright-green: color(display-p3 0 1 0);
    }
}

header {
    color: var(--bright-green);
}

Hardware support

  • iPhone 7 and newer
  • MacBook Pro (since 2016)
  • iMac (since 2015)
  • iPad Pro (since 2016)
  • LG UltraFine 5K Display

There are also numerous devices that support Display-P3 color space but currently have no browsers that support Display-P3 in CSS:

  • Google Pixel 2 XL
  • Google Pixel 3
  • HTC U11+
  • OnePlus 6

More devices that support Display-P3 are listed on Wikipedia.

Hardware support can be detected with a media query in CSS:

@media (color-gamut: p3) {
    /* Do colorful stuff. */
}

And JavaScript:

if (window.matchMedia("(color-gamut: p3)").matches) {
    // Do colorful stuff.
}

Web Inspector

Starting with Safari Technology Preview 97, Web Inspector includes a P3-capable color picker:

The white line marks the edge of the sRGB color space. All colors to the top right of it are only available in the Display-P3 color space.

Right-clicking the color square shows an option to convert to sRGB color space:

Clamp to sRGB

When the color is within the sRGB color space, a “Convert to sRGB” menu item is displayed; when it is outside, “Clamp to sRGB” is shown instead.

Web Inspector also includes context menus to convert sRGB colors to Display-P3:

Convert to Display-P3

Closing thoughts

CSS has syntax to define colors in Display-P3 color space, which includes vivid colors previously not available in sRGB. Many modern displays cover 100% of the P3 color standard. Web Inspector now includes P3-capable color picker.

You can start using Display-P3 colors on your websites and web views today. It only takes a couple of lines of code to provide a backward compatible sRGB color.

If you have any feedback, reach out to me on Twitter. You can also send general comments to the @webkit Twitter account.

Further reading

Note: Learn more about Web Inspector from the Web Inspector Reference documentation.

March 02, 2020 06:36 PM

Žan Doberšek: Flatpak repository for WPE

Igalia WebKit

To let developers play with the WPE stack, we have set up a Flatpak repository containing all the necessary bits to start working with it. To install applications (like Cog, the very simple WPE launcher), first add the remote repository, and proceed with the following instructions:

$ flatpak --user remote-add wpe-releases --from https://software.igalia.com/flatpak-refs/wpe-releases.flatpakrepo
$ flatpak --user install org.wpe.Cog
$ flatpak run org.wpe.Cog -P fdo <url>

Currently the 2.26 release of the WPE port is used, along with libwpe 1.4.0, WPEBackend-fdo 1.4.0 and Cog 0.4.0. Upgrades to the newer releases (happening in the next few weeks) will be done in the next month or two. Builds are provided for x86_64, arm and aarch64 architectures.

If you need ideas or inspiration on how to use WPE, this repository also contains GstWPEBroadcastDemo, an application that showcases both GStreamer and WPE, enabling you to mix live video input with HTML content that can be updated on-the-fly. You can read more about this in the blog post made by Philippe Normand.

The current Cog/WPE stack still imposes the Wayland-only limitation, with Mesa-based graphics stacks most likely to work well. In future releases, we plan to add support for new platforms, graphics stacks and methods of integration.

All of this is still in very early stages. If you find an issue with the applications or libraries in the repository, please do not hesitate to report it to our issue tracker. The issues will be rerouted to the trackers of the problematic component if necessary.

By Žan Doberšek at March 02, 2020 09:00 AM

February 19, 2020

Release Notes for Safari Technology Preview 101

Surfin’ Safari

Safari Technology Preview Release 101 is now available for download for macOS Catalina and macOS Mojave. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 255473-256576.

Web Inspector

  • Added a special breakpoint for controlling whether debugger statements pause in the Sources tab (r255887)
  • Changed to encode binary web socket frames using base64 (r256497)
  • Fixed elements closing tag showing reversed in RTL mode (r256374)
  • Fixed the bezier editor popover to be strictly LTR (r255886)
  • Fixed dragging handles in the easing popover selecting sidebar text (r255888)
  • Updated some cookie table column headers to not be localizable (r255896)

Media

  • Corrected TextTrack sorting with invalid BCP47 language (r255997)
  • Fixed AirPlay sometimes stopping after 60 minutes of playback (r255581)

Apple Pay

  • Redacted billing contact during payment method selection (r256071)

JavaScript

  • Added support for BigInt literal as PropertyName (r256541)

Web Animations

  • Fixed accelerated animations freezing on a render tree rebuild (r255663)
  • Fixed an event loop cycle between an animation finishing and it being removed from GraphicsLayerCA (r256181)
  • Fixed an issue where out-of-view transitions could trigger high memory use (r256095)
  • Prevented playing an accelerated animation that was canceled before it was committed (r255810)

WebAuthn

  • Changed authenticatorGetAssertion to be sent without pinAuth if user verification is discouraged (r256001)

WebRTC

  • Aligned getDisplayMedia() with standards specifications (r256034)
  • Fixed not processing newly gathered ICE candidates if the document is suspended (r256009)

CSS

  • Fixed CSS rules with the same selector from several large stylesheets getting applied in the wrong order (r255671)

Rendering

  • Fixed pages that trigger a redirect sometimes getting left blank (r256452)

Web API

  • Disallowed setting base URL to a data or JavaScript URL (r256191)
  • Fixed highlight text decorations to work with all decoration types and colors (r256451)
  • Implemented OffscreenCanvas.copiedImage (r256505)
  • Added standard gamepad mapping for GameControllerGamepads (r256215)
  • Tightened up stylesheet loading (r255693)
  • Fixed quantifiers after lookahead assertions to be syntax errors in Unicode patterns only (r255689)
  • Fixed \0 identity escapes to be syntax errors in Unicode patterns only (r255584)

IndexedDB

  • Fixed iteration of cursors skipping records if deleted (r256414)

Back-forward Cache

  • Updated to remember if legacy TLS was used in the back-forward cache (r256073)

February 19, 2020 09:10 PM

February 05, 2020

Release Notes for Safari Technology Preview 💯

Surfin’ Safari

Safari Technology Preview Release 100 is now available for download for macOS Catalina and macOS Mojave. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 254696-255473.

Web Inspector

  • Added links to Web Inspector Reference documentation (r254730)
  • Renamed the Canvas Tab to be the Graphics Tab, and included basic information and graphical representations of all Web Animation objects that exist in the inspected page (r255396)
  • Allowed developers to evaluate arbitrary JavaScript in isolated worlds created by Safari App Extensions via the execution context picker in the Console (r255191)

Web Animations

  • Added support for the options parameter to getAnimations() (r255149)
  • Changed animations to run accelerated even if other animations targeting the same element are not accelerated (r255383)
  • Fixed changing the delay of an accelerated animation to correctly seek (r255422)
  • Fixed a leak of CSS Animations when removing its animation-name property (r255371)
  • Separated setting a timeline’s current time from updating its animations (r255260)
  • Updated all DocumentTimeline objects when updating animations (r255141)

WebAuthn

  • Fixed the User Verification (UV) option being present on a CTAP2 authenticatorMakeCredential request when the authenticator has not advertised support for it (r254710)

Media

  • Added support for allow="fullscreen" feature policy (r255162)
  • Changed EME to only emit an array of persistent-usage-records when more than one record is discovered (r254896)
  • Corrected VTT Cue Style handling to match the specification (r255151, r255227)
  • Fixed decoder glitches when watching videos on CNN.com (r254761)
  • Fixed AirPlay placard not visible when AirPlay is entered in fullscreen mode (r255103)
  • Fixed video sound sometimes continuing to play in page cache (r254814)
  • Fixed HTMLMediaElement to not remove the media session at DOM suspension time (r255116)

Web API

  • Added finite timeout when synchronously terminating a service worker (r254706)
  • Fixed :matches() to correctly combine with pseudo elements (r255059)
  • Fixed automatic link replacement via “Smart links” to emit insertLink input events (r254945)
  • Disabled Service Workers before terminating an unresponsive service worker process (r255438)
  • Implemented “create a potential-CORS request” (r254821)
  • Implemented transferable property of OffscreenCanvas (r255315)
  • Improved performance speed of index records deletion in IndexedDB (r255318)
  • Made pasteboard markup sanitization more robust (r254800)
  • Used Visible Position to calculate Positions for highlights (r254785)

CSS

  • Fixed EXIF orientation ignored for some CSS images (r254841)
  • Fixed elements no longer stay fixed with elastic overscroll (r255037)

WebRTC

  • Added support for MediaRecorder.requestData (r255085)

JavaScript

  • Fixed DateMath to accept more ISO-8601 timezone designators, even ones not included in ECMA-262, to produce the results expected by code in the wild (r254939)

WebGL2

  • Implemented sub-source texImage2D and texSubImage2D (r255316)

February 05, 2020 09:10 PM

January 24, 2020

ResizeObserver in WebKit

Surfin’ Safari

For years now, web developers have desired the ability to design components that are responsive to their container instead of the viewport. Developers are used to using media queries against viewport width for responsive designs, but having media queries based on element sizes is not possible in CSS because it could result in circular dependencies. Thus, a JavaScript solution was required.

ResizeObserver was introduced to solve this problem, allowing authors to observe changes to the layout size of elements. It was first made available in Chrome 64 in January 2018, and it’s now in Safari Technology Preview releases (and Epiphany Technology Preview). ResizeObserver was enabled by default as of Safari Technology Preview 97.

API Overview

A script creates a ResizeObserver with a callback which will be called with ‘observations’, and registers/unregisters elements to observe using .observe(element) and .unobserve(element). Each call to observe(element) adds that element to the set of elements observed by this ResizeObserver instance.

The callback provided to the constructor is called with a collection of observerEntries which contain data about the state of CSS boxes being observed, if those boxes actually changed size. The observer itself also has a .disconnect() method which stops the active delivery of observed changes to the callback. Here’s a simple example:

const callback = (entries) => {
  console.log(`${entries.length} resize observations happened`)
  Array.from(entries).forEach((entry) => {
    let rect = entry.contentRect;
    console.log(
      entry.target,
      `size is now ${rect.width}w x ${rect.height}h`
    )
  })
}

const myObserver = new ResizeObserver(callback)

myObserver.observe(targetElementA)
myObserver.observe(targetElementB)

What we are observing with ResizeObserver is changes to the size of CSS Boxes that we have observed. Since we previously had no information on these boxes before observing, and now we do, this creates an observable effect. Assuming that targetElementA and targetElementB are in the DOM, we will see a log saying that 2 resize observations happened, and providing some information about the elements and sizes of each. It will look something like:

"2 resize observations happened"
"<div class='a'>a</div>" "size is now 1385w x 27h"
"<div class='b'>b</div>" "size is now 1385w x 27h"

Similarly, this means that while it is not an error to observe an element that isn’t in the DOM tree, no observations will occur until a box is actually laid out (when it is inserted, and creates a box). Removing an observed element from the DOM tree (which wasn’t hidden) also causes an observation.

How Observations are Delivered

ResizeObserver strictly specifies when and how things happen and attempts to ensure that calculation and observation always happen “downward” in the tree, and to help authors avoid circularity. Here’s how that happens:

  1. Boxes are created.
  2. Layout happens.
  3. The browser starts a rendering update, and runs the steps up to and including the Intersection Observer steps.
  4. The system gathers and compares the box sizes of observed elements with their previously recorded sizes.
  5. The ResizeObserver callback is called, passing ResizeObserverEntry objects containing information about the new sizes.
  6. If any changes are incurred during the callback, then layout happens again, but here, the system finds the shallowest depth at which a change occurred (measured in simple node depth from the root). Any changes that are related to something deeper down in the tree are delivered at once, while any that are not are queued up and delivered in the next frame, and an error message will be sent to the Web Inspector console: (ResizeObserver loop completed with undelivered notifications).
  7. Subsequent steps in the rendering updates are executed (i.e. painting happens).

Note

In Safari Technology Preview, entries contain a .contentRect property reflecting the size of the Content Box. After early feedback, the spec is being iterated on in backward compatible ways which will also provide a way to get the measure of the Border Box. Future versions of this API will also allow an optional second argument to .observe which allows you to specify which boxes (Content or Border) you want to receive information about.
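
For illustration, here's a rough sketch of what that optional second argument is shaping up to look like; the box option and the .card selector below follow the in-progress proposal and are placeholders, so names and values may change before shipping:

const boxObserver = new ResizeObserver((entries) => {
  entries.forEach((entry) => {
    console.log(entry.target, entry.contentRect)
  })
})

// ask for observations based on the Border Box instead of the default
boxObserver.observe(document.querySelector('.card'), { box: 'border-box' })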

Useful Example

Suppose that we have a component containing an author’s profile. It might be used on devices with many different screen sizes, and in many layout contexts. It might even be provided for reuse as a custom element somehow. Further, these sizes can change at runtime for any number of reasons:

  • On a desktop, the user resizes their window
  • On a mobile device, the user changes their orientation
  • A new element comes into being, or is removed from the DOM tree causing a re-layout
  • Some other element in the DOM changes size for any reason (some elements are even user resizable)

Depending on the amount of space available to us at any given point in time, we’d like to apply some different CSS—laying things out differently, changing some font sizes, perhaps even using different colors.

For this, let’s assume that we follow a ‘responsive first’ philosophy and make our initial design for the smallest screen size. As available space gets bigger, we have another design that should take effect when there are 768px available, and still another when there are at least 1024px. We’ll make these designs with our page using classes “.container-medium” and “.container-large”. Now all we have to do is add or remove those classes automatically.

/* Tell the observer how to manage the attributes */
const callback = (entries) => {
  entries.forEach((entry) => {
    let w = entry.contentRect.width
    let container = entry.target

    // clear out any old ones
    container.classList.remove('container-medium', 'container-large')

    // add one if a 'breakpoint' is true
    if (w > 1024) {
      container.classList.add('container-large')
    } else if (w > 768) {
      container.classList.add('container-medium')
    }
  }) 
}

/* Create the instance */
const myObserver = new ResizeObserver(callback)

/* Find the elements to observe */
const profileEls = [...document.querySelectorAll('.profile')]

/* .observe each */
profileEls.forEach(el => myObserver.observe(el))

Now, each .profile element will gain the class of .container-medium or .container-large if their available size meets our specified criteria, and our designs will always be appropriately applied based on their available size. You can, of course, combine this with a MutationObserver or as a Custom Element in order to account for elements which might come into existence later.
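
As a sketch of that idea, a MutationObserver can watch for .profile elements that are inserted later and hand them to the same ResizeObserver (this assumes the myObserver instance from the example above):

const domObserver = new MutationObserver((mutations) => {
  mutations.forEach((mutation) => {
    mutation.addedNodes.forEach((node) => {
      // hand any newly inserted profile component to the ResizeObserver
      if (node.nodeType === Node.ELEMENT_NODE && node.matches('.profile')) {
        myObserver.observe(node)
      }
    })
  })
})

domObserver.observe(document.body, { childList: true, subtree: true })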

Feedback

We’re excited to have ResizeObserver available in Safari Technology Preview! Please try it out and file bugs for any issues you run into.

January 24, 2020 06:00 PM

January 22, 2020

Release Notes for Safari Technology Preview 99

Surfin’ Safari

Safari Technology Preview Release 99 is now available for download for macOS Catalina and macOS Mojave. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 253789-254696.

Legacy Plug-Ins

  • Removed support for Adobe Flash

Web Inspector

  • Elements
    • Enabled the P3 color picker (r253802)
    • Added RGBA input fields for the P3 color picker (r254243)
    • Added support for manipulating the value with the arrow keys in the color picker (r254094)
    • Added color() suggestion when editing a CSS property that accepts color values (r254316)
  • Sources
    • Allowed editing of style sheets injected by Safari App Extensions (r254186)
  • Console
    • Ensured that the clear button is always visible, even at smaller widths (r253800)

Web API

  • Added support for using valid non-zero width and height attributes to become the default aspect ratio of <img> (r254669)
  • Added a check to ensure Service Workers terminate after a period of time when thread blocking (r253898)
  • Aligned Range.intersectsNode() with the DOM specification (r254018)
  • Changed <iframe> attributes to be processed on srcdoc attribute removal (r254498)
  • Changed <img>.naturalWidth to return the density-corrected intrinsic width (r254229)
  • Changed <link> with non-CSS type to not be retrieved (r253992)
  • Changed Object.keys to throw if called on a module namespace object with uninitialized binding (r254390)
  • Changed Object.preventExtensions to throw if not successful (r254626)
  • Changed Document.createAttribute() to take in a localName, not a qualifiedName (r254021)
  • Changed the supported MIME types for image encoding to be supported image MIME types (r254541)
  • Denied Notification API access for non-secure contexts (r253899)
  • Fixed dispatchEvent() to not clear the event’s isTrusted flag when it returns early (r254016)
  • Fixed String.prototype.replace() incorrectly handling named references on regular expressions without named groups (r254088)
  • Fixed URL parser in Fetch not always using UTF-8 (r254672)
  • Fixed encoding entities correctly in <style> element during XML serialization of text (r253988)
  • Removed the low priority resource load for sendBeacon to reduce failure rates (r253847)
  • Updated Fetch to handle an empty Location value (r253814)

Cookies

  • Fixed document.cookie to not do a sync IPC to the network process for iframes that do not have storage access (r254556)

CSS

  • Added support for image-set() standard syntax (r254406)
  • Added support for rendering highlights specified in CSS Highlight API (r253857)
  • Implemented a network error when fetching a linked stylesheet resource fails (r254043)
  • Improved performance by invalidating only affected elements after media query evaluation changes (r253875)
  • Fixed rejected changes between similar unprefixed and prefixed gradient syntax (r254164)
  • Excluded implicit CSS grid tracks from the resolved value (r254561)

Media

  • Enabled HDR Media Capabilities by default (r253853)
  • Fixed specification violation in Font Loading API (r254220)
  • Ignored URL host for schemes that are not using host information (r253946)
  • Implemented “create a potential-CORS request” (r254000)
  • Implemented transceiver setCodecPreferences (r253966)
  • Made text track loading set same-origin fallback flag (r254031)
  • Fixed MediaKeySession.load() failing (r253852)

WebRTC

  • Removed the certificate info checks related to getUserMedia (r253827)

Payment Request

  • Converted the payment method data IDL in the PaymentRequest constructor (r253986)

Web Animations

  • Stopped creating CSS Animations for <noscript> elements (r254201)

JavaScript

  • Fixed invalid date parsing for ISO 8601 strings when no timezone given (r254038)
  • Fixed RegExp.prototype[Symbol.replace] to support named capture groups (r254195)

Web Share API

  • Added support for a user gesture to allow using the Web Share API even when preceded by an XHR call (r254178)

WebDriver

  • Reimplemented the “Execute Async Script” command with Promises to match the specification (r254329)
  • Fixed handling of session timeouts for values higher than MAX_INT (r253883)
  • Fixed scripts being executed in the wrong page context after a history navigation (r254328)

IndexedDB

  • Improved performance by removing the timer for pending operations in IDBTransaction (r253807)

January 22, 2020 06:00 PM

January 16, 2020

Paulo Matos: Cross-Arch Reproducibility using Containers

Igalia WebKit

I present the use of containers for cross architecture reproducibility using docker and podman, which I then go on to apply to JSC. If you are trying to understand how to create cross-arch reproducible environments for your software, this might help you!

More…

By Paulo Matos at January 16, 2020 04:00 PM

January 08, 2020

Release Notes for Safari Technology Preview 98

Surfin’ Safari

Safari Technology Preview Release 98 is now available for download for macOS Catalina and macOS Mojave. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 252823-253789.

Web Inspector

  • Elements
    • Removed the “Show/Hide Shadow DOM” navigation item (r253706)
    • Restricted showing paint flashing and compositing borders to the Web Inspector session (r253739)
    • Ensured that a bezier swatch is shown for CSS timing function keywords (r253758)
    • Fixed hovering over an invalid value while holding ⌘ to change the color of the text (r253405)
    • Fixed the Classes input to not display on top of other content (r253167)
  • Network
    • Fixed pressing ⌘F when no network item is selected to focus the filter bar (r253160)
  • Sources
    • Fixed non-regex local overrides to not apply to resources that only contain the URL instead of completely matching the URL (r253246)
  • Storage
    • Added support for filtering IndexedDB stores and indexes (r253161)
  • Audit
    • Fixed selected item before entering edit mode not being reselected after exiting edit mode (r253759)
    • Fixed importing a result with DOM nodes that don’t match the inspected page appearing as empty lines (r253757)
  • Console
    • Ensured copying an evaluation result does not include the saved variable index (r253169)
  • Search
    • Added basic “No Search Results” text with a clickable help navigation item that reveals and focuses the navigation sidebar search input when there is no active search (r253165)

Web Animations

  • Enabled Web Animations CSS Integration, a new implementation of CSS Animations and CSS Transitions, by default (r252945)
  • Fixed layout of element children with forwards-filling opacity animation that can be incorrect after removal (r252879)
  • Implemented Animation.commitStyles() (r252966)

Media

  • Enabled the Generic Text Track Cue API (r253695)

Rendering

  • Ensured transparency layers are properly ended when only painting root background (r253692)
  • Fixed an issue where elements could jump to the wrong position after some compositing-related style changes (r252935)

Web API

  • Implemented OffscreenCanvas.convertToBlob (r253474)
  • Changed setting toString or valueOf on a cross-origin Location object to throw a SecurityError (r253418)
  • Fixed an incorrect association of the URL object with the port value (r252998)
  • Prevented synchronous XHR in beforeunload and unload event handlers (r253213)

CSS

  • Changed to not perform range checking for calc() at parse time (r252983)
  • Changed media queries in img sizes attribute to evaluate dynamically (r252828)
  • Implemented the clamp() function (r253105)
  • Improved computed values of calc() functions to match the specification (r253079)

JavaScript

  • Changed Object.prototype.isPrototypeOf() to check if the passed in value is a non-object first (r253264)

WebRTC

  • Added protection for WebRTC network monitoring to wait forever in edge cases (r253203)
  • Fixed audio elements that resumed playback after getUserMedia (r253742)

Clipboard API

  • Added sanitization for HTML and image data written using clipboard.write (r253486)

Browser Changes

  • Changed to issue the load sooner on swipe back/forward navigation (r253360)
  • Re-disabled TLS 1.0 and TLS 1.1 by default (r253292)

WebAssembly

  • Changed to validate and generate bytecode in a single pass (r253140)

January 08, 2020 09:15 PM

Angelos Oikonomopoulos: A Dive Into JavaScriptCore

Igalia WebKit

Recently, the compiler team at Igalia was discussing the available resources for the WebKit project, both for the purpose of onboarding new Igalians and for lowering the bar for third-party contributors. As compiler people, we are mainly concerned with JavaScriptCore (JSC), WebKit’s Javascript engine implementation. There are many high-quality blog posts on the WebKit blog that describe various phases in the evolution of JSC, but finding one’s bearings in the actual source can be a daunting task.

The aim of this post is twofold: first, document some aspects of JavaScriptCore at the source level; second, show how one can figure out what a piece of code actually does in a large and complex source base (which JSC’s certainly is).

In medias res

As an exercise, we’re going to arbitrarily use a commit I had open in a web browser tab. Specifically, we will be looking at this snippet:

Operands<Optional<JSValue>> mustHandleValues(codeBlock->numParameters(), numVarsWithValues);
int localsUsedForCalleeSaves = static_cast<int>(CodeBlock::llintBaselineCalleeSaveSpaceAsVirtualRegisters());
for (size_t i = 0; i < mustHandleValues.size(); ++i) {
    int operand = mustHandleValues.operandForIndex(i);
    if (operandIsLocal(operand) && VirtualRegister(operand).toLocal() < localsUsedForCalleeSaves)
	continue;
    mustHandleValues[i] = callFrame->uncheckedR(operand).jsValue();
}

This seems like a good starting point for taking a dive into the low-level details of JSC internals. Virtual registers look like a concept that’s good to know about. And what are those “locals used for callee saves” anyway? How do locals differ from vars? What are “vars with values”? Let’s find out!

Backstory

Recall that JSC is a multi-tiered execution engine. Most Javascript code is only executed once; compiling takes longer than simply interpreting the code, so Javascript code is always interpreted the first time through. If it turns out that a piece of code is executed frequently though¹, compiling it becomes a more attractive proposition.

Initially, the tier up happens to the baseline JIT, a simple and fast non-optimizing compiler that produces native code for a Javascript function. If the code continues to see much use, it will be recompiled with DFG, an optimizing compiler that is geared towards low compilation times and decent performance of the produced native code. Eventually, the code might end up being compiled with the FTL backend too, but the upper tiers won’t be making an appearance in our story here.

What do tier up and tier down mean? In short, tier up is when code execution switches to a more optimized version, whereas tier down is the reverse operation. So the code might tier up from the interpreter to the baseline JIT, but later tier down (under conditions we’ll briefly touch on later) back to the baseline JIT. You can read a more extensive overview here.

Diving in

With this context now in place, we can revisit the snippet above. The code is part of operationOptimize. Just looking at the two sites it’s referenced in, we can see that it’s only ever used if the DFG_JIT option is enabled. This is where the baseline JIT ➞ DFG tier up happens!

The sites that make use of operationOptimize both run during the generation of native code by the baseline JIT. The first one runs in response to the op_enter bytecode opcode, i.e. the opcode that marks entry to the function. The second one runs when encountering an op_loop_hint opcode (an opcode that only appears at the beginning of a basic block marking the entry to a loop). Those are the two kinds of program points at which execution might tier up to the DFG.

Notice that calls to operationOptimize only occur during execution of the native code produced by the baseline JIT. In fact, if you look at the emitted code surrounding the call to operationOptimize for the function entry case, you’ll see that the call is conditional and only happens if the function has been executed enough times that it’s worth making a C++ call to consider it for optimization.

The function accepts two arguments: a vmPointer which is, umm, a pointer to a VM structure (i.e. the “state of the world” as far as this function is concerned) and the bytecodeIndex. Remember that the bytecode is the intermediate representation (IR) that all higher tiers start compiling from. In operationOptimize, the bytecodeIndex is used for a number of things.

Again, the bytecodeIndex is a parameter that has already been set in stone during generation of the native code by the baseline JIT.

The other parameter, the VM, is used in a number of things. The part that’s relevant to the snippet we started out to understand is that the VM is (sometimes) used to give us access to the current CallFrame. CallFrame inherits from Register, which is a thin wrapper around a (maximally) 64-bit value.

The CodeBlock

In this case, the various accessors defined by CallFrame effectively treat the (pointer) value that CallFrame consists of as a pointer to an array of Register values. Specifically, a set of constant expressions

struct CallFrameSlot {
    static constexpr int codeBlock = CallerFrameAndPC::sizeInRegisters;
    static constexpr int callee = codeBlock + 1;
    static constexpr int argumentCount = callee + 1;
    static constexpr int thisArgument = argumentCount + 1;
    static constexpr int firstArgument = thisArgument + 1;
};

give the offset (relative to the callframe) of the pointer to the codeblock, the callee, the argument count and the this pointer. Note that the first CallFrameSlot is the CallerFrameAndPC, i.e. a pointer to the CallFrame of the caller and the returnPC.

The CodeBlock is definitely something we’ll need to understand better, as it appears in our motivational code snippet. However, it’s a large class that is intertwined with a number of other interesting code paths. For the purposes of this discussion, we need to know that it

  • is associated with a code block (i.e. a function, eval, program or module code block)
  • holds data relevant to tier up/down decisions and operations for the associated code block

We’ll focus on three of its data members:

int m_numCalleeLocals;
int m_numVars;
int m_numParameters;

So, it seems that a CodeBlock can have at least some parameters (makes sense, right?) but also has both variables and callee locals.

First things first: what’s the difference between callee locals and vars? Well, it turns out that m_numCalleeLocals is only incremented in BytecodeGeneratorBase<Traits>::newRegister whereas m_numVars is only incremented in BytecodeGeneratorBase<Traits>::addVar(). Except, addVar calls into newRegister, so vars are a subset of callee locals (and therefore m_numVars ≤ m_numCalleeLocals).

Somewhat surprisingly, newRegister is only called in 3 places.

So there you have it. Callee locals

  1. are allocated by a function called newRegister
  2. are either a var or a temporary.

Let’s start with the second point. What is a var? Well, let’s look at where vars are created (via addVar):

There is definitely a var for every lexical variable (VarKind::Stack), i.e. a non-local variable accessible from the current scope. Vars are also generated (via BytecodeGenerator::createVariable) for a number of other constructs.

So, intuitively, vars are allocated more or less for “every JS construct that could be called a variable”. Conversely, temporaries are storage locations that have been allocated as part of bytecode generation (i.e. there is no corresponding storage location in the JS source). They can store intermediate calculation results and what not.

Coming back to the first point regarding callee locals, how come they’re allocated by a function called newRegister? Why, because JSC’s bytecode operates on a register VM! The RegisterID returned by newRegister wraps the VirtualRegister that our register VM is all about.

Virtual registers, locals and arguments, oh my!

A virtual register (of type VirtualRegister) consists simply of an int (which is also called its offset). Each virtual register corresponds to one of a local, an argument, or a constant.

There is no differentiation between locals and arguments at the type level (everything is an int); however, virtual registers that map to locals are negative and those that map to arguments are nonnegative. In the context of bytecode generation, the int is interpreted as a local, an argument, or a constant, depending on its range (as described below).

It feels like JSC is underusing C++ here.

In all cases, what we get after indexing with a local, argument or constant is a RegisterID. As explained, the RegisterID wraps a VirtualRegister. Why do we need this indirection?

Well, there are two extra bits of info in the RegisterID. The m_refcount and an m_isTemporary flag. The reference count is always greater than zero for a variable, but the rules under which a RegisterID is ref’d and unref’d are too complicated to go into here.

When you have an argument, you get the VirtualRegister for it by directly adding it to CallFrame::thisArgumentOffset().

When you have a local, you map it to (-1 - local) to get the corresponding VirtualRegister. So

local    vreg
0        -1
1        -2
2        -3

(remember, virtual registers that correspond to locals are negative).

For an argument, you map it to (arg + CallFrame::thisArgumentOffset()):

argument    vreg
0           this
1           this + 1
2           this + 2

Which makes all the sense in the world when you remember what the CallFrameSlot looks like. So argument 0 is always the `this` pointer.

If the vreg is greater than some large offset (s_firstConstantRegisterIndex), then it is an index into the CodeBlock's constant pool (after subtracting the offset).

Bytecode operands

If you’ve followed any of the links to the functions doing the actual mapping of locals and arguments to a virtual register, you may have noticed that the functions are called localToOperand and argumentToOperand. Yet they’re only ever used in virtualRegisterForLocal and virtualRegisterForArgument respectively. This raises the obvious question: what are those virtual registers operands of?

Well, of the bytecode instructions in our register VM of course. Instead of recreating the pictures, I’ll simply encourage you to take a look at a recent blog post describing it at a high level.

How do we know that’s what “operand” refers to? Well, let’s look at a use of virtualRegisterForLocal in the bytecode generator. BytecodeGenerator::createVariable will allocate² the next available local index (using the size of m_calleeLocals to keep track of it). This calls into virtualRegisterForLocal, which maps the local to a virtual register by calling localToOperand.

The newly allocated local is inserted into the function symbol table, along with its offset (i.e. the ID of the virtual register).

The SymbolTableEntry is looked up when we generate bytecode for a variable reference. A variable reference is represented by a ResolveNode³.

So looking into ResolveNode::emitBytecode, we dive into BytecodeGenerator::variable and there’s our symbolTable->get() call. And then the symbolTableEntry is passed to BytecodeGenerator::variableForLocalEntry which uses entry.varOffset() to initialize the returned Variable with offset. It also uses registerFor to retrieve the RegisterID from m_calleeLocals.

ResolveNode::emitBytecode will then pass the local RegisterID to move which calls into emitMove, which just calls OpMov::emit (a function generated by the JavaScriptCore/generator code). Note that the compiler implicitly converts the RegisterID arguments to VirtualRegister type at this step. Eventually, we end up in the (generated) function

template<OpcodeSize __size, bool recordOpcode, typename BytecodeGenerator>
static bool emitImpl(BytecodeGenerator* gen, VirtualRegister dst, VirtualRegister src)
{
    if (__size == OpcodeSize::Wide16)
	gen->alignWideOpcode16();
    else if (__size == OpcodeSize::Wide32)
	gen->alignWideOpcode32();
    if (checkImpl<__size>(gen, dst, src)) {
	if (recordOpcode)
	    gen->recordOpcode(opcodeID);
	if (__size == OpcodeSize::Wide16)
	    gen->write(Fits<OpcodeID, OpcodeSize::Narrow>::convert(op_wide16));
	else if (__size == OpcodeSize::Wide32)
	    gen->write(Fits<OpcodeID, OpcodeSize::Narrow>::convert(op_wide32));
	gen->write(Fits<OpcodeID, __size>::convert(opcodeID));
	gen->write(Fits<VirtualRegister, __size>::convert(dst));
	gen->write(Fits<VirtualRegister, __size>::convert(src));
	return true;
    }
    return false;
}

where Fits::convert(VirtualRegister) will trivially encode the VirtualRegister into the target type. Specifically the mapping is nicely summed up in the following comment

// Narrow:
// -128..-1  local variables
//    0..15  arguments
//   16..127 constants
//
// Wide16:
// -2**15..-1  local variables
//      0..63  arguments
//     64..2**15-1 constants

You may have noticed that the Variable returned by BytecodeGenerator::variableForLocalEntry already has been initialized with the virtual register offset we set when inserting the SymbolTableEntry for the local variable. And yet we use registerFor to look up the RegisterID for the local and then use the offset of the VirtualRegister contained therein. Surely those are the same? Oh well, something for a runtime assert to check.

Variables with values

Whew! Quite the detour there. Time to get back to our original snippet:

Operands<Optional<JSValue>> mustHandleValues(codeBlock->numParameters(), numVarsWithValues);
int localsUsedForCalleeSaves = static_cast<int>(CodeBlock::llintBaselineCalleeSaveSpaceAsVirtualRegisters());
for (size_t i = 0; i < mustHandleValues.size(); ++i) {
    int operand = mustHandleValues.operandForIndex(i);
    if (operandIsLocal(operand) && VirtualRegister(operand).toLocal() < localsUsedForCalleeSaves)
	continue;
    mustHandleValues[i] = callFrame->uncheckedR(operand).jsValue();
}

What are those numVarsWithValues then? Well, the definition is right before our snippet:

unsigned numVarsWithValues;
if (bytecodeIndex)
    numVarsWithValues = codeBlock->numCalleeLocals();
else
    numVarsWithValues = 0;

OK, so this looks straightforward for a change. If the bytecodeIndex is not zero, we’re doing the tier up from JIT to DFG in the body of a function (i.e. at a loop entry). In that case, we consider all our callee locals to have values. Conversely, when we’re running for the function entry (i.e. bytecodeIndex == 0), none of the callee locals are live yet. Do note that the variable is incorrectly named. Vars are not the same as callee locals; we’re dealing with the latter here.

A second gotcha is that, whereas vars are always live, temporaries might not be. The DFG compiler will do liveness analysis at compile time to make sure it’s only looking at live values. That must have been a fun bug to track down!

Values that must be handled

Back to our snippet, numVarsWithValues is used as an argument to the constructor of mustHandleValues which is of type Operands<Optional<JSValue>>. Right, so what are the Operands? They simply hold a number of T objects (here T is Optional<JSValue>) of which the first m_numArguments correspond to, well, arguments whereas the remaining correspond to locals.

What we’re doing here is recording all the live (non-heap, obviously) values when we try to do the tier up. The idea is to be able to mix those values in with the previously observed values that DFG’s Control Flow Analysis will use to emit code which will bail us out of the optimized version (i.e. do a tier down). According to the comments and commit logs, this is in order to increase the chances of a successful OSR entry (tier up), even if the resulting optimized code may be slightly less conservative.

Remember that the optimized code that we tier up to makes assumptions with regard to the types of the incoming values (based on what we’ve observed when executing at lower tiers) and will bail out if those assumptions are not met. Taking the values of the current execution at the time of the tier up attempt ensures we won’t be doing all this work only to immediately have to tier down again.

Operands provides an operandForIndex method which will directly give you a virtual reg for every kind of element. For example, if you had called Operands<T> opnds(2, 1), then the first iteration of the loop would give you

operandForIndex(0)
-> virtualRegisterForArgument(0).offset()
  -> VirtualRegister(argumentToOperand(0)).offset()
    -> VirtualRegister(CallFrame::thisArgumentOffset).offset()
      -> CallFrame::thisArgumentOffset

The second iteration would similarly give you CallFrame::thisArgumentOffset + 1.

In the third iteration, we’re now dealing with a local, so we’d get

operandForIndex(2)
-> virtualRegisterForLocal(2 - 2).offset()
  -> VirtualRegister(localToOperand(0)).offset()
    -> VirtualRegister(-1).offset()
      -> -1

Callee save space as virtual registers

So, finally, what is our snippet doing here? It’s iterating over the values that are likely to be live at this program point and storing them in mustHandleValues. It will first iterate over the arguments (if any) and then over the locals. However, it will use the “operand” (remember, everything is an int…) to get the index of the respective local and then skip the first locals up to localsUsedForCalleeSaves. So, in fact, even though we allocated space for (arguments + callee locals), we skip some slots and only store (arguments + callee locals - localsUsedForCalleeSaves). This is OK, as the Optional<JSValue> values in the Operands will have been initialized by the default constructor of Optional<> which gives us an object without a value (i.e. an object that will later be ignored).

Here, callee-saved register (csr) refers to a register that is available for use by the LLInt and/or the baseline JIT. This is described a bit in LowLevelInterpreter.asm, but is more apparent when one looks at what csr sets are used on each platform (or, in C++).

platform         metadataTable   PC-base (PB)   numberTag   notCellMask
X86_64           csr1            csr2           csr3        csr4
x86_64_win       csr3            csr4           csr5        csr6
ARM64 / ARM64E   csr6            csr7           csr8        csr9
C_LOOP 64b       csr0            csr1           csr2        csr3
C_LOOP 32b       csr3            -              -           -
ARMv7            csr0            -              -           -
MIPS             csr0            -              -           -
X86              -               -              -           -

On 64-bit platforms, offlineasm (JSC’s portable assembler) makes a range of callee-saved registers available to .asm files. Those are properly saved and restored. For example, for X86_64 on non-Windows platforms, the returned RegisterSet contains registers r12-r15 (inclusive), i.e. the callee-saved registers as defined in the System V AMD64 ABI. The mapping from symbolic names to architecture registers can be found in GPRInfo.

On 32-bit platforms, the assembler doesn’t make any csr regs available, so there’s nothing to save except if the platform makes special use of some register (like C_LOOP does for the metadataTable⁴).

What are the numberTag and notCellMask registers? Out of scope, that’s what they are!

Conclusion

Well, that wraps it up. Hopefully now you have a better understanding of what the original snippet does. In the process, we learned about a few concepts by reading through the source and, importantly, we added lots of links to JSC’s source code. This way, not only can you check that the textual explanations are still valid when you read this blog post, you can use the links as spring boards for further source code exploration to your heart’s delight!

Footnotes

¹ Both the interpreter – better known as LLInt – and the baseline JIT keep track of execution statistics, so that JSC can make informed decisions on when to tier up.

² Remarkably, no RegisterID has been allocated at this point – we used the size of m_calleeLocals but never modified it. Instead, later in the function (after adding the new local to the symbol table!) the code will call addVar which will allocate a new “anonymous” local. But then the code asserts that the index of the newly allocated local (i.e. the offset of the virtual register it contains) is the same as the offset we previously used to create the virtual register, so it’s all good.

³ How did we know to look for the ResolveNode? Well, the emitBytecode method needs to be implemented by subclasses of ExpressionNode. If we look at how a simple binary expression is parsed (and given that ASTBuilder defines BinaryOperand as std::pair<ExpressionNode*, BinaryOpInfo>), it’s clear that any variable reference has already been lifted to an ExpressionNode.

So instead, we take the bottom up approach. We find the lexer/parser token definitions, one of which is the IDENT token. Then it’s simply a matter of going over its uses in Parser.cpp, until we find our smoking gun. This gets us into createResolve aaaaand

return new (m_parserArena) ResolveNode(location, ident, start);

That’s the node we’re looking for!

⁴ C_LOOP is a special backend for JSC’s portable assembler. What is special about it is that it generates C++ code, so that it can be used on otherwise unsupported architectures. Remember that the portable assembler (offlineasm) runs at compilation time.

January 08, 2020 12:00 PM

December 20, 2019

New WebKit Features in Safari 13

Surfin’ Safari

This year’s releases of Safari 13 for macOS Catalina, iPadOS, iOS 13, and watchOS 6 include a tremendous number of WebKit improvements for the web across Apple’s platforms. Of particular note is the number of features that enhance website compatibility to bring a true desktop-class web browsing experience to Safari on iPadOS. This release is also packed with updates for improved privacy, performance, and a host of new tools for web developers.

Here’s a quick look at the new WebKit enhancements available with these releases.

Desktop-class Browsing on iPad

WebKit provides the heart of this new experience with deep, fundamental changes that deliver a great desktop website experience on a touch device. With the exception of iPad mini, Safari on iPad will now send a user-agent string that is identical to Safari on macOS. Beyond just a user-agent change, WebKit added new support for web standards to provide the needed compatibility and quality. That included adding new support for Pointer Events, the Visual Viewport API, and programmatic paste. You can read more details about support for those standards in the sections below. In addition, WebKit added Media Source Extensions support in Safari on iPadOS to improve compatibility with the desktop-variants of streaming video websites.

Beyond foundational new web standards support in WebKit, there are many other refinements for the desktop browsing experience on iPad. Page scaling behaviors have been fine-tuned to prevent horizontal scrolling on wide webpages with responsive design viewport tags. When a webpage is scaled down to fit entirely within the viewport, WebKit will increase the font size of content to ensure text is comfortably legible. WebKit added support for automatic Fast Tap on links and buttons to help make navigating web content feel more responsive. Improved hardware keyboard support adds the ability to scroll with the arrow keys and perform focus navigation. Find on page now works like Safari on desktop, highlighting all of the matching terms on the page with a special highlight for the current selection. The behavior of editing callout menus for text selections was polished to avoid overlapping in page controls provided by many document editing web applications. Last but not least, Safari includes support for background downloads, as well as background file uploads.

This new experience on iPad means significant changes for web developers to consider for their web technology projects. Safari on iPad is a chameleon; it can respond to servers as either a desktop device or a mobile device under different circumstances. Most of the time, Safari on iPad presents a macOS user-agent when loading a webpage. If Safari is moved into a one-third size while multitasking, the desktop site will be scaled to fit the one-third size without reloading, so you don’t lose your place. But loading or reloading a webpage while Safari is in the one-third size will provide an iOS user-agent, since the mobile layout is better suited to the smaller viewport.

Now more than ever before, web developers need to take great care in providing a single responsive web design that uses feature detection instead of relying on separate desktop and mobile sites dependent on the user-agent. Developers should be sure to test their desktop website experience on an iPad to ensure it works well for users.

Pointer Events

WebKit added support for Pointer Events to provide DOM events for generic, hardware-agnostic pointer input such as those generated by a mouse, touch, or stylus. It adds a layer of abstraction that makes it easier for web developers to handle a variety of input devices. Similar to mouse events, pointer events include coordinates, a target element, and button states, but they also support additional properties related to other forms of input, such as pressure, tilt, and more.

See the Pointer Events specification for more information.
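
As a rough sketch of the unified model, a single pointermove handler can service mouse, touch, and stylus input alike; the sketchpad element here is hypothetical:

const sketchpad = document.querySelector('#sketchpad')

sketchpad.addEventListener('pointermove', (event) => {
  // only track active contacts (a pressed button, a finger, or a stylus tip)
  if (event.buttons === 0) return
  console.log(`${event.pointerType} at (${event.clientX}, ${event.clientY}), pressure ${event.pressure}`)
})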

Visual Viewport API

WebKit added support for the Visual Viewport API, which allows webpages to detect the part of the page that is visible to the user, taking zooming and the onscreen keyboard into account. Developers can use this API to move content out of the way of the onscreen keyboard. This is useful for a floating overlay, a custom completion list popup, or a custom-drawn caret in a custom editing area.

See the Visual Viewport API specifications for more information.
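
For example, here's a minimal sketch that keeps a floating overlay pinned to the bottom of the visual viewport, above the onscreen keyboard when it is showing; the #toolbar element is hypothetical:

const toolbar = document.querySelector('#toolbar')

const reposition = () => {
  const viewport = window.visualViewport
  // place the toolbar at the bottom edge of the visible area
  const top = viewport.offsetTop + viewport.height - toolbar.offsetHeight
  toolbar.style.transform = `translate(${viewport.offsetLeft}px, ${top}px)`
}

window.visualViewport.addEventListener('resize', reposition)
window.visualViewport.addEventListener('scroll', reposition)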

Programmatic Paste

WebKit also brings new support for programmatic paste in Safari for iOS and iPadOS with document.execCommand('paste'). When a page triggers programmatic paste within the scope of a user gesture, a callout bar with the option to paste is shown. When the callout is tapped, it grants access to the clipboard and the paste proceeds. For paste operations where the contents of the clipboard share the same origin as the page triggering the programmatic paste, WebKit allows the paste immediately with no callout bar.

Learn more in the Document.execCommand() reference on MDN.
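
In practice, a paste triggered from a gesture handler looks something like this sketch; the #paste-button element is hypothetical:

document.querySelector('#paste-button').addEventListener('click', () => {
  // within the scope of the user gesture, Safari shows the paste callout
  // unless the clipboard contents are same-origin
  document.execCommand('paste')
})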

Accelerated Scrolling on iOS and iPadOS

Accelerated scrolling of the main frame has always been available with WebKit on iOS. In addition, developers could use a CSS property called -webkit-overflow-scrolling to opt in to fast scrolling for overflow scroll. None of that is necessary with iPadOS and iOS 13. Subframes are no longer extended to the size of their contents and are now scrollable, and overflow: scroll; elements and iframes always get accelerated scrolling. Fast scrolling emulation libraries are no longer needed, and -webkit-overflow-scrolling: touch; is a no-op on iPad. On iPhone, it still has the side effect of creating a CSS stacking context on scrollable elements. Developers will want to test their content to see how hardware-accelerated scrolling everywhere affects it and remove unnecessary workarounds.

Performance Improvements

This release brings performance improvements that reduce the initial rendering time for webpages on iOS and reduce load time by up to 50% for webpages on watchOS. It also reduces memory use by JavaScript in Safari, web views, and non-web clients that use JSContext. WebKit also achieved better graphics rendering performance, showing up to a 10% improvement in the MotionMark graphics performance benchmark score.

Intelligent Tracking Prevention

The latest update to Intelligent Tracking Prevention enhances prevention of cross-site tracking through the abuse of link-decoration. The updates in ITP 2.3 as part of this release of Safari add new countermeasures. In addition to the 24-hour cookie expiry from ITP 2.2, non-cookie website data like LocalStorage will be marked for deletion if the page is navigated to from a classified domain to a landing URL with a query string or fragment identifier. Deletion happens after seven days of Safari use without user interaction on the website. Beyond link decoration, Intelligent Tracking Prevention will also downgrade document.referrer to the referrer’s eTLD+1 if the referrer has link decoration when the user was navigated from a classified domain.

For details on Intelligent Tracking Prevention updates, see the “Intelligent Tracking Prevention 2.3” blog post and our collection of other privacy related blog posts.

FIDO2-compliant USB Security Keys

Safari 13 on macOS has support for FIDO2-compliant USB security keys through the Web Authentication standard. Security keys can hold device-bound public key credentials that are associated with specific internet accounts. This allows users to add an additional layer of protection to their accounts by utilizing security keys as a second factor to authenticate. Not just that, but Web Authentication also prevents phishing. Since user agents arbitrate the entire authentication process and the public key credentials can never leave the security keys they are bound to, it’s impossible for phishing sites to get users’ credentials.
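
Here’s a sketch of requesting an assertion from a security key with the Web Authentication API; in a real deployment the challenge and credential id come from your server, so the empty byte arrays below are placeholders:

const publicKey = {
  challenge: new Uint8Array([ /* challenge bytes from the server */ ]),
  allowCredentials: [{
    type: 'public-key',
    id: new Uint8Array([ /* credential id registered earlier */ ]),
    transports: ['usb']
  }],
  timeout: 60000
}

navigator.credentials.get({ publicKey }).then((assertion) => {
  // send assertion.response to the server for verification
})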

More Privacy and Security Improvements

Building on the strength of privacy and security in WebKit, users will have additional protections with sandbox hardening on iOS and macOS, and navigation protection from third-party iframes.

Developers will now need to call DeviceMotionEvent.requestPermission() or DeviceOrientationEvent.requestPermission() to prompt the user for permission before getting access to the events, in Safari or Safari View Controller on iOS and iPadOS.
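
A sketch of the new flow; the permission request has to happen from a user gesture handler, and the #motion-button element is hypothetical:

document.querySelector('#motion-button').addEventListener('click', () => {
  DeviceMotionEvent.requestPermission().then((state) => {
    if (state === 'granted') {
      window.addEventListener('devicemotion', (event) => {
        console.log(event.acceleration)
      })
    }
  })
})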

Apple Pay in WKWebView

In iOS 13, webpages loaded in WKWebView can now accept Apple Pay. In order to protect the security of Apple Pay transactions in WKWebView, Apple Pay cannot be used alongside script injection APIs such as WKUserScript or evaluateJavaScript(_:completionHandler:).

If these APIs are invoked before a webpage uses Apple Pay, Apple Pay will be disabled. If a webpage uses Apple Pay before evaluateJavaScript(_:completionHandler:) is invoked, the completion handler will be called with a non-nil NSError. These restrictions are reset every time the top frame is navigated.

Media Improvements

Media improvements in WebKit improve both compatibility and capability for developers. Support for the decodingInfo() method of the Media Capabilities API allows developers to check for supported codecs, efficiently supported codecs, and optional codec features including new alpha transparency. WebKit now supports transparency in video with an alpha channel that works for all supported video formats.
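
A sketch of a decodingInfo() query; the codec string and the numbers below are illustrative:

navigator.mediaCapabilities.decodingInfo({
  type: 'file',
  video: {
    contentType: 'video/mp4; codecs="hvc1.1.6.L93.B0"',
    width: 1920,
    height: 1080,
    bitrate: 2500000,
    framerate: 30
  }
}).then((result) => {
  console.log(result.supported, result.smooth, result.powerEfficient)
})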

In Safari on macOS, WebKit added the ability for users to share their screen with others natively, using web technologies, without the need for any plug-ins. SFSafariViewController gained WebRTC support for the navigator.mediaDevices property of the Media and Streams API.
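
Screen sharing hangs off the same navigator.mediaDevices entry point; here’s a minimal sketch, assuming a <video> element on the page to preview the capture:

navigator.mediaDevices.getDisplayMedia({ video: true }).then((stream) => {
  document.querySelector('video').srcObject = stream
})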

Dark Mode for iOS and iPadOS

Last year WebKit added dark mode for the web to Safari on macOS Mojave. This year, WebKit brings the same support to style web content that matches the system appearance in Safari on iOS and iPadOS.

WebKit.org blog posts page shown in light mode and in dark mode.

Learn how it works and how to add support to your web content in the blog posts on “Dark Mode Support in WebKit” and “Dark Mode in Web Inspector”.
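
From script, the same system appearance is exposed through a media query; a sketch, where the dark class is a hypothetical hook for your own styles (note that Safari 13 uses the older addListener API on MediaQueryList):

const darkMode = window.matchMedia('(prefers-color-scheme: dark)')

const applyTheme = () => {
  document.body.classList.toggle('dark', darkMode.matches)
}

applyTheme()
darkMode.addListener(applyTheme)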

Improved Home Screen Web Apps on iOS and iPadOS

Support for websites saved to the home screen has been polished to work more like native apps. The changes focused on better multitasking support, an improved login flow that works in-line without switching to Safari, support for Apple Pay, and improved reliability for remote Web Inspector.

Safari WebDriver for iOS

Safari now supports WebDriver on iOS 13. Control via WebDriver is exposed to developers via the /usr/bin/safaridriver executable, which hosts a driver that handles REST API requests sent by WebDriver test clients. In order to run WebDriver tests on an iOS device, it must be plugged into a macOS host that has a new enough version of safaridriver. Support for hosting iOS-based WebDriver sessions is available in the safaridriver included with Safari 13 and later. Older versions of safaridriver do not support iOS WebDriver sessions.

If you’ve never used safaridriver on macOS before, you’ll first need to run safaridriver --enable and authenticate as an administrator. Then, you’ll need to enable Remote Automation on every device that you intend to use for WebDriver. To do this, toggle the setting in the Settings app under Safari → Advanced → Remote Automation.

With the introduction of native WebDriver support in Safari on iOS 13, it’s now possible to run the same automated tests of desktop-oriented web content on desktop and mobile devices equally. Safari’s support comes with new, exclusive safeguards to simultaneously protect user security and privacy and also help you write more stable and consistent tests. You can try out Safari’s WebDriver support today by installing a beta of macOS Catalina and iOS 13.

You can learn more about iOS support for Safari WebDriver by reading “WebDriver is Coming to Safari in iOS 13”.

Web Inspector Improvements

Web Inspector adds tools that bring new insights to web content during development. This release also includes many tooling refinements with more capabilities and a better debugging experience. Among the changes, Web Inspector has improved performance for debugging large, complex websites.

A new CPU Usage Timeline is available in the Timelines Tab that provides developers with insight into power efficiency through CPU usage. This helps developers analyze and improve the power efficiency of their web content. The Timelines Tab has also been updated to support importing and exporting of recorded timeline data using a JSON file format. The portability of timeline data makes it possible to share recordings with other developers, or use the data in custom tools.

Read more in the “CPU Timeline in Web Inspector” blog post. For more tips on developing power efficient web content, you can also read the “How Web Content Can Affect Power Usage” blog post.

This release introduces a new Audit Tab to run tests against web content with results that can be easily imported and exported. The Audit Tab includes a built-in accessibility audit for web content and allows developers to create their own audits for custom checks throughout the web content development process.

You can read more in the blog posts for “Audits in Web Inspector” and “Creating Web Inspector Audits” .

When an iOS or iPadOS device with Web Inspector enabled in Safari’s Advanced Settings is connected to a macOS device running Safari, Web Inspector will offer a new Device Settings menu. The Device Settings menu allows overriding developer-related Safari settings such as the User-Agent string when Web Inspector is connected to the device.

Read more about this in the “Changing Page Settings on iOS Using Web Inspector” blog post.

The Elements Tab includes a new Changes sidebar to keep track of CSS changes made in the Styles sidebar, making it easier to capture all of the changes made and re-incorporate them into production code. In the Network Tab, certificates and TLS settings are now available to review in the Security pane of the resources view.

Feedback

These improvements are available to users running watchOS 6, iOS 13, iPadOS, macOS Catalina, macOS Mojave 10.14.6 and macOS High Sierra 10.13.6. These features were also available to web developers with Safari Technology Preview releases. Changes in this release of Safari were included in the following Safari Technology Preview releases: 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89. Download the latest Safari Technology Preview release to stay on the forefront of future web features. You can also use the WebKit Feature Status page to watch for changes to your favorite web platform features.

We love hearing from you. Send a tweet to @webkit or @jonathandavis to share your thoughts on this release, and any features you were hoping for that didn’t make it. If you run into any issues, we welcome your bug reports for Safari, or WebKit bugs for web content issues.

December 20, 2019 06:00 PM

December 18, 2019

Release Notes for Safari Technology Preview 97

Surfin’ Safari

Safari Technology Preview Release 97 is now available for download for macOS Catalina and macOS Mojave. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 251627-252823.

Resize Observer

  • Enabled Resize Observer by default (r251822)

WebAuthn

  • Added UI with instructions for authenticating with a security key
  • Added support for legacy Google NFC Titan security keys (r252297)

Web Animations

  • Added support for AnimationEvent.pseudoElement (r251840)
  • Fixed retargeted transitions targeting accelerated properties not stopping the original transition (r252540)
  • Fixed the easing property for a CSS transition effect (r251657)
  • Fixed the transform property always none for getKeyframes() output of a CSS Animation (r251839)
  • Fixed getKeyframes() to return the right timing function for declarative animations (r251649)

Web Inspector

  • Elements
    • Added support for multiline CSS property values (r252523)
    • Added support for highlighting nodes that match CSS rules with pseudo-selectors (r252436)
    • Added color picker support for P3 color space (r252168)
    • Fixed an issue where copying multiple DOM nodes would only copy the last selected DOM node (r252615)
    • Fixed aqua and fuchsia not being detected as CSS colors (r252448)
    • Outlined sRGB-safe areas on the P3 color picker (r252747)
  • Sources
    • Added a context menu item to reveal the local override when a resource is loaded from it (r252495)
    • Added support for matching local overrides based on URL pattern in addition to exact match (r252614)
    • Changed to prefer non-blackboxed scripts when showing a source code location link (r253087)
    • Changed to fire Node Removed breakpoints whenever the DOM node is removed from the main DOM tree, not just when it’s removed from its parent (r251871)
    • Fixed “Toggle Visibility” context menu item to work for elements inside a shadow tree (r252026)
    • Made the default content of the Inspector Bootstrap Script a comment that explains how it works (r251921)
    • Moved the “Local Override…” creation item from the Breakpoints section options menu to the Create Resource menu (r252517)
    • Made call frames from blackboxed scripts visually distinct (r252745)
  • Timelines
    • Added a marker for when a stop was requested (r252199)
    • Added a timeline that shows information about any recorded CSS animation/transition (r251959)
    • Labeled ResizeObserver callbacks in the JavaScript & Events timeline (r251832)
  • Layers
    • Enabled the Layers Tab by default (r252063)
  • Console
    • Defaulted to focusing the console prompt if no other content is focused after opening Web Inspector (r251958)
    • Fixed an issue where the saved result value was still being shown after navigating (r253000)
  • Settings
    • Enabled line wrapping by default (r251918)

Rendering

  • Fixed image flashing with transform: rotate animation (r252486)
  • Implemented accelerated video-to-texture upload path for ANGLE backend for WebGL (r252741)

Back-Forward Cache

  • Added site-specific back-forward cache quirk to work around an issue on vimeo.com (r252301)
  • Fixed WebAnimation to never prevent entering the back-forward cache (r251742)
  • Fixed PaymentRequest and PaymentResponse to not prevent entering the back-forward cache (r252338)
  • Fixed UserMediaRequest to not prevent entering the back-forward cache (r251746)
  • Made MediaStream and MediaStreamTrack back-forward cache friendly (r252337)

SVG

  • Added the orient property to the SVGMarkerElement interface (r252444)
  • Added percentage support for fill-opacity, stroke-opacity, stop-opacity, and flood-opacity (r251696)
  • Changed properties that take <position> to not accept 3 values (r251668)
  • Changed disabled SVG shapes to not be hit by hit-testing (r252069)
  • Fixed SVGGeometryElement.getPointAtLength to clamp its argument (r251877)
  • Fixed opacity to always serialize as a number (r251828)

Clipboard API

  • Added some infrastructure to resolve ClipboardItems into pasteboard data for writing (r252315)
  • Added support for Clipboard.readText() (r252627)
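
A minimal sketch of the new readText() support (the pasteButton element is assumed); clipboard reads are gated behind a user gesture, so this runs in a click handler:

pasteButton.addEventListener('click', async () => {
  const text = await navigator.clipboard.readText();
  console.log('Clipboard contents:', text);
});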

CSS

  • Added support for the Q unit (r251662); see the sketch after this list
  • Changed CSS Transitions and CSS Animations properties to treat unit-less 0 as an invalid value for times (r251658)
  • Fixed CSS grid line name positions after auto repeat with no line names (r251965)
  • Fixed -webkit-font-smoothing: none not antialiasing subsequent elements (r252622)
  • Fixed ::before and ::after elements not filling their grid cell when the container has display: contents (r251780)
  • Fixed calc() serialization to match the specifications (r253079)
  • Implemented the CSS clamp() function (r253105)
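
To illustrate the Q unit and clamp() items above, here is a short CSS sketch (selector and values are made up). Q is an absolute unit equal to a quarter of a millimeter, and clamp() resolves to a preferred value constrained between a minimum and a maximum:

.card {
  padding: 8Q; /* 8 quarter-millimeters, i.e. 2mm */
  font-size: clamp(1rem, 2.5vw, 2rem); /* prefer 2.5vw, but stay within 1rem-2rem */
}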

Remote Playback API

  • Enabled Remote Playback API by default (r251784, r251737)
  • Ensured the MediaRemote callback is always called (r252331)

Media

  • Batched multiple EME key requests into one request and response (r251895)

JavaScript

  • Added BigInt support for ++ and -- (r252680); see the sketch after this list
  • Fixed Intl.DateTimeFormat to return resolvedOptions in the correct order (r251815)
  • Optimized Promise runtime functions (r251671)
  • Implemented String.prototype.replaceAll (r252683)
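
A short sketch of the BigInt and replaceAll additions above (variable names are made up):

'a-b-c'.replaceAll('-', '+'); // "a+b+c": every occurrence is replaced, not just the first

let counter = 10n;    // a BigInt literal
counter++;            // ++ and -- now work on BigInt values
console.log(counter); // 11n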

Picture-in-Picture Web API

  • Enabled the Picture-in-Picture API by default (r251925, r251745, r251797)
  • Added support for the :picture-in-picture CSS pseudo-class for video elements in picture-in-picture mode (r252330); see the sketch after this list
  • Fixed picture-in-picture events to fire when entering or exiting the picture-in-picture mode (r252240)
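
As a sketch of the new pseudo-class (the styling is illustrative), a page can now restyle a video while it is displayed picture-in-picture:

video:picture-in-picture {
  /* Applies only while this video is in picture-in-picture mode. */
  opacity: 0.5;
}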

WebAssembly

  • Created a WebAssembly interpreter (r251886)
  • Added support for WebAssembly.Global (r253074)
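
A minimal sketch of the WebAssembly.Global API; a global created in JavaScript can be read and, if marked mutable, updated directly (and passed to a module as an import):

const g = new WebAssembly.Global({ value: 'i32', mutable: true }, 42);
console.log(g.value); // 42
g.value = 7;          // allowed because the global was created as mutable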

Web API

  • Added fullscreen style support for reddit.com (r251827)
  • Changed the file input to fire an input event before the change event (r252768)
  • Changed hidden framesets to provide default edgeInfo value (r251680)
  • Fixed <input type="range">.setAttribute("value") to update the value (r251718)
  • Fixed some captcha images rendering as a blank white space (r252353)
  • Fixed content sometimes disappearing for a <video> with controls and clipping (r252070)
  • Fixed focusing a shadow host that delegates focus to properly skip inner shadow hosts that also delegate focus (r252537)
  • Fixed getComputedStyle returning auto for zIndex property even after it has been set on non-positioned elements (r252724)
  • Fixed very slow tile rendering due to event region painting in Google Docs spreadsheets (r252419)
  • Fixed notification permissions not getting remembered for origins without a port (r251709)
  • Fixed VeryHigh priority loads (r252431)

December 18, 2019 07:30 PM

December 12, 2019

Nikolas Zimmermann: CSS 3D transformations & SVG

Igalia WebKit

As mentioned in my first article, I have a long relationship with the WebKit project, and its SVG implementation. In this post I will explain some exciting new developments and possible advances, and I present some demos of the state of the art (if you cannot wait, go and watch them, and come back for the details). To understand why these developments are both important and achievable now though, we’ll have to first understand some history.

By zimmermann@kde.org (Nikolas Zimmermann) at December 12, 2019 12:00 AM

December 10, 2019

Preventing Tracking Prevention Tracking

Surfin’ Safari

This blog post covers enhancements to Intelligent Tracking Prevention (ITP) included in Safari on iOS and iPadOS 13.3, Safari 13.0.4 on macOS Catalina, Mojave, and High Sierra.

Tracking Prevention as a Tracking Vector

Any kind of tracking prevention or content blocking that treats web content differently based on its origin or URL risks being abused itself for tracking purposes if the set of origins or URLs provide some uniqueness to the browser and webpages can detect the differing treatment.

To combat this, tracking prevention features must make it hard or impossible to detect which web content and website data is treated as capable of tracking. We have devised three ITP enhancements that not only fight detection of differing treatment but also improve tracking prevention in general.

Origin-Only Referrer For All Third-Party Requests

ITP now downgrades all cross-site request referrer headers to just the page’s origin. Previously, this was only done for cross-site requests to classified domains.

As an example, a request to https://images.example that would previously contain the referrer header “https://store.example/baby/strollers/deluxe-stroller-navy-blue.html” will now be reduced to just “https://store.example/”.

All Third-Party Cookies Blocked on Websites Without Prior User Interaction

ITP will now block all third-party requests from seeing their cookies, regardless of the classification status of the third-party domain, unless the first-party website has already received user interaction.

The Storage Access API Takes the Underlying Cookie Policy Into Consideration

Safari’s original cookie policy restricts third-parties from setting cookies unless they already have set cookies as first-party. It is still in effect underneath ITP.

As of this ITP update, the Storage Access API takes Safari’s cookie policy into consideration when handling calls to document.hasStorageAccess().

Now a call to document.hasStorageAccess() may resolve with false for one of two reasons:

  1. Because ITP is blocking cookies and explicit storage access has not been granted.
  2. Because the domain doesn’t have cookies and thus the cookie policy is blocking cookies.

Developers have asked for this change because previously document.hasStorageAccess() could resolve with true but the iframe still couldn’t set cookies because of the cookie policy. This is a rare case in practice because Safari’s cookie policy has been in effect for a decade and a half and almost all websites that want to use cookies as third-party will also set cookies as first-party.
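
As a sketch of how an embedded third party might use the API under these semantics (the loginButton element and fallback logic are illustrative):

document.hasStorageAccess().then((hasAccess) => {
  if (hasAccess)
    return; // Cookies are already accessible in this frame.
  // false can now mean either that ITP is blocking cookies, or that the
  // cookie policy is blocking them because this domain has no cookies yet.
  loginButton.addEventListener('click', () => {
    // requestStorageAccess() must be called from a user gesture.
    document.requestStorageAccess().then(
      () => { /* cookies are now available to this frame */ },
      () => { /* the user declined, or access could not be granted */ }
    );
  });
});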

Absence of Cookies In Third-Party Requests Does Not Reveal ITP Status

A reasonable question to ask is whether an attacker can reveal ITP status for a domain outside their control by making a third-party request to that domain and checking for side effects of whether cookies were sent.

Safari’s default cookie policy requires a third-party to have “seeded” its cookie jar as first-party before it can use cookies as third-party. This means the absence of cookies in a third-party request can be due to ITP blocking existing cookies or the default cookie policy blocking cookies because the user never visited the website, the website’s cookies have expired, or because the user or ITP has explicitly deleted the website’s cookies.

Thus, the absence of cookies in a third-party request outside the attacker’s control is not proof that the third-party domain is classified by ITP.

Thanks To Google

We’d like to thank Google for sending us a report in which they explore both the ability to detect when web content is treated differently by tracking prevention and the bad things that are possible with such detection. Their responsible disclosure practice allowed us to design and test the changes detailed above. Full credit will be given in upcoming security release notes.

December 10, 2019 06:30 PM

December 08, 2019

Philippe Normand: HTML overlays with GstWPE, the demo

Igalia WebKit

Once again this year I attended the GStreamer conference and just before that, Embedded Linux conference Europe which took place in Lyon (France). Both events were a good opportunity to demo one of the use-cases I have in mind for GstWPE, HTML overlays!

As we, at Igalia, usually have a …

By Philippe Normand at December 08, 2019 02:00 PM

December 04, 2019

Manuel Rego: Web Engines Hackfest 2020: New dates, new venue!

Igalia WebKit

Igalia is pleased to announce the 12th annual Web Engines Hackfest. It will take place on May 18-20 in A Coruña, and in a new venue: Palexco. You can find all the information, together with the registration form, on the hackfest website: https://webengineshackfest.org/2020/.

Mission and vision

The main goal behind this event is to have a place for people from different parts of the web platform community to meet together for a few days and talk, discuss, draft, prototype, implement, etc. on different topics of interest for the whole group.

There are not many events where browser implementors from different engines can sit together and talk about their latest developments, their plans for the future, or the controversial topics they have been discussing online.

However, this is not an event only for developers; other roles that are part of the community, like people working on standards, are welcome too.

Having people from different backgrounds who work on a variety of things around the web helps reach better solutions, enliven the conversations, and draft higher quality conclusions during the discussions.

We believe the combination of all these factors makes the Web Engines Hackfest a unique opportunity to push forward the evolution of the web.

2020 edition

We realized that autumn is usually full of browser events (TPAC, BlinkOn, WebKit Contributors Meeting, … just to name a few), and most of the people coming to the hackfest also attend some of them. For that reason we thought it would be a good idea to move the event from fall to spring, in order to better accommodate everyone’s schedules and avoid unfortunate conflicts or unnecessary hard choices. So next year the hackfest will happen in May, from Monday 18th to Wednesday 20th (both days included).

At this stage the event is becoming popular, and during the past three years we have had around 60-70 people. The Igalia office has been a great venue for the hackfest during all this time, but on the last occasions we were using it at its full capacity. So this time we decided to move the hackfest to a new venue, which will allow us to grow to 100 or more participants; let’s see how things go. The new venue is Palexco, a lovely conference building in A Coruña’s port, which is very close to the city center. We really hope you like the new place and enjoy it.

New venue: Palexco (picture by Jose Luis Cernadas Iglesias)

Having more people and a new venue brings us lots of challenges but also new possibilities. So we’re changing the format of the event a little: the first day will follow a more regular conference fashion (with some talks and lightning talks) while still including some space for discussions and hacking, and the last two days will keep the usual unconference format with a bunch of breakout sessions, informal discussions, etc. We believe the conversations and discussions that happen during the hackfest are one of the best things about the event, and we hope this new format will work well.

Join us

Thanks to the change of venue, the event is no longer invitation-only (as it used to be). We’ll still be sending invitations to the people usually interested in the hackfest, but you can already register by yourself just by filling in the registration form.

Soon we will open a call for papers for the talks, stay tuned! We’ll also have room for lightning talks, so attendees can take advantage of them to explain their work and plans at the event.

Last but not least, Arm, Google and Igalia will be sponsoring the 2020 edition, thank you very much! We hope more companies join the trend and help us to arrange the event with their support. If your company is willing to sponsor the hackfest, please don’t hesitate to contact us at hackfest@webengineshackfest.org.

Some historical information

Igalia has been organizing and hosting this event since 2009. Back then, the event was called the “WebKitGTK+ Hackfest”. The WebKitGTK+ project was, in those days, in its early stages. There was lots of work to do around the project, and a few people (11 to be specific) decided to work together for a whole week to move the project forward. The event was really successful and kept happening in a similar fashion for 5 years.

In 2014 we decided to broaden the scope of the event and not restrict it to people working only on WebKitGTK+ (or WebKit), but open it to members from all parts of the web platform community (including folks working on other engines like Blink, Servo, and Gecko). We changed the name to “Web Engines Hackfest”, got a very positive response, and the event has been running yearly since then, growing more and more every year.

And now we’re looking forward to 2020 edition, in a new venue and with more people than ever. Let’s hope everything goes great.

December 04, 2019 11:00 PM

November 25, 2019

Nikolas Zimmermann: Back in town

Igalia WebKit

Welcome to my blog!

Finally I’m back after my long detour to physics :-)

Some of you might know that my colleague Rob Buis and I founded the ksvg project a little more than 18 years ago (announcement mail to kfm-devel) and met again after many years in Galicia last month.

By zimmermann@kde.org (Nikolas Zimmermann) at November 25, 2019 12:00 AM

November 21, 2019

Release Notes for Safari Technology Preview 96

Surfin’ Safari

Safari Technology Preview Release 96 is now available for download for macOS Catalina and macOS Mojave. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 251210-251627.

Web Animations

  • Enabled the Web Animations JavaScript API by default (r251237)

WebAuthN

  • Implemented AuthenticatorCancel (r251295)

SVG

  • Added auto behavior to the width and height properties of the SVG <image> element (r251378)
  • Added bounding-box keyword to pointer-events (r251444)
  • Fixed SVGElement to conform with SVG2 (r251499)
  • Changed the load event to fire only for SVG structurally external elements and the outermost SVG element (r251290)
  • Removed the viewTarget property of SVGViewElement (r251461)

Web API

  • Added strictness to request’s Content-Type (r251490)
  • Changed XMLHttpRequest.responseXML url to be the HTTP response URL (r251542)
  • Changed String.prototype.matchAll to throw on non-global regex (r251483)
  • Fixed a bad is-array assertion in JSON.parse (r251394)
  • Fixed setting border-radius on <video> element clipping the top and left sections of the video (r251385)
  • Ignored document.open or document.write after the active parser has been aborted (r251506)
  • Made requestIdleCallback suspendable (r251258)

CSS

  • Added content-box and stroke-box to the transform-box property (r251252)
  • Added support for gradients using stops with multiple positions (r251474); see the sketch after this list
  • Fixed a crash when parsing gradients with multi-position color stops (r251437)
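
A short CSS sketch of multi-position color stops (selector and colors are illustrative); giving a stop two positions paints a solid band without having to repeat the color:

.banner {
  /* Solid red up to 50%, then a hard switch to solid blue. */
  background: linear-gradient(to right, red 0% 50%, blue 50% 100%);
}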

Clipboard API

  • Implemented ClipboardItem.getType() (r251377)
  • Implemented navigator.clipboard.read() (r251279)

CSS Shadow Parts

  • Changed :part rules to be able to override the style attribute (r251285)

JavaScript

  • Removed wasmAwareLexicalGlobalObject (r251529)

Picture-in-Picture API

  • Implemented EnterPictureInPictureEvent support (r251530)
  • Added runtime logging for the Picture-in-Picture API (r251458)

Media

  • Added support for callbacks for manifest events (r251626)

Service Worker

  • Fixed MP4 video element broken with Service Worker (r251594)

Back-Forward Cache

  • Prevented putting pages that have not reached the non-visually empty layout milestone into the back-forward cache (r251275)
  • Fixed Notification to not prevent entering the back-forward cache (r251528)
  • Fixed AudioContext to not prevent entering the back-forward cache (r251537)
  • Fixed FetchResponse to not prevent entering the back-forward cache (r251545)
  • Fixed XMLHttpRequest to not prevent entering the back-forward cache (r251366)

Web Inspector

  • Elements
    • Added clickable icons for each CSS rule that distinguish the rule’s type (r251624)
    • Fixed $0 being shown for the wrong node when selecting elements in a user agent shadow tree (r251302)
    • Fixed the selection color dimmed when inside a shadow tree (r251254)
    • Replaced color wheel with square HSB color picker (r251487)
  • Sources
    • Fixed the content of the function definition popover sometimes getting cut off (r251446)
    • Changed the function/object preview popover to keep the name sticky to the top (r251466)
    • Provided a way to inject “bootstrap” JavaScript into the page as the first script executed (r251531)

WebDriver

  • Fixed the Element Click endpoint triggering a click at an incorrect y-coordinate (r251948)

November 21, 2019 06:00 PM

November 13, 2019

Manuel Rego: Web Engines Hackfest 2019

Igalia WebKit

A month ago Igalia hosted another edition of the Web Engines Hackfest in our office in A Coruña. This is my personal summary of the event, obviously biased as I’m part of the organization.

Talks

During the event we arranged six talks about a wide range of topics:

Emilio Cobos during his talk at the Web Engines Hackfest 2019

Web Platform Tests (WPT)

Apart from the talks, the main and most important part of the hackfest (at least from my personal point of view) is the breakout sessions and discussions we organize about different topics of interest.

During one of these sessions we talked about the status of things regarding WPT. WPT is working really well for Chromium and Firefox, however WebKit is still lagging behind as synchronization is still manual there. Let’s hope things will improve in the future on the WebKit side.

We also highlighted that the number of dynamic tests on WPT is lower than expected; we ruled out issues with the infrastructure and think the problem lies more with the people writing the tests, who somehow forget to cover cases where things change dynamically.

Apart from that, James Graham put on the table the results from the last MDN survey, which showed interoperability as one of the most important issues for web developers. WPT is helping with interop, but despite the improvements in that regard this is still a big problem for authors. We didn’t have any good answer about how to fix that; in my case I shared some ideas that could help improve things at some point:

  • Mandatory tests for specs: This is already happening for some specs like HTML but not for all of them. If we manage to reach a point where every change on a spec comes with a test, probably interoperability on initial implementations will be much better. It’s easy to understand why this is not happening as people working on specs are usually very overloaded.
  • Common forum to agree on shipping features: This is a kind of utopia, as each company has their own priorities, but if we had a place where the different browser vendors talk in order to reach an agreement about when to ship a feature, that would make web authors’ lives much easier. We somehow managed to do that when we shipped CSS Grid Layout almost simultaneously in the different browsers; if we could repeat that success story for more features in the future that would be awesome.

Debugging tools

One of the afternoons we did a breakout session related to debugging tools.

First Christian Biesinger showed us JsDbg, which is an amazing tool to explore data structures in the web browser (like the DOM, layout, or accessibility trees). All the information is updated live while you debug your code, and you can access all of it in a single view very comfortably.

Afterwards Emilio Cobos explained how to use the reverse debugger rr. With this tool you can record a bug and then replay it as many times as you need, going back and forward in time. Emilio also showed how to annotate the output so you can go directly to a given moment in time, and how to randomize the execution to help catch race conditions. As a result of this explanation we got a bug fixed in WebKitGTK+.

Other

Regarding MathML, Fred’s talk finished with the sending of the intent-to-implement mail to blink-dev, officially announcing the beginning of the upstreaming process. Since then a bunch of patches have already landed behind a runtime flag; you can follow the progress on Chromium issue #6606 if you’re interested.

On the last day a few of us even attended the CSS Working Group confcall during the hackfest, which served as a test of the Igalia office’s infrastructure ahead of the face-to-face meeting we’ll be hosting next January.

People attending the CSSWG confcall (from left to right: Oriol, Emilio, fantasai, Christian and Brian)

As a side note, this time we arranged a guided city tour around A Coruña and, despite the weather, people seemed to have enjoyed it.

Acknowledgements

Thanks to everyone coming, we’re really happy for the lovely feedback you always share about the event, you’re so kind! ☺

Of course, kudos to the speakers for the effort working on such a nice set of interesting talks. 👏

Last, but not least, big thanks to the hackfest sponsors: Arm, Google, Igalia and Mozilla. Your support is critical to make this event possible, you rock. 🎸

Web Engines Hackfest 2019 sponsors: Arm, Google, Igalia and Mozilla

See you all next year. Some news about the next edition will be announced very soon, stay tuned!

November 13, 2019 11:00 PM

November 08, 2019

WebGPU and WSL in Web Inspector

Surfin’ Safari

In Safari Technology Preview release 91, beta support was added for the WebGPU API and WSL. In Safari Technology Preview release 94, support for showing WebGPU devices, as well as all associated render/compute pipelines and <canvas> elements, was added to Web Inspector inside the Canvas Tab.

Just like WebGL shader programs, all render/compute pipelines are editable, and any changes take immediate effect. Compute pipelines, as they only have one shader module, are shown in a single text editor. Render pipelines, since they have both vertex and fragment shader modules, are shown in side-by-side text editors, one for each shader module. When the same shader module is shared between the vertex and fragment stages, however, it is shown in a single text editor, just as for a compute pipeline.

Try editing any of the WebGPU pipelines in our gallery of WebGPU samples using Web Inspector. We’ll be keeping that page updated with the latest demos. Many thanks to Austin Eng for making the demo used in the video above.

Let us know what you think! Send feedback on Twitter (@webkit, @jonathandavis, @dcrousso) or by filing a bug.

Note: Learn more about Web Inspector from the Web Inspector Reference documentation.

November 08, 2019 01:21 AM

October 30, 2019

Release Notes for Safari Technology Preview 95

Surfin’ Safari

Safari Technology Preview Release 95 is now available for download for macOS Catalina and macOS Mojave. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 250947-251210.

Shadow DOM

  • Added support for ShadowRoot.delegatesFocus (r251043)
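
A small sketch of this feature (the custom-input element is hypothetical); a shadow root created with delegatesFocus forwards focus on the host to its first focusable shadow descendant, and the setting is readable from the ShadowRoot:

const host = document.querySelector('custom-input'); // hypothetical custom element
const root = host.attachShadow({ mode: 'open', delegatesFocus: true });
console.log(root.delegatesFocus); // true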

Images

  • Added image/apng as a supported mime type for images (r251182)
  • Fixed a bug with filter outsets that caused black lines on images on wikipedia (r251119)

WebRTC

  • Removed unified plan runtime flag (r250969)

Clipboard API

  • Implemented getType() for ClipboardItems created from bindings (r251134)
  • Refactored custom pasteboard writing codepaths to handle multiple items (r251117)
  • Refactored Pasteboard item reading functions (r250950)
  • Supported writing multiple PasteboardCustomData with SharedBuffers to the pasteboard (r251100)
  • Added support for programmatic paste requests on macOS (r250973)

Picture-in-Picture Web API

  • Implemented HTMLVideoElement.requestPictureInPicture() and Document.exitPictureInPicture() (r251160)
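
A minimal usage sketch, assuming a page with a single <video> and a hypothetical pipButton control (the pictureInPictureElement check follows the Picture-in-Picture spec); entering picture-in-picture must be triggered by a user gesture:

pipButton.addEventListener('click', async () => {
  const video = document.querySelector('video');
  if (document.pictureInPictureElement) {
    await document.exitPictureInPicture(); // return the video to the page
  } else {
    await video.requestPictureInPicture(); // resolves with a PictureInPictureWindow
  }
});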

Service Workers

  • Changed to reject a response body promise in the case of a failure happening after the HTTP response (r251101)
  • Prevented timeout for a load intercepted by a Service Worker that receives a response (r250985)

Back-Forward Cache

  • Fixed pages frequently failing to enter the back-forward cache due to frames with a quick redirect (r251019)
  • Fixed back-forward cache after doing a Favorites navigation (r251049)
  • Fixed clearing website data for a given session to not shut down cached processes for other sessions (r251048)
  • Fixed DOMCacheStorage to not prevent pages from entering the back-forward cache (r250965)

Web Inspector

  • Sources
    • Enabled the new Sources Tab by default, which merges the Debugger Tab and Resources Tab into a single UI (r250991)
    • Added support for automatically creating an image or font local override when dragging content over a non-overridden resource (r251024, r251144)
  • Debugger
    • Added support for pattern-based script blackboxing by URL in the Settings Tab (r251039)
    • Prevented source mapped resources from being blackboxed (r251170)
  • Elements
    • Included a filter option in the Computed details sidebar for condensing all related longhand properties into their respective shorthand properties (r251038)

October 30, 2019 05:00 PM

October 28, 2019

Adrián Pérez de Castro: The Road to WebKit 2.26: a Six Month Retrospective

Igalia WebKit

Now that version 2.26 of both WPE WebKit and WebKitGTK ports have been out for a few weeks it is an excellent moment to recap and take a look at what we have achieved during this development cycle. Let's dive in!

  1. New Features
  2. Security
  3. Cleanups
  4. Releases, Releases!
  5. Buildbot Maintenance
  6. One More Thing

New Features

Emoji Picker

The GTK emoji picker has been integrated in WebKitGTK, and can be accessed with Ctrl-Shift-; while typing in input fields.

GNOME Web showing the GTK emoji picker.

Data Lists

WebKitGTK now supports the <datalist> HTML element (reference), which can be used to list possible values for an <input> field. Form fields using data lists are rendered as a hybrid between a combo box and a text entry with type-ahead filtering.

GNOME Web showing an <input> entry with completion backed by <datalist>.

WPE Renderer for WebKitGTK

The GTK port now supports reusing components from WPE. While there are no user-visible changes, with many GPU drivers a more efficient buffer sharing mechanism—which takes advantage of DMA-BUF, if available—is used for accelerated compositing under Wayland, resulting in better performance.

Packagers can disable this feature at build time passing -DUSE_WPE_RENDERER=OFF to CMake, which could be needed for systems which cannot provide the needed libwpe and WPEBackend-fdo libraries. It is recommended to leave this build option enabled, and it might become mandatory in the future.

In-Process DNS Cache

Running a local DNS caching service avoids doing queries to your Internet provider’s servers when applications need to resolve the same host names over and over—something web browsers do! This results in faster browsing, saves bandwidth, and partially compensates for slow DNS servers.

Patrick and Carlos have implemented a small cache inside the Network Process which keeps in memory a maximum of 400 resolved entries, each valid for 60 seconds. Even though it may not seem like much, this improves page loads because most of the time the resources needed to render a page are spread across a handful of hosts and their cache entries will be reused over and over.

Promotional image of the “Gone in 60 Seconds” movie. This image has nothing to do with DNS, except for how long entries are kept in the cache.

While it is certainly possible to run a full-fledged DNS cache locally (like dnsmasq or systemd-resolved, which many GNU/Linux setups have configured nowadays), WebKit can be used in all kinds of devices and operating systems which may not provide such a service. The caching benefits all kinds of systems, with embedded devices (where running an additional service is often prohibitive) benefiting the most, and therefore it is always enabled by default.

Security

Remember Meltdown and Spectre? During this development cycle we worked on mitigations against side channel attacks like these. They are particularly important for a Web engine, which can download and execute code from arbitrary servers.

Subprocess Sandboxing

Both WebKitGTK and WPE WebKit follow a multi-process architecture: at the very least there is the “UI Process”, an application that embeds the WebKitWebView widget; the “Web Process” (WebKitWebProcess, WPEWebProcess), which performs the actual rendering; and the “Network Process” (WebKitNetworkProcess, WPENetworkProcess), which takes care of fetching content from the network and also manages caches and storage.

Patrick Griffis has led the effort to add support in WebKit for isolating the Web Process, running it with restricted access to the rest of the system. This is achieved using Linux namespaces, the same underlying building blocks used by containerization technologies like LXC, Kubernetes, or Flatpak. As a matter of fact, we use the same building blocks as Flatpak: Bubblewrap, xdg-dbus-proxy, and libseccomp. This not only makes it more difficult for a website to snoop on other processes’ data: it also limits potential damage to the rest of the system caused by maliciously crafted content, because the Web Process is where most of that content is parsed and processed.

This feature is built by default, and using it in applications is only one function call away.

PSON

Process Swap On (cross-site) Navigation is a new feature which makes it harder for websites to steal information from others: rendering of pages from different sites always takes place in different processes. In practice, each security origin uses a different Web Process (see above) for rendering, and while navigating from one page to another new processes will be launched or terminated as needed. Chromium's Site Isolation works in a similar way.

Unfortunately, the needed changes ended up breaking a few important applications which embed WebKitGTK (like GNOME Web or Evolution) and we had to disable the feature for the GTK port just before its 2.26.0 release—it is still enabled in WPE WebKit.

Our plan for the next development cycle is to keep the feature disabled by default, and to provide a way for applications to opt in. Unfortunately it cannot be done the other way around, because the public WebKitGTK API has been stable for a long time and we cannot afford to break backwards compatibility.

HSTS

This security mechanism helps protect websites against protocol downgrade attacks: Web servers can declare that clients must interact using only secure HTTPS connections, and never revert to using unencrypted HTTP.

During the last few months Claudio Saavedra has completed the support for HTTP Strict Transport Security in libsoup—our networking backend—and the needed support code in WebKit. As a result, HSTS support is always enabled.

New WebSockets Implementation

The WebKit source tree includes a cross-platform WebSockets implementation that the GTK and WPE ports have been using. While great for new ports to be able to support the feature, it is far from optimal: we were duplicating network code because libsoup also implements them.

Now that HSTS support is in place, Claudio and Carlos decided that it was a good moment to switch to libsoup’s implementation, so WebSockets now also benefit from it. This also made it possible to provide the RFC-7692 permessage-deflate extension, which enables applications to request compression of message payloads.

To ensure that no regressions would be introduced, Claudio also added support in libsoup for running the Autobahn 🛣 test suite, which resulted in a number of fixes.

Cleanups

During this release cycle we have deprecated the single Web Process mode, and trying to enable it using the API is a no-op. The motivation for this is twofold: in the same vein as PSON and sandboxing, we would rather not allow applications to make side channel attacks easier; not to mention that the changes needed in the code to accommodate PSON would make it extremely complicated to keep the existing API semantics. As this can potentially be trouble for some applications, we have been in touch with packagers, supporting them as best as we can to ensure that the new WebKitGTK versions can be adopted without regressions.

Another important removal was the support for GTK2 NPAPI browser plug-ins. Note that NPAPI plug-ins are still supported, but if they use GTK they must use version 3.x—otherwise they will not be loaded. The reason for this is that GTK2 cannot be used in a program which uses GTK3, and vice versa. To circumvent this limitation, in previous releases we were building some parts of the WebKit source code twice, each one using a different version of GTK, resulting in two separate binaries: we have only removed the GTK2 one. This allowed for a good clean up of the source tree, reduced build times, and killed one build dependency. With NPAPI support being sunsetted in all browsers, the main reason to keep some degree of support for it is the Flash plug-in. Sadly its NPAPI version uses GTK2 and it does not work starting with WebKitGTK 2.26.0; on the other hand, it is still possible to run the PPAPI version of Flash through FreshPlayerPlugin if needed.

Releases, Releases!

Last March we released WebKitGTK 2.24 and WPE WebKit 2.24 in sync, and the same for the current stable, 2.26. As a matter of fact, most releases since 2.22 have been done in lockstep and this has been working extremely well.

Hannibal Smith, happy about the simultaneous releases.

Both ports share many of their components, so it makes sense to stabilize and prepare them for a new release series at the same time. Many fixes apply to both ports, and the few that do not add hardly any noise to the branch. This allows myself and Carlos García to split the effort of backporting fixes to the stable branch as well—though I must admit that Carlos has often done more.

Buildroot ♥ WebKit

Those using Buildroot to prepare software images for various devices will be happy to know that packages for the WPE WebKit components have been imported a while ago into the source tree, and have been available since the 2019.05 release.

Two years ago I dusted off the webkitgtk package, bringing it up to the most recent version at the time, keeping up with updates and over time I have been taking care of some of its dependencies (libepoxy, brotli, and woff2) as well. Buildroot LTS releases are now receiving security updates, too.

Last February I had a great time meeting some of the Buildroot developers during FOSDEM, where we had the chance of discussing in person how to go about adding WPE WebKit packages to Buildroot. This ultimately resulted in the addition of the packages libwpe, wpebackend-fdo, wpewebkit, and cog to the tree.

My plan is to keep maintaining the Buildroot packages for both WebKit ports. I also have a few improvements in the pipeline, like enabling the sandboxing support (see this patch set) and usage of the WPE renderer in the WebKitGTK package.

Buildbot Maintenance

Breaking the Web is not fun, so WebKit needs extensive testing. The source tree includes tens of thousands of tests which are used to avoid regressions, and those are run on every commit using Buildbot. The status can be checked at build.webkit.org.

Additionally, there is another set of builders which run before a patch has had the chance of being committed to the repository. The goal is to catch build failures and certain kinds of programmer errors as early as possible, ensuring that the source tree is kept “green”—that is: buildable. This is the EWS, short for Early Warning System, which trawls Bugzilla for new—or updated—patches, schedules builds with them applied, and adds a set of status bubbles in Bugzilla next to them. Igalia also contributes EWS builders.

For each platform, the EWS adds a status bubble in Bugzilla after trying a patch.

Since last April there is an ongoing effort to revamp the EWS infrastructure, which is now based on Buildbot as well. Carlos López recently updated our machines to Debian Buster, and then I switched them to the new EWS at ews-build.webkit.org, which brings niceties in the user interface like being able to check the status for the GTK and WPE WebKit ports conveniently in real time. Most importantly, this change has brought the average build time down from thirteen minutes to eight, making the “upload patch, check EWS build status” cycle shorter for developers.

Big props to Aakash Jain, who has been championing all the EWS improvements.

One More Thing

Finally, I would like to extend our thanks to everybody who has contributed to WebKit during the 2.26 development cycle, and in particular to the Igalia Multimedia team, who have been hard at work improving our WebRTC support and the GStreamer back-end 🙇.

October 28, 2019 12:30 AM

October 16, 2019

Release Notes for Safari Technology Preview 94

Surfin’ Safari

Safari Technology Preview Release 94 is now available for download for macOS Mojave and macOS Catalina. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 250329-250947.

CSS Shadow Parts

Web Animations

  • Fixed removing an element to only cancel its declarative animations (r250335)

Storage Access API

  • Changed document.hasStorageAccess() to return true when the cookie policy allows access and false otherwise, for third parties not blocked by ITP (r250431, r250589)

WebRTC

  • Changed to allow suspending RTCPeerConnection when not connected (r250726)

Media

  • Updated MediaDevices to require a secure context (r250551)

JavaScript

  • Changed toExponential, toFixed, and toPrecision to allow arguments up to 100 (r250389)
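
A small sketch of the relaxed limits (the values are arbitrary); previously these arguments were capped at 20 (21 for toPrecision) and larger values threw a RangeError:

const n = 1 / 3;
n.toFixed(50);      // 50 fraction digits are now accepted
n.toPrecision(100); // up to 100 significant digits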

CSS Grid

  • Preserved auto repeat() in getComputedStyle() for non-grids (r250715)

Web API

  • Accepted two values in the overflow shorthand (r250849)
  • Allowed using WebGL 2 when USE_ANGLE=1 (r250740)
  • Changed the default statusText of Response to an empty string (r250787)
  • Changed CSS ellipse() to accept 0 or 2 <shape-radius> (r250653)
  • Changed Service Worker Fetch events to time out (r250852)
  • Corrected clip-path <geometry-box> mapping (r250778)
  • Changed the Fetch API no-CORS check to take same-origin into account (r250515)
  • Changed radio button groups to be scoped by shadow boundaries (r250708)
  • Fixed a newly inserted element to get assigned to a named slot if slot assignments had already happened (r250709)
  • Fixed AbortSignal to always emit the abort signal (r250727)
  • Fixed JSON.parse to correctly handle array proxies (r250860)
  • Made table’s clientWidth and clientHeight include its border sizes (r250553)
  • Updated attachShadow to support attaching a shadow root to a main element (r250770)
  • Updated Fetch data URL HEAD request to result in empty response body (r250822)
  • Updated radial gradients to reject negative radii (r250730)
  • Updated ImageBitmap to be serializable (r250721)

Web Inspector

  • Elements
    • Fixed issue where properties were always shown as invalid if they didn’t match the selected node (r250633)
  • Resources
    • Fixed issue where newlines were being unexpectedly added inside template string expressions (r250544)
    • Included local resource overrides in the Open Resource dialog (r250407)
  • Debugger
    • Prevented blackboxing of scripts that haven’t finished loading or failed to load (r250813)
  • Canvas
    • Made it more obvious that the cards in the Overview are clickable (r250859)
    • Showed “No Preview Available” instead of an empty preview for WebGPU devices (r250858)
    • Supported editing of WebGPU render pipelines that use the same shader module for vertex and fragment (r250874)
    • Fixed issue where clicking on the Overview path component didn’t work (r250855)
    • Dark Mode: Minor dark mode style fixes (r250533, r250854)
  • Settings
    • Enabled the image transparency grid by default and created a checkbox for it (r250814)

WebDriver

  • Fixed an issue that prevented sudo safaridriver --enable from working correctly

Back-Forward Cache

  • Allowed pages served over HTTPS with Cache-Control: no-store header to enter the back-forward cache (r250437)
  • Allowed pages using EventSource to enter the back-forward cache (r250761)
  • Allowed pages using FontFaceSet to enter the back-forward cache (r250693)
  • Allowed pages using IDBIndex to enter the back-forward cache (r250754)
  • Added basic back-forward cache support for RTCPeerConnection (r250379)
  • Changed IDBTransaction and IDBObjectStore to not prevent a page from entering the back-forward cache (r250531)
  • Fixed pages that frequently fail to enter the back-forward cache due to pending loads (r250414)
  • Fixed pages using WebGLRenderingContext to enter the back-forward cache (r250464)
  • Fixed pages with Web Workers to enter the back-forward cache (r250527)
  • Fixed pages using PendingImageBitmap to enter the back-forward cache (r250782)
  • Fixed ServiceWorkerContainer to never prevent a page from entering the back-forward cache (r250758)
  • Fixed XMLHttpRequest sometimes preventing pages from entering the back-forward cache (r250678)
  • Fixed IDBRequest to not prevent a page from entering the back-forward cache (r250425)
  • Fixed provisional and scheduled loads in subframes to not prevent a page from entering the back-forward cache (r250686)
  • Fixed RTCDataChannel to not prevent entering back-forward cache except if in an open state (r250573)
  • Made fixes to allow youtube.com to enter the back-forward cache on macOS (r250935)
  • Improved Service Worker support for back-forward cache (r250378)

IndexedDB

  • Added size estimate for key path when estimating task size (r250666)
  • Fixed wrapping CryptoKeys for IndexedDB during serialization (r250811)
  • Included size of index records in size estimate of put/add task (r250936)
  • Updated size to actual disk usage only when estimated increase is bigger than the space available (r250937)

October 16, 2019 05:00 PM

October 02, 2019

Release Notes for Safari Technology Preview 93

Surfin’ Safari

Safari Technology Preview Release 93 is now available for download for macOS Mojave and the macOS Catalina beta. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 249750-250329.

Resource Timing

  • Updated to report performance entries with all HTTP status codes (r250167)

SVG

  • Added auto behavior for rx and ry to the SVG <ellipse> and <rect> elements (r250147)
  • Fixed SVG <animateMotion> to reset the element to its first animation frame if its fill is set to "remove" (r249974)
  • Fixed SMIL animations of SVG <view> element having no effect (r249843)

Web API

  • Added support for sync-xhr feature policy (r250288)
  • Changed to use the same parser for <meta http-equiv="refresh"> and Refresh HTTP header (r249792)
  • Fixed Node.replaceChild()‘s pre-replacement validation order (r249821)
  • Fixed parsing Access-Control-Expose-Headers correctly (r249946)
  • Fixed Content-Type to be preserved on responses created from DOMCache (r249945)
  • Fixed Date.prototype.toJSON to properly execute (r249861)
  • Fixed HTMLVideoElement with a broken poster image to take a square dimension (r250100)
  • Fixed a case where the Intersection Observer intersection ratio becomes larger than 1 (r249845)
  • Fixed cropped dropdown shadow for <datalist> (r250260)
  • Fixed blocking insecure WebSocket URL connection attempts by Workers on secure pages (r250300)
  • Fixed posting a message to a redundant Service Worker should fail silently instead of throwing (r249781)
  • Improved CSP inheritance semantics (r250255)
  • Provided a prototype for AR QuickLook to trigger processing in the originating page (r249855)
  • Removed gopher from the list of special schemes in URLParser (r249941)

Web Inspector

  • Elements
    • Prevented showing the Changes details sidebar panel when selecting text nodes (r249788)
  • Resources
  • Canvas
    • Added GPUDevice content previews and device configuration (r249786)
    • Added a list of any GPURenderPipeline/GPUComputePipeline, including the content of any GPUShaderModule, for the associated GPUDevice (r250258)
  • Console
    • Added a “Show More” button to see more than 100 items in arrays / collections (r250087)
    • Decreased the amount of horizontal space used by autocompletion bubbles (r249848)
    • Improved autocompletion typing performance by avoiding global forced layouts (r249863)
  • Miscellaneous
    • Changed to allow undocked Web Inspector windows to be dragged around by the title text (r250063)

Accessibility

  • Exposed misspelling ranges for editable content to accessibility clients (r249893)

Apple Pay

  • Added support for telling websites why a session was cancelled (r250048)
  • Cleaned up handling of summary items and payment method updates (r250179)

JavaScript

  • Added missing syntax errors for await in function parameter default expressions (r249925)

WebGPU

  • Ensured structs and arrays with pointers as fields are disallowed (r249787)
  • Removed null from the standard library (r249794)

Web Authentication

  • Added support for more than two FIDO protocol versions (r249927)

WebDriver

  • Fixed a bug that caused safaridriver --enable to not take effect immediately in some cases.

October 02, 2019 05:00 PM

Paulo Matos: A Brief Look at the WebKit Workflow

Igalia WebKit

As I learn about the workflow for contributing to JSC (the JavaScript Compiler) in WebKit, I took a few notes as I went along. However, I decided to write them as a post in the hope that they are useful for you as well. If you use git, a Unix based system, and want to start contributing to WebKit, keep on reading.

More…

By Paulo Matos at October 02, 2019 03:56 PM

September 23, 2019

Intelligent Tracking Prevention 2.3

Surfin’ Safari

Note: Read about past updates to this technology in other blog posts about Intelligent Tracking Prevention, the Storage Access API, and ITP Debug Mode.

Intelligent Tracking Prevention (ITP) version 2.3 is included in Safari on iOS 13, the iPadOS beta, and Safari 13 on macOS for Catalina, Mojave, and High Sierra.

Enhanced Prevention of Tracking Via Link Decoration

Our previous release, ITP 2.2, focused specifically on the abuse of so-called link decoration for the purposes of cross-site tracking. With ITP 2.2, when a webpage is navigated to from a domain classified by ITP and the landing URL has a query string or fragment, the expiry of persistent client-side cookies created on that page is 24 hours.

Unfortunately, we see continued abuse of link decoration, so ITP 2.3 takes two new steps to combat this.

Capped Lifetime For All Script-Writeable Website Data

Since ITP 2.2, several trackers have announced their move from first-party cookies to alternate first-party storage such as LocalStorage. ITP 2.3 counteracts this in the following way:

  1. website.example will be marked for non-cookie website data deletion if the user is navigated from a domain classified with cross-site tracking capabilities to a final URL with a query string and/or a fragment identifier, such as website.example?clickID=0123456789.
  2. After seven days of Safari use without the user interacting with a webpage on website.example, all of website.example’s non-cookie website data is deleted.

Together with ITP’s capped expiry of client-side cookies, this change removes trackers’ ability to use link decoration combined with long-term first-party website data storage to track users. Put differently, ITP 2.3 caps the lifetime of all script-writeable website data after a navigation with link decoration from a classified domain.

The reason why we cap the lifetime of script-writable storage is simple. Site owners have been convinced to deploy third-party scripts on their websites for years. Now those scripts are being repurposed to circumvent browsers’ protections against third-party tracking. By limiting the ability to use any script-writeable storage for cross-site tracking purposes, ITP 2.3 makes sure that third-party scripts cannot leverage the storage powers they have gained over all these websites.

document.referrer Downgraded To eTLD+1

Our research has found that trackers, instead of decorating the link of the destination page, decorate their own referrer URL and read the tracking ID through document.referrer on the destination page.

ITP 2.3 counteracts this by downgrading document.referrer to the referrer’s eTLD+1 if the referrer has link decoration and the user was navigated from a classified domain. Say the user is navigated from social.example to website.example and the referrer is https://sub.social.example/some/path/?clickID=0123456789. When social.example’s script on website.example reads document.referrer to retrieve and store the click ID, ITP will make sure only https://social.example is returned.

For further reading on misuse of the referrer and changes coming to browsers in general, see the WHATWG Fetch issue Limit the length of the Referer header.

Updates To the Storage Access API

Developer Enhancement Requests

Developers have asked us for two changes to the Storage Access API, and we’re happy to provide them in Safari on iOS 13 beta, iPadOS beta, and macOS Catalina beta:

  • Only consume the user gesture (tap or click) when the user explicitly denies access, i.e. when the user is prompted and picks “Don’t allow.” Previously the gesture was also consumed when the promise was rejected without a user prompt, i.e. when the requesting domain was classified by ITP and had not received user interaction as a first-party website in the last 30 days of Safari use. This meant the user had to tap or click again to be shown a popup to log in to the third-party service.
  • Make document.hasStorageAccess() return true when ITP is off. Developers were confused when document.hasStorageAccess() returned false but ITP was off. Now it returns true in that case. Note that a returned true does not guarantee that the third-party can set cookies since Safari’s default cookie policy is to deny cookies for third-parties if they don’t already have cookies.

User Enhancement Request

Users have asked us to cap the number of times they can get asked for storage access by a specific piece of embedded web content.

Some services are requesting storage access on every click or tap, regardless of previous interactions with the user. To counter such repeated prompting, WebKit’s implementation of the Storage Access API will now automatically reject the request for storage access for documents where the user has picked “Don’t Allow” in the prompt twice. The reason why we don’t cap it to a single prompt is that the user may change their mind when they see the result of picking “Don’t Allow” the first time.

We will restrict the API further if we get reports of continued over prompting.

ITP Debug Mode In Safari on macOS Catalina

Safari 13 on macOS includes ITP Debug Mode. It’s been available in Safari Technology Preview for quite some time but now it’s available in regular Safari too so that you can debug your websites with the same Safari your customers are using.

Below is a step-by-step guide on how to use ITP Debug Mode. Note that both menu placement and naming have changed from the earlier experimental feature.

If you prefer to use Console:

  1. Launch the Console app.
  2. Click the Action menu → Include Info Messages.
  3. Filter on “ITPDebug” without the quotes.

If you prefer to use Terminal:

log stream -info | grep ITPDebug

Now you’re ready to enable ITP Debug Mode and see log output.

  1. Enable the Develop menu through Safari Preferences → Advanced → “Show Develop menu in menu bar.”
  2. Click “Intelligent Tracking Prevention Debug Mode” in the Develop menu.
  3. When you’re done, disable ITP Debug Mode through the same Develop menu item or by quitting Safari. Don’t leave ITP Debug Mode on when you’re not using it since it logs potentially sensitive information about your browsing. (Logging is done on the ephemeral INFO level.)

With ITP Debug Mode enabled, you will see ITP messages in the log as you browse the web. Whenever ITP decides to schedule deletion of website data it will indicate in the log “all data” or “all but cookies” to tell you whether it’s a regular deletion of all website data or the new capped lifetime of all non-cookie website data as explained above.

The domain 3rdpartytestwebkit.org is permanently classified with tracking capabilities in ITP Debug Mode, so at minimum you should see that domain show up in your log.

Classifying A Custom Domain For Testing

With ITP Debug Mode and User Defaults, you can manually set a custom domain as permanently classified with tracking capabilities. Here’s how you achieve this for a domain called website.example:

  1. Open System Preferences, click Security & Privacy, and unlock the padlock to be able to make changes.
  2. Pick the Full Disk Access category to the left.
  3. Use the + button to add the Terminal application to the list and make sure its checkbox is ticked.
  4. Open Terminal and execute the following command: defaults write com.apple.Safari ITPManualPrevalentResource website.example
  5. Go back to Security & Privacy in System Preferences and untick the checkbox for Terminal to not allow it permanent full disk access.

A Note On HttpOnly Cookies

Our blog post on ITP 2.1 provided guidance on how to protect cookies. We specifically encourage the use of Secure and HttpOnly cookies.

Since publishing that post, we’ve seen some confusion regarding the term HttpOnly cookie, sometimes mistakenly shortened to just “HTTP cookie.” Some say any cookie set by a server is an HttpOnly cookie. That is incorrect. The server needs to add the HttpOnly attribute in the Set-Cookie response header to make a cookie HttpOnly, like so:

Set-Cookie: ExampleSessionID=a66e30012cc49846; path=/; HttpOnly

Adding the HttpOnly attribute provides these two security and privacy protections:

  • HttpOnly cookies are not exposed to JavaScript. This means tracking scripts or cross-site scripting attacks on the website cannot read and leak the contents of those cookies.
  • HttpOnly cookies are not copied into the web content process in WebKit. This means they are out of reach for speculative execution attacks that could otherwise steal the contents of those cookies.

September 23, 2019 05:00 PM

September 18, 2019

Release Notes for Safari Technology Preview 92

Surfin’ Safari

Safari Technology Preview Release 92 is now available for download for macOS Mojave and the macOS Catalina beta. If you already have Safari Technology Preview installed, you can update in the Software Update pane of System Preferences on macOS.

This release covers WebKit revisions 249190-249750.

JavaScript

  • Fixed Math.round() producing a wrong result for values just below 0.5 (r249597)
  • Made Promise implementation faster (r249509)

WebGPU

  • Fixed matrices to have correct alignment (r249214)
  • Implemented GPUUncapturedErrorEvent (r249539)
  • Implemented SampleLevel(), SampleBias(), and SampleGrad() in WSL (r249453)
  • Updated several interface and enum names to match specifications (r249601)

SVG

  • Fixed fragment-only URL url(#fragment) to be resolved against the current document regardless of the HTML <base> element (r249416)
  • Fixed SMIL animations of SVG <view> element (r249730)
  • Changed SVG animation elements to use the value of the href attribute, or the xlink:href attribute, to determine the animation target element (r249216)

Images

  • Changed to respect EXIF orientations by default when images are rendered (r249364)

Web API

  • Fixed copying and pasting two paragraphs with a newline between them resulting in a stray paragraph with newline inside (r249307)
  • Fixed cancelled transitions on Google image search leaving content with opacity: 0 sometimes (r249511)
  • Fixed document.fonts.ready resolving too quickly (r249295)
  • Fixed responseXML for XMLHttpRequest in some cases returning null if a valid Content-Type ends with +xml (r249361)
  • Made tabIndex IDL attribute reflect its content attribute (r249237)
  • Updated HTMLImageElement::decode() to return a resolved promise for decoding non-bitmap images (r249367)
  • Updated geolocation.watchPosition() and geolocation.getCurrentPosition() to return PERMISSION_DENIED when the context is not secure (r249207)

Service Workers

  • Added missing origin check for Service-Worker-Allowed header (r249733)
  • Added support for postMessage buffering between the Service Worker and window (r249629)
  • Dropped support for registration resurrection (r249627)

WebRTC

  • Added support for RTCDataChannel.send(Blob) (r249710); see the sketch after this list
  • Fixed audio sometimes failing to be captured in WebRTC (r249715)
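
A short sketch of the new Blob support (the data channel setup is assumed to already exist):

// `dataChannel` is assumed to be an open RTCDataChannel from an RTCPeerConnection.
const blob = new Blob(['hello over WebRTC'], { type: 'text/plain' });
dataChannel.send(blob); // Blobs can now be passed to send() directly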

IndexedDB

  • Changed to cache prepared SQLiteStatement in SQLiteIDBCursor to improve performance (r249729)
  • Changed to use the SQL COUNT statement for count operations to improve performance (r249583)
  • Updated the size of the database when the database operation is completed (r249333)

Web Inspector

  • Network
    • Provided a way to view XML, HTML, and SVG resource responses as a DOM tree (r249451)
  • Debugger
    • Added support for async event listener stack traces in Workers (r249315)
    • Added support for event breakpoints in Worker contexts (r249305)
    • Allowed script resources to be blackboxed, which will prevent the debugger from pausing in that script (r249450)
  • Resources
    • Provided a way to override the content of resources loaded over the network with local content in Web Inspector (r249504)
    • Fixed issue where links to CSS resources didn’t map to the right line after pretty printing if the line is after a multiline comment (r249596)
    • Fixed issue where the closing } of nested @media weren’t indented (r249607)
  • Dark Mode
    • Fixed jarring white box-shadows in the Overview Timeline View in dark mode (r249655)
  • Miscellaneous
    • Fixed import file pickers sometimes not importing (r249248)

Accessibility

  • Fixed children cache to be re-computed if the tab index is removed (r249534)

Security

  • Disabled TLS 1.0 and TLS 1.1 in WebSockets (r249684)

September 18, 2019 05:30 PM