WPT on CDP

Researching the use of the Chrome DevTools protocol to run the web-platform-tests in Google Chrome.

2018-12-11


This presentation is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

1 / 33

Automating Chrome in web-platform-tests today:

wptrunner <-- WebDriver --> Chromedriver <-------- CDP ---------> Chrome

The goal of this experiment:

wptrunner <--------------------- CDP ---------------------------> Chrome

2 / 33

Challenges
Implementation overview
Results
Future work
3 / 33

Finding a library

Chrome DevTools Protocol bindings:

WPT's constraints:

Operating systems: GNU/Linux, macOS, and Windows
Platform: Python 2
License: BSD 3-clause compatible
Functional integration with the WebSocket protocol supported by Chromium/Chrome (e.g. version and protocol extensions)

4 / 33

As of October 2018 (when research and development began), none of the libraries satisfied the constraints.

Finding a library

Generic WebSocket client implementations:

Lomond
pywebsocket - "pywebsocket is intended for testing or experimental purposes."
wspy - client too primitive to interface with Chrome
WebSock - does not expose a client abstraction
gevent-websocket - Last released: Mar 12, 2017
ws4py - unmaintained
Autobahn & Twisted (Autobahn also integrates with asyncio)
websockets - requires Python 3
websocket-client - LGPL licensed
Tornado

5 / 33

In the Python ecosystem, WebSocket server implementations abound. Clients are harder to come by.

Finding a library

Generic WebSocket client implementations satisfying WPT's constraints:

Subjective priorities:

probability of support/maintenance
number and size of dependencies

Initial experimentation: https://github.com/bocoup/wpt-cdp-experiment

Winner: Lomond

6 / 33

Finding a library

Polling

Lomond checks for automatic pings and performs other housekeeping tasks at a regular intervals. This polling is exposed as Poll events. Your application can use these events to do any processing that needs to be invoked at regular intervals.

The default poll rate of 5 seconds is granular enough for Lomond’s polling needs, while having negligible impact on CPU. If your application needs to process at a faster rate, you may set the poll parameter of connect().

Note If your application needs to be more realtime than polling once a second, you should probably use threads in tandem with the event loop.

https://lomond.readthedocs.io/en/latest/guide.html#polling

7 / 33

Lomond uses a polling-based approach by design, making it less efficient than a typical WebSocket client.
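
As a rough illustration of the general pattern (not Lomond's actual implementation), a polling event loop wakes up at a fixed interval whether or not a message has arrived. The sketch below uses only the standard library; the `events` generator, the `inbox` queue, and the `None` sentinel are all hypothetical names chosen for this example.

```python
import queue


def events(inbox, poll=5.0):
    """Yield ("message", payload) for each item taken from inbox, or
    ("poll", None) whenever `poll` seconds pass without one.

    A payload of None is a hypothetical sentinel meaning "connection closed".
    """
    while True:
        try:
            payload = inbox.get(timeout=poll)
        except queue.Empty:
            # Nothing arrived in time: give the application a chance to do
            # periodic work, at the cost of a wake-up with nothing to read.
            yield ("poll", None)
            continue
        if payload is None:
            return
        yield ("message", payload)


if __name__ == "__main__":
    inbox = queue.Queue()
    inbox.put("hello")
    inbox.put(None)
    for name, payload in events(inbox, poll=0.1):
        print(name, payload)
```

A shorter poll interval makes the periodic work more timely but increases the number of idle wake-ups, which is the trade-off the quoted documentation describes.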

Finding a library

Runner-up: wspy

wspy is a standalone implementation of web sockets for Python, defined by RFC 6455. The incentive for creating this library is the absence of a layered implementation of web sockets outside the scope of web servers such as Apache or Nginx. wspy does not require any third-party programs or libraries outside Python's standard library. It provides low-level access to sockets, as well as high-level functionalities to easily set up a web server. Thus, it is both suited for quick server programming, as well as for more demanding applications that require low-level control over each frame being sent/received.

https://github.com/taddeus/wspy

8 / 33

The wspy library was disqualified because it doesn't support the per-message deflate WebSocket extension.

API stability

Ideal:

wptrunner <-- WebDriver --> Chromedriver <-------- CDP ---------> Chrome
wptrunner <--------------------- CDP ---------------------------> Chrome

9 / 33

The goal of this project was to remove Chromedriver and use CDP in its place. From that perspective, it was encouraging to know that Chromedriver is itself implemented on top of CDP. That made the task seem much more like eliminating a middleman.

API stability

Ideal:

wptrunner <-- WebDriver --> Chromedriver <-------- CDP ---------> Chrome
wptrunner <--------------------- CDP ---------------------------> Chrome

Actual target:

                                           .--- stable CDP ---.
wptrunner <-- WebDriver --> Chromedriver <-+ experimental CDP +-> Chrome
                                           '- deprecated CDP -'
wptrunner <----------------- stable CDP ------------------------> Chrome

10 / 33

The reality is a little more nuanced, though. When it comes to coordinating with changes to the Chrome DevTools Protocol, the Chromedriver maintainers have an advantage over WPT contributors. This affords them more confidence in their use of APIs that are labeled "experimental" or "deprecated" for public consumption.

I tried to rely on the stable API alone, but I wasn't always successful. I'll say more about that shortly.

Semantics

WebDriver: Navigate To

If the current top-level browsing context is no longer open, return error with error code no such window.
Let url be the result of getting the property url from the parameters argument.
If url is not an absolute URL or is not an absolute URL with fragment or not a local scheme, return error with error code invalid argument.
Handle any user prompts and return its value if it is an error.
Let current URL be the current top-level browsing context’s active document’s document URL.
If current URL and url do not have the same absolute URL:

If timer has not been started, start a timer. If this algorithm has not completed before timer reaches the session's session page load timeout in milliseconds, return an error with error code timeout.

Navigate the current top-level browsing context to url.
If url is special except for file and current URL and URL do not have the same absolute URL :

Try to wait for navigation to complete.
Try to run the post-navigation checks.

Set the current browsing context to the current top-level browsing context.
If the current top-level browsing context contains a refresh state pragma directive of time 1 second or less, wait until the refresh timeout has elapsed, a new navigate has begun, and return to the first step of this algorithm.
Return success with data null.

source (retrieved 2018-12-10)

11 / 33

Semantics

Chrome DevTools Protocol: Page.navigate

Navigates current page to the given URL.

source (retrieved 2018-12-10)

12 / 33

From the description, you might expect that after sending this command and receiving a response, the navigation operation would be complete. That's what I expected, anyway.

That's actually not the case, which is why the comparison isn't completely fair. CDP allows for inspection into many parts of navigation. In fact, it requires it.
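
To make the distinction concrete: CDP messages are JSON objects sent over the WebSocket. A command carries an `id`, its response echoes that `id`, and events arrive separately with a `method` and no `id`. The sketch below shows one way to encode a `Page.navigate` command and tell responses from events. The helper names (`encode_command`, `classify`) and the sample values are made up for illustration and are not taken from the experiment's code.

```python
import itertools
import json

_ids = itertools.count(1)


def encode_command(method, params=None):
    """Serialize a CDP command. Returns (command id, JSON text)."""
    command_id = next(_ids)
    message = {"id": command_id, "method": method, "params": params or {}}
    return command_id, json.dumps(message)


def classify(raw):
    """Label an incoming message as a command response or an event."""
    message = json.loads(raw)
    if "id" in message:
        # Responses echo the id of the command they answer.
        payload = message.get("result", message.get("error"))
        return ("response", message["id"], payload)
    # Events have a method name but no id.
    return ("event", message["method"], message.get("params", {}))


if __name__ == "__main__":
    command_id, text = encode_command(
        "Page.navigate", {"url": "http://example.test/"})
    print(text)
    # Illustrative incoming messages (values are invented):
    print(classify('{"id": %d, "result": {"frameId": "F1"}}' % command_id))
    print(classify('{"method": "Page.loadEventFired", "params": {"timestamp": 1.5}}'))
```

Receiving the response only confirms that the command was accepted; whether the page has finished loading has to be inferred from later events.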

diff --git a/tools/wptrunner/wptrunner/executors/reftest-wait_webdriver.js b/tools/wptrunner/wptrunner/executors/reftest-wait_webdriver.js
index c1cc649..f0ba2bc 100644
--- a/tools/wptrunner/wptrunner/executors/reftest-wait_webdriver.js
+++ b/tools/wptrunner/wptrunner/executors/reftest-wait_webdriver.js
@@ -1,6 +1,11 @@
 var callback = arguments[arguments.length - 1];
 function root_wait() {
+  if (document.readyState != "complete") {
+    setTimeout(root_wait, 10);
+    return;
+  }
+
   if (!root.classList.contains("reftest-wait")) {
     observer.disconnect();
@@ -37,8 +42,4 @@ var observer = new MutationObserver(root_wait);
 observer.observe(root, {attributes: true});
-if (document.readyState != "complete") {
-    onload = root_wait;
-} else {
-    root_wait();
-}
+root_wait();

13 / 33

This is a change to a script which wptrunner injects into the document. Today, it does that using WebDriver, and everything works as expected.

In my initial implementation, the script was being injected far earlier than I expected, because the navigation command returns before the document has loaded. This change allows the script to be injected prior to the window's load event. (The original script appears to support this already, but the mechanism it uses, assigning to onload, invalidates all tests which set a handler via the document's <body> element.)

-var root = document.documentElement;
-var observer = new MutationObserver(root_wait);
-
-observer.observe(root, {attributes: true});
-
-root_wait();
+var root, observer;
+
+(function begin() {
+  root = document.documentElement;
+
+  // This script may be evaluated before the document element is available.
+  if (!root) {
+    setTimeout(begin, 0);
+    return;
+  }
+  observer = new MutationObserver(root_wait);
+
+  observer.observe(root, {attributes: true});
+
+  root_wait();
+}());

14 / 33

Shortly after, I found that the script was executing even sooner than that. Tests were failing intermittently because document.documentElement was not yet defined.

I actually applied this patch and ran the tests before thinking that maybe I was doing something wrong.

Semantics

Chrome DevTools Protocol: Navigation-related events

Page.domContentEventFired

Page.frameScheduledNavigation - Fired when frame schedules a potential navigation.

Page.frameStartedLoading - Fired when frame has started loading.

Page.frameStoppedLoading - Fired when frame has stopped loading.

Page.lifecycleEvent - Fired for top level page lifecycle events such as navigation, load, paint, etc.

Page.loadEventFired

Page.navigatedWithinDocument - Fired when same-document navigation happens, e.g. due to history API usage or anchor navigation.

15 / 33

Chrome DevTools Protocol also defines a bunch of "events," many of which are relevant to navigation.

To correctly implement the common case of "go to this URL and let me know if anything goes wrong," one must identify the "active frame," register for many events fired from that frame, initiate navigation, and wait for an event for some duration before giving up and reporting the operation as "timed out." This was what I attempted to implement for this experiment, but I almost certainly got it wrong.

In WebDriver, all of this is modeled with a single HTTP request.

The Chrome DevTools Protocol is certainly more powerful, but this comes at the cost of complexity. Although it might be possible to improve test precision with some of this functionality, I'd be leery of relying on the timing of messages on a WebSocket channel.

For WPT, I would recommend using a library.

This is also a very indirect way to document a protocol. Even with the explicit naming of these events, the precise semantics probably aren't clear enough to support interoperability.
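
The "go to this URL and let me know if anything goes wrong" procedure described above can be sketched as a small event-waiting loop. This is a deliberately simplified illustration, not the code from the experiment: it only watches `Page.frameStartedLoading` and `Page.frameStoppedLoading` for one frame, and it ignores redirects, same-document navigation, and errors. The `next_event` callable and the `NavigationTimeout` exception are hypothetical.

```python
import time


class NavigationTimeout(Exception):
    """Raised when the frame does not finish loading in time."""


def wait_for_load(next_event, frame_id, timeout, clock=time.monotonic):
    """Block until `frame_id` has started and then stopped loading.

    next_event(remaining) is a hypothetical callable that returns the next
    decoded CDP event as a dict, or None if nothing arrived within
    `remaining` seconds.
    """
    deadline = clock() + timeout
    started = False
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            raise NavigationTimeout(frame_id)
        event = next_event(remaining)
        if event is None:
            continue
        if event.get("params", {}).get("frameId") != frame_id:
            continue  # event belongs to another frame (or has no frame)
        if event["method"] == "Page.frameStartedLoading":
            started = True
        elif event["method"] == "Page.frameStoppedLoading" and started:
            return


if __name__ == "__main__":
    recorded = iter([
        {"method": "Page.frameStartedLoading", "params": {"frameId": "main"}},
        {"method": "Page.frameStoppedLoading", "params": {"frameId": "main"}},
    ])
    wait_for_load(lambda remaining: next(recorded, None), "main", timeout=1.0)
    print("loaded")
```

Even this reduced version has to track state across several messages, which hints at why a CDP-aware library is attractive.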

Challenges
Implementation
Results
Future work
16 / 33

Architecture overview

Concepts

browser - web browser process which runs a WebSocket server that implements the Chrome DevTools Protocol; this process is managed by the user of Pyppeteer
pyppeteer.Connection - abstraction around a WebSocket connection to a running browser process
pyppeteer.Session - interface for interacting with a browser window

                    .---------------- Pyppeteer ---------------.
    .---------.     |     .------------.           .---------. |
    | browser | 1 <---> * | Connection | * <---> 1 | Session | |
    '---------'     |     '------------'           '---------' |
                    '------------------------------------------'

Code hosted on GitHub.com: https://github.com/bocoup/wpt/tree/wptrunner-cdp

Change set:

15 files changed, 1042 insertions(+), 103 deletions(-)

17 / 33

API usage

$ grep -Ehro 'API status:.*' tools/pyppeteer/pyppeteer | sort | uniq -c
      3 API status: deprecated
      5 API status: experimental
     28 API status: stable

18 / 33

Each CDP method and event reference in the source code is labeled with a note on the API status at the time of writing. As mentioned earlier, I was mostly able to stick to the stable API. One piece of functionality could be implemented with either a deprecated API or an experimental API, so I used both and exposed a runtime flag to control which is used.
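
The labeling convention makes the tally easy to reproduce. The sketch below does in Python what the grep pipeline above does in the shell; the `SAMPLE` text is an invented stand-in for labeled source code (the second method name is a placeholder, not a real CDP method). From the counts on the slide, 28 of the 36 labeled references (about 78%) were to stable APIs.

```python
import re
from collections import Counter

STATUS = re.compile(r"API status: (\w+)")

# Invented sample of labeled source; the real labels live in
# tools/pyppeteer/pyppeteer on the experiment's branch.
SAMPLE = """
# API status: stable
session.send("Page.navigate", {"url": url})
# API status: experimental
session.send("Some.placeholderMethod", {})
# API status: stable
session.send("Runtime.evaluate", {"expression": "1 + 1"})
"""


def count_statuses(source_text):
    """Count the occurrences of each "API status: <label>" note."""
    return Counter(STATUS.findall(source_text))


if __name__ == "__main__":
    for status, count in sorted(count_statuses(SAMPLE).items()):
        print("%7d API status: %s" % (count, status))
```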

Challenges
Implementation
Results
Future work
19 / 33

Settings

All trials:

Used WPT at commit 8eab58f51c93f0075f4cc5e8e6d5b4fb2c4c4919
- Date: 2018-12-10
- Total tests: 31,101
Used the wpt-docker-worker (only available from tasks initiated from the web-platform-tests/wpt repository)
Did not restart after unexpected test results

20 / 33

Discrepancies

master to CDP:

361 tests with differing statuses
35 tests with differing subtest results

(361 + 35) / 31101 => 1.27% discrepancy in patch

21 / 33

Surprisingly, the results of the CDP-powered version of wptrunner did not completely align with those in master.

Discrepancies

master to CDP:

361 tests with differing statuses
35 tests with differing subtest results

(361 + 35) / 31101 => 1.27% discrepancy in patch

master to itself:

21 tests with differing statuses
9 tests with differing subtest results

(21 + 9) / 31101 => 0.10% discrepancy due to flakiness

22 / 33

Surprisingly, the results of the CDP-powered version of wptrunner did not completely align with those in master.

A small portion of those can be explained by existing flakiness in the tests. If anyone's bored, I've included a list of the flaky tests in the appendix.

Discrepancies

master to CDP:

361 tests with differing statuses
35 tests with differing subtest results

(361 + 35) / 31101 => 1.27% discrepancy in patch

master to itself:

21 tests with differing statuses
9 tests with differing subtest results

(21 + 9) / 31101 => 0.10% discrepancy due to flakiness

revert-gh-13419 (gh-13419) to CDP:

64 tests with differing statuses
32 tests with differing subtest results

(64 + 32) / 31101 => 0.31% discrepancy in patch (adjusted)

23 / 33

Surprisingly, the results of the CDP-powered version of wptrunner did not completely align with those in master.

A small portion of those can be explained by existing flakiness in the tests. If anyone's bored, I've included a list of the flaky tests in the appendix.

An infrastructure patch merged in October introduced a number of regressions. Many of the discrepancies identified above are unintentional fixes to those regressions. If we compare the results to a version of master where that patch is reverted, we get a clearer picture of undesirable differences.

Further research is needed to determine the source of these discrepancies. I suspect it's due to other oversimplifications in my use of the protocol.
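
For reference, the three discrepancy figures on the slide can be recomputed as follows. Like the slide, this adds the two counts together, which assumes that no test appears in both.

```python
TOTAL_TESTS = 31101


def discrepancy(status_diffs, subtest_diffs, total=TOTAL_TESTS):
    """Percentage of tests whose status or subtest results differ."""
    return 100.0 * (status_diffs + subtest_diffs) / total


COMPARISONS = [
    ("master to CDP", 361, 35),
    ("master to itself", 21, 9),
    ("revert-gh-13419 to CDP", 64, 32),
]

if __name__ == "__main__":
    for name, statuses, subtests in COMPARISONS:
        print("%-24s %.2f%%" % (name, discrepancy(statuses, subtests)))
```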

Duration

master:

Testharness: 17.88 minutes (σ = 8.11)
Reftest: 12.16 minutes (σ = 5.29)

revert-gh-13419 (gh-13419):

Testharness: 12.07 minutes (σ = 4.33)
Reftest: 11.63 minutes (σ = 4.90)

CDP:

Testharness: 12.65 minutes (σ = 4.58)
Reftest: 7.81 minutes (σ = 3.47)

24 / 33

These are the average durations needed to run a "chunk" of each test type.

The current implementation appears to be faster at running reftests, but remember it is built on a WebSocket library that uses a polling strategy. These numbers may change with the use of another WebSocket library.
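
To put the durations in relative terms, the sketch below computes the percent change between any two configurations from the means on the slide. Note that the standard deviations are large compared with some of the differences, so the small testharness gap in particular should be read with caution.

```python
# Mean minutes per chunk, copied from the slide.
DURATIONS = {
    "master": {"testharness": 17.88, "reftest": 12.16},
    "revert-gh-13419": {"testharness": 12.07, "reftest": 11.63},
    "CDP": {"testharness": 12.65, "reftest": 7.81},
}


def relative_change(baseline, candidate):
    """Percent change from baseline to candidate; negative means faster."""
    return 100.0 * (candidate - baseline) / baseline


if __name__ == "__main__":
    base = DURATIONS["revert-gh-13419"]
    for test_type in ("testharness", "reftest"):
        change = relative_change(base[test_type], DURATIONS["CDP"][test_type])
        print("%-12s %+.1f%% vs. revert-gh-13419" % (test_type, change))
```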

Test improvements

gh-14109: [paint-timing] Fix test for non-automated contexts

gh-14110: [paint-timing] Avoid race condition

gh-14132: [FileAPI] Remove reference to non-existent file

25 / 33

Challenges
Implementation
Results
Future work
26 / 33

Future work

Contribute support for the per-message deflate WebSocket extension to wspy; switch to wspy

More efficient use of transport -- potentially faster and more stable
Small time/effort investment (presumably)

27 / 33

issue report for the wspy project

28 / 33

This would involve contributing a feature to wspy. That would have the added benefit of helping @jiagangzhang.

Future work

Migrate wptrunner to Python 3 and build on top of a more ergonomic library.

29 / 33

Python 3 compatibility has been discussed before because it would be beneficial for a number of other reasons.

A list of pull requests concerning Python 3 support

30 / 33

This effort is already underway as a pet project by long-time WPT contributor @Ms2ger.

Future work

Work with Chrome DevTools maintainers to nail down the semantics of the most stable methods.

31 / 33

Summary

Experiment a success!

Proved concept
Suggested potential improvement in efficiency (decreased time-to-results)
Rough edges with tractable path to resolution (~100 tests invalidated)
Identified risk for production use: complexity and stability (could be mitigated through the use of a CDP-aware library)

https://github.com/bocoup/wpt/tree/wptrunner-cdp

32 / 33

Appendix: Flaky tests identified during this research

/client-hints/accept_ch_lifetime_same_origin_iframe.tentative.https.html
/content-security-policy/inheritance/blob-url-in-child-frame-self-navigate-inherits.sub.html
/cookie-store/cookieStore_delete_arguments.tentative.https.window.html
/cookie-store/cookieStore_delete_basic.tentative.https.window.html
/cookie-store/cookieStore_event_basic.tentative.https.window.html
/cookie-store/cookieStore_get_arguments.tentative.https.window.html
/cookie-store/cookieStore_get_set_basic.tentative.https.window.html
/cookies/samesite/form-get-blank-reload.html
/cookies/samesite/form-post-blank-reload.html
/fetch/api/request/destination/fetch-destination-worker.https.html
/html/webappapis/scripting/events/compile-event-handler-settings-objects.html
/paint-timing/first-image-child.html
/service-workers/cache-storage/common.https.html
/service-workers/cache-storage/window/sandboxed-iframes.https.html
/service-workers/service-worker/controller-with-no-fetch-event-handler.https.html
/service-workers/service-worker/fetch-event-redirect.https.html
/service-workers/service-worker/navigate-window.https.html
/storage/storagemanager-persist.https.window.html
/storage/storagemanager-persisted.https.any.html
/webvtt/rendering/cues-with-video/processing-model/audio_has_no_subtitles.html
/xhr/send-after-setting-document-domain.htm
