Instead of mirroring the browser's viewport, as if we had a camera over
the browser, the entire DOM is now sent in the frame. This means that
the CLI itself can scroll without having to wait for updates from the
webextension screenshotter.
In the endless struggle to squash a Travis-specific bug, we now have
screenshots. Currently the screenshots are of the custom block font
state, so you can't read the text. But I'm planning on changing the
default state of the page to use a normal font - such that the
frame builder needs to toggle the block font on to build a frame. This
will make it more likely that screenshots will contain actual text.
This means that Browsh can now be entirely run just by running the CLI
binary. The client launches Firefox as a subprocess, then connects to it
via the Marionette protocol, installs the webextension and finally
triggers a new tab with, currently, the Google homepage in it.
I was trying to set this up for automated testing as well by installing
the built webextension as a temporary addon, because otherwise you need
to sign the extension everytime with a unique semantic version. However
for some reason I can't quite recreate the environment that MDN's
`web-ext` creates. The extension installs fine but fails to load the
`content.js` script, I can't find a backtrace or any other details about
the failure. So for now, we're just going to have to use `web-ext` as
seperate process and have the client connect to that. Which is what one
should do during development anyway, so it's not a huge loss.
Using JS's `getComputedStyle()` for every character is too CPU
intensive, so instead I'm experimenting with using a custom font
to take the canvas snapshot. The font is made up of only the unicode
block character, which basically fills the entire space given to a
monospace glyph. This also means that we can fairly reliably work out
the visibility (whether it's obscured or hidden with CSS) of text.
This proves that frames can be generated on Firefox using the canvas and
a Tree Walker to examine text nodes. Already with little optimisation
frames don't ever take longer than 200ms to render.
Chrome has a MediaStream of the viewport, hopefully that will prove
performant as well.
This doesn't have functioning text colour detection or text occlusion
support. But early research suggests this will possible by comparing 2
screenshots: one with and the other without rendered text.