Commit graph

41 commits

Author SHA1 Message Date
Hendiadyoin1 ed46d52252 Everywhere: Use AK/Math.h if applicable
AK's version should see better inlining behaviors, than the LibM one.
We avoid mixed usage for now though.

Also clean up some stale math includes and improper floatingpoint usage.
2021-07-19 16:34:21 +04:30
Wesley Moret 1b8f73b6b3 LibPDF: Fix treating not finding the linearized dict as a fatal error
We now try to parse the first indirect value and see 
if it's the `Linearization Parameter Dictionary`. if it's not, we 
fallback to reading the xref table from the end of the document
2021-07-16 20:44:10 +02:00
Wesley Moret 5d4d70355e LibPDF: Fix checking minor_ver instead of major_ver 2021-07-16 20:44:10 +02:00
Matthew Olsson 612b183703 LibPDF: Convert to east-const to comply with the recent style changes 2021-06-12 22:45:01 +04:30
Matthew Olsson 0a4d8ef98d LibPDF: Bake the flipped y-axis directly into the CTM matrix 2021-06-12 22:45:01 +04:30
Matthew Olsson 449ef14895 LibPDF: Avoid calculating rendering matrix for every glyph 2021-06-12 22:45:01 +04:30
Matthew Olsson c142dadbe8 LibPDF: Handle the TJ graphical operator 2021-06-12 22:45:01 +04:30
Matthew Olsson 47531619e3 LibPDF: Handle the gs graphical operator 2021-06-12 22:45:01 +04:30
Matthew Olsson 006f5498de LibPDF: Add support for the CalRGB ColorSpace
This isn't tested all that well, as the PDF I am testing with only uses
it for black (which is trivial). It can be tested further when LibPDF
is able to process more complex PDFs that actually use this color space
non-trivially.
2021-06-12 22:45:01 +04:30
Matthew Olsson 7b4e36bf88 LibPDF: Split ColorSpace into a different class for each color space
While unnecessary at the moment, this will allow for more fine-grained
control when complex color spaces get added.
2021-06-12 22:45:01 +04:30
Matthew Olsson ea3abb14fe LibPDF: Parse hint tables
This code isn't _actually_ used as of right now, but I wrote it at the
same time as all of the code in the previous commit. I realized after
I wrote it that these hint tables aren't super useful if the parser
already has access to the full file. However, this will be useful if
we ever want to stream PDFs from the web (and possibly view them in
the browser).
2021-06-12 22:45:01 +04:30
Matthew Olsson e23bfd7252 LibPDF: Parse linearized PDF files
This is a big step, as most PDFs which are downloaded online will be
linearized. Pretty much the only difference is that the xref structure
is slightly different.
2021-06-12 22:45:01 +04:30
Matthew Olsson be1be47613 LibPDF: Fix two parser bugs
- A newline was assumed to follow the "stream" keyword, when it can also
  be a windows-style line break
- Fix not consuming the "endobj" at the end of every indirect object
2021-06-12 22:45:01 +04:30
Matthew Olsson 78bc9d1539 LibPDF: Refine the distinction between the Document and Parser
The Parser should hold information relevant for parsing, whereas the
Document should hold information relevant for displaying pages.
With this in mind, there is no reason for the Document to hold the
xref table and trailer. These objects have been moved to the Parser,
which allows the Parser to expose less public methods (which will be
even more evident once linearized PDFs are supported).
2021-06-12 22:45:01 +04:30
Matthew Olsson cafd7c11b4 LibPDF: Account for inverted y axis when rendering text 2021-06-12 22:45:01 +04:30
Matthew Olsson 1ef5071d1b LibPDF: Harden the document/parser against errors 2021-06-12 22:45:01 +04:30
Matthew Olsson d654fe0e41 LibPDF: Differentiate Value's null and empty states 2021-06-12 22:45:01 +04:30
Ali Mohammad Pur 51c2c69357 AK+Everywhere: Disallow constructing Functions from incompatible types
Previously, AK::Function would accept _any_ callable type, and try to
call it when called, first with the given set of arguments, then with
zero arguments, and if all of those failed, it would simply not call the
function and **return a value-constructed Out type**.
This lead to many, many, many hard to debug situations when someone
forgot a `const` in their lambda argument types, and many cases of
people taking zero arguments in their lambdas to ignore them.
This commit reworks the Function interface to not include any such
surprising behaviour, if your function instance is not callable with
the declared argument set of the Function, it can simply not be
assigned to that Function instance, end of story.
2021-06-06 00:27:30 +04:30
Andreas Kling 12a42edd13 Everywhere: codepoint => code point 2021-06-01 10:01:11 +02:00
Matthew Olsson 78f3bad7e6 LibPDF: Pre-initialize common FlyStrings in CommonNames.h 2021-05-25 00:24:09 +04:30
Matthew Olsson 67b65dffa8 LibPDF: Handle string encodings
Strings can be encoded in either UTF16-BE or UTF8. In either case,
there are a few initial bytes which specify the encoding that must
be checked and also removed from the final string.
2021-05-25 00:24:09 +04:30
Matthew Olsson a08922d2f6 LibPDF: Parse outline structures 2021-05-25 00:24:09 +04:30
Matthew Olsson be6e4b6f3c LibPDF: Store indirect value refs in Value objects
IndirectValueRef is so simple that it can be stored directly in the
Value class instead of being heap allocated.

As the comment in Value says, however, in theory the max bits needed to
store is 48 (16 for the generation index and 32(?) for the object
index), but 32 should be good enough for now. We can increase it to u64
later if necessary.
2021-05-25 00:24:09 +04:30
Matthew Olsson 534a2e95d2 LibPDF: Add basic color space support to the renderer
This commit only supports the three most basic color spaces:
DeviceGray, DeviceRGB, and DeviceCMYK
2021-05-25 00:24:09 +04:30
Matthew Olsson f2d2f3fae7 LibPDF: Add a very poor path clipping implementation
This completely ignores the actual path and just uses its bounding box,
since our painter doesn't support clipping to paths.
2021-05-25 00:24:09 +04:30
Matthew Olsson bf96ad674c LibPDF: Implement stubs for all graphical commands 2021-05-25 00:24:09 +04:30
Matthew Olsson 477e3946e5 LibPDF: Add support for stream filters
This commit also splits up StreamObject into PlainTextStreamObject and
EncodedStreamObject, which is essentially just a stream object which
does not own its bytes vs one which does.
2021-05-25 00:24:09 +04:30
Matthew Olsson 97cc482087 LibPDF: Make Reader::dump_state a bit more readable 2021-05-25 00:24:09 +04:30
Matthew Olsson 8c7ebc7a3f LibPDF: Do not assume value is an object in parse_indirect_value 2021-05-25 00:24:09 +04:30
Matthew Olsson d5f94aaa7b LibPDF/PDFViewer: Support rotated pages 2021-05-18 16:35:23 +02:00
Matthew Olsson f7ea1eb610 Applications: Add a very simple PDFViewer 2021-05-18 16:35:23 +02:00
Matthew Olsson 309105678b LibPDF: Fix bad resolve_to calls in Document
We can't call resolve_to with IndirectValue{,Ref}, since the values
will obviously be resolved, and will not give us the object of the
correct type.
2021-05-18 16:35:23 +02:00
Matthew Olsson 4479c1bff0 LibPDF: Add a bitmap renderer
This commit adds the Renderer class, which is responsible for rendering
a page into a Gfx::Bitmap. There are many improvements to make here,
but this is a great start!
2021-05-18 16:35:23 +02:00
Matthew Olsson d6a9b41bac LibPDF: Parse page crop box and user units 2021-05-18 16:35:23 +02:00
Matthew Olsson 101639e526 LibPDF: Parse graphics commands 2021-05-18 16:35:23 +02:00
Matthew Olsson 03649f85e2 LibPDF: Don't rely on a stream's /Length key existing
Some PDFs omit this key apparently, but Firefox opens them fine. Let's
emulate that behavior.
2021-05-18 16:35:23 +02:00
Matthew Olsson 2f0a2865f2 LibPDF: Give Parser a reference to the Document
The Parser will need to call resolve_to on certain values.
2021-05-18 16:35:23 +02:00
Matthew Olsson 3aeaceb727 LibPDF: Parse nested Page Tree structures
We now follow nested page tree nodes to find all of the actual
page dicts, whereas previously we just assumed the root level
page tree node contained all of the page children directly.
2021-05-10 10:32:39 +02:00
Matthew Olsson 8c745ad0d9 LibPDF: Parse page structures
This commit introduces the ability to parse the document catalog dict,
as well as the page tree and individual pages. Pages obviously aren't
fully parsed, as we won't care about most of the fields until we
start actually rendering PDFs.

One of the primary benefits of the PDF format is laziness. PDFs are
not meant to be parsed all at once, and the same is true for pages.
When a Document is constructed, it builds a map of page number to
object index, but it does not fetch and parse any of the pages. A page
is only parsed when a caller requests that particular page (and is
cached going forwards).

Additionally, this commit also adds an object_cast function which
logs bad casts if DEBUG_PDF is set. Additionally, utility functions
were added to ArrayObject and DictObject to get all types of objects
from the collections to avoid having to manually cast.
2021-05-10 10:32:39 +02:00
Matthew Olsson 72f693e9ed LibPDF: Add a basic parser and Document structure
This commit adds a parser as well as the Reader class, which serves
as a utility to aid in reading the PDF both forwards and in reverse.
The parser currently is capable of reading xref tables, as well as
all values. We don't really do anything with any of this information,
however.
2021-05-10 10:32:39 +02:00
Matthew Olsson a8f5b6aaa3 LibPDF: Create basic object structure
This commit is the start of LibPDF, and introduces some basic structure
objects. This emulates LibJS's Value structure, where Value is a simple
class that can contain a pointer to a more complex Object class with
more data. All of the basic PDF objects have a representation.
2021-05-10 10:32:39 +02:00