Commit graph

631 commits

Author SHA1 Message Date
Michel Hermier 682f89d5bc LibC: Allow multiple includes of <assert.h>
ISO C requires in section 7.2:
The assert macro is redefined according to the current state of NDEBUG
each time that <assert.h> is included.

Also add tests for `assert` multiple inclusion accordingly.
2021-12-23 17:53:46 -08:00
Xavier Defrang 9e97823ff8 AK: Add convert_to_uint_from_octal 2021-12-21 13:13:04 -08:00
davidot 154ed3994c LibRegex: Parse capture group names according to the ECMA262 spec 2021-12-21 14:04:23 +01:00
davidot 733a70671b LibRegex: Disallow duplicate named capture groups in ECMA262 parser 2021-12-21 14:04:23 +01:00
Liav A 5a649d0fd5 Kernel: Return EINVAL when specifying -1 for setuid and similar syscalls
For setreuid and setresuid syscalls, -1 means to set the current
uid/euid/gid/egid value, to be more convenient for programming.
However, for other syscalls where we pass only one argument, there's no
justification to specify -1.

This behavior is identical to how Linux handles the value -1, and is
influenced by the fact that the manual pages for the group of one
argument syscalls that handle ID operations is ambiguous about this
topic.
2021-12-20 11:32:16 +01:00
Michel Hermier 7a44c11378 LibTest: Add EXPECT_NO_CRASH 2021-12-19 14:22:06 -08:00
Michel Hermier 4c6e826c05 LibTest: Add EXPECT_CRASH_WITH_SIGNAL 2021-12-19 14:22:06 -08:00
Michel Hermier c22c1900c0 Tests: Add test for raise 2021-12-19 14:22:06 -08:00
Michel Hermier 0ec35d6d81 Tests: Add test for assert 2021-12-19 14:22:06 -08:00
Michel Hermier 181f759dc9 Tests: Add test for abort 2021-12-19 14:22:06 -08:00
Nick Johnson 548529ace4 AK: Add BuiltinWrappers.h
The goal of this file is to enable C++ overloaded functions for
standard builtin functions that we use. It contains fallback
implementations for systems that do not have the builtins available.
2021-12-18 23:36:08 +01:00
Andreas Kling a409b832fa AK: Make JsonValue::from_string("") return a null JsonValue
This unbreaks the /var/run/utmp system which starts out as an empty
string, and is then turned into an object by the first update.

This isn't necessarily the best way for this to work, but it's how
it used to work, so this just fixes the regression for now.
2021-12-16 22:48:17 +01:00
sin-ack 2341b0159a Tests: Implement tests for the Serenity Stream API 2021-12-16 22:21:35 +03:30
Ben Wiederhake 208d85e707 AK+Tests: Use less space in ErrorOr 2021-12-16 09:32:51 +01:00
Ali Mohammad Pur d2e51fafa9 LibRegex: Merge alternations based on blocks and not instructions
The instructions can have dependencies (e.g. Repeat), so only unify
equal blocks instead of consecutive instructions.
Fixes #11247.

Also adds the minimal test case(s) from that issue.
2021-12-15 19:36:45 +03:30
Sahan Fernando 2c43eaa50c LibCore: Add support for range-based for loops on LineIterators 2021-12-12 14:06:38 +03:30
Idan Horowitz 8d3faecd9b Tests: Add tests for sigwait/sigwaitinfo/sigtimedwait 2021-12-12 08:34:19 +02:00
Ben Wiederhake 18ae5ede88 LibCrypto+Tests: Avoid implicitly copying ByteBuffer 2021-12-08 09:46:13 -08:00
Ben Wiederhake 768b70cc4d AK+Tests: Avoid implicitly copying ByteBuffer 2021-12-08 09:46:13 -08:00
Sam Atkins 9e3a786a64 Tests: Cast unused smart-pointer return values to void 2021-12-05 15:31:03 +01:00
Jan de Visser c369626ac1 LibSQL: Gracefully react to unimplemented valid SQL
Fixes a crash that was caused by a syntax error which is difficult to
catch by the parser: usually identifiers are accepted in column lists,
but they are not in a list of column values to be inserted in an INSERT.

Fixed this by putting in a heuristic check; we probably need a better
way to do this.

Included tests for this case.

Also introduced a new SQL Error code, `NotYetImplemented`, and return
that instead of crashing when encountering unimplemented SQL.
2021-12-04 20:49:22 +03:30
Jan de Visser 001949d77a LibSQL: Improve error handling
The handling of filesystem level errors was basically non-existing or
consisting of `VERIFY_NOT_REACHED` assertions. Addressed this by
* Adding `open` methods to `Heap` and `Database` which return errors.
* Changing the interface of methods of these classes and clients
downstream to propagate these errors.

The constructors of `Heap` and `Database` don't open the underlying
filesystem file anymore.

The SQL statement handlers return an `SQLErrorCode::InternalError`
error code if an error comes back from the lower levels. Note that some
of these errors are things like duplicate index entry errors that should
be caught before the SQL layer attempts to actually update the database.

Added tests to catch attempts to open weird or non-existent files as
databases.

Finally, in between me writing this patch and submitting the PR the
AK::Result<Foo, Bar> template got deprecated in favour of ErrorOr<Foo>.
This resulted in more busywork.
2021-12-04 20:49:22 +03:30
Idan Horowitz 246255527a Tests: Add a test to ensure sigaltstack() is working correctly 2021-12-01 21:44:11 +02:00
Timothy Flynn 7e6ad172a4 LibUnicode: Support code point names that apply to ranges of code points
For example, consider the following adjacent entries in UnicodeData.txt:

    3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;;
    4DBF;<CJK Ideograph Extension A, Last>;Lo;0;L;;;;;N;;;;;

Our current implementation would assign the display name "CJK Ideograph
Extension A" to code points U+3400 & U+4DBF, but not to the code points
in between. Not only should those code points be assigned a name, but
the Unicode spec also has formatting rules on what the names should be
(the names for these ranged code points are not as they appear in
UnicodeData.txt).

The spec also defines names for code point ranges that actually are
listed individually in UnicodeData.txt. For example:

    2F800;CJK COMPATIBILITY IDEOGRAPH-2F800;Lo;0;L;4E3D;;;;N;;;;;
    2F801;CJK COMPATIBILITY IDEOGRAPH-2F801;Lo;0;L;4E38;;;;N;;;;;
    2F802;CJK COMPATIBILITY IDEOGRAPH-2F802;Lo;0;L;4E41;;;;N;;;;;

Code points are only coalesced into a range if all fields after the name
are equivalent. Our parser will insert the range and its name formatting
pattern when it comes across the first code point in that range, then
ignore other code points in that range. This reduces the number of names
we generated by nearly 2,000.
2021-11-30 11:24:02 +01:00
Arne Elster cdaa179eeb LibCore: Fix relative seeking in IODevice
The recently introduced read buffer in IODevice broke relative seeking.
The amount of data in the buffer wouldn't get taken into account.
2021-11-30 10:51:10 +01:00
Hendiadyoin1 7a27ecc135 Tests: Add a simple LibGL render-test
At the moment we just check if we *can* render a simple triangle, we do
not yet actually test if the image is indeed the triangle we wanted.

This test also outputs the rendered image when GL_DEBUG is enabled to a
file called "picture.bmp" for manual verification.

Co-authored-by: sunverwerth <s.unverwerth@serenityos.org>
2021-11-29 23:17:05 +03:30
Brian Gianforcaro 95c0ec9afc Tests: Fix TestLibCoreArgsParser with add_positional_argument API change
Since we no longer populate a Vector<String> the lifetime of the strings
in all of these tests is now messed up, as the Vector<StringView> now
points to free'd memory.

We attempt to fix this for the unit tests, by saving the results in a
RAII type that should live as long as the test wants to validate some
output of the ArgParser.
2021-11-26 18:57:26 -08:00
Andreas Kling f1cc3d0fc4 Userland: Use Core::ArgsParser's Vector<StringView> API everywhere
...and remove the Vector<String> variant since there are no remaining
users of this API.
2021-11-26 23:27:57 +01:00
Andreas Kling 58fb3ebf66 LibCore+AK: Move MappedFile from AK to LibCore
MappedFile is strictly a userspace thing, so it doesn't belong in AK
(which is supposed to be user/kernel agnostic.)
2021-11-23 11:33:36 +01:00
Andreas Kling 5a79c69b02 LibGfx: Make ImageDecoderPlugin::frame() return ErrorOr<>
This is a first step towards better error propagation from image codecs.
2021-11-21 20:22:48 +01:00
Timothy Flynn 93ee922027 LibUnicode: Support locales-without-script aliases for ECMA-402
As noted by ECMA-402, if a supported locale contains all of a language,
script, and region subtag, then the implementation must also support the
locale without the script subtag. The most complicated example of this
is the zh-TW locale.

The list of locales in the CLDR database does not include zh-TW or its
maximized zh-Hant-TW variant. Instead, it inlcudes the zh-Hant locale.
However, zh-Hant-TW is listed in the default-content locale list in the
cldr-core package. This defines an alias from zh-Hant-TW to zh-Hant. We
must then also support the zh-Hant-TW alias without the script subtag:
zh-TW. This transitively maps zh-TW to zh-Hant, which is a case quite
heavily tested by test262.
2021-11-19 11:45:35 +01:00
Jelle Raaijmakers dfbdd035da AK: Implement acos<T> correctly
This is a naive implementation based on the symmetry with `asin`.

Before, I'm not really sure what we were doing, but it was returning
wildly incorrect results.
2021-11-18 21:10:30 +01:00
Ali Mohammad Pur 387df06385 LibRegex: Avoid rewriting a+ as a* as part of atomic rewriting
The initial `ForkStay` is only needed if the looping block has a
following block, if there's no following block or the following block
does not attempt to match anything, we should not insert the ForkStay,
otherwise we would be rewriting `a+` as `a*` by allowing the 'end' to be
executed.
Fixes #10952.
2021-11-18 09:09:22 +01:00
Andreas Kling 216e21a1fa AK: Convert AK::Format formatting helpers to returning ErrorOr<void>
This isn't a complete conversion to ErrorOr<void>, but a good chunk.
The end goal here is to propagate buffer allocation failures to the
caller, and allow the use of TRY() with formatting functions.
2021-11-17 00:21:13 +01:00
Andreas Kling 587f9af960 AK: Make JSON parser return ErrorOr<JsonValue> (instead of Optional)
Also add slightly richer parse errors now that we can include a string
literal with returned errors.

This will allow us to use TRY() when working with JSON data.
2021-11-17 00:21:10 +01:00
Linus Groh 58c6a156bf LibCrypto: Fix subtracting two negative SignedBigIntegers
Currently, we get the following results

    -1 - -2 = -1
    -2 - -1 =  1

Correct would be:

    -1 - -2 =  1
    -2 - -1 = -1

This was already attempted to be fixed in 7ed8970, but that change was
incorrect. This directly translates to LibJS BigInts having the same
incorrect behavior - it even was tested.
2021-11-16 10:06:53 +00:00
Tim Schumacher 8417c0fb1e Tests/LibRegex: Add tests for line end anchors in PosixBasic 2021-11-13 15:06:52 +03:30
Andreas Kling 09780ba7a9 Tests/LibGfx: Actually test image decoders in TestImageDecoder
We were passing raw Gfx::Bitmap objects into the various image decoders
instead of encoded image data. This made all of them fail, but the test
expectations were set up in a way that aligned with this outcome.

With this patch, we now test the codecs for real. Except ICO, since we
don't have an ICO file handy. That's a FIXME.
2021-11-11 11:20:58 +01:00
Andreas Kling 8b1108e485 Everywhere: Pass AK::StringView by value 2021-11-11 01:27:46 +01:00
Andreas Kling a15ed8743d AK: Make ByteBuffer::try_* functions return ErrorOr<void>
Same as Vector, ByteBuffer now also signals allocation failure by
returning an ENOMEM Error instead of a bool, allowing us to use the
TRY() and MUST() patterns.
2021-11-10 21:58:58 +01:00
Andreas Kling 5f7d008791 AK+Everywhere: Stop including Vector.h from StringView.h
Preparation for using Error.h from Vector.h. This required moving some
things out of line.
2021-11-10 21:58:58 +01:00
Jan de Visser 3425730294 LibSQL: Implement table joins
This patch introduces table joins. It uses a pretty dumb algorithm-
starting with a singleton '__unity__' row consisting of a single boolean
value, a cartesian product of all tables in the 'FROM' clause is built.
This cartesian product is then filtered through the 'WHERE' clause,
again without any smarts just using brute force.

This patch required a bunch of busy work to allow for example the
ColumnNameExpression having to deal with multiple tables potentially
having columns with the same name.
2021-11-10 14:47:49 +01:00
Jan de Visser 1c50e9aadc LibSQL: Add current statement to the ExecutionContext
Because SQL is the craptastic language that it is, sometimes expressions
need to know details about the calling statement. For example the tables
in the 'FROM' clause may be needed to determine which columns are
referenced in 'WHERE' expressions. So the current statement is added
to the ExecutionContext and a new 'execute' overload on Statement is
created which takes the Database and the Statement and builds an
ExecutionContaxt from those.
2021-11-10 14:47:49 +01:00
Jan de Visser 7ea54db430 LibSQL: Add 'schema' and 'table' to TupleElementDescriptor
These are needed to distinguish columns from different tables with the
same column name in one and the same (joined) Tuple. Not quite happy
yet with this API; I think some sort of hierarchical structure would be
better but we'll burn that bridge when we get there :^)
2021-11-10 14:47:49 +01:00
Timothy Flynn 357c97dfa8 LibUnicode: Parse the CLDR's defaultContent.json locale list
This file contains the list of locales which default to their parent
locale's values. In the core CLDR dataset, these locales have their own
files, but they are empty (except for identity data). For example:

https://github.com/unicode-org/cldr/blob/main/common/main/en_US.xml

In the JSON export, these files are excluded, so we currently are not
recognizing these locales just by iterating the locale files.

This is a prerequisite for upgrading to CLDR version 40. One of these
default-content locales is the popular "en-US" locale, which defaults to
"en" values. We were previously inferring the existence of this locale
from the "en-US-POSIX" locale (many implementations, including ours,
strip variants such as POSIX). However, v40 removes the "en-US-POSIX"
locale entirely, meaning that without this change, we wouldn't know that
"en-US" exists (we would default to "en").

For more detail on this and other v40 changes, see:
https://cldr.unicode.org/index/downloads/cldr-40#h.nssoo2lq3cba
2021-11-09 20:44:52 +01:00
Andreas Kling a7f1f1c34b LibCore: Use ErrorOr<T> for Core::File::open() 2021-11-08 00:35:27 +01:00
Andreas Kling 0de33b3d6c LibGfx: Use ErrorOr<T> for Bitmap::try_create()
Another one that was used in a fajillion places.
2021-11-08 00:35:27 +01:00
Andreas Kling a54be656ae LibRegex: Don't push LibRegex's "Error" into the global namespace 2021-11-08 00:35:27 +01:00
Idan Horowitz 390a04a985 LibJS: Convert the GetValue AO to ThrowCompletionOr 2021-11-02 19:48:35 +01:00
Idan Horowitz d73b258874 LibJS: Convert reference deletion to ThrowCompletionOr 2021-11-02 19:48:35 +01:00
Ben Wiederhake f20a42e871 Kernel: Write test that crashes ProcFS 2021-10-31 18:44:12 +01:00
Idan Horowitz 10b93506ad Tests: Convert test-wasm functions to ThrowCompletionOr 2021-10-31 18:20:37 +02:00
Daniel Bertalan fed9cb5d2d AK+Tests: Fix formatting of infinity and NaN values
When I added this code in 1472f6d, I forgot to add tests for it. That's
why I didn't realize that the values were appended to the wrong
FormatBuilder object, so an empty string was returned instead of the
expected "nan"/"inf". This made debugging some FPU issues with the
ScummVM port significantly more difficult.
2021-10-31 12:15:34 +01:00
Ali Mohammad Pur ac856cb965 LibRegex: Don't ignore empty alternatives in append_alternation()
Doing so would cause patterns like `(a|)` to not match the empty string.
2021-10-29 15:57:59 +02:00
Liav A 8554952690 Kernel + WindowServer: Re-define the interface to framebuffer devices
We create a base class called GenericFramebufferDevice, which defines
all the virtual functions that must be implemented by a
FramebufferDevice. Then, we make the VirtIO FramebufferDevice and other
FramebufferDevice implementations inherit from it.
The most important consequence of rearranging the classes is that we now
have one IOCTL method, so all drivers should be committed to not
override the IOCTL method or make their own IOCTLs of FramebufferDevice.
All graphical IOCTLs are known to all FramebufferDevices, and it's up to
the specific implementation whether to support them or discard them (so
we require extensive usage of KResult and KResultOr, together with
virtual characteristic functions).
As a result, the interface is much cleaner and understandable to read.
2021-10-27 07:57:44 +03:00
Jan de Visser 7496f17620 LibSQL Tests: Add tests for SELECT ... WHERE ... 2021-10-25 12:59:42 +02:00
Jelle Raaijmakers a44978b9b0 LibC: Fix %n conversion specifier in scanf() format
Also add a test to prevent this from happening again. There were two
bugs:

* The number of bytes just after processing the last value was written,
  instead of the number of bytes after skipping remaining whitespace.
  Confirmed by testing against GNU's `scanf()` since the man page
  leaves something to be desired.

* The number of bytes was written to the wrong variable argument; i.e.
  the first argument was overwritten.
2021-10-24 22:43:27 -07:00
Jelle Raaijmakers 00f36fc5ae Tests: Print full 32-byte range of values in TestScanf
We are trying to show 8 u32 values, each of which needs at most 8
hexadecimal characters to be shown entirely.
2021-10-24 22:43:27 -07:00
Jelle Raaijmakers e71e9de61f Tests: Reword 'output' to 'return value' in TestScanf 2021-10-24 22:43:27 -07:00
Jelle Raaijmakers e3f17401cb Tests: Use correct argument count for value conformance in TestScanf 2021-10-24 22:43:27 -07:00
Tim Schumacher 65bcee3c62 Tests: Only test truthiness for iswctype 2021-10-24 22:40:11 -07:00
Tim Schumacher 9e3e4a692d Tests: Use proper comparison macros for wctype 2021-10-24 22:40:11 -07:00
Ben Wiederhake cb868cfa41 AK+Everywhere: Make Base64 decoding fallible 2021-10-23 19:16:40 +01:00
Ben Wiederhake 3bf1f7ae87 AK: Don't crash on invalid Base64 input
In the long-term, we should probably have a way to signal decoding
failure. For now, it should suffice to at least not crash. This is
particularly relevant because apparently this can be triggered while
parsing a PEM certificate, which happens during every TLS connection.

Found by OSS Fuzz
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=38979
2021-10-23 19:16:40 +01:00
Tim Schumacher 79bcfa967b LibC: Fix up mblen 2021-10-22 13:28:56 -07:00
Tim Schumacher 8df6955838 LibC: Fix up mbtowc
One more proper implementation and one less FIXME.
2021-10-22 13:28:56 -07:00
Liav A cf0dbc9069 Tests: Add a unit test to ensure the /dev/mem device works correctly
To ensure everything works as expected, a unit test was added with
multiple scenarios.
This binary has to have the SetUID flag, and we also bind-mount the
/usr/Tests directory to allow running of SetUID binaries.
2021-10-22 13:13:00 +02:00
Tim Schumacher 89afd4d063 LibC: Implement mbsnrtowcs 2021-10-21 23:57:32 -07:00
Tim Schumacher 552ae77f0d LibC: Implement wcsnrtombs 2021-10-21 23:57:32 -07:00
Tim Schumacher e618602433 LibC: Implement mbrlen 2021-10-21 23:47:20 -07:00
Idan Horowitz 44555eb50a LibJS: Convert test-js/test-web/test-wasm to ThrowCompletionOr 2021-10-20 12:27:19 +01:00
Idan Horowitz 40eb3a39d4 LibJS: Rename define_native_function => define_old_native_function
This method will eventually be removed once all native functions are
converted to ThrowCompletionOr
2021-10-20 12:27:19 +01:00
Idan Horowitz 20163c0584 LibJS: Add ThrowCompletionOr versions of the JS native function macros
The old versions were renamed to JS_DECLARE_OLD_NATIVE_FUNCTION and
JS_DEFINE_OLD_NATIVE_FUNCTION, and will be eventually removed once all
native functions were converted to the new format.
2021-10-20 12:27:19 +01:00
Daniel Bertalan 48687c3fbb Tests: Remove Clang workaround from TestSourceLocation
Clang 13 now correctly handles `__builtin_FILE()` and
`-ffile-prefix-map` being specified together, so this test should fully
pass.
2021-10-17 17:09:58 +01:00
Daniel Bertalan 13e6d9d71a LibC: Implement wcslcpy 2021-10-17 17:09:58 +01:00
Idan Horowitz 1639ed7e0a LibJS: Convert to_double() to ThrowCompletionOr 2021-10-17 12:12:35 +01:00
Idan Horowitz df181809fd LibJS: Convert to_bigint_int64() to ThrowCompletionOr 2021-10-17 12:12:35 +01:00
Tim Schumacher 420bdccf0b LibC: Implement mbsrtowcs 2021-10-15 21:50:19 -07:00
Tim Schumacher b0babd062e LibC: Implement wcsrtombs 2021-10-15 21:50:19 -07:00
Daniel Bertalan c8367df746 LibC: Implement wcrtomb
This function converts a single wide character into its multibyte
representation (UTF-8 in our case). It is called from libc++'s
`std::basic_ostream<wchar_t>::flush`, which gets called at program exit
from a global destructor in order to flush `std::wcout`.
2021-10-15 21:50:19 -07:00
Tim Schumacher 4b423a5ec7 LibC: Implement twalk 2021-10-15 21:50:19 -07:00
Tim Schumacher 7448626bae LibC: Implement tfind and tsearch 2021-10-15 21:50:19 -07:00
Linus Groh 52976bfac6 LibJS: Convert to_object() to ThrowCompletionOr 2021-10-13 09:55:10 +01:00
Linus Groh 4d8912a92b LibJS: Convert to_string() to ThrowCompletionOr
Also update get_function_name() to use ThrowCompletionOr, but this is
not a standard AO and should be refactored out of existence eventually.
2021-10-13 09:55:10 +01:00
Ben Wiederhake 50ad294527 AK: Implement a way to resolve relative paths lexically 2021-10-10 15:18:55 -07:00
Nico Weber 96666f3209 Tests: Fix -Wunreachable-code warnings from clang 2021-10-08 23:33:46 +02:00
Andreas Kling fc36f830ae Tests: Disable LibThreading detach tests for now
These tests don't pass (at least not without KVM) at the moment,
so let's not run them on CI.
2021-10-06 19:21:35 +02:00
Junior Rantila 1552873275 Tests: Add LibThreading to CMakeLists.txt 2021-10-06 19:05:39 +02:00
Rodrigo Tobar 4b091a7cc2 LibELF: Fix dynamic linking of dlopen()-ed libs
Consider the situation where two shared libraries libA and libB, both
depending (as in having a NEEDED dtag) on libC. libA is first
dlopen()-ed, which produces libC to be mapped and linked. When libB is
dlopen()-ed the DynamicLinker would re-map and re-link libC though,
causing any previous references to its old location to be invalid. And
if libA's PLT has been patched to point to libC's symbols, then any
further invocations to libA will cause the code to jump to a virtual
address that isn't mapped anymore, therefore causing a crash. This
situation was reported in #10014, although the setup was more convolved
in the ticket.

This commit fixes the issue by distinguishing between a main program
loading being performed by Loader.so, and a dlopen() call. The main
difference between these two cases is that in the former the
s_globals_objects maps is always empty, while in the latter it might
already contain dependencies for the library being dlopen()-ed. Hence,
when collecting dependencies to map and link, dlopen() should skip those
that are present in the global map to avoid the issue described above.

With this patch the original issue seen in #10014 is gone, with all
python3 modules (so far) loading correctly.

A unit test reproducing a simplified issue is also included in this
commit. The unit test includes the building of two dynamic libraries A
and B with both depending on libline.so (and B also depending on A); the
test then dlopen()s libA, invokes one its function, then does the same
with libB.
2021-10-06 12:33:21 +02:00
Mahmoud Mandour 0e5b2c923d LibSQL: Add an INSERT without column names test
This adds a passing test of an insert statement that contains no column
names and assumes full tuple input
2021-10-04 15:51:48 +02:00
Mahmoud Mandour 235573f7ba LibSQL: Test INSERT statement with wrong number of values 2021-10-04 15:51:48 +02:00
Mahmoud Mandour 4df85840c3 LibSQL: Test INSERT statement with wrong data types 2021-10-04 15:51:48 +02:00
Mahmoud Mandour 6065383e05 LibSQL: Check NoError individually in execution tests
This enables tests to check that a statement is erroneous, we could not
do so by only calling `execute()` since it asserted that no errors
occurred.
2021-10-04 15:51:48 +02:00
Tim Schumacher 7af7fc8c16 Everywhere: Fix more Copyright header inconsistencies 2021-10-04 11:10:09 +01:00
Ali Mohammad Pur 8f722302d9 LibRegex: Use a match table for character classes
Generate a sorted, compressed series of ranges in a match table for
character classes, and use a binary search to find the matches.
This is about a 3-4x speedup for character class match performance. :^)
2021-10-03 19:16:36 +02:00
davidot ac2c3a73b1 LibJS: Add a specific test for invalid unicode characters in the lexer
Also fixes that it tried to make substrings past the end of the source
if we overran the source length.
2021-10-03 17:42:05 +02:00
Tim Schumacher 4302e7ac26 Tests: Add tests for mbrtowc 2021-10-03 11:13:50 +00:00
Tim Schumacher 67a579aab0 AK: Add a basic formatter for wchar_t 2021-10-03 11:13:50 +00:00
Tim Schumacher e7f99edefa Tests: Add a test for mbsinit 2021-10-03 11:13:50 +00:00
Tim Schumacher 05b283f552 LibC: Implement wmemmove 2021-10-03 05:28:51 +00:00
Tim Schumacher fa1208edfd LibC: Implement wmemset 2021-10-03 05:28:51 +00:00
Tim Schumacher 485c0ef691 LibC: Implement wmemcpy 2021-10-03 05:28:51 +00:00
Tim Schumacher 0ca1df4dc6 LibC: Implement wmemchr 2021-10-03 05:28:51 +00:00
Tim Schumacher 5ac2e84264 LibC: Implement wcsstr 2021-10-03 05:28:51 +00:00
Tim Schumacher 1b078f87b7 LibC: Implement wcspbrk 2021-10-03 05:28:51 +00:00
Liav A 4974727dbb Kernel: Move x86 IO instructions code into the x86 specific folder 2021-10-01 12:27:20 +02:00
Nico Weber 841a5fe81b Tests: Fix typos 2021-10-01 01:33:43 +01:00
davidot ce3f29a135 LibJS + test-js: Get results from the global object directly
This is as the spec would require you to do it and necessary for changes
to come in the following commits.
2021-09-30 08:16:32 +01:00
Rodrigo Tobar a8fae3d2c8 LibPthread: Add first test cases for RWlock 2021-09-28 18:36:20 +03:30
Andreas Kling f67648f872 LibWeb: Rename HTMLDocumentParser => HTMLParser 2021-09-25 23:36:43 +02:00
Ben Wiederhake 743470c8f2 AK: Introduce ability to default-initialize a Variant
I noticed that Variant<Empty, …> is a somewhat common pattern while
working on #10080, and this will simplify a few use-cases. :^)
2021-09-21 04:22:52 +04:30
Ali Mohammad Pur 436693c0c9 LibTLS: Use a setter for on_tls_ready_to_write with some more smarts
The callback should be called as soon as the connection is established,
and if we actually set the callback when it already is, we expect it to
be called immediately.
2021-09-19 21:10:23 +04:30
thankyouverycool 41ce6d0cd0 Tests: Conform font tests to new font format 2021-09-19 00:58:59 +02:00
Andreas Kling 1be4cbd639 AK: Make Utf8View constructors inline and remove C string constructor
Using StringView instead of C strings is basically always preferable.
The only reason to use a C string is because you are calling a C API.
2021-09-18 19:54:24 +02:00
Tim Schumacher b8c756a53a LibC: Primitively implement wcscoll
At the moment, sorting like LC_COLLATE=C would do is better than
nothing.
2021-09-18 02:57:56 +00:00
Tim Schumacher 42031f026a LibC: Implement towctrans 2021-09-17 22:59:51 +00:00
Tim Schumacher 8c7b566629 LibC: Implement iswctype 2021-09-17 22:59:51 +00:00
Tim Schumacher ff0ab8b9a9 LibC: Implement wctrans 2021-09-17 22:59:51 +00:00
Tim Schumacher 7b17230d7a LibC: Implement wctype 2021-09-17 22:59:51 +00:00
Ben Wiederhake 5f0f0ac413 crash: Don't test for qemu-unsupported feature
See #10042 for details. In short: qemu doesn't seem to implement that
feature, therefore the test correctly fails. However, that does not help
us, so we skip that test.
2021-09-16 20:51:24 +00:00
Ben Wiederhake c680ef0a09 crash: Run automatically during CI 2021-09-16 20:51:24 +00:00
Andrew Kaster 6e7cc40b18 Meta: Add Meta/CMake to the CMAKE_MODULE_PATH for Serenity and Lagom
This makes it so we don't need to specify the full path to all the
helper scripts we include() from different places in the codebase and
feels a lot cleaner.
2021-09-15 19:04:52 +04:30
Ali Mohammad Pur 741886a4c4 LibRegex: Make the optimizer understand references and capture groups
Otherwise the fork in patterns like `(1+)\1` would be (incorrectly)
optimized away.
2021-09-15 15:52:28 +04:30
Andreas Kling 0a09eaf3a1 LibJS+LibTest: Use JS::Script and JS::SourceTextModule in test-js
Instead of creating a Parser and Lexer manually in test-js, we now
use either JS::Script::parse() or JS::SourceTextModule::parse()
to load tests.
2021-09-14 21:41:51 +02:00
Ali Mohammad Pur ccb53c64e9 AK: Add an abstraction over multiple disjoint buffers
DisjointChunks<T> provides a nice interface over multiple sequential
Vector<T>'s, allowing the user to iterate over/index into/slice from
said buffers as if they were a single contiguous buffer.
To work with views on such objects, DisjointSpans<T> is provided, which
has the same behaviour but does not own the underlying objects.
2021-09-14 21:33:15 +04:30
Idan Horowitz d6cfa34667 AK: Make URL::m_port an Optional<u16>, Expose raw port getter
Our current way of signalling a missing port with m_port == 0 was
lacking, as 0 is a valid port number in URLs.
2021-09-14 00:14:45 +02:00
Ali Mohammad Pur 246ab432ff LibRegex: Add a basic optimization pass
This currently tries to convert forking loops to atomic groups, and
unify the left side of alternations.
2021-09-13 14:38:53 +04:30
Timothy Flynn 470262c8ab LibJS: Use ErrorType::NotAnObjectOfType instead of NotA 2021-09-12 00:16:39 +02:00
Timothy Flynn c59b97043e LibWeb: Use ErrorType::NotAnObjectOfType instead of NotA 2021-09-12 00:16:39 +02:00
Idan Horowitz 6704961c82 AK: Replace the mutable String::replace API with an immutable version
This removes the awkward String::replace API which was the only String
API which mutated the String and replaces it with a new immutable
version that returns a new String with the replacements applied. This
also fixes a couple of UAFs that were caused by the use of this API.

As an optimization an equivalent StringView::replace API was also added
to remove an unnecessary String allocations in the format of:
`String { view }.replace(...);`
2021-09-11 20:36:43 +03:00
Ben Wiederhake 2e4ec891da Everywhere: Fix format-vulnerabilities
Command used:
grep -Pirn '(out|warn)ln\((?!["\)]|format,|stderr,|stdout,|output, ")' \
     AK Kernel/ Tests/ Userland/
(Plus some manual reviewing.)

Let's pick ArgsParser as an example:
    outln(file, m_general_help);
This will fail at runtime if the general help happens to contain braces.

Even if this transformation turns out to be unnecessary in a place or
two, this way the code is "more obviously" correct.
2021-09-11 15:16:26 +01:00
Brian Gianforcaro a4efaa7b47 Tests/Kernel: Fix test after off-by-one fix in Memory::is_user_range()
Commit 890c647e0f fixed an off-by-one bug, so the mapping of the page
at the very end of the user address space now works correctly.

This change adjusts the test so cover the corner cases the original
version was designed too.validate.
2021-09-11 04:15:16 +00:00
Ali Mohammad Pur 14c8373eb0 AK+Kernel: Reduce the number of template parameters of IntrusiveRBTree
This makes the user-facing type only take the node member pointer, and
lets the compiler figure out the other needed types from that.
2021-09-10 18:05:46 +03:00
Ali Mohammad Pur 5a0cdb15b0 AK+Everywhere: Reduce the number of template parameters of IntrusiveList
This makes the user-facing type only take the node member pointer, and
lets the compiler figure out the other needed types from that.
2021-09-10 18:05:46 +03:00
Timothy Flynn 4f2bcebe74 LibUnicode+LibJS: Store locale keyword values as a single string
Previously, LibUnicode would store the values of a keyword as a Vector.
For example, the locale "en-u-ca-abc-def" would have its keyword "ca"
stored as {"abc, "def"}. Then, canonicalization would occur on each of
the elements in that Vector.

This is incorrect because, for example, the keyword value "true" should
only be dropped if that is the entire value. That is, the canonical form
of "en-u-kb-true" is "en-u-kb", but "en-u-kb-abc-true" does not change
for canonicalization. However, we would canonicalize that locale as
"en-u-kb-abc".
2021-09-08 21:08:48 +01:00
Idan Horowitz cb9720baab AK: Set IntrusiveRBTree Node key on insertion instead of construction
This makes the API look much nicer.
2021-09-08 19:17:07 +03:00
Idan Horowitz 1db9250766 AK: Make IntrusiveRedBlackTree capable of holding non-raw pointers
This is completely based on e4412f1f59
and will allow us to convert some AK::HashMap users in the kernel.
2021-09-08 19:17:07 +03:00
davidot 43b17f27a3 test-js: Add a mark_as_garbage method to force GC to collect that object
This should fix the flaky tests of test-js.
It also fixes the tests when running with the -g flag since the values
will not be garbage collected too soon.
2021-09-08 08:53:02 +01:00
Ali Mohammad Pur 7fefb8148b LibRegex: Use the correct capture group index in ERE bytecode generation
Otherwise the left and right capture instructions wouldn't point to the
same capture group if there was another nested group there.
2021-09-07 20:01:58 +02:00
Andreas Kling 6ad427993a Everywhere: Behaviour => Behavior 2021-09-07 13:53:14 +02:00
Timothy Flynn 50158abaf1 LibUnicode: Implement locale-aware BEFORE_DOT special casing
Note that the algorithm in the Unicode spec is for checking that a code
point precedes U+0307, but the special casing condition NotBeforeDot is
interested in the inverse of this rule.
2021-09-06 15:24:27 +01:00
Timothy Flynn 436faf9fd9 LibUnicode: Implement locale-aware MORE_ABOVE special casing 2021-09-06 15:24:27 +01:00
Timothy Flynn 1427ebc622 LibUnicode: Implement locale-aware AFTER_SOFT_DOTTED special casing 2021-09-06 15:24:27 +01:00
Timothy Flynn 0053d48c41 LibUnicode: Implement locale-aware AFTER_I special casing 2021-09-06 15:24:27 +01:00
Ali Mohammad Pur 63523d3836 Tests/LibRegex: Decrease the size of the fork chain test 2021-09-06 13:51:30 +04:30
Ali Mohammad Pur abbe9da255 LibRegex: Make infinite repetitions short-circuit on empty matches
This makes (addmittedly weird) patterns like `(a*)*` work correctly
without going into an infinite fork loop.
2021-09-06 13:51:30 +04:30
Ali Mohammad Pur 97e97bccab Everywhere: Make ByteBuffer::{create_*,copy}() OOM-safe 2021-09-06 01:53:26 +02:00
Ali Mohammad Pur 3a9f00c59b Everywhere: Use OOM-safe ByteBuffer APIs where possible
If we can easily communicate failure, let's avoid asserting and report
failure instead.
2021-09-06 01:53:26 +02:00
Brian Gianforcaro 782e7834c3 Tests: Reduce the execution time of the LibC TestQSort test
Executing 100 times vs 10 times doesn't increase test coverage
substantial, this API is stable and more iterations is just a
waste of time.

Without KVM this test is a clear outlier in runtime during CI:

```
START  LibC/TestQsort (106/172)
PASS   LibC/TestQsort (41.233692s)
```
2021-09-05 22:17:28 +02:00
Andreas Kling 7dda773426 AK: Add rvalue-ref qualifiers for Optional's value() and value_or()
This avoids a value copy when calling value() or value_or() on a
temporary Optional. This is very common when using the HashMap::get()
API like this:

    auto value = hash_map.get(key).value_or(fallback_value);
2021-09-04 03:02:08 +02:00
Daniel Bertalan d7b6cc6421 Everywhere: Prevent risky implicit casts of (Nonnull)RefPtr
Our existing implementation did not check the element type of the other
pointer in the constructors and move assignment operators. This meant
that some operations that would require explicit casting on raw pointers
were done implicitly, such as:
- downcasting a base class to a derived class (e.g. `Kernel::Inode` =>
  `Kernel::ProcFSDirectoryInode` in Kernel/ProcFS.cpp),
- casting to an unrelated type (e.g. `Promise<bool>` => `Promise<Empty>`
  in LibIMAP/Client.cpp)

This, of course, allows gross violations of the type system, and makes
the need to type-check less obvious before downcasting. Luckily, while
adding the `static_ptr_cast`s, only two truly incorrect usages were
found; in the other instances, our casts just needed to be made
explicit.
2021-09-03 23:20:23 +02:00
Timothy Flynn a05419db55 LibUnicode: Add lexer to test if a string matches the "type" production 2021-09-02 17:56:42 +01:00
Andrew Kaster 58797a1289 Tests: Remove all file(GLOB) from CMakeLists in Tests
Using a file(GLOB) to find all the test files in a directory is an easy
hack to get things started, but has some drawbacks. Namely, if you add
a test, it won't be found again without re-running CMake. `ninja` seems
to do this automatically, but it would be nice to one day stop seeing it
rechecking our globbed directories.
2021-09-02 09:08:23 +02:00
sin-ack b5a145b466 Tests: Add tests for Core::deferred_invoke 2021-09-02 03:47:47 +04:30
Timothy Flynn fd0011989a LibUnicode: Resolve the most likely territory alias when there are many 2021-09-01 14:14:47 +01:00
Timothy Flynn 72f49e42b4 LibUnicode: Perform complex Unicode locale alias substitution 2021-09-01 14:14:47 +01:00
Timothy Flynn da89cf9afb LibUnicode: Canonicalize calendar subtags
Calendar subtags are a bit of an odd-man-out in that we must match the
variants "ethiopic-amete-alem" in that order, without any other variant
in the locale. So a separate method is needed for this, and we now defer
sorting the variant list until after other canonicalization is done.
2021-09-01 14:14:47 +01:00
Timothy Flynn 8458f477a4 LibUnicode: Canonicalize timezone subtags 2021-09-01 14:14:47 +01:00
Timothy Flynn 335f985b31 LibUnicode: Canonicalize the subtag "imperial" to "uksystem" 2021-09-01 14:14:47 +01:00
Timothy Flynn 2d90144888 LibUnicode: Canonicalize the subtag "primary" and "tertiary" to "levelN" 2021-09-01 14:14:47 +01:00
Timothy Flynn 409f39b336 LibUnicode: Canonicalize the subtag "names" to "prprname" 2021-09-01 14:14:47 +01:00
Timothy Flynn f907a7dc38 LibUnicode: Canonicalize the subtag "yes" to "true" 2021-09-01 14:14:47 +01:00
Timothy Flynn 556374a904 LibUnicode: Substitute Unicode locale aliases during canonicalization
Unicode TR35 defines how locale subtag aliases should be emplaced when
converting a locale to canonical form. For most subtags, it is a simple
substitution. Language subtags depend on context; for example, the
language "sh" should become "sr-Latn", but if the original locale has a
script subtag already ("sh-Cyrl"), then only the language subtag of the
alias should be taken ("sr-Latn").

To facilitate this, we now make two passes when canonicalizing a locale.
In the first pass, we convert the LocaleID structure to canonical syntax
(where the conversions all happen in-place). In the second pass, we form
the canonical string based on the canonical syntax.
2021-09-01 14:14:47 +01:00
Timothy Flynn d13142f015 LibJS+LibUnicode: Store parsed Unicode locale data as full strings
Originally, it was convenient to store the parsed Unicode locale data as
views into the original string being parsed. But to implement locale
aliases will require mutating the data that was parsed. To prepare for
that, store the parsed data as proper strings.
2021-09-01 14:14:47 +01:00
Andrew Kaster b3e3e4d45d Tests: Convert remaining LibC tests to LibTest
Convert them to using outln instead of printf at the same time.
2021-09-01 13:44:24 +02:00
Peter Elliott 33d7fdca28 Everywhere: Use my cool new @serenityos.org email address 2021-09-01 11:37:25 +04:30
Peter Elliott 8d2c04821f Tests: Test LibMarkdown against commonmark test suite
TestCommonmark runs the CommonMark test suite
(https://spec.commonmark.org/0.30/spec.json) against LibMarkdown.
Currently 44/652 tests pass.
2021-08-31 16:53:51 +02:00
Hediadyoin1 fdef6e5f76 AK: Add FixedPoint arithmetic helper
Co-authored-by: Hendiadyoin1 <leon2002.la@gmail.com>
Co-authored-by: kleines Filmröllchen <malu.bertsch@gmail.com>
2021-08-31 17:03:55 +04:30
Ali Mohammad Pur 30a1a25daa Tests/LibWasm: Handle all stream errors in parse_webassembly_module 2021-08-30 22:47:02 +02:00
Ali Mohammad Pur 99199b9bfd Tests/LibWasm: Add support for javascript bigint values
Some i64 values will not fit in normal doubles, and these values _are_
tested by the test suite, this makes the test runtime capable of
handling them correctly.
2021-08-30 22:47:02 +02:00
Timothy Flynn f897c2edb3 LibUnicode: Canonicalize locale private use extensions 2021-08-30 19:42:40 +01:00
Timothy Flynn 6f0cb52dc4 LibUnicode: Canonicalize locale extensions 2021-08-30 19:42:40 +01:00
Timothy Flynn 30855e6663 LibUnicode: Parse locale private use extensions 2021-08-30 19:42:40 +01:00
Timothy Flynn 29f76ef7c8 LibUnicode: Parse locale extensions of the other extension form 2021-08-30 19:42:40 +01:00
Timothy Flynn d2d304fcf8 LibUnicode: Parse locale extensions of the transformed extension form 2021-08-30 19:42:40 +01:00
Timothy Flynn eda92d15e4 LibUnicode: Parse locale extensions of the Unicode locale extension form 2021-08-30 19:42:40 +01:00
Timothy Flynn 587d4663a3 AK: Return early from swap() when swapping the same object
When swapping the same object, we could end up with a double-free error.
This was found while quick-sorting a Vector of Variants holding complex
types, reproduced by the new swap_same_complex_object test case.
2021-08-30 19:42:40 +01:00
Ali Mohammad Pur 206bc01f81 LibRegex: Allow null bytes in pattern
That check was rather pointless as the input is a StringView which knows
its own bounds.
Fixes #9686.
2021-08-30 18:43:09 +02:00
Timothy Flynn b7a95cba65 LibUnicode: Implement grammar validators for Unicode TR-35
ECMA-402 requires validating user input against the EBNF grammar for
Unicode locales described in TR-35: https://www.unicode.org/reports/tr35

This commit adds validators for that grammar, as well as other helper to
e.g. canonicalize a locale string.
2021-08-26 22:04:09 +01:00
Timothy Flynn 262e412634 AK: Implement method to convert a String/StringView to title case
This implementation preserves consecutive spaces in the orginal string.
2021-08-26 22:04:09 +01:00
Jean-Baptiste Boric c97f7ea23b Tests: Test setjmp/sigsetjmp LibC functions
Since there are no real users of these functions in Serenity's
userland and this is my third attempt at this... This time, the great
LibTest test suite will make sure that I do it right!
2021-08-26 00:54:23 +02:00
Jan de Visser 85a84b0794 LibSQL: Introduce Serializer as a mediator between Heap and client code
Classes reading and writing to the data heap would communicate directly
with the Heap object, and transfer ByteBuffers back and forth with it.
This makes things like caching and locking hard. Therefore all data
persistence activity will be funneled through a Serializer object which
in turn submits it to the Heap.

Introducing this unfortunately resulted in a huge amount of churn, in
which a number of smaller refactorings got caught up as well.
2021-08-21 22:03:30 +02:00
Jan de Visser d074a601df LibSQL+SQLServer: Bare bones INSERT and SELECT statements
This patch provides very basic, bare bones implementations of the
INSERT and SELECT statements. They are *very* limited:
- The only variant of the INSERT statement that currently works is
   SELECT INTO schema.table (column1, column2, ....) VALUES
      (value11, value21, ...), (value12, value22, ...), ...
   where the values are literals.
- The SELECT statement is even more limited, and is only provided to
  allow verification of the INSERT statement. The only form implemented
  is: SELECT * FROM schema.table

These statements required a bit of change in the Statement::execute
API. Originally execute only received a Database object as parameter.
This is not enough; we now pass an ExecutionContext object which
contains the Database, the current result set, and the last Tuple read
from the database. This object will undoubtedly evolve over time.

This API change dragged SQLServer::SQLStatement into the patch.

Another API addition is Expression::evaluate. This method is,
unsurprisingly, used to evaluate expressions, like the values in the
INSERT statement.

Finally, a new test file is added: TestSqlStatementExecution, which
tests the currently implemented statements. As the number and flavour of
implemented statements grows, this test file will probably have to be
restructured.
2021-08-21 22:03:30 +02:00
Jan de Visser b74721e604 LibSQL: Redesign Value implementation and add new types
The implemtation of the Value class was based on lambda member variables
implementing type-dependent behaviour. This was done to ensure that
Values can be used as stack-only objects; the simplest alternative,
virtual methods, forces them onto the heap. The problem with the the
lambda approach is that it bloats the Values (which are supposed to be
lightweight objects) quite considerably, because every object contains
more than a dozen function pointers.

The solution to address both problems (we want Values to be able to live
on the stack and be as lightweight as possible) chosen here is to
encapsulate type-dependent behaviour and state in an implementation
class, and let the Value be an AK::Variant of those implementation
classes. All methods of Value are now basically straight delegates to
the implementation object using the Variant::visit method.

One issue complicating matters is the addition of two aggregate types,
Tuple and Array, which each contain a Vector of Values. At this point
Tuples and Arrays (and potential future aggregate types) can't contain
these aggregate types. This is limiting and needs to be addressed.

Another area that needs attention is the nomenclature of things; it's
a bit of a tangle of 'ValueBlahBlah' and 'ImplBlahBlah'. It makes sense
right now I think but admit we probably can do better.

Other things included here:
- Added the Boolean and Null types (and Tuple and Array, see above).
- to_string now always succeeds and returns a String instead of an
  Optional. This had some impact on other sources.
- Added a lot of tests.
- Started moving the serialization mechanism more towards where I want
  it to be, i.e. a 'DataSerializer' object which just takes
  serialization and deserialization requests and knows for example how
  to store long strings out-of-line.

One last remark: There is obviously a naming clash between the Tuple
class and the Tuple Value type. This is intentional; I plan to make the
Tuple class a subclass of Value (and hence Key and Row as well).
2021-08-21 22:03:30 +02:00
Jan de Visser a5e28f2897 LibSQL: Make TupleDescriptor a shared pointer instead of a stack object
Tuple descriptors are basically the same for for example all rows in
a table. Makes sense to share them instead of copying them for every
single row.
2021-08-21 22:03:30 +02:00
Timothy Flynn 562d4e497b LibRegex: Treat pattern string characters as unsigned
For example, consider the following pattern:

    new RegExp('\ud834\udf06', 'u')

With this pattern, the regex parser should insert the UTF-8 encoded
bytes 0xf0, 0x9d, 0x8c, and 0x86. However, because these characters are
currently treated as normal char types, they have a negative value since
they are all > 0x7f. Then, due to sign extension, when these characters
are cast to u64, the sign bit is preserved. The result is that these
bytes are inserted as 0xfffffffffffffff0, 0xffffffffffffff9d, etc.

Fortunately, there are only a few places where we insert bytecode with
the raw characters. In these places, be sure to treat the bytes as u8
before they are cast to u64.
2021-08-20 19:16:33 +02:00
Andreas Kling 13f4890c38 LibCore: Make Core::File::open() return OSError in case of failure 2021-08-20 15:31:46 +02:00
Timothy Flynn 4f2cbe119b LibRegex: Allow Unicode escape sequences in capture group names
Unfortunately, this requires a slight divergence in the way the capture
group names are stored. Previously, the generated byte code would simply
store a view into the regex pattern string, so no string copying was
required.

Now, the escape sequences are decoded into a new string, and a vector
of all parsed capture group names are stored in a vector in the parser
result structure. The byte code then stores a view into the
corresponding string in that vector.
2021-08-19 23:49:25 +02:00
Timothy Flynn fd8ccedf2b AK: Add GenericLexer API to consume an escaped Unicode code point
This parsing is already duplicated between LibJS and LibRegex, and will
shortly be needed in more places in those libraries. Move it to AK to
prevent further duplication.

This API will consume escaped Unicode code points of the form:
    \\u{code point}
    \\unnnn (where each n is a hexadecimal digit)
    \\unnnn\\unnnn (where the two escaped values are a surrogate pair)
2021-08-19 23:49:25 +02:00
Timothy Flynn 325eabc770 LibRegex: Ensure the GoBack operation decrements the code unit index
This was missed in commit 27d555bab0.
2021-08-18 09:47:09 +04:30
Timothy Flynn a9716ad44e LibRegex: In non-Unicode mode, parse \u{4} as a repetition pattern 2021-08-18 09:47:09 +04:30
davidot 7613c22b06 LibJS: Add a mode to parse JS as a module
In a module strict mode should be enabled at the start of parsing and we
allow import and export statements.
2021-08-15 23:51:47 +01:00
Timothy Flynn 9509433e25 LibRegex: Implement and use a REPEAT operation for bytecode repetition
Currently, when we need to repeat an instruction N times, we simply add
that instruction N times in a for-loop. This doesn't scale well with
extremely large values of N, and ECMA-262 allows up to N = 2^53 - 1.

Instead, add a new REPEAT bytecode operation to defer this loop from the
parser to the runtime executor. This allows the parser to complete sans
any loops (for this instruction), and allows the executor to bail early
if the repeated bytecode fails.

Note: The templated ByteCode methods are to allow the Posix parsers to
continue using u32 because they are limited to N = 2^20.
2021-08-15 11:43:45 +01:00
Timothy Flynn f1ce998d73 LibRegex+LibJS: Combine named and unnamed capture groups in MatchState
Combining these into one list helps reduce the size of MatchState, and
as a result, reduces the amount of memory consumed during execution of
very large regex matches.

Doing this also allows us to remove a few regex byte code instructions:
ClearNamedCaptureGroup, SaveLeftNamedCaptureGroup, and NamedReference.
Named groups now behave the same as unnamed groups for these operations.
Note that SaveRightNamedCaptureGroup still exists to cache the matched
group name.

This also removes the recursion level from the MatchState, as it can
exist as a local variable in Matcher::execute instead.
2021-08-15 11:43:45 +01:00
Timothy Flynn 1a173be29d LibRegex: Disallow unescaped quantifiers in Unicode mode 2021-08-15 11:43:45 +01:00
Timothy Flynn c3e1f1f687 LibRegex: Use correct source characters for Unicode identity escapes 2021-08-15 11:43:45 +01:00
Timothy Flynn 6a485f612f LibRegex: Implement legacy octal escape parsing closer to the spec
The grammar for the ECMA-262 CharacterEscape is:

  CharacterEscape[U, N] ::
    ControlEscape
    c ControlLetter
    0 [lookahead ∉ DecimalDigit]
    HexEscapeSequence
    RegExpUnicodeEscapeSequence[?U]
    [~U]LegacyOctalEscapeSequence
    IdentityEscape[?U, ?N]

It's important to parse the standalone "\0 [lookahead ∉ DecimalDigit]"
before parsing LegacyOctalEscapeSequence. Otherwise, all standalone "\0"
patterns are parsed as octal, which are disallowed in Unicode mode.

Further, LegacyOctalEscapeSequence should also be parsed while parsing
character classes.
2021-08-15 11:43:45 +01:00
Timothy Flynn 83ca8c7e38 LibRegex: Convert LibRegex tests to use StringView in place of C-strings
A subsequent commit will add tests that require a string containing only
"\0". As a C-string, this will be interpreted as the null terminator. To
make the diff for that commit easier to grok, this commit converts all
tests to use StringView without any other functional changes.
2021-08-15 11:43:45 +01:00
Timothy Flynn 0c8f2f5aca LibRegex: Ensure escaped hexadecimals are exactly 2 digits in length 2021-08-15 11:43:45 +01:00
Timothy Flynn 2e4b6fd1ac LibRegex: Ensure escaped code points are exactly 4 digits in length 2021-08-15 11:43:45 +01:00
Timothy Flynn e887314472 LibRegex: Fix ECMA-262 parsing of invalid identity escapes
* Only alphabetic (A-Z, a-z) characters may be escaped with \c. The loop
  currently parsing \c includes code points between the upper/lower case
  groups.
* In Unicode mode, all invalid identity escapes should cause a parser
  error, even in browser-extended mode.
* Avoid an infinite loop when parsing the pattern "\c" on its own.
2021-08-15 11:43:45 +01:00
Brian Gianforcaro a2a5cb0f24 AK: Add Time::is_negative() to detect negative time values 2021-08-15 12:20:38 +02:00
Daniel Bertalan 0a36cea9dc Tests: Re-enable UserspaceEmulator tests on the Clang build
Now that problems that made UE crash have been fixed, this test should
now pass.
2021-08-14 18:42:14 +02:00
Itamar e57fdb63f8 Tests: Add regression tests for the LibCpp preprocessor
Similarly to the LibCpp parser regression tests, these tests run the
preprocessor on the .cpp test files under
Userland/LibCpp/Tests/preprocessor, and compare the output with existing
.txt ground truth files.
2021-08-14 12:40:55 +02:00
Timothy Flynn df14d11a11 LibRegex: Disallow invalid interval qualifiers in Unicode mode
Fixes all remaining 'built-ins/RegExp/property-escapes' test262 tests.
2021-08-11 13:11:01 +02:00
Timothy Flynn 1e91334008 LibUnicode: Handle edge-case script extensions, Common and Inherited
These script extensions have some peculiar behavior in the Unicode spec.
The UCD ScriptExtension file does not contain these scripts. Rather, it
is implied the code points which have these scripts as an extension are
the code points that both:

  1. Have Common or Inherited as their primary script value
  2. Do not have any other script value in their script extension lists

Because these are not explictly listed in the UCD, we must manually form
these script extensions.
2021-08-11 13:11:01 +02:00
Timothy Flynn 47bb350ebd LibUnicode: Generate separate tables for scripts and script extensions
Notice that unlike the note in populate_general_category_unions(),
script extension do indeed have code point ranges which overlap. Thus,
this commit adds code to handle that, and hooks it into the GC unions.
2021-08-11 13:11:01 +02:00
Timothy Flynn 5ac23d244d LibUnicode: Generate separate tables for Unicode properties
Similar to General Categories, this generates separate tables for the
Property list.
2021-08-11 13:11:01 +02:00
Timothy Flynn b06c104076 LibUnicode: Include Unassigned code points in the Other General Category
Now that the generator parses unassigned General Category properties, it
can include Unassigned (Cn) in the Other (C) category.
2021-08-11 13:11:01 +02:00
Timothy Flynn 7dce2bfe23 LibUnicode: Generate separate tables for General Category properties
Previously, each code point's General Category was part of the generated
UnicodeData structure. This ultimately presented two problems, one
functional and one performance related:

  * Some General Categories are applied to unassigned code points, for
    example the Unassigned (Cn) category. Unassigned code points are
    strictly excluded from UnicodeData.txt, so by relying on that file,
    the generator is unable to handle these categories.

  * Lookups for General Categories are slower when searching through the
    large UnicodeData hash map. Even though lookups are O(1), the hash
    function turned out to be slower than binary searching through a
    category-specific table.

So, now a table is generated for each General Category. When querying a
code point for a category, a binary search is done on each code point
range in that category's table to check if code point has that category.

Further, General Categories are now parsed from the UCD file
DerivedGeneralCategory.txt. This file is a normal "prop list" file and
contains the categories for unassigned code points.
2021-08-11 13:11:01 +02:00
Mandar Kulkarni aaf232f903 Tests: Add test for String::bijective_base_from() 2021-08-09 14:14:07 +04:30
Daniel Bertalan 146dcf4856 Tests: Disable UserspaceEmulator tests for Clang builds
There seems to be more incorrect assumptions about Clang-built
executables' memory layout than expected. These make the CI fail even
though the system is functional in all other aspects. While this is
being fixed, let's just disable tests for UserspaceEmulator.
2021-08-08 10:55:36 +02:00
Daniel Bertalan 7396e4aedc LibDebug: Store 64-bit numbers in AttributeValue
This helps us avoid weird truncation issues and fixes a bug on Clang
builds where truncation while reading caused the DIE offsets following
large LEB128 numbers to be incorrect. This removes the need for the
separate `LongUnsignedNumber` type.
2021-08-08 10:55:36 +02:00
Daniel Bertalan 5f2f460cc8 Tests: Add Clang pragma for turning off optimizations
Clang does not accept `GCC optimize("O0")`, so it fails to build the
system with it.
2021-08-08 10:55:36 +02:00
Itamar 4673a517f6 LibCpp: Do lexing in the Preprocessor
We now call Preprocessor::process_and_lex() and pass the result to the
parser.

Doing the lexing in the preprocessor will allow us to maintain the
original position information of tokens after substituting definitions.
2021-08-07 21:24:11 +02:00
Lenny Maiorani 8e949c5c91 Tests: Remove unused variables for clang build
Problem:
- Clang will not build `Tests/LibTLS` due to unused variables.

Solution:
- Remove the unused variables.
2021-08-06 23:55:27 +02:00
TheFightingCatfish 4e8e1b7b3a AK: Improve the parsing of data urls
Improve the parsing of data urls in URLParser to bring it more up-to-
spec. At the moment, we cannot parse the components of the MIME type
since it is represented as a string, but the spec requires it to be
parsed as a "MIME type record".
2021-08-06 10:45:17 +02:00
Timothy Flynn 484ccfadc3 LibRegex: Support property escapes of Unicode script extensions 2021-08-04 13:50:32 +01:00
Timothy Flynn 06088df729 LibRegex: Support property escapes of the Unicode script property
Note that unlike binary properties and general categories, scripts must
be specified in the non-binary (Script=Value) form.
2021-08-04 13:50:32 +01:00
Brian Gianforcaro 4df1657898 Tests: Add coverage for sys$alarm() success case 2021-08-03 18:44:01 +02:00
Brian Gianforcaro ea401fb3c3 Tests: Add coverage for sys$alarm() canceling a stale timer
This is a regression test to validate the functionality that was
reported broken in #9071, where the kernel would spin attempting
to cancel a stale timer.
2021-08-03 18:44:01 +02:00
Timothy Flynn dc9f516339 LibRegex: Generate negated property escapes as a single instruction
These were previously generated as two instructions, Compare [Inverse]
and Compare [Property].
2021-08-02 21:02:09 +04:30
Timothy Flynn 4de4312827 LibRegex: Support property escapes of the form \p{Type=Value}
Before now, only binary properties could be parsed. Non-binary props are
of the form "Type=Value", where "Type" may be General_Category, Script,
or Script_Extension (or their aliases). Of these, LibUnicode currently
supports General_Category, so LibRegex can parse only that type.
2021-08-02 21:02:09 +04:30
Timothy Flynn 1e10d6d7ce LibRegex: Support property escapes of Unicode General Categories
This changes LibRegex to parse the property escape as a Variant of
Unicode Property & General Category values. A byte code instruction is
added to perform matching based on General Category values.
2021-08-02 21:02:09 +04:30
Ali Mohammad Pur 85d87cbcc8 LibRegex: Add some tests for Fork{Stay,Jump} performance
Without the previous fixes, these will blow up the stack.
2021-08-02 17:22:50 +04:30
Brian Gianforcaro d1644c26d6 Tests: Remove unused header includes 2021-08-01 08:10:16 +02:00
Brian Gianforcaro c54ae3afd6 Tests: Fix AK/TestJSON.cpp by not relying on disk resources
The following commit broke Tests/AK/TestJSON.cpp as it removed the
file that the test loaded from disk to validate JSON parsing.

    commit ad141a2286
    Author: Andreas Kling <kling@serenityos.org>
    Date:   Sat Jul 31 15:26:14 2021 +0200

        Base: Remove "test.frm" from HackStudio test project

Instead of restoring the file, lets just embed a bit of JSON in the
test case to avoid using external resources, as they obviously are
surprising and make the test less portable across environments.
2021-07-31 23:56:40 +02:00
Timothy Flynn d485cf29d7 LibRegex+LibUnicode: Begin implementing Unicode property escapes
This supports some binary property matching. It does not support any
properties not yet parsed by LibUnicode, nor does it support value
matching (such as Script_Extensions=Latin).
2021-07-30 21:26:31 +01:00
Andreas Kling bccdc08487 Kernel: Unmapping a non-mapped region with munmap() should be a no-op
Not a regression per se from 0fcb9efd86
since we were crashing before that which is obviously worse.
2021-07-30 13:16:55 +02:00
Brian Gianforcaro c9395d7e9a Tests: Validate unmapping 0x0 doesn't crash the Kernel
Previously unmapping any offset starting at 0x0 would assert in the
kernel, add a regression test to validate the fix.

Co-authored-by: Federico Guerinoni <guerinoni.federico@gmail.com>
2021-07-30 11:28:55 +02:00
Timothy Flynn c4bfda7f7f LibUnicode: Handle code points that are both cased and case-ignorable
Apparently, some code points fit both categories, for example U+0345
(COMBINING GREEK YPOGEGRAMMENI). Handle this fact when determining if
a code point is a final code point in a string.
2021-07-28 23:42:29 +02:00
Timothy Flynn 7827aede6f LibUnicode: Check word break when deciding on case-ignorable code points 2021-07-28 23:42:29 +02:00
Timothy Flynn c45a014645 LibUnicode: Check property list when deciding if a code point is cased 2021-07-28 23:42:29 +02:00
ovf 898b8ffcb6 LibWeb: Avoid assertion failure on parsing numeric character references 2021-07-28 18:32:22 +02:00
Timothy Flynn 39f971e42b LibUnicode: Begin implementing special Unicode case folding
This implements unconditional special case folding, and conditional
folding for non-locale cases. Worth noting that the only conditional,
non-locale special case is for converting an uppercase sigma to
lowercase.
2021-07-27 21:04:36 +01:00
ovf 13c7d55320 LibWeb: Fix parsing of character references in attribute values 2021-07-27 00:03:43 +02:00
Timothy Flynn 4dda3edc9e LibUnicode: Introduce a Unicode library for interacting with UCD files
The Unicode standard publishes the Unicode Character Database (UCD) with
information about every code point, such as each code point's upper case
mapping. LibUnicode exists to download and parse UCD files at build time
and to provide accessors to that data.

As a start, LibUnicode includes upper- and lower-case code point
converters.
2021-07-26 17:03:55 +01:00
brapru 7e40c17460 AK: Create MACAddress from string
Previously there was no way to create a MACAddress by passing a direct
address as a string. This will allow programs like the arp utility to
create a MACAddress instance by user-passed addresses.
2021-07-25 17:57:08 +02:00
Luke a00b5fc7b7 Tests: Add tests for the quoted printable decoder 2021-07-24 20:11:28 +04:30
Timothy Flynn 345ef6abba LibRegex: Support ECMA-262 Unicode escapes of the form "\u{code_point}"
When the Unicode flag is set, regular expressions may escape code points
by surrounding the hexadecimal code point with curly braces, e.g. \u{41}
is the character "A".

When the Unicode flag is not set, this should be considered a repetition
symbol - \u{41} is the character "u" repeated 41 times. This is left as
a TODO for now.
2021-07-23 23:06:57 +01:00
Timothy Flynn 47f6bb38a1 LibRegex: Support UTF-16 RegexStringView and improve Unicode matching
When the Unicode option is not set, regular expressions should match
based on code units; when it is set, they should match based on code
points. To do so, the regex parser must combine surrogate pairs when
the Unicode option is set. Further, RegexStringView needs to know if
the flag is set in order to return code point vs. code unit based
string lengths and substrings.
2021-07-23 23:06:57 +01:00
Brian Gianforcaro c2282ee28d Tests: Add test coverage for sys$pledge(..) argument validation 2021-07-23 19:02:25 +02:00
Brian Gianforcaro fa448456a9 Tests: Add test coverage for sys$unveil(..) argument validation 2021-07-23 19:02:25 +02:00
Ali Mohammad Pur d40d10aae7 AK: Implement {any,all}_of(IterableContainer&&, Predicate)
This is a generally nicer-to-use version of the existing {any,all}_of()
that doesn't require the user to explicitly provide two iterators.
As a bonus, it also allows arbitrary iterators (as opposed to the hard
requirement of providing SimpleIterators in the iterator version).
2021-07-22 22:56:20 +02:00
Ali Mohammad Pur 6c9ef20010 AK: Add a CommonType<Ts...> type trait
Also adds a simple-ish test for CommonType.
2021-07-22 22:56:20 +02:00
Timothy Flynn 9b83cd1abf AK: Add Utf16View for decoding UTF-16 strings
Also includes a way to transcode from and to UTF-8 strings.
2021-07-22 09:10:44 +02:00
Andreas Kling c7d891765c LibGfx: Use "try_" prefix for static factory functions
Also mark them as [[nodiscard]].
2021-07-21 18:02:15 +02:00
Andrew Kaster 64aac345d3 AK: Use new Formatter for each element in Formatter<Vector<T>>
The state of the formatter for the previous element should be thrown
away for each iteration. This showed up when trying to format a
Vector<String>, since Formatter<StringView> was unhappy about some state
that gets set when it's called. Add a test for Formatter<Vector>.
2021-07-19 05:17:05 +04:30
Peter Bindels ef85c4f747 Tests: Make mmap test point to new kernel address too
During a recent commit the 64-bit kernel was moved to a different
address, breaking this test (unnoticed). This fixes it, so we can
turn on breaking x86_64 tests on the CI again.
2021-07-18 22:08:20 +02:00
Ali Mohammad Pur f364fcec5d LibRegex+Everywhere: Make LibRegex more unicode-aware
This commit makes LibRegex (mostly) capable of operating on any of
the three main string views:
- StringView for raw strings
- Utf8View for utf-8 encoded strings
- Utf32View for raw unicode strings

As a result, regexps with unicode strings should be able to properly
handle utf-8 and not stop in the middle of a code point.
A future commit will update LibJS to use the correct type of string
depending on the flags.
2021-07-18 21:10:55 +04:30