Some of the code assumed that chars were always signed while that is
not the case on ARM hosts.
Also, some of the code tried to use EOF (-1) in a way similar to what
fgetc() does, however instead of storing the characters in an int
variable a char was used.
While this seemed to work it also meant that character 0xFF would be
incorrectly seen as an end-of-file.
Careful reading of fgetc() reveals that fgetc() stores character
data in an int where valid characters are in the range of 0-255 and
the EOF value is explicitly outside of that range (usually -1).
Doing these as custom classes might be faster, especially when writing
them in SSE, but this would cause a lot of Code duplication and due to
the nature of constexprs and the intelligence of the compiler they might
be using SSE/MMX either way
It's prone to finding "technically uninitialized but can never happen"
cases, particularly in Optional<T> and Variant<Ts...>.
The general case seems to be that it cannot infer the dependency
between Variant's index (or Optional's boolean state) and a particular
alternative (or Optional's buffer) being untouched.
So it can flag cases like this:
```c++
if (index == StaticIndexForF)
new (new_buffer) F(move(*bit_cast<F*>(old_buffer)));
```
The code in that branch can _technically_ make a partially initialized
`F`, but that path can never be taken since the buffer holding an
object of type `F` and the condition being true are correlated, and so
will never be taken _unless_ the buffer holds an object of type `F`.
This commit also removed the various 'diagnostic ignored' pragmas used
to work around this warning, as they no longer do anything.
This commit makes it possible to instantiate `Vector<T&>` and use it
to store references to `T` in a vector.
All non-pointer observers are made to return the reference, and the
pointer observers simply yield the underlying pointer.
Note that the 'find_*' methods act on the values and not the pointers
that are stored in the vector.
This commit also makes errors in various vector methods much more
readable by directly using requires-clauses on them.
And finally, it should be noted that Vector cannot hold temporaries :^)
The methods of this class were all over the place, this commit reorders
them to place them in a more logical order:
- Constructors/Destructor
- Observers
- Comparisons and const existence checks
- Mutators: insert
- Mutators: append
- Mutators: prepend
- Mutators: assignment
- Mutators: remove
- OOM-safe mutators: insert
- OOM-safe mutators: append
- OOM-safe mutators: prepend
- OOM-safe size management
- Size management
- Iterators
This fixes a bug where a Utf8View was created with data from a temporary
string, which was immediately deleted. This lead to a use-after-free
issue. This also changes most occurences for StringBuilder::to_string in
URLParser to use ::string_view(), as the value is passed as StringView
const& most of the time anyways.
This fixes oss-fuzz issue 34973.
When a code point is invalid, the full string was outputted to the debug
output. For large strings, this can make the system quite slow.
Furthermore, one of the cases incorrectly assumed the data to be null
terminated. This patch modifies the debug statements not to print the
full string.
This fixes oss-fuzz issue 35050.
Other software might not expect these to be defined and behave
differently if they _are_ defined, e.g. scummvm which checks if
the TODO macro is defined and fails to build if it is.
This commit initializes the LibVideo library and implements parsing
basic Matroska container files. Currently, it will only parse audio
and video tracks.
Previously, AK::Function would accept _any_ callable type, and try to
call it when called, first with the given set of arguments, then with
zero arguments, and if all of those failed, it would simply not call the
function and **return a value-constructed Out type**.
This lead to many, many, many hard to debug situations when someone
forgot a `const` in their lambda argument types, and many cases of
people taking zero arguments in their lambdas to ignore them.
This commit reworks the Function interface to not include any such
surprising behaviour, if your function instance is not callable with
the declared argument set of the Function, it can simply not be
assigned to that Function instance, end of story.
This changes URL parser to use the 0xFFFFFFFF constant instead of 0 to
indicate end of file. This fixes a bug where inputs containing null
bytes would terminate the parser early, because they were interpreted
as end of file.
This patch adds a state_name method to URLParser to convert a state to a
string. With this, the debugging statements now display the state names.
Furthermore, this fixes a bug where non-ASCII code points were
formatted as characters, which fails an assertion in the formatting
system.
Because non-ASCII code points have negative byte values, trimming away
control characters requires checking for negative bytes values.
This also adds a test case with a URL containing non-ASCII code points.
Ideally a Function should not be clear()ed while it's running.
Unfortunately a number of callers do just that and identifying all of
them would require inspecting all call sites for operator() and clear().
Instead this adds support for deferring clear() until after the
function has finished executing.
Previously this was generating a crazy number of symbols, and it was
also pretty-damn-slow as it was defined recursively, which made the
compiler incapable of inlining it (due to the many many layers of
recursion before it terminated).
This commit replaces the recursion with a pack expansion and marks it
always-inline.
StringView::lines() supports line-separators “\n”, “\r”, and “\r\n”.
The method will drop an entire line if it is surrounded by “\r”
and “\n” separators on the left and right sides respectively.
The previous behavior was to always VERIFY that the UTF-8 bytes were
valid when iterating over the code points of an UTF8View. This change
makes it so we instead output the 0xFFFD 'REPLACEMENT CHARACTER'
code point when encountering invalid bytes, and keep iterating the
view after skipping one byte.
Leaving the decision to the consumer would break symmetry with the
UTF32View API, which would in turn require heavy refactoring and/or
code duplication in generic code such as the one found in
Gfx::Painter and the Shell.
To make it easier for the consumers to detect the original bytes, we
provide a new method on the iterator that returns a Span over the
data that has been decoded. This method is immediately used in the
TextNode::compute_text_for_rendering method, which previously did
this in a ad-hoc waay.
This also add tests for the new behavior in TestUtf8.cpp, as well
as reinforcements to the existing tests to check if the underlying
bytes match up with their expected values.
This replaces ctype.h with CharacterType.h everywhere I could find
issues with narrowing conversions. While using it will probably make
sense almost everywhere in the future, the most critical places should
have been addressed.
This patch introduces CharacterTypes.h, which aims to be a replacement
for most usages of ctype.h. In contrast to that implementation, this
header makes use of exclusively constexpr functions and support the full
Unicode code point set without any narrowing-conversion issues.
In order for IntrusiveList to be capable of replacing InlineLinkedList,
it needs to support reverse iteration. InlineLinkedList currently
supports manual reverse iteration by calling list->last() followed by
node->prev().
Previously we'd incur the costs for a function call via the PLT even
for the most trivial ref-count actions like increasing/decreasing the
reference count.
By moving the code to the header file we allow the compiler to inline
this code into the caller's function.
Checking for this (and get()'ing it) is always invalid, so let's just
disallow it.
This also finds two bugs where the code is checking for types that can
never actually be in the variant (which was actually a refactor
artifact).
This changes the URL class to use the correct constness for getters,
setters and other methods. It also changes the entire class to use east
const style.
This patch introduces a new operator== to compare an Optional to its
contained type directly. If the Optional does not contain a value, the
comparison will always return false.
This also adds a test case for the new behavior as well as comparison
between Optional objects themselves.
This adds a hostname parameter as the third parameter to
URL::create_with_file_scheme(). If the hostname is "localhost", it will
be ignored (as per the URL specification).
This can for example be used by ls(1) to create more conforming file
URLs.
The m_path member variable has been superseded by m_paths. Thus, it has
been removed. The path() getter will continue to exist as a convenience
method for getting the path joined together as a string.
This adds URL serialization methods which are more in line with the
specification.
The serialize_for_display() method should be used e.g. in the browser
address bar, and as per the spec should not display username and
password. Furthermore, it could decode most percent-encoded code points,
although that is not implemented yet.
This adds a new URL parser, which aims to be compliant with the URL
specification (https://url.spec.whatwg.org/). It also contains a
rudimentary data URL parser.
This adds a few helper functions and a private constructor to
instantiate a data URL to the URL class. These will be needed by the
upcoming URL parser.
This adds the m_username, m_password, m_paths and m_cannot_be_a_base_url
member variables to the URL class. These are necessary for the upcoming
new URL parser.
The deprecated m_path variable shadows the m_paths variable if it is
non-null. This behavior will be removed once the old URL parser has been
removed.
This removes URLParser, because its two exposed functions, urlencode()
and urldecode(), have been superseded by URL::percent_encode() and
URL::percent_decode(). This is in preparation for the introduction of a
new URL parser.
This replaces all occurrences of those functions with the newly
implemented functions URL::percent_encode() and URL::percent_decode().
The old functions will be removed in a further commit.
This adds a few new functions to percent encode/decode strings according
to the URL specification. The functions allow specifying a
PercentEncodeSet, which is defined by the specification. It will be used
to replace the current urlencode() and urldecode() functions in a
further commit.
This commit adds a few duplicate helper functions in the URL class, such
as is_digit() and is_ascii_digit(). This will be cleaned up as soon as
the upcoming new URL parser will replace the current one.
This adds a peek method for Utf8CodepointIterator, which enables it to
be used in some parsing cases where peeking is necessary.
peek(0) is equivalent to operator*, expect that peek() does not contain
any assertions and will just return an empty Optional<u32>.
This also implements a test case for iterating UTF-8.
This renames all references to protocol to scheme, which is the name
used by the URL standard (https://url.spec.whatwg.org/). Externally, all
methods referencing "protocol" were duplicated with "scheme". The old
methods still exist as compatibility.
This patch removes unnecessary function parameter names in declarations
of the URL class. It also changes parameter types from String to
StringView where applicable.
For non-x86 targets, it's not very nice to define inline functions in
AK/Memory.h with asm volatile implementations. Guard this inline
assembly with ARCH(I386) and provide portable alternatives. Since we
always compile with optimizations, the hand-vectorized memset and
memcpy seem to be of dubious value, but we'll keep them here until
proven one way or another.
This should fix the Lagom build on native M1 macOS that was reported
on Discord the other day.
Previously the StringBuilder class would use memcpy() to write
directly into the ByteBuffer's buffer. Instead we should use the
append() method which ensures we don't overrun the buffer.
This allows us to mark the slow part (i.e. where we copy the buffer) as
NEVER_INLINE because this should almost never get called and therefore
should also not get inlined into callers.
Previously ByteBuffer::grow() behaved like Vector<T>::resize().
However the function name was somewhat ambiguous - and so this patch
updates ByteBuffer to behave more like Vector<T> by replacing grow()
with resize() and adding an ensure_capacity() method.
This also lets the user change the buffer's capacity without affecting
the size which was not previously possible.
Additionally this patch makes the capacity() method public (again).
Previously, we would go crazy and shift things way out of bounds.
Add tests to verify that the decoding algorithm is safe around the
limits of the result type.
printf didn't check whether the additional integer variable belongs to
the field width specifier or to the precision specifier, and always
applied it to the field width instead.
Implement the case distinction that we already use in literal width
and precision specifiers for the variable version as well so that
they are correctly attributed.
These dbgln's caused excessive load in the WebServer process,
accounting for ~67% of the processing time when serving a webpage
with a bunch of resources like serenityos.org/happy/2nd/.
We want to discourage folks from using APIs which lull you into a sense
of false safety in terms of OOM. There are cases where you want to force
allocations to succeed or crash, but those should use a more explicit
API than `AK::adopt_own(.)`.
The ASAN_[UN]POISON_MEMORY_REGION macros can be used to manually notify
the AddressSanitizer runtime about the reachability of instrumented code
accessing a memory region. This is most useful for manually managed
heaps and arenas that do not go directly to malloc or alligned_alloc.
Previously GCC came to the conclusion that we were reading
m_outline_capacity via ByteBuffer(ByteBuffer const&) -> grow()
-> capacity() even though that could never be the case because
m_size is 0 at that point which means we have an inline buffer
and capacity() would return inline_capacity in that case without
reading m_outline_capacity.
This makes GCC inline parts of the grow() function into the
ByteBuffer copy constructor which seems sufficient for GCC to
realize that m_outline_capacity isn't actually being read.
When compiling the Kernel with Og, the compiler complains that
m_outline_capacity might be uninitialized when calling capacity()
Note that this fix is not really what we want. Ideally only outline
buffer and outline capacity would need initialized, not the entire
inline buffer. However, clang considers the class to not be
default-constructible if we make that change, while gcc accepts it.
We had two functions for doing mostly the same thing. Combine both
of them into String::find() and use that everywhere.
Also add some tests to cover basic behavior.
Problem:
- The constructor is defined to be the default constructor.
Solution:
- Let the compiler generate the destructor by setting it to the
default.
We were accidentally calling calculate_base64_decoded_length instead,
which resulted in extra allocations during the StringBuilder::append
calls that can be avoided.
This patch implements a Unicode-safe substring method, which can be used
when offset and length should be specified in actual characters instead
of bytes.
This can be used to mitigate issues where a string is split in the
middle of a UTF-8 multi-byte character, which leads to invalid UTF-8.
Furthermore, it implements to common shorthands for substring methods
which take only an offset and return the substring until the end of the
string.
If a line was larger than 1024 bytes or the file ended without a
newline character, can_read_line would return false.
IODevice::can_read_line() now reads until a newline is found or
EOF is reached.
fixes#5907
This introduces the UnicodeUtils file, which contains helper functions
related to Unicode. This is in contrast to StringUtils, whose functions
are not directly related to Unicode and are, in theory,
encoding-agnostic.
Removing the element from the intrusive linked list might not be safe
if doing so requires a lock. Instead this is something the caller
should have done so let's verify instead that we're not on any lists.
Problem:
- Bitmasks are duplicated.
- Bitmasks are C-style arrays.
Solution:
- Move bitmasks to BitmapView.h.
- Change C-style arrays to be AK::Array for added safety.
Previously <AK/Function.h> also included <AK/OwnPtr.h>. That's about to
change though. This patch fixes a few build problems that will occur
when that change happens.
This changes Variant::visit() to forward the value returned by the
selected visitor invocation. By perfectly forwarding the returned value,
this allows for the visitor to return by value or reference.
Note that all provided visitors must return the same type - the compiler
will otherwise fail with the message: "inconsistent deduction for auto
return type".
Problem:
- Function local `constexpr` variables do not need to be
`static`. This consumes memory which is unnecessary and can prevent
some optimizations.
- C-style arrays are not as safe as AK::Arrays and require the user to
specify the length of the array manually.
Solution:
- Remove `static` keyword.
- Change from C-style array for AK::Array.
In case the write was to stderr/stdout, and it just so happened to fail
because of an issue like "the pty is gone", VERIFY() would end up
calling vout() back to write to stderr, which would then fail forever
until the stack is exhausted.
"Fixes" the issue where the Shell would crash in horrible ways when the
terminal is closed.
Previously StringBuilder would start allocating an external buffer
once the caller has used up more than half of the inline buffer's
capacity. Instead we should prefer to use the inline buffer until
it is full and only then start to allocate an external buffer.
Problem:
- `BitmapView` permits changing the underlying `Bitmap`. This violates
the idea of a "view" since views are simply overlays which can
themselves change but do not change the underlying data.
Solution:
- Migrate all non-`const` member functions to Bitmap.
Problem:
- Static variables take memory and can be subject to less optimization.
- This static variable is only used in 1 place.
Solution:
- Move the variable into the function and make it non-static.
This was removed as part of the ByteBuffer changes but the allocation
optimization is still necessary at least for non-SerenityOS targets
where malloc_good_size() isn't supported or returns a small value and
causes a whole bunch of unnecessary reallocations.
As the parser now flattens out the instructions and inserts synthetic
nesting/structured instructions where needed, we can treat the whole
thing as a simple parsed bytecode stream.
This currently knows how to execute the following instructions:
- unreachable
- nop
- local.get
- local.set
- {i,f}{32,64}.const
- block
- loop
- if/else
- branch / branch_if
- i32_add
- i32_and/or/xor
- i32_ne
This also extends the 'wasm' utility to optionally execute the first
function in the module with optionally user-supplied arguments.
Problem:
- `BitmapView` permits changing the underlying `Bitmap`. This violates
the idea of a "view" since views are simply overlays which can
themselves change but do not change the underlying data.
Solution:
- Migrate all non-`const` member functions to Bitmap.
The current code is factored such that reads to the entirety of the last
byte should be dropped. This was relying on the fact that last would be
one past the end in that case. Instead of actually reading that byte
when it's completely out of bounds of the bitmask, just skip reads that
would be invalid. Add more tests to make sure that the behavior is
correct for byte aligned reads of byte aligned bitmaps.
As we removed the support of VBE modesetting that was done by GRUB early
on boot, we need to determine if we can modeset the resolution with our
drivers, and if not, we should enable text mode and ensure that
SystemServer knows about it too.
Also, SystemServer should first check if there's a framebuffer device
node, which is an indication that text mode was not even if it was
requested. Then, if it doesn't find it, it should check what boot_mode
argument the user specified (in case it's self-test). This way if we
try to use bochs-display device (which is not VGA compatible) and
request a text mode, it will not honor the request and will continue
with graphical mode.
Also try to print critical messages with mininum memory allocations
possible.
In LibVT, We make the implementation flexible for kernel-specific
methods that are implemented in ConsoleImpl class.
Previously ByteBuffer would internally hold a RefPtr to the byte
buffer and would behave like a reference type, i.e. copying a
ByteBuffer would not create a duplicate byte buffer, but rather
two objects which refer to the same internal buffer.
This also changes ByteBuffer so that it has some internal capacity
much like the Vector<T> type. Unlike Vector<T> however a byte
buffer's data may be uninitialized.
With this commit ByteBuffer makes use of the kmalloc_good_size()
API to pick an optimal allocation size for its internal buffer.
This commit replaces the former, hand-written parser with a new one that
can be generated automatically according to a state change diagram.
The new `EscapeSequenceParser` class provides a more ergonomic interface
to dealing with escape sequences. This interface has been inspired by
Alacritty's [vte library](https://github.com/alacritty/vte/).
I tried to avoid changing the application logic inside the `Terminal`
class. While this code has not been thoroughly tested, I can't find
regressions in the basic command line utilities or `vttest`.
`Terminal` now displays nicer debug messages when it encounters an
unknown escape sequence. Defensive programming and bounds checks have
been added where we access parameters, and as a result, we can now
endure 4-5 seconds of `cat /dev/urandom`. :D
We generate EscapeSequenceStateMachine.h when building the in-kernel
LibVT, and we assume that the file is already in place when the userland
library is being built. This will probably cause problems later on, but
I can't find a way to do it nicely.