The code in Spinlock.h has no architectural specific logic, thus can be
moved to the Arch directory. This contains no functional change.
Also add the Spinlock.cpp file for aarch64 which contains stubs for the
lock and unlock functions.
Previously the embedmap.sh script generated a warning, since there was
no section defined where the actual kernel.map could be stored. This is
necesarry for generating kernel backtraces.
This compiler builtin abstracts away the specifics of fetching the frame
pointer. This will allow the KSyms.cpp to be build for the aarch64
target. While we're here, lets also change the
PerformanceEventBuffer.cpp to not rely on x86_64 specifics.
Previously in the aarch64 Kernel, this would cause dbgln() to actually
print more characters of the next string in memory, because strings in
the Kernel are not zero terminated by default. Prevent this by using the
passed in length of the string.
When calling dbgln(), the formatting code in AK/Format.h calls
Processor::is_initialized() to determine whether to add some text about
the current processor to the debug output. Instead of crashing, we just
return false, such that we can use dbgln() etc in the aarch64 Kernel.
This allows us to use the AK formatting functions in the aarch64 Kernel.
Also add FIXME to make sure that this file will be removed when the
proper abstractions are in place in the normal Kernel/kprintf.cpp.
The compiler figured out that the MemoryManager is not initialised, and
thus MemoryManager::the() cannot return a valid reference. Once the
necesarry code is in place, this compiler flag can be removed.
Coverage tools like LLVM's source-based coverage or GNU's --coverage
need to be able to write out coverage files from any binary, regardless
of its security posture. Not ignoring these pledges and veils means we
can't get our coverage data out without playing some serious tricks.
However this is pretty terrible for normal exeuction, so only skip these
checks when we explicitly configured userspace for coverage.
It doesn't make sense after introduction of routing table which allows
having multiple gateways for every interface, and isn't used by any of
the userspace programs now.
This will allow using the console tty and WindowServer regardless of
your kernel command line. Also this fixes a bug where, when booting in
text mode, the console was in graphical mode, and would not accept
input.
That code used the old AK::Result container, which leads to overly
complicated initialization flow when trying to figure out the correct
partition table type. Instead, when using the ErrorOr container the code
is much simpler and more understandable.
Previously the system had no concept of assigning different routes for
different destination addresses as the default gateway IP address was
directly assigned to a network adapter. This default gateway was
statically assigned and any update would remove the previously existing
route.
This patch is a beginning step towards implementing #180. It implements
a simple global routing table that is referenced during the routing
process. With this implementation it is now possible for a user or
service (i.e. DHCP) to dynamically add routes to the table.
The routing table will select the most specific route when possible. It
will select any direct match between the destination and routing entry
addresses. If the destination address overlaps between multiple entries,
the Kernel will use the longest prefix match, or the longest number of
matching bits between the destination address and the routing address.
In the event that there is no entries found for a specific destination
address, this implementation supports entries for a default route to be
set for any specified interface.
This is a small first step towards enhancing the system's routing
capabilities. Future enhancements would include referencing a
configuration file at boot to load pre-defined static routes.
I've noticed that the KVM hypervisor vendor ID string contained null
terminators in the serialized JSON string in /proc/cpuinfo - let's avoid
that, and err on the side of caution and strip them from all strings
built from CPUID register values. They may not be fixed width after all.
This creates all interfaces when the device is enumerated, with a link
to the configuration that it is a part of. As such, a new class,
`USBInterface` has been introduced to express this state.
Some other parts of the USB stack may require us to perform a control
transfer. Instead of abusing `friend` to expose the default pipe, let's
just expose it via a function.
This also introduces a new class, `USBConfiguration` that stores a
configuration. The device, when instructed, sets this configuration and
holds a pointer to it so we have a record of what configuration is
currently active.
AnonymousFile always allocates in multiples of a page size when created
with anon_create. This is especially an issue if we use AnonymousFile
shared memory to store a shared data structure that isn't exactly a
multiple of a page in size. Therefore, we can just allow mmaps of
AnonymousFile to map only an initial part of the shared memory.
This makes SharedSingleProducerCircularQueue work when it's introduced
later.
In most cases it's safe to abort the requested operation and go forward,
however, in some places it's not clear yet how to handle these failures,
therefore, we use the MUST() wrapper to force a kernel panic for now.
On the QEMU microvm machine type, it became apparent that the BIOS was
not setting the i8042 controller to function as expected. To ensure that
the controller is always outputting correct scan codes, set it to scan
code 2 and enable first port translation to ensure all scan codes are
translated to scan code set 1. This is the expected behavior when using
SeaBIOS, but on qboot (the BIOS for the QEMU microvm machine type), the
firmware doesn't take care of this so we need to do this ourselves.
This keeps us from accidentally overwriting an already set region name,
for example when we are mapping a file (as, in this case, the file name
is already stored in the region).
Since KASLR was added kernel_load_base only signifies the address at
which the kernel image start, not the start of kernel memory, meaning
that a valid kernel stack can be allocated before it in memory.
We use kernel_mapping_base, the lowest address covered by the kernel
page directory, as the minimal address when performing safety checks
during backtrace generation.
When we lock a mutex, eventually `Thread::block` is invoked which could
in turn invoke `Process::big_lock().restore_exclusive_lock()`. This
would then try to add the current thread to a different blocked thread
list then the one in use for the original mutex being locked, and
because it's an intrusive list, the thread is removed from its original
list during the `.append()`. When the original mutex eventually
unblocks, we no longer have the thread in the intrusive blocked threads
list and we panic.
Solve this by making the big lock mutex special and giving it its own
blocked thread list. Because the process big lock is temporary and is
being actively removed from e.g. syscalls, it's a matter of time before
we can also remove the fix introduced by this commit.
Fixes issue #9401.
If we unregister from the RegionTree before unmapping, there's a race
where a new region can get inserted at the same address that we're about
to unmap. If this happens, ~Region() will then unmap the newly inserted
region, which now finds itself with cleared-out page table entries.
This had no business being in RegionTree, since RegionTree doesn't track
identity-mapped regions anyway. (We allow *any* address to be identity
mapped, not just the ones that are part of the RegionTree's range.)
This patch adds RegionTree::get_lock() which exposes the internal lock
inside RegionTree. We can then lock it from the outside when doing
lookups or traversal.
This solution is not very beautiful, we should find a way to protect
this data with SpinlockProtected or something similar. This is a stopgap
patch to try and fix the currently flaky CI.
This syscall ends up disabling interrupts while changing the time,
and the clock is a global resource anyway, so preventing threads in the
same process from running wouldn't solve anything.
Let's use terminology from the the Intel manual to avoid confusion.
Also add `_string` suffixes to better distinguish the numeric values
from the string values.
...and remove the last remaining client of the API. It's no longer
possible to ask the RegionTree for a VM range. You can only ask it to
place your Region somewhere in available space.
This patch move AddressSpace (the per-process memory manager) to using
the new atomic "place" APIs in RegionTree as well, just like we did for
MemoryManager in the previous commit.
This required updating quite a few places where VM allocation and
actually committing a Region object to the AddressSpace were separated
by other code.
All you have to do now is call into AddressSpace once and it'll take
care of everything for you.
Instead of first allocating the VM range, and then inserting a region
with that range into the MM region tree, we now do both things in a
single atomic operation:
- RegionTree::place_anywhere(Region&, size, alignment)
- RegionTree::place_specifically(Region&, address, size)
To reduce the number of things we do while locking the region tree,
we also require callers to provide a constructed Region object.
This patch ports MemoryManager to RegionTree as well. The biggest
difference between this and the userspace code is that kernel regions
are owned by extant OwnPtr<Region> objects spread around the kernel,
while userspace regions are owned by the AddressSpace itself.
For kernelspace, there are a couple of situations where we need to make
large VM reservations that never get backed by regular VMObjects
(for example the kernel image reservation, or the big kmalloc range.)
Since we can't make a VM reservation without a Region object anymore,
this patch adds a way to create unbacked Region objects that can be
used for this exact purpose. They have no internal VMObject.)
RegionTree holds an IntrusiveRedBlackTree of Region objects and vends a
set of APIs for allocating memory ranges.
It's used by AddressSpace at the moment, and will be used by MM soon.
This patch stops using VirtualRangeAllocator in AddressSpace and instead
looks for holes in the region tree when allocating VM space.
There are many benefits:
- VirtualRangeAllocator is non-intrusive and would call kmalloc/kfree
when used. This new solution is allocation-free. This was a source
of unpleasant MM/kmalloc deadlocks.
- We consolidate authority on what the address space looks like in a
single place. Previously, we had both the range allocator *and* the
region tree both being used to determine if an address was valid.
Now there is only the region tree.
- Deallocation of VM when splitting regions is no longer complicated,
as we don't need to keep two separate trees in sync.
This is important for dmidecode because it does an fstat on the DMI
blobs, trying to figure out their size. Because we already know the size
of the blobs when creating the SysFS components, there's no performance
penalty whatsoever, and this allows dmidecode to not use the /dev/mem
device as a fallback.
The current implementation of read/write will fail in StorageDevice
when the request length is less than the block size of the underlying
device. Fix it by calculating the offset within a block for such cases
and using it for copying data from the bounce buffer.
8233da3398 introduced a not-so-subtle bug
where an application with an existing pledge set containing `no_error`
could elevate its pledge set by pledging _anything_, this commit makes
sure that no new promise is accepted.
Some error indication was done by returning bool. This was changed to
propagate the error by ErrorOr from the underlying functions. The
returntype of the underlying functions was also changed to propagate the
error.
We're now able to detect all the regular CPUID feature flags from
ECX/EDX for EAX=1 :^)
None of the new ones are being used for anything yet, but they will show
up in /proc/cpuinfo and subsequently lscpu and SystemMonitor.
Note that I replaced the periods from the SSE 4.1 and 4.2 instructions
with underscores, which matches the internal enum names, Linux's
/proc/cpuinfo and the general pattern of replacing special characters
with underscores to limit feature names to [a-z0-9_].
The enum member stringification has been moved to a new function for
better re-usability and to avoid cluttering up Processor.cpp.
This will make it possible to add many, many more CPU features - more
than the current limit 32 and later limit of 64 if we stick with an enum
class to be specific :^)
Checks of ECX go before EDX, and the bit indices are now ordered
properly. Additionally, handling of the EDX[11] bit has been moved into
a lambda function to keep the series of if statements neatly together.
All of this makes it *a lot* easier to follow along and compare the
implementation to the tables in the Intel manual, e.g. to find missing
checks.
This solves a problem where any non-trivial member in the global BSP
Processor instance would get re-initialized (improperly), losing data
that was already initialized earlier.
Expose the block size variable via a member function in the
AsyncBlockDeviceRequest so that the driver doesn't need to assume any
value such as 512 bytes.
The underlying driver does not need to recalculate the buffer size as
it is passed in the AsyncBlockDevice struct anyway. This also helps in
removing any assumptions of the underlying block size of the device.
This makes pledge() ignore promises that would otherwise cause it to
fail with EPERM, which is very useful for allowing programs to run under
a "jail" so to speak, without having them termiate early due to a
failing pledge() call.
Contrary to the past, we don't attempt to assume the real name of a TTY
device, but instead, we generate a pseudo name only when needed to do so
which is still OK because we don't break abstraction layer rules and we
still can provide userspace with the required information.
Now that we reclaim the memory range that is created by KASLR before
the start of the kernel image, there's no need to be conservative with
the KASLR offset.
The obsolete ttyname and ptsname syscalls are removed.
LibC doesn't rely on these anymore, and it helps simplifying the Kernel
in many places, so it's an overall an improvement.
In addition to that, /proc/PID/tty node is removed too as it is not
needed anymore by userspace to get the attached TTY of a process, as
/dev/tty (which is already a character device) represents that as well.
This ioctl operation will allow userspace to determine the index number
of a MasterPTY after opening /dev/ptmx and actually getting an internal
file descriptor of MasterPTY.
This will replace the /dev/tty symlink created by SystemServer, so
instead of a symlink, a character device will be created. When doing
read(2), write(2) and ioctl(2) on this device, it will "redirect" these
operations to the attached TTY of the current process.
This ensures we don't just waste the memory range between the default
base load address and the actual load address that was shifted by the
KASLR offset.
This requirement comes from the fact the Prekernel mapping logic only
uses 2 MiB pages.
This unfortunately reduces the bits of entropy in kernel addresses from
16 bits to 7, but it could be further improved in the future by making
the Prekernel mapping logic a bit more dynamic.
To do so, we now check that the framebuffer type is RGB so we know that
the Multiboot bootloader actually provided a valid framebuffer to work
with.
This fixes a problem I observed on my ICH7 test machine that apparently
the multiboot_framebuffer_addr was not null but there was no framebuffer
that was set up for RGB colors, and by initializing that console, there
was a memory curroption caused somewhere in the EBDA area to probably
cause a complete system lockup.
This helps solving an issue when we boot with text mode screen so the
Kernel initializes an early text mode console, but even after disabling
it, that console can still access VGA ports. This wouldn't be a problem
for emulated hardware but bare metal hardware might have a "conflict",
especially if the native driver explicitly request to disable the VGA
emulation.
This class already has variables named m_lock, and it's also strange
that locals are named with the `m_` prefix. So lets fix that to make
the code more readable.
Found by PVS-Studio.
C++20 provides the `requires` clause which simplifies the ability to
limit overload resolution. Prefer it over `EnableIf`
With all uses of `EnableIf` being removed, also remove the
implementation so future devs are not tempted.
Since the allocated memory is going to be zeroed immediately anyway,
let's avoid redundantly scrubbing it with MALLOC_SCRUB_BYTE just before
that.
The latest versions of gcc and Clang can automatically do this malloc +
memset -> calloc optimization, but I've seen a couple of places where it
failed to be done.
This commit also adds a naive kcalloc function to the kernel that
doesn't (yet) eliminate the redundancy like the userland does.
This is mainly useful when adding an HostController but due to OOM
condition, we abort temporary Vector insertion of a DeviceIdentifier
and then exit the iteration loop to report back the error if occured.
Instead, hold the lock while we copy the contents to a stack-based
Vector then iterate on it without any locking.
Because we rely on heap allocations, we need to propagate errors back
in case of OOM condition, therefore, both PCI::enumerate API function
and PCI::Access::add_host_controller_and_enumerate_attached_devices use
now a ErrorOr<void> return value to propagate errors. OOM Error can only
occur when enumerating the m_device_identifiers vector under a spinlock
and trying to expand the temporary Vector which will be used locklessly
to actually iterate over the PCI::DeviceIdentifiers objects.
This code attempts to copy the `Protocol::Resource3DSpecification`
struct into request, starting at `Protocol::ResourceCreate3D::target`
member of the `Protocol::ResourceCreate3D` struct.
The problem is that the `Protocol::Resource3DSpecification` struct
does not having the trailing `u32 padding` that the `ResourceCreate3D`
struct has. Leading to memcopy overrunning the struct and corrupting
32 bits of data trailing the struct.
Found by SonarCloud:
- Memory copy function overflows the destination buffer.
As there is no need for a Prekernel on aarch64, the Prekernel code was
moved into Kernel itself. The functionality remains the same.
SERENITY_KERNEL_AND_INITRD in run.sh specifies a kernel and an inital
ramdisk to be used by the emulator. This is needed because aarch64
does not need a Prekernel and the other ones do.
These fences should not be needed, since we force the use of
synchronous operations through synchronous_virtio_gpu_command. The use
of these fences also causes severe lag when SERENITY_GL is enabled.
This commit flips VirtIOGPU back to using a Mutex for its operation
lock (instead of a spinlock). This is necessary for avoiding a few
system hangs when queuing actions on the driver from multiple
processes, which becomes much more of an issue when using VirGL from
multiple userspace process.
This does result in a few code paths where we inevitably have to grab
a mutex from inside a spinlock, the only way to fix both issues is to
move to issuing asynchronous virtio gpu commands.
The project appears to build just fine without it, and the explicit use
of `LibC` causes it to conflict with the system-wide `fd_set.h` when
building inside of Serenity.
This additionally refactors FramebufferDevice::try_to_initialize to not
leave the FramebufferDevice in an invalid state on errors.
This also unifies the logic between FramebufferDevice::mmap and
FramebufferDevice::try_to_initialize.
This comes with the drawback of removing the UNMAP_AFTER_INIT attribute
from this function, which wasn't honoured by IntelNativeGraphicsAdapter
anyway.
If init crashes, all other userspace processes exit too, thus rendering
the system unusable. Previously, the kernel would still keep running
even without a userland, showing just a black screen without any
indication of the issue.
We now panic the kernel, which shows a message on the console. In the
case of the CI runners, it shuts down the virtual machine, so we don't
have to wait for the 1 hour timeout if an issue arises with
SystemServer.