This commit implements the ISO 9660 filesystem as specified in ECMA 119.
Currently, it only supports the base specification and Joliet or Rock
Ridge support is not present. The filesystem will normalize all
filenames to be lowercase (same as Linux).
The filesystem can be mounted directly from a file. Loop devices are
currently not supported by SerenityOS.
Special thanks to Lubrsi for testing on real hardware and providing
profiling help.
Co-Authored-By: Luke <luke.wilde@live.co.uk>
...and also RangeAllocator => VirtualRangeAllocator.
This clarifies that the ranges we're dealing with are *virtual* memory
ranges and not anything else.
This enables further work on implementing KASLR by adding relocation
support to the pre-kernel and updating the kernel to be less dependent
on specific virtual memory layouts.
GCC and Clang allow us to inject a call to a function named
__sanitizer_cov_trace_pc on every edge. This function has to be defined
by us. By noting down the caller in that function we can trace the code
we have encountered during execution. Such information is used by
coverage guided fuzzers like AFL and LibFuzzer to determine if a new
input resulted in a new code path. This makes fuzzing much more
effective.
Additionally this adds a basic KCOV implementation. KCOV is an API that
allows user space to request the kernel to start collecting coverage
information for a given user space thread. Furthermore KCOV then exposes
the collected program counters to user space via a BlockDevice which can
be mmaped from user space.
This work is required to add effective support for fuzzing SerenityOS to
the Syzkaller syscall fuzzer. :^) :^)
We don't need an entirely separate VMObject subclass to influence the
location of the physical pages.
Instead, we simply allocate enough physically contiguous memory first,
and then pass it to the AnonymousVMObject constructor that takes a span
of physical pages.
This patch changes the semantics of purgeable memory.
- AnonymousVMObject now has a "purgeable" flag. It can only be set when
constructing the object. (Previously, all anonymous memory was
effectively purgeable.)
- AnonymousVMObject now has a "volatile" flag. It covers the entire
range of physical pages. (Previously, we tracked ranges of volatile
pages, effectively making it a page-level concept.)
- Non-volatile objects maintain a physical page reservation via the
committed pages mechanism, to ensure full coverage for page faults.
- When an object is made volatile, it relinquishes any unused committed
pages immediately. If later made non-volatile again, we then attempt
to make a new committed pages reservation. If this fails, we return
ENOMEM to userspace.
mmap() now creates purgeable objects if passed the MAP_PURGEABLE option
together with MAP_ANONYMOUS. anon_create() memory is always purgeable.
GCC-11 added a new option `-fzero-call-used-regs` which causes the
compiler to zero function arguments before return of a function. The
goal being to reduce the possible attack surface by disarming ROP
gadgets that might be potentially useful to attackers, and reducing
the risk of information leaks via stale register data. You can find
the GCC commit below[0].
This is a mitigation I noticed on the Linux KSPP issue tracker[1] and
thought it would be useful mitigation for the SerenityOS Kernel.
The reduction in ROP gadgets is observable using the ropgadget utility:
$ ROPgadget --nosys --nojop --binary Kernel | tail -n1
Unique gadgets found: 42754
$ ROPgadget --nosys --nojop --binary Kernel.RegZeroing | tail -n1
Unique gadgets found: 41238
The size difference for the i686 Kernel binary is negligible:
$ size Kernel Kernel.RegZerogin
text data bss dec hex filename
13253648 7729637 6302360 27285645 1a0588d Kernel
13277504 7729637 6302360 27309501 1a0b5bd Kernel.RegZeroing
We don't have any great workloads to measure regressions in Kernel
performance, but Kees Cook mentioned he measured only around %1
performance regression with this enabled on his Linux kernel build.[2]
References:
[0] d10f3e900b
[1] https://github.com/KSPP/linux/issues/84
[2] https://lore.kernel.org/lkml/20210714220129.844345-1-keescook@chromium.org/
This implements a simple bootloader that is capable of loading ELF64
kernel images. It does this by using QEMU/GRUB to load the kernel image
from disk and pass it to our bootloader as a Multiboot module.
The bootloader then parses the ELF image and sets it up appropriately.
The kernel's entry point is a C++ function with architecture-native
code.
Co-authored-by: Liav A <liavalb@gmail.com>
By default, the compiler will assume that `operator new` returns
pointers that are aligned correctly for every built-in type. This is not
the case in the kernel on x64, since the assumed alignment is 16
(because of long double), but the kmalloc blocks are only
`alignas(void*)`.
This adds a new section .ksyms at the end of the linker map, reserves
5MiB for it (which are after end_of_kernel_image so they get re-used
once MemoryManager is initialized) and then embeds the symbol map into
the kernel binary with objcopy. This also shrinks the .ksyms section to
the real size of the symbol file (around 900KiB at the moment).
By doing this we can make the symbol map available much earlier in the
boot process, i.e. even before VFS is available.
The previous allocator was very naive and kept the state of all pages
in one big bitmap. When allocating, we had to scan through the bitmap
until we found an unset bit.
This patch introduces a new binary buddy allocator that manages the
physical memory pages.
Each PhysicalRegion is divided into zones (PhysicalZone) of 16MB each.
Any extra pages at the end of physical RAM that don't fit into a 16MB
zone are turned into 15 or fewer 1MB zones.
Each zone starts out with one full-sized block, which is then
recursively subdivided into halves upon allocation, until a block of
the request size can be returned.
There are more opportunities for improvement here: the way zone objects
are allocated and stored is non-optimal. Same goes for the allocation
of buddy block state bitmaps.
This involves refactoring VirtIOConsole into VirtIOConsole and
VirtIOConsolePort. VirtIOConsole is the VirtIODevice, it owns multiple
VirtIOConsolePorts as well as two control queues. Each
VirtIOConsolePort is a CharacterDevice.
This replaces all uses of LexicalPath in the Kernel with the functions
from KLexicalPath. This also allows the Kernel to stop including
AK::LexicalPath.
This adds KLexicalPath, which are a few static functions which aim to
mostly emulate AK::LexicalPath. They are however constrained to work
with absolute paths only, containing no '.' or '..' path segments and no
consecutive slashes. This way, it is possible to avoid use StringView
for the return values and thus avoid allocating new String objects.
As explained above, the functions are currently very strict about the
allowed input paths. This seems to not be a problem currently. Since the
functions VERIFY this, potential bugs caused by this will become
immediately obvious.
Instead of using one file for the entire "backend" of the ProcFS data
and metadata, we could split that file into two files that represent
2 logical chunks of the ProcFS exposed objects:
1. Global and inter-process information. This includes all fixed data in
the root folder of the ProcFS, networking information and the bus
folder.
2. Per-process information. This includes all dynamic data about a
process that resides in the /proc/PID/ folder.
This change makes it more easier to read the code and to change it,
hence we do it although there's no technical benefit from it now :)
The new ProcFS design consists of two main parts:
1. The representative ProcFS class, which is derived from the FS class.
The ProcFS and its inodes are much more lean - merely 3 classes to
represent the common type of inodes - regular files, symbolic links and
directories. They're backed by a ProcFSExposedComponent object, which
is responsible for the functional operation behind the scenes.
2. The backend of the ProcFS - the ProcFSComponentsRegistrar class
and all derived classes from the ProcFSExposedComponent class. These
together form the entire backend and handle all the functions you can
expect from the ProcFS.
The ProcFSExposedComponent derived classes split to 3 types in the
manner of lifetime in the kernel:
1. Persistent objects - this category includes all basic objects, like
the root folder, /proc/bus folder, main blob files in the root folders,
etc. These objects are persistent and cannot die ever.
2. Semi-persistent objects - this category includes all PID folders,
and subdirectories to the PID folders. It also includes exposed objects
like the unveil JSON'ed blob. These object are persistent as long as the
the responsible process they represent is still alive.
3. Dynamic objects - this category includes files in the subdirectories
of a PID folder, like /proc/PID/fd/* or /proc/PID/stacks/*. Essentially,
these objects are always created dynamically and when no longer in need
after being used, they're deallocated.
Nevertheless, the new allocated backend objects and inodes try to use
the same InodeIndex if possible - this might change only when a thread
dies and a new thread is born with a new thread stack, or when a file
descriptor is closed and a new one within the same file descriptor
number is opened. This is needed to actually be able to do something
useful with these objects.
The new design assures that many ProcFS instances can be used at once,
with one backend for usage for all instances.
The intention is to add dynamic mechanism for notifying the userspace
about hotplug events. Currently, the DMI (SMBIOS) blobs and ACPI tables
are exposed in the new filesystem.
Neither the kernel nor LibELF support loading libraries with larger
PT_LOAD alignment. The default on x86 is 4096 while it's 2MiB on x86_64.
This changes the alignment to 4096 on all platforms.