Commit graph

837 commits

Author SHA1 Message Date
Liav A cbf78975f1 Kernel: Add the futimens syscall
We have a problem with the original utimensat syscall because when we
do call LibC futimens function, internally we provide an empty path,
and the Kernel get_syscall_path_argument method will detect this as an
invalid path.

This happens to spit an error for example in the touch utility, so if a
user is running "touch non_existing_file", it will create that file, but
the user will still see an error coming from LibC futimens function.

This new syscall gets an open file description and it provides the same
functionality as utimensat, on the specified open file description.
The new syscall will be used later by LibC to properly implement LibC
futimens function so the situation described with relation to the
"touch" utility could be fixed.
2023-04-10 10:21:28 +02:00
Liav A 5a94e8dfd0 Kernel: Ensure jailed processes can be reaped by a jailed parent process
We were detaching from the jail process list too early. To ensure we
detach properly, leverage the remove_from_secondary_lists method
so the possibly jailed parent process can still see the dying process
and therefore clean it properly.
2023-04-09 18:49:01 +02:00
Idan Horowitz 65641187ff Kernel: Restructure execve to ensure Process::m_space is always in use
Instead of setting up the new address space on it's own, and only swap
to the new address space at the end, we now immediately swap to the new
address space (while still keeping the old one alive) and only revert
back to the old one if we fail at any point.

This is done to ensure that the process' active address space (aka the
contents of m_space) always matches actual address space in use by it.
That should allow us to eventually make the page fault handler process-
aware, which will let us properly lock the process address space lock.
2023-04-06 20:30:03 +03:00
Andreas Kling 84ac957d7a Kernel: Make Credentials the authority on process SID
The SID was duplicated between the process credentials and protected
data. And to make matters worse, the credentials SID was not updated in
sys$setsid.

This patch fixes this by removing the SID from protected data and
updating the credentials SID everywhere.
2023-04-05 11:37:27 +02:00
Andreas Kling e69b2572a6 Kernel: Move Process's TTY pointer into protected data 2023-04-05 11:37:27 +02:00
Andreas Kling 1e2ef59965 Kernel: Move Process's process group pointer into protected data
Now that it's no longer using LockRefPtr, we can actually move it into
protected data. (LockRefPtr couldn't be stored there because protected
data is immutable at times, and LockRefPtr uses some of its own bits
for locking.)
2023-04-05 11:37:27 +02:00
Andreas Kling 1c77803845 Kernel: Stop using *LockRefPtr for TTY
TTY was only stored in Process::m_tty, so make that a SpinlockProtected.
2023-04-05 11:37:27 +02:00
Andreas Kling 3371165588 Kernel: Make the getsockname/getpeername syscall helper a bit nicer
Instead of templatizing on a bool parameter, use an enum for clarity.
2023-04-04 10:33:42 +02:00
Andreas Kling 5bc7882b68 Kernel: Make sys$times not use the big lock
...and also make the Process tick counters clock_t instead of u32.
It seems harmless to get interrupted in the middle of reading these
counters and reporting slightly fewer ticks in some category.
2023-04-04 10:33:42 +02:00
Andreas Kling 496d918e92 Kernel: Stop using *LockRefPtr for Kernel::Timer 2023-04-04 10:33:42 +02:00
Andreas Kling 83b409083b Kernel: Stop using *LockRefPtr for ProcessGroup
Had to wrap Process::m_pg in a SpinlockProtected for this to be safe.
2023-04-04 10:33:42 +02:00
Andreas Kling c3915e4058 Kernel: Stop using *LockRefPtr for Thread
These were stored in a bunch of places. The main one that's a bit iffy
is the Mutex::m_holder one, which I'm going to simplify in a subsequent
commit.

In Plan9FS and WorkQueue, we can't make the NNRPs const due to
initialization order problems. That's probably doable with further
cleanup, but left as an exercise for our future selves.

Before starting this, I expected the thread blockers to be a problem,
but as it turns out they were super straightforward (for once!) as they
don't mutate the thread after initiating a block, so they can just use
simple const-ified NNRPs.
2023-04-04 10:33:42 +02:00
Andreas Kling a098266ff5 Kernel: Simplify Process factory functions
- Instead of taking the first new thread as an out-parameter, we now
  bundle the process and its first thread in a struct and use that
  as the return value.

- Make all Process factory functions return ErrorOr. Use this to convert
  some places to more TRY().

- Drop the "try_" prefix on Process factory functions.
2023-04-04 10:33:42 +02:00
Andreas Kling 65438d8a85 Kernel: Stop using *LockRefPtr for Process pointers
The only persistent one of these was Thread::m_process and that never
changes after initialization. Make it const to enforce this and switch
everything over to RefPtr & NonnullRefPtr.
2023-04-04 10:33:42 +02:00
Liav A d16d805d96 Kernel: Merge {get,set}_process_name syscalls to the prctl syscall
It makes much more sense to have these actions being performed via the
prctl syscall, as they both require 2 plain arguments to be passed to
the syscall layer, and in contrast to most syscalls, we don't get in
these removed syscalls an automatic representation of Userspace<T>, but
two FlatPtr(s) to perform casting on them in the prctl syscall which is
suited to what has been done in the removed syscalls.

Also, it makes sense to have these actions in the prctl syscall, because
they are strongly related to the process control concept of the prctl
syscall.
2023-03-15 20:10:48 +01:00
Liav A 633006926f Kernel: Make the Jails' internal design a lot more sane
This is done with 2 major steps:
1. Remove JailManagement singleton and use a structure that resembles
    what we have with the Process object. This is required later for the
    second step in this commit, but on its own, is a major change that
    removes this clunky singleton that had no real usage by itself.
2. Use IntrusiveLists to keep references to Process objects in the same
    Jail so it will be much more straightforward to iterate on this kind
    of objects when needed. Previously we locked the entire Process list
    and we did a simple pointer comparison to check if the checked
    Process we iterate on is in the same Jail or not, which required
    taking multiple Spinlocks in a very clumsy and heavyweight way.
2023-03-12 10:21:59 -06:00
Andreas Kling e6fc7b3ff7 Kernel: Switch LockRefPtr<Inode> to RefPtr<Inode>
The main place where this is a little iffy is in RAMFS where inodes
have a LockWeakPtr to their parent inode. I've left that as a
LockWeakPtr for now.
2023-03-09 21:54:59 +01:00
Andreas Kling d1371d66f7 Kernel: Use non-locking {Nonnull,}RefPtr for OpenFileDescription
This patch switches away from {Nonnull,}LockRefPtr to the non-locking
smart pointers throughout the kernel.

I've looked at the handful of places where these were being persisted
and I don't see any race situations.

Note that the process file descriptor table (Process::m_fds) was already
guarded via MutexProtected.
2023-03-07 00:30:12 +01:00
Andreas Kling 7369d0ab5f Kernel: Stop using NonnullLockRefPtrVector 2023-03-06 23:46:36 +01:00
Andreas Kling 359d6e7b0b Everywhere: Stop using NonnullOwnPtrVector
Same as NonnullRefPtrVector: weird semantics, questionable benefits.
2023-03-06 23:46:35 +01:00
Liav A be1d7c325a Kernel: Move process coredump metadata modification to the prctl syscall 2023-03-05 16:55:08 +01:00
Liav A 11a7e21c2a Kernel+Userland: Add support for using the PCSpeaker with various tones 2023-03-05 08:38:29 +00:00
Liav A 800e244ed9 Kernel+LibC: Move the FD_SETSIZE declaration to API/POSIX/select.h file 2023-03-01 19:36:53 -07:00
Liav A bedd90b1f0 Kernel: Properly lock Process protected data in the prctl syscall 2023-02-24 22:26:07 +01:00
Liav A c56e1c5378 Kernel/FileSystem: Simplify the ProcFS significantly
Since the ProcFS doesn't hold many global objects within it, the need
for a fully-structured design of backing components and a registry like
with the SysFS is no longer true.

To acommodate this, let's remove all backing store and components of the
ProcFS, so now it resembles what we had in the early days of ProcFS in
the project - a mostly-static filesystem, with very small amount of
kmalloc allocations needed.
We still use the inode index mechanism to understand the role of each
inode, but this is done in a much "static"ier way than before.
2023-02-24 22:14:18 +01:00
Liav A 9216caeec2 Kernel: Fix typo proccess => process in a name of Process method 2023-02-24 22:14:18 +01:00
Timon Kruiper 3295137224 Kernel: Add optional userspace backtrace to Process::crash
This is very useful for debugging the initial userspace applications, as
the CrashReporter is not yet running.
2023-02-08 18:19:48 +00:00
Sam Atkins fe7b08dad7 Kernel: Protect Process::m_name with a spinlock
This also lets us remove the `get_process_name` and `set_process_name`
syscalls from the big lock. :^)
2023-02-06 20:36:53 +01:00
Liav A 722ae35329 Kernel/FileSystem: Simplify the ProcFS inode code
This is done by merging all scattered pieces of derived classes from the
ProcFSInode class into that one class, so we don't use inheritance but
rather simplistic checks to determine the proper code for each ProcFS
inode with its specific characteristics.
2023-01-29 12:59:30 +01:00
Timon Kruiper 12322670cb Kernel: Use InterruptsState abstraction in execve.cpp
This was using the x86_64 specific cpu_flags abstraction, which is not
compatible with aarch64.
2023-01-27 20:47:08 +00:00
Andrew Kaster 046c23f567 Kernel+LibC: Move LibC/signal_numbers.h to Kernel/API/POSIX
Make Userland and Tests users just include signal.h, and move Kernel
users to the new API file.
2023-01-21 10:43:59 -07:00
Andreas Kling 5dcc58d54a Kernel+LibCore: Make %sid path parsing not take ages
Before this patch, Core::SessionManagement::parse_path_with_sid() would
figure out the root session ID by sifting through /sys/kernel/processes.

That file can take quite a while to generate (sometimes up to 40ms on my
machine, which is a problem on its own!) and with no caching, many of
our programs were effectively doing this multiple times on startup when
unveiling something in /tmp/session/%sid/

While we should find ways to make generating /sys/kernel/processes fast
again, this patch addresses the specific problem by introducing a new
syscall: sys$get_root_session_id(). This extracts the root session ID
by looking directly at the process table and takes <1ms instead of 40ms.

This cuts WebContent process startup time by ~100ms on my machine. :^)
2023-01-10 19:32:31 +01:00
Liav A 04221a7533 Kernel: Mark Process::jail() method as const
We really don't want callers of this function to accidentally change
the jail, or even worse - remove the Process from an attached jail.
To ensure this never happens, we can just declare this method as const
so nobody can mutate it this way.
2023-01-07 03:44:59 +03:30
Liav A d8ebcaede8 Kernel: Add helper function to check if a Process is in jail
Use this helper function in various places to replace the old code of
acquiring the SpinlockProtected<RefPtr<Jail>> of a Process to do that
validation.
2023-01-06 17:29:47 +01:00
Nico Weber a96f307af1 Everywhere: Make global inline functions not static
`inline` already assigns vague linkage, so there's no need to
also assign per-TU linkage. Allows the linker to dedup these
functions across TUs (and is almost always just the Right Thing
to do in C++ -- this ain't C).
2023-01-04 20:04:57 +01:00
Nico Weber 0a3cc10bb6 Everywhere: Remove some redundant inline keywords
Functions defined inside class bodies (including static functions)
are implicitly inline, no need to type it out.
2023-01-04 20:04:57 +01:00
kleines Filmröllchen a6a439243f Kernel: Turn lock ranks into template parameters
This step would ideally not have been necessary (increases amount of
refactoring and templates necessary, which in turn increases build
times), but it gives us a couple of nice properties:
- SpinlockProtected inside Singleton (a very common combination) can now
  obtain any lock rank just via the template parameter. It was not
  previously possible to do this with SingletonInstanceCreator magic.
- SpinlockProtected's lock rank is now mandatory; this is the majority
  of cases and allows us to see where we're still missing proper ranks.
- The type already informs us what lock rank a lock has, which aids code
  readability and (possibly, if gdb cooperates) lock mismatch debugging.
- The rank of a lock can no longer be dynamic, which is not something we
  wanted in the first place (or made use of). Locks randomly changing
  their rank sounds like a disaster waiting to happen.
- In some places, we might be able to statically check that locks are
  taken in the right order (with the right lock rank checking
  implementation) as rank information is fully statically known.

This refactoring even more exposes the fact that Mutex has no lock rank
capabilites, which is not fixed here.
2023-01-02 18:15:27 -05:00
Liav A e598f22768 Kernel: Disallow executing SUID binaries if process is jailed
Check if the process we are currently running is in a jail, and if that
is the case, fail early with the EPERM error code.

Also, as Brian noted, we should also disallow attaching to a jail in
case of already running within a setid executable, as this leaves the
user with false thinking of being secure (because you can't exec new
setid binaries), but the current program is still marked setid, which
means that at the very least we gained permissions while we didn't
expect it, so let's block it.
2022-12-30 15:49:37 -05:00
Liav A 6c0486277e Kernel: Reintroduce the msyscall syscall as the annotate_mapping syscall
This syscall will be used later on to ensure we can declare virtual
memory mappings as immutable (which means that the underlying Region is
basically immutable for both future annotations or changing the
protection bits of it).
2022-12-16 01:02:00 -07:00
Agustin Gianni ac40090583 Kernel: Add the auxiliary vector to the stack size validation
This patch validates that the size of the auxiliary vector does not
exceed `Process::max_auxiliary_size`. The auxiliary vector is a range
of memory in userspace stack where the kernel can pass information to
the process that will be created via `Process:do_exec`.

The reason the kernel needs to validate its size is that the about to
be created process needs to have remaining space on the stack.
Previously only `argv` and `envp` were taken into account for the
size validation, with this patch, the size of `auxv` is also
checked. All three elements contain values that a user (or an
attacker) can specify.

This patch adds the constant `Process::max_auxiliary_size` which is
defined to be one eight of the user-space stack size. This is the
approach taken by `Process:max_arguments_size` and
`Process::max_environment_size` which are used to check the sizes
of `argv` and `envp`.
2022-12-14 15:09:28 +00:00
sin-ack 9b425b860c Kernel+LibC+Tests: Implement pwritev(2)
While this isn't really POSIX, it's needed by the Zig port and was
simple enough to implement.
2022-12-11 19:55:37 -07:00
sin-ack 70337f3a4b Kernel+LibC: Implement setregid(2)
This copies and adapts the setresgid syscall, following in the footsteps
of setreuid and setresuid.
2022-12-11 19:55:37 -07:00
sin-ack 2a502fe232 Kernel+LibC+LibCore+UserspaceEmulator: Implement faccessat(2)
Co-Authored-By: Daniel Bertalan <dani@danielbertalan.dev>
2022-12-11 19:55:37 -07:00
sin-ack eb5389e933 Kernel+LibC+LibCore: Implement mkdirat(2) 2022-12-11 19:55:37 -07:00
sin-ack 5c1d5ed51d Kernel: Implement Process::custody_for_dirfd
This allows deduplicating a bunch of code that has to work with
POSIX' *at syscall semantics.
2022-12-11 19:55:37 -07:00
Liav A 718ae68621 Kernel+LibCore+LibC: Implement support for forcing unveil on exec
To accomplish this, we add another VeilState which is called
LockedInherited. The idea is to apply exec unveil data, similar to
execpromises of the pledge syscall, on the current exec'ed program
during the execve sequence. When applying the forced unveil data, the
veil state is set to be locked but the special state of LockedInherited
ensures that if the new program tries to unveil paths, the request will
silently be ignored, so the program will continue running without
receiving an error, but is still can only use the paths that were
unveiled before the exec syscall. This in turn, allows us to use the
unveil syscall with a special utility to sandbox other userland programs
in terms of what is visible to them on the filesystem, and is usable on
both programs that use or don't use the unveil syscall in their code.
2022-11-26 12:42:15 -07:00
Liav A 5e062414c1 Kernel: Add support for jails
Our implementation for Jails resembles much of how FreeBSD jails are
working - it's essentially only a matter of using a RefPtr in the
Process class to a Jail object. Then, when we iterate over all processes
in various cases, we could ensure if either the current process is in
jail and therefore should be restricted what is visible in terms of
PID isolation, and also to be able to expose metadata about Jails in
/sys/kernel/jails node (which does not reveal anything to a process
which is in jail).

A lifetime model for the Jail object is currently plain simple - there's
simpy no way to manually delete a Jail object once it was created. Such
feature should be carefully designed to allow safe destruction of a Jail
without the possibility of releasing a process which is in Jail from the
actual jail. Each process which is attached into a Jail cannot leave it
until the end of a Process (i.e. when finalizing a Process). All jails
are kept being referenced in the JailManagement. When a last attached
process is finalized, the Jail is automatically destroyed.
2022-11-05 18:00:58 -06:00
kleines Filmröllchen b8567d7a9d Kernel: Make scheduler control syscalls more generic
The syscalls are renamed as they no longer reflect the exact POSIX
functionality. They can now handle setting/getting scheduler parameters
for both threads and processes.
2022-10-27 11:30:19 +01:00
Idan Horowitz 4ce326205e Kernel: Stop verifying interrupts are disabled in Process::for_each
This is a left-over from back when we didn't have any locking on the
global Process list, nor did we have SMP support, so this acted as some
kind of locking mechanism. We now have proper locks around the Process
list, so this is no longer relevant.
2022-08-27 21:54:13 +03:00
Andreas Kling d3e8eb5918 Kernel: Make file-backed memory regions remember description permissions
This allows sys$mprotect() to honor the original readable & writable
flags of the open file description as they were at the point we did the
original sys$mmap().

IIUC, this is what Dr. POSIX wants us to do:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/mprotect.html

Also, remove the bogus and racy "W^X" checking we did against mappings
based on their current inode metadata. If we want to do this, we can do
it properly. For now, it was not only racy, but also did blocking I/O
while holding a spinlock.
2022-08-24 14:57:51 +02:00