Commit graph

804 commits

Author SHA1 Message Date
Liav A d8ebcaede8 Kernel: Add helper function to check if a Process is in jail
Use this helper function in various places to replace the old code of
acquiring the SpinlockProtected<RefPtr<Jail>> of a Process to do that
validation.
2023-01-06 17:29:47 +01:00
Nico Weber a96f307af1 Everywhere: Make global inline functions not static
`inline` already assigns vague linkage, so there's no need to
also assign per-TU linkage. Allows the linker to dedup these
functions across TUs (and is almost always just the Right Thing
to do in C++ -- this ain't C).
2023-01-04 20:04:57 +01:00
Nico Weber 0a3cc10bb6 Everywhere: Remove some redundant inline keywords
Functions defined inside class bodies (including static functions)
are implicitly inline, no need to type it out.
2023-01-04 20:04:57 +01:00
kleines Filmröllchen a6a439243f Kernel: Turn lock ranks into template parameters
This step would ideally not have been necessary (increases amount of
refactoring and templates necessary, which in turn increases build
times), but it gives us a couple of nice properties:
- SpinlockProtected inside Singleton (a very common combination) can now
  obtain any lock rank just via the template parameter. It was not
  previously possible to do this with SingletonInstanceCreator magic.
- SpinlockProtected's lock rank is now mandatory; this is the majority
  of cases and allows us to see where we're still missing proper ranks.
- The type already informs us what lock rank a lock has, which aids code
  readability and (possibly, if gdb cooperates) lock mismatch debugging.
- The rank of a lock can no longer be dynamic, which is not something we
  wanted in the first place (or made use of). Locks randomly changing
  their rank sounds like a disaster waiting to happen.
- In some places, we might be able to statically check that locks are
  taken in the right order (with the right lock rank checking
  implementation) as rank information is fully statically known.

This refactoring even more exposes the fact that Mutex has no lock rank
capabilites, which is not fixed here.
2023-01-02 18:15:27 -05:00
Liav A e598f22768 Kernel: Disallow executing SUID binaries if process is jailed
Check if the process we are currently running is in a jail, and if that
is the case, fail early with the EPERM error code.

Also, as Brian noted, we should also disallow attaching to a jail in
case of already running within a setid executable, as this leaves the
user with false thinking of being secure (because you can't exec new
setid binaries), but the current program is still marked setid, which
means that at the very least we gained permissions while we didn't
expect it, so let's block it.
2022-12-30 15:49:37 -05:00
Liav A 6c0486277e Kernel: Reintroduce the msyscall syscall as the annotate_mapping syscall
This syscall will be used later on to ensure we can declare virtual
memory mappings as immutable (which means that the underlying Region is
basically immutable for both future annotations or changing the
protection bits of it).
2022-12-16 01:02:00 -07:00
Agustin Gianni ac40090583 Kernel: Add the auxiliary vector to the stack size validation
This patch validates that the size of the auxiliary vector does not
exceed `Process::max_auxiliary_size`. The auxiliary vector is a range
of memory in userspace stack where the kernel can pass information to
the process that will be created via `Process:do_exec`.

The reason the kernel needs to validate its size is that the about to
be created process needs to have remaining space on the stack.
Previously only `argv` and `envp` were taken into account for the
size validation, with this patch, the size of `auxv` is also
checked. All three elements contain values that a user (or an
attacker) can specify.

This patch adds the constant `Process::max_auxiliary_size` which is
defined to be one eight of the user-space stack size. This is the
approach taken by `Process:max_arguments_size` and
`Process::max_environment_size` which are used to check the sizes
of `argv` and `envp`.
2022-12-14 15:09:28 +00:00
sin-ack 9b425b860c Kernel+LibC+Tests: Implement pwritev(2)
While this isn't really POSIX, it's needed by the Zig port and was
simple enough to implement.
2022-12-11 19:55:37 -07:00
sin-ack 70337f3a4b Kernel+LibC: Implement setregid(2)
This copies and adapts the setresgid syscall, following in the footsteps
of setreuid and setresuid.
2022-12-11 19:55:37 -07:00
sin-ack 2a502fe232 Kernel+LibC+LibCore+UserspaceEmulator: Implement faccessat(2)
Co-Authored-By: Daniel Bertalan <dani@danielbertalan.dev>
2022-12-11 19:55:37 -07:00
sin-ack eb5389e933 Kernel+LibC+LibCore: Implement mkdirat(2) 2022-12-11 19:55:37 -07:00
sin-ack 5c1d5ed51d Kernel: Implement Process::custody_for_dirfd
This allows deduplicating a bunch of code that has to work with
POSIX' *at syscall semantics.
2022-12-11 19:55:37 -07:00
Liav A 718ae68621 Kernel+LibCore+LibC: Implement support for forcing unveil on exec
To accomplish this, we add another VeilState which is called
LockedInherited. The idea is to apply exec unveil data, similar to
execpromises of the pledge syscall, on the current exec'ed program
during the execve sequence. When applying the forced unveil data, the
veil state is set to be locked but the special state of LockedInherited
ensures that if the new program tries to unveil paths, the request will
silently be ignored, so the program will continue running without
receiving an error, but is still can only use the paths that were
unveiled before the exec syscall. This in turn, allows us to use the
unveil syscall with a special utility to sandbox other userland programs
in terms of what is visible to them on the filesystem, and is usable on
both programs that use or don't use the unveil syscall in their code.
2022-11-26 12:42:15 -07:00
Liav A 5e062414c1 Kernel: Add support for jails
Our implementation for Jails resembles much of how FreeBSD jails are
working - it's essentially only a matter of using a RefPtr in the
Process class to a Jail object. Then, when we iterate over all processes
in various cases, we could ensure if either the current process is in
jail and therefore should be restricted what is visible in terms of
PID isolation, and also to be able to expose metadata about Jails in
/sys/kernel/jails node (which does not reveal anything to a process
which is in jail).

A lifetime model for the Jail object is currently plain simple - there's
simpy no way to manually delete a Jail object once it was created. Such
feature should be carefully designed to allow safe destruction of a Jail
without the possibility of releasing a process which is in Jail from the
actual jail. Each process which is attached into a Jail cannot leave it
until the end of a Process (i.e. when finalizing a Process). All jails
are kept being referenced in the JailManagement. When a last attached
process is finalized, the Jail is automatically destroyed.
2022-11-05 18:00:58 -06:00
kleines Filmröllchen b8567d7a9d Kernel: Make scheduler control syscalls more generic
The syscalls are renamed as they no longer reflect the exact POSIX
functionality. They can now handle setting/getting scheduler parameters
for both threads and processes.
2022-10-27 11:30:19 +01:00
Idan Horowitz 4ce326205e Kernel: Stop verifying interrupts are disabled in Process::for_each
This is a left-over from back when we didn't have any locking on the
global Process list, nor did we have SMP support, so this acted as some
kind of locking mechanism. We now have proper locks around the Process
list, so this is no longer relevant.
2022-08-27 21:54:13 +03:00
Andreas Kling d3e8eb5918 Kernel: Make file-backed memory regions remember description permissions
This allows sys$mprotect() to honor the original readable & writable
flags of the open file description as they were at the point we did the
original sys$mmap().

IIUC, this is what Dr. POSIX wants us to do:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/mprotect.html

Also, remove the bogus and racy "W^X" checking we did against mappings
based on their current inode metadata. If we want to do this, we can do
it properly. For now, it was not only racy, but also did blocking I/O
while holding a spinlock.
2022-08-24 14:57:51 +02:00
Andreas Kling cf16b2c8e6 Kernel: Wrap process address spaces in SpinlockProtected
This forces anyone who wants to look into and/or manipulate an address
space to lock it. And this replaces the previous, more flimsy, manual
spinlock use.

Note that pointers *into* the address space are not safe to use after
you unlock the space. We've got many issues like this, and we'll have
to track those down as wlel.
2022-08-24 14:57:51 +02:00
Samuel Bowman 91574ed677 Kernel: Fix boot profiling
Boot profiling was previously broken due to init_stage2() passing the
event mask to sys$profiling_enable() via kernel pointer, but a user
pointer is expected.

To fix this, I added Process::profiling_enable() as an alternative to
Process::sys$profiling_enable which takes a u64 rather than a
Userspace<u64 const*>. It's a bit of a hack, but it works.
2022-08-23 11:48:50 +02:00
Anthony Iacono ec3d8a7a18 Kernel: Remove unused Process::in_group() 2022-08-23 01:01:48 +02:00
Anthony Iacono f86b671de2 Kernel: Use Process::credentials() and remove user ID/group ID helpers
Move away from using the group ID/user ID helpers in the process to
allow for us to take advantage of the immutable credentials instead.
2022-08-22 12:46:32 +02:00
Andreas Kling 8ed06ad814 Kernel: Guard Process "protected data" with a spinlock
This ensures that both mutable and immutable access to the protected
data of a process is serialized.

Note that there may still be multiple TOCTOU issues around this, as we
have a bunch of convenience accessors that make it easy to introduce
them. We'll need to audit those as well.
2022-08-21 12:25:14 +02:00
Andreas Kling 728c3fbd14 Kernel: Use RefPtr instead of LockRefPtr for Custody
By protecting all the RefPtr<Custody> objects that may be accessed from
multiple threads at the same time (with spinlocks), we remove the need
for using LockRefPtr<Custody> (which is basically a RefPtr with a
built-in spinlock.)
2022-08-21 12:25:14 +02:00
Andreas Kling 122d7d9533 Kernel: Add Credentials to hold a set of user and group IDs
This patch adds a new object to hold a Process's user credentials:

- UID, EUID, SUID
- GID, EGID, SGID, extra GIDs

Credentials are immutable and child processes initially inherit the
Credentials object from their parent.

Whenever a process changes one or more of its user/group IDs, a new
Credentials object is constructed.

Any code that wants to inspect and act on a set of credentials can now
do so without worrying about data races.
2022-08-20 18:32:50 +02:00
Andreas Kling bec314611d Kernel: Move InodeMetadata methods out of line 2022-08-20 17:20:44 +02:00
Andreas Kling 11eee67b85 Kernel: Make self-contained locking smart pointers their own classes
Until now, our kernel has reimplemented a number of AK classes to
provide automatic internal locking:

- RefPtr
- NonnullRefPtr
- WeakPtr
- Weakable

This patch renames the Kernel classes so that they can coexist with
the original AK classes:

- RefPtr => LockRefPtr
- NonnullRefPtr => NonnullLockRefPtr
- WeakPtr => LockWeakPtr
- Weakable => LockWeakable

The goal here is to eventually get rid of the Lock* classes in favor of
using external locking.
2022-08-20 17:20:43 +02:00
kleines Filmröllchen 4314c25cf2 Kernel: Require lock rank for Spinlock construction
All users which relied on the default constructor use a None lock rank
for now. This will make it easier to in the future remove LockRank and
actually annotate the ranks by searching for None.
2022-08-19 20:26:47 -07:00
Linus Groh 146903a3b5 Kernel: Require semicolon after VERIFY_{NO_,}PROCESS_BIG_LOCK_ACQUIRED
This matches out general macro use, and specifically other verification
macros like VERIFY(), VERIFY_NOT_REACHED(), VERIFY_INTERRUPTS_ENABLED(),
and VERIFY_INTERRUPTS_DISABLED().
2022-08-17 22:56:51 +02:00
Andreas Kling b6d0636656 Kernel: Don't leak file descriptors in sys$pipe()
If the final copy_to_user() call fails when writing the file descriptors
to the output array, we have to make sure the file descriptors don't
remain in the process file descriptor table. Otherwise they are
basically leaked, as userspace is not aware of them.

This matches the behavior of our sys$socketpair() implementation.
2022-08-16 20:35:32 +02:00
zzLinus ca74443012 Kernel/LibC: Implement posix syscall clock_getres() 2022-07-25 15:33:50 +02:00
Idan Horowitz 9db10887a1 Kernel: Clean up sys$futex and add support for cross-process futexes 2022-07-21 16:39:22 +02:00
Idan Horowitz 55c7496200 Kernel: Propagate OOM conditions out of sys$futex 2022-07-21 16:39:22 +02:00
Hendiadyoin1 d783389877 Kernel+LibC: Add posix_fallocate syscall 2022-07-15 12:42:43 +02:00
sin-ack 3f3f45580a Everywhere: Add sv suffix to strings relying on StringView(char const*)
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).

No functional changes.
2022-07-12 23:11:35 +02:00
gggggg-gggggg d728017578 Kernel+LibC+LibCore: Pass fcntl extra argument as pointer-sized variable
The extra argument to fcntl is a pointer in the case of F_GETLK/F_SETLK
and we were pulling out a u32, leading to pointer truncation on x86_64.
Among other things, this fixes Assistant on x86_64 :^)
2022-07-10 20:09:11 +02:00
Tim Schumacher cf0ad3715e Kernel: Implement sigsuspend using a SignalBlocker
`sigsuspend` was previously implemented using a poll on an empty set of
file descriptors. However, this broke quite a few assumptions in
`SelectBlocker`, as it verifies at least one file descriptor to be
ready after waking up and as it relies on being notified by the file
descriptor.

A bare-bones `sigsuspend` may also be implemented by relying on any of
the `sigwait` functions, but as `sigsuspend` features several (currently
unimplemented) restrictions on how returns work, it is a syscall on its
own.
2022-07-08 22:27:38 +00:00
Tim Schumacher add4dd3589 Kernel: Do a POSIX-correct signal handler reset on exec 2022-07-05 20:58:38 +03:00
Andrew Kaster 940be19259 Kernel: Create /proc/pid/cmdline to expose process arguments in procfs
In typical serenity style, they are just a JSON array
2022-06-19 09:05:35 +02:00
Ariel Don 9a6bd85924 Kernel+LibC+VFS: Implement utimensat(3)
Create POSIX utimensat() library call and corresponding system call to
update file access and modification times.
2022-05-21 18:15:00 +02:00
MacDue d951e2ca97 Kernel: Add /proc/{pid}/children to ProcFS
This exposes the child processes for a process as a directory
of symlinks to the respective /proc entries for each child.

This makes for an easier and possibly more efficient way
to find and count a process's children. Previously the only
method was to parse the entire /proc/all JSON file.
2022-05-06 02:12:51 +04:30
sin-ack bc7c8879c5 Kernel+LibC+LibCore: Implement the unlinkat(2) syscall 2022-04-23 10:43:32 -07:00
Luke Wilde 1682b0b6d8 Kernel: Remove big lock from sys$set_coredump_metadata
The only requirement for this syscall is to make
Process::m_coredump_properties SpinlockProtected.
2022-04-09 21:51:16 +02:00
Jelle Raaijmakers 7826729ab2 Kernel: Track big lock blocked threads in separate list
When we lock a mutex, eventually `Thread::block` is invoked which could
in turn invoke `Process::big_lock().restore_exclusive_lock()`. This
would then try to add the current thread to a different blocked thread
list then the one in use for the original mutex being locked, and
because it's an intrusive list, the thread is removed from its original
list during the `.append()`. When the original mutex eventually
unblocks, we no longer have the thread in the intrusive blocked threads
list and we panic.

Solve this by making the big lock mutex special and giving it its own
blocked thread list. Because the process big lock is temporary and is
being actively removed from e.g. syscalls, it's a matter of time before
we can also remove the fix introduced by this commit.

Fixes issue #9401.
2022-04-06 18:27:19 +02:00
Idan Horowitz 086969277e Everywhere: Run clang-format 2022-04-01 21:24:45 +01:00
Ali Mohammad Pur 8233da3398 Kernel: Add a 'no_error' pledge promise
This makes pledge() ignore promises that would otherwise cause it to
fail with EPERM, which is very useful for allowing programs to run under
a "jail" so to speak, without having them termiate early due to a
failing pledge() call.
2022-03-26 21:34:56 +04:30
Liav A b5ef900ccd Kernel: Don't assume paths of TTYs and pseudo terminals anymore
The obsolete ttyname and ptsname syscalls are removed.
LibC doesn't rely on these anymore, and it helps simplifying the Kernel
in many places, so it's an overall an improvement.

In addition to that, /proc/PID/tty node is removed too as it is not
needed anymore by userspace to get the attached TTY of a process, as
/dev/tty (which is already a character device) represents that as well.
2022-03-22 20:26:05 +01:00
int16 256744ebdf Kernel: Make mmap validation functions return ErrorOr<void> 2022-03-22 12:20:19 +01:00
int16 4b96d9c813 Kernel: Move mmap validation functions to Process 2022-03-22 12:20:19 +01:00
Andreas Kling 580d89f093 Kernel: Put Process unveil state in a SpinlockProtected container
This makes path resolution safe to perform without holding the big lock.
2022-03-08 00:19:49 +01:00
Andreas Kling 24f02bd421 Kernel: Put Process's current directory in a SpinlockProtected
Also let's call it "current_directory" instead of "cwd" everywhere.
2022-03-08 00:19:49 +01:00