Commit graph

1163 commits

Author SHA1 Message Date
Andreas Kling 04c362b4dd Kernel: Fix TOCTOU in sys$unveil()
Make sure we reject the unveil attempt with EPERM if the veil was locked
by another thread while we were parsing argument (and not holding the
veil state spinlock.)

Thanks Brian for spotting this! :^)

Amendment to #14907.
2022-08-18 01:04:28 +02:00
Andreas Kling ae8558dd5c Kernel: Don't do path resolution in sys$chdir() while holding spinlock
Path resolution may do blocking I/O so we must not do it while holding
a spinlock. There are tons of problems like this throughout the kernel
and we need to find and fix all of them.
2022-08-18 00:58:34 +02:00
Samuel Bowman b5a2f59320 Kernel: Make sys$unveil() not take the big process lock
The unveil syscall uses the UnveilData struct which is already
SpinlockProtected, so there is no need to take the big lock.
2022-08-18 00:04:31 +02:00
Linus Groh 146903a3b5 Kernel: Require semicolon after VERIFY_{NO_,}PROCESS_BIG_LOCK_ACQUIRED
This matches out general macro use, and specifically other verification
macros like VERIFY(), VERIFY_NOT_REACHED(), VERIFY_INTERRUPTS_ENABLED(),
and VERIFY_INTERRUPTS_DISABLED().
2022-08-17 22:56:51 +02:00
Andreas Kling ce6e93d96b Kernel: Make sys$socketpair() not take the big lock
This system call mainly accesses the file descriptor table, and this is
already guarded by MutexProtected.
2022-08-16 20:43:23 +02:00
Andreas Kling 164c9617c3 Kernel: Only lock file descriptor table once in sys$pipe()
Instead of locking it twice, we now frontload all the work that doesn't
touch the fd table, and then only lock it towards the end of the
syscall.

The benefit here is simplicity. The downside is that we do a bit of
unnecessary work in the EMFILE error case, but we don't need to optimize
that case anyway.
2022-08-16 20:39:45 +02:00
Andreas Kling b6d0636656 Kernel: Don't leak file descriptors in sys$pipe()
If the final copy_to_user() call fails when writing the file descriptors
to the output array, we have to make sure the file descriptors don't
remain in the process file descriptor table. Otherwise they are
basically leaked, as userspace is not aware of them.

This matches the behavior of our sys$socketpair() implementation.
2022-08-16 20:35:32 +02:00
Andreas Kling 307932857e Kernel: Make sys$pipe() not take the big lock
This system call mainly accesses the file descriptor table, and this is
already guarded by MutexProtected.
2022-08-16 20:20:11 +02:00
Andreas Kling 0b58fd5aef Kernel: Remove unnecessary TOCTOU bug in sys$pipe()
We don't need to explicitly check for EMFILE conditions before doing
anything in sys$pipe(). The fd allocation code will take care of it
for us anyway.
2022-08-16 20:16:17 +02:00
Andreas Kling ae8f1c7dc8 Kernel: Leak a ref() on the new Process ASAP in sys$fork()
This fixes an issue where failing the fork due to OOM or other error,
we'd end up destroying the Process too early. By the time we got to
WaitBlockerSet::finalize(), it was long gone.
2022-08-15 00:53:28 +02:00
Brian Gianforcaro 09d5360be3 Kernel: Validate the sys$alarm signal send always succeeds
Previously we were ignoring this return code, instead use MUST(..)
to make sure it always succeeds.
2022-08-10 11:38:18 -04:00
Undefine 97cc33ca47 Everywhere: Make the codebase more architecture aware 2022-07-27 21:46:42 +00:00
zzLinus ca74443012 Kernel/LibC: Implement posix syscall clock_getres() 2022-07-25 15:33:50 +02:00
Tim Schumacher e79f0e2ee9 Kernel+LibC: Don't hardcode the maximum signal number everywhere 2022-07-22 10:07:15 -07:00
Idan Horowitz 3a80b25ed6 Kernel: Support F_SETLKW in fcntl 2022-07-21 16:39:22 +02:00
Idan Horowitz 9db10887a1 Kernel: Clean up sys$futex and add support for cross-process futexes 2022-07-21 16:39:22 +02:00
Idan Horowitz 55c7496200 Kernel: Propagate OOM conditions out of sys$futex 2022-07-21 16:39:22 +02:00
Idan Horowitz 364f6a9bf0 Kernel: Remove the Socket::{protocol,}connect ShouldBlock argument
This argument is always set to description.is_blocking(), but
description is also given as a separate argument, so there's no point
to piping it through separately.
2022-07-21 16:39:22 +02:00
Hendiadyoin1 c3e57bfccb Kernel: Try to set [cm]time in Inode::did_modify_contents
This indirectly resolves a fixme in sys$msync
2022-07-15 12:42:43 +02:00
Hendiadyoin1 10d9bb93be Kernel: Handle multiple regions in sys$msync 2022-07-15 12:42:43 +02:00
Hendiadyoin1 d783389877 Kernel+LibC: Add posix_fallocate syscall 2022-07-15 12:42:43 +02:00
Hendiadyoin1 ad904cdcab Kernel: Use find_last_split_view to get the executable name in do_exec 2022-07-15 12:42:43 +02:00
sin-ack fbc771efe9 Everywhere: Use default StringView constructor over nullptr
While null StringViews are just as bad, these prevent the removal of
StringView(char const*) as that constructor accepts a nullptr.

No functional changes.
2022-07-12 23:11:35 +02:00
sin-ack 3f3f45580a Everywhere: Add sv suffix to strings relying on StringView(char const*)
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).

No functional changes.
2022-07-12 23:11:35 +02:00
Idan Horowitz c1fe844da4 Kernel: Stop leaking first thread on errors in sys$fork
Until the thread is first set as Runnable at the end of sys$fork, its
state is Invalid, and as a result, the Finalizer which is searching for
Dying threads will never find it if the syscall short-circuits due to
an error condition like OOM. This also meant the parent Process of the
thread would be leaked as well.
2022-07-10 22:17:21 +03:00
gggggg-gggggg d728017578 Kernel+LibC+LibCore: Pass fcntl extra argument as pointer-sized variable
The extra argument to fcntl is a pointer in the case of F_GETLK/F_SETLK
and we were pulling out a u32, leading to pointer truncation on x86_64.
Among other things, this fixes Assistant on x86_64 :^)
2022-07-10 20:09:11 +02:00
Idan Horowitz 68980bf711 Kernel: Stop reporting POLLHUP exclusively when available in sys$poll
As per Dr. Posix, unlike POLLERR and POLLNVAL, POLLHUP is only mutually
exclusive with POLLOUT, all other events may be reported together with
it.
2022-07-10 14:24:34 +02:00
Idan Horowitz 275e5cdb64 Kernel: Report POLLNVAL events in sys$poll instead of returning EBADF
As required by Dr. Posix.
2022-07-10 14:24:34 +02:00
Idan Horowitz e32f6903f6 Kernel: Stop providing POLLRDHUP events in sys$poll by default
Dr. Posix specifies that only POLLERR, POLLHUP & POLLNVAL are provided
by default.
2022-07-10 14:24:34 +02:00
Idan Horowitz 5ca46abb51 Kernel: Set POLLHUP on WriteHangUp in sys$poll instead of POLLNVAL
POLLNVAL signifies an invalid fd, not a write hang up.
2022-07-10 14:24:34 +02:00
Idan Horowitz a6f237a247 Kernel: Accept SHUT_RD and SHUT_WR as shutdown() how values
The previous check for valid how values assumed this field was a bitmap
and that SHUT_RDWR was simply a bitwise or of SHUT_RD and SHUT_WR,
which is not the case.
2022-07-10 14:24:34 +02:00
Tim Schumacher cf0ad3715e Kernel: Implement sigsuspend using a SignalBlocker
`sigsuspend` was previously implemented using a poll on an empty set of
file descriptors. However, this broke quite a few assumptions in
`SelectBlocker`, as it verifies at least one file descriptor to be
ready after waking up and as it relies on being notified by the file
descriptor.

A bare-bones `sigsuspend` may also be implemented by relying on any of
the `sigwait` functions, but as `sigsuspend` features several (currently
unimplemented) restrictions on how returns work, it is a syscall on its
own.
2022-07-08 22:27:38 +00:00
Tim Schumacher edbffb3c7a Kernel: Unblock SignalBlocker if a signal was just unmarked as pending
When updating the signal mask, there is a small frame where we might set
up the receiving process for handing the signal and therefore remove
that signal from the list of pending signals before SignalBlocker has a
chance to block. In turn, this might cause SignalBlocker to never notice
that the signal arrives and it will never unblock once blocked.

Track the currently handled signal separately and include it when
determining if SignalBlocker should be unblocking.
2022-07-08 22:27:38 +00:00
Tim Schumacher 5efa8e507b Kernel: Implement an axallowed mount option
Similar to `W^X` and `wxallowed`, this allows for anonymous executable
mappings.
2022-07-08 22:27:38 +00:00
Tim Schumacher add4dd3589 Kernel: Do a POSIX-correct signal handler reset on exec 2022-07-05 20:58:38 +03:00
Andrew Kaster 455038d6fc Kernel: Add sysconf for IOV_MAX 2022-06-19 09:05:35 +02:00
Timon Kruiper a4534678f9 Kernel: Implement InterruptDisabler using generic Processor functions
Now that the code does not use architectural specific code, it is moved
to the generic Arch directory and the paths are modified accordingly.
2022-06-02 13:14:12 +01:00
Liav A 58acdce41f Kernel/FileSystem: Simplify even more the mount syscall
As with the previous commit, we put a distinction between filesystems
that require a file description and those which don't, but now in a much
more readable mechanism - all initialization properties as well as the
create static method are grouped to create the FileSystemInitializer
structure. Then when we need to initialize an instance, we iterate over
a table of these structures, checking for matching structure and then
validating the given arguments from userspace against the requirements
to ensure we can create a valid instance of the requested filesystem.
2022-05-29 19:31:02 +01:00
Liav A 4c588441e3 Kernel: Simplify mount syscall flow for regular calls
We do this by putting a distinction between two types of filesystems -
the first type is backed in RAM, and includes TmpFS, ProcFS, SysFS,
DevPtsFS and DevTmpFS. Because these filesystems are backed in RAM,
trying to mount them doesn't require source open file description.
The second type is filesystems that are backed by a file, therefore the
userspace program has to open them (hence it has a open file description
on them) and provide the appropriate source open file description.
By putting this distinction, we can early check if the user tried to
mount the second type of filesystems without a valid file description,
and fail with EBADF then.
Otherwise, we can proceed to either mount either type of filesystem,
provided that the fs_type is valid.
2022-05-29 19:31:02 +01:00
Peter Elliott f6943c85b0 Kernel: Fix EINVAL when mmaping with address and no MAP_FIXED
The current behavior accidently trys to allocate 0 bytes when a non-null
address is provided and MAP_FIXED is specified. This is clearly a bug.
2022-05-23 00:13:26 +02:00
Ariel Don 8a854ba309 Kernel+LibC: Implement futimens(3)
Implement futimes() in terms of utimensat(). Now, utimensat() strays
from POSIX compliance because it also accepts a combination of a file
descriptor of a regular file and an empty path. utimensat() then uses
this file descriptor instead of the path to update the last access
and/or modification time of a file. That being said, its prior behavior
remains intact.

With the new behavior of utimensat(), `path` must point to a valid
string; given a null pointer instead of an empty string, utimensat()
sets `errno` to `EFAULT` and returns a failure.
2022-05-21 18:15:00 +02:00
Ariel Don 9a6bd85924 Kernel+LibC+VFS: Implement utimensat(3)
Create POSIX utimensat() library call and corresponding system call to
update file access and modification times.
2022-05-21 18:15:00 +02:00
Tim Schumacher 098af0f846 Kernel: Properly define IOV_MAX 2022-05-05 20:47:38 +02:00
Timon Kruiper feba7bc8a8 Kernel: Move Kernel/Arch/x86/SafeMem.h to Kernel/Arch/SafeMem.h
The file does not contain any specific architectural code, thus it can
be moved to the Kernel/Arch directory.
2022-05-03 21:53:36 +02:00
Andrew Kaster f08e91f67e Kernel: Don't check pledges or veil against code coverage data files
Coverage tools like LLVM's source-based coverage or GNU's --coverage
need to be able to write out coverage files from any binary, regardless
of its security posture. Not ignoring these pledges and veils means we
can't get our coverage data out without playing some serious tricks.

However this is pretty terrible for normal exeuction, so only skip these
checks when we explicitly configured userspace for coverage.
2022-05-02 01:46:18 +02:00
Andreas Kling b85c8a0b80 Kernel: Add FIOCLEX and FIONCLEX ioctls
These allow you to turn the close-on-exec flag on/off via ioctl().
2022-04-26 14:32:12 +02:00
sin-ack bc7c8879c5 Kernel+LibC+LibCore: Implement the unlinkat(2) syscall 2022-04-23 10:43:32 -07:00
Tim Schumacher a1686db2de Kernel: Skip setting region name if none is given to mmap
This keeps us from accidentally overwriting an already set region name,
for example when we are mapping a file (as, in this case, the file name
is already stored in the region).
2022-04-12 01:52:21 +02:00
Idan Horowitz e84bbfed44 Kernel: Remove big lock from sys$mkdir
This syscall doesn't access any unprotected shared data.
2022-04-09 23:46:02 +02:00
Idan Horowitz 165a23b68c Kernel: Remove big lock from sys$rename
This syscall doesn't access any unprotected shared data.
2022-04-09 23:46:02 +02:00
Idan Horowitz 5c064d3e8e Kernel: Remove big lock from sys$rmdir
This syscall doesn't access any unprotected shared data.
2022-04-09 23:46:02 +02:00
Idan Horowitz d4ce43cf45 Kernel: Remove big lock from sys$statvfs
This syscall doesn't access any unprotected shared data.
2022-04-09 23:46:02 +02:00
Idan Horowitz 4ae93179f1 Kernel: Remove big lock from sys$symlink
This syscall doesn't access any unprotected shared data.
2022-04-09 23:46:02 +02:00
Idan Horowitz 1474b18070 Kernel: Remove big lock from sys$link
This syscall doesn't access any unprotected shared data.
2022-04-09 23:46:02 +02:00
Idan Horowitz fa360f7d88 Kernel: Remove big lock from sys$unlink
This syscall doesn't access any unprotected shared data.
2022-04-09 23:46:02 +02:00
Idan Horowitz 5a96260e25 Kernel: Remove big lock from sys$setsockopt
This syscall doesn't access any unprotected shared data.
2022-04-09 23:46:02 +02:00
Idan Horowitz c2372242b1 Kernel: Remove big lock from sys$getsockopt
This syscall doesn't access any unprotected shared data.
2022-04-09 23:46:02 +02:00
Idan Horowitz 849c227f72 Kernel: Remove big lock from sys$shutdown
This syscall doesn't access any unprotected shared data.
2022-04-09 23:46:02 +02:00
Idan Horowitz e620487b66 Kernel: Remove big lock from sys$connect
This syscall doesn't access any unprotected shared data.
2022-04-09 23:46:02 +02:00
Idan Horowitz 9547a8e8a2 Kernel: Remove big lock from sys$close
This syscall doesn't access any unprotected shared data.
2022-04-09 23:46:02 +02:00
Idan Horowitz 0297349922 Kernel: Remove big lock from sys$chown
This syscall doesn't access any unprotected shared data.
2022-04-09 23:46:02 +02:00
Idan Horowitz 8458313e8a Kernel: Remove big lock from sys$fchown
This syscall doesn't access any unprotected shared data.
2022-04-09 23:46:02 +02:00
Idan Horowitz f986a3b886 Kernel: Remove big lock from sys$bind
This syscall doesn't access any unprotected shared data.
2022-04-09 23:46:02 +02:00
Luke Wilde 1682b0b6d8 Kernel: Remove big lock from sys$set_coredump_metadata
The only requirement for this syscall is to make
Process::m_coredump_properties SpinlockProtected.
2022-04-09 21:51:16 +02:00
Jelle Raaijmakers cc411b328c Kernel: Remove big lock from sys$accept4
The only thing we needed to check is whether `socket.accept()` returns
a socket, and if not, we go back to blocking again.
2022-04-09 17:53:18 +02:00
Andreas Kling 9b9b05eabf Kernel: Make sys$mmap() round requested VM size to page size multiple
This fixes an issue where File::mmap() overrides would fail because they
were expecting to be called with a size evenly divisible by PAGE_SIZE.
2022-04-05 22:26:37 +02:00
Andreas Kling e3e1d79a7d Kernel: Remove unused ShouldDeallocateVirtualRange parameters
Since there is no separate virtual range allocator anymore, this is
no longer used for anything.
2022-04-05 01:15:22 +02:00
Andreas Kling 63ddbaf68a Kernel: Tweak broken dbgln_if() in sys$fork() after RegionTree changes 2022-04-04 11:05:49 +02:00
Andreas Kling 12b612ab14 Kernel: Mark sys$adjtime() as not needing the big lock
This syscall works on global kernel state and so doesn't need protection
from threads in the same process.
2022-04-04 00:42:18 +02:00
Andreas Kling 4306422f29 Kernel: Mark sys$clock_settime() as not needing the big log
This syscall ends up disabling interrupts while changing the time,
and the clock is a global resource anyway, so preventing threads in the
same process from running wouldn't solve anything.
2022-04-04 00:42:18 +02:00
Andreas Kling 55814f6e0e Kernel: Mark sys$sched_{set,get}param() as not needing the big lock
Both of these syscalls take the scheduler lock while accessing the
thread priority, so there's no reliance on the process big lock.
2022-04-04 00:42:18 +02:00
Andreas Kling 9250ac0c24 Kernel: Randomize non-specific VM allocations done by sys$execve()
Stuff like TLS regions, main thread stacks, etc. All deserve to be
randomized unless the ELF requires specific placement. :^)
2022-04-04 00:42:18 +02:00
Andreas Kling 36d829b97c Kernel: Mark sys$listen() as not needing the big lock
This syscall already performs the necessary locking and so doesn't
need to rely on the process big lock.
2022-04-03 22:22:22 +02:00
Andreas Kling e103c5fe2d Kernel: Don't hog file descriptor table lock in sys$bind()
We don't need to hold the lock across the entire syscall. Once we've
fetched the open file description we're interested in, we can let go.
2022-04-03 22:20:34 +02:00
Andreas Kling 85ceab1fec Kernel: Don't hog file descriptor table lock in sys$listen()
We don't need to hold the lock across the entire syscall. Once we've
fetched the open file description we're interested in, we can let go.
2022-04-03 22:18:57 +02:00
Andreas Kling bc4282c773 Kernel: Mark sys$sendfd() and sys$recvfd() as not needing the big lock
These syscalls already perform the necessary locking and don't rely on
the process big lock.
2022-04-03 22:06:03 +02:00
Andreas Kling 858b196c59 Kernel: Unbreak ASLR in the new RegionTree world
Functions that allocate and/or place a Region now take a parameter
that tells it whether to randomize unspecified addresses.
2022-04-03 21:51:58 +02:00
Andreas Kling 07f3d09c55 Kernel: Make VM allocation atomic for userspace regions
This patch move AddressSpace (the per-process memory manager) to using
the new atomic "place" APIs in RegionTree as well, just like we did for
MemoryManager in the previous commit.

This required updating quite a few places where VM allocation and
actually committing a Region object to the AddressSpace were separated
by other code.

All you have to do now is call into AddressSpace once and it'll take
care of everything for you.
2022-04-03 21:51:58 +02:00
Andreas Kling ffe2e77eba Kernel: Add Memory::RegionTree to share code between AddressSpace and MM
RegionTree holds an IntrusiveRedBlackTree of Region objects and vends a
set of APIs for allocating memory ranges.

It's used by AddressSpace at the moment, and will be used by MM soon.
2022-04-03 21:51:58 +02:00
Andreas Kling 02a95a196f Kernel: Use AddressSpace region tree for range allocation
This patch stops using VirtualRangeAllocator in AddressSpace and instead
looks for holes in the region tree when allocating VM space.

There are many benefits:

- VirtualRangeAllocator is non-intrusive and would call kmalloc/kfree
  when used. This new solution is allocation-free. This was a source
  of unpleasant MM/kmalloc deadlocks.

- We consolidate authority on what the address space looks like in a
  single place. Previously, we had both the range allocator *and* the
  region tree both being used to determine if an address was valid.
  Now there is only the region tree.

- Deallocation of VM when splitting regions is no longer complicated,
  as we don't need to keep two separate trees in sync.
2022-04-03 21:51:58 +02:00
Andreas Kling 2617adac52 Kernel: Store AddressSpace memory regions in an IntrusiveRedBlackTree
This means we never need to allocate when inserting/removing regions
from the address space.
2022-04-03 21:51:58 +02:00
Tim Schumacher 4ba39c8d63 Kernel: Implement f_basetype in statvfs 2022-04-03 19:15:14 +02:00
Idan Horowitz 086969277e Everywhere: Run clang-format 2022-04-01 21:24:45 +01:00
Ali Mohammad Pur d6ce3e63e2 Kernel: Disallow elevating pledge promises with no_error set
8233da3398 introduced a not-so-subtle bug
where an application with an existing pledge set containing `no_error`
could elevate its pledge set by pledging _anything_, this commit makes
sure that no new promise is accepted.
2022-03-29 12:11:56 +02:00
Ali Mohammad Pur 8233da3398 Kernel: Add a 'no_error' pledge promise
This makes pledge() ignore promises that would otherwise cause it to
fail with EPERM, which is very useful for allowing programs to run under
a "jail" so to speak, without having them termiate early due to a
failing pledge() call.
2022-03-26 21:34:56 +04:30
Liav A b5ef900ccd Kernel: Don't assume paths of TTYs and pseudo terminals anymore
The obsolete ttyname and ptsname syscalls are removed.
LibC doesn't rely on these anymore, and it helps simplifying the Kernel
in many places, so it's an overall an improvement.

In addition to that, /proc/PID/tty node is removed too as it is not
needed anymore by userspace to get the attached TTY of a process, as
/dev/tty (which is already a character device) represents that as well.
2022-03-22 20:26:05 +01:00
int16 256744ebdf Kernel: Make mmap validation functions return ErrorOr<void> 2022-03-22 12:20:19 +01:00
int16 4b96d9c813 Kernel: Move mmap validation functions to Process 2022-03-22 12:20:19 +01:00
int16 479929b06c Kernel: Check wxallowed mount flag when validating mmap call 2022-03-22 12:20:19 +01:00
Brian Gianforcaro 03342876b8 Revert "Kernel: Use an ArmedScopeGuard to revert changes after failed mmap"
This reverts commit 790d620b39.
2022-03-12 21:45:57 -08:00
Andreas Kling 7b3642d08c Kernel: Mark sys$lseek() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-09 16:43:00 +01:00
Andreas Kling 09e644f0ba Kernel: Mark sys$emuctl() as not needing the big lock
This syscall doesn't do anything at all, and definitely doesn't need the
big lock. :^)
2022-03-09 16:43:00 +01:00
Andreas Kling b4fefedd1d Kernel: Mark sys$chmod() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-09 16:43:00 +01:00
Andreas Kling aa381c4a67 Kernel: Mark sys$fchmod() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-09 16:43:00 +01:00
Andreas Kling d074aae422 Kernel: Mark sys$dup2() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-09 16:43:00 +01:00
Andreas Kling 8aad9e7448 Kernel: Mark sys$ftruncate() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-09 16:43:00 +01:00
Andreas Kling 69a6a4d927 Kernel: Mark sys$fstatvfs() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-09 16:43:00 +01:00
Hendiadyoin1 790d620b39 Kernel: Use an ArmedScopeGuard to revert changes after failed mmap 2022-03-08 15:58:51 -08:00
Andreas Kling 6354a9a030 Kernel: Mark sys$fsync() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-08 00:19:49 +01:00
Andreas Kling ef45ff4703 Kernel: Mark sys$readlink() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-08 00:19:49 +01:00
Andreas Kling 2688ee28ff Kernel: Mark sys$stat() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-08 00:19:49 +01:00
Andreas Kling be7ec52ed0 Kernel: Mark sys$fstat() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-08 00:19:49 +01:00
Andreas Kling 23822febd2 Kernel: Mark sys$fchdir() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-08 00:19:49 +01:00
Andreas Kling 156ab0c47d Kernel: Mark sys$chdir() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-08 00:19:49 +01:00
Andreas Kling 7597bef771 Kernel: Mark sys$getcwd() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-08 00:19:49 +01:00
Andreas Kling f630d0f095 Kernel: Mark sys$realpath() as not needing the big lock
This syscall doesn't access any data that was implicitly protected by
the big lock.
2022-03-08 00:19:49 +01:00
Andreas Kling 580d89f093 Kernel: Put Process unveil state in a SpinlockProtected container
This makes path resolution safe to perform without holding the big lock.
2022-03-08 00:19:49 +01:00
Andreas Kling 24f02bd421 Kernel: Put Process's current directory in a SpinlockProtected
Also let's call it "current_directory" instead of "cwd" everywhere.
2022-03-08 00:19:49 +01:00
Andreas Kling 7543c34d07 Kernel: Mark sys$anon_create() as not needing the big lock
This syscall is already safe for no-big-lock since it doesn't access any
unprotected data.
2022-03-08 00:19:49 +01:00
Andreas Kling baa6ff5649 Kernel: Wrap HIDManagement keymap data in SpinlockProtected
This serializes access to the current keymap data everywhere in the
kernel, allowing to mark sys$setkeymap() as not needing the big lock.
2022-03-07 16:35:23 +01:00
Ali Mohammad Pur 6608812e4b Kernel: Over-align the FPUState on the stack in sigreturn
The stack is misaligned at this point for some reason, this is a hack
that makes the resulting object "correctly" aligned, thus avoiding a
KUBSAN error.
2022-03-04 20:07:05 +01:00
Ali Mohammad Pur 88d7bf7362 Kernel: Save and restore FPU state on signal dispatch on i386/x86_64 2022-03-04 20:07:05 +01:00
Ali Mohammad Pur 4bd01b7fe9 Kernel: Add support for SA_SIGINFO
We currently don't really populate most of the fields, but that can
wait :^)
2022-03-04 20:07:05 +01:00
Ali Mohammad Pur 585054d68b Kernel: Comment the living daylights out of signal trampoline/sigreturn
Mere mortals like myself cannot understand more than two lines of
assembly without a million comments explaining what's happening, so do
that and make sure no one has to go on a wild stack state chase when
hacking on these.
2022-03-04 20:07:05 +01:00
Ali Mohammad Pur 848eaf2220 Kernel: Reject sigaction() with SA_SIGINFO
We can't handle this, so let sigaction() fail instead of PANIC()'ing
later when we try to dispatch a signal with SA_SIGINFO set.
2022-03-04 20:07:05 +01:00
Ali Mohammad Pur cf63447044 Kernel: Move signal handlers from being thread state to process state
POSIX requires that sigaction() and friends set a _process-wide_ signal
handler, so move signal handlers and flags inside Process.
This also fixes a "pid/tid confusion" FIXME, as we can now send the
signal to the process and let that decide which thread should get the
signal (which is the thread with tid==pid, but that's now the Process's
problem).
Note that each thread still retains its signal mask, as that is local to
each thread.
2022-03-04 20:07:05 +01:00
Lucas CHOLLET 839d3d9f74 Kernel: Add getrusage() syscall
Only the two timeval fields are maintained, as required by the POSIX
standard.
2022-02-28 20:09:37 +01:00
Idan Horowitz 011bd06053 Kernel: Set CS selector when initializing thread context on x86_64
These are not technically required, since the Thread constructor
already sets these, but they are set on i686, so let's try and keep
consistent behaviour between the different archs.
2022-02-27 00:38:00 +02:00
Brian Gianforcaro d05fa14e52 Kernel: Use TRY() when validating clock_id in TimeManagement
Gets rid of a bit of code duplication, and makes the API more consistent
with the style we are moving towards.
2022-02-21 15:47:51 -08:00
Brian Gianforcaro 70f3fa2dd2 Kernel: Set new process name in do_exec before waiting for the tracer
While investigating why gdb is failing when it calls `PT_CONTINUE`
against Serenity I noticed that the names of the programs in the
System Monitor didn't make sense. They were seemingly stale.

After inspecting the kernel code, it became apparent that the sequence
occurs as follows:

    1. Debugger calls `fork()`
    2. The forked child calls `PT_TRACE_ME`
    3. The `PT_TRACE_ME` instructs the forked process to block in the
       kernel waiting for a signal from the tracer on the next call
       to `execve(..)`.
    4. Debugger waits for forked child to spawn and stop, and then it
       calls `PT_ATTACH` followed by `PT_CONTINUE` on the child.
    5. Currently the `PT_CONTINUE` fails because of some other yet to
       be found bug.
    6. The process name is set immediately AFTER we are woken up by
       the `PT_CONTINUE` which never happens in the case I'm debugging.

This chain of events leaves the process suspended, with the name  of
the original (forked) process instead of the name we inherit from
the `execve(..)` call.

To avoid such confusion in the future, we set the new name before we
block waiting for the tracer.
2022-02-19 18:04:32 -08:00
Jakub Berkop 895a050e04 Kernel: Fixed argument passing for profiling_enable syscall
Arguments larger than 32bit need to be passed as a pointer on a 32bit
architectures. sys$profiling_enable has u64 event_mask argument,
which means that it needs to be passed as an pointer. Previously upper
32bits were filled by garbage.
2022-02-19 11:37:02 +01:00
Ali Mohammad Pur a1cb2c371a AK+Kernel: OOM-harden most parts of Trie
The only part of Unveil that can't handle OOM gracefully is the
String::formatted() use in the node metadata.
2022-02-15 18:03:02 +02:00
Jakub Berkop 4916c892b2 Kernel/Profiling: Add profiling to read syscall
Syscalls to read can now be profiled, allowing us to monitor
filesystem usage by different applications.
2022-02-14 11:38:13 +01:00
Idan Horowitz bd821982e0 Kernel: Use StringView::for_each_split_view() in sys$pledge
This let's us avoid the fallible Vector allocation that split_view()
entails.
2022-02-14 11:35:20 +01:00
Idan Horowitz e384f62ee2 Kernel: Make master TLS region WeakPtr construction OOM-fallible 2022-02-14 11:35:20 +01:00
Idan Horowitz c8ab7bde3b Kernel: Use try_make_weak_ptr() instead of make_weak_ptr() 2022-02-13 23:02:57 +01:00
Idan Horowitz d6ea6c39a7 AK+Kernel: Rename try_make_weak_ptr to make_weak_ptr_if_nonnull
This matches the likes of the adopt_{own, ref}_if_nonnull family and
also frees up the name to allow us to eventually add OOM-fallible
versions of these functions.
2022-02-13 23:02:57 +01:00
Andrew Kaster b4a7d148b1 Kernel: Expose maximum argument limit in sysconf
Move the definitions for maximum argument and environment size to
Process.h from execve.cpp. This allows sysconf(_SC_ARG_MAX) to return
the actual argument maximum of 128 KiB to userspace.
2022-02-13 22:06:54 +02:00
Idan Horowitz 57bce8ab97 Kernel: Set up Regions before adding them to a Process's AddressSpace
This reduces the amount of time in which not fully-initialized Regions
are present inside an AddressSpace's region tree.
2022-02-11 17:49:46 +02:00
Lenny Maiorani c6acf64558 Kernel: Change static constexpr variables to constexpr where possible
Function-local `static constexpr` variables can be `constexpr`. This
can reduce memory consumption, binary size, and offer additional
compiler optimizations.

These changes result in a stripped x86_64 kernel binary size reduction
of 592 bytes.
2022-02-09 21:04:51 +00:00
Andreas Kling cda56f8049 Kernel: Robustify and rename Inode bound socket API
Rename the bound socket accessor from socket() to bound_socket().
Also return RefPtr<LocalSocket> instead of a raw pointer, to make it
harder for callers to mess up.
2022-02-07 13:02:34 +01:00
sin-ack 24fd8fb16f Kernel: Ensure socket is suitable for writing in sys$sendmsg
Previously we would return a bytes written value of 0 if the writing end
of the socket was full. Now we either exit with EAGAIN if the socket
description is non-blocking, or block until the description can be
written to.

This is mostly a copy of the conditions in sys$write but with the "total
nwritten" parts removed as sys$sendmsg does not have that.
2022-02-07 12:21:45 +01:00
Andreas Kling 04539d4930 Kernel: Propagate sys$profiling_enable() buffer allocation failure
Caught a kernel panic when enabling profiling of all threads when there
was very little memory available.
2022-02-06 01:25:32 +01:00
Andreas Kling 3845c90e08 Kernel: Remove unnecessary includes from Thread.h
...and deal with the fallout by adding missing includes everywhere.
2022-01-30 16:21:59 +01:00
Idan Horowitz e28af4a2fc Kernel: Stop using HashMap in Mutex
This commit removes the usage of HashMap in Mutex, thereby making Mutex
be allocation-free.

In order to achieve this several simplifications were made to Mutex,
removing unused code-paths and extra VERIFYs:
 * We no longer support 'upgrading' a shared lock holder to an
   exclusive holder when it is the only shared holder and it did not
   unlock the lock before relocking it as exclusive. NOTE: Unlike the
   rest of these changes, this scenario is not VERIFY-able in an
   allocation-free way, as a result the new LOCK_SHARED_UPGRADE_DEBUG
   debug flag was added, this flag lets Mutex allocate in order to
   detect such cases when debugging a deadlock.
 * We no longer support checking if a Mutex is locked by the current
   thread when the Mutex was not locked exclusively, the shared version
   of this check was not used anywhere.
 * We no longer support force unlocking/relocking a Mutex if the Mutex
   was not locked exclusively, the shared version of these functions
   was not used anywhere.
2022-01-29 16:45:39 +01:00
Andreas Kling d748a3c173 Kernel: Only lock process file descriptor table once in sys$poll()
Grab the OpenFileDescriptions mutex once and hold on to it while
populating the SelectBlocker::FDVector.
2022-01-29 02:17:12 +01:00
Andreas Kling b56646e293 Kernel: Switch process file descriptor table from spinlock to mutex
There's no reason for this to use a spinlock. Instead, let's allow
threads to block if someone else is using the descriptor table.
2022-01-29 02:17:09 +01:00
Andreas Kling 8ebec2938c Kernel: Convert process file descriptor table to a SpinlockProtected
Instead of manually locking in the various member functions of
Process::OpenFileDescriptions, simply wrap it in a SpinlockProtected.
2022-01-29 02:17:06 +01:00
Andreas Kling b27b22a68c Kernel: Allocate entire SelectBlocker::FDVector at once
Use try_ensure_capacity() + unchecked_append() instead of repeatedly
doing try_append().
2022-01-28 23:41:18 +01:00
Andreas Kling 31c1094577 Kernel: Don't mess with thread state in Process::do_exec()
We were marking the execing thread as Runnable near the end of
Process::do_exec().

This was necessary for exec in processes that had never been scheduled
yet, which is a specific edge case that only applies to the very first
userspace process (normally SystemServer). At this point, such threads
are in the Invalid state.

In the common case (normal userspace-initiated exec), making the current
thread Runnable meant that we switched away from its current state:
Running. As the thread is indeed running, that's a bogus change!
This created a short time window in which the thread state was bogus,
and any attempt to block the thread would panic the kernel (due to a
bogus thread state in Thread::block() leading to VERIFY_NOT_REACHED().)

Fix this by not touching the thread state in Process::do_exec()
and instead make the first userspace thread Runnable directly after
calling Process::exec() on it in try_create_userspace_process().

It's unfortunate that exec() can be called both on the current thread,
and on a new thread that has never been scheduled. It would be good to
not have the latter edge case, but fixing that will require larger
architectural changes outside the scope of this fix.
2022-01-27 11:18:25 +01:00
Brian Gianforcaro e954b4bdd4 Kernel: Return error from sys$execve() when called with zero arguments
There are many assumptions in the stack that argc is not zero, and
argv[0] points to a valid string. The recent pwnkit exploit on Linux
was able to exploit this assumption in the `pkexec` utility
(a SUID-root binary) to escalate from any user to root.

By convention `execve(..)` should always be called with at least one
valid argument, so lets enforce that semantic to harden the system
against vulnerabilities like pwnkit.

Reference: https://www.qualys.com/2022/01/25/cve-2021-4034/pwnkit.txt
2022-01-26 13:05:59 +01:00
Idan Horowitz d1433c35b0 Kernel: Handle OOM failures in find_shebang_interpreter_for_executable 2022-01-26 02:37:03 +02:00
Idan Horowitz 8cf0e4a5e4 Kernel: Eliminate allocations from generate_auxiliary_vector 2022-01-26 02:37:03 +02:00
Idan Horowitz a6f0ab358a Kernel: Make AddressSpace::find_regions_intersecting OOM-fallible 2022-01-26 02:37:03 +02:00
Idan Horowitz e23d320bb9 Kernel: Fail gracefully due to OOM on HashTable set in sys$setgroups 2022-01-26 02:37:03 +02:00
Idan Horowitz 8dfd124718 Kernel: Replace String with NonnullOwnPtr<KString> in sys$getkeymap 2022-01-25 08:06:02 +01:00
Liav A 69f054616d Kernel: Add CommandLine option to disable or enable the PC speaker
By default, we disable the PC speaker as it's quite annoying when using
the text mode console.
2022-01-23 00:40:54 +00:00
Jelle Raaijmakers df73e8b46b Kernel: Allow program headers to align on multiples of PAGE_SIZE
These checks in `sys$execve` could trip up the system whenever you try
to execute an `.so` file. For example, double-clicking `libwasm.so` in
Terminal crashes the kernel.

This changes the program header alignment checks to reflect the same
checks in LibELF, and passes the requested alignment on to
`::try_allocate_range()`.
2022-01-23 00:11:56 +02:00
Idan Horowitz 0adee378fd Kernel: Stop using LibKeyboard's CharacterMap in HIDManagement
This was easily done, as the Kernel and Userland don't actually share
any of the APIs exposed by it, so instead the Kernel APIs were moved to
the Kernel, and the Userland APIs stayed in LibKeyboard.

This has multiple advantages:
 * The non OOM-fallible String is not longer used for storing the
   character map name in the Kernel
 * The kernel no longer has to link to the userland LibKeyboard code
 * A lot of #ifdef KERNEL cruft can be removed from LibKeyboard
2022-01-21 18:25:44 +01:00
Andreas Kling 0e08763483 Kernel: Wrap much of sys$execve() in a block scope
Since we don't return normally from this function, let's make it a
little extra difficult to accidentally leak something by leaving it on
the stack in this function.
2022-01-13 23:57:33 +01:00
Andreas Kling 0e72b04e7d Kernel: Perform exec-into-new-image directly in sys$execve()
This ensures that everything allocated on the stack in Process::exec()
gets cleaned up. We had a few leaks related to the parsing of shebang
(#!) executables that get fixed by this.
2022-01-13 23:57:33 +01:00
Idan Horowitz cfb9f889ac LibELF: Accept Span instead of Pointer+Size in validate_program_headers 2022-01-13 22:40:25 +01:00
Idan Horowitz 3e959618c3 LibELF: Use StringBuilders instead of Strings for the interpreter path
This is required for the Kernel's usage of LibELF, since Strings do not
expose allocation failure.
2022-01-13 22:40:25 +01:00
Andreas Kling 8ad46fd8f5 Kernel: Stop leaking executable path in successful sys$execve()
Since we don't return from sys$execve() when it's successful, we have to
take special care to tear down anything we've allocated.

Turns out we were not doing this for the full executable path itself.
2022-01-13 16:15:37 +01:00
Idan Horowitz 40159186c1 Kernel: Remove String use-after-free in generate_auxiliary_vector
Instead we generate the random bytes directly in
make_userspace_context_for_main_thread if requested.
2022-01-13 00:20:08 -08:00
Idan Horowitz e72bbca9eb Kernel: Fix OOB write in sys$uname
Since this was only out of bounds of the specific field, not of the
whole struct, and because setting the hostname requires root privileges
this was not actually a security vulnerability.
2022-01-13 00:20:08 -08:00
Idan Horowitz 50d6a6a186 Kernel: Convert hostname to KString 2022-01-13 00:20:08 -08:00
Idan Horowitz bc85b64a38 Kernel: Replace usages of String::formatted with KString in sys$exec 2022-01-12 16:09:09 +02:00
Idan Horowitz dba0840942 Kernel: Remove outdated FIXME comment in sys$sethostname 2022-01-12 16:09:09 +02:00
Daniel Bertalan 182016d7c0 Kernel+LibC+LibCore+UE: Implement fchmodat(2)
This function is an extended version of `chmod(2)` that lets one control
whether to dereference symlinks, and specify a file descriptor to a
directory that will be used as the base for relative paths.
2022-01-12 14:54:12 +01:00
Andreas Kling a62bdb0761 Kernel: Delay Process data unprotection in sys$pledge()
Don't unprotect the protected data area until we've validated the pledge
syscall inputs.
2022-01-02 18:08:02 +01:00
circl 63760603f3 Kernel+LibC+LibCore: Add lchown and fchownat functions
This modifies sys$chown to allow specifying whether or not to follow
symlinks and in which directory.

This was then used to implement lchown and fchownat in LibC and LibCore.
2022-01-01 15:08:49 +01:00
Daniel Bertalan 1d2f78682b Kernel+AK: Eliminate a couple of temporary String allocations 2021-12-30 14:16:03 +01:00
Brian Gianforcaro 54b9a4ec1e Kernel: Handle promise violations in the syscall handler
Previously we would crash the process immediately when a promise
violation was found during a syscall. This is error prone, as we
don't unwind the stack. This means that in certain cases we can
leak resources, like an OwnPtr / RefPtr tracked on the stack. Or
even leak a lock acquired in a ScopeLockLocker.

To remedy this situation we move the promise violation handling to
the syscall handler, right before we return to user space. This
allows the code to follow the normal unwind path, and grantees
there is no longer any cleanup that needs to occur.

The Process::require_promise() and Process::require_no_promises()
functions were modified to return ErrorOr<void> so we enforce that
the errors are always propagated by the caller.
2021-12-29 18:08:15 +01:00
Brian Gianforcaro 0f7fe1eb08 Kernel: Use Process::require_no_promises instead of REQUIRE_NO_PROMISES
This change lays the foundation for making the require_promise return
an error hand handling the process abort outside of the syscall
implementations, to avoid cases where we would leak resources.

It also has the advantage that it makes removes a gs pointer read
to look up the current thread, then process for every syscall. We
can instead go through the Process this pointer in most cases.
2021-12-29 18:08:15 +01:00
Brian Gianforcaro bad6d50b86 Kernel: Use Process::require_promise() instead of REQUIRE_PROMISE()
This change lays the foundation for making the require_promise return
an error hand handling the process abort outside of the syscall
implementations, to avoid cases where we would leak resources.

It also has the advantage that it makes removes a gs pointer read
to look up the current thread, then process for every syscall. We
can instead go through the Process this pointer in most cases.
2021-12-29 18:08:15 +01:00
Brian Gianforcaro b5367bbf31 Kernel: Clarify why ftruncate() & pread() are passed off_t const*
I fell into this trap and tried to switch the syscalls to pass by
the `off_t` by register. I think it makes sense to add a clarifying
comment for future readers of the code, so they don't fall into the
same trap. :^)
2021-12-29 05:54:04 -08:00
Brian Gianforcaro 737a11389c Kernel: Fix info leak from sockaddr_un in socket syscalls
In `sys$accept4()` and `get_sock_or_peer_name()` we were not
initializing the padding of the `sockaddr_un` struct, leading to
an kernel information leak if the
caller looked back at it's contents.

Before Fix:

    37.766 Clipboard(11:11): accept4 Bytes:
    2f746d702f706f7274616c2f636c6970626f61726440eac130e7fbc1e8abbfc
    19c10ffc18440eac15485bcc130e7fbc1549feaca6c9deaca549feaca1bb0bc
    03efdf62c0e056eac1b402d7acd010ffc14602000001b0bc030100000050bf0
    5c24602000001e7fbc1b402d7ac6bdc

After Fix:

    0.603 Clipboard(11:11): accept4 Bytes:
    2f746d702f706f7274616c2f636c6970626f617264000000000000000000000
    000000000000000000000000000000000000000000000000000000000000000
    000000000000000000000000000000000000000000000000000000000000000
    0000000000000000000000000000000
2021-12-29 03:41:32 -08:00
Daniel Bertalan fcdd202741 Kernel: Return the actual number of CPU cores that we have
... instead of returning the maximum number of Processor objects that we
can allocate.

Some ports (e.g. gdb) rely on this information to determine the number
of worker threads to spawn. When gdb spawned 64 threads, the kernel
could not cope with generating backtraces for it, which prevented us
from debugging it properly.

This commit also removes the confusingly named
`Processor::processor_count` function so that this mistake can't happen
again.
2021-12-29 03:17:41 -08:00
Idan Horowitz d7ec5d042f Kernel: Port Process to ListedRefCounted 2021-12-29 12:04:15 +01:00
Guilherme Goncalves 33b78915d3 Kernel: Propagate overflow errors from Memory::page_round_up
Fixes #11402.
2021-12-28 23:08:50 +01:00
Daniel Bertalan 52beeebe70 Kernel: Remove the KString::try_create(String::formatted(...)) pattern
We can now directly create formatted KStrings with KString::formatted.

:^)
2021-12-28 01:55:22 -08:00
Guilherme Gonçalves da6aef9fff Kernel: Make msync return EINVAL when regions are too large
As a small cleanup, this also makes `page_round_up` verify its
precondition with `page_round_up_would_wrap` (which callers are expected
to call), rather than having its own logic.

Fixes #11297.
2021-12-23 17:43:12 -08:00
Daniel Bertalan 8e3d1a42e3 Kernel+UE+LibC: Store address as void* in SC_m{re,}map_params
Most other syscalls pass address arguments as `void*` instead of
`uintptr_t`, so let's do that here too. Besides improving consistency,
this commit makes `strace` correctly pretty-print these arguments in
hex.
2021-12-23 23:08:10 +01:00
Daniel Bertalan 77f9272aaf Kernel+UE: Add MAP_FIXED_NOREPLACE mmap() flag
This feature was introduced in version 4.17 of the Linux kernel, and
while it's not specified by POSIX, I think it will be a nice addition to
our system.

MAP_FIXED_NOREPLACE provides a less error-prone alternative to
MAP_FIXED: while regular fixed mappings would cause any intersecting
ranges to be unmapped, MAP_FIXED_NOREPLACE returns EEXIST instead. This
ensures that we don't corrupt our process's address space if something
is already at the requested address.

Note that the more portable way to do this is to use regular
MAP_ANONYMOUS, and check afterwards whether the returned address matches
what we wanted. This, however, has a large performance impact on
programs like Wine which try to reserve large portions of the address
space at once, as the non-matching addresses have to be unmapped
separately.
2021-12-23 23:08:10 +01:00
Andreas Kling 1d08b671ea Kernel: Enter new address space before destroying old in sys$execve()
Previously we were assigning to Process::m_space before actually
entering the new address space (assigning it to CR3.)

If a thread was preempted by the scheduler while destroying the old
address space, we'd then attempt to resume the thread with CR3 pointing
at a partially destroyed address space.

We could then crash immediately in write_cr3(), right after assigning
the new value to CR3. I am hopeful that this may have been the bug
haunting our CI for months. :^)
2021-12-23 01:18:26 +01:00
Daniel Bertalan ce1bf3724e Kernel: Replace intersecting ranges in mmap when MAP_FIXED is specified
This behavior is mandated by POSIX and is used by software like Wine
after reserving large chunks of the address range.
2021-12-22 00:02:36 -08:00
Martin Bříza 86b249f02f Kernel: Implement sysconf(_SC_SYMLOOP_MAX)
Not much to say here, this is an implementation of this call that
accesses the actual limit constant that's used by the VirtualFileSystem
class.

As a side note, this is required for my eventual Qt port.
2021-12-21 12:54:11 -08:00
Liav A 5a649d0fd5 Kernel: Return EINVAL when specifying -1 for setuid and similar syscalls
For setreuid and setresuid syscalls, -1 means to set the current
uid/euid/gid/egid value, to be more convenient for programming.
However, for other syscalls where we pass only one argument, there's no
justification to specify -1.

This behavior is identical to how Linux handles the value -1, and is
influenced by the fact that the manual pages for the group of one
argument syscalls that handle ID operations is ambiguous about this
topic.
2021-12-20 11:32:16 +01:00
Hendiadyoin1 18013f3c06 Kernel: Remove a redundant check in Process::remap_range_as_stack
We already VERIFY that we have carved something out, so we don't need to
check that again.
2021-12-18 10:31:18 -08:00
Hendiadyoin1 2d28b441bf Kernel: Collapse a redundant boolean conditional return statement in …
validate_mmap_prot
2021-12-18 10:31:18 -08:00
Hendiadyoin1 f38d32535c Kernel: Access OpenFileDescriptions::max_open() statically in Syscalls 2021-12-18 10:31:18 -08:00
Hendiadyoin1 c860e0ab95 Kernel: Add implicit auto qualifiers in Syscalls 2021-12-18 10:31:18 -08:00
Hendiadyoin1 f5b495d92c Kernel: Remove else after return in Process::do_write 2021-12-18 10:31:18 -08:00
Andreas Kling 32aa623eff Kernel: Fix 4-byte uninitialized memory leak in sys$sigaltstack()
It was possible to extract 4 bytes of uninitialized kernel stack memory
on x86_64 by looking in the padding of stack_t.
2021-12-18 11:30:10 +01:00
Andreas Kling 0ae8702692 Kernel: Make File::stat() & friends return Error<struct stat>
Instead of making the caller provide a stat buffer, let's just return
one as a value.
2021-12-18 11:30:10 +01:00
Andreas Kling abf2204402 Kernel: Use copy_typed_from_user() in more places :^) 2021-12-18 11:30:10 +01:00
Andreas Kling 39d9337db5 Kernel: Make sys${ftruncate,pread} take off_t as const pointer
These syscalls don't write back to the off_t value (unlike sys$lseek)
so let's take Userspace<off_t const*> instead of Userspace<off_t*>.
2021-12-18 11:30:10 +01:00
Jean-Baptiste Boric 23257cac52 Kernel: Remove sys$select() syscall
Now that the userland has a compatiblity wrapper for select(), the
kernel doesn't need to implement this syscall natively. The poll()
interface been around since 1987, any code still using select()
should be slapped silly.

Note: the SerenityOS source tree mostly uses select() and not poll()
despite SerenityOS having support for poll() since early 2019...
2021-12-12 21:48:50 +01:00
Jean-Baptiste Boric 2177c2a30b Kernel: Split off sys$poll() into Syscalls/poll.cpp 2021-12-12 21:48:50 +01:00
Idan Horowitz 762e047ec9 Kernel+LibC: Implement sigtimedwait()
This includes a new Thread::Blocker called SignalBlocker which blocks
until a signal of a matching type is pending. The current Blocker
implementation in the Kernel is very complicated, but cleaning it up is
a different yak for a different day.
2021-12-12 08:34:19 +02:00
Idan Horowitz 81a76a30a1 Kernel: Preserve pending signals across execve(2)s
As required by posix. Also rename Thread::clear_signals to
Thread::reset_signals_for_exec since it doesn't actually clear any
pending signals, but rather does execve related signal book-keeping.
2021-12-12 08:34:19 +02:00
Idan Horowitz 0ca1231d8f Kernel: Inherit alternative signal stack on fork(2)
A child process created via fork(2) inherits a copy of its parent's
alternate signal stack settings.
2021-12-12 08:34:19 +02:00
Idan Horowitz 92a6c91f4e Kernel: Preserve signal mask across fork(2) and execve(2)
A child created via fork(2) inherits a copy of its parent's signal
mask; the signal mask is preserved across execve(2).
2021-12-12 08:34:19 +02:00
Ben Wiederhake 0e6e1092f0 Kernel: Make ptrace return an error on error
Returning 'result.error().code()' erroneously creates an
ErrorOr<FlatPtr> of the positive errno code, which breaks our
error-returning convention.

This seems to be due to a forgotten minus-sign during the refactoring in
9e51e295cf. This latent bug was never
discovered, because currently the error-handling paths are rarely
exercised.
2021-12-05 22:59:09 +01:00
Ben Wiederhake 0f8483f09c Kernel: Implement new ptrace function PT_PEEKBUF
This enables the tracer to copy large amounts of data in a much saner
way.
2021-12-05 22:59:09 +01:00
Ben Wiederhake 3e223185b3 Kernel+strace: Remove unnecessary indirection for PEEK
Also, remove incomplete, superfluous check.
Incomplete, because only the byte at the provided address was checked;
this misses the last bytes of the "jerk page".
Superfluous, because it is already correctly checked by peek_user_data
(which calls copy_from_user).

The caller/tracer should not typically attempt to read non-userspace
addresses, we don't need to "hot-path" it either.
2021-12-05 22:59:09 +01:00
Idan Horowitz 265764ff2f Kernel: Add support for the POLLWRBAND poll event 2021-12-05 12:53:29 +01:00
Idan Horowitz f415218afe Kernel+LibC: Implement sigaltstack()
This is required for compiling wine for serenity
2021-12-01 21:44:11 +02:00
Idan Horowitz d5d0eb45bf Kernel: Clear up some comments in the sys$mprotect implementation 2021-12-01 21:44:11 +02:00
Idan Horowitz f27bbec7b2 Kernel: Move incorrect early return in sys$mprotect
Since we're iterating over multiple regions that interesect with the
requested range, just one of them having the requested access flags
is not enough to finish the syscall early.
2021-12-01 21:44:11 +02:00
Idan Horowitz 4ca39c7110 Kernel: Move the expand_range_to_page_boundaries helper to MemoryManager
This helper can (and will) be used in more parts of the kernel besides
the mmap-family of syscalls.
2021-12-01 21:44:11 +02:00
Idan Horowitz fc13d0782f LibC: Make the madvise advice field a value instead of a bitfield
The advices are almost always exclusive of one another, and while POSIX
does not define madvise, most other unix-like and *BSD systems also only
accept a singular value per call.
2021-12-01 21:44:11 +02:00
Hendiadyoin1 c7b90fa7d3 Kernel: Don't rewrite the whole file on sys$msync 2021-12-01 09:47:46 +01:00
Hendiadyoin1 259f78545a Kernel: Allow flushing of partial regions in sys$msync 2021-12-01 09:47:46 +01:00
Hendiadyoin1 49d6ad6633 Kernel: Handle more error cases in sys$msync 2021-12-01 09:47:46 +01:00
Ben Wiederhake 33079c8ab9 Kernel+UE+LibC: Remove unused dbgputch syscall
Everything uses the dbgputstr syscall anyway, so there is no need to
keep supporting it.
2021-11-24 22:56:39 +01:00
Jelle Raaijmakers 46ad5f2a17 Kernel: Fix futex syscall return values
We were returning `int`s from two functions that caused `ErrorOr` to
not recognize the error codes as a special case. For example,
`ETIMEDOUT` was returned as the positive number 66 resulting in all
kinds of defective behavior.

As a result, SDL2's timer subsystem was not working at all, since the
`SDL_MUTEX_TIMEDOUT` value was never returned.
2021-11-24 19:44:57 +01:00
Andreas Kling dd6e73176d Kernel: Make sys$mmap() interpret 0-alignment as page-sized alignment
This allows userspace to get a sane default behavior without having to
specify the page size.
2021-11-23 11:44:42 +01:00
Andreas Kling f99af1bef0 Kernel: Make sure OpenFileDescription is kept alive while read() blocks
It's not safe to store OpenFileDescription in a raw pointer when
blocking, since another thread may decide to close the corresponding
file descriptor.
2021-11-21 20:22:48 +01:00
Andreas Kling f2c3a41a8f Kernel: Make UserOrKernelBuffer::for_user_buffer() return ErrorOr<T>
This simplifies EFAULT propagation with TRY(). :^)
2021-11-21 20:22:48 +01:00
Itamar 38ddf301f6 Kernel+LibC: Fix ptrace for 64-bit
This makes the types used in the PT_PEEK and PT_POKE actions
suitable for 64-bit platforms as well.
2021-11-20 21:22:24 +00:00
Andreas Kling e08d213830 Kernel: Use DistinctNumeric for filesystem ID's
This patch adds the FileSystemID type, which is a distinct u32.
This prevents accidental conversion from arbitrary integers.
2021-11-18 21:11:30 +01:00
Andreas Kling 32aa37d5dc Kernel+LibC: Add msync() system call
This allows userspace to trigger a full (FIXME) flush of a shared file
mapping to disk. We iterate over all the mapped pages in the VMObject
and write them out to the underlying inode, one by one. This is rather
naive, and there's lots of room for improvement.

Note that shared file mappings are currently not possible since mmap()
returns ENOTSUP for PROT_WRITE+MAP_SHARED. That restriction will be
removed in a subsequent commit. :^)
2021-11-17 19:34:15 +01:00
Andrew Kaster f1d8978804 AK+Kernel: Remove implicit conversion from Userspace<T*> to FlatPtr
This feels like it was a refactor transition kind of conversion. The
places that were relying on it can easily be changed to explicitly ask
for the ptr() or a new vaddr() method on Userspace<T*>.

FlatPtr can still implicitly convert to Userspace<T> because the
constructor is not explicit, but there's quite a few more places that
are relying on that conversion.
2021-11-16 00:13:22 +01:00
Andrew Kaster 194456efdc Kernel: Remove unnecessary StringBuilder from sys$create_thread()
A series of refactors changed Threads to always have a name, and to
store their name as a KString. Before the refactors a StringBuilder was
used to format the default thread name for a non-main thread, but it is
since unused. Remove it and the AK/String related header includes from
the thread syscall implementation file.
2021-11-16 00:13:22 +01:00
Daniel Bertalan 648a139af3 Kernel+LibC: Pass off_t to pread() via a pointer
`off_t` is a 64-bit signed integer, so passing it in a register on i686
is not the best idea.

This fix gets us one step closer to making the LLVM port work.
2021-11-13 10:04:46 +01:00
Andreas Kling 88b6428c25 AK: Make Vector::try_* functions return ErrorOr<void>
Instead of signalling allocation failure with a bool return value
(false), we now use ErrorOr<void> and return ENOMEM as appropriate.
This allows us to use TRY() and MUST() with Vector. :^)
2021-11-10 21:58:58 +01:00
Ben Wiederhake ad5061bb7a Kernel: Make (f)statvfs report filesystem ID correctly 2021-11-10 16:13:10 +01:00
Ben Wiederhake 631447da57 Kernel: Fix TOCTOU in fstatvfs
In particular, fstatvfs used to assume that a file that was earlier
opened using some path will forever be at that path. This is wrong, and
in the meantime new mounts and new filesystems could take up the
filename or directories, leading to a completely inaccurate result.
This commit improves the situation:
- All filesystem information is now always accurate.
- The mount flags *might* be erroneously zero, if the custody for the
  open file is not available. I don't know when that might happen, but
  it is definitely not the typical case.
2021-11-10 16:13:10 +01:00
Andreas Kling 79fa9765ca Kernel: Replace KResult and KResultOr<T> with Error and ErrorOr<T>
We now use AK::Error and AK::ErrorOr<T> in both kernel and userspace!
This was a slightly tedious refactoring that took a long time, so it's
not unlikely that some bugs crept in.

Nevertheless, it does pass basic functionality testing, and it's just
real nice to finally see the same pattern in all contexts. :^)
2021-11-08 01:10:53 +01:00
Brian Gianforcaro 9f6eabd73a Kernel: Move TTY subsystem to use KString instead of AK::String
This is minor progress on removing the `AK::String` API from the Kernel
in the interest of improving OOM safety.
2021-11-02 11:34:31 +01:00
Ben Wiederhake c05c5a7ff4 Kernel: Clarify ambiguous {File,Description}::absolute_path
Found due to smelly code in InodeFile::absolute_path.

In particular, this replaces the following misleading methods:

File::absolute_path
This method *never* returns an actual path, and if called on an
InodeFile (which is impossible), it would VERIFY_NOT_REACHED().

OpenFileDescription::try_serialize_absolute_path
OpenFileDescription::absolute_path
These methods do not guarantee to return an actual path (just like the
other method), and just like Custody::absolute_path they do not
guarantee accuracy. In particular, just renaming the method made a
TOCTOU bug obvious.

The new method signatures use KResultOr, just like
try_serialize_absolute_path() already did.
2021-10-31 12:06:28 +01:00
Ben Wiederhake 88ca12f037 Kernel: Enable early-returns from VFS::for_each_mount 2021-10-31 12:06:28 +01:00
Ben Wiederhake 735da58d44 Kernel: Avoid OpenFileDescription::absolute_path 2021-10-31 12:06:28 +01:00
James Mintram 0fbeac6011 Kernel: Split SmapDisabler so header is platform independent
A new header file has been created in the Arch/ folder while the
implementation has been moved into a CPP living in the X86 folder.
2021-10-15 21:48:45 +01:00
Rodrigo Tobar e1093c3403 Kernel: Implement pread syscall
The OpenFileDescription class already offers the necessary functionlity,
so implementing this was only a matter of following the structure for
`read` while handling the additional `offset` argument.
2021-10-13 16:10:50 +02:00
Rodrigo Tobar 8936b111a7 Kernel: Factor out common code from read/readv syscalls
Having these bits of code factored out not only prevents duplication
now, but will also allow us to implement pread without repeating
ourselves (too much).
2021-10-13 16:10:50 +02:00
Rodrigo Tobar bf4e536f00 Kernel: Correctly interpret ioctl's FIONBIO user value
Values in `ioctl` are given through a pointer, but ioctl's FIONBIO
implementation was interpreting this pointer as an integer directly.
This meant that programs using `ioctl` to set a file descriptor in
blocking mode met with incorrect behavior: they passed a non-null
pointer pointing to a value of 0, but the kernel interpreted the pointer
as a non-zero integer, thus making the file non-blocking.

This commit fixes this behavior by reading the value from the userspace
pointer and using that to set the non-blocking flag on the file
descriptor.

This bug was found while trying to run the openssl tool on serenity,
which used `ioctl` to ensure newly-created sockets are in blocking mode.
2021-10-11 10:46:01 -07:00
Nico Weber 1cdb12e920 Kernel: Fix -Wunreachable-code warnings from clang 2021-10-08 23:33:46 +02:00
Brian Gianforcaro 0223faf6f4 Kernel: Access MemoryManager static functions statically
SonarCloud flagged this "Code Smell", where we are accessing these
static methods as if they are instance methods. While it is technically
possible, it is very confusing to read when you realize they are static
functions.
2021-10-02 18:16:15 +02:00
Nico Weber 5a951d6258 Kernel: Fix a few typos 2021-10-01 00:51:49 +01:00
Eric Seifert 8924b1f532 Kernel: Allow PROT_NONE in mmap and mprotect for stack regions
To allow for userspace guard pages (ruby uses this).
Redundant since serenity creates them automatically,
but should be allowed anyway.
2021-09-23 04:14:41 +00:00
Ben Wiederhake 12247fe9b4 Kernel: Use AK::Variant default initialization where appropriate 2021-09-21 04:22:52 +04:30
Eric Seifert edbc5489a8 Kernel: Add support for O_NONBLOCK in pipe syscall
While working on a port, I saw a pipe creation fail due to missing
nonblock support in pipe syscall.
2021-09-19 12:20:16 +02:00
Itamar bb1ad759c5 Kernel: Allow calling sys$waitid on traced, non-child processes
Previously, attempting to call sys$waitid on non-child processes
returned ECHILD.

That prevented debugging non-child processes by attaching to them during
runtime (as opposed to forking and debugging the child, which is what
was previously supported).

We now allow calling sys$waitid on a any process that is being traced
by us, even if it's not our child.
2021-09-16 23:47:46 +02:00
Brian Gianforcaro e8ec1e908d Kernel: Only instantiate main_program_metadata in the scope it's needed
pvs-studio flagged this as a potential perf optimization.
2021-09-16 17:17:13 +02:00
Andreas Kling b6efd66d56 Kernel: Use move semantics in sys$sendfd()
Avoid an unnecessary NonnullRefPtr<OpenFileDescription> copy.
2021-09-15 21:09:47 +02:00
Liav A 8d0dbdeaac Kernel+Userland: Introduce a new way to reboot and poweroff the machine
This change removes the halt and reboot syscalls, and create a new
mechanism to change the power state of the machine.
Instead of how power state was changed until now, put a SysFS node as
writable only for the superuser, that with a defined value, can result
in either reboot or poweroff.
In the future, a power group can be assigned to this node (which will be
the GroupID responsible for power management).

This opens an opportunity to permit to shutdown/reboot without superuser
permissions, so in the future, a userspace daemon can take control of
this node to perform power management operations without superuser
permissions, if we enforce different UserID/GroupID on that node.
2021-09-12 11:52:16 +02:00
Liav A 9132596b8e Kernel: Move ACPI and BIOS code into the new Firmware directory
This will somwhat help unify them also under the same SysFS directory in
the commit.
Also, it feels much more like this change reflects the reality that both
ACPI and the BIOS are part of the firmware on x86 computers.
2021-09-12 11:52:16 +02:00
TheFightingCatfish a81b21c1a7 Kernel+LibC: Implement fsync 2021-09-12 11:24:02 +02:00
Liav A 04ba31b8c5 Kernel+Userland: Remove loadable kernel moduless
These interfaces are broken for about 9 months, maybe longer than that.
At this point, this is just a dead code nobody tests or tries to use, so
let's remove it instead of keeping a stale code just for the sake of
keeping it and hoping someone will fix it.

To better justify this, I read that OpenBSD removed loadable kernel
modules in 5.7 release (2014), mainly for the same reason we do -
nobody used it so they had no good reason to maintain it.
Still, OpenBSD had LKMs being effectively working, which is not the
current state in our project for a long time.
An arguably better approach to minimize the Kernel image size is to
allow dropping drivers and features while compiling a new image.
2021-09-11 19:05:00 +02:00
Linus Groh f646d49ac1 Kernel: Add _SC_HOST_NAME_MAX 2021-09-11 00:28:39 +02:00
Andreas Kling dd82f68326 Kernel: Use KString all the way in sys$execve()
This patch converts all the usage of AK::String around sys$execve() to
using KString instead, allowing us to catch and propagate OOM errors.

It also required changing the kernel CommandLine helper class to return
a vector of KString for the userspace init program arguments.
2021-09-09 21:25:10 +02:00
Andreas Kling 524ef5e475 Kernel: Add KBuffer::bytes() and use it
(Instead of hand-wrapping { data(), size() } in a bunch of places.)
2021-09-08 20:16:00 +02:00
Liav A 3d5ddbab74 Kernel: Rename DevFS => DevTmpFS
The current implementation of DevFS resembles the linux devtmpfs, and
not the traditional DevFS, so let's rename it to better represent the
direction of the development in regard to this filesystem.

The abbreviation for DevTmpFS is still "dev", because it doesn't add
value as a commandline option to make it longer.

In quick summary - DevFS in unix OSes is simply a static filesystem, so
device nodes are generated and removed by the kernel code. DevTmpFS
is a "modern reinvention" of the DevFS, so it is much more like a TmpFS
in the sense that not only it's stored entirely in RAM, but the userland
is responsible to add and remove devices nodes as it sees fit, and no
kernel code is directly being involved to keep the filesystem in sync.
2021-09-08 00:42:20 +02:00
Andreas Kling a01b19c878 Kernel: Remove KBuffer::try_copy() in favor of try_create_with_bytes()
These were already equivalent, so let's only have one of them.
2021-09-07 16:22:29 +02:00
Andreas Kling b300f9aa2f Kernel: Convert KBuffer::copy() => KBuffer::try_copy()
This was a weird KBuffer API that assumed failure was impossible.
This patch converts it to a modern KResultOr<NonnullOwnPtr<KBuffer>> API
and updates the two clients to the new style.
2021-09-07 15:36:39 +02:00
Andreas Kling 250b52d6e5 Kernel: Make KBuffer::try_create_with_bytes() return KResultOr 2021-09-07 15:22:24 +02:00
Andreas Kling ed5d04b0ea Kernel: Use KResultOr and TRY() for FIFO 2021-09-07 13:58:16 +02:00