Commit graph

247 commits

Author SHA1 Message Date
Andreas Kling deff554096 Kernel+LibSystem: Add a 4th syscall argument
Let's allow passing 4 function arguments to a syscall. The 4th argument
goes into ESI or RSI.
2021-07-25 14:08:50 +02:00
Brian Gianforcaro a3787b9db7 Kernel: Remove another ARCH ifdef using RegisterState::flags() 2021-07-23 19:02:25 +02:00
Gunnar Beutner 31f30e732a Everywhere: Prefix hexadecimal numbers with 0x
Depending on the values it might be difficult to figure out whether a
value is decimal or hexadecimal. So let's make this more obvious. Also
this allows copying and pasting those numbers into GNOME calculator and
probably also other apps which auto-detect the base.
2021-07-22 08:57:01 +02:00
Brian Gianforcaro 120b9bc05b Kernel: Conditionally acquire the big lock based on syscall metadata 2021-07-20 03:21:14 +02:00
Brian Gianforcaro 354e18a5a0 Kernel: Move validate_syscall_preconditions outside of the big lock
Now that we hold the space lock for the duration of the validation
it should be safe to move the validation outside the big lock.
2021-07-20 03:21:14 +02:00
Brian Gianforcaro 27e1120dff Kernel: Move syscall precondition validates to MM
Move these to MM to simplify the flow of the syscall handler.

While here, also make sure we hold the process space lock for
the duration of the validation to avoid potential issues where
another thread attempts to modify the process space during the
validation. This will allow us to move the validation out of the
big process lock scope in a future change.

Additionally utilize the new no_lock variants of functions to avoid
unnecessary recursive process space spinlock acquisitions.
2021-07-20 03:21:14 +02:00
Brian Gianforcaro af543328ea Kernel: Instrument syscalls with their process big lock requirements
Currently all syscalls run under the Process:m_big_lock, which is an
obvious bottleneck. Long term we would like to remove the big lock and
replace it with the required fine grained locking.

To facilitate this goal we need a way of gradually decomposing the big
lock into the all of the required fine grained locks. This commit
introduces instrumentation to the syscall table, allowing the big lock
requirement to be toggled on/off per syscall.

Eventually when we are finished, no syscall will required the big lock,
and we'll be able to remove all of this instrumentation.
2021-07-20 03:21:14 +02:00
Brian Gianforcaro 308396bca1 Kernel: No lock validate_user_stack variant, switch to Space as argument
The entire process is not needed, just require the user to pass in the
Space. Also provide no_lock variant to use when you already have the
VM/Space lock acquired, to avoid unnecessary recursive spinlock
acquisitions.
2021-07-20 03:21:14 +02:00
Brian Gianforcaro 1cffecbe8d Kernel: Push ARCH specific ifdef's down into RegisterState functions
The non CPU specific code of the kernel shouldn't need to deal with
architecture specific registers, and should instead deal with an
abstract view of the machine. This allows us to remove a variety of
architecture specific ifdefs and helps keep the code slightly more
portable.

We do this by exposing the abstract representation of instruction
pointer, stack pointer, base pointer, return register, etc on the
RegisterState struct.
2021-07-19 08:46:55 +02:00
Tom d7e5521a04 Kernel: Ignore subsequent calls to Process::die
It's possible that another thread might try to exit the process just
about the same time another thread does the same, or a crash happens.
Also, we may not be able to kill all other threads instantly as they
may be blocked in the kernel (though in this case they would get killed
before ever returning back to user mode. So keep track of whether
Process::die was already called and ignore it on subsequent calls.

Fixes #8485
2021-07-14 12:30:41 +02:00
Tom fa8fe40266 Revert "Kernel: Make sure threads which don't do any syscalls are t..."
This reverts commit 3c3a1726df.

We cannot blindly kill threads just because they're not executing in a
system call. Being blocked (including in a page fault) needs proper
unblocking and potentially kernel stack cleanup before we can mark a
thread as Dying.

Fixes #8691
2021-07-13 20:23:10 +02:00
Hendiadyoin1 9b7e48c6bd Kernel: Replace raw asm functions with naked ones 2021-07-05 16:40:00 +02:00
Gunnar Beutner 52f9aaa823 Kernel: Use the GS segment for the per-CPU struct
Right now we're using the FS segment for our per-CPU struct. On x86_64
there's an instruction to switch between a kernel and usermode GS
segment (swapgs) which we could use.

This patch doesn't update the rest of the code to use swapgs but it
prepares for that by using the GS segment instead of the FS segment.
2021-07-02 23:33:17 +02:00
Gunnar Beutner e37576440d Kernel: Fix stack alignment on x86_64
These were already properly aligned (as far as I can tell).
2021-06-30 15:13:30 +02:00
Gunnar Beutner b6435372cc Kernel: Implement syscall entry for x86_64 2021-06-28 22:29:28 +02:00
Gunnar Beutner 233ef26e4d Kernel+Userland: Add x86_64 registers to RegisterState/PtraceRegisters 2021-06-27 15:46:42 +02:00
Hendiadyoin1 62f9377656 Kernel: Move special sections into Sections.h
This also removes a lot of CPU.h includes infavor for Sections.h
2021-06-24 00:38:23 +02:00
Hendiadyoin1 7ca3d413f7 Kernel: Pull apart CPU.h
This does not add any functional changes
2021-06-24 00:38:23 +02:00
Gunnar Beutner 3c3a1726df Kernel: Make sure threads which don't do any syscalls are terminated
Steps to reproduce:

$ cat loop.c
int main() { for (;;); }
$ gcc -o loop loop.c
$ ./loop

Terminating this process wasn't previously possible because we only
checked whether the thread should be terminated on syscall exit.
2021-06-19 12:55:00 +02:00
Ali Mohammad Pur 50349de38c Meta: Disable -Wmaybe-uninitialized
It's prone to finding "technically uninitialized but can never happen"
cases, particularly in Optional<T> and Variant<Ts...>.
The general case seems to be that it cannot infer the dependency
between Variant's index (or Optional's boolean state) and a particular
alternative (or Optional's buffer) being untouched.
So it can flag cases like this:
```c++
if (index == StaticIndexForF)
    new (new_buffer) F(move(*bit_cast<F*>(old_buffer)));
```
The code in that branch can _technically_ make a partially initialized
`F`, but that path can never be taken since the buffer holding an
object of type `F` and the condition being true are correlated, and so
will never be taken _unless_ the buffer holds an object of type `F`.

This commit also removed the various 'diagnostic ignored' pragmas used
to work around this warning, as they no longer do anything.
2021-06-09 23:05:32 +04:30
Gunnar Beutner 42d667645d Kernel: Make sure we free the thread stack on thread exit
This adds two new arguments to the thread_exit system call which let
a thread unmap an arbitrary VM range on thread exit. LibPthread
uses this functionality to unmap the thread stack.

Fixes #7267.
2021-05-29 15:53:08 +02:00
Gunnar Beutner 4fca9ee060 Kernel: Allow building the kernel with -O0
Unfortunately the kernel doesn't run with -O0 but at least it can be
successfully built with this change.
2021-05-28 19:52:22 +01:00
Gunnar Beutner 55ae52fdf8 Kernel: Enable building the kernel with -flto
GCC with -flto is more aggressive when it comes to inlining and
discarding functions which is why we must mark some of the functions
as NEVER_INLINE (because they contain asm labels which would be
duplicated in the object files if the compiler decides to inline
the function elsewhere) and __attribute__((used)) for others so
that GCC doesn't discard them.
2021-04-29 20:26:36 +02:00
Brian Gianforcaro 1682f0b760 Everything: Move to SPDX license identifiers in all files.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.

See: https://spdx.dev/resources/use/#identifiers

This was done with the `ambr` search and replace tool.

 ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
2021-04-22 11:22:27 +02:00
Andreas Kling c6b7b98b64 Kernel: Don't consider kernel memory regions for syscall origin check
We should never enter the syscall handler from a kernel address.
2021-04-20 23:38:27 +02:00
Gunnar Beutner f033416893 Kernel+LibC: Clean up how assertions work in the kernel and LibC
This also brings LibC's abort() function closer to the spec.
2021-04-18 11:11:15 +02:00
Brian Gianforcaro a973b22359 Kernel: Suppress maybe-uninitialized' warning s_syscall_table in gcc-10.3.0 2021-04-14 21:49:54 +02:00
Brian Gianforcaro 1d7a0ab5ea Kernel: Mark s_syscall_table const so it ends up in ro_data. 2021-04-12 23:56:57 +02:00
Hendiadyoin1 0d934fc991 Kernel::CPU: Move headers into common directory
Alot of code is shared between i386/i686/x86 and x86_64
and a lot probably will be used for compatability modes.
So we start by moving the headers into one Directory.
We will probalby be able to move some cpp files aswell.
2021-03-21 09:35:23 +01:00
Andreas Kling a166a65eff Kernel: Don't return -EFOO when return type is KResultOr<...> 2021-03-15 09:09:04 +01:00
Andreas Kling 73e06a1983 Kernel: Convert klog() => AK::Format in a handful of places 2021-03-12 15:22:35 +01:00
Andreas Kling adb2e6be5f Kernel: Make the kernel compile & link for x86_64
It's now possible to build the whole kernel with an x86_64 toolchain.
There's no bootstrap code so it doesn't work yet (obviously.)
2021-03-04 18:25:01 +01:00
Andreas Kling dce030eefc Kernel: Use RDTSC instead of get_fast_random() for syscall stack noise
This was the original approach before we switched to get_fast_random()
which wasn't fast enough, so we added a buffer.

Unfortunately that buffer is racy and we can actually skid past the end
of it and continue fetching "random" offsets from the adjacent memory
for a while, until we run out of kernel data segment and trip a fault.

Instead of making this even more convoluted, let's just go back to the
pleasantly simple (RDTSC & 0xff) approach. :^)

Fixes #4912.
2021-03-02 14:25:38 +01:00
Andreas Kling 14aa8e3708 Kernel: Oops, SC_abort was actually calling sys$exit_thread() 2021-03-01 19:47:16 +01:00
Andreas Kling 261b30e120 Kernel: Detach any attached thread tracer on sys$abort() 2021-03-01 13:57:20 +01:00
Andreas Kling ac71775de5 Kernel: Make all syscall functions return KResultOr<T>
This makes it a lot easier to return errors since we no longer have to
worry about negating EFOO errors and can just return them flat.
2021-03-01 13:54:32 +01:00
Andreas Kling 4aa58aaab5 Kernel: Don't disable interrupts while exiting a thread or process
This was another vestige from a long time ago, when exiting a thread
would mutate global data structures that were only protected by the
interrupt flag.
2021-02-25 19:36:36 +01:00
Andreas Kling 5d180d1f99 Everywhere: Rename ASSERT => VERIFY
(...and ASSERT_NOT_REACHED => VERIFY_NOT_REACHED)

Since all of these checks are done in release builds as well,
let's rename them to VERIFY to prevent confusion, as everyone is
used to assertions being compiled out in release.

We can introduce a new ASSERT macro that is specifically for debug
checks, but I'm doing this wholesale conversion first since we've
accumulated thousands of these already, and it's not immediately
obvious which ones are suitable for ASSERT.
2021-02-23 20:56:54 +01:00
Andreas Kling cc0f5917d3 Kernel: Slap a handful more things with UNMAP_AFTER_INIT 2021-02-20 00:00:19 +01:00
Andreas Kling 4188373020 Kernel: Fix TOCTOU in syscall entry region validation
We were doing stack and syscall-origin region validations before
taking the big process lock. There was a window of time where those
regions could then be unmapped/remapped by another thread before we
proceed with our syscall.

This patch closes that window, and makes sys$get_stack_bounds() rely
on the fact that we now know the userspace stack pointer to be valid.

Thanks to @BenWiederhake for spotting this! :^)
2021-02-14 11:47:14 +01:00
Andreas Kling 10b7f6b77e Kernel: Mark handle_crash() as [[noreturn]] 2021-02-14 11:47:14 +01:00
Andreas Kling 3131281747 Kernel: Panic on syscall from process with IOPL != 0
If this happens then the kernel is in an undefined state, so we should
rather panic than attempt to limp along.
2021-02-14 10:51:17 +01:00
Andreas Kling f1b5def8fd Kernel: Factor address space management out of the Process class
This patch adds Space, a class representing a process's address space.

- Each Process has a Space.
- The Space owns the PageDirectory and all Regions in the Process.

This allows us to reorganize sys$execve() so that it constructs and
populates a new Space fully before committing to it.

Previously, we would construct the new address space while still
running in the old one, and encountering an error meant we had to do
tedious and error-prone rollback.

Those problems are now gone, replaced by what's hopefully a set of much
smaller problems and missing cleanups. :^)
2021-02-08 18:27:28 +01:00
Andreas Kling 823186031d Kernel: Add a way to specify which memory regions can make syscalls
This patch adds sys$msyscall() which is loosely based on an OpenBSD
mechanism for preventing syscalls from non-blessed memory regions.

It works similarly to pledge and unveil, you can call it as many
times as you like, and when you're finished, you call it with a null
pointer and it will stop accepting new regions from then on.

If a syscall later happens and doesn't originate from one of the
previously blessed regions, the kernel will simply crash the process.
2021-02-02 20:13:44 +01:00
Tom 0bd558081e Kernel: Track previous mode when entering/exiting traps
This allows us to determine what the previous mode (user or kernel)
was, e.g. in the timer interrupt. This is used e.g. to determine
whether a signal handler should be set up.

Fixes #5096
2021-01-27 21:12:24 +01:00
Andreas Kling 6bfbc5f5f5 Kernel: Don't allow modifying IOPL via sys$ptrace() or sys$sigreturn()
It was possible to overwrite the entire EFLAGS register since we didn't
do any masking in the ptrace and sigreturn syscalls.

This made it trivial to gain IO privileges by raising IOPL to 3 and
then you could talk to hardware to do all kinds of nasty things.

Thanks to @allesctf for finding these issues! :^)

Their exploit/write-up: https://github.com/allesctf/writeups/blob/master/2020/hxpctf/wisdom2/writeup.md
2020-12-22 19:38:25 +01:00
Tom c455fc2030 Kernel: Change wait blocking to Process-only blocking
This prevents zombies created by multi-threaded applications and brings
our model back to closer to what other OSs do.

This also means that SIGSTOP needs to halt all threads, and SIGCONT needs
to resume those threads.
2020-12-12 21:28:12 +01:00
Tom da5cc34ebb Kernel: Fix some issues related to fixes and block conditions
Fix some problems with join blocks where the joining thread block
condition was added twice, which lead to a crash when trying to
unblock that condition a second time.

Deferred block condition evaluation by File objects were also not
properly keeping the File object alive, which lead to some random
crashes and corruption problems.

Other problems were caused by the fact that the Queued state didn't
handle signals/interruptions consistently. To solve these issues we
remove this state entirely, along with Thread::wait_on and change
the WaitQueue into a BlockCondition instead.

Also, deliver signals even if there isn't going to be a context switch
to another thread.

Fixes #4336 and #4330
2020-12-12 21:28:12 +01:00
Tom 046d6855f5 Kernel: Move block condition evaluation out of the Scheduler
This makes the Scheduler a lot leaner by not having to evaluate
block conditions every time it is invoked. Instead evaluate them as
the states change, and unblock threads at that point.

This also implements some more waitid/waitpid/wait features and
behavior. For example, WUNTRACED and WNOWAIT are now supported. And
wait will now not return EINTR when SIGCHLD is delivered at the
same time.
2020-11-30 13:17:02 +01:00
Andreas Kling 1951dfa46a Kernel: Convert dbg() to dbgln() in Syscall.cpp 2020-11-23 16:08:42 +01:00