Commit graph

195 commits

Author SHA1 Message Date
Shannon Booth b3bf5c4ea8 AK: Add BOM handling to String::from_utf8_with_replacement_character 2024-08-12 06:38:58 -04:00
Shannon Booth 033ea0e7fb AK: Add String::from_utf8_with_replacement_character
This takes a byte sequence and converts it to a UTF-8 string with the
replacement character.
2024-08-10 10:39:43 +02:00
Timothy Flynn 29879a69a4 AK: Construct Strings from StringBuilder without re-allocating the data
Currently, invoking StringBuilder::to_string will re-allocate the string
data to construct the String. This is wasteful both in terms of memory
and speed.

The goal here is to simply hand the string buffer over to String, and
let String take ownership of that buffer. To do this, StringBuilder must
have the same memory layout as Detail::StringData. This layout is just
the members of the StringData class followed by the string itself.

So when a StringBuilder is created, we reserve sizeof(StringData) bytes
at the front of the buffer. StringData can then construct itself into
the buffer with placement new.

Things to note:
* StringData must now be aware of the actual capacity of its buffer, as
  that can be larger than the string size.
* We must take care not to pass ownership of inlined string buffers, as
  these live on the stack.
2024-07-20 06:45:49 +02:00
Timothy Flynn 0c14a9417a AK: Replace converting to and from UTF-16 with simdutf
The one behavior difference is that we will now actually fail on invalid
code units with Utf16View::to_utf8(AllowInvalidCodeUnits::No). It was
arguably a bug that this wasn't already the case.
2024-07-18 14:46:25 +02:00
Timothy Flynn fe3fde2411 AK+LibUnicode: Implement a case-insensitive variant of find_byte_offset
The existing String::find_byte_offset is case-sensitive. This variant
allows performing searches using Unicode-aware case folding.
2024-06-01 07:37:54 +02:00
Jess ecb7d4b40f LibJS: Throw RangeError in StringPrototype::repeat if OOM
currently crashes with an assertion failure in `String::repeated` if
malloc can't serve a `count * input_size` sized request, so add
`String::repeated_with_error` to propagate the error.
2024-04-20 19:23:46 -04:00
Timothy Flynn de80f544d8 AK: Disallow calling String methods that return a view on rvalues
This prevents, for example:

    StringView view = "foo"_string.bytes_as_string_view();

This prevents a class of potential UAF.
2024-04-04 11:23:21 +02:00
Andreas Kling a88799c032 AK: Remove excessive hashing caused by FlyString table
Before this change, the global FlyString table looked like this:

    HashMap<StringView, Detail::StringBase>

After this change, we have:

    HashTable<Detail::StringData const*, FlyStringTableHashTraits>

The custom hash traits are used to extract the stored hash from
StringData which avoids having to rehash the StringView repeatedly like
we did before.

This necessitated a handful of smaller changes to make it work.
2024-03-24 13:28:24 +01:00
Dan Klishch fa52f68142 AK: Store data in FlyString as StringBase
Unfortunately, it is not clear to me how to split this commit into
several atomic ones.
2024-01-21 16:16:15 -07:00
Dan Klishch e7700e16ee AK: Forward substring creation with shared superstring to StringBase 2024-01-21 16:16:15 -07:00
Dan Klishch 7506736869 AK: Stop using ShortString in String::from_code_point
Refactor it to use StringBase::replace_with_new_short_string instead.
2024-01-21 16:16:15 -07:00
Dan Klishch d6290c4684 AK: Move String::hash() and String::String() to StringBase 2024-01-21 16:16:15 -07:00
Dan Klishch 1b09a1851e AK: Move String::~String() and String::destroy_string() to StringBase 2024-01-21 16:16:15 -07:00
Dan Klishch 54d149bc25 AK: Move String::bytes() and String::operator==(String) to StringBase
The idea is to eventually get rid of protected state in StringBase. To
do this, we first need to remove all references to m_data and
m_short_string from String.
2024-01-21 16:16:15 -07:00
Dan Klishch 4364a28d3d AK: Move data fields from AK::String to a newly created AK::StringBase
This starts separating memory management of string data and string
utilities like `String::formatted`. This would also allow to reuse the
same storage in `DeprecatedString` in the future.
2024-01-21 16:16:15 -07:00
Dan Klishch 7f8d69ee2f AK: Remove explicit String::operator!= in favor of defaulted one 2024-01-21 16:16:15 -07:00
Andreas Kling 3c039903fb LibTextCodec+AK: Don't validate UTF-8 strings twice
UTF8Decoder was already converting invalid data into replacement
characters while converting, so we know for sure we have valid UTF-8
by the time conversion is finished.

This patch adds a new StringBuilder::to_string_without_validation()
and uses it to make UTF8Decoder avoid half the work it was doing.
2023-12-30 13:49:50 +01:00
Andreas Kling a285e36041 LibJS+AK: Make String.prototype.repeat() way faster
Instead of using a StringBuilder, add a String::repeated(String, N)
overload that takes advantage of knowing it's already all UTF-8.

This makes the following microbenchmark go 4x faster:

    "foo".repeat(100_000_000)

And for single character strings, we can even go 10x faster:

    "x".repeat(100_000_000)
2023-12-30 13:49:50 +01:00
Shannon Booth cdf84a3e36 AK: Implement StringView::to_number<T> from String::to_number<T>
Do exactly what String does, then use StringView's implementation as
String's new one. This should allow us to call to_number on a
StringView.
2023-12-23 20:41:07 +01:00
Ali Mohammad Pur 5e1499d104 Everywhere: Rename {Deprecated => Byte}String
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).

This commit is auto-generated:
  $ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \
    Meta Ports Ladybird Tests Kernel)
  $ perl -pie 's/\bDeprecatedString\b/ByteString/g;
    s/deprecated_string/byte_string/g' $xs
  $ clang-format --style=file -i \
    $(git diff --name-only | grep \.cpp\|\.h)
  $ gn format $(git ls-files '*.gn' '*.gni')
2023-12-17 18:25:10 +03:30
Shannon Booth 5f2f26451d AK: Disallow String::from_utf8 on FlyString and String 2023-12-10 09:45:03 +01:00
Bastiaan van der Plaat 4a7d3115c9 AK: Add String to number floating point support 2023-12-04 19:54:43 +00:00
Shannon Booth 6b32a1f18f AK+LibUnicode: Expose TrailingCodePointTransformation in to_titlecase
Relocating the definition of this enum from LibUnicode to AK.
2023-11-28 17:15:27 -05:00
Tim Schumacher a2f60911fe AK: Rename GenericTraits to DefaultTraits
This feels like a more fitting name for something that provides the
default values for Traits.
2023-11-09 10:05:51 -05:00
Andreas Kling 0902f552a3 AK: Bring some missing DeprecatedString API over to String
Specifically, case sensitivity parameters for starts/ends with,
and the equals_ignoring_ascii_case() helper.
2023-11-04 21:28:30 +01:00
Andreas Kling 1e820385d9 AK: Add case-insensitive hashing for the new String classes
Bringing over this functionality from DeprecatedString.
2023-09-06 11:29:03 -04:00
Lucas CHOLLET fde26c53f0 AK: Remove the API to explicitly construct short strings
Now that ""_string is infallible, the only benefit of explicitly
constructing a short string is the ability to do it at compile-time. But
we never do that, so let's simplify the API and remove this
implementation detail from it.
2023-08-08 07:37:21 +02:00
Andreas Kling 34344120f2 AK: Make "foo"_string infallible
Stop worrying about tiny OOMs.

Work towards #20405.
2023-08-07 16:03:27 +02:00
aryanbaburajan a94c0eea94 AK: Add trim_ascii_whitespace method to String 2023-08-06 22:21:10 +02:00
Andrew Kaster 3533d3e452 AK: Enable consteval workaround for Android NDK
Android isn't shipping clang-15 yet in any NDK, so use the existing
workaround on that platform.
2023-07-19 04:22:28 -06:00
Andrew Kaster bfd6deed1e AK+Meta: Disable consteval completely when building for oss-fuzz
This was missed in 02b74e5a70

We need to disable consteval in AK::String as well as AK::StringView,
and we need to disable it when building both the tools build and the
fuzzer build.
2023-06-29 15:55:54 -06:00
Hendiadyoin1 ca0106ba1d AK: Forbid from_utf8 and from_deprecated_{...} with unintended types
Calling `from_utf8` with a DeprecatedString will hide the fact that we
have a DeprecatedString, while using `from_deprecated_string` with a
StringView will silently and needlessly allocate a DeprecatedString,
so let's forbid that.
2023-06-13 01:49:02 +02:00
Timothy Flynn d6b786b3fe AK: Use consteval String factories on macOS
Xcode 14.3 ships with clang 15, which supports our usage of consteval to
validate short strings at compile time.
2023-05-08 20:54:31 -06:00
thankyouverycool 9a03e4dd73 AK: Add count() helper to String 2023-04-30 05:48:14 +02:00
Andreas Kling d517e7fb3a AK: Make FlyString::hash() use the cached hash in StringData if possible
This avoids rehashing the string every time.
2023-03-09 21:54:59 +01:00
Timothy Flynn 1393ed2000 AK+LibUnicode: Implement String::equals_ignoring_case without allocating
We currently fully casefold the left- and right-hand sides to compare
two strings with case-insensitivity. Now, we casefold one code point at
a time, storing the result in a view for comparison, until we exhaust
both strings.
2023-03-08 18:57:53 +00:00
Timothy Flynn 515fca4f7a AK: Make String::contains(code_point) handle non-ASCII
We currently only accept a char, instead of a full code point.
2023-03-08 14:16:47 +00:00
Timothy Flynn f882581e91 AK: Make String::{starts,ends}_with(code_point) handle non-ASCII
We currently pass the code point to StringView::{starts,ends}_with,
which actually accepts a single char, thus cannot handle non-ASCII
code points.
2023-03-08 14:16:47 +00:00
Timothy Flynn da0d000909 AK: Ensure short String instances are valid UTF-8
We are currently only validating long strings.
2023-03-03 11:46:42 -05:00
Linus Groh 45dc3d8a3e AK: Add String::ends_with{,_bytes}() 2023-03-03 11:02:21 +00:00
Ali Mohammad Pur 79e4027480 AK: Add two starts_with{bytes,}() APIs to String 2023-02-28 15:52:24 +03:30
Timothy Flynn 5eec76b441 AK: Use the same consteval condition on _short_string as its factory
This fixes the build with Apple Clang.
2023-02-25 22:25:05 +01:00
Linus Groh 85414d9338 AK: Add operator""_{short_,}string to create a String from a literal
We briefly discussed this when adding the new String type but couldn't
settle on a name. However, having to use String::from_utf8() on every
literal string is a bit unwieldy, so let's have these options available!

Naming-wise '_string' is not as short as 'sv' but should be relatively
clear; it also matches '_bigint' and '_ubigint' in length.
'_short_string' may be longer than the actual string itself, but it's
still an improvement over the static function :^)

Since our C++ source files are UTF-8 encoded anyway, it should be
impossible to create a string literal with invalid UTF-8, so including
that in the name is not as important as in the function that can receive
arbitrary data.
2023-02-25 20:51:49 +01:00
Andrew Kaster 0ea697ace5 AK: Add String::from_stream method
The caller is responsible for determining how long the string is that
they want to read.
2023-02-21 10:57:44 +01:00
Andreas Kling e08c55dd8d AK: Make String const-correct internally 2023-02-21 00:54:04 +01:00
nipos c31b547fae AK: Use constexpr instead of consteval on OpenBSD 2023-02-04 16:11:54 -07:00
Timothy Flynn c59268d15b AK: Add String::trim 2023-01-28 00:13:46 +00:00
Timothy Flynn cccaa94767 AK: Add String::join 2023-01-28 00:13:46 +00:00
Timothy Flynn c35b1371a3 AK: Add an overload of String::find_byte_offset for StringView 2023-01-27 18:00:17 +00:00
Timothy Flynn 76fd5f2756 AK: Add convenience substring wrappers to String to exclude a length
These overloads exist on other string classes and are used throughout
the code base.
2023-01-24 16:23:50 -05:00