class UtfString implements ArrayAccess, Stringable

Implementation for UTF-8 strings.

PHP's subscript operator, when applied to a string, returns a byte rather than a character. Because a character in a UTF-8 string may occupy more than one byte, the subscript operator may return only part of a character, which is not valid UTF-8 on its own.

Because the lexer relies on the subscript operator, this class had to be implemented.

Implements array-like access for UTF-8 strings.

In this library, this class should be used to parse UTF-8 queries.
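The byte-versus-character problem described above can be reproduced with plain PHP strings. The following is a minimal sketch that does not use this class at all; the sample string and variable names are chosen only for illustration.

```php
<?php
// "é" encoded in UTF-8 is the two bytes 0xC3 0xA9.
$query = "\xC3\xA9 = 1";

// strlen() counts bytes, so the two-byte character counts twice.
$byteLength = strlen($query);

// The native subscript returns a single byte: here only the lead byte of "é".
$firstByte = $query[0];

// With the /u modifier, preg_match() returns false for malformed UTF-8 input.
$firstByteIsValidUtf8 = preg_match('//u', $firstByte) === 1;

// Taking both bytes together yields the complete character.
$firstChar = substr($query, 0, 2);
$firstCharIsValidUtf8 = preg_match('//u', $firstChar) === 1;

echo bin2hex($firstByte), PHP_EOL;  // c3
var_dump($byteLength);              // int(6)
var_dump($firstByteIsValidUtf8);    // bool(false)
var_dump($firstCharIsValidUtf8);    // bool(true)
```

A lexer that walks such a string with the native subscript would see the stray byte 0xC3 instead of the character, which is the situation this class is meant to avoid.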

Properties

string $str

The raw, multi-byte string.

int $byteIdx

The index of the current byte.

int $charIdx

The index of the current character.

int $byteLen

The length of the string (in bytes).

int $charLen

The length of the string (in characters).

static protected array<int|string,int> $asciiMap

A map of ASCII binary values to their ASCII codes. This improves performance by avoiding calls to ord($byte).
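The idea of replacing ord() with an array lookup can be sketched as follows. The contents and range of the real map are not shown in this documentation, so the table below (all 128 ASCII bytes, built eagerly) is an assumption made for illustration, and $asciiMap here is a local stand-in rather than the class property.

```php
<?php
// Hypothetical stand-in for the lookup table; the real map's contents are an assumption.
$asciiMap = [];
for ($code = 0; $code < 128; $code++) {
    $asciiMap[chr($code)] = $code;
}

// A lookup replaces the ord() call.
$codeOfA = $asciiMap['A'];
$codeOf7 = $asciiMap['7'];

// PHP casts numeric-string keys such as "7" to the integer 7, so the
// key stored for chr(55) is an int while the key for "A" stays a string.
$digitKeyType  = gettype(array_search(55, $asciiMap, true));
$letterKeyType = gettype(array_search(65, $asciiMap, true));

var_dump($codeOfA, $codeOf7);           // int(65) int(55)
var_dump($digitKeyType, $letterKeyType); // string(7) "integer" string(6) "string"
```

The key casting shown at the end would explain why the property's key type is documented as int|string.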

Methods

__construct(string $str)

No description

bool
offsetExists(int $offset)

Checks if the given offset exists.

string|null
offsetGet(int $offset)

Gets the character at the given offset.

void
offsetSet(int $offset, string $value)

Sets the value of a character.

void
offsetUnset(int $offset)

Unsets an index.

static int
getCharLength(string $byte)

Gets the length, in bytes, of a UTF-8 character.

int
length()

Returns the length of the string in characters.

string
__toString()

Returns the contained string.

Details

__construct(string $str)

No description

Parameters

string $str

the string

bool offsetExists(int $offset)

Checks if the given offset exists.

Parameters

int $offset

the offset to be checked

Return Value

bool

string|null offsetGet(int $offset)

Gets the character at the given offset.

Parameters

int $offset

the offset to be returned

Return Value

string|null

void offsetSet(int $offset, string $value)

Sets the value of a character.

Parameters

int $offset

the offset to be set

string $value

the value to be set

Return Value

void

Exceptions

Exception

void offsetUnset(int $offset)

Unsets an index.

Parameters

int $offset

the offset to be unset

Return Value

void

Exceptions

Exception
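The four ArrayAccess methods documented above can be illustrated with a small stand-in class. This is a sketch, not the library's implementation: ReadOnlyUtf8 is a hypothetical name, it splits the string eagerly with preg_split(), and it assumes that offsetSet() and offsetUnset() always reject the operation with an Exception, which the Exceptions entries suggest but do not spell out.

```php
<?php
// Hypothetical stand-in, not the library's implementation.
final class ReadOnlyUtf8 implements ArrayAccess
{
    /** @var string[] one element per UTF-8 character */
    private $chars;
    /** @var string the original string */
    private $str;

    public function __construct(string $str)
    {
        $this->str = $str;
        // An empty pattern in UTF-8 mode splits between whole characters.
        $this->chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    }

    public function offsetExists($offset): bool
    {
        return isset($this->chars[$offset]);
    }

    public function offsetGet($offset): ?string
    {
        return $this->chars[$offset] ?? null;
    }

    public function offsetSet($offset, $value): void
    {
        // Assumption: writes are not supported.
        throw new Exception('Not implemented.');
    }

    public function offsetUnset($offset): void
    {
        // Assumption: unsetting is not supported.
        throw new Exception('Not implemented.');
    }

    public function length(): int
    {
        return count($this->chars);
    }

    public function __toString(): string
    {
        return $this->str;
    }
}

$s = new ReadOnlyUtf8("\xC3\xA9a"); // "é" followed by "a"
$first   = $s[0];        // the whole two-byte character "é"
$missing = $s[5];        // null: no such offset
$exists  = isset($s[1]); // true

try {
    $s[0] = 'x';
    $threw = false;
} catch (Exception $e) {
    $threw = true;
}
```

Offsets here count characters, not bytes, so $s[1] is "a" even though "a" starts at byte 2 of the underlying string.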

static int getCharLength(string $byte)

Gets the length, in bytes, of a UTF-8 character.

According to RFC 3629, a UTF-8 character can have at most 4 bytes. However, this implementation supports UTF-8 characters containing up to 6 bytes.

Parameters

string $byte

the byte to be analyzed

Return Value

int

See also

https://tools.ietf.org/html/rfc3629
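The length of a UTF-8 sequence can be read from its first byte alone. The sketch below shows one way to do that, including the 5- and 6-byte lead bytes that RFC 3629 no longer permits. The function name utf8SequenceLength is hypothetical, and the thresholds follow the UTF-8 lead-byte layout rather than this method's source, which is not shown here.

```php
<?php
/**
 * Hypothetical helper: number of bytes in the UTF-8 sequence introduced by $byte.
 * Continuation bytes (0x80-0xBF) are not valid lead bytes; this sketch does not
 * reject them.
 */
function utf8SequenceLength(string $byte): int
{
    $code = ord($byte);

    if ($code < 0x80) {
        return 1; // 0xxxxxxx
    }
    if ($code < 0xE0) {
        return 2; // 110xxxxx
    }
    if ($code < 0xF0) {
        return 3; // 1110xxxx
    }
    if ($code < 0xF8) {
        return 4; // 11110xxx
    }
    if ($code < 0xFC) {
        return 5; // 111110xx, beyond RFC 3629
    }

    return 6;     // 1111110x, beyond RFC 3629
}

// "a", "é" (2 bytes), "€" (3 bytes) and U+1F600 (4 bytes), written as raw bytes.
$text = "a\xC3\xA9\xE2\x82\xAC\xF0\x9F\x98\x80";

// Walk the string one character at a time by jumping over each sequence.
$lengths = [];
$i = 0;
$n = strlen($text);
while ($i < $n) {
    $len = utf8SequenceLength($text[$i]);
    $lengths[] = $len;
    $i += $len;
}

print_r($lengths); // 1, 2, 3, 4
```

Counting the iterations of such a loop gives the length in characters, while strlen() gives the length in bytes.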

int length()

Returns the length of the string in characters.

Return Value

int

string __toString()

Returns the contained string.

Return Value

string