
Text Characters Analyzer

Examine the text in depth: display whitespace characters, non-printable characters, Unicode characters and their code points, as well as the text's byte values in hexadecimal format.


Letters That Aren't What They Seem

Some spammers exploit visually similar letters from different alphabets. In the word Sаlе, for example, the Cyrillic characters а and е stand in for their Latin look-alikes. This service is designed to expose such misleading substitutions. You can also check your own text for accidentally mixed characters: just copy and paste the text into the box above.
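The same check can be approximated in a few lines of Python. This is only a rough sketch, not how this service works internally: it assumes that the first word of a character's Unicode name (for example LATIN or CYRILLIC) is a good enough stand-in for its script, and the helper names scripts_in and is_mixed_script are made up for illustration.

```python
import unicodedata

def scripts_in(word):
    """Collect a rough script label for every letter in the word.

    Assumption: the first word of the Unicode character name
    ("LATIN", "CYRILLIC", "GREEK", ...) approximates the script.
    """
    scripts = set()
    for ch in word:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def is_mixed_script(word):
    """True when the letters of the word come from more than one script."""
    return len(scripts_in(word)) > 1

# Latin S and l combined with Cyrillic U+0430 and U+0435
suspicious = "S\u0430l\u0435"
print(sorted(scripts_in(suspicious)))  # ['CYRILLIC', 'LATIN']
print(is_mixed_script(suspicious))     # True
print(is_mixed_script("Sale"))         # False
```

A mixed-script word is not always malicious (brand names and transliterations mix scripts too), so a result of True is best treated as a prompt to look closer.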

Weird Unicode

In Unicode, each character is represented by a code point. A code point is a unique number assigned to each character in the Unicode standard, which encompasses a wide range of characters from various writing systems, symbols, and emojis. Code points can be encoded in different ways (e.g., UTF-8, UTF-16, UTF-32), which determine how the code points are represented in bytes.

Some characters are simple. For example, the character A has a code point of U+0041, which is represented as bytes 0x00 0x41 in UTF-16BE (big-endian), 0x41 0x00 in UTF-16LE (little-endian), and 0x41 in UTF-8.

The Korean syllable 바 has the code point U+BC14 and is represented as the bytes EB B0 94 in UTF-8.

The thumbs up emoji 👍 has a single Unicode code point, U+1F44D, and is encoded as the byte sequence F0 9F 91 8D in UTF-8.
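The byte values quoted above can be reproduced with Python's built-in str.encode. The helper name hex_bytes is hypothetical and exists only for this sketch.

```python
def hex_bytes(text, encoding):
    """Encode text and return its bytes as space-separated uppercase hex."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))

print(hex_bytes("A", "utf-16-be"))          # 00 41
print(hex_bytes("A", "utf-16-le"))          # 41 00
print(hex_bytes("A", "utf-8"))              # 41
print(hex_bytes("\uBC14", "utf-8"))         # EB B0 94
print(hex_bytes("\U0001F44D", "utf-8"))     # F0 9F 91 8D
```

The same code point takes one, three, or four bytes in UTF-8 depending on how large its number is, which is why byte length and character count often differ.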

This single emoji 👨🏼‍👩🏽‍👧🏾‍👦 is actually a sequence of four person emojis joined by Zero Width Joiners; the first three each carry a different skin tone modifier. It consists of ten code points, U+1F468 U+1F3FC U+200D U+1F469 U+1F3FD U+200D U+1F467 U+1F3FE U+200D U+1F466, which mean:

U+1F468: Man

U+1F3FC: Medium-light skin tone modifier

U+200D: Zero Width Joiner (used to connect characters)

U+1F469: Woman

U+1F3FD: Medium skin tone modifier

U+200D: Zero Width Joiner

U+1F467: Girl

U+1F3FE: Medium-dark skin tone modifier

U+200D: Zero Width Joiner

U+1F466: Boy
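A breakdown like the one above can be produced by iterating over the string, since Python strings are sequences of code points. This is a minimal sketch; code_points is an illustrative helper name, and the names printed by unicodedata are the formal Unicode names, which differ from the friendly labels in the list (skin tone modifiers, for instance, are named after the Fitzpatrick scale).

```python
import unicodedata

# The family emoji written out as escapes so every code point is visible
family = (
    "\U0001F468\U0001F3FC\u200D"
    "\U0001F469\U0001F3FD\u200D"
    "\U0001F467\U0001F3FE\u200D"
    "\U0001F466"
)

def code_points(text):
    """Return each code point of text in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]

print(len(family))  # 10 code points, although it renders as one glyph
for ch in family:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
```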

What is really weird is that zero-width characters, such as the Zero Width Space U+200B and the Zero Width Joiner U+200D, take up no visible space and are therefore easy to overlook. They can be exploited to hide text or obfuscate code, including in SQL injection attacks: an attacker may embed invisible characters inside an otherwise legitimate-looking string so that it no longer matches the patterns a security filter looks for. Developers and security professionals should therefore validate and sanitize input with invisible characters in mind.
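One way to surface such characters is to look for the Unicode general category Cf (format characters), which includes U+200B and U+200D. This is a sketch under that assumption rather than a complete defense; find_invisible and strip_invisible are hypothetical helper names.

```python
import unicodedata

def find_invisible(text):
    """Return (index, code point, name) for every format character (category Cf)."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]

def strip_invisible(text):
    """Remove all category Cf characters from text."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")

# A keyword split by a Zero Width Space so it no longer matches "SELECT"
tainted = "SEL\u200bECT"
print(tainted == "SELECT")                   # False
print(find_invisible(tainted))               # [(3, 'U+200B', 'ZERO WIDTH SPACE')]
print(strip_invisible(tainted) == "SELECT")  # True
```

Stripping every Cf character also breaks emoji sequences built with the Zero Width Joiner and can change text in scripts that use joiners legitimately, so reporting the characters is often safer than deleting them blindly.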
