Spec Justification For &#x80; To &#x9F; In UTF-8 Documents: Browser Behaviour Wanted

April 21, 2024

The HTML 4.01 spec says, for hexadecimal character references: "Numeric character references specify the code position of a character in the document character set." So if the docu

Solution 1:

I found the answer to my question. It's in the tokenization section of the HTML5 parsing algorithm, under "consume a character reference", which defines the mapping for these characters.
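As a rough illustration of that mapping, here is a minimal sketch using Python's standard `html.unescape`, which is documented as applying the HTML5 rules to both valid and invalid character references. It stands in for a browser here; the helper name `resolve` is only an assumption for this example.

```python
import html


def resolve(ref: str) -> str:
    """Resolve a character reference using the HTML5 rules that html.unescape applies."""
    return html.unescape(ref)


# References in the 0x80-0x9F range come back as the remapped characters,
# not as C1 control characters.
for ref in ("&#x80;", "&#x99;", "&#x9F;"):
    ch = resolve(ref)
    print(f"{ref} -> U+{ord(ch):04X} {ch}")
```

With this sketch, `&#x80;` resolves to U+20AC (€) and `&#x9F;` to U+0178 (Ÿ), which matches what browsers show in a UTF-8 document.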

Solution 2:

As I have done here as well, I'll quote Wikipedia again:

Numeric references always refer to Unicode code points, regardless of the page's encoding. Using numeric references that refer to permanently undefined characters and control characters is forbidden, with the exception of the linefeed, tab, and carriage return characters. That is, characters in the hexadecimal ranges 00–08, 0B–0C, 0E–1F, 7F, and 80–9F cannot be used in an HTML document, not even by reference, so &#153;, for example, is not allowed. However, for backward compatibility with early HTML authors and browsers that ignored this restriction, raw characters and numeric character references in the 80–9F range are interpreted by some browsers as representing the characters mapped to bytes 80–9F in the Windows-1252 encoding.

So it seems to be a legacy issue.
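The legacy behaviour described in the quote can be sketched roughly as follows: treat a code point in the 80–9F range as a single byte and decode it as Windows-1252. The function name `legacy_c1_to_char` is hypothetical, and falling back to the original code point for bytes that Python's `cp1252` codec leaves undefined is an assumption of this sketch, not something the quote specifies.

```python
def legacy_c1_to_char(code_point: int) -> str:
    """Map a code point in 0x80-0x9F to the Windows-1252 character for that byte.

    Code points outside that range are returned unchanged. Bytes that the
    cp1252 codec leaves undefined (for example 0x81) also fall back to the
    code point itself; that fallback is an assumption of this sketch.
    """
    if not 0x80 <= code_point <= 0x9F:
        return chr(code_point)
    try:
        return bytes([code_point]).decode("cp1252")
    except UnicodeDecodeError:
        return chr(code_point)


for cp in (0x80, 0x99, 0x9F):
    ch = legacy_c1_to_char(cp)
    print(f"{cp:#04x} -> U+{ord(ch):04X} {ch}")
```

This prints the euro sign for 0x80 and Ÿ for 0x9F, the two ends of the range in the question.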
