Encoding_Aliases
Encoding_Info
Endianness_Shift
Normalization_Map

Character_Width
Encoding_ID

Character_Encoding
Encoding_Alias
Encoding_Data
Known_Character_Encoding
Latin_1
UCS_2
UCS_2_BE
UCS_2_LE
UCS_4
UCS_4_BE
UCS_4_LE
Unified_Encoding_Record
US_ASCII
UTF_16
UTF_16_BE
UTF_16_LE
UTF_8

=
ASCII_Compatible
Encoding_By_Name
Fixed_Width
Name
Subset_Of_UCS_2
Width
```ada
Encoding_Aliases : constant
  array (Positive range <>) of Encoding_Alias :=
  [(To_Unbounded_String ("ANSI-X3.4-1968"), ASCII_ID),
   (To_Unbounded_String ("ASCII"), ASCII_ID),
   (To_Unbounded_String ("CP-850"), CP_850_ID),
   (To_Unbounded_String ("CP850"), CP_850_ID),
   (To_Unbounded_String ("IBM850"), CP_850_ID),
   (To_Unbounded_String ("ISO-8859-1"), Latin_1_ID),
   (To_Unbounded_String ("LATIN-1"), Latin_1_ID),
   (To_Unbounded_String ("LATIN1"), Latin_1_ID),
   (To_Unbounded_String ("UCS-2BE"), UCS_2_BE_ID),
   (To_Unbounded_String ("UCS-2LE"), UCS_2_LE_ID),
   (To_Unbounded_String ("UCS-4BE"), UCS_4_BE_ID),
   (To_Unbounded_String ("UCS-4LE"), UCS_4_LE_ID),
   (To_Unbounded_String ("US-ASCII"), ASCII_ID),
   (To_Unbounded_String ("UTF-16BE"), UTF_16_BE_ID),
   (To_Unbounded_String ("UTF-16LE"), UTF_16_LE_ID),
   (To_Unbounded_String ("UTF-32BE"), UCS_4_BE_ID),
   (To_Unbounded_String ("UTF-32LE"), UCS_4_LE_ID),
   (To_Unbounded_String ("UTF-8"), UTF_8_ID),
   (To_Unbounded_String ("WINDOWS-1252"), Windows_1252_ID)];
```
```ada
Encoding_Info : constant array (Encoding_ID) of Encoding_Data :=
  [(To_Unbounded_String ("UCS-4BE"), 12001, 4, False, False),
   (To_Unbounded_String ("UCS-4LE"), 12000, 4, False, False),
   (To_Unbounded_String ("UCS-2BE"), 1201, 2, False, True),
   (To_Unbounded_String ("UCS-2LE"), 1200, 2, False, True),
   (To_Unbounded_String ("UTF-16BE"), 0, 0, False, False),
   (To_Unbounded_String ("UTF-16LE"), 0, 0, False, False),
   (To_Unbounded_String ("UTF-8"), 65001, 0, True, False),
   (To_Unbounded_String ("US-ASCII"), 20127, 1, True, True),
   (To_Unbounded_String ("ISO-8859-1"), 28591, 1, True, True),
   (To_Unbounded_String ("IBM850"), 850, 1, True, True),
   (To_Unbounded_String ("WINDOWS-1252"), 1252, 1, True, True)];
```
```ada
Endianness_Shift : constant array (System.Bit_Order) of Natural :=
  [System.High_Order_First => 0,
   System.Low_Order_First  => 1];
```
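One plausible use of Endianness_Shift, stated here as an assumption rather than something the listing confirms, is to turn a big-endian Encoding_ID into its native-order counterpart: in Encoding_ID each "_BE_ID" literal is immediately followed by the matching "_LE_ID" literal, so adding a shift of 0 or 1 to the position of the big-endian literal selects one of the pair. The sketch below is self-contained; its local Encoding_ID holds only two literals and the name Native_UCS_4 is hypothetical.

```ada
with Ada.Text_IO;
with System;

procedure Native_Order_Demo is
   --  Local two-literal stand-in for the package's Encoding_ID;
   --  only the relative order of the BE and LE literals matters here.
   type Encoding_ID is (UCS_4_BE_ID, UCS_4_LE_ID);

   --  Same values as the constant in the listing.
   Endianness_Shift : constant array (System.Bit_Order) of Natural :=
     (System.High_Order_First => 0,
      System.Low_Order_First  => 1);

   --  Hypothetical: pick the BE or LE literal from the target's
   --  default bit order (assumed to track its byte order).
   Native_UCS_4 : constant Encoding_ID :=
     Encoding_ID'Val
       (Encoding_ID'Pos (UCS_4_BE_ID)
        + Endianness_Shift (System.Default_Bit_Order));
begin
   Ada.Text_IO.Put_Line (Encoding_ID'Image (Native_UCS_4));
end Native_Order_Demo;
```

The printed literal depends on the target the program is compiled for.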
```ada
Normalization_Map : constant Character_Mapping :=
  To_Mapping
    (From => "abcdefghijklmnopqrstuvwxyz_",
     To   => "ABCDEFGHIJKLMNOPQRSTUVWXYZ-");
```
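The mapping upper-cases ASCII letters and turns underscores into hyphens, which matches the spelling of the names stored in Encoding_Aliases. A minimal sketch of applying such a mapping with Ada.Strings.Fixed.Translate follows. The helper Normalize is hypothetical, and whether Encoding_By_Name normalizes its argument in exactly this way is an assumption.

```ada
with Ada.Strings.Fixed;
with Ada.Strings.Maps; use Ada.Strings.Maps;
with Ada.Text_IO;

procedure Normalize_Demo is
   --  Same mapping as Normalization_Map in the listing.
   Map : constant Character_Mapping :=
     To_Mapping
       (From => "abcdefghijklmnopqrstuvwxyz_",
        To   => "ABCDEFGHIJKLMNOPQRSTUVWXYZ-");

   --  Hypothetical helper, not part of the documented package:
   --  returns Name with every mapped character replaced.
   function Normalize (Name : String) return String is
     (Ada.Strings.Fixed.Translate (Name, Map));
begin
   --  Assertions are checked when compiled with -gnata.
   pragma Assert (Normalize ("utf_8") = "UTF-8");
   pragma Assert (Normalize ("Latin1") = "LATIN1");
   Ada.Text_IO.Put_Line (Normalize ("utf_8"));  --  prints UTF-8
end Normalize_Demo;
```

Characters outside the From set, such as digits and hyphens, pass through unchanged, so a name that is already normalized is left as it is.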
```ada
subtype Character_Width is Positive range 1 .. 4;
```
```ada
type Encoding_ID is (
   UCS_4_BE_ID,
   UCS_4_LE_ID,
   UCS_2_BE_ID,
   UCS_2_LE_ID,
   UTF_16_BE_ID,
   UTF_16_LE_ID,
   UTF_8_ID,
   ASCII_ID,
   Latin_1_ID,
   CP_850_ID,
   Windows_1252_ID);
```
```ada
type Character_Encoding (Known : Boolean := False) is private;
```
```ada
type Encoding_Alias is record
   Name : Unbounded_String;
   ID   : Encoding_ID;
end record;
```
```ada
type Encoding_Data is record
   Name             : Unbounded_String;
   Windows_Number   : Unsigned_16;
   Width            : Natural range 0 .. 4;  --  0 for variable
   ASCII_Compatible : Boolean;
   Subset_Of_UCS_2  : Boolean;
end record;
```
A Width of 0 denotes a variable-width encoding.
```ada
subtype Known_Character_Encoding is Character_Encoding (True);
```
```ada
UCS_4_BE, UCS_4_LE, UCS_4, UCS_2_BE, UCS_2_LE, UCS_2, UTF_16_BE,
UTF_16_LE, UTF_16, UTF_8, US_ASCII, Latin_1 : constant
  Known_Character_Encoding;
```
These are the encodings whose characteristics this package knows, so in some situations they are handled more efficiently than other encodings. "BE" and "LE" mean big-endian and little-endian. UCS_4, UCS_2 and UTF_16 use the machine's native byte order.
```ada
type Unified_Encoding_Record (Known : Boolean; OS : Known_OS) is record
   case Known is
      when True =>
         Which : Encoding_ID;
      when False =>
         case OS is
            when Linux | MacOS =>
               Name : Unbounded_String;
               --  Names of encodings are confined to ASCII.
            when Windows =>
               Number : Unsigned_16;
         end case;
   end case;
end record;
```
Names of encodings are confined to ASCII.
```ada
overriding function "=" (Left, Right : Character_Encoding) return Boolean;
```
Redefining "=" shouldn't be necessary, but it works around a bug in GCC-Gnat 3.4.0.
```ada
function ASCII_Compatible
  (Encoding : Character_Encoding)
   return Boolean;
```
Returns true if and only if Encoding is a Known_Character_Encoding and is compatible with ASCII: every ASCII character is encoded with the same number as in ASCII, and any valid ASCII text is also valid text in this encoding.
```ada
function Encoding_By_Name (Name : String) return Character_Encoding;
```
```ada
function Fixed_Width (Encoding : Character_Encoding) return Boolean;
```
Returns true if Encoding is a Known_Character_Encoding and uses the same number of bytes for all characters, false otherwise.
```ada
function Name
  (Encoding : Character_Encoding)
   return Unbounded_String;
```
```ada
function Subset_Of_UCS_2
  (Encoding : Character_Encoding)
   return Boolean;
```
Returns true if and only if Encoding is a Known_Character_Encoding and every character that can be represented in this encoding can also be represented in UCS-2 (though mostly with different numbers). Note that UCS-2 itself meets this criterion.
```ada
function Width
  (Encoding : Known_Character_Encoding)
   return Character_Width;
```
Returns the number of bytes used for one character in this encoding. Propagates Constraint_Error if the character width is variable.
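The Constraint_Error follows naturally if the stored width, which may be 0 for variable-width encodings, is returned as a Character_Width, whose range starts at 1. The sketch below illustrates that range check in isolation. Stored_Width and To_Character_Width are hypothetical names, and the actual body of Width is not shown in this listing.

```ada
with Ada.Text_IO;

procedure Width_Demo is
   subtype Character_Width is Positive range 1 .. 4;

   --  Mirrors the Width component of Encoding_Data: 0 for variable.
   subtype Stored_Width is Natural range 0 .. 4;

   --  Hypothetical stand-in for Width: the return statement carries
   --  a range check, so a stored 0 raises Constraint_Error.
   function To_Character_Width
     (Stored : Stored_Width) return Character_Width is
   begin
      return Stored;
   end To_Character_Width;
begin
   --  A fixed-width entry such as UCS-4BE (4 bytes per character).
   Ada.Text_IO.Put_Line
     (Character_Width'Image (To_Character_Width (4)));

   --  A variable-width entry such as UTF-8 (stored width 0).
   begin
      Ada.Text_IO.Put_Line
        (Character_Width'Image (To_Character_Width (0)));
   exception
      when Constraint_Error =>
         Ada.Text_IO.Put_Line ("variable width");
   end;
end Width_Demo;
```

Callers that cannot rule out a variable-width encoding can test Fixed_Width first instead of handling the exception.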