Wednesday, March 4, 2009

Unicode characterset in Oracle database.

Before starting this post let's have an idea about unicode. Unicode is a Universal encoding scheme which is designed to include far more characters than the normal character set, in fact, Unicode wants to be able to list ALL characters. So, with unicode support in oracle data from any languages can be stored and retrieved from oracle.

Oracle supports unicode within many of the character sets starting from Oracle 7.

Below is the list of character sets that is used to support unicode in oracle.

1)AL24UTFFSS: This character set was the first Unicode character set supported by Oracle. The AL24UTFFSS encoding scheme was based on the Unicode 1.1 standard, which is now obsolete. This unicode character set was used between oracle version 7.2 to 8.1.


2)UTF8:
UTF8 was the UTF-8 encoded character set in Oracle8 and 8i. It followed the
Unicode 2.1 standard between Oracle 8.0 and 8.1.6, and was upgraded to Unicode
version 3.0 for oracle versions 8.1.7, 9i, 10g and 11g. If supplementary characters are inserted into in a UTF8 database encoded with Unicode version 3.0, then the actual data will be treated as 2 separate undefined characters, occupying 6 bytes in storage. So for fully support of supplementary characters use AL32UTF8 character set instead of UTF8.

3)UTFE: UTFE has the same properties as UTF8 on ASCII based platforms. As of UTF8 it is used in different oracle versions.


4)AL32UTF8:
This is the UTF-8 encoded character set introduced in Oracle9i.
In Oracle 9.2 AL32UTF8 implemented unicode 3.1,
in 10.1 it implemented the Unicode 3.2 standard,
in Oracle 10.2 it supports the Unicode 4.01 standard and
in Oracle 11.1 it supports the Unicode 5.0.


AL32UTF8 was introduced to provide support for the newly defined supplementary characters. All supplementary characters are stored as 4 bytes in AL32UTF8. As while designed UTF-8 there was no concept of supplementary characters therefore UTF8 has a maximum of 3 bytes per character.

5)AL16UTF16: This is the first UTF-16 encoded character set in Oracle. It was introduced in Oracle9i as the default national character set (NLS_NCHAR_CHARACTERSET). It also provides support for the newly defined supplementary characters. All supplementary characters are stored as 4 bytes.
As with AL32UTF8, the plan is to keep enhancing AL16UTF16 as
necessary to support future version of the Unicode standard.

AL16UTF16 cannot be used as a database character set (NLS_CHARACTERSET), it is only used as the national character set (NLS_NCHAR_CHARACTERSET).

Like, AL32UTF8
In Oracle 9.0 AL16UTF16 implemented unicode 3.0,
in Oracle 9.2 it implemented unicode 3.1,
in 10.1 it implemented the Unicode 3.2 standard,
in Oracle 10.2 it supports the Unicode 4.01 standard and
in Oracle 11.1 it supports the Unicode 5.0.


Related Documents
What is NLS_LANG environmental variable?What is database character set and how to check it
Different ways to set up NLS parameters
What is national character set / NLS_NCHAR_CHARACTERSET?
Which datatypes use the National Character Set?
What is character set and character set encoding

No comments:

Post a Comment