Cyrillic scripts

Hi,

How can I add Cyrillic text to one of my C files (I'm working with MDK version 3.4)?

const char text_cyrillic[] = {"Cyrillic script"};

best regards
Arne

0 David Helstedt over 16 years ago in reply to Mike Kleshov

It seems that uVision is not able to understand UTF-8, which is very disappointing.

Babelstone is a free editor that can store the text in UTF-8 (with or without a BOM), but uVision shows only a few strange characters.

This editor can also store the Cyrillic text in the escape notation you mentioned (octal \012 or hexadecimal \x12). The only problem is that this enlarges the code size of the web page.
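The escape notation mentioned above can be sketched as follows. This is only a minimal illustration, assuming the text is meant to reach the browser as UTF-8; the sample word (Cyrillic "Privet") and the name `text_cyrillic_escaped` are illustrative choices, not taken from the thread. Each Cyrillic letter becomes two bytes written as hexadecimal escapes, so the source file itself stays 7-bit ASCII.

```c
#include <stddef.h>

/* Six Cyrillic letters (U+041F U+0440 U+0438 U+0432 U+0435 U+0442)
   written as their UTF-8 bytes. Adjacent string literals are joined
   by the compiler, so one letter per line is purely for readability. */
const char text_cyrillic_escaped[] =
    "\xD0\x9F"   /* U+041F */
    "\xD1\x80"   /* U+0440 */
    "\xD0\xB8"   /* U+0438 */
    "\xD0\xB2"   /* U+0432 */
    "\xD0\xB5"   /* U+0435 */
    "\xD1\x82";  /* U+0442 */

/* Number of payload bytes, excluding the terminating zero. */
const size_t text_cyrillic_escaped_len = sizeof text_cyrillic_escaped - 1;
```

In a C string literal the escapes only cost space in the source file. The compiled array holds the same 12 bytes (plus the terminating zero) as it would if the letters had been typed directly in a UTF-8 source file.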

0 David Helstedt over 16 years ago in reply to David Helstedt

Maybe the problem is that uVision only supports the 8-bit range of Unicode, while all Cyrillic letters start at Unicode (hex: Ð), so you need more than 8 bits to represent the correct letters.

0 David Helstedt over 16 years ago in reply to David Helstedt

hex 0x0410;
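
To illustrate why a code point such as 0x0410 does not fit in one byte, here is a rough sketch of how UTF-8 spreads a code point over several bytes. The function name `utf8_encode` is hypothetical and not part of any Keil library; it assumes a compiler that provides `<stdint.h>`.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8.
   Writes up to 4 bytes to out and returns the number of bytes written,
   or 0 if cp is a surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                       /* 7-bit ASCII: one byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* two bytes, covers Cyrillic */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* three bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* four bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Calling `utf8_encode(0x0410, buf)` yields the two bytes 0xD0 0x90. The byte 0xD0 is the character Ð in ISO 8859-1, which is presumably why that character appears when such text passes through software that does not decode UTF-8.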

0 ImPer Westermark over 16 years ago in reply to David Helstedt
Hasn't that been the conclusion from post one?

But the nice thing with UTF-8 is that a program that doesn't support UTF-8 can normally work as a safe container for UTF-8 text. You can't edit characters outside 7-bit ASCII, but you can load and save them without destroying them. Opening a UTF-8 file without a BOM in a plain 8-bit text editor will show all 7-bit characters as expected. For any extended character, you will get two, three or four "noise" characters displayed.

An example is what happens if I write national characters in this post: the Keil forum claims UTF-8 support but doesn't have it.

Opening a UTF-8 file in uVision could look like:

const char str[] = "This is a string with extended charactesr: Ã¥Ã¤Ã¶Ã…Ã„Ã–ÑŽÑÐ¸Ð¡ÐŽÎµÎ¸ÏŽÏŠÏŠÃÃ˜ÃžÃŸÅ'Å Å'ÈŸÉ³Î£Î˜Ê„Ï Óá¿·âˆâˆâ–¤â—â¡½ï¿½ï¿½"

As long as the compiler is 8-bit safe, this really doesn't matter. You will get perfect UTF-8 text stored in the character constant. The only thing that will not work is that strlen() will return the number of non-zero bytes instead of the number of characters. That is not important for sending out text to a web browser.
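
If the character count is ever needed, it can be recovered from the bytes. The following is a minimal sketch, assuming the string holds valid UTF-8; the name `utf8_char_count` is made up for this example. It counts code points by skipping continuation bytes, which always have the bit pattern 10xxxxxx.

```c
#include <stddef.h>

/* Count the code points in a zero-terminated UTF-8 string.
   Every code point has exactly one byte that is NOT a continuation
   byte (10xxxxxx), so counting those bytes counts the code points.
   The input is assumed to be valid UTF-8; no validation is done. */
size_t utf8_char_count(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For a string of six Cyrillic letters, strlen() reports 12 while this function reports 6. Note that it counts code points, not what a reader would perceive as characters when combining marks are involved.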

0 Mike Kleshov over 16 years ago in reply to David Helstedt

>maybe the problem is, that uvision only supports the 8bit unicode

No, I think the problem is that the uVision creators never thought that someone might want to use their text editor with different encodings.
I keep wondering: why does everyone keep reinventing their own text editor and/or IDE? There must be a better way, and I don't mean Eclipse...

0 David Helstedt over 16 years ago in reply to ImPer Westermark

>Opening an UTF8 file in uvision could look like:
Correct, your example is very similar.

>As long as the compiler is 8-bit safe, this really doesn't matter. You will get a perfect UTF8 text stored in the character constant. The only thing that will not work is that strlen() will return number of non-zero bytes, instead of number of characters. Not important for sending out text to a web browser.

I know what you mean: uVision doesn't erase any information, but it is not able to interpret the text correctly (as Unicode).

But the web browser also shows these characters (the ones from your example, not the correct Unicode). I've tested the page in IE7 and Firefox. Other pages on the web that use Cyrillic script are shown correctly. And of course I'm sending the Content-Type header with charset utf-8 (HTTP header).

So the only thing I don't understand is why the web browser won't show the correct Unicode.

0 ImPer Westermark over 16 years ago in reply to David Helstedt

Have you made really sure that the web pages that get sent out specify the UTF-8 encoding? If they don't, the web browser will not know that there are any UTF-8 multi-byte characters to display. It may default to the ISO-8859-1 character set instead.

It really is important to note that a byte is just a binary storage cell capable of storing a value between 0 and 255. To display one or more bytes as specific characters, you must make sure that the renderer is informed about what character set to use, and also supports it.

The support for different character sets in uVision has no bearing on which character sets you can select for use in the web browser.

In short: You must make sure that
1) The web page data contains UTF-8 data.
2) The web page declares that it is using UTF-8 data.
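
The two points above could be combined roughly like this. It is only a sketch: how the embedded web server in question actually emits its headers is not known from this thread, so `build_utf8_response` is a hypothetical helper, and it assumes a C99-style snprintf() is available.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build a minimal HTTP response in buf whose
   header declares the body as UTF-8 (point 2) and whose body is
   passed through byte for byte (point 1).
   Returns the total length, or -1 if buf is too small. */
int build_utf8_response(char *buf, size_t size, const char *body)
{
    int n = snprintf(buf, size,
                     "HTTP/1.1 200 OK\r\n"
                     "Content-Type: text/html; charset=utf-8\r\n"
                     "Content-Length: %lu\r\n"
                     "\r\n"
                     "%s",
                     (unsigned long)strlen(body), body);

    if (n < 0 || (size_t)n >= size)
        return -1;   /* encoding error or output truncated */
    return n;
}
```

Content-Length is measured in bytes, so the byte count returned by strlen() is exactly the right value here, even though it is larger than the number of characters.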

0 John Linq over 16 years ago in reply to Mike Kleshov

I don't think I really understand this thread/subject. I just want to provide some information (it might be totally useless).

====================================
RealView Compiler Reference Guide
Character sets and identifiers

www.keil.com/.../armccref_cihdigag.htm

# Source files are compiled according to the currently selected locale. You might have to select a different locale, with the --locale command-line option, if the source file contains non-ASCII characters. See Invoking the ARM compiler in the Compiler User Guide for more information.

# The ARM compiler supports multibyte character sets, such as Unicode.

# Other properties of the source character set are host-specific.
====================================
--multibyte_chars, --no_multibyte_chars

www.keil.com/.../armccref_CHDCECBH.htm

This option enables or disables processing for multibyte character sequences in comments, string literals, and character constants.
====================================

The Cyrillic characters in Unicode (range U+0400 to U+045F) are basically the characters from ISO 8859-5 moved upward by 864 positions.
( en.wikipedia.org/.../Cyrillic_characters_in_Unicode )
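
As a rough illustration of that offset, a conversion from an ISO 8859-5 byte to a Unicode code point might look like the sketch below. The function name `iso8859_5_to_unicode` is hypothetical; the three special cases are the positions in the upper half of ISO 8859-5 that hold non-Cyrillic symbols and therefore do not follow the offset.

```c
#include <stdint.h>

/* Map one ISO 8859-5 byte to its Unicode code point.
   Bytes below 0xA1 (ASCII, control codes, no-break space) have the
   same value in Unicode. The Cyrillic letters are shifted by 864. */
uint32_t iso8859_5_to_unicode(unsigned char b)
{
    if (b < 0xA1)
        return b;

    switch (b) {
    case 0xAD: return 0x00AD;            /* soft hyphen */
    case 0xF0: return 0x2116;            /* numero sign */
    case 0xFD: return 0x00A7;            /* section sign */
    default:   return (uint32_t)b + 864; /* 0xB0 becomes 0x0410, etc. */
    }
}
```

For example, byte 0xB0 (capital A in ISO 8859-5) maps to 0xB0 + 864 = 0x0410, the code point mentioned earlier in the thread.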

I guess that the host supports ISO 8859-5 but does not support Cyrillic in Unicode; so even though Keil supports Unicode, it is not able to handle Cyrillic.