Printf problem with special characters like: æ Æ Ø ø å Â

Hello everyone,

I have a problem storing special characters with the uVision editor.
When I save the following line, which is in a file example.c:

 printf("æ Æ Ø ø å Â");

it is converted to incorrect ASCII values.

I can solve the problem by replacing the line and using the correct ASCII values as hex codes:

 printf("\x91 \x92 \x9D \x9B \x86 \xB6");

That works fine, of course, but it is not very readable.

Why does uVision convert these special characters, and why do I not have this problem with other C editors such as UltraEdit?

(I use uVision2 v2.4)
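
One way to keep the hex-code workaround readable is to give each code a name and let the compiler join the pieces. This is only a sketch: the macro names are hypothetical, and the byte values are the ones quoted in the post above, which match DOS code page 850. It assumes the output device really does interpret bytes as CP850.

```c
#include <stdio.h>

/* Hypothetical names for the byte values used in the post above.
   Assumption: the output device interprets bytes as code page 850. */
#define CP850_AE_SMALL        "\x91"  /* æ */
#define CP850_AE_CAPITAL      "\x92"  /* Æ */
#define CP850_O_SLASH_CAPITAL "\x9D"  /* Ø */
#define CP850_O_SLASH_SMALL   "\x9B"  /* ø */
#define CP850_A_RING_SMALL    "\x86"  /* å */
#define CP850_A_CIRC_CAPITAL  "\xB6"  /* Â */

/* Adjacent string literals are concatenated by the compiler. Each hex
   escape ends at its own closing quote, so it cannot absorb a following
   hex digit (as would happen in "\x91bc"). */
static const char special_chars[] =
    CP850_AE_SMALL " " CP850_AE_CAPITAL " "
    CP850_O_SLASH_CAPITAL " " CP850_O_SLASH_SMALL " "
    CP850_A_RING_SMALL " " CP850_A_CIRC_CAPITAL;

/* Sends the six characters, separated by spaces, to stdout. */
void print_special_chars(void)
{
    printf("%s", special_chars);
}
```

Because the source file then contains only 7-bit characters outside its comments, the bytes that reach the output no longer depend on which encoding the editor uses when saving.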

0 Andy Neil over 20 years ago

"I have a problem storing special characters with uVision editor."

No. You have a misunderstanding of exactly what ASCII defines and does not define.
This is nothing to do with Keil, uVision, 'C', printf, ...

"it is converted to incorrect ASCII values."

They are not valid ASCII codes at all - so what you actually get is undefined!

ASCII defines only codes 0-127 (7 bits); there are many extensions to ASCII that use codes 128-255, but you need to define precisely which extension you're using if you want to be sure of what you'll get.

printf does not directly support this.
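
To illustrate the 7-bit limit, here is a minimal sketch of a check that reports where a string first leaves the ASCII range. The function name is hypothetical and not part of any Keil library.

```c
#include <stddef.h>

/* Returns the index of the first byte outside the 7-bit ASCII range
   (0-127), or -1 if every byte of the string is plain ASCII. */
long first_non_ascii(const char *s)
{
    size_t i;

    for (i = 0; s[i] != '\0'; i++) {
        /* Cast first: plain char may be signed, in which case bytes
           above 0x7F would otherwise compare as negative values. */
        if ((unsigned char)s[i] > 0x7F) {
            return (long)i;
        }
    }
    return -1;
}
```

Any string for which this returns something other than -1 contains bytes whose meaning depends on an encoding that has to be agreed on separately.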

0 Mik Kleshov over 20 years ago in reply to Andy Neil

I think it has something to do with the so-called translation environment and execution environment, as defined in the C standards. The ISO C 1999 standard only requires the toolset to implement 92 printable characters (with no mention of the ASCII encoding). Anything beyond that is an extension of the standard, and the developers of the toolset are free to implement those extensions as they please.
Hopefully this answers the "Why ... ?" question.

Regards,
- mike

0 erik malund over 20 years ago in reply to Mik Kleshov

The codes for (and even the presence of) "æ Æ Ø ø å Â" vary from keyboard to keyboard. In countries where they are commonly used, they are implemented as substitutions (if my memory serves me correctly, in Denmark '$' prints as "Ø"). Anything above 7F, and certainly æ Æ Ø ø å Â, is NOT ASCII (remember, the 'A' in ASCII stands for 'American').

Erik

0 Andy Neil over 20 years ago in reply to Andy Neil

Further explanation on why this happens (due to the many different symbol sets available):
http://www.8052.com/forum/read.phtml?id=61805

0 Andy Neil over 20 years ago in reply to Andy Neil

If you have MS Internet Explorer 6, right-click on this page, then click 'Encoding' in the pop-up menu, then try choosing some different encodings.
You will see that the characters displayed change with different encodings; that's nothing to do with Keil, nothing to do with their compiler, nothing to do with the 'C' programming language, and nothing to do with printf.

As I said originally, codes >127 are not defined by ASCII - so if you want a specific interpretation, you are going to have to specify that interpretation as well as sending the codes.
How you do this will depend on your output device; e.g., you may need to send an escape sequence to select the required symbol set.

0 Michel Ketelaars over 20 years ago in reply to Andy Neil

First of all, I thank everyone for their replies concerning my problem.

Andy, you are right that it has nothing to do with printf or the C programming language. And the point about codes >127 is clear to me.

I did more research on it and found out what happens when I type a special character in uVision, for example by pressing:

[Alt-145] (0x91), which gives me the character æ.

When I save my file example.c and view it in a hex editor, I see the character µ, which in my case is [Alt-230] (0xE6), instead of æ.

So, in other words, uVision stores the 'keyboard-pressed' [Alt-145] as the 'file-viewing' [Alt-230].

Maybe this was already clear to you, but if not, this is what is happening.

0 HansBernhard Broeker over 20 years ago in reply to Michel Ketelaars

"When I save my file example.c and view it in a hex editor, I see the character µ, which in my case is [Alt-230] (0xE6), instead of æ."

If it's a hex editor you're looking in, then you shouldn't be seeing 'µ' at all; you should be seeing 0xe6 itself. Which character that displays as depends on the non-ASCII encoding your hex editor is configured to use.

"So, in other words, uVision stores the 'keyboard-pressed' [Alt-145] as the 'file-viewing' [Alt-230]."

No, it doesn't. You still don't get it, it seems. You still believe that the key on your keyboard labelled 'æ' must always generate the character code 145 (0x91). But that belief is wrong. Pressing that key will generate whatever code æ has in the encoding Windows is using at the moment you type it. In the case at hand, that happens to be 230 (0xe6).

I.e. the keyboard-pressed code was 230 (0xe6) already.

This has nothing in particular to do with Keil tools. You'll almost certainly have the same behaviour if you edit your source file in any other Windows editor.

To make matters worse, Windows can use independent, and *different*, non-ASCII character encodings for DOS and native Windows programs, and the DOS encodings in particular (called "code pages") are compatible with nothing but DOS itself.

Try looking at the same file with non-ASCII characters, both in Windows' "notepad.exe" and DOS "edit.com", to see the differences.
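
To see which codes are actually stored, independent of how any viewer chooses to draw them, the bytes can be printed as numbers. The helper below is a hypothetical sketch, not something from the Keil libraries.

```c
#include <stddef.h>
#include <stdio.h>

/* Writes the bytes of s into out as space-separated two-digit hex
   values, for example "E6 20 41". Returns the number of bytes that
   were formatted; stops early if out is too small for the next one. */
size_t format_hex(const char *s, char *out, size_t out_size)
{
    size_t n = 0;     /* bytes of s formatted so far */
    size_t used = 0;  /* characters written to out, excluding the NUL */

    if (out_size == 0) {
        return 0;
    }
    out[0] = '\0';

    while (s[n] != '\0') {
        /* "XX" (or " XX" after the first byte) plus the terminating NUL */
        size_t need = (n == 0) ? 3 : 4;
        if (out_size - used < need) {
            break;
        }
        used += (size_t)sprintf(out + used, (n == 0) ? "%02X" : " %02X",
                                (unsigned)(unsigned char)s[n]);
        n++;
    }
    return n;
}
```

The numeric output is the same whatever code page the terminal or editor is set to, which is what makes it useful for comparing what two programs really wrote to a file.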

0 John Donaldson over 20 years ago in reply to HansBernhard Broeker

"So, with other words uVision stores 'keyboard-pressed' [Alt-145] to 'file viewing' [Alt-230].

No, it doesn't. You still don't get it, it seems. You still believe that the key on your keyboard labelled 'æ' must always generate the character code 145 (0x91). But that belief is wrong."

No, that is *not* what he is saying. He is saying, quite correctly, that if he holds down the ALT key in uVision and then presses the keys '1', '4' and '5' on his numeric keypad, he sees the character 'æ'. If he then saves the file and views it in a hex editor, he sees that the value 0xE6 (230) has been stored rather than the value 0x91 (145).

Stefan

0 Michel Ketelaars over 20 years ago in reply to HansBernhard Broeker

"You still believe that the key on your keyboard labelled 'æ' must always generate the character code 145 (0x91)."

I do not have a key labelled 'æ' on my keyboard. I generate this character by pressing [Alt-145]. So yes, I expected it to be stored as code 145, but obviously it is not.

I tried your experiment and yes, you are right.
When I press [Alt-145] in "notepad.exe", save the file and look at it with "edit.com", it gives code 230 (0xE6).

So your conclusion "It has nothing to do with Keil" is correct.

This leads me to the question: "Is there a way to make the DOS and Windows code tables the same?"
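
The thread does not answer this question. One workaround that avoids touching the system settings is to translate the bytes in the program just before output. The sketch below makes two assumptions: that the editor saves the file as Windows-1252, and that the output device expects DOS code page 850 (which is what the hex values in the first post correspond to). The function names are hypothetical and the table covers only the six characters discussed here.

```c
#include <stddef.h>

/* One entry of a deliberately tiny translation table. */
struct cp_pair {
    unsigned char win1252;  /* code in Windows-1252 (assumed source) */
    unsigned char cp850;    /* code in code page 850 (assumed target) */
};

static const struct cp_pair cp_table[] = {
    { 0xE6, 0x91 },  /* æ */
    { 0xC6, 0x92 },  /* Æ */
    { 0xD8, 0x9D },  /* Ø */
    { 0xF8, 0x9B },  /* ø */
    { 0xE5, 0x86 },  /* å */
    { 0xC2, 0xB6 }   /* Â */
};

/* Translates one byte. ASCII passes through unchanged; bytes above
   0x7F that are not in the table become '?' so gaps are easy to spot. */
unsigned char win1252_to_cp850(unsigned char c)
{
    size_t i;

    if (c < 0x80) {
        return c;
    }
    for (i = 0; i < sizeof cp_table / sizeof cp_table[0]; i++) {
        if (cp_table[i].win1252 == c) {
            return cp_table[i].cp850;
        }
    }
    return '?';
}

/* Translates a writable, NUL-terminated buffer in place. */
void translate_in_place(char *s)
{
    for (; *s != '\0'; s++) {
        *s = (char)win1252_to_cp850((unsigned char)*s);
    }
}
```

With something like this in place, the source can keep the readable characters, and the conversion to the device's codes happens in a single spot that can be swapped if the output device uses a different symbol set.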