Re: Re: I do not understand UTF8 in Windows


Re: Re: I do not understand UTF8 in Windows

Luis Fernando Del Aguila Mejía
Ok,
The steps are:
1) Change the console font to Lucida Console: http://www.conoce3000.com/fig01.jpg
2) Compile the program with fpc 2.4.
3) Switch to UTF-8 with the chcp 65001 command.
4) Run the program.
Steps 01 to 04: http://www.conoce3000.com/fig02.jpg
The source code: http://www.conoce3000.com/Prueba005.pp

The error occurs only when the Lucida Console font is used.
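
As an aside, step 3 can also be attempted from inside the program instead of with chcp. The following is only a minimal sketch, assuming the SetConsoleOutputCP and GetConsoleOutputCP bindings of the Windows unit. The program name and the CodePageUtf8 constant are made up for illustration, and whether this behaves exactly like chcp 65001 for the problem described here has not been verified.

```pascal
program SetUtf8Console;
{$mode objfpc}{$H+}

uses
  Windows;

const
  // Illustrative name; 65001 is the same value that "chcp 65001" selects.
  CodePageUtf8 = 65001;

begin
  // SetConsoleOutputCP returns False on failure (for example, when no
  // console is attached to the process).
  if not SetConsoleOutputCP(CodePageUtf8) then
  begin
    WriteLn('Could not switch the console output code page');
    Halt(1);
  end;
  // Report the code page that is now active for console output.
  WriteLn('Console output code page: ', GetConsoleOutputCP);
end.
```

Note that chcp changes the console's input code page as well, while this sketch only touches the output side.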
 
 
 

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal

Re[2]: Re: I do not understand UTF8 in Windows

José Mejuto
Hello FPC-Pascal,

Friday, February 19, 2010, 3:22:15 AM, you wrote:

LFDAM> Ok,
LFDAM> The steps are:
LFDAM> 1) Change the console font to Lucida Console:
LFDAM> http://www.conoce3000.com/fig01.jpg
LFDAM> 2) Compile the program with fpc 2.4.
LFDAM> 3) Switch to UTF-8 with the chcp 65001 command.
LFDAM> 4) Run the program.
LFDAM> Steps 01 to 04: http://www.conoce3000.com/fig02.jpg
LFDAM> The source code: http://www.conoce3000.com/Prueba005.pp
LFDAM> The error occurs only when the Lucida Console font is used.

The problem is that by default in Windows stdout is opened in 7-bit
(text) mode. This means that when writeln emits UTF-8 data containing
any char >127, it runs into some kind of error reported by the RTL or
the Windows kernel.

I do not know how to tell fpc to "reopen" the stdout handle in binary
mode, but you can "hack it" (at least for tests) like this:

-------------------------
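uses classes,windows;

var
 s: string;
 OutputStream: TStream;
Begin
 Writeln('code page UTF8 - 65001 en Windows');
 // Wrap the raw stdout handle in a stream so the bytes bypass writeln
 OutputStream := THandleStream.Create(GetStdHandle(STD_OUTPUT_HANDLE));
 s:='camión'; // assumes this source file is saved as UTF-8
 // Write the raw bytes of the string directly to the handle
 OutputStream.write(s[1],Length(s));
 OutputStream.free;
End.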
-------------------------

--
Best regards,
 JoshyFun


Re: Re[2]: Re: I do not understand UTF8 in Windows

Tomas Hajny
On 19 Feb 10, at 17:27, JoshyFun wrote:

> Friday, February 19, 2010, 3:22:15 AM, you wrote:
>
> LFDAM> Ok,
> LFDAM> The steps are:
> LFDAM> 1) Change the console font to Lucida Console:
> LFDAM> http://www.conoce3000.com/fig01.jpg
> LFDAM> 2) Compile the program with fpc 2.4.
> LFDAM> 3) Switch to UTF-8 with the chcp 65001 command.
> LFDAM> 4) Run the program.
> LFDAM> Steps 01 to 04: http://www.conoce3000.com/fig02.jpg
> LFDAM> The source code: http://www.conoce3000.com/Prueba005.pp
> LFDAM> The error occurs only when the Lucida Console font is used.
>
> The problem is that by default in Windows stdout is opened in 7-bit
> (text) mode. This means that when writeln emits UTF-8 data containing
> any char >127, it runs into some kind of error reported by the RTL or
> the Windows kernel.

No, this can't work that way, otherwise output of any accented
character in one of the Windows codepages would result in the same
error.

Tomas


Re[4]: Re: I do not understand UTF8 in Windows

José Mejuto
Hello Tomas,

Friday, February 19, 2010, 11:55:39 PM, you wrote:

TH> No, this can't work that way, otherwise output of any accented
TH> character in one of the Windows codepages would result in the same
TH> error.

Well, I do not know the writeln internals, but if writeln writes 7
bytes and Windows reports 6 characters written, the RTL could
interpret that as a write error.

--
Best regards,
 JoshyFun


Re[4]: Re: I do not understand UTF8 in Windows

José Mejuto
Hello Tomas,

Friday, February 19, 2010, 11:55:39 PM, you wrote:

TH> No, this can't work that way, otherwise output of any accented
TH> character in one of the Windows codepages would result in the same
TH> error.

Tested the "wrong" return of stdout:

code page UTF8 - 65001 en Windows
Length of string: 7
camión -> Returned written: 6

Source code:
-------------------------------------
uses classes,windows;
var
 s: ansistring;
 OutputStream: TStream;
Begin
 Writeln('code page UTF8 - 65001 en Windows');
 OutputStream := THandleStream.Create(GetStdHandle(STD_OUTPUT_HANDLE));
 s:='cami'+#$C3+#$B3+'n'; //camión
 writeln('Length of string: ',Length(s));
 writeln(' -> Returned written: ',OutputStream.write(s[1],Length(s)));
 OutputStream.free;
End.


--
Best regards,
 JoshyFun


Re: Re[4]: Re: I do not understand UTF8 in Windows

Tomas Hajny
On Sat, February 20, 2010 01:15, JoshyFun wrote:

> Hello Tomas,
>
> Friday, February 19, 2010, 11:55:39 PM, you wrote:
>
> TH> No, this can't work that way, otherwise output of any accented
> TH> character in one of the Windows codepages would result in the same
> TH> error.
>
> Tested the "wrong" return of stdout:
>
> code page UTF8 - 65001 en Windows
> Length of string: 7
> camión -> Returned written: 6
>
> Source code:
> -------------------------------------
> uses classes,windows;
> var
>  s: ansistring;
>  OutputStream: TStream;
> Begin
>  Writeln('code page UTF8 - 65001 en Windows');
>  OutputStream := THandleStream.Create(GetStdHandle(STD_OUTPUT_HANDLE));
>  s:='cami'+#$C3+#$B3+'n'; //camión
>  writeln('Length of string: ',Length(s));
>  writeln(' -> Returned written: ',OutputStream.write(s[1],Length(s)));
>  OutputStream.free;
> End.

OK, this seems to be the problem. The underlying Win32 API (WriteFile) is
requested to write 7 bytes to a file. However those 7 bytes correspond to
only 6 characters in UTF-8, and the Win32 API (apparently) returns the
number of written _characters_ rather than the number of written _bytes_.
The Windows implementation of do_write (which is an internal wrapper
around the platform-specific API for writing to a file) currently assumes
that the returned number is also a number of bytes (like the provided
parameter), which is OK for simple single-byte codepages, but not OK for
UTF-8, and it returns this number without any changes. The System routine
for file I/O compares the number of bytes requested to be written to the
number returned as actually written, and since they do not match, this is
interpreted as an I/O error.
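
The byte/character mismatch can be illustrated without touching the console at all. This is only a minimal sketch: the function name Utf8CodePointCount is made up for illustration, and it simply counts every byte that is not a UTF-8 continuation byte (bit pattern 10xxxxxx), without validating the input.

```pascal
program Utf8Count;
{$mode objfpc}{$H+}

// Hypothetical helper: counts UTF-8 code points by skipping
// continuation bytes (those matching the bit pattern 10xxxxxx).
function Utf8CodePointCount(const s: AnsiString): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(s) do
    if (Ord(s[i]) and $C0) <> $80 then
      Inc(Result);
end;

var
  s: AnsiString;
begin
  s := 'cami' + #$C3 + #$B3 + 'n';  // "camión" encoded as UTF-8
  WriteLn('Bytes:       ', Length(s));
  WriteLn('Code points: ', Utf8CodePointCount(s));
end.
```

For this string the two counts are 7 and 6, the same pair of numbers as in the test output quoted above.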

Please post a bug report about this. I guess that fixing it may require
a little more thinking. One simple way to fix it would be to change the
Windows implementation of do_write so that it only checks for an error
value returned by WriteFile and, if no error is indicated, returns the
original length of the buffer regardless of the value returned by
WriteFile. However, the information about the actually written
_characters_ may be useful in certain cases, so I'm not sure whether it
would be better to preserve it somehow and possibly extend the
implementations for other platforms to provide this value as well.
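
The "simple way" described above might look roughly like the sketch below. It is not the actual RTL do_write: WriteAllBytes is a hypothetical helper written only for illustration, and it assumes that the boolean result of WriteFile is trustworthy even when the reported count is not.

```pascal
program WriteFileSketch;
{$mode objfpc}{$H+}

uses
  Windows;

// Hypothetical helper, NOT the real do_write: it trusts only the
// success flag returned by WriteFile and ignores the reported count.
// Returns the requested byte count on success and -1 on failure.
function WriteAllBytes(h: THandle; const s: AnsiString): LongInt;
var
  written: DWORD;
begin
  Result := -1;
  if Length(s) = 0 then
    Exit(0);
  written := 0;
  if WriteFile(h, s[1], Length(s), written, nil) then
    Result := Length(s);  // report the original length, not "written"
end;

begin
  // "camión" as UTF-8 bytes followed by CR/LF
  if WriteAllBytes(GetStdHandle(STD_OUTPUT_HANDLE),
                   'cami' + #$C3 + #$B3 + 'n' + #13#10) < 0 then
    Halt(1);
end.
```

The trade-off is the one already mentioned: a genuine partial write would go unnoticed, and the count reported by WriteFile is thrown away.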

Tomas



Re: Re[4]: Re: I do not understand UTF8 in Windows

Michael Van Canneyt


On Sat, 20 Feb 2010, Tomas Hajny wrote:

> On Sat, February 20, 2010 01:15, JoshyFun wrote:
>> Hello Tomas,
>>
>> Friday, February 19, 2010, 11:55:39 PM, you wrote:
>>
>> TH> No, this can't work that way, otherwise output of any accented
>> TH> character in one of the Windows codepages would result in the same
>> TH> error.
>>
>> Tested the "wrong" return of stdout:
>>
>> code page UTF8 - 65001 en Windows
>> Length of string: 7
>> camión -> Returned written: 6
>>
>> Source code:
>> -------------------------------------
>> uses classes,windows;
>> var
>>  s: ansistring;
>>  OutputStream: TStream;
>> Begin
>>  Writeln('code page UTF8 - 65001 en Windows');
>>  OutputStream := THandleStream.Create(GetStdHandle(STD_OUTPUT_HANDLE));
>>  s:='cami'+#$C3+#$B3+'n'; //camión
>>  writeln('Length of string: ',Length(s));
>>  writeln(' -> Returned written: ',OutputStream.write(s[1],Length(s)));
>>  OutputStream.free;
>> End.
>
> OK, this seems to be the problem. The underlying Win32 API (WriteFile) is
> requested to write 7 bytes to a file. However those 7 bytes correspond to
> only 6 characters in UTF-8, and the Win32 API (apparently) returns the
> number of written _characters_ rather than the number of written _bytes_.

I fail to see how this can be an FPC problem.

See

http://msdn.microsoft.com/en-us/library/aa365747(VS.85).aspx
and
http://msdn.microsoft.com/en-us/library/aa363858(VS.85).aspx

for an explanation. It states clearly that the number of bytes is returned.
If it does return the number of characters, then that is a bug in the Microsoft call,
not in FPC.

Michael.

Re[6]: Re: I do not understand UTF8 in Windows

José Mejuto
Hello FPC-Pascal,

Saturday, February 20, 2010, 3:21:51 PM, you wrote:

MVC> I fail to see how this can be an FPC problem.
MVC> See
MVC> http://msdn.microsoft.com/en-us/library/aa365747(VS.85).aspx
MVC> and
MVC> http://msdn.microsoft.com/en-us/library/aa363858(VS.85).aspx
MVC> For an explanation. It states clearly that the number of bytes is returned.
MVC> If it does return the number of characters, then that is a bug in the Microsoft call,
MVC> not in FPC.

Yes, it states that lpNumberOfBytesWritten returns the number of
written _bytes_, which clearly fails in this case, but it also states
that if the return value is non-zero, no error happened :-? Rewriting
the test using the plain WinAPI outputs:

------------------------------------------

code page UTF8 - 65001 en Windows
Length of string: 7
camión
Retcode: TRUE Returned written bytes: 6

------------------------------------------
uses windows;
var
 s: ansistring;
 OutputH: THandle;
 retcode: LongBool;
 writ: LongWord;
Begin
 Writeln('code page UTF8 - 65001 en Windows');
 OutputH :=  GetStdHandle(STD_OUTPUT_HANDLE);
 s:='cami'+#$C3+#$B3+'n'; //camión as UTF-8 bytes
 writeln('Length of string: ',Length(s));
 // Call the Win32 API directly, bypassing the RTL file I/O layer
 retcode:=WriteFile(OutputH,s[1],Length(s),writ,nil);
 writeln('');
 // retcode is the success flag, writ is the count reported by WriteFile
 writeln('Retcode: ',retcode,' Returned written bytes: ',writ);
End.
------------------------------------------

--
Best regards,
 JoshyFun
