Convert codepages back to UTF8

AlexeyT
LazUtils.LConvEncoding can convert utf8 to codepage (not many codepages)
and vice versa.

FPC 3 can convert utf8 to codepage - via SetCodePage(s, codepage, true).
But how can FPC convert back - codepage to utf8? Does such a way exist?

--
Regards,
Alexey

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal

Re: Convert codepages back to UTF8

Free Pascal - General mailing list
Alexey Tor. <[hidden email]> wrote on Mon, 27 May 2019, 13:15:
> LazUtils.LConvEncoding can convert utf8 to codepage (not many codepages)
> and vice versa.
>
> FPC 3 can convert utf8 to codepage - via SetCodePage(s, codepage, true).
> But how can FPC convert back - codepage to utf8? Does such a way exist?

Use CP_UTF8 as code page for SetCodePage or assign the string to a UTF8String variable. 

Regards, 
Sven 
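A minimal sketch of Sven's two suggestions, assuming FPC 3.x with {$mode objfpc}{$H+}. The Windows-1251 code page, the Cyrillic sample and the names MakeSample, ToUtf8ViaSetCodePage and ToUtf8ViaAssignment are hypothetical choices for illustration. On Unix the cwstring unit is assumed to be required so the RTL has a widestring manager that can convert between code pages; the expected values in the comments assume such a manager is available.

```pascal
program CodePageToUtf8;
{$mode objfpc}{$H+}

{$ifdef unix}
uses
  cwstring; // assumption: widestring manager for code page conversions on Unix
{$endif}

type
  TCp1251String = type AnsiString(1251); // example source encoding

{ Hypothetical helper: "При" (U+041F U+0440 U+0438) stored as Windows-1251. }
function MakeSample: TCp1251String;
begin
  Result := UnicodeString(#$041F#$0440#$0438);
end;

{ Option 1: SetCodePage with CP_UTF8 and Convert = True. }
function ToUtf8ViaSetCodePage(const S: RawByteString): RawByteString;
begin
  Result := S; // RawByteString: no conversion, the dynamic code page is kept
  SetCodePage(Result, CP_UTF8, True);
end;

{ Option 2: assign to a UTF8String; the compiler inserts the conversion. }
function ToUtf8ViaAssignment(const S: RawByteString): UTF8String;
begin
  Result := S;
end;

var
  Src: TCp1251String;
  A: RawByteString;
  B: UTF8String;
begin
  Src := MakeSample;
  A := ToUtf8ViaSetCodePage(Src);
  B := ToUtf8ViaAssignment(Src);
  WriteLn(StringCodePage(Src), ' ', Length(Src)); // expected: 1251 3
  WriteLn(StringCodePage(A), ' ', Length(A));     // expected: 65001 6
  WriteLn(StringCodePage(B), ' ', Length(B));     // expected: 65001 6
end.
```

Both routes end up with the same six UTF-8 bytes; the assignment form is shorter, while SetCodePage is handy when the string is already held in a RawByteString.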

Re: Convert codepages back to UTF8

Martok
On 27.05.2019 at 14:30, Sven Barth via fpc-pascal wrote:

> Alexey Tor. <[hidden email]
> <mailto:[hidden email]>> wrote on Mon, 27 May 2019, 13:15:
>
>     LazUtils.LConvEncoding can convert utf8 to codepage (not many codepages)
>     and vice versa.
>
>     FPC 3 can convert utf8 to codepage - via SetCodePage(s, codepage, true).
>     But how can FPC convert back - codepage to utf8? Does such a way exist?
>
>
> Use CP_UTF8 as code page for SetCodePage or assign the string to a UTF8String
> variable.

Although be advised that if your SystemCodePage is not a Unicode codepage, there
will be data loss due to (sometimes unexpected) internal conversions, regardless
of the current dynamic string code page.


--
Regards,
Martok
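Martok's warning can be illustrated with a minimal sketch, assuming FPC 3.x. To keep the outcome independent of the machine, the non-Unicode SystemCodePage is simulated with a string type explicitly declared as Windows-1252; TCp1252String and RoundTripVia1252 are hypothetical names, and the thread does not say which internal conversion actually caused Martok's losses. Each assignment in RoundTripVia1252 is a conversion inserted by the compiler: ASCII survives the trip through Windows-1252, Cyrillic text does not.

```pascal
program LossyIntermediate;
{$mode objfpc}{$H+}

{$ifdef unix}
uses
  cwstring; // assumption: widestring manager for code page conversions on Unix
{$endif}

type
  { Stand-in for a non-Unicode SystemCodePage. }
  TCp1252String = type AnsiString(1252);

{ UTF-8 -> Windows-1252 -> UTF-8, using only implicit conversions. }
function RoundTripVia1252(const S: UTF8String): UTF8String;
var
  Tmp: TCp1252String;
begin
  Tmp := S;      // characters that 1252 cannot represent are replaced here
  Result := Tmp; // converting back cannot restore them
end;

var
  Cyrillic: UTF8String;
begin
  Cyrillic := UTF8Encode(UnicodeString(#$041F#$0440#$0438)); // "При"
  WriteLn('ASCII survives:    ', RoundTripVia1252('abc') = 'abc');       // expected: TRUE
  WriteLn('Cyrillic survives: ', RoundTripVia1252(Cyrillic) = Cyrillic); // expected: FALSE
end.
```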

Re: Convert codepages back to UTF8

Graeme Geldenhuys-6
On 27/05/2019 2:13 pm, Martok wrote:
> there will be data loss due to (sometimes unexpected) internal conversions,

Surely that must be a bug then.  Converting anything to a UTF-x encoding
should be lossless as Unicode is the only standard that supports ALL
languages.

Regards,
  Graeme

--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp
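Graeme's point can be checked with a small sketch, assuming FPC 3.x and a widestring manager that knows the code pages involved (cwstring on Unix is an assumption). The hypothetical helper SurvivesUtf8RoundTrip converts a string from its code page to UTF-8 and back, then compares it with the original; because UTF-8 can encode every character these single-byte code pages define, the bytes are expected to come back unchanged. The sample bytes are deliberately limited to assigned characters (unassigned positions such as #$81 in Windows-1252 are a separate case).

```pascal
program Utf8RoundTrip;
{$mode objfpc}{$H+}

{$ifdef unix}
uses
  cwstring; // assumption: widestring manager for code page conversions on Unix
{$endif}

{ Hypothetical helper: builds a string from raw bytes and labels it with a
  code page without converting anything. }
function FromBytes(const Bytes: array of Byte; CodePage: TSystemCodePage): RawByteString;
var
  I: Integer;
begin
  SetLength(Result, Length(Bytes));
  for I := 0 to High(Bytes) do
    Result[I + 1] := AnsiChar(Bytes[I]);
  SetCodePage(Result, CodePage, False);
end;

{ Code page -> UTF-8 -> original code page; True if every byte survived. }
function SurvivesUtf8RoundTrip(const S: RawByteString): Boolean;
var
  Utf8: UTF8String;
  Back: RawByteString;
begin
  Utf8 := S;                                  // implicit conversion to UTF-8
  Back := Utf8;                               // no conversion: still UTF-8 here
  SetCodePage(Back, StringCodePage(S), True); // convert back to the source code page
  Result := Back = S;
end;

begin
  // "При" in Windows-1251 and "éèü" in Windows-1252
  WriteLn(SurvivesUtf8RoundTrip(FromBytes([$CF, $F0, $E8], 1251))); // expected: TRUE
  WriteLn(SurvivesUtf8RoundTrip(FromBytes([$E9, $E8, $FC], 1252))); // expected: TRUE
end.
```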

Re: Convert codepages back to UTF8

Free Pascal - General mailing list
Martok <[hidden email]> wrote on Mon, 27 May 2019, 15:14:
> On 27.05.2019 at 14:30, Sven Barth via fpc-pascal wrote:
> > Alexey Tor. <[hidden email]
> > <mailto:[hidden email]>> wrote on Mon, 27 May 2019, 13:15:
> >
> >     LazUtils.LConvEncoding can convert utf8 to codepage (not many codepages)
> >     and vice versa.
> >
> >     FPC 3 can convert utf8 to codepage - via SetCodePage(s, codepage, true).
> >     But how can FPC convert back - codepage to utf8? Does such a way exist?
> >
> >
> > Use CP_UTF8 as code page for SetCodePage or assign the string to a UTF8String
> > variable.
>
> Although be advised that if your SystemCodePage is not a Unicode codepage, there
> will be data loss due to (sometimes unexpected) internal conversions, regardless
> of the current dynamic string code page.

As Graeme wrote, that shouldn't be the case when converting to UTF-8. For everything else, you need to use either string variables with the correct static encoding or RawByteString to avoid conversions.

Regards, 
Sven 
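Sven's two ways of avoiding conversions can be sketched like this, assuming FPC 3.x; TCp1251String, MakeCyr, CodePageOf and CopyCyr are hypothetical names. Assigning between variables with the same static encoding just copies a reference, and a RawByteString parameter passes the string through with its dynamic code page intact.

```pascal
program AvoidConversions;
{$mode objfpc}{$H+}

{$ifdef unix}
uses
  cwstring; // assumption: only needed for the UnicodeString -> CP1251 step
{$endif}

type
  { Static encoding: the compiler knows this string is always Windows-1251. }
  TCp1251String = type AnsiString(1251);

{ Hypothetical helper: "При", converted once from UTF-16 to CP1251. }
function MakeCyr: TCp1251String;
begin
  Result := UnicodeString(#$041F#$0440#$0438);
end;

{ RawByteString parameters accept any AnsiString as-is; the dynamic code page
  travels with the data. }
function CodePageOf(const S: RawByteString): TSystemCodePage;
begin
  Result := StringCodePage(S);
end;

{ Same static encoding on both sides: a reference-counted copy, no conversion. }
procedure CopyCyr(const Src: TCp1251String; out Dst: TCp1251String);
begin
  Dst := Src;
end;

var
  A, B: TCp1251String;
begin
  A := MakeCyr;
  CopyCyr(A, B);
  WriteLn(CodePageOf(A), ' ', CodePageOf(B));          // expected: 1251 1251
  WriteLn('shared buffer: ', Pointer(A) = Pointer(B)); // expected: TRUE
end.
```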

Re: Convert codepages back to UTF8

Martok
>     Although be advised that if your SystemCodePage is not a Unicode codepage, there
>     will be data loss due to (sometimes unexpected) internal conversions, regardless
>     of the current dynamic string code page.
>
>
> As Graeme wrote that shouldn't be the case when converting to UTF-8. And for
> everything else you need to either use string variables with the correct static
> encoding or RawByteString to avoid conversions.

As I wrote: "if your SystemCodePage is not a Unicode codepage". If it is,
everything mostly works.
And even RawByteString gets unexpected round-trip conversions on some operations,
which break in funny ways if the SystemCodePage can't represent some characters
in the RawByteString. I once spent most of a day debugging seemingly random data
corruption until I realized the corrupted bytes were #$81, #$90, etc., and the
non-LCL program used CP 1252.

More interesting for Alexey regarding the follow-up question: the result of any
string operation is in the DefaultSystemCodePage, for example:

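  {$mode objfpc}{$H+}
  var
    s: string;  // AnsiString(CP_ACP)
  begin
    s:= 'abc';
    SetCodePage(RawByteString(s), CP_UTF8, true);
    WriteLn(s, ' ',StringCodePage(s));                   // abc 65001
    s:= s + 'd';
    WriteLn(s, ' ',StringCodePage(s));                   // abcd 1252 (DefaultSystemCodePage on this system)
  end.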


--
Regards,
Martok
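In Martok's snippet, s is presumably a plain string, i.e. AnsiString(CP_ACP) under {$H+}. FPC 3 appears to convert the result of a string expression to the declared code page of the variable it is assigned to; assuming that, declaring the variable as UTF8String should keep it in UTF-8 after concatenation. A minimal sketch under that assumption (AppendD is a hypothetical name):

```pascal
program KeepUtf8;
{$mode objfpc}{$H+}

{$ifdef unix}
uses
  cwstring; // assumption: widestring manager for code page conversions on Unix
{$endif}

{ The destination is a UTF8String, so the concatenation result is produced in
  CP_UTF8 rather than in DefaultSystemCodePage. }
function AppendD(const S: UTF8String): UTF8String;
begin
  Result := S + 'd';
end;

var
  u: UTF8String;
begin
  u := AppendD('abc');
  WriteLn(u, ' ', StringCodePage(u)); // expected: abcd 65001
end.
```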