cwstrings unit and UTF8Decode()

cwstrings unit and UTF8Decode()

Graeme Geldenhuys-6
Hi,

I'm using FPC 2.6.4 primarily. Am I correct in that UTF8Decode and most
(if not all) UTF8-to-UTF16 conversions don't function correctly (or not
at all) if you don't include the cwstrings unit in your project? I'm
referring to Unix-based OSes here. I believe Windows automatically
includes the WideString Manager for you.

Regards,
  - Graeme -

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal

Re: cwstrings unit and UTF8Decode()

Bart-48
On 3/25/16, Graeme Geldenhuys <[hidden email]> wrote:

> I'm using FPC 2.6.4 primarily. Am I correct in that UTF8Decode and most
> (if not all) UTF8-to-UTF16 conversions don't function correctly (or not
> at all) if you don't include the cwstrings unit in your project? I'm
> referring to Unix-based OSes here.

If you're using LazUtf8 (or use LCL) then cwstring will be used in your app.
And I guess that Utf8ToUtf16 from Lazutf8 does not depend on a WS
manager, but I may be terribly wrong about that.

Bart

Re: cwstrings unit and UTF8Decode()

Michael Van Canneyt
In reply to this post by Graeme Geldenhuys-6


On Fri, 25 Mar 2016, Graeme Geldenhuys wrote:

> Hi,
>
> I'm using FPC 2.6.4 primarily. Am I correct in that UTF8Decode and most
> (if not all) UTF8-to-UTF16 conversions don't function correctly (or not
> at all) if you don't include the cwstrings unit in your project? I'm
> referring to Unix-based OSes here. I believe Windows automatically
> includes the WideString Manager for you.
>

Yes, this is correct.

Michael.

Re: cwstrings unit and UTF8Decode()

Michael Van Canneyt


On Fri, 25 Mar 2016, Michael Van Canneyt wrote:

>
>
> On Fri, 25 Mar 2016, Graeme Geldenhuys wrote:
>
>> Hi,
>>
>> I'm using FPC 2.6.4 primarily. Am I correct in that UTF8Decode and most
>> (if not all) UTF8-to-UTF16 conversions don't function correctly (or not
>> at all) if you don't include the cwstrings unit in your project? I'm
>> referring to Unix-based OSes here. I believe Windows automatically
>> includes the WideString Manager for you.
>>
>>
>
> Yes, this is correct.

Correction, this particular function does not depend on cwstrings.
All the other widestring (uppercase, compare etc) functions do.
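
A minimal sketch of what this means for a non-LCL program on Unix. It
assumes the RTL unit is spelled "cwstring" (as in the uses clause of the
LazUTF8 source) and that it is listed early in the program's uses clause.
The program name and the test string are illustrative only, and the result
of the uppercase call depends on the C library locale, so no output is
claimed here:

```pascal
program CwstringDemo;
{$mode objfpc}{$H+}

uses
  {$ifdef unix}
  cwstring,   // installs a widestring manager backed by the C library
  {$endif}
  SysUtils;

var
  W: WideString;
begin
  // Plain UTF-8 -> UTF-16 transcoding of the two bytes C3 A9 (U+00E9).
  // Per the correction above, this call should not need cwstring.
  W := UTF8Decode(#$C3#$A9);
  WriteLn('UTF-16 code units: ', Length(W));

  // Case mapping needs character tables, so on Unix this call relies on
  // the widestring manager that cwstring installs.
  W := WideUpperCase(W);
  WriteLn('Upper-cased code unit: ', HexStr(LongInt(Ord(W[1])), 4));
end.
```

Removing cwstring from the uses clause and comparing the second line of
output is one way to see which calls depend on the manager.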

Michael.

Re: cwstrings unit and UTF8Decode()

Felipe Monteiro de Carvalho
In reply to this post by Bart-48
On Fri, Mar 25, 2016 at 1:20 PM, Bart <[hidden email]> wrote:
> If you're using LazUtf8 (or use LCL) then cwstring will be used in your app.
> And I guess that Utf8ToUtf16 from Lazutf8 does not depend on a WS
> manager, but I may be terribly wrong about that.

As far as I remember, lazutf8 doesn't depend on cwstring for
(most?) of its functionality.

--
Felipe Monteiro de Carvalho

Re: cwstrings unit and UTF8Decode()

Michael Van Canneyt


On Fri, 25 Mar 2016, Felipe Monteiro de Carvalho wrote:

> On Fri, Mar 25, 2016 at 1:20 PM, Bart <[hidden email]> wrote:
>> If you're using LazUtf8 (or use LCL) then cwstring will be used in your app.
>> And I guess that Utf8ToUtf16 from Lazutf8 does not depend on a WS
>> manager, but I may be terribly wrong about that.
>
> As far as I remember, lazutf8 doesn't depend on cwstring for
> (most?) of its functionality.

Look at the sources

uses
   {$IFDEF UTF8_RTL}
   {$ifdef unix}
   cwstring, // UTF8 RTL on Unix requires this. Must be used although it  pulls in clib.
   {$endif}

Michael.

Re: cwstrings unit and UTF8Decode()

Graeme Geldenhuys-6
In reply to this post by Bart-48
On 2016-03-25 12:20, Bart wrote:
> If you're using LazUtf8 (or use LCL) then cwstring will be used in your app.


I don't use LCL at all, pure RTL & FCL code only. Based on the fact that
LCL's code also requires "cwstrings", I assume my original assumption is
correct, that if I want to do any UTF8-to-UTF16 conversions, use
UTF8Decode etc, my applications (or frameworks) require "cwstrings" for now.


Regards,
  - Graeme -

--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp

Re: cwstrings unit and UTF8Decode()

Felipe Monteiro de Carvalho
In reply to this post by Michael Van Canneyt
On Fri, Mar 25, 2016 at 2:01 PM, Michael Van Canneyt
<[hidden email]> wrote:
> Look at the sources

Which proves me right, or am I missing something?

--
Felipe Monteiro de Carvalho

Re: cwstrings unit and UTF8Decode()

Graeme Geldenhuys-6
In reply to this post by Michael Van Canneyt
On 2016-03-25 12:23, Michael Van Canneyt wrote:
>> > Yes, this is correct.
> Correction, this particular function does not depend on cwstrings.
> All the other widestring (uppercase, compare etc) functions do.


Ok, thanks for that.

Is there an easy way to see when an RTL function requires cwstrings to
function correctly? Is it mentioned in the RTL documentation? Is looking
at the RTL source code the only way to find that out?  Or does the
compiler in some way give a compilation hint that some RTL functions
will not function (because I might have left out cwstrings in a project)?

Regards,
  - Graeme -


Re: cwstrings unit and UTF8Decode()

Martin Schreiber-2
In reply to this post by Graeme Geldenhuys-6
On Friday 25 March 2016 14:48:18 Graeme Geldenhuys wrote:

> On 2016-03-25 12:20, Bart wrote:
> > If you're using LazUtf8 (or use LCL) then cwstring will be used in your
> > app.
>
> I don't use LCL at all, pure RTL & FCL code only. Based on the fact that
> LCL's code also requires "cwstrings", I assume my original assumption is
> correct, that if I want to do any UTF8-to-UTF16 conversions, use
> UTF8Decode etc, my applications (or frameworks) require "cwstrings" for
> now.
>
You can use the MSEgui functions in lib/common/msestrings.pas (stringtoutf8(),
stringtoutf8ansi(), utf8tostring(), utf8tostringansi()). AFAIK both LCL and
Free Pascal RTL also have such functions.

Martin

Re: cwstrings unit and UTF8Decode()

Marco van de Voort
In reply to this post by Graeme Geldenhuys-6
In our previous episode, Graeme Geldenhuys said:
> >> > Yes, this is correct.
> > Correction, this particular function does not depend on cwstrings.
> > All the other widestring (uppercase, compare etc) functions do.
>
> Ok, thanks for that.
 
> Is there an easy way to see when an RTL function requires cwstrings to
> function correctly? Is it mentioned in the RTL documentation? Is looking
> at the RTL source code the only way to find that out?

Yes, I think so. But in this case, because UTF-8 to UTF-16 conversion
doesn't require tables, it makes sense that it doesn't need a Unicode
library implementation.

As soon as it starts interpreting/comparing/mutating characters, you need
tables, and those are better taken from the OS (or at least kept optional
for small programs that only want to use sysutils to remove a file or so).

Re: cwstrings unit and UTF8Decode()

Michael Van Canneyt
In reply to this post by Felipe Monteiro de Carvalho


On Fri, 25 Mar 2016, Felipe Monteiro de Carvalho wrote:

> On Fri, Mar 25, 2016 at 2:01 PM, Michael Van Canneyt
> <[hidden email]> wrote:
>> Look at the sources
>
> Which proves me right, or am I missing something?

"lazutf8 doesn't depend" when it is in the uses clause,
sounds a bit strange to me :-)

Michael.

Re: cwstrings unit and UTF8Decode()

Graeme Geldenhuys-6
In reply to this post by Martin Schreiber-2
On 2016-03-25 14:06, Martin Schreiber wrote:
> You can use the MSEgui functions in lib/common/msestrings.pas

Thanks, but doesn't MSEgui also use cwstrings?

Regards,
  - Graeme -


Re: cwstrings unit and UTF8Decode()

Martin Schreiber-2
On Friday 25 March 2016 15:37:36 Graeme Geldenhuys wrote:
> On 2016-03-25 14:06, Martin Schreiber wrote:
> > You can use the MSEgui functions in lib/common/msestrings.pas
>
> Thanks, but doesn't MSEgui also use cwstrings?
>
Not for utf-8 <-> utf-16 conversion. The MSEgui version of cwstring also maps
unicodemanager conversion functions with cp_utf8 to the internal MSEgui
functions instead of calling iconv.

Martin

Re: cwstrings unit and UTF8Decode()

Felipe Monteiro de Carvalho
In reply to this post by Michael Van Canneyt
On Fri, Mar 25, 2016 at 3:16 PM, Michael Van Canneyt
<[hidden email]> wrote:
> "lazutf8 doesn't depend" when it is in the uses clause, sounds a bit
> strange to me :-)

Important part you are forgetting: {$IFDEF UTF8_RTL}

I don't know why it is needed in the utf-8 RTL, since I haven't used
this RTL yet, but in the RTL that I am using it doesn't depend on that
unit :)

Anyway, what I meant is that the routines themselves are Pascal
implementations of the Unicode standard. We even have
uppercase/lowercase tables. So we depend as little as possible on
system stuff. More reliable, more cross-platform and some routines
actually are several times faster than system ones.

--
Felipe Monteiro de Carvalho

Re: cwstrings unit and UTF8Decode()

Bart-48
On 3/25/16, Felipe Monteiro de Carvalho
<[hidden email]> wrote:

> Important part you are forgetting: {$IFDEF UTF8_RTL}
>
> I don't know why it is needed in the utf-8 RTL, since I haven't used
> this RTL yet, but in the RTL that I am using it doesn't depend on that
> unit :)

It's just a define to signal that all strings in LCL are UTF8 and when
offered to RTL their codepage is CP_UTF8.
When DisableUtf8RTL is defined, then all strings are CP_ACP.

The name of the define may indeed be a little misleading, but it's short.
We have to cater for 3 different situations:
- default: we set DefaultSystemCodepage to CP_UTF8 (on Windows): UTF8_RTL
- DisableUtf8RTL defined: ACP_RTL
- Fpc without cp-string: NO_CP_RTL
(See ($lazarus)\components\lazutils\lazutils_defines.inc)

Bart

Re: cwstrings unit and UTF8Decode()

Juha Manninen
On Fri, Mar 25, 2016 at 7:14 PM, Bart <[hidden email]> wrote:
> It's just a define to signal that all strings in LCL are UTF8 and when
> offered to RTL their codepage is CP_UTF8.

Not only in LCL. Package LazUtils / unit LazUTF8 can be used also without LCL.
  http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus#Using_UTF-8_in_non_LCL_programs

It means that when Graeme finally switches to FPC 3.x and he uses
LazUTF8 in his code, he gets cwstring as an extra bonus.

Regards,
Juha

Re: cwstrings unit and UTF8Decode()

Graeme Geldenhuys-6
In reply to this post by Michael Van Canneyt
On 2016-03-25 12:23, Michael Van Canneyt wrote:
> Correction, this particular function does not depend on cwstrings.

When you say "this particular function" you are referring to the
UTF8Decode() function, correct?

The documentation page for UTF8Decode has explicitly removed the
reference [that it requires a widestring manager] that was there before...

http://www.freepascal.org/docs-html/current/rtl/system/utf8decode.html

But, it does mention that it uses the low-level Utf8ToUnicode()
function. Now let's see that function's documentation.

http://www.freepascal.org/docs-html/current/rtl/system/utf8tounicode.html

And here it mentions that a widestring manager IS required for it to
function.

So if UTF8Decode depends on UTF8ToUnicode, then by definition UTF8Decode
also depends on a widestring manager.

Regards,
  - Graeme -


Re: cwstrings unit and UTF8Decode()

Michael Van Canneyt


On Fri, 25 Mar 2016, Graeme Geldenhuys wrote:

> On 2016-03-25 12:23, Michael Van Canneyt wrote:
>> Correction, this particular function does not depend on cwstrings.
>
> When you say "this particular function" you are referring to the
> UTF8Decode() function, correct?
>
> The documentation page for UTF8Decode has explicitly removed the
> reference [that it requires a widestring manager] that was there before...
>
> http://www.freepascal.org/docs-html/current/rtl/system/utf8decode.html
>
> But, it does mention that it uses the low-level Utf8ToUnicode()
> function. Now let's see that function's documentation.
>
> http://www.freepascal.org/docs-html/current/rtl/system/utf8tounicode.html
>
> And here it mentions that a widestring manager IS required for it to
> function.

This is wrong, I will correct that.

Encoding/Decoding UTF-8 to/from UTF16 is just shuffling bits.
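
To illustrate the "shuffling bits" point, here is a rough sketch of a
table-free decoder. It is not the RTL's Utf8ToUnicode implementation;
DecodeNext and the sample bytes are made up for this example, and the
input is assumed to be well-formed UTF-8 (no validation is done):

```pascal
program Utf8BitShuffle;
{$mode objfpc}{$H+}

// Hypothetical helper: reads one UTF-8 sequence starting at Buf[I],
// advances I past it and returns the code point. Only masks and shifts
// are used; there are no lookup tables and no widestring manager.
function DecodeNext(const Buf: array of Byte; var I: Integer): Cardinal;
var
  B: Byte;
  Extra: Integer;
begin
  B := Buf[I];
  Inc(I);
  if B < $80 then
  begin
    Result := B;            // 0xxxxxxx: plain ASCII
    Extra := 0;
  end
  else if (B and $E0) = $C0 then
  begin
    Result := B and $1F;    // 110xxxxx: one continuation byte follows
    Extra := 1;
  end
  else if (B and $F0) = $E0 then
  begin
    Result := B and $0F;    // 1110xxxx: two continuation bytes follow
    Extra := 2;
  end
  else
  begin
    Result := B and $07;    // 11110xxx: three continuation bytes follow
    Extra := 3;
  end;
  while Extra > 0 do
  begin
    // Each continuation byte (10xxxxxx) contributes its low six bits.
    Result := (Result shl 6) or (Buf[I] and $3F);
    Inc(I);
    Dec(Extra);
  end;
end;

const
  // UTF-8 for U+00E9, U+20AC and U+1F600.
  // The corresponding UTF-16 is 00E9, 20AC and the pair D83D DE00.
  Src: array[0..8] of Byte = ($C3, $A9, $E2, $82, $AC, $F0, $9F, $98, $80);

var
  I: Integer;
  CP: Cardinal;
begin
  I := 0;
  while I <= High(Src) do
  begin
    CP := DecodeNext(Src, I);
    if CP < $10000 then
      WriteLn(HexStr(LongInt(CP), 4))   // fits in one UTF-16 code unit
    else
    begin
      // Above the BMP: split the 20 remaining bits into a surrogate pair.
      Dec(CP, $10000);
      WriteLn(HexStr(LongInt($D800 + (CP shr 10)), 4), ' ',
              HexStr(LongInt($DC00 + (CP and $3FF)), 4));
    end;
  end;
end.
```

Nothing here needs locale data, which is why plain transcoding can live
in the RTL without cwstring, while case mapping and comparison cannot.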

Michael.