Weird string behavior

Weird string behavior

Santiago A.
Hello:

I'm working on Windows XP, with FPC 3.0.0 from stable Lazarus 1.6.

I've come across this issue: when I concatenate two UTF-8 strings, the
result is converted to ANSI (Win-1252).
Is this a bug?
Am I missing something?

I have attached a demo.


--
Regards

Santiago A.


_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal

testconvertstr.lpr (1K) Download Attachment

Re: Weird string behavior

Bart-48
On 7/21/16, Santiago A. <[hidden email]> wrote:

> I've come across this issue: When I concatenate two strings in UTF8 they
> are converted to ansi (Win-1252) .

You have declared all string variables as plain "string", which is the
same as AnsiString(CP_ACP). So all string variables have the encoding
of your active code page.

Declare Utf8StrA and the related variables as Utf8String.
In DisplayBytes, do not use "String" as the parameter type, since this
will again convert things automatically.
AnsiToUtf8 is no longer necessary if it is done this way:

procedure DisplayBytes(S:RawByteString);
var
  i:Integer;
begin
  Write('  ');
  for i:=1 to length(s) do
    write(ord(s[i]),' ');
  writeln;
end;

//-----------------------------------
// body
//-----------------------------------
var
  AnsiStrA:string;
  AnsiStrB:string;
  Utf8StrA: utf8string;
  Utf8StrB:utf8string;
  Utf8StrConcat:utf8string;
begin
  AnsiStrA:=' ';
  AnsiStrA[1]:=#243; // o acute win-1252
  AnsiStrB:='A';

  Write('AnsiStrA: ');DisplayBytes(AnsiStrA); // 243
  Write('AnsiStrB: ');DisplayBytes(AnsiStrB); // 65


  Utf8StrA:=(AnsiStrA); // 195 179
  Utf8StrB:=(AnsiStrB); // 65

  writeln;
  Write('Utf8StrA: ');DisplayBytes(Utf8StrA); // 195 179
  Write('Utf8StrB: ');DisplayBytes(Utf8StrB); // 65

  Write('Utf8StrA+Utf8StrB: ');DisplayBytes(Utf8StrA+Utf8StrB);

  writeln;
  Write('Utf8StrA again: ');DisplayBytes(Utf8StrA); // 195 179
  Write('Utf8StrB again: ');DisplayBytes(Utf8StrB); // 65


  Utf8StrConcat:=Utf8StrA+Utf8StrB;
  writeln;
  Write('Utf8StrConcat: ');DisplayBytes(Utf8StrConcat);
end.

Bart

Re: Weird string behavior

Santiago A.
On 22/07/2016 at 0:32, Bart wrote:

> On 7/21/16, Santiago A. <[hidden email]> wrote:
>
>> I've come across this issue: When I concatenate two strings in UTF8 they
>> are converted to ansi (Win-1252) .
> You have declared all string variables as plain "string", which is the
> same as AnsiString(CP_ACP). So all string variables have the encoding
> of your active codepage.
>
> Declare Utf8StrA and related as Utf8String.
> In DisplayBytes do not use "String" as parametertype, since this will
> again automatically convert things.
> The AnsiToUtf8 is not necessary anymore if done this way:

var
  AnsiStrA:string;  // AnsiString(CP_ACP)
  AnsiStrB:string;  // AnsiString(CP_ACP)
  Utf8StrA: string; // AnsiString(CP_ACP)
  Utf8StrB: string; // AnsiString(CP_ACP)
  Utf8StrConcat:string; // AnsiString(CP_ACP)
begin
  AnsiStrA:=' ';
  AnsiStrA[1]:=#243; // o acute win-1252
  AnsiStrB:='A';

  // AnsiStrA is AnsiString(CP_ACP)
  // AnsiStrB is AnsiString(CP_ACP)

  Utf8StrA:=AnsiToUtf8(AnsiStrA); // 195 179
  Utf8StrB:=AnsiToUtf8(AnsiStrB); // 65

  // is Utf8StrA now utf8string? or something similar like Ansistring(UTF_8)
  // is Utf8StrB now utf8string? or something similar like Ansistring(UTF_8)
 
  Utf8StrConcat:=Utf8StrA+Utf8StrB;
 
  //  AnsiString(CP_ACP) = UTF8 + UTF8
  //  automatic Conversion to ansiString(CP_ACP) ?
 
 
end;



--
Regards

Santi
[hidden email]


Re: Weird string behavior

Bart-48
On 7/22/16, Santiago A. <[hidden email]> wrote:

>   // is Utf8StrA now utf8string? or something similar like Ansistring(UTF_8)
>   // is Utf8StrB now utf8string? or something similar like Ansistring(UTF_8)

Just check the value of StringCodePage(Utf8StrA).

procedure DisplayBytes(S:RawByteString);
var
  i:Integer;
begin
  Write('  ');
  for i:=1 to length(s) do
    write(ord(s[i]),' ');
  writeln;
end;

var
  AnsiStrA: String;
begin
  AnsiStrA:=' ';
  AnsiStrA[1]:=#243; // o acute win-1252
  AnsiStrA := AnsiToUtf8(AnsiStrA);
  writeln('StringCodePage(AnsiStrA) now is: ',stringcodepage(ansistra));
  Write('AnsiStrA: ');DisplayBytes(AnsiStrA);
end.

Gives:
StringCodePage(AnsiStrA) now is: 65001
AnsiStrA:   195 179

Notice that your original problem was mainly due to the fact that
DisplayBytes used a String parameter, which led to everything being
converted back to your Windows code page automatically.
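
To make the parameter-type point concrete, here is a minimal sketch (not from the original attachment; the program and procedure names are made up for illustration). It passes the same UTF-8 string to a procedure declared with a String parameter and to one declared with a RawByteString parameter and prints the code page and length each one sees. What is printed depends on the compiler version and on the system's default code page, so no output is shown.

```pascal
program ParamCodePage;

{$mode objfpc}{$H+}

// Hypothetical helper: plain String is AnsiString(CP_ACP), so an argument
// with a different declared code page may be converted when it is passed.
procedure ShowViaString(S: String);
begin
  WriteLn('  String param:        cp=', StringCodePage(S), ' len=', Length(S));
end;

// Hypothetical helper: RawByteString has no declared code page of its own,
// so the argument is passed through without conversion.
procedure ShowViaRaw(S: RawByteString);
begin
  WriteLn('  RawByteString param: cp=', StringCodePage(S), ' len=', Length(S));
end;

var
  A: String;
  U: UTF8String;
begin
  A := ' ';
  A[1] := #243;          // o acute in win-1252, as in the thread
  U := AnsiToUtf8(A);    // RawByteString result assigned without conversion

  WriteLn('U itself:                cp=', StringCodePage(U), ' len=', Length(U));
  ShowViaString(U);
  ShowViaRaw(U);
end.
```

If the two helpers report different code pages or lengths on your system, the String parameter is where the conversion happened.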

Bart

Re: Weird string behavior

Graeme Geldenhuys-6
On 2016-07-22 11:55, Bart wrote:
> Gives:
> StringCodePage(AnsiStrA) now is: 65001
> AnsiStrA:   195 179

I don't understand, why did AnsiStrA change its codepage type after the
3rd assignment to it?

Here are the results on my Windows 7 system.

==========================================
$ codepagestring.exe
StringCodePage(AnsiStrA) now is: 1252
StringCodePage(AnsiStrA) now is: 65001
AnsiStrA:   195 179
==========================================

All I did to the last code example was duplicate the line of code that
calls stringcodepage() as shown here... So I get a before and after result.

==========================================
program codepagestring;

{$mode objfpc}{$H+}

procedure DisplayBytes(S:RawByteString);
var
  i:Integer;
begin
  Write('  ');
  for i:=1 to length(s) do
    write(ord(s[i]),' ');
  writeln;
end;

var
  AnsiStrA: String;
begin
  writeln('StringCodePage(AnsiStrA) now is: ',stringcodepage(ansistra));
  AnsiStrA:=' ';
  AnsiStrA[1]:=#243; // o acute win-1252
  AnsiStrA := AnsiToUtf8(AnsiStrA);
  writeln('StringCodePage(AnsiStrA) now is: ',stringcodepage(ansistra));
  Write('AnsiStrA: ');
  DisplayBytes(AnsiStrA);
end.
==========================================


Regards,
  Graeme

--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp

Re: Weird string behavior

Santiago A.
In reply to this post by Bart-48
On 22/07/2016 at 12:55, Bart wrote:
> Just check the value of StringCodePage(Utf8StrA).

Not Initialized
  AnsiStrA: 1252
  ResultA: 1252

AnsiStrA:=' '
  AnsiStrA: 0

AnsiStrA[1]:=#243; // o acute win-1252
  AnsiStrA: 0

ResultA:=AnsiStrA
  ResultA: 0

ResultA := AnsiStrA + ' '
  ResultA: 1252

ResultA:=AnsiToUtf8(AnsiStrA);
  ResultA: 65001

ResultA:= AnsiToUtf8(AnsiStrA) + AnsiToUtf8(AnsiStrA);
  ResultA: 1252


I'm definitely completely lost

-------------------------------

program testconvertstr;

var
  AnsiStrA:string;
  ResultA:string;
begin
  writeln('Not Initialized');
  writeln('  AnsiStrA: ',stringcodepage(ansistra));
  writeln('  ResultA: ',stringcodepage(ResultA));

  Writeln;writeln('AnsiStrA:='' ''');
  AnsiStrA:=' ';
  writeln('  AnsiStrA: ',stringcodepage(ansistra));
  Writeln;writeln('AnsiStrA[1]:=#243; // o acute win-1252');
  AnsiStrA[1]:=#243; // o acute win-1252
  writeln('  AnsiStrA: ',stringcodepage(ansistra));

  Writeln;writeln('ResultA:=AnsiStrA');
  ResultA:=AnsiStrA;
  writeln('  ResultA: ',stringcodepage(ResultA));

  Writeln;writeln('ResultA := AnsiStrA + '' ''');
  ResultA:=AnsiStrA+' ';
  writeln('  ResultA: ',stringcodepage(ResultA));

  Writeln;Writeln('ResultA:=AnsiToUtf8(AnsiStrA);');
  ResultA:=AnsiToUtf8(AnsiStrA);
  writeln('  ResultA: ',stringcodepage(ResultA));

  Writeln;writeln('ResultA:= AnsiToUtf8(AnsiStrA) + AnsiToUtf8(AnsiStrA);');
  ResultA:=AnsiToUtf8(AnsiStrA)+AnsiToUtf8(AnsiStrA);
  writeln('  ResultA: ',stringcodepage(ResultA));
  Readln;
end.





--
Regards

Santi
[hidden email]


Re: Weird string behavior

Graeme Geldenhuys-6
On 2016-07-22 13:14, Santiago A. wrote:
> I'm definitely completely lost

:) So am I.

Regards,
  Graeme


Re: Weird string behavior

Jonas Maebe-2
In reply to this post by Santiago A.
On 22/07/16 14:14, Santiago A. wrote:
>
> program testconvertstr;

You are missing {$h+} here. When posting programs, always include all
switches and/or all command line options. The program also compiles with
string = shortstring (the default), but has different behaviour in that
case.

> var
>   AnsiStrA:string;
>   ResultA:string;
> begin
>   writeln('Not Initialized');
>   writeln('  AnsiStrA: ',stringcodepage(ansistra));
>   writeln('  ResultA: ',stringcodepage(ResultA));

The string code page of an empty string is always DefaultSystemCodePage.

>   Writeln;writeln('AnsiStrA:='' ''');
>   AnsiStrA:=' ';
>   writeln('  AnsiStrA: ',stringcodepage(ansistra));

The string code page of constant strings is described at
http://wiki.freepascal.org/FPC_Unicode_support#String_constants . In
this case, it is CP_ACP (= 0) because no source file code page has been set.

>   Writeln;writeln('AnsiStrA[1]:=#243; // o acute win-1252');
>   AnsiStrA[1]:=#243; // o acute win-1252
>   writeln('  AnsiStrA: ',stringcodepage(ansistra));

Changing an individual byte of a string has no influence on its code page.

>   Writeln;writeln('ResultA:=AnsiStrA');
>   ResultA:=AnsiStrA;
>   writeln('  ResultA: ',stringcodepage(ResultA));

Assigning an ansistring to another ansistring with the same declared code
page (both AnsiStrA and ResultA have CP_ACP as declared code page) won't
change the (dynamic) string code page (see
http://wiki.freepascal.org/FPC_Unicode_support#Dynamic_code_page ).

>   Writeln;writeln('ResultA := AnsiStrA + '' ''');
>   ResultA:=AnsiStrA+' ';
>   writeln('  ResultA: ',stringcodepage(ResultA));

See http://wiki.freepascal.org/FPC_Unicode_support#String_concatenation 
: the result of a string concatenation will always be converted to the
declared code page of the destination (and CP_ACP represents the current
value of DefaultSystemCodePage, see
http://wiki.freepascal.org/FPC_Unicode_support#Code_page_identifiers ).

>   Writeln;Writeln('ResultA:=AnsiToUtf8(AnsiStrA);');
>   ResultA:=AnsiToUtf8(AnsiStrA);
>   writeln('  ResultA: ',stringcodepage(ResultA));

AnsiToUtf8() returns a RawByteString with dynamic code page CP_UTF8 (so
that the dynamic code page matches the actual string encoding).
Assigning a RawByteString to any other string type never results in a
string code page conversion (see
http://wiki.freepascal.org/FPC_Unicode_support#RawByteString ).

>   Writeln;writeln('ResultA:= AnsiToUtf8(AnsiStrA) + AnsiToUtf8(AnsiStrA);');
>   ResultA:=AnsiToUtf8(AnsiStrA)+AnsiToUtf8(AnsiStrA);
>   writeln('  ResultA: ',stringcodepage(ResultA));

See again
http://wiki.freepascal.org/FPC_Unicode_support#String_concatenations 
(same as before).
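
As a minimal sketch of the concatenation rule described above (not taken from the thread; the program and variable names are illustrative), the following program assigns the same concatenation to a destination declared as AnsiString and to one declared as UTF8String, then prints the dynamic code page of each. The values printed depend on the compiler version and on DefaultSystemCodePage, so none are shown here.

```pascal
program ConcatCodePage;

{$mode objfpc}{$H+}

var
  Src: AnsiString;        // declared code page: CP_ACP
  DestAnsi: AnsiString;   // declared code page: CP_ACP
  DestUtf8: UTF8String;   // declared code page: CP_UTF8
begin
  Src := ' ';
  Src[1] := #243;         // o acute in win-1252, as in the thread

  // Same right-hand side, two destinations with different declared
  // code pages.
  DestAnsi := AnsiToUtf8(Src) + AnsiToUtf8(Src);
  DestUtf8 := AnsiToUtf8(Src) + AnsiToUtf8(Src);

  WriteLn('DestAnsi: cp=', StringCodePage(DestAnsi), ' len=', Length(DestAnsi));
  WriteLn('DestUtf8: cp=', StringCodePage(DestUtf8), ' len=', Length(DestUtf8));
end.
```

Declaring the destination with the code page you actually want, as Bart suggested earlier with Utf8String, makes the result of the concatenation independent of this rule.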


Jonas

Re: Weird string behavior

Santiago A.
On 22/07/2016 at 15:03, Jonas Maebe wrote:
>
> See again
> http://wiki.freepascal.org/FPC_Unicode_support#String_concatenations
> (same as before).

So

  ResultA := AnsiToUTF8(AnsiStrA + AnsiStrA);  // UTF-8
  ResultB := AnsiToUTF8(AnsiStrA) + AnsiToUTF8(AnsiStrA); // Win-1252

And ResultA is not equal to ResultB.

It doesn't look too intuitive.

I would say that it is closer to "hidden secret knowledge" than to the
"Principle of least surprise".

--
Regards

Santiago A.
[hidden email]


Re: Weird string behavior

Jonas Maebe-2
On 22/07/16 17:33, Santiago A. wrote:
> So
>
>   ResultA := AnsiToUTF8(AnsiStrA + AnsiStrA);  // UTF-8
>   ResultB := AnsiToUTF8(AnsiStrA) + AnsiToUTF8(AnsiStrA); // Win-1252
>
>
> And resultA is not equal to ResultB
>
> It doesn't look like too intuitive.

It would be good if someone with access to Delphi 2009+ could test this.
It is possible that the result of concatenating two RawByteStrings
should again be a RawByteString and that hence no conversion should
happen on assignment here either.

> I would say that it is closer to "hidden secret knowledge" than to the
> "Principle of least surprise".

There is no hidden secret knowledge. Everything is documented and the
information is linked from the release notes. The "principle of least
surprise" has been applied in the sense that we didn't invent our own
system that introduces small or large differences compared to how Delphi
behaves in the same situation (and if there are differences, then those
are bugs in our implementation).


Jonas

Re: Weird string behavior

Santiago A.
On 22/07/2016 at 17:56, Jonas Maebe wrote:
>
> There is no hidden secret knowledge. Everything is documented and the
> information is linked from the release notes. The "principle of least
> surprise" has been applied in the sense that we didn't invent our own
> system that introduces small or large differences compared to how
> Delphi behaves in the same situation (and if there are differences,
> then those are bugs in our implementation).

By hidden knowledge I don't mean that it's not documented; I mean that
it's like the small print in contracts.

Only those who know there is a trick with adding are going to check the
manual. Complex things and pushing the language to its limits could
require reading the manual; adding two strings shouldn't.

MyString := expression

I really shouldn't need to know whether the right-hand expression uses
'+', or the result of a function, or '*', to guess what the type of
MyString is.

In addition, changing the code page on the fly is a bad idea.
If I can change the code page dynamically (I don't like it, but let's
live with it), let me assign it explicitly; don't change it on the fly.

SetCodePage(MyString,win1252);
MyString := AnsiToUTF8(Ansi1 + Ansi2);  // Automatically converted to Win-1252 before assign
MyString := AnsiToUTF8(Ansi1) + AnsiToUTF8(Ansi2);   // Automatically converted to Win-1252 before assign

SetCodePage(MyString,utf8);
MyString := AnsiToUTF8(Ansi1 + Ansi2);  //   No conversion needed
MyString := AnsiToUTF8(Ansi1) + AnsiToUTF8(Ansi2); // No conversion needed

MyString := Ansi1 + Ansi2;  //  Automatically converted to UTF-8 before assign

Nobody changes the code page of the string but me.

I don't like automatic conversion, but let's live with it. But I think
that an automatic change of variable type is really wrong.

This is Pascal, not bash or PHP.

--
Regards

Santiago A.


Re: Weird string behavior

Graeme Geldenhuys-6
On 2016-07-22 20:49, Santiago A. wrote:
> In addition, changing the codepage on the fly is a bad idea.

+1
I think that is very much the wrong behaviour too. As I mentioned before
on this mailing list, I think FPC 3.x with these codepage-aware
AnsiStrings is a damn mess. As far as I can see, FPC is now worse than
Delphi 2009 was. And NO, don't tell me everything should work as it did
before... It doesn't!! This thread just proves that one more time.

Regards,
  Graeme


Re: Weird string behavior

Jonas Maebe-2
In reply to this post by Santiago A.
On 22/07/16 21:49, Santiago A. wrote:

> On 22/07/2016 at 17:56, Jonas Maebe wrote:
>>
>> There is no hidden secret knowledge. Everything is documented and the
>> information is linked from the release notes. The "principle of least
>> surprise" has been applied in the sense that we didn't invent our own
>> system that introduces small or large differences compared to how
>> Delphi behaves in the same situation (and if there are differences,
>> then those are bugs in our implementation).
>
> With hidden knowledge I don't mean it's not documented, I mean that's
> like small print in contracts.

How much more prominent can it be when it's in the release notes?

> Only those who know there is trick with adding are going to check
> manual. Complex things and pushing the language to its limits could
> require reading manual, adding two strings shouldn't.

You don't have to read the manual. You have to read the release notes.

> MyString := expression
>
> I really shouldn't need to know if right expression uses '+',  or the
> result of a function, or '*', to guess what  MyString type is.

And maybe you shouldn't, because maybe there is a bug in FPC. But I
don't have Delphi 2009 and hence I cannot test.

> In addition, changing the codepage on the fly is a bad idea.
> If I can change the codepage dynamically (I don't like it, but let's
> live with it), let me assign it explicitly, don't change it on the fly.

You can do that if you only use RawByteString. I doubt you'll find it
very convenient.
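
A minimal sketch of what that could look like, assuming the SetCodePage and StringCodePage routines of the System unit (the program name is illustrative and the printed values are system-dependent, so none are shown). On Unix-like systems a widestring manager, for example the cwstring unit, may be needed for the conversion to work.

```pascal
program ExplicitCodePage;

{$mode objfpc}{$H+}

var
  R: RawByteString;
begin
  R := ' ';
  R[1] := #243;   // o acute in win-1252, as in the thread
  WriteLn('after assignment:  cp=', StringCodePage(R), ' len=', Length(R));

  // Convert = True: re-encode the bytes and update the dynamic code page.
  // Nothing else changes the code page of R behind the caller's back.
  SetCodePage(R, CP_UTF8, True);
  WriteLn('after SetCodePage: cp=', StringCodePage(R), ' len=', Length(R));
end.
```

The inconvenience is that every conversion has to be requested by hand like this, and the code has to keep track of which encoding each RawByteString currently holds.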

> I don't like automatic conversion, but let's live with it. But I think
> that automatic change of var type is really wrong.

I don't understand what you mean by this.


Jonas

Re: Weird string behavior

Bart-48
In reply to this post by Jonas Maebe-2
On 7/22/16, Jonas Maebe <[hidden email]> wrote:

>>   ResultA := AnsiToUTF8(AnsiStrA + AnsiStrA);  // UTF-8
>>   ResultB := AnsiToUTF8(AnsiStrA) + AnsiToUTF8(AnsiStrA); // Win-1252
>>
>>
>> And resultA is not equal to ResultB
>>
>> It doesn't look like too intuitive.
>
> It would be good if someone with access to Delphi 2009+ could test this.

I asked on a Dutch Delphi forum if someone could test with D2009 or up.

Bart

Re: Weird string behavior

Petr Kohut
Hello,
I tested the program listed below in "Delphi 10.1 Berlin" with the
results shown in the comments.

PK



program Project3;

{$APPTYPE CONSOLE}

{$R *.res}

var
   AnsiStrA: AnsiString;
   ResultA: AnsiString;
begin
   Writeln('Not Initialized');
   Writeln('  AnsiStrA: ', StringCodePage(AnsiStrA)); // 1250
   Writeln('  ResultA: ', StringCodePage(ResultA));   // 1250

   Writeln;
   Writeln('AnsiStrA := '' ''');
   AnsiStrA := ' ';
   Writeln('  AnsiStrA: ', StringCodePage(AnsiStrA)); // 1250

   Writeln;
   Writeln('AnsiStrA[1] := #243; // o acute win-1252');
   AnsiStrA[1] := #243; // o acute win-1252
   Writeln('  AnsiStrA: ', StringCodePage(AnsiStrA)); // 1250

   Writeln;
   Writeln('ResultA := AnsiStrA');
   ResultA := AnsiStrA;
   Writeln('  ResultA: ', StringCodePage(ResultA));   // 1250

   Writeln;
   Writeln('ResultA := AnsiStrA + '' ''');
   ResultA := AnsiStrA + ' ';
   Writeln('  ResultA: ', StringCodePage(ResultA));   // 1250

   Writeln;
   Writeln('ResultA := AnsiToUtf8(AnsiStrA);');
   ResultA := AnsiToUtf8(AnsiStrA);
   Writeln('  ResultA: ', StringCodePage(ResultA));   // 65001

   Writeln;
   Writeln('ResultA := AnsiToUtf8(AnsiStrA) + AnsiToUtf8(AnsiStrA);');
   ResultA := AnsiToUtf8(AnsiStrA) + AnsiToUtf8(AnsiStrA);
   Writeln('  ResultA: ', StringCodePage(ResultA));   // 65001

   Writeln;
   Writeln('ResultA := AnsiToUtf8(AnsiStrA) + AnsiStrA;');
   ResultA := AnsiToUtf8(AnsiStrA) + AnsiStrA;
   Writeln('  ResultA: ', StringCodePage(ResultA));   // 65001

   Writeln;
   Writeln('ResultA := AnsiStrA + AnsiToUtf8(AnsiStrA);');
   ResultA := AnsiStrA + AnsiToUtf8(AnsiStrA);
   Writeln('  ResultA: ', StringCodePage(ResultA));   // 1250

   Readln;
end.


(*
Not Initialized
   AnsiStrA: 1250
   ResultA: 1250

AnsiStrA := ' '
   AnsiStrA: 1250

AnsiStrA[1] := #243; // o acute win-1252
   AnsiStrA: 1250

ResultA := AnsiStrA
   ResultA: 1250

ResultA := AnsiStrA + ' '
   ResultA: 1250

ResultA := AnsiToUtf8(AnsiStrA);
   ResultA: 65001

ResultA := AnsiToUtf8(AnsiStrA) + AnsiToUtf8(AnsiStrA);
   ResultA: 65001

ResultA := AnsiToUtf8(AnsiStrA) + AnsiStrA;
   ResultA: 65001

ResultA := AnsiStrA + AnsiToUtf8(AnsiStrA);
   ResultA: 1250
*)





Re: Weird string behavior

Mattias Gaertner
In reply to this post by Bart-48
On Sat, 23 Jul 2016 00:29:32 +0200
Bart <[hidden email]> wrote:

> On 7/22/16, Jonas Maebe <[hidden email]> wrote:
>
> >>   ResultA := AnsiToUTF8(AnsiStrA + AnsiStrA);  // UTF-8
> >>   ResultB := AnsiToUTF8(AnsiStrA) + AnsiToUTF8(AnsiStrA); // Win-1252
> >>
> >>
> >> And resultA is not equal to ResultB
> >>
> >> It doesn't look like too intuitive.  
> >
> > It would be good if someone with access to Delphi 2009+ could test this.  
>
> I asked on Dutch Delphi forum if someone could test with D2009 or up.

Here is a result of Delphi 10.1:

program DTestConcatenate;

{$APPTYPE CONSOLE}

{$R *.res}

uses
  System.SysUtils;

var
  s1,s2,s3: AnsiString;
  r1,r2,r3: RawByteString;
begin
  s1:='A';   // 1252
  s2:='Aä';  // 1252
  writeln('s1="',s1,'" cp=',StringCodePage(s1));
  writeln('s2="',s1,'" cp=',StringCodePage(s2));
  r1:=AnsiToUTF8(s1); // 65001
  r2:=AnsiToUTF8(s2); // 65001
  writeln('r1="',r1,'" cp=',StringCodePage(r1));
  writeln('r2="',r2,'" cp=',StringCodePage(r2));
  r3:=r1+r2; // 65001
  writeln('r3="',r3,'" cp=',StringCodePage(r3));
  s3:=r1+r2; // 65001
  writeln('s3="',s3,'" cp=',StringCodePage(s3));
end.

Mattias

Re: Weird string behavior

Jonas Maebe-2
On 23/07/16 08:11, Mattias Gaertner wrote:
> Here is a result of Delphi 10.1:

Thank you (also Petr). Maybe one more, to know what happens if you mix
rawbytestring and ansistring in the concatenation:

program DTestConcatenate;

{$APPTYPE CONSOLE}

{$R *.res}

uses
   System.SysUtils;

var
   s1,s2,s3: AnsiString;
   r1,r2,r3: RawByteString;
begin
   s1:='A';   // 1252
   s2:='Aä';  // 1252
   writeln('s1="',s1,'" cp=',StringCodePage(s1));
   writeln('s2="',s1,'" cp=',StringCodePage(s2));
   r1:=AnsiToUTF8(s1); // 65001
   r2:=AnsiToUTF8(s2); // 65001
   writeln('r1="',r1,'" cp=',StringCodePage(r1));
   writeln('r2="',r2,'" cp=',StringCodePage(r2));

   r3:=s1+r2; // ??
   writeln('r3="',r3,'" cp=',StringCodePage(r3));
   r3:=r1+s2; // ??
   writeln('r3="',r3,'" cp=',StringCodePage(r3));


   s3:=s1+r2; // ??
   writeln('r3="',r3,'" cp=',StringCodePage(r3));
   s3:=r1+s2; // ??
   writeln('r3="',r3,'" cp=',StringCodePage(r3));
end.


Jonas

Re: Weird string behavior

Jonas Maebe-2
On 23/07/16 12:13, Jonas Maebe wrote:

> On 23/07/16 08:11, Mattias Gaertner wrote:
>> Here is a result of Delphi 10.1:
>
> Thank you (also Petr). Maybe one more, to know what happens if you mix
> rawbytestring and ansistring in the concatenation:
>
> program DTestConcatenate;
>
> {$APPTYPE CONSOLE}
>
> {$R *.res}
>
> uses
>   System.SysUtils;
>
> var
>   s1,s2,s3: AnsiString;
>   r1,r2,r3: RawByteString;
> begin
>   s1:='A';   // 1252
>   s2:='Aä';  // 1252
>   writeln('s1="',s1,'" cp=',StringCodePage(s1));
>   writeln('s2="',s1,'" cp=',StringCodePage(s2));
>   r1:=AnsiToUTF8(s1); // 65001
>   r2:=AnsiToUTF8(s2); // 65001
>   writeln('r1="',r1,'" cp=',StringCodePage(r1));
>   writeln('r2="',r2,'" cp=',StringCodePage(r2));
>
>   r3:=s1+r2; // ??
>   writeln('r3="',r3,'" cp=',StringCodePage(r3));
>   r3:=r1+s2; // ??
>   writeln('r3="',r3,'" cp=',StringCodePage(r3));
>
>
>   s3:=s1+r2; // ??
>   writeln('r3="',r3,'" cp=',StringCodePage(r3));
>   s3:=r1+s2; // ??
>   writeln('r3="',r3,'" cp=',StringCodePage(r3));

Some copy paste errors in the last four lines, that should be

   s3:=s1+r2; // ??
   writeln('s3="',s3,'" cp=',StringCodePage(s3));
   s3:=r1+s2; // ??
   writeln('s3="',s3,'" cp=',StringCodePage(s3));

And maybe also (to check whether there is a difference depending on the
actual code page):

   SetCodePage(rawbytestring(s1),65001,true);

   r3:=s1+r2; // ??
   writeln('r3="',r3,'" cp=',StringCodePage(r3));
   r3:=r1+s1; // ??
   writeln('r3="',r3,'" cp=',StringCodePage(r3));

   s3:=s1+r2; // ??
   writeln('s3="',s3,'" cp=',StringCodePage(s3));
   s3:=r1+s2; // ??
   writeln('s3="',s3,'" cp=',StringCodePage(s3));


Jonas

Re: Weird string behavior

wkitty42
In reply to this post by Jonas Maebe-2
On 07/23/2016 06:13 AM, Jonas Maebe wrote:
[...]
> var
>   s1,s2,s3: AnsiString;
>   r1,r2,r3: RawByteString;
> begin
>   s1:='A';   // 1252
>   s2:='Aä';  // 1252
>   writeln('s1="',s1,'" cp=',StringCodePage(s1));
>   writeln('s2="',s1,'" cp=',StringCodePage(s2));

writeln('s2="',s2,'" cp=',StringCodePage(s2));


you're not the only one to have missed that...

gotta wonder how fubar the test results are now ;)


--
  NOTE: No off-list assistance is given without prior approval.
        *Please keep mailing list traffic on the list* unless
        private contact is specifically requested and granted.

Re: Weird string behavior

Jonas Maebe-2
On 23/07/16 12:58, [hidden email] wrote:

> On 07/23/2016 06:13 AM, Jonas Maebe wrote:
> [...]
>> var
>>   s1,s2,s3: AnsiString;
>>   r1,r2,r3: RawByteString;
>> begin
>>   s1:='A';   // 1252
>>   s2:='Aä';  // 1252
>>   writeln('s1="',s1,'" cp=',StringCodePage(s1));
>>   writeln('s2="',s1,'" cp=',StringCodePage(s2));
>
> writeln('s2="',s2,'" cp=',StringCodePage(s2));
>
>
> you're not the only one to have missed that...

The only thing that matters for this test is the stringcodepage value,
which is the correct one.


Jonas