AnsiStrings and Memory Management

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

AnsiStrings and Memory Management

Adrian Veith
Hi,

two days ago i posted a comment about my experiences porting our
database server to fpc. yesterday i tried to figure out where the speed
differences between delphi and fpc come from. At the moment i stuck a
little bit, because the results i get don't make any sense at the moment
to me.
Anyway i dropped over another problem, which might be part of the cause
of my problems. I tried to create a test base for my investigations and
created a function NumberToText which converts an integer to written
text  (like NumberToText(1234) -> "one thousand two hundred thirty
four"). When I filled my database with these generated strings I found
some significant speed differences between delphi and fpc (over factor
10), which made me curious. According to your comments about my first
posting, i suspected the Memory Manager and created three different
versions of this NumberToText function and three test loops which each
calls the according NumberToText function for 0..999999. Here are the
results for fpc and delphi (w and w/o FastMM4)

Test1 uses "optimized" string concatention which avoids heap fragmentation
Test2 passes the string results using a var parameter
Test3 passes the string results as function results

Test1:
    fpc:  18.8 sec - factor = 8.17
    delphi (standard MM): 3.1 sec - factor = 1.34
    delphi (FastMM4): 2.3 sec - factor = 1

    fpc (ShortStrings): 1.25 sec - factor = 1.04
    delphi (ShortStrings): 1.20 sec - factor = 1

Test2:
    fpc:  45.2 sec - factor = 23.78
    delphi (standard MM): 2.9 sec - factor = 1.52
    delphi (FastMM4): 1.9 sec - factor = 1

    fpc (ShortStrings): 0.72 sec - factor = 1
    delphi (ShortStrings): 5.97 sec - factor = 8.29

Test3:
    fpc:  27.1 sec - factor = 12.9
    delphi (standard MM): 3.4 sec - factor = 1.61
    delphi (FastMM4): 2.1 sec - factor = 1

    fpc (ShortStrings): 1.27 sec - factor = 1
    delphi (ShortStrings): 4.84 sec - factor = 3.81

as you can see the "optimized" version (test 1) brings great benefits
for fpc compared to the more natural approach (test 3). If someone has
optimized code for delphi usage (test 2), he will suffer with the
greatest performance hit in fpc.
Anyway these results have nothing to do with the code quality of the
compiler (look at the ShortString results), but i think it's quite an
important rtl and memory manager issue - and Delphi proves, that it a
much higher speed with AnsiStrings is possible (astonishing that
ShortStrings are slower than AnsiStrings in this example in delphi).

cheers,

Adrian Veith.

Code is below:

program Bench2;
{$APPTYPE CONSOLE}
{$H+}

uses
  //{$ifndef fpc} FastMM4, {$endif}
    sysutils,
    Windows;

const
  cTimes = 999999;
    Number1: array [0..19] of string = (
    'zero', 'one', 'two', 'three', 'four', 'five',
    'six', 'seven', 'eight', 'nine', 'ten', 'eleven',
    'twelve', 'thirteen', 'fourteen', 'fifteen', 'sixteen',
    'seventeen', 'eighteen', 'nineteen');
   
  Number9: array [0..9] of string = (
    '', ' one', ' two', ' three', ' four', ' five',
    ' six', ' seven', ' eight', ' nine');

    Number10: array [0..9] of string = (
    'zero', 'ten', 'twenty', 'thirty', 'fourty', 'fifty',
    'sixty', 'seventy', 'eighty', 'ninety');

var
    StartTick: Cardinal;

procedure StartLog(const Text: string; Count: Integer);
begin
    if Count > 0 then
        write(Text, ': ', Count, ' ... ')
    else
        write(Text, ' ... ');
    StartTick:= GetTickCount;
end;

procedure EndLog(const Text: string);
begin
    writeln(Text, ' done in ', (GetTickCount - StartTick) / 1000.0: 0:
3, ' sec');
end;

type
  TFastStringRec = record
    l: Cardinal;
    s: string;
  end;
 
procedure FS_Clear(var AFS: TFastStringRec); inline;
begin
  AFS.L:= 0;
  AFS.S:= '';
end;

procedure FS_Assign(var AFS: TFastStringRec; const s: string); inline;
begin
  AFS.l:= Length(s);
  SetLength(AFS.s, (AFS.l and not 63) + 64);
  if AFS.l > 0 then
    Move(s[1], AFS.s[1], AFS.l);
end;

procedure FS_Append(var AFS: TFastStringRec; const s: string); overload;
inline;
var
  L, ls: Cardinal;
begin
  ls:= Length(s);
  if ls > 0 then begin
    L:= AFS.l;
    AFS.l:= L + ls;
    SetLength(AFS.s, (AFS.l and not 63) + 64);
    Move(s[1], AFS.s[1 + L], ls);
  end;
end;

procedure FS_Append(var AFS, S: TFastStringRec); overload; inline;
var
  L: Cardinal;
begin
  if S.L > 0 then begin
    L:= AFS.l;
    AFS.l:= L + S.L;
    SetLength(AFS.s, (AFS.l and not 63) + 64);
    Move(S.S[1], AFS.S[1 + L], S.L);
  end;
end;

function FS_ToStr(var AFS: TFastStringRec): string; inline;
begin
  if AFS.L >  0 then begin
    SetLength(Result, AFS.L);
    Move(AFS.S[1], Result[1], AFS.L);
  end else
    Result:= '';
end;

procedure NumberToText_V1(out s: string; n: Integer);

  procedure TensToText(var s: TFastStringRec; dig: Integer);
  var
    x: Integer;
  begin
      if dig > 0 then begin
          if dig >= 20 then begin
            x:= dig mod 10;
            FS_Assign(s, Number10[dig div 10]);
              if x <> 0 then
               FS_Append(s, Number9[x]);
          end else begin
              FS_Assign(s, Number1[dig]);
          end;
      end else
        FS_Clear(s);
  end;
 
  procedure HundredsToText(var s: TFastStringRec; dig: Integer);
  var
      h, t: Integer;
      s1: TFastStringRec;
  begin
    if dig > 0 then begin
        t:= dig mod 100;
        h:= dig div 100;
        if h > 0 then begin
        TensToText(s, h);
          if t > 0 then begin
            FS_Append(s, ' houndred ');
          TensToText(s1, t);
          FS_Append(s, s1);
          end else
            FS_Append(s, ' houndred');
        end else
          TensToText(s, t);
      end else
        FS_Clear(s);
  end;

var
    dig, h: Integer;
    s0, s1: TFastStringRec;
begin
    if n > 0 then begin
        dig:= n div 1000;
        h:= n mod 1000;
        if dig > 0 then begin
          HundredsToText(s0, dig);
          if h > 0 then begin
        FS_Append(s0, ' thousand ');
              HundredsToText(s1, h);
        FS_Append(s0, s1);
          end else
            FS_Append(s0, ' thousand');
        end else
          HundredsToText(s0, h);
        s:= FS_ToStr(s0);
    end else
        s:= Number1[0];
end;


procedure NumberToText_V2(out s: string; n: Integer);

  procedure TensToText(out s: string; dig: Integer);
  var
    x: Integer;
  begin
      if dig > 0 then begin
          if dig >= 20 then begin
            x:= dig mod 10;
              if x <> 0 then begin
                  s:= Number10[dig div 10] + Number9[x]
              end else
                s:= Number10[dig div 10];
          end else begin
              s:= Number1[dig];
          end;
      end else
        s:= '';
  end;

  procedure HundredsToText(out s: string; dig: Integer);
  var
      h, t: Integer;
      s1: string;
  begin
    if dig > 0 then begin
        t:= dig mod 100;
        h:= dig div 100;
        if h > 0 then begin
        TensToText(s, h);
          if t > 0 then begin
            s:= s + ' houndred ';
          TensToText(s1, t);
          s:= s + s1;
          end else
            s:= s + ' houndred';
        end else
          TensToText(s, t);
      end else
        s:= '';
  end;
 
var
    dig, h: Integer;
    s1: string;
begin
    if n > 0 then begin
        dig:= n div 1000;
        h:= n mod 1000;
        if dig > 0 then begin
          HundredsToText(s, dig);
          if h > 0 then begin
        s:= s + ' thousand ';
              HundredsToText(s1, h);
        s:= s + s1;
          end else
            s:= s + ' thousand';
        end else
          HundredsToText(s, h);
    end else
        s:= Number1[0];
end;

function NumberToText_V3(n: Integer): string;

    function TensToText(dig: Integer): string;
  var
    x: Integer;
  begin
      if dig > 0 then begin
          if dig >= 20 then begin
            x:= dig mod 10;
              if x <> 0 then begin
                  Result:= Number10[dig div 10] + Number9[x]
              end else
                Result:= Number10[dig div 10];
          end else begin
              Result:= Number1[dig];
          end;
      end else
        Result:= '';
  end;

    function HundredsToText(dig: Integer): string;
  var
      h, t: Integer;
  begin
    if dig > 0 then begin
        t:= dig mod 100;
        h:= dig div 100;
        if h > 0 then begin
          if t > 0 then
            Result:= TensToText(h) + ' houndred ' + TensToText(t)
          else
            Result:= TensToText(h) + ' houndred';
        end else
          Result:= TensToText(t);
      end else
        Result:= '';
  end;

var
    dig, h: Integer;
begin
    if n > 0 then begin
        dig:= n div 1000;
        h:= n mod 1000;
        if dig > 0 then begin
          if h > 0 then
        Result:= HundredsToText(dig) + ' thousand ' + HundredsToText(h)
          else
            Result:= HundredsToText(dig) + ' thousand';
        end else
          Result:= HundredsToText(h);
    end else
        Result:= Number1[0];
end;

procedure Test1;
var
    i: Integer;
    s: string;
begin
    StartLog('Test 1', cTimes + 1);
    for i:= 0 to cTimes do begin
      NumberToText_V1(s, i);
    end;
    EndLog('');
end;

procedure Test2;
var
    i: Integer;
    s: string;
begin
    StartLog('Test 2', cTimes + 1);
    for i:= 0 to cTimes do begin
      NumberToText_V2(s, i);
    end;
    EndLog('');
end;

procedure Test3;
var
    i: Integer;
    s: string;
begin
    StartLog('Test 3', cTimes + 1);
    for i:= 0 to cTimes do begin
      s:= NumberToText_V3(i);
    end;
    EndLog('');
end;

procedure Benchmark;
begin
    Test1;
    Test2;
    Test3;
end;

begin
    Benchmark;
    Readln;
end.
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: AnsiStrings and Memory Management

Matt Emson

>(astonishing that  ShortStrings are slower than AnsiStrings in this example
in delphi).

Not really. Delphi is silently upcasting your shortstrings to AnsiStrings in
the background. You would probably have to recompile the VCL to get around
this ;-)

Hope that helps,

M

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: AnsiStrings and Memory Management

Peter Vreman
>
>>(astonishing that  ShortStrings are slower than AnsiStrings in this
>> example
> in delphi).
>
> Not really. Delphi is silently upcasting your shortstrings to AnsiStrings
> in
> the background. You would probably have to recompile the VCL to get around
> this ;-)

I've commited a few small fixes to improve performance, please try again
with current 2.1.1 from svn



_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: AnsiStrings and Memory Management

Florian Klaempfl
In reply to this post by Adrian Veith
Adrian Veith wrote:

> Hi,
>
> two days ago i posted a comment about my experiences porting our
> database server to fpc. yesterday i tried to figure out where the speed
> differences between delphi and fpc come from. At the moment i stuck a
> little bit, because the results i get don't make any sense at the moment
> to me.
> Anyway i dropped over another problem, which might be part of the cause
> of my problems. I tried to create a test base for my investigations and
> created a function NumberToText which converts an integer to written
> text  (like NumberToText(1234) -> "one thousand two hundred thirty
> four"). When I filled my database with these generated strings I found
> some significant speed differences between delphi and fpc (over factor
> 10), which made me curious. According to your comments about my first
> posting, i suspected the Memory Manager and created three different
> versions of this NumberToText function and three test loops which each
> calls the according NumberToText function for 0..999999. Here are the
> results for fpc and delphi (w and w/o FastMM4)
>
> Test1 uses "optimized" string concatention which avoids heap fragmentation

Did you try/have an unoptimized version? FPC doesn't need these
optimizations, I fear they hurt only.
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: AnsiStrings and Memory Management

Florian Klaempfl
In reply to this post by Peter Vreman
Peter Vreman wrote:

>>>(astonishing that  ShortStrings are slower than AnsiStrings in this
>>>example
>>
>>in delphi).
>>
>>Not really. Delphi is silently upcasting your shortstrings to AnsiStrings
>>in
>>the background. You would probably have to recompile the VCL to get around
>>this ;-)
>
>
> I've commited a few small fixes to improve performance, please try again
> with current 2.1.1 from svn

It's enough to get as fast as D7 :) I guess to get more speed we need:
1) subroutine generation without entry/exit code
2) better register allocation for small procedures

else we can't compete with the hand optimized assembler routines of the delphi
rtl: fpc_unique_string is used a lot here which looks like:

.globl fpc_ansistr_unique
fpc_ansistr_unique:
.globl FPC_ANSISTR_UNIQUE
FPC_ANSISTR_UNIQUE:
# Temps allocated between ebp-8 and ebp+0
        pushl %ebp
        movl %esp,%ebp
        subl $8,%esp
        movl %ebx,-8(%ebp)
        movl %esi,-4(%ebp)
# Var S located in register
# Var $result located in register
        movl %eax,%ebx
# [437] pointer(result) := pointer(s);
        movl (%ebx),%esi
# [438] If Pointer(S)=Nil then
        testl %esi,%esi
        je .L4159
.L4164:
# [440] if PAnsiRec(Pointer(S)-Firstoff)^.Ref<>1 then
        movl (%ebx),%eax
        subl $8,%eax
        movl (%eax),%eax
        cmpl $1,%eax
        je .L4166
# [441] result:=fpc_truely_ansistr_unique(s);
        movl %ebx,%eax
        call SYSTEM_FPC_TRUELY_ANSISTR_UNIQUE$POINTER$$POINTER
        movl %eax,%esi
.L4166:
.L4159:
# [442] end;
        movl %esi,%eax
        movl -8(%ebp),%ebx
        movl -4(%ebp),%esi
        leave
        ret

Making it "frameless" safes 4 instructions.

Being able to put result into eax saves at least another 4. result could be put
in eax if the compiler recognizes that it doesn't have to save result while the
fpc_truely_ansistr_unique(s) call because result is assigned after it. Is this
doable (Daniel, Peter)?

Further, using ebx for s isn't necessary because after the
fpc_truely_ansistr_unique call s isn't used anymore. Could save another 2
instructions.

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: AnsiStrings and Memory Management

Adrian Veith
In reply to this post by Florian Klaempfl

Florian Klaempfl schrieb:

> Adrian Veith wrote:
>
>  
>> Hi,
>>
>> two days ago i posted a comment about my experiences porting our
>> database server to fpc. yesterday i tried to figure out where the speed
>> differences between delphi and fpc come from. At the moment i stuck a
>> little bit, because the results i get don't make any sense at the moment
>> to me.
>> Anyway i dropped over another problem, which might be part of the cause
>> of my problems. I tried to create a test base for my investigations and
>> created a function NumberToText which converts an integer to written
>> text  (like NumberToText(1234) -> "one thousand two hundred thirty
>> four"). When I filled my database with these generated strings I found
>> some significant speed differences between delphi and fpc (over factor
>> 10), which made me curious. According to your comments about my first
>> posting, i suspected the Memory Manager and created three different
>> versions of this NumberToText function and three test loops which each
>> calls the according NumberToText function for 0..999999. Here are the
>> results for fpc and delphi (w and w/o FastMM4)
>>
>> Test1 uses "optimized" string concatention which avoids heap fragmentation
>>    
>
> Did you try/have an unoptimized version? FPC doesn't need these
> optimizations, I fear they hurt only.
>  

the example includes two "optimized" and one unoptimized version. the
first "optimized" version is the fasted for fpc 2.0.2 (about 2 times)
but in delphi it has no effect.
But you are completely right for the fpc 2.1.1 version which I just
tried out - and I am very pleased. Here are the results:

Test1:
   fpc:  3.1 sec - factor = 1.34
   delphi (standard MM): 3.1 sec - factor = 1.34
   delphi (FastMM4): 2.3 sec - factor = 1

Test2:
   fpc: 2.8  sec - factor =  1.47
   delphi (standard MM): 2.9 sec - factor = 1.52
   delphi (FastMM4): 1.9 sec - factor = 1

Test3:
   fpc:  2.5 sec - factor = 1,19
   delphi (standard MM): 3.4 sec - factor = 1.61
   delphi (FastMM4): 2.1 sec - factor = 1


the increase in speed is remarkable !!!

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: AnsiStrings and Memory Management

Adrian Veith
In reply to this post by Peter Vreman
Peter Vreman schrieb:

>>> (astonishing that  ShortStrings are slower than AnsiStrings in this
>>> example
>>>      
>> in delphi).
>>
>> Not really. Delphi is silently upcasting your shortstrings to AnsiStrings
>> in
>> the background. You would probably have to recompile the VCL to get around
>> this ;-)
>>    
>
> I've commited a few small fixes to improve performance, please try again
> with current 2.1.1 from svn
>
>
>
>  

I just tried the 2.1.1 version. The increase of speed is remarkable. Thanks.

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: AnsiStrings and Memory Management

Peter Vreman
In reply to this post by Florian Klaempfl
> Peter Vreman wrote:
>
>>>>(astonishing that  ShortStrings are slower than AnsiStrings in this
>>>>example
>>>
>>>in delphi).
>>>
>>>Not really. Delphi is silently upcasting your shortstrings to
>>> AnsiStrings
>>>in
>>>the background. You would probably have to recompile the VCL to get
>>> around
>>>this ;-)
>>
>>
>> I've commited a few small fixes to improve performance, please try again
>> with current 2.1.1 from svn
>
> It's enough to get as fast as D7 :) I guess to get more speed we need:
> 1) subroutine generation without entry/exit code
> 2) better register allocation for small procedures
>
> else we can't compete with the hand optimized assembler routines of the
> delphi
> rtl: fpc_unique_string is used a lot here which looks like:
>
> .globl fpc_ansistr_unique
> fpc_ansistr_unique:
> .globl FPC_ANSISTR_UNIQUE
> FPC_ANSISTR_UNIQUE:
> # Temps allocated between ebp-8 and ebp+0
> pushl %ebp
> movl %esp,%ebp
> subl $8,%esp
> movl %ebx,-8(%ebp)
> movl %esi,-4(%ebp)
> # Var S located in register
> # Var $result located in register
> movl %eax,%ebx
> # [437] pointer(result) := pointer(s);
> movl (%ebx),%esi
> # [438] If Pointer(S)=Nil then
> testl %esi,%esi
> je .L4159
> .L4164:
> # [440] if PAnsiRec(Pointer(S)-Firstoff)^.Ref<>1 then
> movl (%ebx),%eax
> subl $8,%eax
> movl (%eax),%eax
> cmpl $1,%eax
> je .L4166
> # [441] result:=fpc_truely_ansistr_unique(s);
> movl %ebx,%eax
> call SYSTEM_FPC_TRUELY_ANSISTR_UNIQUE$POINTER$$POINTER
> movl %eax,%esi
> .L4166:
> .L4159:
> # [442] end;
> movl %esi,%eax
> movl -8(%ebp),%ebx
> movl -4(%ebp),%esi
> leave
> ret
>
> Making it "frameless" safes 4 instructions.
>
> Being able to put result into eax saves at least another 4. result could
> be put
> in eax if the compiler recognizes that it doesn't have to save result
> while the
> fpc_truely_ansistr_unique(s) call because result is assigned after it. Is
> this
> doable (Daniel, Peter)?
>
> Further, using ebx for s isn't necessary because after the
> fpc_truely_ansistr_unique call s isn't used anymore. Could save another 2
> instructions.

This issue is also in the Todo for 2.1.1

The register variable allocation should be optimized. Currently the
virtual registers for register variables are allocated for the whole
procedure. A first step is to limit the lifetime of the virtual register.
But limiting the limetime requires synchronization points of the register
variables for loops and if-then-else statements. Example:

while ... do
  begin
    inc(regvar);
    writeln(regvar);
  end

the lifetime of the regvar shall be the whole loop and not only for those
two instructions.

Another issue that can use some optimization is the parameter loading and
allocation of internally called functions like FPC_INCR_ANSISTR_REF. A
possible solution is to modify the parameter generation code to first load
all parameters values to virtual registers and then with a single method
push all parameters like it is already done in ncgcal.pas.



_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: AnsiStrings and Memory Management

Florian Klaempfl
I think the fpc code is quite usable now:

FPC 2.1.1
Test 1: 1000000 ...  done in 2.265 sec
Test 2: 1000000 ...  done in 1.563 sec
Test 3: 1000000 ...  done in 1.687 sec

D7:
Test 1: 1000000 ...  done in 2.078 sec
Test 2: 1000000 ...  done in 2.078 sec
Test 3: 1000000 ...  done in 2.312 sec

Can someone test with FPC 2.1.1 and D2006? If necessary, I can speed up
Test 1 and Test 2 with little effort :)
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: AnsiStrings and Memory Management

Florian Klaempfl
Florian Klaempfl wrote:

> I think the fpc code is quite usable now:
>
> FPC 2.1.1
> Test 1: 1000000 ...  done in 2.265 sec
> Test 2: 1000000 ...  done in 1.563 sec
> Test 3: 1000000 ...  done in 1.687 sec
>
> D7:
> Test 1: 1000000 ...  done in 2.078 sec
> Test 2: 1000000 ...  done in 2.078 sec
> Test 3: 1000000 ...  done in 2.312 sec
>
> Can someone test with FPC 2.1.1 and D2006? If necessary, I can speed up
> Test 1 and Test 2 with little effort :)

We used the holidays for another couple of improvements, so I would like
to see new timings :) (rtl (important!)+bench compiled with -O3p3r and
maybe -Cppentium2)
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal