code optimization

classic Classic list List threaded Threaded
24 messages Options
12
Reply | Threaded
Open this post in threaded view
|

code optimization

stefan077
Hi all,
 
I am currently trying to get the fastest possible code from fpc on a  modern CPU (Intel Xeon) under Windows XP. I use the compiler options:
 
fpc -Mdelphi -O3 -OpPENTIUMM -Cfsse2 -Cr- -Co- -CO- -Ci- myprogram.dpr
 
Are there any better settings I should use? Compiling exactly the same  program with Delphi7 results in 50% faster code. I understand that fpc  does not support the most recent CPUs, but neither does DELPHI 7. So  what optimizations are done by Delphi 7 that are not done by fpc and why  is this the case?
 
 Stefan
___________________________________________________________
Neu: WEB.DE De-Mail - Einfach wie E-Mail, sicher wie ein Brief!  
Jetzt De-Mail-Adresse reservieren: https://produkte.web.de/go/demail02
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: code optimization

Adrian Veith
 Hi Stefan,

is this a benchmark program or a complex program you are talking about.
If it is a benchmark, then it would be interesting to see the code,
because from my experience I doubt that Delphi produces better code than
fpc (in general it is the other way round). If it is a complex program,
then you need to analyze carefully which part of the program consumes
the most time and why. A number of 50% is in anyway unrealistic (this is
something you get if you compare inlined code against uninlined) ,
because the differences you get from code optimization are in a range
from +/-10% normally - unless you have found a real performance
bottleneck. And sometimes (most of) it is only an unoptimized library code.

Adrian.

On 21.09.2010 16:39, [hidden email] wrote:

> Hi all,
>  
> I am currently trying to get the fastest possible code from fpc on a  modern CPU (Intel Xeon) under Windows XP. I use the compiler options:
>  
> fpc -Mdelphi -O3 -OpPENTIUMM -Cfsse2 -Cr- -Co- -CO- -Ci- myprogram.dpr
>  
> Are there any better settings I should use? Compiling exactly the same  program with Delphi7 results in 50% faster code. I understand that fpc  does not support the most recent CPUs, but neither does DELPHI 7. So  what optimizations are done by Delphi 7 that are not done by fpc and why  is this the case?
>  
>  Stefan
> ___________________________________________________________
> Neu: WEB.DE De-Mail - Einfach wie E-Mail, sicher wie ein Brief!  
> Jetzt De-Mail-Adresse reservieren: https://produkte.web.de/go/demail02
> _______________________________________________
> fpc-pascal maillist  -  [hidden email]
> http://lists.freepascal.org/mailman/listinfo/fpc-pascal
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: code optimization

luca_manganelli
[hidden email] scritti il 22/09/2010 08.08.45

> is this a benchmark program or a complex program you are talking about.
> If it is a benchmark, then it would be interesting to see the code,
> because from my experience I doubt that Delphi produces better code than
> fpc (in general it is the other way round). If it is a complex program,
> then you need to analyze carefully which part of the program consumes
> the most time and why. A number of 50% is in anyway unrealistic (this is
> something you get if you compare inlined code against uninlined) ,
> because the differences you get from code optimization are in a range
> from +/-10% normally - unless you have found a real performance
> bottleneck. And sometimes (most of) it is only an unoptimized library
code.


"We should forget about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil"


Donald Knuth

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: code optimization

stefan077
In reply to this post by Adrian Veith
Hi Adrian,

it is a scientific application that I have, which has about 2000 lines of code. By extracting the most time consuming routine I now have 360 lines of code. With this code I get the following runtime results:

optimized FPC pascal   *** is  58% SLOWER  ***  than optimized DELPHI 7
unoptimized FPC pascal *** is 60% SLOWER *** than optimized DELPHI 7
unoptimized Delphi 7  *** is 62% SLOWER *** than optimized DELPHI 7

Thus it looks like FPC pascal is doing very bad on optimizing the code.
I agree, that I also have seen examples where FPC pascal code is about 10% faster than Delphi code.
So why does FPC pascal fail on this code?

I have included the code below. I compiled it with Delphi 7 , optimization on, range checking off, stack checking off, I/O checking off.
For FPC pascal I used the compiler options:  -Mdelphi -O3 -OpPENTIUMM -Cfsse2 -Cr- -Co- -CO- -Ci-
The FPC compiler version is 2.4.0, I run under Windows XP.

any suggestions?
Stefan


-----Ursprüngliche Nachricht-----
Von: "Adrian Veith" <[hidden email]>
Gesendet: 22.09.2010 08:08:45
An: "FPC-Pascal users discussions" <[hidden email]>
Betreff: Re: [fpc-pascal] code optimization

> Hi Stefan,
>
>is this a benchmark program or a complex program you are talking about.
>If it is a benchmark, then it would be interesting to see the code,
>because from my experience I doubt that Delphi produces better code than
>fpc (in general it is the other way round). If it is a complex program,
>then you need to analyze carefully which part of the program consumes
>the most time and why. A number of 50% is in anyway unrealistic (this is
>something you get if you compare inlined code against uninlined) ,
>because the differences you get from code optimization are in a range
>from +/-10% normally - unless you have found a real performance
>bottleneck. And sometimes (most of) it is only an unoptimized library code.
>
>Adrian.
>

Here is the code (I was not able to cut it down in size further):

program P;

{$APPTYPE CONSOLE}


uses
 Math,
 SysUtils,
 Windows;


type
 TRectangle       =  object
 width   : Integer;
 Area    : Int64;
 NextUnusedRectangle: Integer;
 end;

 TRectangleArray  = Array of TRectangle;


var
 UnsortedRectangle: TRectangleArray;
 NumRectangles    : Integer;
 BoundingBox      : TRectangle;



function GapAlgorithm: Boolean;

 type
 TBarEntry = record
 Height: Integer;
 Width : Integer;
 Start : Integer;
 Next  : Integer;
 Prev  : Integer;
 end;

 var
 SolutionFound    : Boolean;
 Bar              : Array of TBarEntry;
 FreeBarEntry     : Array of Integer;
 NumFreeBarEntries: Integer;
 NumIterations    : Int64;


 procedure  ExtendPartialSolution (NumPlacedRectangles, FirstUnusedRectangle: Integer);
 type
 TBarCase = (BarCase1, BarCase2, BarCase3, BarCase4, BarCase5, BarCase6);

 var
 i, MinValley, MinValleyWidth: Integer;
 PrevBar, NextBar            : Integer;
 RectWidth                   : Integer;
 BarCase                     : TBarCase;
 NextBarWidth                : Integer;
 NewEntry, NewEntry2         : Integer;
 MinValleyArea               : Int64;
 MinValleyHeight             : Integer;
 TotalAreaOfFittingRectangles: Int64;
 CurrentRectangle            : Integer;
 PreviousRectangle           : Integer;
 OldFirstUnusedRectangle     : Integer;
 OldPrevNextRectangle        : Integer;

 begin

 if NumPlacedRectangles = NumRectangles
 then begin
 writeln ('Solution found');
 SolutionFound := true;
 exit;
 end
 else begin
 inc (NumIterations);

 MinValleyWidth := BoundingBox.Width+1;
 PrevBar := 1;
 i       := Bar[PrevBar].Next;
 NextBar := Bar[i].Next;
 while NextBar <> 0 do begin
 if (Bar[i].Width        < MinValleyWidth) and
 (Bar[PrevBar].Height > Bar[i].Height) and
 (Bar[NextBar].Height > Bar[i].Height)
 then begin
 MinValleyWidth  := Bar[i].Width;
 MinValley       := i;
 end;
 PrevBar := i;
 i       := NextBar;
 NextBar := Bar[i].Next;
 end;

 MinValleyHeight := min(Bar[Bar[MinValley].Prev].Height, Bar[Bar[MinValley].Next].Height)- Bar[MinValley].Height;
 MinValleyArea   := int64(MinValleyHeight) * int64(MinValleyWidth);

 if MinValleyWidth < BoundingBox.Width
 then begin

 TotalAreaOfFittingRectangles := 0;
 CurrentRectangle := FirstUnusedRectangle;
 while CurrentRectangle <> 0 do begin
 i := CurrentRectangle;
 if (UnsortedRectangle[i].Width <= MinValleyWidth)
 then inc (TotalAreaOfFittingRectangles, UnsortedRectangle[i].Area);
 CurrentRectangle := UnsortedRectangle[CurrentRectangle].NextUnusedRectangle
 end;

 if TotalAreaOfFittingRectangles < MinValleyArea
 then exit;
 end;


 PreviousRectangle := 0;
 CurrentRectangle  := FirstUnusedRectangle;
 while CurrentRectangle <> 0 do begin
 i := CurrentRectangle;
 if (UnsortedRectangle[i].width <= MinValleyWidth)
 and (UnsortedRectangle[i].Width + Bar[MinValley].Height <= BoundingBox.Width)
 then begin
 OldFirstUnusedRectangle := FirstUnusedRectangle;
 OldPrevNextRectangle    := UnsortedRectangle[PreviousRectangle].NextUnusedRectangle;
 if CurrentRectangle = FirstUnusedRectangle
 then begin
 FirstUnusedRectangle := UnsortedRectangle[CurrentRectangle].NextUnusedRectangle;
 end
 else begin
 UnsortedRectangle[PreviousRectangle].NextUnusedRectangle := UnsortedRectangle[CurrentRectangle].NextUnusedRectangle
 end;

 PrevBar    := Bar[MinValley].Prev;
 NextBar    := Bar[MinValley].Next;
 RectWidth  := UnsortedRectangle[i].Width;

 if MinValleyWidth = RectWidth
 then begin
 if Bar[PrevBar].Height = Bar[MinValley].Height + RectWidth
 then begin
 if Bar[NextBar].Height = Bar[MinValley].Height + RectWidth
 then begin
 BarCase := BarCase3;
 NextBarWidth := Bar[NextBar].Width;
 inc (Bar[PrevBar].Width, RectWidth + NextBarWidth);
 Bar[PrevBar].Next           := Bar[NextBar].Next;
 Bar[Bar[NextBar].Next].Prev := PrevBar;
 Inc (NumFreeBarEntries);
 FreeBarEntry[NumFreeBarEntries] := NextBar;
 Inc (NumFreeBarEntries);
 FreeBarEntry[NumFreeBarEntries] := MinValley;
 end
 else begin
 BarCase := BarCase4;
 inc (Bar[PrevBar].Width, RectWidth);
 Bar[PrevBar].Next := NextBar;
 Bar[NextBar].Prev := PrevBar;
 Inc (NumFreeBarEntries);
 FreeBarEntry[NumFreeBarEntries] := MinValley;
 end
 end
 else begin
 if Bar[NextBar].Height = Bar[MinValley].Height + RectWidth
 then begin
 BarCase := BarCase5;
 inc (Bar[NextBar].Width, RectWidth);
 dec (Bar[NextBar].Start, RectWidth);
 Bar[PrevBar].Next := NextBar;
 Bar[NextBar].Prev := PrevBar;
 Inc (NumFreeBarEntries);
 FreeBarEntry[NumFreeBarEntries] := MinValley;
 end
 else begin
 BarCase := BarCase6;
 inc (Bar[MinValley].Height, RectWidth);
 end
 end
 end
 else begin
 if Bar[PrevBar].Height = Bar[MinValley].Height + RectWidth
 then begin
 BarCase := BarCase1;
 inc (Bar[PrevBar].Width, RectWidth);
 dec (Bar[MinValley].Width, RectWidth);
 inc (Bar[MinValley].Start, RectWidth);
 end
 else begin
 BarCase := BarCase2;
 NewEntry := FreeBarEntry[NumFreeBarEntries];
 dec (NumFreeBarEntries);
 Bar[PrevBar].Next    := NewEntry;
 Bar[MinValley].Prev := NewEntry;
 dec (Bar[MinValley].Width, RectWidth);
 inc (Bar[MinValley].Start, RectWidth);
 Bar[NewEntry].Height := Bar[MinValley].Height + RectWidth;
 Bar[NewEntry].Width  := RectWidth;
 Bar[NewEntry].Start  := Bar[MinValley].Start - RectWidth;
 Bar[NewEntry].Prev   := PrevBar;
 Bar[NewEntry].Next   := MinValley;
 end
 end;



 ExtendPartialSolution (NumPlacedRectangles+1, FirstUnusedRectangle);

 if SolutionFound then exit;

 case BarCase of
 BarCase1: begin
 dec (Bar[PrevBar].Width, RectWidth);
 inc (Bar[MinValley].Width, RectWidth);
 dec (Bar[MinValley].Start, RectWidth);
 end;
 BarCase2: begin
 Bar[PrevBar].Next    := MinValley;
 Bar[MinValley].Prev  := PrevBar;
 inc (Bar[MinValley].Width, RectWidth);
 dec (Bar[MinValley].Start, RectWidth);
 inc (NumFreeBarEntries);
 FreeBarEntry[NumFreeBarEntries] := NewEntry;
 end;
 BarCase3: begin
 dec (Bar[PrevBar].Width, RectWidth + NextBarWidth);
 NewEntry := FreeBarEntry[NumFreeBarEntries];
 dec (NumFreeBarEntries);
 NewEntry2 := FreeBarEntry[NumFreeBarEntries];
 dec (NumFreeBarEntries);
 Bar[NewEntry ].Height := Bar[PrevBar].Height - RectWidth;
 Bar[NewEntry ].Width  := RectWidth;
 Bar[NewEntry ].Start  := Bar[PrevBar].Start + Bar[PrevBar].Width;
 Bar[NewEntry ].Prev   := PrevBar;
 Bar[NewEntry ].Next   := NewEntry2;
 Bar[NewEntry2].Height := Bar[PrevBar].Height;
 Bar[NewEntry2].Width  := NextBarWidth;
 Bar[NewEntry2].Start  := Bar[NewEntry].Start + RectWidth;
 Bar[NewEntry2].Prev   := NewEntry;
 Bar[NewEntry2].Next   := Bar[PrevBar].Next;
 Bar[Bar[PrevBar].Next].Prev := NewEntry2;
 Bar[PrevBar].Next     := NewEntry;
 end;
 BarCase4: begin
 dec (Bar[PrevBar].Width, RectWidth);
 NewEntry := FreeBarEntry[NumFreeBarEntries];
 dec (NumFreeBarEntries);
 Bar[NewEntry].Height := Bar[PrevBar].Height - RectWidth;
 Bar[NewEntry].Width  := RectWidth;
 Bar[NewEntry].Start  := Bar[PrevBar].Start + Bar[PrevBar].Width;
 Bar[NewEntry].Prev   := PrevBar;
 Bar[NewEntry].Next   := NextBar;
 Bar[PrevBar].Next    := NewEntry;
 Bar[NextBar].Prev    := NewEntry;
 end;
 BarCase5: begin
 dec (Bar[NextBar].Width, RectWidth);
 inc (Bar[NextBar].Start, RectWidth);
 NewEntry := FreeBarEntry[NumFreeBarEntries];
 dec (NumFreeBarEntries);
 Bar[NewEntry].Height := Bar[NextBar].Height - RectWidth;
 Bar[NewEntry].Width  := RectWidth;
 Bar[NewEntry ].Start := Bar[NextBar].Start - RectWidth;
 Bar[NewEntry].Prev   := PrevBar;
 Bar[NewEntry].Next   := NextBar;
 Bar[PrevBar].Next    := NewEntry;
 Bar[NextBar].Prev    := NewEntry;
 end;
 BarCase6: begin
 dec (Bar[MinValley].Height, RectWidth);
 end;
 end;

 FirstUnusedRectangle := OldFirstUnusedRectangle;
 UnsortedRectangle[PreviousRectangle].NextUnusedRectangle := OldPrevNextRectangle ;
 end;

 PreviousRectangle := CurrentRectangle;
 CurrentRectangle := UnsortedRectangle[CurrentRectangle].NextUnusedRectangle;
 end;
 end;
 end;


 var
 i: integer;

 begin
 result := true;

 SetLength (Bar, NumRectangles + 3);
 Bar[1].Height  := BoundingBox.Width + 1;
 Bar[1].Width   := 1;
 Bar[1].Next    := 2;
 Bar[1].Prev    := 0;
 Bar[1].Start   := -1;

 Bar[2].Height  := 0;
 Bar[2].Width   := BoundingBox.Width;
 Bar[2].Next    := 3;
 Bar[2].Prev    := 1;
 Bar[2].Start   := 0;

 Bar[3].Height  := Bar[1].Height;
 Bar[3].Width   := 1;
 Bar[3].Next    := 0;
 Bar[3].Prev    := 2;
 Bar[3].Start   := Bar[2].Width;

 SetLength (FreeBarEntry, NumRectangles + 1);
 NumFreeBarEntries := 0;
 for i := NumRectangles + 2 downto 4 do begin
 inc (NumFreeBarEntries);
 FreeBarEntry [NumFreeBarEntries] := i;
 end;

 for i:=1 to NumRectangles-1 do
 UnsortedRectangle[i].NextUnusedRectangle := i+1;
 UnsortedRectangle[NumRectangles].NextUnusedRectangle := 0;

 NumIterations :=0;
 SolutionFound := false;
 ExtendPartialSolution (0, 1);
 result := SolutionFound;
 writeln ('# iterations: ', NumIterations:12);
 end;



var
 i: integer;
 StartTime: TDateTime;

begin
 StartTime := now;
 BoundingBox.Width := 70;
 BoundingBox.Area  := int64(BoundingBox.Width) * int64(BoundingBox.Width);
 NumRectangles     := 24;
 SetLength (UnsortedRectangle, NumRectangles + 5);
 for i:=1 to NumRectangles do begin
 UnsortedRectangle[i].Width     := i;
 UnsortedRectangle[i].Area      := int64(i) * int64(i);
 end;

 if GapAlgorithm
 then writeln ('solution found')
 else writeln ('no solution found');
 writeln ('runtime: ', (Now-StartTime)*3600*24:8:2, 's');
end.
___________________________________________________________
GRATIS: Spider-Man 1-3 sowie 300 weitere Videos!
Jetzt kostenlose Movie-FLAT freischalten! http://movieflat.web.de
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: code optimization

Jonas Maebe-2

On 22 Sep 2010, at 16:08, [hidden email] wrote:

> Thus it looks like FPC pascal is doing very bad on optimizing the  
> code.
> I agree, that I also have seen examples where FPC pascal code is  
> about 10% faster than Delphi code.
> So why does FPC pascal fail on this code?

At first sight it looks like FPC makes less efficient use of the  
available registers. Other than from the machine code, it's also  
indicated by the program being just as fast on my PowerMac G5/1.8GHz  
as on an AMD Opteron 2356 at 2.3GHz (normally pretty much any  
reasonably modern non-netbook-class x86 trounces my 6 year old Mac  
regardless of the clock speed and of what you throw at them :).  
Interestingly, the program becomes about 20% slower when compiled by  
FPC for x86-64, which is odd because more registers are available  
there (only half as many as on the PowerPC, but still...).

Delphi (Kylix) also uses a trick to avoid some multiplies with a small  
constant, but that has almost no influence on modern x86 cpus (the  
time of the FPC-compiled program barely changes if you manually apply  
that trick to the generated code).


Jonas
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: code optimization

Schindler Karl-Michael-2
In reply to this post by stefan077
> Message: 6
> Date: Wed, 22 Sep 2010 16:08:37 +0200 (CEST)
> From: [hidden email]
> Subject: Re: [fpc-pascal] code optimization
> To: [hidden email]
> Message-ID:
> <1487431390.1512221.1285164517310.JavaMail.fmail@mwmweb065>
> Content-Type: text/plain; charset=UTF-8
>
> Hi Adrian,
>
> it is a scientific application that I have, which has about 2000 lines of code. By extracting the most time consuming routine I now have 360 lines of code. With this code I get the following runtime results:
>
> optimized FPC pascal   *** is  58% SLOWER  ***  than optimized DELPHI 7
> unoptimized FPC pascal *** is 60% SLOWER *** than optimized DELPHI 7
> unoptimized Delphi 7  *** is 62% SLOWER *** than optimized DELPHI 7
>
> Thus it looks like FPC pascal is doing very bad on optimizing the code.
> I agree, that I also have seen examples where FPC pascal code is about 10% faster than Delphi code.
> So why does FPC pascal fail on this code?
>
> I have included the code below. I compiled it with Delphi 7 , optimization on, range checking off, stack checking off, I/O checking off.
> For FPC pascal I used the compiler options: Â -Mdelphi -O3 -OpPENTIUMM -Cfsse2 -Cr- -Co- -CO- -Ci-
> The FPC compiler version is 2.4.0, I run under Windows XP.
>
> any suggestions?
> Stefan

My 2 cents:
looking at the code, i would assume that you can gain by using linked lists with pointers instead of arrays and working with the index. This would reduce the number of offset calculations. However, it means quite a rewrite. So, do you really need more speed?

Even the following primitive replace, which is not yet a conversion to a linked list already saved about 10% on Mac OS X:

      while CurrentRectangle <> 0 do
      begin
        i := CurrentRectangle;
        if (UnsortedRectangle[i].Width <= MinValleyWidth) then
          inc (TotalAreaOfFittingRectangles, UnsortedRectangle[i].Area);
        CurrentRectangle := UnsortedRectangle[CurrentRectangle].NextUnusedRectangle
      end;

-----

var
  ThisUnsortedRectangle: ^TRectangle;
...
      while CurrentRectangle <> 0 do
      begin
        ThisUnsortedRectangle := @UnsortedRectangle[CurrentRectangle];
        if (ThisUnsortedRectangle^.Width <= MinValleyWidth) then
          inc (TotalAreaOfFittingRectangles, ThisUnsortedRectangle^.Area);
        CurrentRectangle := ThisUnsortedRectangle^.NextUnusedRectangle
      end;

Btw. profiling indicates that this loop takes over 30% of cpu time. Skipping the initial assignment (i := CurrentRectangle;) made up for about 5%.

All the best

Michael Schindler_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: code optimization

stefan077
<[hidden email]> wrote:

>My 2 cents:
>looking at the code, i would assume that you can gain by using linked lists with pointers instead of arrays and working with the index. This would reduce the number of offset calculations. However, it means quite a rewrite. So, do you really need more speed?

My experience is that linked lists with pointers are much slower than linked lists realized by arrays. Allocating and dellocating memory is very time consuming. Even if you allocate all memory in advance using pointers easily results in cache misses. The additional offset calculation should not take any additional clock cycles on modern CPUs.
And yes, I need more speed, because the program I wrote is going to run several CPU-years. It will run in the end on several Intel Xeons in parallel, but still it means that the runtime is a few months. So even 10% speedup can save me a week or more.
 

>
>Even the following primitive replace, which is not yet a conversion to a linked list already saved about 10% on Mac OS X:
>
>      while CurrentRectangle <> 0 do
>      begin
>        i := CurrentRectangle;
>        if (UnsortedRectangle[i].Width <= MinValleyWidth) then
>          inc (TotalAreaOfFittingRectangles, UnsortedRectangle[i].Area);
>        CurrentRectangle := UnsortedRectangle[CurrentRectangle].NextUnusedRectangle
>      end;
>
>-----
>
>var
>  ThisUnsortedRectangle: ^TRectangle;
>...
>      while CurrentRectangle <> 0 do
>      begin
>        ThisUnsortedRectangle := @UnsortedRectangle[CurrentRectangle];
> if (ThisUnsortedRectangle^.Width <= MinValleyWidth) then
>          inc (TotalAreaOfFittingRectangles, ThisUnsortedRectangle^.Area);
>        CurrentRectangle := ThisUnsortedRectangle^.NextUnusedRectangle
>      end;

I see almost no difference if I compile with Delphi 7 under Windows XP. But actually you hit the point because for these few lines FPC-pascal seems to have great problems with optimization.

>
>Btw. profiling indicates that this loop takes over 30% of cpu time. Skipping the initial assignment (i := CurrentRectangle;) made up for about 5%.

Not for Delphi 7. So Delphi 7 seems to eliminate the first assignment while FPC-pascal seems not to do so.

BTW, the code that I sent was not intended to be a speed optimized code. It was just some code that shows that FPC-pascal can generate code that is more than 50% slower than Delphi 7 code.

Stefan
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: code optimization

Adrian Veith
In reply to this post by stefan077
 Hi Stefan,

I analyzed your code - I think the problem is the array element address
calculation of the fpc compiler. You have a lot of code like
Bar[MinValley] etc. The delphi compile uses the lea assembler code for
this, whereas fpc calculates the address of one element with imul  which
is much slower. Anyway you could speed up your code significantly if you
help the compiler by reducing the address calculations with the help of
pointers like this:

MinValley:= ...
PBarMinValley:= @Bar[MinValley]

and replace any following Bar[MinValley].FOO with PBarMinValley.FOO

and the same with any other index.

This will speed up the code for Delphi and fpc as well, because both
compilers are not smart enough the see the unnecessary repeated address
calculations.

Cheers,

Adrian.



On 22.09.2010 16:08, [hidden email] wrote:

> Hi Adrian,
>
> it is a scientific application that I have, which has about 2000 lines of code. By extracting the most time consuming routine I now have 360 lines of code. With this code I get the following runtime results:
>
> optimized FPC pascal   *** is  58% SLOWER  ***  than optimized DELPHI 7
> unoptimized FPC pascal *** is 60% SLOWER *** than optimized DELPHI 7
> unoptimized Delphi 7  *** is 62% SLOWER *** than optimized DELPHI 7
>
> Thus it looks like FPC pascal is doing very bad on optimizing the code.
> I agree, that I also have seen examples where FPC pascal code is about 10% faster than Delphi code.
> So why does FPC pascal fail on this code?
>
> I have included the code below. I compiled it with Delphi 7 , optimization on, range checking off, stack checking off, I/O checking off.
> For FPC pascal I used the compiler options:  -Mdelphi -O3 -OpPENTIUMM -Cfsse2 -Cr- -Co- -CO- -Ci-
> The FPC compiler version is 2.4.0, I run under Windows XP.
>
> any suggestions?
> Stefan
>
>
> -----Ursprüngliche Nachricht-----
> Von: "Adrian Veith" <[hidden email]>
> Gesendet: 22.09.2010 08:08:45
> An: "FPC-Pascal users discussions" <[hidden email]>
> Betreff: Re: [fpc-pascal] code optimization
>
>> Hi Stefan,
>>
>> is this a benchmark program or a complex program you are talking about.
>> If it is a benchmark, then it would be interesting to see the code,
>> because from my experience I doubt that Delphi produces better code than
>> fpc (in general it is the other way round). If it is a complex program,
>> then you need to analyze carefully which part of the program consumes
>> the most time and why. A number of 50% is in anyway unrealistic (this is
>> something you get if you compare inlined code against uninlined) ,
>> because the differences you get from code optimization are in a range
> >from +/-10% normally - unless you have found a real performance
>> bottleneck. And sometimes (most of) it is only an unoptimized library code.
>>
>> Adrian.
>>
> Here is the code (I was not able to cut it down in size further):
>
> program P;
>
> {$APPTYPE CONSOLE}
>
>
> uses
>  Math,
>  SysUtils,
>  Windows;
>
>
> type
>  TRectangle       =  object
>  width   : Integer;
>  Area    : Int64;
>  NextUnusedRectangle: Integer;
>  end;
>
>  TRectangleArray  = Array of TRectangle;
>
>
> var
>  UnsortedRectangle: TRectangleArray;
>  NumRectangles    : Integer;
>  BoundingBox      : TRectangle;
>
>
>
> function GapAlgorithm: Boolean;
>
>  type
>  TBarEntry = record
>  Height: Integer;
>  Width : Integer;
>  Start : Integer;
>  Next  : Integer;
>  Prev  : Integer;
>  end;
>
>  var
>  SolutionFound    : Boolean;
>  Bar              : Array of TBarEntry;
>  FreeBarEntry     : Array of Integer;
>  NumFreeBarEntries: Integer;
>  NumIterations    : Int64;
>
>
>  procedure  ExtendPartialSolution (NumPlacedRectangles, FirstUnusedRectangle: Integer);
>  type
>  TBarCase = (BarCase1, BarCase2, BarCase3, BarCase4, BarCase5, BarCase6);
>
>  var
>  i, MinValley, MinValleyWidth: Integer;
>  PrevBar, NextBar            : Integer;
>  RectWidth                   : Integer;
>  BarCase                     : TBarCase;
>  NextBarWidth                : Integer;
>  NewEntry, NewEntry2         : Integer;
>  MinValleyArea               : Int64;
>  MinValleyHeight             : Integer;
>  TotalAreaOfFittingRectangles: Int64;
>  CurrentRectangle            : Integer;
>  PreviousRectangle           : Integer;
>  OldFirstUnusedRectangle     : Integer;
>  OldPrevNextRectangle        : Integer;
>
>  begin
>
>  if NumPlacedRectangles = NumRectangles
>  then begin
>  writeln ('Solution found');
>  SolutionFound := true;
>  exit;
>  end
>  else begin
>  inc (NumIterations);
>
>  MinValleyWidth := BoundingBox.Width+1;
>  PrevBar := 1;
>  i       := Bar[PrevBar].Next;
>  NextBar := Bar[i].Next;
>  while NextBar <> 0 do begin
>  if (Bar[i].Width        < MinValleyWidth) and
>  (Bar[PrevBar].Height > Bar[i].Height) and
>  (Bar[NextBar].Height > Bar[i].Height)
>  then begin
>  MinValleyWidth  := Bar[i].Width;
>  MinValley       := i;
>  end;
>  PrevBar := i;
>  i       := NextBar;
>  NextBar := Bar[i].Next;
>  end;
>
>  MinValleyHeight := min(Bar[Bar[MinValley].Prev].Height, Bar[Bar[MinValley].Next].Height)- Bar[MinValley].Height;
>  MinValleyArea   := int64(MinValleyHeight) * int64(MinValleyWidth);
>
>  if MinValleyWidth < BoundingBox.Width
>  then begin
>
>  TotalAreaOfFittingRectangles := 0;
>  CurrentRectangle := FirstUnusedRectangle;
>  while CurrentRectangle <> 0 do begin
>  i := CurrentRectangle;
>  if (UnsortedRectangle[i].Width <= MinValleyWidth)
>  then inc (TotalAreaOfFittingRectangles, UnsortedRectangle[i].Area);
>  CurrentRectangle := UnsortedRectangle[CurrentRectangle].NextUnusedRectangle
>  end;
>
>  if TotalAreaOfFittingRectangles < MinValleyArea
>  then exit;
>  end;
>
>
>  PreviousRectangle := 0;
>  CurrentRectangle  := FirstUnusedRectangle;
>  while CurrentRectangle <> 0 do begin
>  i := CurrentRectangle;
>  if (UnsortedRectangle[i].width <= MinValleyWidth)
>  and (UnsortedRectangle[i].Width + Bar[MinValley].Height <= BoundingBox.Width)
>  then begin
>  OldFirstUnusedRectangle := FirstUnusedRectangle;
>  OldPrevNextRectangle    := UnsortedRectangle[PreviousRectangle].NextUnusedRectangle;
>  if CurrentRectangle = FirstUnusedRectangle
>  then begin
>  FirstUnusedRectangle := UnsortedRectangle[CurrentRectangle].NextUnusedRectangle;
>  end
>  else begin
>  UnsortedRectangle[PreviousRectangle].NextUnusedRectangle := UnsortedRectangle[CurrentRectangle].NextUnusedRectangle
>  end;
>
>  PrevBar    := Bar[MinValley].Prev;
>  NextBar    := Bar[MinValley].Next;
>  RectWidth  := UnsortedRectangle[i].Width;
>
>  if MinValleyWidth = RectWidth
>  then begin
>  if Bar[PrevBar].Height = Bar[MinValley].Height + RectWidth
>  then begin
>  if Bar[NextBar].Height = Bar[MinValley].Height + RectWidth
>  then begin
>  BarCase := BarCase3;
>  NextBarWidth := Bar[NextBar].Width;
>  inc (Bar[PrevBar].Width, RectWidth + NextBarWidth);
>  Bar[PrevBar].Next           := Bar[NextBar].Next;
>  Bar[Bar[NextBar].Next].Prev := PrevBar;
>  Inc (NumFreeBarEntries);
>  FreeBarEntry[NumFreeBarEntries] := NextBar;
>  Inc (NumFreeBarEntries);
>  FreeBarEntry[NumFreeBarEntries] := MinValley;
>  end
>  else begin
>  BarCase := BarCase4;
>  inc (Bar[PrevBar].Width, RectWidth);
>  Bar[PrevBar].Next := NextBar;
>  Bar[NextBar].Prev := PrevBar;
>  Inc (NumFreeBarEntries);
>  FreeBarEntry[NumFreeBarEntries] := MinValley;
>  end
>  end
>  else begin
>  if Bar[NextBar].Height = Bar[MinValley].Height + RectWidth
>  then begin
>  BarCase := BarCase5;
>  inc (Bar[NextBar].Width, RectWidth);
>  dec (Bar[NextBar].Start, RectWidth);
>  Bar[PrevBar].Next := NextBar;
>  Bar[NextBar].Prev := PrevBar;
>  Inc (NumFreeBarEntries);
>  FreeBarEntry[NumFreeBarEntries] := MinValley;
>  end
>  else begin
>  BarCase := BarCase6;
>  inc (Bar[MinValley].Height, RectWidth);
>  end
>  end
>  end
>  else begin
>  if Bar[PrevBar].Height = Bar[MinValley].Height + RectWidth
>  then begin
>  BarCase := BarCase1;
>  inc (Bar[PrevBar].Width, RectWidth);
>  dec (Bar[MinValley].Width, RectWidth);
>  inc (Bar[MinValley].Start, RectWidth);
>  end
>  else begin
>  BarCase := BarCase2;
>  NewEntry := FreeBarEntry[NumFreeBarEntries];
>  dec (NumFreeBarEntries);
>  Bar[PrevBar].Next    := NewEntry;
>  Bar[MinValley].Prev := NewEntry;
>  dec (Bar[MinValley].Width, RectWidth);
>  inc (Bar[MinValley].Start, RectWidth);
>  Bar[NewEntry].Height := Bar[MinValley].Height + RectWidth;
>  Bar[NewEntry].Width  := RectWidth;
>  Bar[NewEntry].Start  := Bar[MinValley].Start - RectWidth;
>  Bar[NewEntry].Prev   := PrevBar;
>  Bar[NewEntry].Next   := MinValley;
>  end
>  end;
>
>
>
>  ExtendPartialSolution (NumPlacedRectangles+1, FirstUnusedRectangle);
>
>  if SolutionFound then exit;
>
>  case BarCase of
>  BarCase1: begin
>  dec (Bar[PrevBar].Width, RectWidth);
>  inc (Bar[MinValley].Width, RectWidth);
>  dec (Bar[MinValley].Start, RectWidth);
>  end;
>  BarCase2: begin
>  Bar[PrevBar].Next    := MinValley;
>  Bar[MinValley].Prev  := PrevBar;
>  inc (Bar[MinValley].Width, RectWidth);
>  dec (Bar[MinValley].Start, RectWidth);
>  inc (NumFreeBarEntries);
>  FreeBarEntry[NumFreeBarEntries] := NewEntry;
>  end;
>  BarCase3: begin
>  dec (Bar[PrevBar].Width, RectWidth + NextBarWidth);
>  NewEntry := FreeBarEntry[NumFreeBarEntries];
>  dec (NumFreeBarEntries);
>  NewEntry2 := FreeBarEntry[NumFreeBarEntries];
>  dec (NumFreeBarEntries);
>  Bar[NewEntry ].Height := Bar[PrevBar].Height - RectWidth;
>  Bar[NewEntry ].Width  := RectWidth;
>  Bar[NewEntry ].Start  := Bar[PrevBar].Start + Bar[PrevBar].Width;
>  Bar[NewEntry ].Prev   := PrevBar;
>  Bar[NewEntry ].Next   := NewEntry2;
>  Bar[NewEntry2].Height := Bar[PrevBar].Height;
>  Bar[NewEntry2].Width  := NextBarWidth;
>  Bar[NewEntry2].Start  := Bar[NewEntry].Start + RectWidth;
>  Bar[NewEntry2].Prev   := NewEntry;
>  Bar[NewEntry2].Next   := Bar[PrevBar].Next;
>  Bar[Bar[PrevBar].Next].Prev := NewEntry2;
>  Bar[PrevBar].Next     := NewEntry;
>  end;
>  BarCase4: begin
>  dec (Bar[PrevBar].Width, RectWidth);
>  NewEntry := FreeBarEntry[NumFreeBarEntries];
>  dec (NumFreeBarEntries);
>  Bar[NewEntry].Height := Bar[PrevBar].Height - RectWidth;
>  Bar[NewEntry].Width  := RectWidth;
>  Bar[NewEntry].Start  := Bar[PrevBar].Start + Bar[PrevBar].Width;
>  Bar[NewEntry].Prev   := PrevBar;
>  Bar[NewEntry].Next   := NextBar;
>  Bar[PrevBar].Next    := NewEntry;
>  Bar[NextBar].Prev    := NewEntry;
>  end;
>  BarCase5: begin
>  dec (Bar[NextBar].Width, RectWidth);
>  inc (Bar[NextBar].Start, RectWidth);
>  NewEntry := FreeBarEntry[NumFreeBarEntries];
>  dec (NumFreeBarEntries);
>  Bar[NewEntry].Height := Bar[NextBar].Height - RectWidth;
>  Bar[NewEntry].Width  := RectWidth;
>  Bar[NewEntry ].Start := Bar[NextBar].Start - RectWidth;
>  Bar[NewEntry].Prev   := PrevBar;
>  Bar[NewEntry].Next   := NextBar;
>  Bar[PrevBar].Next    := NewEntry;
>  Bar[NextBar].Prev    := NewEntry;
>  end;
>  BarCase6: begin
>  dec (Bar[MinValley].Height, RectWidth);
>  end;
>  end;
>
>  FirstUnusedRectangle := OldFirstUnusedRectangle;
>  UnsortedRectangle[PreviousRectangle].NextUnusedRectangle := OldPrevNextRectangle ;
>  end;
>
>  PreviousRectangle := CurrentRectangle;
>  CurrentRectangle := UnsortedRectangle[CurrentRectangle].NextUnusedRectangle;
>  end;
>  end;
>  end;
>
>
>  var
>  i: integer;
>
>  begin
>  result := true;
>
>  SetLength (Bar, NumRectangles + 3);
>  Bar[1].Height  := BoundingBox.Width + 1;
>  Bar[1].Width   := 1;
>  Bar[1].Next    := 2;
>  Bar[1].Prev    := 0;
>  Bar[1].Start   := -1;
>
>  Bar[2].Height  := 0;
>  Bar[2].Width   := BoundingBox.Width;
>  Bar[2].Next    := 3;
>  Bar[2].Prev    := 1;
>  Bar[2].Start   := 0;
>
>  Bar[3].Height  := Bar[1].Height;
>  Bar[3].Width   := 1;
>  Bar[3].Next    := 0;
>  Bar[3].Prev    := 2;
>  Bar[3].Start   := Bar[2].Width;
>
>  SetLength (FreeBarEntry, NumRectangles + 1);
>  NumFreeBarEntries := 0;
>  for i := NumRectangles + 2 downto 4 do begin
>  inc (NumFreeBarEntries);
>  FreeBarEntry [NumFreeBarEntries] := i;
>  end;
>
>  for i:=1 to NumRectangles-1 do
>  UnsortedRectangle[i].NextUnusedRectangle := i+1;
>  UnsortedRectangle[NumRectangles].NextUnusedRectangle := 0;
>
>  NumIterations :=0;
>  SolutionFound := false;
>  ExtendPartialSolution (0, 1);
>  result := SolutionFound;
>  writeln ('# iterations: ', NumIterations:12);
>  end;
>
>
>
> var
>  i: integer;
>  StartTime: TDateTime;
>
> begin
>  StartTime := now;
>  BoundingBox.Width := 70;
>  BoundingBox.Area  := int64(BoundingBox.Width) * int64(BoundingBox.Width);
>  NumRectangles     := 24;
>  SetLength (UnsortedRectangle, NumRectangles + 5);
>  for i:=1 to NumRectangles do begin
>  UnsortedRectangle[i].Width     := i;
>  UnsortedRectangle[i].Area      := int64(i) * int64(i);
>  end;
>
>  if GapAlgorithm
>  then writeln ('solution found')
>  else writeln ('no solution found');
>  writeln ('runtime: ', (Now-StartTime)*3600*24:8:2, 's');
> end.
> ___________________________________________________________
> GRATIS: Spider-Man 1-3 sowie 300 weitere Videos!
> Jetzt kostenlose Movie-FLAT freischalten! http://movieflat.web.de
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: code optimization

Jonas Maebe-2

On 23 Sep 2010, at 16:59, Adrian Veith wrote:

> I analyzed your code - I think the problem is the array element  
> address
> calculation of the fpc compiler. You have a lot of code like
> Bar[MinValley] etc. The delphi compile uses the lea assembler code for
> this, whereas fpc calculates the address of one element with imul  
> which
> is much slower.

Please see the last paragraph of http://lists.freepascal.org/lists/fpc-pascal/2010-September/026510.html

> Anyway you could speed up your code significantly if you
> help the compiler by reducing the address calculations with the help  
> of
> pointers like this:

It may help a lot, but only because it will reduce register pressure,  
not because the multiplications are gone.


Jonas
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: code optimization

stefan077
In reply to this post by stefan077
> Eduardo <[hidden email]> wrote:

>
>Can you try optimize for size? In some cases, it reduces L2 / L3 cache
>miss and runs faster than O3. It happens in other compilers and
>languages too.
>
I just tried it: The code gets even slightly larger and much slower (almost a factor of 2).
___________________________________________________________
GRATIS: Spider-Man 1-3 sowie 300 weitere Videos!
Jetzt kostenlose Movie-FLAT freischalten! http://movieflat.web.de
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: code optimization

Paulo Costa
On 23/09/2010 18:09, [hidden email] wrote:
>> Eduardo<[hidden email]>  wrote:
>
>>
>> Can you try optimize for size? In some cases, it reduces L2 / L3 cache
>> miss and runs faster than O3. It happens in other compilers and
>> languages too.
>>
> I just tried it: The code gets even slightly larger and much slower (almost a factor of 2).

Hi,

Try -Os3 instead of -O3

and try to reorder

type
  TRectangle       =  object
  width   : Integer;
  Area    : Int64;
  NextUnusedRectangle: Integer;

to

type
  TRectangle       =  object
  width   : Integer;
  NextUnusedRectangle: Integer;
  Area    : Int64;

and the var declaration:

  var
  i, MinValley, MinValleyWidth: Integer;
  PrevBar, NextBar            : Integer;
  RectWidth                   : Integer;
  NextBarWidth                : Integer;
  NewEntry, NewEntry2         : Integer;
  MinValleyHeight             : Integer;
  MinValleyArea               : Int64;
  TotalAreaOfFittingRectangles: Int64;
  CurrentRectangle            : Integer;
  PreviousRectangle           : Integer;
  OldFirstUnusedRectangle     : Integer;
  BarCase                     : TBarCase;
  OldPrevNextRectangle        : Integer;

It seems a matter of alignment. I tested and could extract a 5% speed
increase (XP and fpc 2.2.4). With 2.4.0 it may not work like that but is
worth a try.

Paulo Costa
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: code optimization

Adrian Veith
In reply to this post by Jonas Maebe-2


On 23.09.2010 17:03, Jonas Maebe wrote:

>
> On 23 Sep 2010, at 16:59, Adrian Veith wrote:
>
>> I analyzed your code - I think the problem is the array element address
>> calculation of the fpc compiler. You have a lot of code like
>> Bar[MinValley] etc. The delphi compile uses the lea assembler code for
>> this, whereas fpc calculates the address of one element with imul  which
>> is much slower.
>
> Please see the last paragraph of
> http://lists.freepascal.org/lists/fpc-pascal/2010-September/026510.html
>
>> Anyway you could speed up your code significantly if you
>> help the compiler by reducing the address calculations with the help of
>> pointers like this:
>
> It may help a lot, but only because it will reduce register pressure,
> not because the multiplications are gone.

It reduces the total number of multiplications about 70% - I gave the
code to one of my guys and he changed the code using pointers to
elements wherever possible. This are the differences:

fpc - original code: 17s
fpc - pointer to elements: 12 s
delphi - original code: 9s

tested on a AMD notebook.

We haven't tested the new code with delphi yet, but the benefits should
be marginal compared to fpc.

As a conclusion one can say, that fpc's array arithmetic is suboptimal.

Cheers,

Adrian
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: code optimization

Jonas Maebe-2

On 24 Sep 2010, at 08:06, Adrian Veith wrote:

> On 23.09.2010 17:03, Jonas Maebe wrote:
>>
>> It may help a lot, but only because it will reduce register pressure,
>> not because the multiplications are gone.
>
> It reduces the total number of multiplications about 70% - I gave the
> code to one of my guys and he changed the code using pointers to
> elements wherever possible.

Note the above: he changed the code using pointers. He did not replace  
the multiplications with "lea" instructions. Simply replacing the  
multiplications with lea instructions has almost no influence on the  
speed, as I mentioned twice before (unless you have a very old x86).  
Multiplications, especially with small numbers, are very fast on  
modern processors.

> This are the differences:
>
> fpc - original code: 17s
> fpc - pointer to elements: 12 s
> delphi - original code: 9s

Yes, because as I mentioned above, this reduces the register pressure.

> As a conclusion one can say, that fpc's array arithmetic is  
> suboptimal.

You can conclude that, but that is virtually completely unrelated to  
the speed of this example program.


Jonas
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: code optimization

Adrian Veith


On 24.09.2010 10:37, Jonas Maebe wrote:

>
> On 24 Sep 2010, at 08:06, Adrian Veith wrote:
>
>> On 23.09.2010 17:03, Jonas Maebe wrote:
>>>
>>> It may help a lot, but only because it will reduce register pressure,
>>> not because the multiplications are gone.
>>
>> It reduces the total number of multiplications about 70% - I gave the
>> code to one of my guys and he changed the code using pointers to
>> elements wherever possible.
>
> Note the above: he changed the code using pointers. He did not replace
> the multiplications with "lea" instructions. Simply replacing the
> multiplications with lea instructions has almost no influence on the
> speed, as I mentioned twice before (unless you have a very old x86).
> Multiplications, especially with small numbers, are very fast on
> modern processors.

Changing to pointers reduces the amount of multiplications for accessing
the nth element in an array - if you compare the delphi code to th fpc
code on assembler base, this is the main difference in both generated
codes. Register allocation is on a comparable level for both versions.
>
>> This are the differences:
>>
>> fpc - original code: 17s
>> fpc - pointer to elements: 12 s
>> delphi - original code: 9s
>

we optimized the code further and eliminated the all Next, Prev: Integer
etc to and changed them to pointers again. Here are the results:

original code:

fpc 17 s
delphi 9 s

first optimization - saving redundant array access to pointers:

fpc 12s =  -30%
delphi 8s = - 11%

next optimization - changed code to use linked lists inside the arrays:

fpc 7s = -41 % from opt1 = - 58% from orig
delphi 6s = - 25% from opt1 = -33% from orig

 
> Yes, because as I mentioned above, this reduces the register pressure.
>

The numbers show, that fpc has a problem when it comes to accessing
elements inside an array with elements sizes <> power of 2 - call it
whatever you like. But the numbers show, that the benefits of the
optimizations for delphi are on a much lower level than for fpc -
conclusion is:

delphi's array arithmetic is much better than fpc's in these cases - and
there is a potential for optimization - many of the changes we did
manually  could be done by a smart optimizers as well. There were so
many redundant address calculations, most of them should be recognized
by an optimizer. I choose this example to train my people to see
bottlenecks and how to avoid them - even without knowing what the code
does actually.

last optimized version of the code attached:

program project1;

{$APPTYPE CONSOLE}


uses
  Math,
  SysUtils,
  Windows;

type
  PRectangle = ^TRectangle;
  TRectangle = object
    Width: integer;
    Area:  int64;
    NextUnusedRectangle: PRectangle;
  end;


  TRectangleArray = array of TRectangle;


var
  UnsortedRectangle: TRectangleArray;
  NumRectangles: integer;
  BoundingBox: TRectangle;



  function GapAlgorithm: boolean;

  type
    PBarEntry = ^TBarEntry;
    TBarEntry = record
      Height: integer;
      Width:  integer;
      Start:  integer;
      Next:   PBarEntry;
      Prev:   PBarEntry;
    end;

  var
    SolutionFound: boolean;
    Bar: array of TBarEntry;
    FreeBarEntry: array of PBarEntry;
    NumFreeBarEntries: integer;
    NumIterations: int64;


    procedure ExtendPartialSolution(NumPlacedRectangles,
FirstUnusedRectangle: PRectangle);
    type
      TBarCase = (BarCase1, BarCase2, BarCase3, BarCase4, BarCase5,
BarCase6);

    var
      MinValleyWidth: integer;
      i, MinValley: PBarEntry;
      PrevBar, NextBar: PBarEntry;
      RectWidth:    integer;
      BarCase:      TBarCase;
      NextBarWidth: integer;
      NewEntry, NewEntry2: PBarEntry;
      MinValleyArea: int64;
      MinValleyHeight: integer;
      TotalAreaOfFittingRectangles: int64;
      CurrentRectangle: PRectangle;
      PreviousRectangle: PRectangle;
      OldFirstUnusedRectangle: PRectangle;
      OldPrevNextRectangle: PRectangle;
      //PPrevBar, PNextBar, PNewEntry, PNewEntry2, PMinValley: PBarEntry;
      //PCurrentRectangle,
      //PPreviousRectangle,
      wi: PRectangle;
    begin

      if NumPlacedRectangles =  @UnsortedRectangle[NumRectangles] then
      begin
        writeln('Solution found');
        SolutionFound := True;
        exit;
      end
      else
      begin
        Inc(NumIterations);

        MinValleyWidth := BoundingBox.Width + 1;
        PrevBar := @Bar[1];
        i := PrevBar.Next;
        NextBar := i.Next;

        while NextBar <> @Bar[0] do
        begin
          with i^ do begin
            if (Width < MinValleyWidth) and (PrevBar.Height > Height) and
              (NextBar.Height > Height) then
            begin
              MinValleyWidth := Width;
              MinValley      := i;
            end;
          end;
          PrevBar := i;
          i := NextBar;
          NextBar := NextBar.Next;
        end;
        //PPrevBar:= @Bar[PrevBar];

        //PMinValley := @Bar[MinValley];
        MinValleyHeight := min(MinValley.Prev.Height,
          MinValley.Next.Height) - MinValley.Height;
        MinValleyArea   := int64(MinValleyHeight) * int64(MinValleyWidth);

        if MinValleyWidth < BoundingBox.Width then
        begin

          TotalAreaOfFittingRectangles := 0;
          CurrentRectangle := FirstUnusedRectangle;
          while CurrentRectangle <> @UnsortedRectangle[0] do
          begin
            with CurrentRectangle^ do begin
              if (Width <= MinValleyWidth) then
                Inc(TotalAreaOfFittingRectangles, Area);
              CurrentRectangle := NextUnusedRectangle;
            end;
          end;

          if TotalAreaOfFittingRectangles < MinValleyArea then
            exit;
        end;


        //PPreviousRectangle := 0;
        PreviousRectangle:= @UnsortedRectangle[0];
        CurrentRectangle  := FirstUnusedRectangle;
        //PCurrentRectangle:= @UnsortedRectangle[CurrentRectangle];
        while CurrentRectangle <> @UnsortedRectangle[0] do
        begin
          wi := CurrentRectangle;
          if (wi.Width <= MinValleyWidth) and
            (wi.Width + MinValley.Height <= BoundingBox.Width) then
          begin
            OldFirstUnusedRectangle := FirstUnusedRectangle;
            OldPrevNextRectangle    :=
PreviousRectangle.NextUnusedRectangle;
            if CurrentRectangle = FirstUnusedRectangle then
            begin
              FirstUnusedRectangle := CurrentRectangle.NextUnusedRectangle;
            end
            else
            begin
              PreviousRectangle.NextUnusedRectangle :=
                CurrentRectangle.NextUnusedRectangle;
            end;

            PrevBar   := MinValley.Prev;
            //PPrevBar  := @Bar[PrevBar];
            NextBar   := MinValley.Next;
            //PNextBar  := @Bar[NextBar];
            RectWidth := wi.Width;

            if MinValleyWidth = RectWidth then
            begin
              if PrevBar.Height = MinValley.Height + RectWidth then
              begin
                if NextBar.Height = MinValley.Height + RectWidth then
                begin
                  BarCase      := BarCase3;
                  NextBarWidth := NextBar.Width;
                  Inc(PrevBar.Width, RectWidth + NextBarWidth);
                  PrevBar.Next := NextBar.Next;
                  NextBar.Next.Prev := PrevBar;
                  Inc(NumFreeBarEntries);
                  FreeBarEntry[NumFreeBarEntries] := NextBar;
                  Inc(NumFreeBarEntries);
                  FreeBarEntry[NumFreeBarEntries] := MinValley;
                end
                else
                begin
                  BarCase := BarCase4;
                  Inc(PrevBar.Width, RectWidth);
                  PrevBar.Next := NextBar;
                  NextBar.Prev := PrevBar;
                  Inc(NumFreeBarEntries);
                  FreeBarEntry[NumFreeBarEntries] := MinValley;
                end;
              end
              else
              begin
                if NextBar.Height = MinValley.Height + RectWidth then
                begin
                  BarCase := BarCase5;
                  Inc(NextBar.Width, RectWidth);
                  Dec(NextBar.Start, RectWidth);
                  PrevBar.Next := NextBar;
                  NextBar.Prev := PrevBar;
                  Inc(NumFreeBarEntries);
                  FreeBarEntry[NumFreeBarEntries] := MinValley;
                end
                else
                begin
                  BarCase := BarCase6;
                  Inc(MinValley.Height, RectWidth);
                end;
              end;
            end
            else
            begin
              if PrevBar.Height = MinValley.Height + RectWidth then
              begin
                BarCase := BarCase1;
                Inc(PrevBar.Width, RectWidth);
                Dec(MinValley.Width, RectWidth);
                Inc(MinValley.Start, RectWidth);
              end
              else
              begin
                BarCase  := BarCase2;
                NewEntry := FreeBarEntry[NumFreeBarEntries];
                //PNewEntry:= @Bar[NewEntry];
                Dec(NumFreeBarEntries);
                PrevBar.Next   := NewEntry;
                MinValley.Prev := NewEntry;
                Dec(MinValley.Width, RectWidth);
                Inc(MinValley.Start, RectWidth);
                NewEntry.Height := MinValley.Height + RectWidth;
                NewEntry.Width  := RectWidth;
                NewEntry.Start  := MinValley.Start - RectWidth;
                NewEntry.Prev   := PrevBar;
                NewEntry.Next   := MinValley;
              end;
            end;



            ExtendPartialSolution(NumPlacedRectangles + 1,
FirstUnusedRectangle);

            if SolutionFound then
              exit;

            case BarCase of
              BarCase1:
              begin
                Dec(PrevBar.Width, RectWidth);
                Inc(MinValley.Width, RectWidth);
                Dec(MinValley.Start, RectWidth);
              end;
              BarCase2:
              begin
                PrevBar.Next   := MinValley;
                MinValley.Prev := PrevBar;
                Inc(MinValley.Width, RectWidth);
                Dec(MinValley.Start, RectWidth);
                Inc(NumFreeBarEntries);
                FreeBarEntry[NumFreeBarEntries] := NewEntry;
              end;
              BarCase3:
              begin
                Dec(PrevBar.Width, RectWidth + NextBarWidth);
                NewEntry := FreeBarEntry[NumFreeBarEntries];
                //PNewEntry:= @Bar[NewEntry];
                Dec(NumFreeBarEntries);
                NewEntry2 := FreeBarEntry[NumFreeBarEntries];
                //PNewEntry2:= @Bar[NewEntry2];
                Dec(NumFreeBarEntries);
                NewEntry.Height  := PrevBar.Height - RectWidth;
                NewEntry.Width   := RectWidth;
                NewEntry.Start   := PrevBar.Start + PrevBar.Width;
                NewEntry.Prev    := PrevBar;
                NewEntry.Next    := NewEntry2;
                NewEntry2.Height := PrevBar.Height;
                NewEntry2.Width  := NextBarWidth;
                NewEntry2.Start  := NewEntry.Start + RectWidth;
                NewEntry2.Prev   := NewEntry;
                NewEntry2.Next   := PrevBar.Next;
                PrevBar.Next.Prev := NewEntry2;
                PrevBar.Next     := NewEntry;
              end;
              BarCase4:
              begin
                Dec(PrevBar.Width, RectWidth);
                NewEntry := FreeBarEntry[NumFreeBarEntries];
                //PNewEntry:= @Bar[NewEntry];
                Dec(NumFreeBarEntries);
                NewEntry.Height := PrevBar.Height - RectWidth;
                NewEntry.Width  := RectWidth;
                NewEntry.Start  := PrevBar.Start + PrevBar.Width;
                NewEntry.Prev   := PrevBar;
                NewEntry.Next   := NextBar;
                PrevBar.Next    := NewEntry;
                NextBar.Prev    := NewEntry;
              end;
              BarCase5:
              begin
                Dec(NextBar.Width, RectWidth);
                Inc(NextBar.Start, RectWidth);
                NewEntry := FreeBarEntry[NumFreeBarEntries];
                //PNewEntry:= @Bar[NewEntry];
                Dec(NumFreeBarEntries);
                NewEntry.Height := NextBar.Height - RectWidth;
                NewEntry.Width  := RectWidth;
                NewEntry.Start  := NextBar.Start - RectWidth;
                NewEntry.Prev   := PrevBar;
                NewEntry.Next   := NextBar;
                PrevBar.Next    := NewEntry;
                NextBar.Prev    := NewEntry;
              end;
              BarCase6:
              begin
                Dec(MinValley.Height, RectWidth);
              end;
            end;

            FirstUnusedRectangle := OldFirstUnusedRectangle;
            PreviousRectangle.NextUnusedRectangle := OldPrevNextRectangle;
          end;

          PreviousRectangle := CurrentRectangle;
          //PPreviousRectangle:= PCurrentRectangle;
          CurrentRectangle  := CurrentRectangle.NextUnusedRectangle;
          //PCurrentRectangle:= @UnsortedRectangle[CurrentRectangle];
        end;
      end;
    end;

  var
    i: integer;

  begin
    Result := True;

    SetLength(Bar, NumRectangles + 3);
    Bar[1].Height := BoundingBox.Width + 1;
    Bar[1].Width  := 1;
    Bar[1].Next   := @Bar[2];
    Bar[1].Prev   := @Bar[0];
    Bar[1].Start  := -1;

    Bar[2].Height := 0;
    Bar[2].Width  := BoundingBox.Width;
    Bar[2].Next   := @Bar[3];
    Bar[2].Prev   := @Bar[1];
    Bar[2].Start  := 0;

    Bar[3].Height := Bar[1].Height;
    Bar[3].Width  := 1;
    Bar[3].Next   := @Bar[0];
    Bar[3].Prev   := @Bar[2];
    Bar[3].Start  := Bar[2].Width;

    SetLength(FreeBarEntry, NumRectangles + 1);
    NumFreeBarEntries := 0;
    for i := NumRectangles + 2 downto 4 do
    begin
      Inc(NumFreeBarEntries);
      FreeBarEntry[NumFreeBarEntries] := @Bar[i];
    end;

    for i := 0 to NumRectangles - 1 do
      UnsortedRectangle[i].NextUnusedRectangle := @UnsortedRectangle[i + 1];
    UnsortedRectangle[NumRectangles].NextUnusedRectangle :=
@UnsortedRectangle[0];

    NumIterations := 0;
    SolutionFound := False;
    ExtendPartialSolution(@UnsortedRectangle[0], @UnsortedRectangle[1]);
    Result := SolutionFound;
    writeln('# iterations: ', NumIterations: 12);
  end;



var
  i: integer;
  StartTime: TDateTime;

begin
  StartTime     := now;
  BoundingBox.Width := 70;
  BoundingBox.Area := int64(BoundingBox.Width) * int64(BoundingBox.Width);
  NumRectangles := 24;
  SetLength(UnsortedRectangle, NumRectangles + 5);
  for i := 1 to NumRectangles do
  begin
    with UnsortedRectangle[i] do begin
      Width := i;
      Area  := int64(i) * int64(i);
    end;
  end;

  if GapAlgorithm then
    writeln('solution found')
  else
    writeln('no solution found');
  writeln('runtime: ', (Now - StartTime) * 3600 * 24: 8: 2, 's');
  readln;
end.



_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: code optimization

Jonas Maebe-2

On 24 Sep 2010, at 11:48, Adrian Veith wrote:

> Changing to pointers reduces the amount of multiplications for  
> accessing
> the nth element in an array - if you compare the delphi code to th fpc
> code on assembler base, this is the main difference in both generated
> codes.

Did you actually try replacing only the multiplications with lea's in  
the assembler code generated by FPC (one lea to multiply by 5 and then  
the times 4 during the load/store)? I did before posting my initial  
reply because it also seemed to be the most logical explanation to me.  
It turned out to be a red herring:

With imull $20:
# iterations:     26662054
no solution found
runtime:    10.75s

With "lea (%reg,%reg,4),%reg" followed by "movl (%xxx,%reg,4),
%yyy" (not just for mov, but for every single memory expression that  
depends on an "imull $20"):
# iterations:     26662054
no solution found
runtime:    10.06s

Kylix 3 (~ Delphi 6.5):
# iterations:     26662054
no solution found
runtime:     6.65s

> Register allocation is on a comparable level for both versions.

Delphi keeps the "Bar" pointer in a register, while FPC spills it to  
the stack. Because Bar is used in most of the most-executed  
statements, this has a huge impact.


Jonas
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: code optimization

stefan077
In reply to this post by Adrian Veith
Hi Adrian,

"Adrian Veith" <[hidden email]> wrote:

[...]
>
>we optimized the code further and eliminated the all Next, Prev: Integer
>etc to and changed them to pointers again. Here are the results:
[...]
>first optimization - saving redundant array access to pointers:
[...]
>next optimization - changed code to use linked lists inside the arrays:
>
>fpc 7s = -41 % from opt1 = - 58% from orig
>delphi 6s = - 25% from opt1 = -33% from orig
>

I get the following values with the code you sent:
fpc: 4.51s  vs  Delphi: 3.05s on Intel Xeon and
fpc: 7.33s  vs  Delphi: 5.17s on Intel Core 2 mobile
so even for your optimized code delphi is more than 40% faster.
Probably we use different delphi compilers. I got these results using Delphi 7.

[...]
>last optimized version of the code attached:
[...]

thanks a lot for the code, I used these ideas for the original code and it gave me a speedup of 20%.
___________________________________________________________
WEB.DE DSL SOMMER-SPECIAL: Surf & Phone Flat 16.000 für
nur 19,99 &euro;/mtl.!* http://produkte.web.de/go/DSL_Doppel_Flatrate/2
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: code optimization

Jonas Maebe-2
In reply to this post by Jonas Maebe-2

On 24 Sep 2010, at 14:35, Jonas Maebe wrote:

> On 24 Sep 2010, at 11:48, Adrian Veith wrote:
>
>> Register allocation is on a comparable level for both versions.
>
> Delphi keeps the "Bar" pointer in a register, while FPC spills it to  
> the stack. Because Bar is used in most of the most-executed  
> statements, this has a huge impact.

Correction: Delphi keeps the hidden parent frame pointer parameter in  
a register (which is required every time Bar is accessed), while FPC  
puts it on the stack. The end result is the same though: 1 extra  
memory access every time Bar or any other variable from the parent  
procedure is accessed.


Jonas
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: code optimization

Adrian Veith
In reply to this post by Jonas Maebe-2


On 24.09.2010 14:35, Jonas Maebe wrote:

>
> On 24 Sep 2010, at 11:48, Adrian Veith wrote:
>
>> Changing to pointers reduces the amount of multiplications for accessing
>> the nth element in an array - if you compare the delphi code to th fpc
>> code on assembler base, this is the main difference in both generated
>> codes.
>
> Did you actually try replacing only the multiplications with lea's in
> the assembler code generated by FPC (one lea to multiply by 5 and then
> the times 4 during the load/store)? I did before posting my initial
> reply because it also seemed to be the most logical explanation to me.
> It turned out to be a red herring:
>
> With imull $20:
> # iterations:     26662054
> no solution found
> runtime:    10.75s
>
> With "lea (%reg,%reg,4),%reg" followed by "movl (%xxx,%reg,4),%yyy"
> (not just for mov, but for every single memory expression that depends
> on an "imull $20"):
> # iterations:     26662054
> no solution found
> runtime:    10.06s
>
> Kylix 3 (~ Delphi 6.5):
> # iterations:     26662054
> no solution found
> runtime:     6.65s

I must confess - I did not - but I will because that's interesting.
Maybe this will behave different on different CPUs, because the picture
is very different on my I7 compared to the older AMD.

fpc:

on i7:

orig: 9s
opt1: 7s = -22%
opt2: 6s = -14% total -33%

on AMD:

orig: 17s
opt1: 12s = -30%
opt2: 7s = -41% total -58%

so the older AMD suffers much more from the not optimized code than the
newer i7. At the end they are almost on the same speed.

same code with delphi (5):

on i7:

orig:  6s
opt1: 5.5s = -8%
opt2: 4.5s = -18% total  -25%

on AMD:

orig:  9s
opt1: 8s = -11%
opt2: 6s = -25% total -33%

Strange picture (I did all tests 3 times and took the medium).

>
>> Register allocation is on a comparable level for both versions.
>
> Delphi keeps the "Bar" pointer in a register, while FPC spills it to
> the stack. Because Bar is used in most of the most-executed
> statements, this has a huge impact.
>

you got me ;-)
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: code optimization

Jürgen Hestermann
In reply to this post by stefan077
[hidden email] schrieb:
> My experience is that linked lists with pointers are much slower than linked lists realized by arrays.

That's my experience too. I converted a few programs from linked lists to array of pointers and the speed increase was always dramatically.
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: code optimization

Florian Klämpfl
In reply to this post by Jonas Maebe-2
Am 24.09.2010 15:36, schrieb Jonas Maebe:

>
> On 24 Sep 2010, at 14:35, Jonas Maebe wrote:
>
>> On 24 Sep 2010, at 11:48, Adrian Veith wrote:
>>
>>> Register allocation is on a comparable level for both versions.
>>
>> Delphi keeps the "Bar" pointer in a register, while FPC spills it to
>> the stack. Because Bar is used in most of the most-executed
>> statements, this has a huge impact.
>
> Correction: Delphi keeps the hidden parent frame pointer parameter in a
> register (which is required every time Bar is accessed), while FPC puts
> it on the stack. The end result is the same though: 1 extra memory
> access every time Bar or any other variable from the parent procedure is
> accessed.

I've almost a patch ready which should solve this. Furthermore, I
improved cse a little bit. Nevertheless, this safes only 5% on my
machine. Unfortunatly, I found no proper way to enable cse on dyn. array
expressions, this should help for another 5%.

Did anybody (Jonas?) profile which the real hot spot lines are?
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
12