CASE

classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

CASE

Paul Davidson
May be having case statement problem.

App has case statement with 146 sequential labels.
They are in order, from a defined type.
The assembler output is scanning each label, where it seems a jump
would be more efficient.

ppc 32, Darwin
/usr/local/bin/ppcppc cape80.pas -Ci -Co -g -gl -O1 -vr -a -al
-FEbuild/cape80.build/cape80.build -ocape80

Also tried -O2 and 3

Result:

# [735] case l3o3.FormType of                                          
  // Select 3o3 processor
        lwz r2,160(r1)
        cmplwi cr0,r2,0
        beq cr0,L1051
        cmplwi cr0,r2,1
        beq cr0,L1052
        cmplwi cr0,r2,2
        beq cr0,L1053
        cmplwi cr0,r2,3
        beq cr0,L1054

etc.

Any hints?


P Davidson

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: CASE

Eduardo Morras
At 04:42 08/11/2005, you wrote:
>May be having case statement problem.
>
>App has case statement with 146 sequential labels.
>They are in order, from a defined type.
>The assembler output is scanning each label, where it seems a jump
>would be more efficient.

I don't understand well what you have:

case L3o3.Formtype of
1,2,3,....,146 : begin <somecode>
                 end;
or

case L3o3.FormType of
1 : begin <somecode1>
      end
2 : begin <somecode2>
      end

and so on.....

For the second case the assemmbler code seems ok to me, also L1051
L1052 L1053 looks like are different destinations. For the first,
well, it can be optimized "a bit" by hand, but note that it's
difficult to do in "automatic mode" by a compiler.

lwz           r2,160(r1)        // Load in r2 the value to compare
subi         r3,r2,145  // Change 145 with the top value. Don't need
to record in cr0, so subi is used instead of subi. (with point)
cmplwi     cr3,r3,0     // Change 0 with the low value. So if 145-r2
is greater or equal 0, it's between them or is one of them
bge          cr3,L1051  // branch to L1051. Use bge+ cr3,L1051 if you
are sure that branch will be taken often

Note that cr3, is a condition register field and can be used by the
compiler for other pourposes. This code only works with integer
values 0 < r2 < 65535, perhaps also works with chars. If you know
that the condition is true (r2 is between 0 and 145 often), you can
use the '+' format of the branch.

The code may not work, i'm writing it by memory.

>ppc 32, Darwin
>/usr/local/bin/ppcppc cape80.pas -Ci -Co -g -gl -O1 -vr -a -al
>-FEbuild/cape80.build/cape80.build -ocape80
>
>Also tried -O2 and 3
>
>Result:
>
># [735] case l3o3.FormType of
>  // Select 3o3 processor
>         lwz     r2,160(r1)
>         cmplwi  cr0,r2,0
>         beq     cr0,L1051
>         cmplwi  cr0,r2,1
>         beq     cr0,L1052
>         cmplwi  cr0,r2,2
>         beq     cr0,L1053
>         cmplwi  cr0,r2,3
>         beq     cr0,L1054
>
>etc.
>
>Any hints?

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: CASE

Paul Davidson

On Nov 8, 2005, at 11:39, Eduardo wrote:

> At 04:42 08/11/2005, you wrote:
>> May be having case statement problem.
>>
>> App has case statement with 146 sequential labels.
>> They are in order, from a defined type.
>> The assembler output is scanning each label, where it seems a jump
>> would be more efficient.
>
> I don't understand well what you have:
>
Not this

> case L3o3.Formtype of
> 1,2,3,....,146 : begin <somecode>
>                 end;
> or
>

This :)

> case L3o3.FormType of
> 1 : begin <somecode1>
>      end
> 2 : begin <somecode2>
>      end
>
> and so on.....
>
> For the second case the assemmbler code seems ok to me, also L1051
> L1052 L1053 looks like are different destinations. For the first,
> well, it can be optimized "a bit" by hand, but note that it's
> difficult to do in "automatic mode" by a compiler.
>
> lwz           r2,160(r1)        // Load in r2 the value to compare
> subi         r3,r2,145  // Change 145 with the top value. Don't need
> to record in cr0, so subi is used instead of subi. (with point)
> cmplwi     cr3,r3,0     // Change 0 with the low value. So if 145-r2
> is greater or equal 0, it's between them or is one of them
> bge          cr3,L1051  // branch to L1051. Use bge+ cr3,L1051 if you
> are sure that branch will be taken often
>
> Note that cr3, is a condition register field and can be used by the
> compiler for other pourposes. This code only works with integer values
> 0 < r2 < 65535, perhaps also works with chars. If you know that the
> condition is true (r2 is between 0 and 145 often), you can use the '+'
> format of the branch.
>
> The code may not work, i'm writing it by memory.
>
> _______________________________________________
> fpc-pascal maillist  -  [hidden email]
> http://lists.freepascal.org/mailman/listinfo/fpc-pascal
>
>

Was under the impression that doing a jump, offset by label number,
would be faster.  Specially for such long case statements.

P Davidson
http://CoraxNetworks.com

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: CASE

Eduardo Morras

>>>May be having case statement problem.
>>>
>>>App has case statement with 146 sequential labels.
>>>They are in order, from a defined type.
>>>The assembler output is scanning each label, where it seems a jump
>>>would be more efficient.
>
>This :)
>
>>case L3o3.FormType of
>>1 : begin <somecode1>
>>      end
>>2 : begin <somecode2>
>>      end
>>
>>and so on.....
>>
>>For the second case the assemmbler code seems ok to me, also L1051
>>L1052 L1053 looks like are different destinations. For the first,
>>well, it can be optimized "a bit" by hand, but note that it's
>>difficult to do in "automatic mode" by a compiler.
>
>Was under the impression that doing a jump, offset by label number,
>would be faster.  Specially for such long case statements.

Jump by offset label number? I don't understand it.

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: CASE

Paul Davidson
>
> Jump by offset label number? I don't understand it.
> _______________________________________________
> fpc-pascal maillist  -  [hidden email]
> http://lists.freepascal.org/mailman/listinfo/fpc-pascal
>
>

Sorry Eduardo, am not knowledgeable of compiler internals.
There is (?) an optimization in FPC that uses a jump table(?) instead
of scanning each label value of a case statement.  Hoping that this is
the case, as it seems to be with i386.  This would mean faster
execution of the case.  Yes?

P Davidson
http://CoraxNetworks.com

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: CASE

Peter Vreman
In reply to this post by Paul Davidson
At 04:42 8-11-2005, you wrote:

>May be having case statement problem.
>
>App has case statement with 146 sequential labels.
>They are in order, from a defined type.
>The assembler output is scanning each label, where it seems a jump would
>be more efficient.
>
>ppc 32, Darwin
>/usr/local/bin/ppcppc cape80.pas -Ci -Co -g -gl -O1 -vr -a -al
>-FEbuild/cape80.build/cape80.build -ocape80
>
>Also tried -O2 and 3
>
>Result:
>
># [735] case l3o3.FormType of
>  // Select 3o3 processor
>         lwz     r2,160(r1)
>         cmplwi  cr0,r2,0
>         beq     cr0,L1051
>         cmplwi  cr0,r2,1
>         beq     cr0,L1052
>         cmplwi  cr0,r2,2
>         beq     cr0,L1053
>         cmplwi  cr0,r2,3
>         beq     cr0,L1054
>
>etc.
>
>Any hints?

Use 2.1.1


Peter

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: CASE

Paul Davidson

On Nov 8, 2005, at 12:11, Peter Vreman wrote:

>>
>> # [735] case l3o3.FormType of
>>  // Select 3o3 processor
>>         lwz     r2,160(r1)
>>         cmplwi  cr0,r2,0
>>         beq     cr0,L1051
>>         cmplwi  cr0,r2,1
>>         beq     cr0,L1052
>>         cmplwi  cr0,r2,2
>>         beq     cr0,L1053
>>         cmplwi  cr0,r2,3
>>         beq     cr0,L1054
>>
>> etc.
>>
>> Any hints?
>
> Use 2.1.1
>
>
> Peter
>
> _______________________________________________
> fpc-pascal maillist  -  [hidden email]
> http://lists.freepascal.org/mailman/listinfo/fpc-pascal
>
>

version 2.1.1 [2005/11/08] for powerpc
Same result, Peter :|

P Davidson
http://CoraxNetworks.com

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: CASE

Eduardo Morras
In reply to this post by Paul Davidson
At 18:07 08/11/2005, you wrote:

>>Jump by offset label number? I don't understand it.
>>_______________________________________________
>>fpc-pascal maillist  -  [hidden email]
>>http://lists.freepascal.org/mailman/listinfo/fpc-pascal
>>
>
>Sorry Eduardo, am not knowledgeable of compiler internals.
>There is (?) an optimization in FPC that uses a jump table(?)
>instead of scanning each label value of a case statement.  Hoping
>that this is the case, as it seems to be with i386.  This would mean
>faster execution of the case.  Yes?

Don't know how fpc ppc works internally. Surely this optimization is
not done, it's uncommon, i think.

For now, the only optimization to the compiler output maybe use other
crs for comparations and separate the branch from the comparation to
prevent stalls. Here an example.

Original code

# [735] case l3o3.FormType of
  // Select 3o3 processor
         lwz     r2,160(r1)
         cmplwi  cr0,r2,0
         beq     cr0,L1051
         cmplwi  cr0,r2,1
         beq     cr0,L1052
         cmplwi  cr0,r2,2
         beq     cr0,L1053
         cmplwi  cr0,r2,3
         beq     cr0,L1054

Modified code

# [735] case l3o3.FormType of
  // Select 3o3 processor
         lwz     r3,160(r1)     // r3, has the value to check
         cmplwi  cr0,r3,0
         cmplwi  cr2,r3,1        // cr1 is used for fpu condition and
cannot be used for integer
         cmplwi  cr3,r3,2
         cmplwi  cr4,r3,3
         beq     cr0,L1051    // separate branch from comp (4 lines
up) can make the branch be pre-calculated and executed in zero cycles
         cmplwi  cr5,r3,4
         beq     cr2,L1052
         cmplwi  cr0,r3,5
         beq     cr3,L1053
         cmplwi  cr2,r3,6
         beq     cr4,L1054

and so on...

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: CASE

Eduardo Morras
In reply to this post by Paul Davidson
At 18:22 08/11/2005, you wrote:

>>>Any hints?
>>
>>Use 2.1.1
>>
>>
>>Peter
>
>version 2.1.1 [2005/11/08] for powerpc
>Same result, Peter :|
>
>P Davidson

Perhaps this code can do the trick, but don't know if is easily
integrable in fpc ppc codegenerator, you can adapt it a bit if you want

** IN r3 Value to check r29 branch lookup table r5 top/end r6
low/begin r4 where to branch

SecuentialCase: lwz        r3,VALUE         // VALUE from TOC or BSS or other.
                          subi       r29,r29,4          // Adjust
value for access
           caseloop: lwzu      r4,4(r29)          // in address
pointed by r29 begins the jump look-up table (L1051,L1052)
                          mtctr     r4                    // Move
address in r4 to CTR special register. CTR has now the destiny address
                          lwz        r5,TOP             // last value
                          lwz        r6,LOW            // low value
                          cmplwi   cr2,r3,r6           // compare if
value with r6
                          bcctr     12,10                // BO = 12
Branch if condition True. BI = 10 Condition flag Equal of cr2.
                          addi       r6,r6,1              // r6 := r6 +1
                          cmplwi   cr3,r5,r6           // is r6 > r5 ?
                          ble        cr3,caseloop     // Branch to
caseloop if FALSE
                          otherwise statement or branch to it

** OUT r29, r6 originals values are destroyed.  

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: CASE

Peter Vreman
In reply to this post by Paul Davidson

>>>
>>># [735] case l3o3.FormType of
>>>  // Select 3o3 processor
>>>         lwz     r2,160(r1)
>>>         cmplwi  cr0,r2,0
>>>         beq     cr0,L1051
>>>         cmplwi  cr0,r2,1
>>>         beq     cr0,L1052
>>>         cmplwi  cr0,r2,2
>>>         beq     cr0,L1053
>>>         cmplwi  cr0,r2,3
>>>         beq     cr0,L1054
>>>
>>>etc.
>>>
>>>Any hints?
>>
>>Use 2.1.1
>
>version 2.1.1 [2005/11/08] for powerpc
>Same result, Peter :|

The 2.1.1 compiler contains code to generate jumptables for ppc, see
powerpc/nppcset.pas. You'll have to debug the compiler to see why it
doesn't use them.


Peter

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: CASE

Anton Tichawa
In reply to this post by Paul Davidson
Paul Davidson wrote:

>
> On Nov 8, 2005, at 12:11, Peter Vreman wrote:
>
>>>
>>> # [735] case l3o3.FormType of
>>>  // Select 3o3 processor
>>>         lwz     r2,160(r1)
>>>         cmplwi  cr0,r2,0
>>>         beq     cr0,L1051
>>>         cmplwi  cr0,r2,1
>>>         beq     cr0,L1052
>>>         cmplwi  cr0,r2,2
>>>         beq     cr0,L1053
>>>         cmplwi  cr0,r2,3
>>>         beq     cr0,L1054
>>>
>>> etc.
>>>
>>> Any hints?
>>
>>
>> Use 2.1.1
>>
>>
>> Peter
>>
>> _______________________________________________
>> fpc-pascal maillist  -  [hidden email]
>> http://lists.freepascal.org/mailman/listinfo/fpc-pascal
>>
>>
>
> version 2.1.1 [2005/11/08] for powerpc
> Same result, Peter :|
>
> P Davidson
> http://CoraxNetworks.com
>
> _______________________________________________
> fpc-pascal maillist  -  [hidden email]
> http://lists.freepascal.org/mailman/listinfo/fpc-pascal
>
You could use an array of procedures. It's members are initialized once,

var my_table: array [0 .. 146] of procedure;

my_table[0] := @my_proc_0;
my_tyble[1] := @my_proc_1;
...

and the 'case' statement is simply replaced with

begin
  ..
  my_table[x];
  ..
end;

which is fast, because it performs no compare, but loads the procedure
address from the array, indexed with x.

Anton.

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: CASE

Marco van de Voort
In reply to this post by Peter Vreman
> >>># [735] case l3o3.FormType of
> >>>  // Select 3o3 processor
> >>>         lwz     r2,160(r1)
> >>>         cmplwi  cr0,r2,0
> >>>         beq     cr0,L1051
> >>>         cmplwi  cr0,r2,1
> >>>         beq     cr0,L1052
> >>>         cmplwi  cr0,r2,2
> >>>         beq     cr0,L1053
> >>>         cmplwi  cr0,r2,3
> >>>         beq     cr0,L1054
> >>>
> >>>etc.
> >>>
> >>>Any hints?
> >>
> >>Use 2.1.1
> >
> >version 2.1.1 [2005/11/08] for powerpc
> >Same result, Peter :|
>
> The 2.1.1 compiler contains code to generate jumptables for ppc, see
> powerpc/nppcset.pas. You'll have to debug the compiler to see why it
> doesn't use them.

Haven't really debugged, but there is an cs_optimize in aktmodeswitches
somewhere near where this code is called.

Paul: do you enable some -O parameter? Play with it etc.
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: CASE

Adriaan van Os-2
Marco van de Voort wrote:

>>>>> # [735] case l3o3.FormType of
>>>>>  // Select 3o3 processor
>>>>>         lwz     r2,160(r1)
>>>>>         cmplwi  cr0,r2,0
>>>>>         beq     cr0,L1051
>>>>>         cmplwi  cr0,r2,1
>>>>>         beq     cr0,L1052
>>>>>         cmplwi  cr0,r2,2
>>>>>         beq     cr0,L1053
>>>>>         cmplwi  cr0,r2,3
>>>>>         beq     cr0,L1054
>>>>>
>>>>> etc.
>>>>>
>>>>> Any hints?
>>>>
>>>> Use 2.1.1
>>>
>>> version 2.1.1 [2005/11/08] for powerpc
>>> Same result, Peter :|
>>
>> The 2.1.1 compiler contains code to generate jumptables for ppc, see
>> powerpc/nppcset.pas. You'll have to debug the compiler to see why it
>> doesn't use them.
>
> Haven't really debugged, but there is an cs_optimize in aktmodeswitches
> somewhere near where this code is called.
>
> Paul: do you enable some -O parameter? Play with it etc.

I think -O1r is the best optimization available for ppc-darwin, isn't
it ?

Regards,

Adriaan van Os

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: CASE

Marco van de Voort
> Marco van de Voort wrote:

> > somewhere near where this code is called.
> >
> > Paul: do you enable some -O parameter? Play with it etc.
>
> I think -O1r is the best optimization available for ppc-darwin, isn't
> it ?

As far as I can see from the source, try -O1gr

both optimization and "for size" seem to be needed.
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: CASE

Paul Davidson
Thanks Adriaan, Marco.
All is well :)

On Nov 9, 2005, at 4:38, Marco van de Voort wrote:

> -O1gr
P Davidson
Corax Networks Inc.
http://CoraxNetworks.com

IMPORTANT NOTICE:  This message is intended only for the use of the
individual or entity to which it is addressed. The message may contain
information that is privileged, confidential and exempt from disclosure
under applicable law.  If the reader of this message is not the
intended recipient, or the employee or agent responsible for delivering
the message to the intended recipient, you are notified that any
dissemination, distribution or copying of this communication is
strictly prohibited.  If you have received this communication in error,
please notify Corax Networks immediately by email at
[hidden email].  Thank you.

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal