fast integer multiplication

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

fast integer multiplication

Vincent Snijders
Hi,

Suppose I have the following code:

var
   a,b: dword;
   c: qword;

begin
   a := 10000000;
   b := 20000000;
   c := a * b;
   writeln(c);
end.

Now, although c is large enough to contain the result only the lower
dword is filled. I can force correct results by using c := qword(a) * b,
but the slow fpc_mul_qword is used.

The code generated for the above sample is:
# [16] c:=a*b;
        movl U_P$PROJECT1_A,%edx
        movl U_P$PROJECT1_B,%eax
        mull %edx
        movl $0,%edx
        movl %eax,U_P$PROJECT1_C
        movl %edx,U_P$PROJECT1_C+4

What I want is the above code, but without the "movl $0,%edx"
instruction. Is there a way to do this (wihtout using fpc_mul_qword).

Regards,
Vincent.

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: fast integer multiplication

Florian Klaempfl-2
Vincent Snijders wrote:

> Hi,
>
> Suppose I have the following code:
>
> var
>   a,b: dword;
>   c: qword;
>
> begin
>   a := 10000000;
>   b := 20000000;
>   c := a * b;
>   writeln(c);
> end.
>
> Now, although c is large enough to contain the result only the lower
> dword is filled. I can force correct results by using c := qword(a) * b,
> but the slow fpc_mul_qword is used.
>
> The code generated for the above sample is:
> # [16] c:=a*b;
>     movl    U_P$PROJECT1_A,%edx
>     movl    U_P$PROJECT1_B,%eax
>     mull    %edx
>     movl    $0,%edx
>     movl    %eax,U_P$PROJECT1_C
>     movl    %edx,U_P$PROJECT1_C+4
>
> What I want is the above code, but without the "movl $0,%edx"
> instruction. Is there a way to do this (wihtout using fpc_mul_qword).

Only assembler for now. Any suggestions how the compiler could be told
to generate such code?

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: fast integer multiplication

Marcel Martin
Florian Klaempfl a écrit :

> Vincent Snijders wrote:
>
>
>>Hi,
>>
>>Suppose I have the following code:
>>
>>var
>>  a,b: dword;
>>  c: qword;
>>
>>begin
>>  a := 10000000;
>>  b := 20000000;
>>  c := a * b;
>>  writeln(c);
>>end.
>>
>>Now, although c is large enough to contain the result only the lower
>>dword is filled. I can force correct results by using c := qword(a) * b,
>>but the slow fpc_mul_qword is used.
>>
>>The code generated for the above sample is:
>># [16] c:=a*b;
>>    movl    U_P$PROJECT1_A,%edx
>>    movl    U_P$PROJECT1_B,%eax
>>    mull    %edx
>>    movl    $0,%edx
>>    movl    %eax,U_P$PROJECT1_C
>>    movl    %edx,U_P$PROJECT1_C+4
>>
>>What I want is the above code, but without the "movl $0,%edx"
>>instruction. Is there a way to do this (wihtout using fpc_mul_qword).
>
>
> Only assembler for now. Any suggestions how the compiler could be told
> to generate such code?

A small function (like the one I asked for a few weeks ago :-)
that would be inlined by the compiler?

Currently I use this one

function UI32x32To64(A,B: Longword): QWord;
assembler; register; nostackframe;
asm
     mull %edx
end;

It is fast but certainly much less than if it were inlined.

MM

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: fast integer multiplication

Marcel Martin
In reply to this post by Florian Klaempfl-2
Florian Klaempfl a écrit :

> Vincent Snijders wrote:
>
>
>>Hi,
>>
>>Suppose I have the following code:
>>
>>var
>>  a,b: dword;
>>  c: qword;
>>
>>begin
>>  a := 10000000;
>>  b := 20000000;
>>  c := a * b;
>>  writeln(c);
>>end.
>>
>>Now, although c is large enough to contain the result only the lower
>>dword is filled. I can force correct results by using c := qword(a) * b,
>>but the slow fpc_mul_qword is used.
>>
>>The code generated for the above sample is:
>># [16] c:=a*b;
>>    movl    U_P$PROJECT1_A,%edx
>>    movl    U_P$PROJECT1_B,%eax
>>    mull    %edx
>>    movl    $0,%edx
>>    movl    %eax,U_P$PROJECT1_C
>>    movl    %edx,U_P$PROJECT1_C+4
>>
>>What I want is the above code, but without the "movl $0,%edx"
>>instruction. Is there a way to do this (wihtout using fpc_mul_qword).
>
>
> Only assembler for now. Any suggestions how the compiler could be told
> to generate such code?

Since you ask the question, I suppose you cannot simply
suppress the "movl $0,%edx" line generation [*]. What about
using a compiler directive, something like {$EXTENDEDMUL ON/OFF}
for instance?

[*] Note that because of this line, currently FPC has not a
"coherent" behaviour. If you write x := y*z with x: Word and
y,z: Byte or with x: Longword and y,z: Word, the resulting value
x is the exact multiplication result and not the result truncated
to the size of the multiplicand and multiplicator type (like it
is with x: QWord and y,z: Longword).

MM


_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: fast integer multiplication

Nikolay Nikolov
In reply to this post by Florian Klaempfl-2
Florian Klaempfl wrote:

>Vincent Snijders wrote:
>
>  
>
>>Hi,
>>
>>Suppose I have the following code:
>>
>>var
>>  a,b: dword;
>>  c: qword;
>>
>>begin
>>  a := 10000000;
>>  b := 20000000;
>>  c := a * b;
>>  writeln(c);
>>end.
>>
>>Now, although c is large enough to contain the result only the lower
>>dword is filled. I can force correct results by using c := qword(a) * b,
>>but the slow fpc_mul_qword is used.
>>
>>The code generated for the above sample is:
>># [16] c:=a*b;
>>    movl    U_P$PROJECT1_A,%edx
>>    movl    U_P$PROJECT1_B,%eax
>>    mull    %edx
>>    movl    $0,%edx
>>    movl    %eax,U_P$PROJECT1_C
>>    movl    %edx,U_P$PROJECT1_C+4
>>
>>What I want is the above code, but without the "movl $0,%edx"
>>instruction. Is there a way to do this (wihtout using fpc_mul_qword).
>>    
>>
>
>Only assembler for now. Any suggestions how the compiler could be told
>to generate such code?
>  
>
IMHO it should automatically optimize expressions like:
  qword(a) * b
where a and b are dwords


_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: fast integer multiplication

Peter Vreman
In reply to this post by Marcel Martin
> Florian Klaempfl a écrit :
>> Vincent Snijders wrote:
>>
>>
>>>Hi,
>>>
>>>Suppose I have the following code:
>>>
>>>var
>>>  a,b: dword;
>>>  c: qword;
>>>
>>>begin
>>>  a := 10000000;
>>>  b := 20000000;
>>>  c := a * b;
>>>  writeln(c);
>>>end.
>>>
>>>Now, although c is large enough to contain the result only the lower
>>>dword is filled. I can force correct results by using c := qword(a) * b,
>>>but the slow fpc_mul_qword is used.
>>>
>>>The code generated for the above sample is:
>>># [16] c:=a*b;
>>>    movl    U_P$PROJECT1_A,%edx
>>>    movl    U_P$PROJECT1_B,%eax
>>>    mull    %edx
>>>    movl    $0,%edx
>>>    movl    %eax,U_P$PROJECT1_C
>>>    movl    %edx,U_P$PROJECT1_C+4
>>>
>>>What I want is the above code, but without the "movl $0,%edx"
>>>instruction. Is there a way to do this (wihtout using fpc_mul_qword).

No. At the time a*b is calculated there is nothing known about the size of
the result.


>> Only assembler for now. Any suggestions how the compiler could be told
>> to generate such code?
>
> Since you ask the question, I suppose you cannot simply
> suppress the "movl $0,%edx" line generation [*]. What about
> using a compiler directive, something like {$EXTENDEDMUL ON/OFF}
> for instance?

I don't like this. This means we have to add directives for all kind of
special cpu features to improve some special corner cases.




_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: fast integer multiplication

Florian Klaempfl-3
Peter Vreman wrote:

>>Florian Klaempfl a ?crit :
>>
>>>Vincent Snijders wrote:
>>>
>>>
>>>
>>>>Hi,
>>>>
>>>>Suppose I have the following code:
>>>>
>>>>var
>>>> a,b: dword;
>>>> c: qword;
>>>>
>>>>begin
>>>> a := 10000000;
>>>> b := 20000000;
>>>> c := a * b;
>>>> writeln(c);
>>>>end.
>>>>
>>>>Now, although c is large enough to contain the result only the lower
>>>>dword is filled. I can force correct results by using c := qword(a) * b,
>>>>but the slow fpc_mul_qword is used.
>>>>
>>>>The code generated for the above sample is:
>>>># [16] c:=a*b;
>>>>   movl    U_P$PROJECT1_A,%edx
>>>>   movl    U_P$PROJECT1_B,%eax
>>>>   mull    %edx
>>>>   movl    $0,%edx
>>>>   movl    %eax,U_P$PROJECT1_C
>>>>   movl    %edx,U_P$PROJECT1_C+4
>>>>
>>>>What I want is the above code, but without the "movl $0,%edx"
>>>>instruction. Is there a way to do this (wihtout using fpc_mul_qword).
>
>
> No. At the time a*b is calculated there is nothing known about the size of
> the result.

a*b knows that it should do a qword multiplication and it can check if both
operands are dword. It could remove the type casts then and do a mulll.

>
>
>
>>>Only assembler for now. Any suggestions how the compiler could be told
>>>to generate such code?
>>
>>Since you ask the question, I suppose you cannot simply
>>suppress the "movl $0,%edx" line generation [*]. What about
>>using a compiler directive, something like {$EXTENDEDMUL ON/OFF}
>>for instance?
>
>
> I don't like this. This means we have to add directives for all kind of
> special cpu features to improve some special corner cases.

Me neither.


_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: fast integer multiplication

Marcel Martin
In reply to this post by Peter Vreman
Peter Vreman a écrit :
 >Marcel Martin wrote :
>>Since you ask the question, I suppose you cannot simply
>>suppress the "movl $0,%edx" line generation [*]. What about
>>using a compiler directive, something like {$EXTENDEDMUL ON/OFF}
>>for instance?
>
>
> I don't like this. This means we have to add directives for all kind of
> special cpu features to improve some special corner cases.

But it is a special case because FPC makes it special. As I wrote
yesterday, when you write Word := Byte * Byte, what you get is
the exact result (not the result mod 2^8), when you write
Longword := Word * Word, what you get is the exact result (not the
Result mod 2^16), but when you write QWord := Longword * Longword,
there, what you get is not (always) the exact result but the result
mod 2^32. This is FPC which makes special this case by setting the
product high part to 0 (what it doesn't do with other cases).

Being said that, I agree with you that adding compiler directives
is not a good thing [*]. Here, it would be sufficient to suppress
the generation of the line "movl $0,%edx". Since you do not do so,
I suppose there would be some side effects I don't see at the
moment, but it would be the simplest solution.

[*] If QWord := Longword * Longword gave the exact result, it would
allow me to suppress a lot of compiler directives :-)

MM

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: fast integer multiplication

Jonas Maebe-2

On 29 Jul 2005, at 11:36, Marcel Martin wrote:

> But it is a special case because FPC makes it special.

No, it's simply a consequence of the fact that Pascal performs all  
calculations using the native integer type by default.


Jonas

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: fast integer multiplication

Peter Vreman
In reply to this post by Marcel Martin
>>>Since you ask the question, I suppose you cannot simply
>>>suppress the "movl $0,%edx" line generation [*]. What about
>>>using a compiler directive, something like {$EXTENDEDMUL ON/OFF}
>>>for instance?
>>
>> I don't like this. This means we have to add directives for all kind of
>> special cpu features to improve some special corner cases.
>
> But it is a special case because FPC makes it special. As I wrote
> yesterday, when you write Word := Byte * Byte, what you get is
> the exact result (not the result mod 2^8), when you write
> Longword := Word * Word, what you get is the exact result (not the
> Result mod 2^16), but when you write QWord := Longword * Longword,
> there, what you get is not (always) the exact result but the result
> mod 2^32. This is FPC which makes special this case by setting the
> product high part to 0 (what it doesn't do with other cases).
>
> Being said that, I agree with you that adding compiler directives
> is not a good thing [*]. Here, it would be sufficient to suppress
> the generation of the line "movl $0,%edx". Since you do not do so,
> I suppose there would be some side effects I don't see at the
> moment, but it would be the simplest solution.
>
> [*] If QWord := Longword * Longword gave the exact result, it would
> allow me to suppress a lot of compiler directives :-)

The rules are the same as with int64:=longint*longint. Below is a test.
The result is -2 on both fpc and delphi.

var
  e : int64;
  i,j : longint;

begin
  i:=$7fffffff;
  j:=2;
  e:=i*j;
  writeln(e);
end.




_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: fast integer multiplication

Marcel Martin
Peter Vreman a écrit :

>>>>Since you ask the question, I suppose you cannot simply
>>>>suppress the "movl $0,%edx" line generation [*]. What about
>>>>using a compiler directive, something like {$EXTENDEDMUL ON/OFF}
>>>>for instance?
>>>
>>>I don't like this. This means we have to add directives for all kind of
>>>special cpu features to improve some special corner cases.
>>
>>But it is a special case because FPC makes it special. As I wrote
>>yesterday, when you write Word := Byte * Byte, what you get is
>>the exact result (not the result mod 2^8), when you write
>>Longword := Word * Word, what you get is the exact result (not the
>>Result mod 2^16), but when you write QWord := Longword * Longword,
>>there, what you get is not (always) the exact result but the result
>>mod 2^32. This is FPC which makes special this case by setting the
>>product high part to 0 (what it doesn't do with other cases).
>>
>>Being said that, I agree with you that adding compiler directives
>>is not a good thing [*]. Here, it would be sufficient to suppress
>>the generation of the line "movl $0,%edx". Since you do not do so,
>>I suppose there would be some side effects I don't see at the
>>moment, but it would be the simplest solution.
>>
>>[*] If QWord := Longword * Longword gave the exact result, it would
>>allow me to suppress a lot of compiler directives :-)
>
>
> The rules are the same as with int64:=longint*longint. Below is a test.
> The result is -2 on both fpc and delphi.
>
> var
>   e : int64;
>   i,j : longint;
>
> begin
>   i:=$7fffffff;
>   j:=2;
>   e:=i*j;
>   writeln(e);
> end.

I was talking about the unsigned cases (but, yes, this is the
same with signed ones).

Now, concerning "both fpc and delphi". That Delphi code can be
compiled with FPC is rather good thing but FPC has not to follow
Delphi. The future of Pascal is not Delphi, it is FreePascal. And,
no, for once, I am not joking. :-)

I recently downloaded Lazarus. I wanted to see whether it could
be possible to adapt one of my softwares ("Primo", see at
http://www.ellipsa.net, written with Delphi 5). After having played
a few hours with Lazarus, I estimated that it was feasible and that
it could take me two weeks to do it. I was wrong. It took me less
than three days! Yes, in less than three days, a version of Primo
entirely written with FreePascal and Lazarus was working. Well,
there still are many "cosmetic" problems but it works.
I think that, in at most one year, it will be possible to write
very good programs with FreePascal and Lazarus (I am talking of
programs having clean GUI's).

MM

BTW, since a few days, I have big problems with this list. Most of
my posts come back with messages like:

   Diagnostic-Code: X-Postfix; host lists.freepascal.org[130.161.36.93]
   said: 550-32 x Worm.Lovgate.Z. Last seen 2005-07-29 11:01:07.905214
   (CET) 550 mail from 213.228.0.176 rejected: administrative
   prohibition (host is blacklisted) (in reply to RCPT TO command)

No, there is no worm on my PC ;-)

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: fast integer multiplication

Peter Vreman
> Now, concerning "both fpc and delphi". That Delphi code can be
> compiled with FPC is rather good thing but FPC has not to follow
> Delphi. The future of Pascal is not Delphi, it is FreePascal. And,
> no, for once, I am not joking. :-)

Most users expect the same behaviour in delphi and fpc. We get a lot of
bug reports if we make something incompatible. Even if we document things
properly  the users will start complaining. And projects like CrossFPC are
requesting even more compatibility with Kylix/Delphi.




_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: fast integer multiplication

Gerhard Scholz
In reply to this post by Marcel Martin
...

>>The code generated for the above sample is:
>># [16] c:=a*b;
>>    movl    U_P$PROJECT1_A,%edx
>>    movl    U_P$PROJECT1_B,%eax
>>    mull    %edx
>>    movl    $0,%edx
>>    movl    %eax,U_P$PROJECT1_C
>>    movl    %edx,U_P$PROJECT1_C+4
>>
>>What I want is the above code, but without the "movl $0,%edx"
>>instruction. Is there a way to do this (wihtout using fpc_mul_qword).
>
>
> Only assembler for now. Any suggestions how the compiler could be told
> to generate such code?
...
>function UI32x32To64(A,B: Longword): QWord;
>assembler; register; nostackframe;
>asm
>    mull %edx
>end;
>
>It is fast but certainly much less than if it were inlined.

My suggestion would be:

  FUNCTION lmul ( CONST a,
                        b : LongInt ) : int64 ; inline ;

    BEGIN
{$ifdef cpu86}
      ASM
        movl    a,%eax
        imull   b
        movl    %eax,lmul
        movl    %edx,lmul+4
       END ;
{$else}
{$ifdef cpu68k}
      lmul := int64 ( a ) * b ;
{$else}
{$ifdef cpusparc}
      lmul := int64 ( a ) * b ;
{$else}
{$ifdef cpualpha}
      lmul := int64 ( a ) * b ;
{$else}
{$ifdef cpupowerpc}
      lmul := int64 ( a ) * b ;
{$else}
      lmul := int64 ( a ) * b ;
{$endif}
{$endif}
{$endif}
{$endif}
{$endif}
     END ;

and similar for unsigned mul.

(shortened here; full code in ulmul.pas; liitle test in tmuls.pas, timing
routines in
wtimer/tdrsc1)

Is portable so code doesn't need to be rewritten when compiled for other
processors (but not optimal then)

Tested only on i386.

Seems to be faster than standard multiplication (interesting: significantly
faster for signed mul than for unsigned mul), can be assembly-coded in the
branches for the other cpus
(if there are opcodes for such a multiplication - I don't know), could go to
unit math.pp.

It seems that routines which contain assembler are not inlined; on the day
somebody finds a trick to inline such code they should be really fast.

Gerhard

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal

ULMUL.pas (2K) Download Attachment
tmuls.pas (1K) Download Attachment
WTIMER.pas (1K) Download Attachment
TDRSC1.pas (2K) Download Attachment