Branch table

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

Branch table

Free Pascal - General mailing list
Why the code below does exit gracefully without prints anything?
Sure, it is for my poor knowledge of the assembler, but in some details,
please...

program branch;
{$ASMMODE intel}
label next,stop,a,b,c;
var idx:byte;
begin
write('Index? ');
readln(idx);
asm
mov ax,idx;
shl ax,2;
mov bx,next;
add bx,ax;
jmp (*short*) [ebx+4];
next:
jmp a;
jmp b;
jmp c;
end['EAX','EBX'];
a:
writeln('0');
goto stop;
b:
writeln('1');
goto stop;
c:
writeln('2');
stop:
writeln('stop');
end.

Just another question: why the short modifier is unrecognized?
Thanks for any help in this holydays time,
Marco
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: Branch table

Giuliano Colla
Il 14/08/2018 15:21, Marco Borsari via fpc-pascal ha scritto:

> Why the code below does exit gracefully without prints anything?
> Sure, it is for my poor knowledge of the assembler, but in some
> details, please...
>
> program branch;
> {$ASMMODE intel}
> label next,stop,a,b,c;
> var idx:byte;
> begin
> write('Index? ');
> readln(idx);
> asm
> mov ax,idx;
> shl ax,2;
> mov bx,next;
> add bx,ax;
> jmp (*short*) [ebx+4];
> next:
> jmp a;
> jmp b;
> jmp c;
> end['EAX','EBX'];
> a:
> writeln('0');
> goto stop;
> b:
> writeln('1');
> goto stop;
> c:
> writeln('2');
> stop:
> writeln('stop');
> end.
>
> Just another question: why the short modifier is unrecognized?
> Thanks for any help in this holydays time,
> Marco

I tested your code in my environment (Linux - x86_64), compiled with fpc
-g -a branch.pas. (-g is to add debug information, -a to get the
assembler listing).
Then launched it with gdb debugger:
 >gdb branch
(gdb)run

The output is the following:

(gdb) run
Starting program: /home/colla/Applicazioni/Lazarus/Branch/branch
Index? 1

Program received signal SIGSEGV, Segmentation fault.
main () at branch.pas:13
13      jmp (*short*) [ebx+4];

The first thing I notice is that you load BX, which is a 16 bit register
(the lower 16 bits of EBX), with the value of "next", and then Jump to
the content of EBX which is a 32 bit register, whose lower 16 bits have
been loaded but whose upper 16 bits are undefined. A SIGSEV is therefore
to be expected. If you are in a 32 Bit environment you should use EBX,
if you're in a 64 Bit environment you should use RBX.

What I would do, if I were in your place, would be to rewrite your
program in pure Pascal, compile it with the -a switch to get the
assembler listing, and use the compiler generated assembler code as a
guideline to your optimized assembler.

Giuliano

--
Do not do to others as you would have them do to you.They might have different tastes.

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: Branch table

Free Pascal - General mailing list
Il 16/08/2018 17:40, Giuliano Colla ha scritto:

> The first thing I notice is that you load BX, which is a 16 bit register
> (the lower 16 bits of EBX), with the value of "next", and then Jump to
> the content of EBX which is a 32 bit register, whose lower 16 bits have
> been loaded but whose upper 16 bits are undefined. A SIGSEV is therefore
> to be expected. If you are in a 32 Bit environment you should use EBX,
> if you're in a 64 Bit environment you should use RBX.

Thank you, I have made the correction you suggested, but I obtain a
segmentation fault again. Indeed new code is

program branch;
{$ASMMODE intel}
label next,stop,a,b,c;
var idx:byte;
begin
write('Index? ');
readln(idx);
asm
mov al,idx;
shl ax,2;
mov ebx,next;
add ebx,eax;
jmp ebx;
next:
jmp a;
jmp b;
jmp c;
end['EAX','EBX'];
a:
writeln('0');
goto stop;
b:
writeln('1');
goto stop;
c:
writeln('2');
stop:
writeln('stop');
end.

Also for what I understand jmp does not require to refer to the address
in the register.

> What I would do, if I were in your place, would be to rewrite your
> program in pure Pascal, compile it with the -a switch to get the
> assembler listing, and use the compiler generated assembler code as a
> guideline to your optimized assembler.

I found that someone made that for me at link
http://www.mindfruit.co.uk/2012/01/switch-blocks-jump-tables.html
and I have tried to rewrite the program in at&t syntax

program branch2;
{$ASMMODE att}
label next,stop,a,b,c;
var idx:byte;
begin
write('Index? ');
readln(idx);
asm
movb idx,%al
jmp *next(,%eax,4)
next:
jmp a;
jmp b;
jmp c;
end['EAX'];
a:
writeln('0');
goto stop;
b:
writeln('1');
goto stop;
c:
writeln('2');
stop:
writeln('stop');
end.

but this fails compilation at line 10 with messages
Unsupported symbol type for operand
Unknown identifier 'NEXT'
Well, I think I can live without it, it is only for
the sake of curiosity,
Marco
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: Branch table

Giuliano Colla

Il 17/08/2018 15:46, Marco Borsari via fpc-pascal ha scritto:

I found that someone made that for me at link
http://www.mindfruit.co.uk/2012/01/switch-blocks-jump-tables.html
and I have tried to rewrite the program in at&t syntax

program branch2;
{$ASMMODE att}
label next,stop,a,b,c;
var idx:byte;
begin
write('Index? ');
readln(idx);
asm
movb idx,%al
jmp *next(,%eax,4)
next:
jmp a;
jmp b;
jmp c;
end['EAX'];
a:
writeln('0');
goto stop;
b:
writeln('1');
goto stop;
c:
writeln('2');
stop:
writeln('stop');
end.

but this fails compilation at line 10 with messages
Unsupported symbol type for operand
Unknown identifier 'NEXT'
Well, I think I can live without it, it is only for
the sake of curiosity,
Marco

I modified your code, to add a jump table (as it is in the example you mention) and here it works just fine. The code which works is:
program branch2;
{$ASMMODE att}
label stop,a,b,c;
var idx:byte;
    Jtab: array [0..2] of pointer;
begin
// Fill up the Jump table with the addresses of jump targets
Jtab[0] := @a;
Jtab[1] := @b;
Jtab[2] := @c;

write('Index? ');
readln(idx);
asm
mov $0x0,%eax // clear upper bits of eax
movb idx,%al  // so that it contains just idx
jmpq *Jtab(,%eax,8) // 8 is for x86_64; replace with 4 for i386
end['EAX'];
a:
writeln('0');
goto stop;
b:
writeln('1');
goto stop;
c:
writeln('2');
stop:
writeln('stop');
end. 
Enjoy programming!

Giuliano
-- 
Do not do to others as you would have them do to you.They might have different tastes.

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: Branch table

Free Pascal - General mailing list
Il 17/08/2018 18:04, Giuliano Colla ha scritto:
>
> Enjoy programming!
>
> Giuliano

Thank you!
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: Branch table

Free Pascal - General mailing list
In reply to this post by Giuliano Colla
Il 17/08/2018 18:04, Giuliano Colla ha scritto:

> I modified your code, to add a jump table (as it is in the example you
> mention)

I came to that

program branch;
{$ASMMODE intel}
label tab,stop,a,b,c;
var idx:byte;
begin
write('Index? ');
readln(idx);
asm
xor eax,eax;
mov al,idx;
shl ax,2;
mov ebx,tab;
add ebx,eax;
jmp ebx;
tab:
dd (*offset*) a;
dd (*offset*) b;
dd (*offset*) c;
end['EAX','EBX'];
a:
writeln('0');
goto stop;
b:
writeln('1');
goto stop;
c:
writeln('2');
stop:
writeln('stop');
end.

but it works only for index 0, and crashes otherwise. Maybe
a problem of alignment? Or must be tab declared in data section?
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: Branch table

Giuliano Colla
Il 18/08/2018 15:52, Marco Borsari via fpc-pascal ha scritto:

> it works only for index 0, and crashes otherwise. Maybe
> a problem of alignment? Or must be tab declared in data section?

On the Intel architecture you cannot perform pointer arithmetic as if a
pointer were just a number.
A pointer is composed of two parts: a "selector" and an "offset", which
must be handled separately.

If in place of your addition (add ebx,eax) you leave the appropriate
instruction to take care of putting together selector and offset, it works.
Just change your lines:

mov ebx,tab;
add ebx,eax;
jmp ebx;


Into:

jmp tab[eax]

and it will work just fine. Just tested.

Giuliano

--
Do not do to others as you would have them do to you.They might have different tastes.

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: Branch table

Free Pascal - General mailing list
Il 20/08/2018 17:32, Giuliano Colla ha scritto:

> On the Intel architecture you cannot perform pointer arithmetic as if a
> pointer were just a number.
> A pointer is composed of two parts: a "selector" and an "offset", which
> must be handled separately.

Ah, I saw, 32 bit segmentation is quite complicated.
Thank you twice, Marco

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: Branch table

Florian Klämpfl
Am 21.08.2018 um 11:42 schrieb Marco Borsari via fpc-pascal:
> Il 20/08/2018 17:32, Giuliano Colla ha scritto:
>
>> On the Intel architecture you cannot perform pointer arithmetic as if a
>> pointer were just a number.
>> A pointer is composed of two parts: a "selector" and an "offset", which
>> must be handled separately.
>
> Ah, I saw, 32 bit segmentation is quite complicated.
> Thank you twice, Marco

I still miss the point of a hand coded branch table ...

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: Branch table

Free Pascal - General mailing list
On Thu, 23 Aug 2018 09:32:58 +0200
Florian Klämpfl <[hidden email]> wrote:

> Am 21.08.2018 um 11:42 schrieb Marco Borsari via fpc-pascal:
> > Il 20/08/2018 17:32, Giuliano Colla ha scritto:
> >
> >> On the Intel architecture you cannot perform pointer arithmetic as if a
> >> pointer were just a number.
> >> A pointer is composed of two parts: a "selector" and an "offset", which
> >> must be handled separately.
> >
> > Ah, I saw, 32 bit segmentation is quite complicated.
> > Thank you twice, Marco
>
> I still miss the point of a hand coded branch table ...

It would be for the Wirth optimization in the access of an array,
when the index is 0 or 1, allowing the elimination of a multiplication.

case idx of
0: (*adr:=adr*);
1: adr:=adr+lel;
end
else adr:=adr+idx*lel;

It's half for paranoia and half for the desire to learn, I am sorry,
Marco
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: Branch table

Giuliano Colla

Il 23/08/2018 11:34, Marco Borsari via fpc-pascal ha scritto:

It would be for the Wirth optimization in the access of an array,
when the index is 0 or 1, allowing the elimination of a multiplication.

I'm afraid that this sort of optimization is rather outdated, and doesn't take into account a number of factors which make modern processors behave quite differently from old times ones.

First of all, saving a multiplication was important at the times when even integer multiplication was very costly in terms of machine cycles. Today one must carefully check if there's really any saving in replacing a multiplication (which is nowadays very fast) in just a couple of cases (index 0 and 1) with a number of checks and jumps for *all* cases.
Moreover, any compiler will align your array elements on 32bits boundaries (or 64bits if you're in a 64bits platform) for better efficiency, so that your multiplication will be a multiplication by a power of two, which all the modern compilers translate into a much faster shl. No doubt that adding checks and jumps to save a shl doesn't make much sense.

The second important factor to take into account is that modern processors do not execute instructions taken directly from the slower main memory, but they fetch them from the much faster cache. A jump instruction, if the target isn't very near, will generate a cache flush, thus dramatically increasing the execution time. That's why modern compilers try to avoid a branch table whenever possible. In the example you mentioned earlier (http://www.mindfruit.co.uk/2012/01/switch-blocks-jump-tables.html) you can read that gcc translates a switch (= fpc case) construct into a branch table with -O1 (minimal optimization), but with higher optimization (-O3) it avoids the branch table and translates the construct into a sequel of conditional jumps.
In any case for the above reason (avoid cache flushing with a jump target too far away) the code that I had proposed (branch table in the data area) is preferable to the one of your last attempt (branch table in the code area).

You may easily verify what solution provides the best efficiency in your platform, by performing a high number of iterations over your code, with random idx's in the range of your array size, and getting the total execution time (sort of: time your_prog). Keeping in mind that on a different platform (different cache size) it could behave quite differently.

Giuliano
-- 
Do not do to others as you would have them do to you.They might have different tastes.

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: Branch table

Florian Klämpfl
Am 26.08.2018 um 11:43 schrieb Giuliano Colla:
> Il 23/08/2018 11:34, Marco Borsari via fpc-pascal ha scritto:
>
>> It would be for the Wirth optimization in the access of an array,
>> when the index is 0 or 1, allowing the elimination of a multiplication.
>
> I'm afraid that this sort of optimization is rather outdated, and doesn't take into account a number of factors which
> make modern processors behave quite differently from old times ones.

Multiplication is much less expensive then a jmp in most cases on today's CPUs.

It is even the other way round, it is often useful to replace an if statement by a multiplication like:

if a>b then
  Inc(c,d);

can be replaced on modern CPUs by

Inc(c,d*ord(a>b));
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: Branch table

Free Pascal - General mailing list
On Sun, 26 Aug 2018 18:32:29 +0200
Florian Klämpfl <[hidden email]> wrote:

> Am 26.08.2018 um 11:43 schrieb Giuliano Colla:
> > Il 23/08/2018 11:34, Marco Borsari via fpc-pascal ha scritto:
> >
> >> It would be for the Wirth optimization in the access of an array,
> >> when the index is 0 or 1, allowing the elimination of a multiplication.
> >
> > I'm afraid that this sort of optimization is rather outdated, and doesn't take into account a number of factors which
> > make modern processors behave quite differently from old times ones.
>
> Multiplication is much less expensive then a jmp in most cases on today's CPUs.
>
> It is even the other way round, it is often useful to replace an if statement by a multiplication like:
>
> if a>b then
>   Inc(c,d);
>
> can be replaced on modern CPUs by
>
> Inc(c,d*ord(a>b));

Ah I see, anyway I did not think that, as Giuliano said, beacuse of alignment of
array element, the multiplication is already substituded by a shift, and the essence
of a branch table becomes superflous. Thank you for your time,
Marco
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal