Crypto++ 5.5.2 Tiger::Transform converted to DCPCrypt2.Compress, AV: Compiler Optimization Bug with SSE2? Or just a programming bug? HELP NEEDED :) ?!

Skybuck Flying
Hello,

This is the Tiger transform routine converted from the Crypto++ 5.5.2
C++/asm code to DCPCrypt2 Delphi/Pascal/BASM code.

It works with optimizations turned off.

As soon as optimizations are turned on, it crashes with an access violation
when Index := 0; (the first statement after the asm block) is executed.

I am not sure what the problem is because:

1. I didn't write the asm code.

2. I am not an asm expert.

I did my best converting it.

I think some possible causes might be:

1. Misaligned data? (Maybe the C++/asm code relied on special memory
alignment?)

2. Stack issues? (Not enough pushes or pops? A stack frame issue?)

3. A compiler optimization bug?

Can anyone tell for sure?

The problem is in the Compress method.
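Hypotheses 1 and 2 can at least be checked on paper for the asm's own stack handling. The prologue mov ecx, esp / and esp, $fffffff0 / sub esp, 8*8 / push ecx reserves eight 64-bit scratch words at [esp+4], and pop esp at the end undoes it. The sketch below (Free Pascal, {$mode objfpc}; TFrame and FrameFor are made-up names) replays only that arithmetic on a few sample esp values, without touching the real stack. It asserts three things: the scratch area is 16-byte aligned, it ends below the caller's stack, and pop esp reads exactly the slot that push ecx wrote.

```pascal
program CompressFrameSketch;
{$mode objfpc}{$assertions on}

// Plain-integer model of the stack handling in TDCPOptimizedTiger.Compress:
//   mov ecx, esp / and esp, $fffffff0 / sub esp, 8*8 / push ecx ... pop esp
// Nothing here touches the real stack; esp values are just numbers.
type
  TFrame = record
    SavedSlot: LongWord; // esp after "push ecx"; holds the old esp for "pop esp"
    Scratch: LongWord;   // esp+4: address of the first of eight 64-bit words
  end;

function FrameFor(OldEsp: LongWord): TFrame;
var
  Aligned: LongWord;
begin
  Aligned := OldEsp and $FFFFFFF0;        // and esp, $fffffff0
  Result.Scratch := Aligned - 8 * 8;      // sub esp, 8*8
  Result.SavedSlot := Result.Scratch - 4; // push ecx moves esp down by 4
end;

const
  // Arbitrary sample esp values; 32-bit esp is normally a multiple of 4.
  Samples: array[0..3] of LongWord = ($0012FF70, $0012FF74, $0012FF78, $0012FF7C);

var
  I: Integer;
  F: TFrame;
begin
  for I := Low(Samples) to High(Samples) do
  begin
    F := FrameFor(Samples[I]);
    Assert(F.Scratch mod 16 = 0);            // [esp+4] is 16-byte aligned
    Assert(F.Scratch + 8 * 8 <= Samples[I]); // [esp+4+7*8] ends below old esp
    Assert(F.SavedSlot + 4 = F.Scratch);     // pop esp reads what push ecx wrote
    WriteLn('esp=', Samples[I], '  scratch=', F.Scratch, '  saved at=', F.SavedSlot);
  end;
  WriteLn('all frame checks passed');
end.
```

If these checks hold, the alignment and the save/restore of esp inside the asm are balanced by themselves, which suggests looking elsewhere, for example at registers the asm overwrites without restoring.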

// *** Begin of Unit ***

unit DCPtiger_optimized_version_201;

{******************************************************************************}
{* DCPcrypt v2.0 written by David Barton ([hidden email]) *********************}
{******************************************************************************}
{* A binary compatible implementation of Tiger ********************************}
{******************************************************************************}
{* Copyright (c) 2002 David Barton                                            *}
{* Permission is hereby granted, free of charge, to any person obtaining a    *}
{* copy of this software and associated documentation files (the "Software"), *}
{* to deal in the Software without restriction, including without limitation  *}
{* the rights to use, copy, modify, merge, publish, distribute, sublicense,   *}
{* and/or sell copies of the Software, and to permit persons to whom the      *}
{* Software is furnished to do so, subject to the following conditions:       *}
{*                                                                            *}
{* The above copyright notice and this permission notice shall be included in *}
{* all copies or substantial portions of the Software.                        *}
{*                                                                            *}
{* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR *}
{* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,   *}
{* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL    *}
{* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER *}
{* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING    *}
{* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER        *}
{* DEALINGS IN THE SOFTWARE.                                                  *}
{******************************************************************************}

{

Optimized version 2.01 created on 18 March 2008 by Skybuck Flying

ASM from Crypto++ 5.5.2

}

interface

uses
  Classes,
  Sysutils,
  DCPcrypt_version_201,
  DCPconst_version_201;

type
  TDCPOptimizedTiger = class(TDCP_hash)
  protected
    Len: int64;
    Index: DWord;
    CurrentHash: array[0..2] of int64;
    HashBuffer: array[0..63] of byte;
    // procedure Compress;
    procedure Compress( Digest : Pint64; const X : Pint64 );

  public
    class function GetId: integer; override;
    class function GetAlgorithm: string; override;
    class function GetHashSize: integer; override;
    class function SelfTest: boolean; override;
    procedure Init; override;
    procedure Burn; override;
    procedure Update(const Buffer; Size: longword); override;
    procedure Final(var Digest); override;

  end;

{******************************************************************************}
{******************************************************************************}
implementation

{$R-}{$Q-}

{$INCLUDE DCPtiger_optimized_version_201.inc}

// for some reason it doesn't work when optimizations are turned on.

procedure TDCPOptimizedTiger.Compress( Digest : Pint64; const X : Pint64 );
// procedure TDCPOptimizedTiger.Compress;
//var
//  digest : pointer;
//  LocalX: array[0..7] of int64;
//  x : pointer;
begin
//  Move(HashBuffer,LocalX,Sizeof(LocalX));

//  Digest := @CurrentHash[0];
//  X := @HashBuffer[0];
  asm
   lea edx, [TigerTable]
   mov eax, digest

   mov esi, X

   movq mm0, [eax]
   movq mm1, [eax+1*8]

   movq mm5, mm1
   movq mm2, [eax+2*8]
   movq mm7, [edx+4*2048+0*8]
   movq mm6, [edx+4*2048+1*8]

   mov ecx, esp
   and esp, $fffffff0
   sub esp, 8*8
   push ecx

   xor ebx, ebx

   @label5:

   pxor mm2, [esi+0*8+ebx]
   movd ecx, mm2
   movzx edi, cl
   movq mm3, [edx+0*2048+edi*8]
   movzx edi, ch
   movq mm4, [edx+3*2048+edi*8]
   shr ecx, 16
   movzx edi, cl
   pxor mm3, [edx+1*2048+edi*8]
   movzx edi, ch
   pxor mm4, [edx+2*2048+edi*8]
   pextrw ecx, mm2, 2
   movzx edi, cl
   pxor mm3, [edx+2*2048+edi*8]
   movzx edi, ch
   pxor mm4, [edx+1*2048+edi*8]
   pextrw ecx, mm2, 3
   movzx edi, cl
   pxor mm3, [edx+3*2048+edi*8]
   psubq mm0, mm3
   movzx edi, ch
   pxor mm4, [edx+0*2048+edi*8]
   paddq mm1, mm4
   movq mm3, mm1
   psllq mm1, 2
   paddq mm1, mm3
   pxor mm0, [esi+1*8+ebx]
   movd ecx, mm0
   movzx edi, cl
   movq mm3, [edx+0*2048+edi*8]
   movzx edi, ch
   movq mm4, [edx+3*2048+edi*8]
   shr ecx, 16
   movzx edi, cl
   pxor mm3, [edx+1*2048+edi*8]
   movzx edi, ch
   pxor mm4, [edx+2*2048+edi*8]
   pextrw ecx, mm0, 2
   movzx edi, cl
   pxor mm3, [edx+2*2048+edi*8]
   movzx edi, ch
   pxor mm4, [edx+1*2048+edi*8]
   pextrw ecx, mm0, 3
   movzx edi, cl
   pxor mm3, [edx+3*2048+edi*8]
   psubq mm1, mm3
   movzx edi, ch
   pxor mm4, [edx+0*2048+edi*8]
   paddq mm2, mm4
   movq mm3, mm2
   psllq mm2, 2
   paddq mm2, mm3

   cmp ebx, 6*8
   je @labellabel2_5

   pxor mm1, [esi+2*8+ebx]
   movd ecx, mm1
   movzx edi, cl
   movq mm3, [edx+0*2048+edi*8]
   movzx edi, ch
   movq mm4, [edx+3*2048+edi*8]
   shr ecx, 16
   movzx edi, cl
   pxor mm3, [edx+1*2048+edi*8]
   movzx edi, ch
   pxor mm4, [edx+2*2048+edi*8]
   pextrw ecx, mm1, 2
   movzx edi, cl
   pxor mm3, [edx+2*2048+edi*8]
   movzx edi, ch
   pxor mm4, [edx+1*2048+edi*8]
   pextrw ecx, mm1, 3
   movzx edi, cl
   pxor mm3, [edx+3*2048+edi*8]
   psubq mm2, mm3
   movzx edi, ch
   pxor mm4, [edx+0*2048+edi*8]
   paddq mm0, mm4
   movq mm3, mm0
   psllq mm0, 2
   paddq mm0, mm3
   add ebx, 3*8

   jmp @label5

   @labellabel2_5:

   movq mm3, [esi+7*8]
   pxor mm3, mm6
   movq mm4, [esi+0*8]
   psubq mm4, mm3
   movq [esp+4+0*8], mm4
   pxor mm4, [esi+1*8]
   movq mm3, mm4
   movq [esp+4+1*8], mm4
   paddq mm4, [esi+2*8]
   pxor mm3, mm7
   psllq mm3, 19
   movq [esp+4+2*8], mm4
   pxor mm3, mm4
   movq mm4, [esi+3*8]
   psubq mm4, mm3
   movq [esp+4+3*8], mm4
   pxor mm4, [esi+4*8]
   movq mm3, mm4
   movq [esp+4+4*8], mm4
   paddq mm4, [esi+5*8]
   pxor mm3, mm7
   psrlq mm3, 23
   movq [esp+4+5*8], mm4
   pxor mm3, mm4
   movq mm4, [esi+6*8]
   psubq mm4, mm3
   movq [esp+4+6*8], mm4
   pxor mm4, [esi+7*8]
   movq mm3, mm4
   movq [esp+4+7*8], mm4
   paddq mm4, [esp+4+0*8]
   pxor mm3, mm7
   psllq mm3, 19
   movq [esp+4+0*8], mm4
   pxor mm3, mm4
   movq mm4, [esp+4+1*8]
   psubq mm4, mm3
   movq [esp+4+1*8], mm4
   pxor mm4, [esp+4+2*8]
   movq mm3, mm4
   movq [esp+4+2*8], mm4
   paddq mm4, [esp+4+3*8]
   pxor mm3, mm7
   psrlq mm3, 23
   movq [esp+4+3*8], mm4
   pxor mm3, mm4
   movq mm4, [esp+4+4*8]
   psubq mm4, mm3
   movq [esp+4+4*8], mm4
   pxor mm4, [esp+4+5*8]
   movq [esp+4+5*8], mm4
   paddq mm4, [esp+4+6*8]
   movq [esp+4+6*8], mm4
   pxor mm4, [edx+4*2048+2*8]
   movq mm3, [esp+4+7*8]
   psubq mm3, mm4
   movq [esp+4+7*8], mm3

   xor ebx, ebx

   @label7:

   pxor mm1, [esp+4+0*8+ebx]
   movd ecx, mm1
   movzx edi, cl
   movq mm3, [edx+0*2048+edi*8]
   movzx edi, ch
   movq mm4, [edx+3*2048+edi*8]
   shr ecx, 16
   movzx edi, cl
   pxor mm3, [edx+1*2048+edi*8]
   movzx edi, ch
   pxor mm4, [edx+2*2048+edi*8]
   pextrw ecx, mm1, 2
   movzx edi, cl
   pxor mm3, [edx+2*2048+edi*8]
   movzx edi, ch
   pxor mm4, [edx+1*2048+edi*8]
   pextrw ecx, mm1, 3
   movzx edi, cl
   pxor mm3, [edx+3*2048+edi*8]
   psubq mm2, mm3
   movzx edi, ch
   pxor mm4, [edx+0*2048+edi*8]
   paddq mm0, mm4
   movq mm3, mm0
   psllq mm0, 3
   psubq mm0, mm3
   pxor mm2, [esp+4+1*8+ebx]
   movd ecx, mm2
   movzx edi, cl
   movq mm3, [edx+0*2048+edi*8]
   movzx edi, ch
   movq mm4, [edx+3*2048+edi*8]
   shr ecx, 16
   movzx edi, cl
   pxor mm3, [edx+1*2048+edi*8]
   movzx edi, ch
   pxor mm4, [edx+2*2048+edi*8]
   pextrw ecx, mm2, 2
   movzx edi, cl
   pxor mm3, [edx+2*2048+edi*8]
   movzx edi, ch
   pxor mm4, [edx+1*2048+edi*8]
   pextrw ecx, mm2, 3
   movzx edi, cl
   pxor mm3, [edx+3*2048+edi*8]
   psubq mm0, mm3
   movzx edi, ch
   pxor mm4, [edx+0*2048+edi*8]
   paddq mm1, mm4
   movq mm3, mm1
   psllq mm1, 3
   psubq mm1, mm3

   cmp ebx, 6*8
   je @labellabel2_7

   pxor mm0, [esp+4+2*8+ebx]
   movd ecx, mm0
   movzx edi, cl
   movq mm3, [edx+0*2048+edi*8]
   movzx edi, ch
   movq mm4, [edx+3*2048+edi*8]
   shr ecx, 16
   movzx edi, cl
   pxor mm3, [edx+1*2048+edi*8]
   movzx edi, ch
   pxor mm4, [edx+2*2048+edi*8]
   pextrw ecx, mm0, 2
   movzx edi, cl
   pxor mm3, [edx+2*2048+edi*8]
   movzx edi, ch
   pxor mm4, [edx+1*2048+edi*8]
   pextrw ecx, mm0, 3
   movzx edi, cl
   pxor mm3, [edx+3*2048+edi*8]
   psubq mm1, mm3
   movzx edi, ch
   pxor mm4, [edx+0*2048+edi*8]
   paddq mm2, mm4
   movq mm3, mm2
   psllq mm2, 3
   psubq mm2, mm3
   add ebx, 3*8

   jmp @label7

   @labellabel2_7:

   movq mm3, [esp+4+7*8]
   pxor mm3, mm6
   movq mm4, [esp+4+0*8]
   psubq mm4, mm3
   movq [esp+4+0*8], mm4
   pxor mm4, [esp+4+1*8]
   movq mm3, mm4
   movq [esp+4+1*8], mm4
   paddq mm4, [esp+4+2*8]
   pxor mm3, mm7
   psllq mm3, 19
   movq [esp+4+2*8], mm4
   pxor mm3, mm4
   movq mm4, [esp+4+3*8]
   psubq mm4, mm3
   movq [esp+4+3*8], mm4
   pxor mm4, [esp+4+4*8]
   movq mm3, mm4
   movq [esp+4+4*8], mm4
   paddq mm4, [esp+4+5*8]
   pxor mm3, mm7
   psrlq mm3, 23
   movq [esp+4+5*8], mm4
   pxor mm3, mm4
   movq mm4, [esp+4+6*8]
   psubq mm4, mm3
   movq [esp+4+6*8], mm4
   pxor mm4, [esp+4+7*8]
   movq mm3, mm4
   movq [esp+4+7*8], mm4
   paddq mm4, [esp+4+0*8]
   pxor mm3, mm7
   psllq mm3, 19
   movq [esp+4+0*8], mm4
   pxor mm3, mm4
   movq mm4, [esp+4+1*8]
   psubq mm4, mm3
   movq [esp+4+1*8], mm4
   pxor mm4, [esp+4+2*8]
   movq mm3, mm4
   movq [esp+4+2*8], mm4
   paddq mm4, [esp+4+3*8]
   pxor mm3, mm7
   psrlq mm3, 23
   movq [esp+4+3*8], mm4
   pxor mm3, mm4
   movq mm4, [esp+4+4*8]
   psubq mm4, mm3
   movq [esp+4+4*8], mm4
   pxor mm4, [esp+4+5*8]
   movq [esp+4+5*8], mm4
   paddq mm4, [esp+4+6*8]
   movq [esp+4+6*8], mm4
   pxor mm4, [edx+4*2048+2*8]
   movq mm3, [esp+4+7*8]
   psubq mm3, mm4
   movq [esp+4+7*8], mm3

   xor ebx, ebx

   @label9:

   pxor mm0, [esp+4+0*8+ebx]
   movd ecx, mm0
   movzx edi, cl
   movq mm3, [edx+0*2048+edi*8]
   movzx edi, ch
   movq mm4, [edx+3*2048+edi*8]
   shr ecx, 16
   movzx edi, cl
   pxor mm3, [edx+1*2048+edi*8]
   movzx edi, ch
   pxor mm4, [edx+2*2048+edi*8]
   pextrw ecx, mm0, 2
   movzx edi, cl
   pxor mm3, [edx+2*2048+edi*8]
   movzx edi, ch
   pxor mm4, [edx+1*2048+edi*8]
   pextrw ecx, mm0, 3
   movzx edi, cl
   pxor mm3, [edx+3*2048+edi*8]
   psubq mm1, mm3
   movzx edi, ch
   pxor mm4, [edx+0*2048+edi*8]
   paddq mm2, mm4
   movq mm3, mm2
   psllq mm2, 3
   paddq mm2, mm3
   pxor mm1, [esp+4+1*8+ebx]
   movd ecx, mm1
   movzx edi, cl
   movq mm3, [edx+0*2048+edi*8]
   movzx edi, ch
   movq mm4, [edx+3*2048+edi*8]
   shr ecx, 16
   movzx edi, cl
   pxor mm3, [edx+1*2048+edi*8]
   movzx edi, ch
   pxor mm4, [edx+2*2048+edi*8]
   pextrw ecx, mm1, 2
   movzx edi, cl
   pxor mm3, [edx+2*2048+edi*8]
   movzx edi, ch
   pxor mm4, [edx+1*2048+edi*8]
   pextrw ecx, mm1, 3
   movzx edi, cl
   pxor mm3, [edx+3*2048+edi*8]
   psubq mm2, mm3
   movzx edi, ch
   pxor mm4, [edx+0*2048+edi*8]
   paddq mm0, mm4
   movq mm3, mm0
   psllq mm0, 3
   paddq mm0, mm3

   cmp ebx, 6*8
   je @labellabel2_9

   pxor mm2, [esp+4+2*8+ebx]
   movd ecx, mm2
   movzx edi, cl
   movq mm3, [edx+0*2048+edi*8]
   movzx edi, ch
   movq mm4, [edx+3*2048+edi*8]
   shr ecx, 16
   movzx edi, cl
   pxor mm3, [edx+1*2048+edi*8]
   movzx edi, ch
   pxor mm4, [edx+2*2048+edi*8]
   pextrw ecx, mm2, 2
   movzx edi, cl
   pxor mm3, [edx+2*2048+edi*8]
   movzx edi, ch
   pxor mm4, [edx+1*2048+edi*8]
   pextrw ecx, mm2, 3
   movzx edi, cl
   pxor mm3, [edx+3*2048+edi*8]
   psubq mm0, mm3
   movzx edi, ch
   pxor mm4, [edx+0*2048+edi*8]
   paddq mm1, mm4
   movq mm3, mm1
   psllq mm1, 3
   paddq mm1, mm3
   add ebx, 3*8
   jmp @label9

   @labellabel2_9:

   pxor mm0, [eax+0*8]
   movq [eax+0*8], mm0
   psubq mm1, mm5
   movq [eax+1*8], mm1
   paddq mm2, [eax+2*8]
   movq [eax+2*8], mm2

   pop esp

   emms
  end;

//  CurrentHash[0]:= a xor aa;
//  CurrentHash[1]:= b - bb;
//  CurrentHash[2]:= c + cc;
  Index:= 0;
  FillChar(HashBuffer,Sizeof(HashBuffer),0);
end;


class function TDCPOptimizedTiger.GetHashSize: integer;
begin
  Result:= 192;
end;

class function TDCPOptimizedTiger.GetId: integer;
begin
  Result:= DCP_tiger;
end;

class function TDCPOptimizedTiger.GetAlgorithm: string;
begin
  Result:= 'Tiger';
end;

class function TDCPOptimizedTiger.SelfTest: boolean;
const
  // *** warning suppressed by Skybuck ***
  Test1Out: array[0..2] of int64=
 (
  int64( $87FB2A9083851CF7 ),
  int64( $470D2CF810E6DF9E ),
  int64( $B586445034A5A386 )
 );
  Test2Out: array[0..2] of int64=
 (
  int64( $0C410A042968868A),
  int64( $1671DA5A3FD29A72),
  int64( $5EC1E457D3CDB303) );
var
  TestHash: TDCPOptimizedTiger;
  TestOut: array[0..2] of int64;
begin
  TestHash:= TDCPOptimizedTiger.Create;
  TestHash.Init;
  TestHash.UpdateStr('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+-');
  TestHash.Final(TestOut);
  Result:= CompareMem(@TestOut,@Test1Out,Sizeof(Test1Out));
  TestHash.Init;
  TestHash.UpdateStr('Tiger - A Fast New Hash Function, by Ross Anderson and Eli Biham');
  TestHash.Final(TestOut);
  Result:= CompareMem(@TestOut,@Test2Out,Sizeof(Test2Out)) and Result;
  TestHash.Free;
end;

procedure TDCPOptimizedTiger.Init;
begin
  Burn;
  fInitialized:= true;
  // *** warning suppressed by Skybuck ***
  CurrentHash[0]:= int64( $0123456789ABCDEF );
  CurrentHash[1]:= int64( $FEDCBA9876543210 );
  CurrentHash[2]:= int64( $F096A5B4C3B2E187 );
end;

procedure TDCPOptimizedTiger.Burn;
begin
  Len:= 0;
  Index:= 0;
  FillChar(HashBuffer,Sizeof(HashBuffer),0);
  FillChar(CurrentHash,Sizeof(CurrentHash),0);
  fInitialized:= false;
end;

procedure TDCPOptimizedTiger.Update(const Buffer; Size: longword);
var
  PBuf: ^byte;
begin
  if not fInitialized then
    raise EDCP_hash.Create('Hash not initialized');

  Inc(Len,Size*8);

  PBuf:= @Buffer;
  while Size> 0 do
  begin
    if (Sizeof(HashBuffer)-Index)<= DWord(Size) then
    begin
      Move(PBuf^,HashBuffer[Index],Sizeof(HashBuffer)-Index);
      Dec(Size,Sizeof(HashBuffer)-Index);
      Inc(PBuf,Sizeof(HashBuffer)-Index);
      Compress( @CurrentHash[0], @HashBuffer[0] );
    end
    else
    begin
      Move(PBuf^,HashBuffer[Index],Size);
      Inc(Index,Size);
      Size:= 0;
    end;
  end;
end;

procedure TDCPOptimizedTiger.Final(var Digest);
begin
  if not fInitialized then
    raise EDCP_hash.Create('Hash not initialized');
  HashBuffer[Index]:= $01;
  if Index>= 56 then
    Compress( @Currenthash[0], @HashBuffer[0] );
  Pint64(@HashBuffer[56])^:= Len;
  Compress( @Currenthash[0], @HashBuffer[0] );
  Move(CurrentHash,Digest,Sizeof(CurrentHash));
  Burn;
end;


end.

// *** End of Unit ***
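For anyone checking the converted asm by hand: the two blocks starting at @labellabel2_5 and @labellabel2_7 implement Tiger's key schedule on the eight 64-bit words of the block (the first reads X through esi, the second works in place on the scratch area), and the block at @labellabel2_9 performs the feed-forward from the commented-out lines (a xor aa, b - bb, c + cc). Below is a plain-Pascal sketch of that key schedule. It assumes, judging by how the asm uses them, that the three 64-bit words after the S-boxes in TigerTable (presumably defined in the included .inc file, which is not shown) are all ones (mm7, used to form not x), $A5A5A5A5A5A5A5A5 (mm6) and $0123456789ABCDEF, the constants of the Tiger reference. KeySchedule and InverseKeySchedule are hypothetical names, not part of DCPcrypt; the inverse exists only so the sketch can check itself without an official test vector.

```pascal
program TigerKeyScheduleSketch;
{$mode objfpc}{$Q-}{$R-}{$assertions on}

// Plain-Pascal model of the Tiger key schedule run between passes in Compress.
// Assumption: TigerTable's extra words are all ones (mm7), $A5A5A5A5A5A5A5A5
// (mm6) and $0123456789ABCDEF, as in the Tiger reference.
type
  TTigerBlock = array[0..7] of QWord;

const
  KeyConst1 = QWord($A5A5A5A5A5A5A5A5);
  KeyConst2 = QWord($0123456789ABCDEF);

// Arithmetic wraps modulo 2^64 ({$Q-}); shr on QWord is a logical shift,
// matching psrlq in the asm. Note: in Pascal, xor binds like + and -,
// so every mix of them is parenthesized.
procedure KeySchedule(var x: TTigerBlock);
begin
  x[0] := x[0] - (x[7] xor KeyConst1);
  x[1] := x[1] xor x[0];
  x[2] := x[2] + x[1];
  x[3] := x[3] - (x[2] xor ((not x[1]) shl 19));
  x[4] := x[4] xor x[3];
  x[5] := x[5] + x[4];
  x[6] := x[6] - (x[5] xor ((not x[4]) shr 23));
  x[7] := x[7] xor x[6];
  x[0] := x[0] + x[7];
  x[1] := x[1] - (x[0] xor ((not x[7]) shl 19));
  x[2] := x[2] xor x[1];
  x[3] := x[3] + x[2];
  x[4] := x[4] - (x[3] xor ((not x[2]) shr 23));
  x[5] := x[5] xor x[4];
  x[6] := x[6] + x[5];
  x[7] := x[7] - (x[6] xor KeyConst2);
end;

// Undoes KeySchedule in reverse order. Each forward step changes one word
// using only other words, so each step can be inverted exactly.
procedure InverseKeySchedule(var x: TTigerBlock);
begin
  x[7] := x[7] + (x[6] xor KeyConst2);
  x[6] := x[6] - x[5];
  x[5] := x[5] xor x[4];
  x[4] := x[4] + (x[3] xor ((not x[2]) shr 23));
  x[3] := x[3] - x[2];
  x[2] := x[2] xor x[1];
  x[1] := x[1] + (x[0] xor ((not x[7]) shl 19));
  x[0] := x[0] - x[7];
  x[7] := x[7] xor x[6];
  x[6] := x[6] + (x[5] xor ((not x[4]) shr 23));
  x[5] := x[5] - x[4];
  x[4] := x[4] xor x[3];
  x[3] := x[3] + (x[2] xor ((not x[1]) shl 19));
  x[2] := x[2] - x[1];
  x[1] := x[1] xor x[0];
  x[0] := x[0] + (x[7] xor KeyConst1);
end;

var
  Original, Work: TTigerBlock;
  I: Integer;
begin
  // Arbitrary fixed input words; this is not an official Tiger test vector.
  for I := 0 to 7 do
    Original[I] := QWord(I + 1) * QWord($9E3779B97F4A7C15);
  Work := Original;
  KeySchedule(Work);
  InverseKeySchedule(Work);
  Assert(CompareByte(Work, Original, SizeOf(Work)) = 0);
  WriteLn('key schedule round trip ok');
end.
```

Comparing KeySchedule line by line with the pxor/paddq/psubq sequence after @labellabel2_5 is a quick way to confirm the conversion did not drop or reorder an instruction.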


// *** Begin of Test ***

program Project1;

{$APPTYPE CONSOLE}

uses
  SysUtils,
  Windows,
  DCPcrypt_version_201,
  DCPtiger_version_201,
  DCPtiger_optimized_version_201 in 'DCPtiger_optimized_version_201.pas';

//  Unit1 in 'Unit1.pas';

// better asm code which compiles ok.

var
 start_tics : int64;

(*

/***
*clock_t clock() - Return the processor time used by this process.
*
*Purpose:
*       This routine calculates how much time the calling process
*       has used.  At startup time, startup calls __inittime which stores
*       the initial time.  The clock routine calculates the difference
*       between the current time and the initial time.
*
*       Clock must reference _cinitime so that _cinitim.asm gets linked in.
*       That routine, in turn, puts __inittime in the startup initialization
*       routine table.
*
*Entry:
*       No parameters.
*       itime is a static structure of type timeb.
*
*Exit:
*       If successful, clock returns the number of CLK_TCKs (milliseconds)
*       that have elapsed.  If unsuccessful, clock returns -1.
*
*Exceptions:
*       None.
*
*******************************************************************************/
*)

function clock : int64;
var
 current_tics : int64;
 ct : FILETIME;
begin

  GetSystemTimeAsFileTime( ct );

  current_tics := int64(ct.dwLowDateTime) + (int64(ct.dwHighDateTime) shl 32);

  // calculate the elapsed number of 100 nanosecond units
  current_tics := current_tics - start_tics;

  // return number of elapsed milliseconds
  result := int64(current_tics div 10000);
end;

(*
/***
*int __inittime() - Initialize the time location
*
*Purpose:
*       This routine stores the time of the process startup.
*       It is only linked in if the user issues a clock runtime call.
*
*Entry:
*       No arguments.
*
*Exit:
*       Returns 0 to indicate no error.
*
*Exceptions:
*       None.
*
*******************************************************************************/
*)

function inittime : integer;
var
 st : FILETIME;

begin
 GetSystemTimeAsFileTime( st );

 start_tics := int64(st.dwLowDateTime) + (int64(st.dwHighDateTime) shl 32);

 result := 0;
end;


procedure Main1;
var
 vHash : TDCP_hash;
 vBuffer : packed array[0..2047] of byte; // same size as the cryptest.
 vBufferSize : integer;
// vTick1 : int64;
// vTick2 : int64;
 vStart : int64;
 vTicksPerSecond : int64;
 vIndex : integer;
 vBlocks : integer;

 vTimeTaken : double;
 vTimeTotal : double;
begin
 vBufferSize := 2048;

 for vIndex := 0 to 2048-1 do
 begin
  vBuffer[vIndex] := Random(256);
 end;

 QueryPerformanceFrequency( vTicksPerSecond );

 vHash := TDCPOptimizedTiger.Create;
 vHash.Init;

 vIndex := 0;
 vBlocks := 1;
 vTimeTotal := 10;

// QueryPerformanceCounter(vTick1);

 vTicksPerSecond := 1000; // Clock returns milliseconds, so 1000 ticks per second

 vStart := Clock;

 repeat

  vBlocks := vBlocks * 2;
  while vIndex < vBlocks do
  begin
   vHash.Update( vBuffer, vBufferSize );
   vIndex := vIndex + 1;
  end;

//  QueryPerformanceCounter(vTick2);

//  vTimeTaken := (vTick2 - vTick1) / vTicksPerSecond;


  vTimeTaken := (Clock - vStart) / vTicksPerSecond;


 until not (vTimeTaken < 2.0 / 3* vTimeTotal);

 vHash.Free;

 writeln('MiB/Sec: ', ( ( (vBlocks * vBufferSize) / vTimeTaken ) / (1024*1024) ) :16:2 );
end;


procedure Main2;
var
 vHash : TDCP_hash;
 vBuffer : packed array[0..2047] of byte; // same size as the cryptest.
 vBufferSize : integer;
 vIndex : integer;

 vDigest : packed array[0..23] of byte;
begin
 vBufferSize := 2048;

 for vIndex := 0 to 2048-1 do
 begin
  vBuffer[vIndex] := vIndex mod 256;
 end;

 vHash := TDCPOptimizedTiger.Create;
 vHash.Init;

 for vIndex:=0 to 9 do
 begin
  vHash.Update( vBuffer, vBufferSize );
 end;

 vHash.Final( vDigest );

 for vIndex:=0 to 23 do
 begin
  writeln( vDigest[vIndex] );

 end;

 vHash.Free;

end;


begin
  try
 inittime;
 Main2;
  except
 on E:Exception do
   Writeln(E.Classname, ': ', E.Message);
  end;
  readln;
end.

// *** End of Test ***

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal

Re: Crypto++ 5.5.2 Tiger::Transform converted to DCPCrypt2.Compress, AV: Compiler Optimization Bug with SSE2? Or just a programming bug? HELP NEEDED :) ?!

Skybuck Flying
OK,

I have already solved the problem by using:

pushad

... rest of asm ...

popad

However, this might be slower than just a few individual pushes and pops?

So any small performance improvements are welcome, but I am already quite
happy with this solution! ;)

Bye,
  Skybuck.

Tomas Hajny
On Wed, March 19, 2008 09:43, Skybuck Flying wrote:

> Ok,
>
> I already solved problem by using:
>
> pushad
>
> .. rest of asm ...
>
> popad
>
> However this might be slower than just a few pushes and pops ?
>
> So any slight performance improvements welcome, but I am already quite happy
> with this solution ! ;)

FPC needs to preserve ebx, esi and edi.

Tomas
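A minimal sketch of Tomas's suggestion, assuming FPC on i386 with {$asmmode intel}: save only ebx, esi and edi around the asm instead of using pushad/popad. SumLongs is a hypothetical stand-in for Compress that, like the Tiger code, uses all three registers as scratch. The parameters are read into the volatile eax and ecx before anything is pushed, and the result is copied out only after the pops, so no Pascal name is referenced while esp is moved. On other targets it falls back to plain Pascal so the program still compiles.

```pascal
program PreserveRegsSketch;
{$mode objfpc}
{$ifdef CPUI386}{$asmmode intel}{$endif}

// Hypothetical stand-in for TDCPOptimizedTiger.Compress: adds Count longwords
// starting at P, using ebx, esi and edi as scratch registers.
function SumLongs(P: PLongWord; Count: LongInt): LongWord;
{$ifdef CPUI386}
var
  R: LongWord;
begin
  asm
    mov  eax, P           // read the parameters while esp is still untouched
    mov  ecx, Count
    push ebx              // FPC expects ebx, esi and edi to be unchanged
    push esi              // after the asm block, so save them first ...
    push edi
    mov  esi, eax         // esi = pointer to the data
    xor  ebx, ebx         // ebx = running sum
    xor  edi, edi         // edi = index
  @next:
    cmp  edi, ecx
    jge  @done
    add  ebx, [esi+edi*4]
    inc  edi
    jmp  @next
  @done:
    mov  eax, ebx         // eax is volatile, so it can carry the result out
    pop  edi              // ... and restore them in reverse order
    pop  esi
    pop  ebx
    mov  R, eax           // esp is back to its original value here
  end;
  Result := R;
end;
{$else}
// Plain Pascal fallback for non-i386 targets.
var
  I: LongInt;
begin
  Result := 0;
  for I := 0 to Count - 1 do
    Inc(Result, P[I]);
end;
{$endif}

var
  Data: array[0..3] of LongWord = (1, 2, 3, 4);
begin
  WriteLn(SumLongs(@Data[0], Length(Data))); // prints 10
end.
```

In Compress itself, the three pushes would go at the very top of the asm block, where pushad went (before mov esi, X overwrites esi), and the three pops right after pop esp, in reverse order. FPC can also be told which registers an asm block changes by closing it with end ['ebx','esi','edi'] instead of a plain end; whether that alone makes the compiler save these callee-saved registers is worth confirming for the FPC version in use.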


Skybuck Flying
> FPC needs to preserve ebx, esi and edi.

So it's push ebx + push esi + push edi versus pushad, and the same for the
pops.

According to my AMD optimization manual:

pushad has a latency of 6 cycles.
push register has a latency of 3 cycles.

So three separate pushes (3 x 3 = 9 cycles) versus one pushad (6 cycles):
pushad should be faster.

Bye,
  Skybuck.

