Speed question for strings

classic Classic list List threaded Threaded
7 messages Options
L-9
Reply | Threaded
Open this post in threaded view
|

Speed question for strings

L-9
Why is the first and third example *so* much faster than the second example?
Significantly faster.

There's really not that much difference between the string concatenation.


//Example 1: (seems fast)
function StrLoadFile_test1(FileName: string): string;
var
  F: text;
  c: char;
  str1,Line: string;
begin
  result:= ''; //safety
  str1:= '';
  if FileExists(FileName) = false then
  begin
   result:= ''; //returns nothing if file not found
   exit;
  end;
  Assign(F, FileName);
  Reset(F);
  while not Eof(F) do
  begin
    Read(F, C);
    result:= result + C;
  end;
  close(F); //close up file
end;

//Example 2: (definitely slow)
function StrLoadFile_test2(FileName: string): string;
var
  F: text;
  str1,Line: string;
begin
  result:= ''; //safety
  str1:= '';
  if FileExists(FileName) = false then
  begin
   result:= ''; //returns nothing if file not found
   exit;
  end;
  Assign(F, FileName);
  Reset(F);
  while not Eof(F) do
  begin
    ReadLn(F, Line);
    result:= result + Line + #13#10;
  end;
  close(F); //close up file
end;

//Example 3: (seems fast)
function StrLoadFile_test3(FileName: string): string;
var
  F: text;
  str1,Line: string;
begin
  result:= ''; //safety
  str1:= '';
  if FileExists(FileName) = false then
  begin
   result:= ''; //returns nothing if file not found
   exit;
  end;
  Assign(F, FileName);
  Reset(F);
  while not Eof(F) do
  begin
    ReadLn(F, Line);
    result:= result + Line; //note: took out carriage return.
  end;
  close(F); //close up file
end;


What's the deal with one extra CRLF? The line is being concatenated each time, so I
don't see why it is *so* much slower with concatenating an extra CRLF each time. I
could see if it made a minimal impact.. but it is not minimal, it is hugely majorly
significant.

There even better ways to do this StrLoadFile of course (reading in large chunks). I
'm just wondering about this particular situation though, for the knowledge. I just
don't see how it can be significantly slower, since in all cases we are still
concatenating in the loop .. the slow one just contains one small extra
concatenation.

In order to see the speed difference, try a 1MB or 5MB file.

--
L505


_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: Speed question for strings

Vincent Snijders
L wrote:
> Why is the first and third example *so* much faster than the second example?
> Significantly faster.
>

Because you doubled the number of string concatenations.

Vincent.
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: Speed question for strings

L505
> L wrote:
> > Why is the first and third example *so* much faster than the second example?
> > Significantly faster.
> >
>
> Because you doubled the number of string concatenations.
>
> Vincent.


Right, but it's not twice as slow :-) Worse... It's at least 50-100 times slower.
Weird.

I will have to do some bench marks. I just thought that if there was already a string
concatenation happening, that an additional concatenation would cost me twice as much
time, but not 50-100 times. Plus, the second concatenation is a small concatenation.
It's only two characters in length (#13#10) whereas the Line is much bigger.

--
L505

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: Speed question for strings

Vincent Snijders
L505 wrote:

>>L wrote:
>>
>>>Why is the first and third example *so* much faster than the second example?
>>>Significantly faster.
>>>
>>
>>Because you doubled the number of string concatenations.
>>
>>Vincent.
>
>
>
> Right, but it's not twice as slow :-) Worse... It's at least 50-100 times slower.
> Weird.
>
> I will have to do some bench marks. I just thought that if there was already a string
> concatenation happening, that an additional concatenation would cost me twice as much
> time, but not 50-100 times. Plus, the second concatenation is a small concatenation.
> It's only two characters in length (#13#10) whereas the Line is much bigger.

The first string is always long, so you are doing twice adding a small
string to a long string. This causes much heap fragmentation.

Try result:= result + (Line + #13#10);
or

Line := Line + #13#10;
Result := Result + Line;

OTOH using a TStringList is probably faster, because strings are only
concatenated once.

Vicnent.
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: Speed question for strings

Peter Vreman
In reply to this post by L505
At 09:46 19-11-2005, you wrote:

> > L wrote:
> > > Why is the first and third example *so* much faster than the second
> example?
> > > Significantly faster.
> > >
> >
> > Because you doubled the number of string concatenations.
> >
> > Vincent.
>
>
>Right, but it's not twice as slow :-) Worse... It's at least 50-100 times
>slower.
>Weird.
>
>I will have to do some bench marks. I just thought that if there was
>already a string
>concatenation happening, that an additional concatenation would cost me
>twice as much
>time, but not 50-100 times. Plus, the second concatenation is a small
>concatenation.
>It's only two characters in length (#13#10) whereas the Line is much bigger.

Use a good profiler like valgrind/kcachegrind to see where the most time is
spend. Also from your code it is not clear what kind of strings are used.
There is a huge difference between shortstring and ansistrings regarding
performance.


Peter

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: Speed question for strings

L505

----- Original Message -----
From: "Peter Vreman" <[hidden email]>
To: "FPC-Pascal users discussions" <[hidden email]>
Sent: Saturday, November 19, 2005 1:47 AM
Subject: Re: [fpc-pascal] Speed question for strings


> At 09:46 19-11-2005, you wrote:
> > > L wrote:
> > > > Why is the first and third example *so* much faster than the second
> > example?
> > > > Significantly faster.
> > > >
> > >
> > > Because you doubled the number of string concatenations.
> > >
> > > Vincent.
> >
> >
> >Right, but it's not twice as slow :-) Worse... It's at least 50-100 times
> >slower.
> >Weird.
> >
> >I will have to do some bench marks. I just thought that if there was
> >already a string
> >concatenation happening, that an additional concatenation would cost me
> >twice as much
> >time, but not 50-100 times. Plus, the second concatenation is a small
> >concatenation.
> >It's only two characters in length (#13#10) whereas the Line is much bigger.
>
> Use a good profiler like valgrind/kcachegrind to see where the most time is
> spend.

Thanks for the tip, I downloaded a http://members.yline.com/~tom_at_work/ CPU unit to
try too. I think this is the tool we used to measure our UpperCase() optimization war
eariler this year, if you guys rememember.

> Also from your code it is not clear what kind of strings are used.
> There is a huge difference between shortstring and ansistrings regarding
> performance.
>

Ah, big thing I forgot to mention. It is the ansistring that I'm using in all the
functions. The speed I find extremely reasonable for Example 1 and Example 3, even
though this is ansistring (it's basically instant.). But for Example 2, I'm waiting
there for 10 seconds watching the command line stay idle.

In this case, optimization really is affected by a simple concatenation. Which really
intrigues me to learn more, because I usually never have bottleneck issues like this
that are so significant. You wouldn't think loading a file into a string could take
10 or so seconds by just changing one thing!

And, I was always under the impression that reading a file Line by Line was faster
than Char by Char.. but in this case, not if you add the line feeds within the loop!

--
L505


_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: Speed question for strings

L505
In reply to this post by Vincent Snijders

> The first string is always long, so you are doing twice adding a small
> string to a long string. This causes much heap fragmentation.
>
> Try result:= result + (Line + #13#10);
> or
>
> Line := Line + #13#10;
> Result := Result + Line;

Ahh that will probably do the trick. I'll let you know.

>
> OTOH using a TStringList is probably faster, because strings are only
> concatenated once.
>

It's funny you mention that.. because the entire point of this unit (StrWrap1) is to
reinvent a stringlist actually. The classes unit hauls in lots of kilobytes.. so I'm
reinventing simple stringlists in a procedural manner.

It is useful for CGI applications and etc. so that
 -uploading CGI application size is not so big.
 -Classes unit hauls in a lot of kilobytes even with smartlinking.
   (Not so important these days. But is with CGI)
 -StrLoadFile is cleaner than using free and create, and stringlist.text

So the code is more clear, but the advanced features that stringlists have such as
sorting are not available obviously. It's basically an alternative to stringlists, in
a more compact and cleaner manner for *some* situations (not all.. I still use
stringlists for the times when needed).

After I'm done fiddling with these string concatenations (and the results are in), I
will most likely load the file right into a string via one call (instead of char by
char or line by line). But without using a TMemoryStream. I'll be doing it the hard
way (no classes unit allowed).

--
L505

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal