JCL-1.98.1.2509 and FPC (Linux)

JCL-1.98.1.2509 and FPC (Linux)

Wayne Sherman-4
JCL-1.98.1.2509 came out fairly recently.  Does anyone know whether it
works with FPC on Linux?

Thanks,

Wayne
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal

Question on how to avoid memory trouble using FindFirst(), FindNext() and FindClose()

Cox, Stuart TRAN:EX
I am trying to EnumerateFiles (create a list of all files that match a
given filespec) across and down a whole drive.  I've directly used the
EnumerateFiles code from TurboPower's SysTools as available on
SourceForge.

The code works just as TurboPower designed it, and it and my wrapper code
run fine on some drives but fail miserably on the very one that I
would like to use the code on.  As the EnumerateFiles code attempts to
append one more filename to the Tstrings, it throws an exception and
dies from lack of memory.  The trouble is not in the memory used by the
Tstrings to hold the list of found files but, rather, in the way that
memory fragmentation occurs during the FindFirst(), FindNext() and
FindClose() calls.  This can be verified by eliminating the storage of
anything in the Tstrings list at all.  The routine will still die and in
the same place and at the same filename.

The error is not from a lack of stack space, either.  The error is
EOutOfMemory, which indicates a less easily solved problem.

Can anyone recommend a method to search a whole drive, of arbitrary
size, without running out of memory?

Stu Cox

Re: Question on how to avoid memory trouble using FindFirst(), FindNext() and FindClose()

Marco van de Voort
> Can anyone recommend a method to search a whole drive, of arbitrary
> size, without running out of memory.

I haven't seen SysTools, but I worked on analysing logfiles for a year. All
container types (TList, TObjectList and TStringList included) that have a
single array as internal data structure become prone to fragmentation or
slowdowns when the number of elements gets bigger.

As a rule of thumb, the trouble starts somewhere between 50k and 500k
elements.

So a simple plan would be:

- Analyse which TStringList functionality is really used, since there is a
  lot of functionality there.
- Take a more scalable container type and graft it into SysTools.
- Avoid TStringList in your own code, or at least mark such code in
  documentation and memory as a scalability risk :-)


RE: Question on how to avoid memory trouble using FindFirst(), FindNext() and FindClose()

Cox, Stuart TRAN:EX
Thanks for taking the time to offer help.

I don't believe that the memory is being used up by the TStringList,
since I stopped populating it at all.  Through the whole
run of examining the drive's directories it is never given a file.
Yet memory is still completely used up.

It seems instead to be consumed by FindFirst and its friends.  TurboPower's
EnumerateFiles uses FindClose appropriately, but I'll examine it closely.

I'll work some more to try to isolate what's happening.  
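
One way to check whether the FindFirst()/FindNext()/FindClose() calls
themselves are at fault is to run a traversal that stores nothing at all.
The following is a minimal sketch, not the SysTools EnumerateFiles code:
CountFiles and the program name are made up for illustration, and it assumes
FPC's SysUtils (TSearchRec, AllFilesMask). Every successful FindFirst is
paired with a FindClose in a try/finally block. Symbolic links are not
treated specially, so a link cycle on a *nix system could make it recurse
without end.

```pascal
program WalkTree;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Counts the files below Dir, recursing into subdirectories.
// Nothing is stored, so any memory growth comes from the search calls.
function CountFiles(const Dir: string): Int64;
var
  SR: TSearchRec;
  Base: string;
begin
  Result := 0;
  Base := IncludeTrailingPathDelimiter(Dir);
  if FindFirst(Base + AllFilesMask, faAnyFile, SR) = 0 then
  try
    repeat
      // Skip the self and parent entries, or the recursion never ends.
      if (SR.Name = '.') or (SR.Name = '..') then
        Continue;
      if (SR.Attr and faDirectory) <> 0 then
        Result := Result + CountFiles(Base + SR.Name)
      else
        Inc(Result);
    until FindNext(SR) <> 0;
  finally
    // Release the search handle even if an exception is raised.
    FindClose(SR);
  end;
end;

var
  Root: string;
begin
  if ParamCount >= 1 then
    Root := ParamStr(1)
  else
    Root := GetCurrentDir;
  WriteLn(CountFiles(Root), ' files under ', Root);
end.
```

If memory still climbs while this runs on the problem drive, the list
handling is ruled out; if it stays flat, the wrapper code is the more likely
suspect.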

Stu Cox

-----Original Message-----
From: [hidden email]
[mailto:[hidden email]] On Behalf Of Marco van
de Voort
Sent: Fri, February 2, 2007 1:24 PM
To: FPC-Pascal users discussions
Subject: Re: [fpc-pascal] Question on how to avoid memory troubleusing
FindFirst(), FindNext() and FindClose()

> Can anyone recommend a method to search a whole drive, of arbitrary
> size, without running out of memory.

I don't know seen SysTools, but I worked analysing logfiles for a year.
All containertypes (TList TObjectList and TstringList included) that
have a single array as internal datastructure become prone to
fragmentation or slowdowns when the number of elements get bigger.

A rule of thumb border is 50k-500k elements.

So a simple plan would be:

- analyse what functionality is really used of TStringlist, since there
is a
  lot of functionality there.
- Take a more scalable containertype and graft it into SysTools.
- Avoid TStringList in own code, or at least mark such code in
documentation and memory as scalability risk :-)


Re: Question on how to avoid memory trouble using FindFirst(), FindNext() and FindClose()

Michael Van Canneyt
In reply to this post by Marco van de Voort


On Fri, 2 Feb 2007, Marco van de Voort wrote:

> > Can anyone recommend a method to search a whole drive, of arbitrary
> > size, without running out of memory.
>
> I don't know seen SysTools, but I worked analysing logfiles for a year. All
> containertypes (TList TObjectList and TstringList included) that have a
> single array as internal datastructure become prone to fragmentation or
> slowdowns when the number of elements get bigger.

If you approximately know the number of elements in advance, you can
reduce the fragmentation to nearly zero if you set the capacity of the
list before filling it. It will also speed up the loading of the list.
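
A minimal sketch of presizing, assuming a rough element count is known
beforehand. FillList and ExpectedCount are illustrative names, and the
figure of 200000 is an arbitrary estimate, not a number from this thread.

```pascal
program PresizeList;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

const
  ExpectedCount = 200000; // assumed estimate, chosen only for illustration

// Fills a list with N dummy names; reserves room first when Presize is set.
function FillList(N: Integer; Presize: Boolean): TStringList;
var
  i: Integer;
begin
  Result := TStringList.Create;
  if Presize then
    Result.Capacity := N; // allocate the internal array once, up front
  for i := 1 to N do
    Result.Add('file' + IntToStr(i));
end;

var
  L: TStringList;
begin
  L := FillList(ExpectedCount, True);
  try
    WriteLn('Count = ', L.Count, ', Capacity = ', L.Capacity);
  finally
    L.Free;
  end;
end.
```

Without the Capacity assignment the internal array is reallocated repeatedly
as the list grows, which is where the fragmentation comes from.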

Michael.

Re: Question on how to avoid memory trouble using FindFirst(), FindNext() and FindClose()

Marco van de Voort
> On Fri, 2 Feb 2007, Marco van de Voort wrote:
>
> > > Can anyone recommend a method to search a whole drive, of arbitrary
> > > size, without running out of memory.
> >
> > I don't know seen SysTools, but I worked analysing logfiles for a year. All
> > containertypes (TList TObjectList and TstringList included) that have a
> > single array as internal datastructure become prone to fragmentation or
> > slowdowns when the number of elements get bigger.
>
> If you approximately know the number of elements in advance, you can
> reduce the fragmentation to nearly zero if you set the capacity of the
> list before filling it.

The reallocations at the end hurt the most. So if you guess slightly too low it
still fails.

> It will also speed up the loading of the list.

(not if it is sorted).


Re: Question on how to avoid memory trouble using FindFirst(), FindNext() and FindClose()

Michael Van Canneyt


On Sat, 3 Feb 2007, Marco van de Voort wrote:

> > On Fri, 2 Feb 2007, Marco van de Voort wrote:
> >
> > > > Can anyone recommend a method to search a whole drive, of arbitrary
> > > > size, without running out of memory.
> > >
> > > I don't know seen SysTools, but I worked analysing logfiles for a year. All
> > > containertypes (TList TObjectList and TstringList included) that have a
> > > single array as internal datastructure become prone to fragmentation or
> > > slowdowns when the number of elements get bigger.
> >
> > If you approximately know the number of elements in advance, you can
> > reduce the fragmentation to nearly zero if you set the capacity of the
> > list before filling it.
>
> The reallocations at the end hurt the most. So if you guess slightly too low it
> still fails.

Obviously, but it'll hurt less, since it grows by 25%; at most you'll get maybe
1 reallocation, which is acceptable.

>
> > It will also speed up the loading of the list.
>
> (not if it is sorted).

Yes it will, because the reallocations don't happen as often.
The sorting introduces an overhead anyway, whether you set capacity or not.

The correct procedure IMHO is
- Set capacity
- Load
- Sort

I tested such things with an N^3 algorithm for my daytime job, and the difference
is very noticeable.
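
The three steps can be sketched as follows, next to the alternative of
keeping the list sorted while loading. LoadThenSortList and LoadSorted are
hypothetical helper names, and the descending input order is chosen on
purpose as a bad case for the sorted variant, since every Add then lands at
index 0.

```pascal
program LoadThenSort;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

// Set capacity, load unsorted, sort once at the end.
function LoadThenSortList(N: Integer): TStringList;
var
  i: Integer;
begin
  Result := TStringList.Create;
  Result.Capacity := N;                 // 1. set capacity
  for i := N downto 1 do                // 2. load (descending on purpose)
    Result.Add(Format('%.8d', [i]));
  Result.Sort;                          // 3. sort
end;

// For comparison: the list keeps itself sorted, so every Add has to
// find the position and shift the entries behind it.
function LoadSorted(N: Integer): TStringList;
var
  i: Integer;
begin
  Result := TStringList.Create;
  Result.Sorted := True;
  for i := N downto 1 do
    Result.Add(Format('%.8d', [i]));
end;

var
  A, B: TStringList;
begin
  A := LoadThenSortList(20000);
  B := LoadSorted(20000);
  try
    WriteLn('Same first item: ', A[0] = B[0]);
    WriteLn('Same last item:  ', A[A.Count - 1] = B[B.Count - 1]);
  finally
    A.Free;
    B.Free;
  end;
end.
```

Both functions end with the same contents; timing them separately on
realistic data is the way to see how large the difference is in practice.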

All this doesn't exclude that a specialized class may be more suitable for the job.
I just want to illustrate that, if programmed correctly, TList, TStringList and
friends can still get you a long way...

Michael.

Re: Question on how to avoid memory trouble using FindFirst(), FindNext() and FindClose()

Marc Santhoff
In reply to this post by Cox, Stuart TRAN:EX
On Friday, 02.02.2007, at 08:52 -0800, Cox, Stuart TRAN:EX wrote:

> I am trying to EnumerateFiles (create a list of all files that match a
> given filespec) across and down a whole drive.  I've directly used the
> EnumerateFiles code from TurboPower's SysTools as available on
> SourceForge.
>
> The code works just as TurboPower designed and it and my wrapper code
> runs find on some drives but fails miserably on the very one that I
> would like to use the code on.  As the EnumerateFiles code attempts to
> append one more filename to the Tstrings, it throws an exception and
> dies from lack of memory.  The trouble is not in the memory used by the
> Tstrings to hold the list of found files but, rather, in the way that
> memory fragmentation occurs during the FindFirst(), FindNext() and
> FindClose() calls.  This can be verified by eliminating the storage of
> anything in the Tstrings list at all.  The routine will still die and in
> the same place and at the same filename.
>
> The error is not from a lack of stack space, either.  The error is
> EOutOfMemory that indicates a less easily solved problem.
>
> Can anyone recommend a method to search a whole drive, of arbitrary
> size, without running out of memory.

From reading this thread I think you must have another problem, likely
in TurboPower's or your own implementation.

I've been doing the same (listing deep file systems) and never had any
problems with memory. My classes are made mainly for indexing storage
and backup media (CD, DVD, ...) and I've tested it five minutes ago on
an amount of:

$ wc -l storage.txt
  152811 storage.txt

lines naming a file or directory each. The list class in use is a
descendant of "TFPList" (I think it's from "classes") with a new sorting
routine (qsort) attached.

I had some problems with the "FindXxxx"-implementation on *nix-like OS,
but that dealt with symlinks.

Happy bug hunting and memory profiling,
Marc



Re: Question on how to avoid memory trouble using FindFirst(), FindNext() and FindClose()

Marco van de Voort
In reply to this post by Michael Van Canneyt
> Yes it will, because the reallocations don't happen as often.
> The sorting introduces an overhead anyway, whether you set capacity or not.

Yes, but I was talking about slowness in general, not just from the heap.

And a TStringList with such a huge internal list has to move on average half
of the array for every insert. If TStringList had an extra indirection (an
array of pointers to blocks of pointers), it would move far less.
 
> The correct procedure IMHO is
> - Set capacity
> - Load
> - Sort

> I tested such things with an N^3 algorithm for my daytime job, and the
> difference is very noticeable.

With a single array or a multilevel one?
 
> All this doesn't exclude that a specialized class may be more suitable for
> the job.

To be honest, the only good point of TStringList seems to be that it is
available by default in every Object Pascal. The specialised stuff (splitting
strings) is also plagued with oddities (most notably the fact that
characters below space are always used as separators).

> I just want to illustrate that, if programmed correctly, TList,
> TStringList and friends can still get you a long way...

I think the lengths to which people will go to stick with them show how much
is needed before a serious effort to make them legacy can succeed.

There should be a set of container classes in a separate unit (a unit not
existing in Delphi most notably) that is Open Source and works on Delphi
too.


Re: Question on how to avoid memory trouble using FindFirst(), FindNext() and FindClose()

Marc Santhoff
In reply to this post by Marc Santhoff
On Saturday, 03.02.2007, at 16:36 +0100, Marc Santhoff wrote:
> On Friday, 02.02.2007, at 08:52 -0800, Cox, Stuart TRAN:EX wrote:
...

> > Can anyone recommend a method to search a whole drive, of arbitrary
> > size, without running out of memory.
>
> From reading this thread I think you must have another problem, likely
> in TurboPowers or your own implementation.
>
> I've been doing the same (listing deep file systems) and never had any
> problems with memory. My classes are made mainly for indexing storage
> and backup media (CD, DVD, ...) and I've tested it five minutes ago on
> an amount of:
>
> $ wc -l storage.txt
>   152811 storage.txt
>
> lines naming a file or directory each. The list class in use is a
> derivation of "TFPList" (think it's from "classes") with an new sorting
> routine (qsort) attached.
>
> I had some problems with the "FindXxxx"-implementation on *nix-like OS,
> but that dealt with symlinks.

Since you asked for a method, not the class type to use, I had a deeper
look at it:

My implementation does not stick everything into one list but uses a tree
of nested lists, one TFPList descendant for each directory at each level.

For every single file handled there is an item-class object holding file
info (name, size, ...) put into the directory (container-)list.

Works nice and fast ... only the recursion scheme seems to be somewhat
more complex.
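
A rough sketch of such a tree of nested lists, not the actual classes
described above: TFileItem, TDirNode and ScanDir are hypothetical names, and
plain TFPList instances stand in for the derived list class. Each directory
gets its own small lists, so no single array has to hold the whole drive.

```pascal
program DirTree;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

type
  // Hypothetical item class: one object per file.
  TFileItem = class
  public
    Name: string;
    Size: Int64;
  end;

  // Hypothetical container: one object per directory.
  TDirNode = class
  public
    Name: string;
    Files: TFPList;   // holds TFileItem
    SubDirs: TFPList; // holds TDirNode
    constructor Create(const AName: string);
    destructor Destroy; override;
    function TotalFiles: Integer;
  end;

constructor TDirNode.Create(const AName: string);
begin
  inherited Create;
  Name := AName;
  Files := TFPList.Create;
  SubDirs := TFPList.Create;
end;

destructor TDirNode.Destroy;
var
  i: Integer;
begin
  for i := 0 to Files.Count - 1 do
    TObject(Files[i]).Free;
  for i := 0 to SubDirs.Count - 1 do
    TObject(SubDirs[i]).Free; // frees the subtree recursively
  Files.Free;
  SubDirs.Free;
  inherited Destroy;
end;

function TDirNode.TotalFiles: Integer;
var
  i: Integer;
begin
  Result := Files.Count;
  for i := 0 to SubDirs.Count - 1 do
    Inc(Result, TDirNode(SubDirs[i]).TotalFiles);
end;

// Builds the node for Path and, recursively, for everything below it.
function ScanDir(const Path: string): TDirNode;
var
  SR: TSearchRec;
  Base: string;
  Item: TFileItem;
begin
  Result := TDirNode.Create(Path);
  Base := IncludeTrailingPathDelimiter(Path);
  if FindFirst(Base + AllFilesMask, faAnyFile, SR) = 0 then
  try
    repeat
      if (SR.Name = '.') or (SR.Name = '..') then
        Continue;
      if (SR.Attr and faDirectory) <> 0 then
        Result.SubDirs.Add(ScanDir(Base + SR.Name))
      else
      begin
        Item := TFileItem.Create;
        Item.Name := SR.Name;
        Item.Size := SR.Size;
        Result.Files.Add(Item);
      end;
    until FindNext(SR) <> 0;
  finally
    FindClose(SR);
  end;
end;

var
  Root: TDirNode;
begin
  Root := ScanDir(GetCurrentDir);
  try
    WriteLn(Root.TotalFiles, ' files under ', Root.Name);
  finally
    Root.Free;
  end;
end.
```

Symlinked directories are followed like ordinary ones here, so the symlink
caveat mentioned above applies to this sketch as well.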

HTH,
Marc



Re: Question on how to avoid memory trouble using FindFirst(), FindNext() and FindClose()

Michael Van Canneyt
In reply to this post by Marco van de Voort


On Sat, 3 Feb 2007, Marco van de Voort wrote:

> > Yes it will, because the reallocations don't happen as often.
> > The sorting introduces an overhead anyway, whether you set capacity or not.
>
> Yes, but I was talking about slowness in general, not just from the heap.
>
> And TStringList with those huge internal list has to move on avg half of the
> array. If TStringList had an extra indirection (array of ptr to blk of ptrs)
> it would be less so.

Eh ? What happens if you do a Insert(0,'SomeString') ? Don't you need to move
across all blocks ? Or will you just grow the first block ?

Anyway, it could be a nice idea to implement as TLargeStringList.

>  
> > The correct procedure IMHO is
> > - Set capacity
> > - Load
> > - Sort
>
> > I tested such things with an N^3 algorithm for my daytime job, and the
> > difference is very noticeable.
>
> With a single array or a multilevel one?

Multilevel.

>  
> > All this doesn't exclude that a specialized class may be more suitable for
> > the job.
>
> To be honest, the only good point of TStringList seems to be that it is
> default available on all Object Pascal. The specialised stuff (splitting
> strings) is also plagued with oddities. (most notably the fact that
> characters under space are always used as separator)

Why do you call this oddities ? For GUI programming (and that's what it was
implemented for) it makes perfect sense.

>
> > I just want to illustrate that, if programmed correctly, TList,
> > TStringList and friends can still get you a long way...
>
> I think that the lengths to which people will go to stick to them paints
> what is needed to make a serious effort to make them legacy.

I fail to see why, but no doubt you have your reasons. I'd like to see a complete
list of 'issues' with TStrings (not TStringList, that's just a specific implementation).

Maybe we can then create a TFPStrings which would be interface compatible,
(in the sense that it can replace TStrings without source changes. It can
have internal differences, like the space handling) but with a more sensible
behaviour ?

I mean, I have known for some time that you don't particularly like TStrings
and TList and whatnot (although I don't fully know why), but, by all means:
let's then make an effort and design something better.

Starting with a list of issues seems like a good beginning.

>
> There should be a set of container classes in a separate unit (a unit not
> existing in Delphi most notably) that is Open Source and works on Delphi
> too.

Like Decal was set up to be ?

I'm all for starting such a thing. I would be the first one to use such a beast if it was better.
It was the prime reason for implementing TFPlist.

Michael.

Re: Question on how to avoid memory trouble using FindFirst(), FindNext() and FindClose()

Marco van de Voort
> On Sat, 3 Feb 2007, Marco van de Voort wrote:
>
> > > Yes it will, because the reallocations don't happen as often.
> > > The sorting introduces an overhead anyway, whether you set capacity or not.
> >
> > Yes, but I was talking about slowness in general, not just from the heap.
> >
> > And TStringList with those huge internal list has to move on avg half of the
> > array. If TStringList had an extra indirection (array of ptr to blk of ptrs)
> > it would be less so.
>
> Eh ? What happens if you do a Insert(0,'SomeString') ? Don't you need to move
> across all blocks ? Or will you just grow the first block ?

Assume 1M elements.

Then in the TStringList case the array is 1M*sizeof(pointer) bytes. Inserting
a new node moves on average (1M*sizeof(pointer))/2 bytes. The overhead of the
index is only the heap manager overhead for the one big block.

Now assume (at most) 1k elements per block. That means we have an array of
1000 pointers, each to a block of 1000 pointers. Inserting means a binary
search in the first array to locate the block, and then there are two cases:

1. There is slack in the block: insert into the block; on average
   (1000*sizeof(pointer))/2 bytes are moved.
2. The block is full: you have to insert a new block into the top-level index
   (on average (1000*sizeof(pointer))/2 bytes) and then divide the old block's
   contents between the two (again roughly (1000*sizeof(pointer))/2 bytes).

If you do the splitting smartly, case 1 is far more common than case 2.

So that is 1k*sizeof(pointer)/2 vs 1M*sizeof(pointer)/2.
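
The two cases can be sketched as a small two-level list. This is only an
illustration under stated assumptions: TBlockList, MaxBlock and
BuildFrontLoaded are made-up names, the blocks are plain TFPList instances,
and the block is located by walking the index and subtracting block sizes
rather than by the binary search on keys described above.

```pascal
program BlockListSketch;
{$mode objfpc}{$H+}

uses
  Classes;

const
  MaxBlock = 1000; // maximum pointers per block, as in the example

type
  // Hypothetical two-level list: an index of blocks, each a small TFPList.
  TBlockList = class
  private
    FBlocks: TFPList;
  public
    constructor Create;
    destructor Destroy; override;
    procedure Insert(Index: Integer; Item: Pointer);
    function Get(Index: Integer): Pointer;
    function Count: Integer;
    function BlockCount: Integer;
  end;

constructor TBlockList.Create;
begin
  inherited Create;
  FBlocks := TFPList.Create;
  FBlocks.Add(TFPList.Create); // start with one empty block
end;

destructor TBlockList.Destroy;
var
  i: Integer;
begin
  for i := 0 to FBlocks.Count - 1 do
    TFPList(FBlocks[i]).Free;
  FBlocks.Free;
  inherited Destroy;
end;

function TBlockList.Count: Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 0 to FBlocks.Count - 1 do
    Inc(Result, TFPList(FBlocks[i]).Count);
end;

function TBlockList.BlockCount: Integer;
begin
  Result := FBlocks.Count;
end;

function TBlockList.Get(Index: Integer): Pointer;
var
  bi: Integer;
begin
  bi := 0;
  while Index >= TFPList(FBlocks[bi]).Count do
  begin
    Dec(Index, TFPList(FBlocks[bi]).Count);
    Inc(bi);
  end;
  Result := TFPList(FBlocks[bi])[Index];
end;

procedure TBlockList.Insert(Index: Integer; Item: Pointer);
var
  bi, i, Half: Integer;
  B, NewB: TFPList;
begin
  // Locate the block that holds position Index.
  bi := 0;
  while (bi < FBlocks.Count - 1) and (Index > TFPList(FBlocks[bi]).Count) do
  begin
    Dec(Index, TFPList(FBlocks[bi]).Count);
    Inc(bi);
  end;
  B := TFPList(FBlocks[bi]);
  if B.Count >= MaxBlock then
  begin
    // Case 2: block is full. Move its upper half into a new block
    // and insert that block into the top-level index.
    Half := B.Count div 2;
    NewB := TFPList.Create;
    for i := Half to B.Count - 1 do
      NewB.Add(B[i]);
    B.Count := Half;
    FBlocks.Insert(bi + 1, NewB);
    if Index > Half then
    begin
      Dec(Index, Half);
      B := NewB;
    end;
  end;
  // Case 1 (and after a split): only pointers inside one block move.
  B.Insert(Index, Item);
end;

// Inserts N items at position 0, the worst case for a single array.
function BuildFrontLoaded(N: Integer): TBlockList;
var
  i: Integer;
begin
  Result := TBlockList.Create;
  for i := 1 to N do
    Result.Insert(0, Pointer(PtrUInt(i)));
end;

var
  L: TBlockList;
begin
  L := BuildFrontLoaded(5000);
  try
    WriteLn('Count = ', L.Count, ', blocks = ', L.BlockCount);
  finally
    L.Free;
  end;
end.
```

Count and Get walk the index on every call, which keeps the sketch short; a
real class would more likely cache cumulative counts per block.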

> Anyway, it could be a nice idea to implement as TLargeStringList.

Yes, but I'd get rid of the string splitting stuff etc. IMHO they don't
belong in the same class.

> > > The correct procedure IMHO is
> > > - Set capacity
> > > - Load
> > > - Sort
> >
> > > I tested such things with an N^3 algorithm for my daytime job, and the
> > > difference is very noticeable.
> >
> > With a single array or a multilevel one?
>
> Multilevel.

Then it is not really tstringlist. Or do you mean tstringlists as major
index with tstringlists under each node? That is IMHO already a custom
class.

> > > All this doesn't exclude that a specialized class may be more suitable for
> > > the job.
> >
> > To be honest, the only good point of TStringList seems to be that it is
> > default available on all Object Pascal. The specialised stuff (splitting
> > strings) is also plagued with oddities. (most notably the fact that
> > characters under space are always used as separator)
>
> Why do you call this oddities ? For GUI programming (and that's what it was
> implemented for) it makes perfect sense.

The problem is that it is used for way more. It is a container type, splitter,
GUI data backstore etc. One can argue about how it was meant originally,
but this is the practice now. And a lot of code scales badly because of it.

> > > I just want to illustrate that, if programmed correctly, TList,
> > > TStringList and friends can still get you a long way...
> >
> > I think that the lengths to which people will go to stick to them paints
> > what is needed to make a serious effort to make them legacy.
>
> I fail to see why, but no doubt you have your reasons. I'd like to see a complete
> list of 'issues' with TStrings (not TStringList, that's just a specific implementation).

I think we would have to agree first on the target of TStrings and -list.
But the main reason is simply bad scaling. I want to be able to use a
container type, and not first wrap it, or risk scaling issues later.

A language/RTL should provide such a type, and not position a type originally
meant for GUI bindings, with internal limits, as the base type for data
structures.

Now lots of new machines have memories that easily allow such large
data structures.

The end conclusion can also be we need to promote e.g. contnrs types more. My
remark here was more the backwards delphi compat (e.g. FPC extensions should
not be in contnrs, but in a different unit that also compiles on Delphi)

> Maybe we can then create a TFPStrings which would be interface compatible,
> (in the sense that it can replace TStrings without source changes. It can
> have internal differences, like the space handling) but with a more sensible
> behaviour ?

IMHO
- linesplitting and container function should be separated.
- GUI and non GUI interfacing (data structures) should be separated.
- There should be default containers that are not vectors (there are some in
contnrs). IOW more java style iterators.

> > There should be a set of container classes in a separate unit (a unit not
> > existing in Delphi most notably) that is Open Source and works on Delphi
> > too.
>
> Like Decal was set up to be ?

More or less yes, with one big problem here. Decal abused interfaces and
variants as a way around missing generics, causing larger performance
problems than it solved.

However, generics support in FPC won't solve it either, since TStringList and
TList will still be the lowest common denominator. And a lot of programmers
need to keep Delphi compatibility.  Some coordination to keep generic and
non-generic types congruent might be wise.
 
> I'm all for starting such a thing. I would be the first one to use such a beast if it was better.
> It was the prime reason for implementing TFPlist.

...Which is also in a unit that you can't simply USE in Delphi.


Re: Question on how to avoid memory trouble using FindFirst(), FindNext() and FindClose()

Luiz Americo Pereira Camara-2
In reply to this post by Michael Van Canneyt
Michael Van Canneyt wrote:

>>> All this doesn't exclude that a specialized class may be more suitable for
>>> the job.
>>>      
>> To be honest, the only good point of TStringList seems to be that it is
>> default available on all Object Pascal. The specialised stuff (splitting
>> strings) is also plagued with oddities. (most notably the fact that
>> characters under space are always used as separator)
>>    
>
> Why do you call this oddities ? For GUI programming (and that's what it was
> implemented for) it makes perfect sense.
>  
Early on I thought the space handling in DelimitedText was something with
only very few uses. Later I found that it is useless.

Take the following scenario: you want to load a CSV line into a TStrings.
To add a string containing a space through DelimitedText, you have to
enclose it in quotes.
For example: if you want the CSV string 'aaaa;bb bb;cccc' to become a
TStrings with 3 items (aaaa, bb bb and cccc), you have to transform
'aaaa;bb bb;cccc' into 'aaaa;"bb bb";cccc'.
But to do this in a generic way, without knowing whether a given value
contains a space, you have to parse each item of the CSV, check whether it
has a space and then quote it. In that case it is better to add the string
directly through the Add method.

>>> I just want to illustrate that, if programmed correctly, TList,
>>> TStringList and friends can still get you a long way...
>>>      
>> I think that the lengths to which people will go to stick to them paints
>> what is needed to make a serious effort to make them legacy.
>>    
>
> I fail to see why, but no doubt you have your reasons. I'd like to see a complete
> list of 'issues' with TStrings (not TStringList, that's just a specific implementation).
>
> Maybe we can then create a TFPStrings which would be interface compatible,
> (in the sense that it can replace TStrings without source changes. It can
> have internal differences, like the space handling) but with a more sensible
> behaviour ?
>  
Good idea, i think the same.

My suggested issue to fix is the space handling in DelimitedText. See above.
I have a small function that parses CSV strings and adds them to a TStrings,
but it uses strutils.PosEx.
In terms of flexibility, and to accommodate those who want to quote a
string with a space (or whatever), an event could be added that is called for
each string added through DelimitedText. But since we are talking about
efficiency, I don't think that is a good idea.
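
A function of that kind might look like the sketch below. It is not the
function mentioned above, only an illustration: SplitToStrings is a made-up
name, it splits on the delimiter alone, keeps spaces inside fields, and does
not interpret quotes at all.

```pascal
program SplitCsv;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils, StrUtils;

// Splits S on Delim and appends every field to Dest unchanged.
// Returns the number of fields added.
function SplitToStrings(const S: string; Delim: Char; Dest: TStrings): Integer;
var
  StartPos, DelimPos: Integer;
begin
  Result := 0;
  StartPos := 1;
  repeat
    DelimPos := PosEx(Delim, S, StartPos);
    if DelimPos = 0 then
      DelimPos := Length(S) + 1; // last field runs to the end of S
    Dest.Add(Copy(S, StartPos, DelimPos - StartPos));
    Inc(Result);
    StartPos := DelimPos + 1;
  until StartPos > Length(S) + 1;
end;

var
  L: TStringList;
  i: Integer;
begin
  L := TStringList.Create;
  try
    SplitToStrings('aaaa;bb bb;cccc', ';', L);
    for i := 0 to L.Count - 1 do
      WriteLn(i, ': [', L[i], ']');
  finally
    L.Free;
  end;
end.
```

For the example string this yields the three items aaaa, bb bb and cccc
without any quoting step. An empty input gives one empty field, and a
trailing delimiter gives an empty last field.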

Luiz

Re: Question on how to avoid memory trouble using FindFirst(), FindNext() and FindClose()

Michael Van Canneyt
In reply to this post by Marco van de Voort


On Sat, 3 Feb 2007, Marco van de Voort wrote:

> > On Sat, 3 Feb 2007, Marco van de Voort wrote:
> >
> > > > Yes it will, because the reallocations don't happen as often.
> > > > The sorting introduces an overhead anyway, whether you set capacity or not.
> > >
> > > Yes, but I was talking about slowness in general, not just from the heap.
> > >
> > > And TStringList with those huge internal list has to move on avg half of the
> > > array. If TStringList had an extra indirection (array of ptr to blk of ptrs)
> > > it would be less so.
> >
> > Eh ? What happens if you do a Insert(0,'SomeString') ? Don't you need to move
> > across all blocks ? Or will you just grow the first block ?
>
> Assume 1M elements.
>
>
> Then the tstringlist case is 1M*sizeof(pointer) bytes. Inserting a new node on avg moves
> (1M*sizeof(pointer))/2 bytes. Overhead of index is only heapmgr overhead for the big
> block.
>
> Assume and (max) 1k elements per block. Means we have an array of 1000
> pointers, each to a block of 1000 ptrs. Inserting means a binsearch in the
> first array to locate the block, and then there are two cases:
>
> 1 there is slack in the block: insert into the block, on avg
>   (1000*sizeof(pointer)/2) bytes are moved.
> 2 the block is full, you have to insert a new block in the toplevel index,
>    ( avg 1000*sizeof(pointer)/2) and then redivide the block between the two
> (again about 1000*sizeof(pointer)/2 roughly.
>
> If you do the splitting smart, 1 is way more common than 2.
>
> So that is 1k*sizeof(pointer)/2 vs  1M*sizeof(pointer)/2.

It's what I thought, yes.

>
> > Anyway, it could be a nice idea to implement as TLargeStringList.
>
> Yes, but I'd get rid of the string splitting stuff etc. IMHO they don't
> belong in the same class.

Why, they make coding much easier ?

I found myself having 10-20 custom functions, all accepting a TStrings + some args.
The coding is easier if it's just a method of TStrings.

But there is a way around that; Encapsulation.


> > > > The correct procedure IMHO is
> > > > - Set capacity
> > > > - Load
> > > > - Sort
> > >
> > > > I tested such things with an N^3 algorithm for my daytime job, and the
> > > > difference is very noticeable.
> > >
> > > With a single array or a multilevel one?
> >
> > Multilevel.
>
> Then it is not really tstringlist. Or do you mean tstringlists as major
> index with tstringlists under each node? That is IMHO already a custom
> class.

I don't consider that a custom class, I didn't have to declare extra classes for it :-)

>
> > > > All this doesn't exclude that a specialized class may be more suitable for
> > > > the job.
> > >
> > > To be honest, the only good point of TStringList seems to be that it is
> > > default available on all Object Pascal. The specialised stuff (splitting
> > > strings) is also plagued with oddities. (most notably the fact that
> > > characters under space are always used as separator)
> >
> > Why do you call this oddities ? For GUI programming (and that's what it was
> > implemented for) it makes perfect sense.
>
> The problem is that it is used for way more. It is container type, splitter,
> GUI data backstore etc. One can argue about how it is was meant originally,
> but this is the practice now. And a lot of code scales bad because of it.

Ok, that means we just have to separate out the container type.

>
> > > > I just want to illustrate that, if programmed correctly, TList,
> > > > TStringList and friends can still get you a long way...
> > >
> > > I think that the lengths to which people will go to stick to them paints
> > > what is needed to make a serious effort to make them legacy.
> >
> > I fail to see why, but no doubt you have your reasons. I'd like to see a complete
> > list of 'issues' with TStrings (not TStringList, that's just a specific implementation).
>
> I think we would have to agree first on the target of TStrings and -list.
> But the main reason is simply bad scaling. I want to be able to use a
> container type, and not first wrap it, or risk scaling issues later.
>
> A language/RTL should provide such type, and not position a type originally
> meant for GUI bindings with internal limits as basetype for datastructures.
>
> Not lots of new machines have memories that easily allow such large
> datastructures.
>
> The end conclusion can also be we need to promote e.g. contnrs types more. My
> remark here was more the backwards delphi compat (e.g. FPC extensions should
> not be in contnrs, but in a different unit that also compiles on Delphi)

I understand.

>
> > Maybe we can then create a TFPStrings which would be interface compatible,
> > (in the sense that it can replace TStrings without source changes. It can
> > have internal differences, like the space handling) but with a more sensible
> > behaviour ?
>
> IMHO
> - linesplitting and container function should be separated.

You can do that using encapsulation, see TList/TFPList.
Tlist adds the notification mechanism, which I've never
used in any code I wrote, but which exists in Delphi
and slows things down, as there is no way to switch it off.

I would say a TFPStringList/TExtStringList where TFPStringList only
provides storage, and TExtStringList provides all the rest, but  uses
TFPStringList for storage.

(names are not meant as binding, just examples to show the point)

> - GUI and non GUI interfacing (data structures) should be separated.

I'm afraid I don't understand what you mean with this ?

> - There should be default containers that are not vectors (there are some in
> contnrs). IOW more java style iterators.

I proposed exactly that in the discussion about Generics.

>
> > > There should be a set of container classes in a separate unit (a unit not
> > > existing in Delphi most notably) that is Open Source and works on Delphi
> > > too.
> >
> > Like Decal was set up to be ?
>
> More or less yes, with one big problem here. Decal abused interfaces and
> variants as a way around missing generics, causing larger performance
> problems than it solved.
>
> However generics support in FPC won't solve it either, since then TStringList and TList will
> still be the lowest common denomitor. And a lot of programmers need to keep
> Delphi compat.  Some coordination to have some congruency between generic
> and not generic types might be wise.
>  
> > I'm all for starting such a thing. I would be the first one to use such a beast if it was better.
> > It was the prime reason for implementing TFPlist.
>
> ...Which is also in a unit that you can't simply USE in Delphi.

It isn't meant to be.

I want people to move to FPC because it actually provides better algorithms,
while still enabling them to compile their old Delphi code. We must have some
unique selling proposition.

Like a well-known man once said: the goal is world domination. ;-)

Seriously: I'm all for providing better base classes.
As long as they're not generics based, it's fine with me.

Michael.