File Enumeration speed

Steve Gatenby
I am enumerating through large numbers of files on my disk, and find I
can't come close with FindFirst / FindNext to matching the speed of
command-line apps available on Linux, e.g. ls / du.

I have a fairly tight file search function, and don't see how to gain
more speed.

Would anybody know what the limiting factors would be?

Does the operating system keep an index somewhere?

Thanks - SteveG
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal

Re: File Enumeration speed

Michael Van Canneyt


On Sat, 28 Jul 2012, SteveG wrote:

> I am enumerating thru large numbers of files on my disk, and find I cant come
> close with findfirst / findnext to matching the speed of cmd line apps
> available in linux :eg ls / du

A plain ls only makes getdents() calls.

FindFirst/FindNext also calls getdents(), but then additionally makes a stat() call for every file in the result.

> I have a fairly tight file search function, and dont see how to gain more
> speed
>
> Would anybody know what the limiting factors would be ?

The number of calls to stat() to get extended file information.

I suspect that if you do an ls -l, it will be as slow as FindFirst/FindNext, because it then makes 3 calls per file.

From strace ls -l /etc I get:

lstat("/etc/odbc.ini", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
lgetxattr("/etc/odbc.ini", "security.selinux", 0x14de920, 255) = -1 ENODATA (No data available)
getxattr("/etc/odbc.ini", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)

If you want speedier operation, and the file name alone is enough information, you can simply do a getdents().
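
For reference, a raw getdents loop might look roughly like the C sketch below. It assumes Linux specifically: it uses the getdents64 system call through syscall(), and the record layout is declared by hand following the getdents(2) manual page, since glibc has traditionally offered no wrapper. The function name count_entries_getdents is invented for this example.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Record layout returned by the Linux getdents64 system call. */
struct linux_dirent64 {
    uint64_t       d_ino;     /* inode number */
    int64_t        d_off;     /* offset to the next record */
    unsigned short d_reclen;  /* length of this record */
    unsigned char  d_type;    /* file type (DT_* values) */
    char           d_name[];  /* NUL-terminated file name */
};

/* Counts directory entries using getdents64 only (Linux-specific).
   Returns the number of entries, or -1 on error. */
long count_entries_getdents(const char *path)
{
    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;

    _Alignas(uint64_t) char buf[32768];
    long count = 0;

    for (;;) {
        long n = syscall(SYS_getdents64, fd, buf, sizeof buf);
        if (n < 0) {
            close(fd);
            return -1;
        }
        if (n == 0)         /* end of directory */
            break;

        /* Each call returns a batch of variable-length records. */
        for (long pos = 0; pos < n; ) {
            struct linux_dirent64 *e = (struct linux_dirent64 *)(buf + pos);
            count++;
            pos += e->d_reclen;
        }
    }

    close(fd);
    return count;
}
```

One system call here returns many entries at once, so the number of kernel round trips is far lower than with a stat() per file.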

> does the operating system keep an index somewhere ?

Normally not (at least nothing other than the regular disk cache).

Michael.

Re: File Enumeration speed

Steve Gatenby
On 28/07/12 19:58, Michael Van Canneyt wrote:

>
>
> On Sat, 28 Jul 2012, SteveG wrote:
>
>> I am enumerating thru large numbers of files on my disk, and find I
>> cant come close with findfirst / findnext to matching the speed of
>> cmd line apps available in linux :eg ls / du
>
> A regular ls only does a getdents() call.
>
> FindFirst/FindNext does a getdents, but then additionally, per file in
> the result, a stat() call.
>
>> I have a fairly tight file search function, and dont see how to gain
>> more speed
>>
>> Would anybody know what the limiting factors would be ?
>
> The number of calls to stat() to get extended file information.
>
> I suspect that if  you do a ls -l, it will be as slow as
> findfirst/findnext, because it does then 3 calls per file:
>
> from strace ls -l /etc I get:
>
> lstat("/etc/odbc.ini", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> lgetxattr("/etc/odbc.ini", "security.selinux", 0x14de920, 255) = -1
> ENODATA (No data available)
> getxattr("/etc/odbc.ini", "system.posix_acl_access", 0x0, 0) = -1
> EOPNOTSUPP (Operation not supported)
>
> If you want speedier operation, and have enough file information with
> the name, you can simply do a getdents().
>
>> does the operating system keep an index somewhere ?
>
> Normally not (at least other than the regular disc cache).
>
>
Thanks Michael - I will do some study (i.e. find out what getdents() does).


Re: File Enumeration speed

leledumbo
In reply to this post by Steve Gatenby
> does the operating system keep an index somewhere ?

ls doesn't, find does AFAIK.

Re: File Enumeration speed

etrusco
On Sat, Jul 28, 2012 at 9:07 AM, leledumbo <[hidden email]> wrote:
>> does the operating system keep an index somewhere ?
>
> ls doesn't, find does AFAIK.
>

"find" doesn't have any index over "ls". You may be thinking of
"locate" (actually "located") that keeps a textfile with the list of
files.

Best regards,
Flávio

Re: File Enumeration speed

Marco van de Voort
In reply to this post by Michael Van Canneyt
In our previous episode, Michael Van Canneyt said:
> lgetxattr("/etc/odbc.ini", "security.selinux", 0x14de920, 255) = -1 ENODATA (No data available)
> getxattr("/etc/odbc.ini", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
>
> If you want speedier operation, and have enough file information with the name, you can simply do a getdents().

I wouldn't do that for two reasons:

1) getdents is not portable at the syscall level (unix systems either implement
getdirentries or getdents, not both). This is why getdents is not available
via portable calls.
2) it is very sensitive.

Basically FindFirst is an opendir/readdir/closedir loop, where opendir and/or
readdir call getdents, with a stat per file.

In other words, one can avoid calling getdents directly by doing
opendir/readdir/closedir oneself, and be as fast AND portable.
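
A minimal C sketch of such a loop, which also shows one way to skip the stat in a common case. It relies on the d_type field of struct dirent, a BSD/Linux extension that is not in POSIX, so the fast path is an assumption about the platform; the code falls back to a stat-style call when the filesystem reports DT_UNKNOWN. The function name count_subdirs is invented for this example.

```c
#define _DEFAULT_SOURCE
#include <dirent.h>
#include <fcntl.h>      /* AT_SYMLINK_NOFOLLOW */
#include <string.h>
#include <sys/stat.h>

/* Counts the subdirectories of path with an opendir/readdir/closedir
   loop. Returns the count, or -1 if the directory cannot be opened. */
long count_subdirs(const char *path)
{
    DIR *d = opendir(path);
    if (d == NULL)
        return -1;

    long dirs = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        /* Skip the self and parent entries. */
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;

        int is_dir = 0;
        if (e->d_type == DT_DIR) {
            /* The type came with the directory entry: no stat needed. */
            is_dir = 1;
        } else if (e->d_type == DT_UNKNOWN) {
            /* Filesystem did not report a type: fall back to one stat. */
            struct stat st;
            if (fstatat(dirfd(d), e->d_name, &st, AT_SYMLINK_NOFOLLOW) == 0)
                is_dir = S_ISDIR(st.st_mode);
        }
        dirs += is_dir;
    }

    closedir(d);
    return dirs;
}
```

If sizes or timestamps are needed, the per-file stat cannot be avoided this way; the saving applies only when the name and the file type are enough.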
 
> > does the operating system keep an index somewhere ?
>
> Normally not (at least other than the regular disc cache).

The getdents call, AFAIK, simply returns blocks from the directory file. The
basic block-level cache of the filesystem driver therefore works perfectly
for it.