TSQLQuery and buffering.

classic Classic list List threaded Threaded
22 messages Options
12
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

TSQLQuery and buffering.

Gary Doades

Hi everyone,

 

Firstly, I realise this is an old subject, but I still can’t find any easy answer.

 

Really, the question is simple: Is there ANY way of TSQLQuery NOT reading the entire result set into memory?

 

This is really killing me. I’ve got quite a lot of web code using sqldb and it works really well. The classes for connection, transaction, query etc. are perfect for general use and make it so easy to write database applications. I find it excellent that I only need to change the connection class in my code to use a different database.

 

For web code (mine included), fetching a few dozen, hundred or even a few thousand rows is no real problem. The sqldb classes are *very* fast and very easy to use.

 

The problem comes when I need to write some apps to either analyse or move a lot of data. Initially all I need to do is analyse some data by simply reading through rows, doing some calculations and other analysis. For even 1 million rows this is very fast (10 seconds using MySQL) so no major problems. However, it does use quite a lot of memory.

 

Further use of this simply explodes. I have tables for analysis (and later extracting/moving) that are over 40 million rows. This currently is simply not possible in fpc, at least using the excellent sqldb classes. After blowing 16GB of RAM+swap it takes a while to recover.

 

To be fair I’ve tried looking through the code and see how it’s implemented, but I can’t see any easy way of avoiding the memory buffering of the entire result set.

 

Apart from re-implementing a lot of classes or going down to the driver level I can’t currently see any easy way round this. I know there is some old discussion on this, but I’m kind of hoping that there may have been a few changes or something recently that I’ve missed.

 

Any help appreciated.

 

Regards,

Gary.

 

 


_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: TSQLQuery and buffering.

Michael Van Canneyt


On Fri, 24 Mar 2017, Gary Doades wrote:

> Hi everyone,
>
> Firstly, I realise this is an old subject, but I still can't find any easy
> answer.
>
>
>
> Really, the question is simple: Is there ANY way of TSQLQuery NOT reading
> the entire result set into memory?

Set Unidirectional to True, and it will keep only 1 row in memory.
When you are simply scanning through the result set, this is all you need.

The problem is that most drivers also do not support arbitrary navigation
in the result set, so when we want to support navigating backwards, we have
no choice but to keep everything in memory. (barring re-running the query
and refetching everything, I'm sure you don't want that either)

If you need to move back and forth in the result set, then you 'll need to
set unidirectional to true, and store the needed records yourself.  You can
copy the structure of the dataset really easy to a TBufDataset using
copyfromdataset).

in the TBufDataset you can keep as many records as you like, and just delete the old
ones (you need to call mergechangelog or somesuch).
We could maybe try to find a way to copy the buffer from one
bufdataset to another (since TSQLQuery is just a TBufDataset Descendent)

Michael.
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: TSQLQuery and buffering.

Gary Doades

On Fri, 24 Mar 2017, Gary Doades wrote:
>
>> Really, the question is simple: Is there ANY way of TSQLQuery NOT
>> reading the entire result set into memory?

> Set Unidirectional to True, and it will keep only 1 row in memory.
> When you are simply scanning through the result set, this is all you need.

Hi Michael,

Many thanks for your reply. I had read about UniDirectional and I have indeed tried this. It doesn't seem to make any significant difference. Looking through the source code for TBufDataset it looks like UniDirectional just turns off building various indexes/structures etc. and fetches the result set all in one go instead of 10 row chunks. It still buffers everything in memory.

I can see this from running a test program. When I call the SQL query .Open method then I can see from Task Manager that it is fetching all rows from the database. My test program reaches about 1.7GB of ram fetching 4.8 million rows.

Only after the complete fetching does the next line of code execute and my processing loop start. Clearly this is looping over the now buffered dataset. The memory is finally freed when the Query's Close method is called.

It may be that UniDirectional is meant to not buffer all rows, but at the moment it certainly seems to. This is FPC 3.0.2 BTW. I've also tried setting ReadOnly to true on the SQL Query.

I Understand about the "normal" use of TSQLQuery and data sets and I also use this for small result set that need the data changing some way and writing back to the DB. What I need now is an equally convenient and powerful way of processing "big data".

Thanks,
Gary.



_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: TSQLQuery and buffering.

Martin Schreiber-2
On Saturday 25 March 2017 09:32:33 Gary Doades wrote:
>
> I Understand about the "normal" use of TSQLQuery and data sets and I also
> use this for small result set that need the data changing some way and
> writing back to the DB. What I need now is an equally convenient and
> powerful way of processing "big data".
>
MSEgui has TSQLResult for that purpose without the TDataset overhead.
https://gitlab.com/mseide-msegui/mseide-msegui/blob/master/lib/common/db/msesqlresult.pas

Martin
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: TSQLQuery and buffering.

Michael Van Canneyt


On Sat, 25 Mar 2017, Martin Schreiber wrote:

> On Saturday 25 March 2017 09:32:33 Gary Doades wrote:
>>
>> I Understand about the "normal" use of TSQLQuery and data sets and I also
>> use this for small result set that need the data changing some way and
>> writing back to the DB. What I need now is an equally convenient and
>> powerful way of processing "big data".
>>
> MSEgui has TSQLResult for that purpose without the TDataset overhead.
> https://gitlab.com/mseide-msegui/mseide-msegui/blob/master/lib/common/db/msesqlresult.pas
>

Can you explain in 2 words how this differs from TDataset ?
At first sight it is not fundamentally different.
The structure looks almost exactly the same, just the names are different.

And while you're at it, please explain what the 'overhead' is of TDataset.
That TBufDataset introduces overhead is a design decision. But TDataset
itself introduces no "overhead".

Michael.
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: TSQLQuery and buffering.

Michael Van Canneyt
In reply to this post by Gary Doades


On Sat, 25 Mar 2017, Gary Doades wrote:

>
> On Fri, 24 Mar 2017, Gary Doades wrote:
>>
>>> Really, the question is simple: Is there ANY way of TSQLQuery NOT
>>> reading the entire result set into memory?
>
>> Set Unidirectional to True, and it will keep only 1 row in memory.
>> When you are simply scanning through the result set, this is all you need.
>
> Hi Michael,
>
> Many thanks for your reply. I had read about UniDirectional and I have indeed tried this. It doesn't seem to make any significant difference. Looking through the source code for TBufDataset it looks like UniDirectional just turns off building various indexes/structures etc. and fetches the result set all in one go instead of 10 row chunks. It still buffers everything in memory.

In that case, it is a bug. The very purpose of UniDirectional is
exactly NOT to buffer anything, just to keep the current record in memory.

Michael.
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: TSQLQuery and buffering.

Gary Doades
In reply to this post by Martin Schreiber-2
>
> MSEgui has TSQLResult for that purpose without the TDataset overhead.
> https://gitlab.com/mseide-msegui/mseide-msegui/blob/master/lib/common/db/msesqlresult.pas

Thanks.

 Although it looks interesting I'd rather not get into another set of classes etc. at this point. I may come back to it later though!

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: TSQLQuery and buffering.

Gary Doades
In reply to this post by Michael Van Canneyt
>> I had read about UniDirectional and I have indeed tried this. It doesn't seem to make any significant difference. Looking through the source code for
>> TBufDataset it looks like UniDirectional just turns off building various indexes/structures etc. and fetches the result set all in one go instead of 10 row chunks.
>> It still buffers everything in memory.

> In that case, it is a bug. The very purpose of UniDirectional is exactly NOT to buffer anything, just to keep the current record in memory.

Ah, OK. I might have another trawl through the sqldb source then to see if it is a simple mistake I can fix or a bigger job.

I guess I should file a bug report?

Thanks,
Gary.


_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: TSQLQuery and buffering.

Martin Schreiber-2
In reply to this post by Michael Van Canneyt
On Saturday 25 March 2017 10:28:37 Michael Van Canneyt wrote:

> On Sat, 25 Mar 2017, Martin Schreiber wrote:
> > On Saturday 25 March 2017 09:32:33 Gary Doades wrote:
> >> I Understand about the "normal" use of TSQLQuery and data sets and I
> >> also use this for small result set that need the data changing some way
> >> and writing back to the DB. What I need now is an equally convenient and
> >> powerful way of processing "big data".
> >
> > MSEgui has TSQLResult for that purpose without the TDataset overhead.
> > https://gitlab.com/mseide-msegui/mseide-msegui/blob/master/lib/common/db/
> >msesqlresult.pas
>
> Can you explain in 2 words how this differs from TDataset ?
> At first sight it is not fundamentally different.
> The structure looks almost exactly the same, just the names are different.
>
Please follow the datatransfer from database to the destination in application
in T*Dataset and compare it with TSQLResult and you probably will see the
difference.

> And while you're at it, please explain what the 'overhead' is of TDataset.
> That TBufDataset introduces overhead is a design decision. But TDataset
> itself introduces no "overhead".
>
TDataset/TDataSource/TDatalink/TField is a complex machine, don't you think?
Follow the code of a TDataset.Next() call step by step and you will see what
I mean.

Martin
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: TSQLQuery and buffering.

Michael Van Canneyt
In reply to this post by Gary Doades


On Sat, 25 Mar 2017, Gary Doades wrote:

>>> I had read about UniDirectional and I have indeed tried this. It doesn't seem to make any significant difference. Looking through the source code for
>>> TBufDataset it looks like UniDirectional just turns off building various indexes/structures etc. and fetches the result set all in one go instead of 10 row chunks.
>>> It still buffers everything in memory.
>
>> In that case, it is a bug. The very purpose of UniDirectional is exactly NOT to buffer anything, just to keep the current record in memory.
>
> Ah, OK. I might have another trawl through the sqldb source then to see if it is a simple mistake I can fix or a bigger job.
>
> I guess I should file a bug report?

Please do. And if you find the cause, please attach a patch :)


Michael.
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: TSQLQuery and buffering.

Michael Van Canneyt
In reply to this post by Martin Schreiber-2
r

On Sat, 25 Mar 2017, Martin Schreiber wrote:

> On Saturday 25 March 2017 10:28:37 Michael Van Canneyt wrote:
>> On Sat, 25 Mar 2017, Martin Schreiber wrote:
>> > On Saturday 25 March 2017 09:32:33 Gary Doades wrote:
>> >> I Understand about the "normal" use of TSQLQuery and data sets and I
>> >> also use this for small result set that need the data changing some way
>> >> and writing back to the DB. What I need now is an equally convenient and
>> >> powerful way of processing "big data".
>> >
>> > MSEgui has TSQLResult for that purpose without the TDataset overhead.
>> > https://gitlab.com/mseide-msegui/mseide-msegui/blob/master/lib/common/db/
>> >msesqlresult.pas
>>
>> Can you explain in 2 words how this differs from TDataset ?
>> At first sight it is not fundamentally different.
>> The structure looks almost exactly the same, just the names are different.
>>
> Please follow the datatransfer from database to the destination in application
> in T*Dataset and compare it with TSQLResult and you probably will see the
> difference.

Looking at the above unit, I can only presume you load the field directly
from the cursor returned by the native library.

Makes sense.

>
>> And while you're at it, please explain what the 'overhead' is of TDataset.
>> That TBufDataset introduces overhead is a design decision. But TDataset
>> itself introduces no "overhead".
>>
> TDataset/TDataSource/TDatalink/TField is a complex machine, don't you think?
> Follow the code of a TDataset.Next() call step by step and you will see what
> I mean.

As far as I can make out, TSQLResult is usable only for scanning a result set.
TDataset has a lot more functions, hence has a lot more overhead.

But TSQLResult is an interesting idea; I've long been looking for something
more lightweight for use in web apps, where the typical use case is just
scan a result set and transfer it to the browser. I had thought to reuse
TSQLStatement and keep TField/TFieldDef so as not to invent too much new
classes. I'll maybe use your code as inspiration.

Thanks!

Michael.
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: TSQLQuery and buffering.

Jesus Reyes A.
In reply to this post by Gary Doades
En Sat, 25 Mar 2017 02:32:33 -0600, Gary Doades <[hidden email]>  
escribió:

> Many thanks for your reply. I had read about UniDirectional and I have  
> indeed tried this. It doesn't seem to make any significant difference.  
> Looking through the source code for TBufDataset it looks like  
> UniDirectional just turns off building various indexes/structures etc.  
> and fetches the result set all in one go instead of 10 row chunks. It  
> still buffers everything in memory.
>

 From your original message one could think that you were using  
PacketRecords:=-1 which means fetch all records at once, but if you are  
using the standard setting, which is 10 and yet it is still fetching  
everything at once, it sounds like a bug to me.

Jesus Reyes A.
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: TSQLQuery and buffering.

Martin Schreiber-2
In reply to this post by Michael Van Canneyt
On Sunday 26 March 2017 18:37:36 Michael Van Canneyt wrote:
>
>
> Looking at the above unit, I can only presume you load the field directly
> from the cursor returned by the native library.
>
Correct.
>
> As far as I can make out, TSQLResult is usable only for scanning a result
> set. TDataset has a lot more functions, hence has a lot more overhead.
>
Correct. Therefore MSEgui has TmseBufdataset/TmseSQLQuery when the whole
shebang is needed and TSQLResult when it is not. There is also the descendant
TifiSQLResult which can be used as source for TConnectedifiDatasource in
order to feed ifi-datapoints with DB-data. TSQLResult also can be used as
source for TSQLLookupBuffer.

Martin
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: TSQLQuery and buffering.

Gary Doades
In reply to this post by Jesus Reyes A.

> From your original message one could think that you were using
> PacketRecords:=-1 which means fetch all records at once, but if you are using the standard setting, which is 10 and yet it is still fetching everything at once, it
> sounds like a bug to me.

As far as I can tell setting UniDirectional to true also sets PacketRecords to -1. Even if PacketRecords were still 10 though it still buffers the entire result set, but it just fetches 10 rows at a time from the DB server until it has them all.

Thanks,
Gary.

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: TSQLQuery and buffering.

LacaK
In reply to this post by Gary Doades

>
> It may be that UniDirectional is meant to not buffer all rows, but at the moment it certainly seems to.
Strange, looking at source code it seems to me, that buffering should
not happen.
As far as TUniDirectionalBufIndex should be used and his AddRecord
method does not allocate new memory.

L.

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: TSQLQuery and buffering.

Gary Doades

>
>> It may be that UniDirectional is meant to not buffer all rows, but at the moment it certainly seems to.
> Strange, looking at source code it seems to me, that buffering should not happen.
> As far as TUniDirectionalBufIndex should be used and his AddRecord method does not allocate new memory.

Indeed, that's why I can't currently see where the problem lies. It shouldn't buffer the rows/records, but it does... or at least something does and I'm pretty sure it's not my program. I've pared my test program down to the bare minimum of just looping over the result set, but all the actual fetching from the DB and buffering occurs on just the Open call on the TSQLQuery.

Thanks,
Gary.


_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: TSQLQuery and buffering.

LacaK

>>> It may be that UniDirectional is meant to not buffer all rows, but at the moment it certainly seems to.
>> Strange, looking at source code it seems to me, that buffering should not happen.
>> As far as TUniDirectionalBufIndex should be used and his AddRecord method does not allocate new memory.
> Indeed, that's why I can't currently see where the problem lies. It shouldn't buffer the rows/records, but it does... or at least something does and I'm pretty sure it's not my program. I've pared my test program down to the bare minimum of just looping over the result set, but all the actual fetching from the DB and buffering occurs on just the Open call on the TSQLQuery.
Can you please attach your test program ? (most simplified version,
which I can use to test)
L.

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: TSQLQuery and buffering.

Martin Schreiber-2
In reply to this post by Gary Doades
On Monday 27 March 2017 10:20:20 Gary Doades wrote:

> >> It may be that UniDirectional is meant to not buffer all rows, but at
> >> the moment it certainly seems to.
> >
> > Strange, looking at source code it seems to me, that buffering should not
> > happen. As far as TUniDirectionalBufIndex should be used and his
> > AddRecord method does not allocate new memory.
>
> Indeed, that's why I can't currently see where the problem lies. It
> shouldn't buffer the rows/records, but it does... or at least something
> does and I'm pretty sure it's not my program.

The DB-client library maybe?

Martin
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: TSQLQuery and buffering.

Gary Doades
>
>> Indeed, that's why I can't currently see where the problem lies. It
>> shouldn't buffer the rows/records, but it does... or at least
>> something does and I'm pretty sure it's not my program.

> The DB-client library maybe?

Yup!

I was running some other tests and copied the table contents over to both postgres and MS SQL Server.

Postgres has the same issue, all rows are (seemingly) buffered on the TSQLQuery.Open and huge amounts of RAM used. However, using MS SQL Server almost no memory is used at all! All program results are identical and all I did in the program was swap out the TSQLConnection with different "drivers".

The problem therefore lies in either the Pascal layer on top of the native client libs or in the way the client libs themselves work :( I suspect the latter.

I will need to do some more investigation in a different way if I am to work round this one.

I apologise for potentially wasting people's time. It might be good for people to know the above issues (limitations?) anyway.

Thanks,
Gary.




_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: TSQLQuery and buffering.

Michael Van Canneyt


On Mon, 27 Mar 2017, Gary Doades wrote:

>>>> Indeed, that's why I can't currently see where the problem lies. It
>>> shouldn't buffer the rows/records, but it does... or at least
>>> something does and I'm pretty sure it's not my program.
>
>> The DB-client library maybe?
>
> Yup!
>
> I was running some other tests and copied the table contents over to both postgres and MS SQL Server.
>
> Postgres has the same issue, all rows are (seemingly) buffered on the
> TSQLQuery.Open and huge amounts of RAM used.  However, using MS SQL Server
> almost no memory is used at all!  All program results are identical and
> all I did in the program was swap out the TSQLConnection with different
> "drivers".
>
> The problem therefore lies in either the Pascal layer on top of the native client libs or in the way the client libs themselves work :( I suspect the latter.

As far as I know, the DB-Specific pascal layer does not buffer anything, it just fetches the result.
There is of course little to no control over how the client lib fetches the result.

If memory serves well, mySQL has 2 separate calls: mysql_use_result and mysql_store_result
where the difference is exactly how it fetches the result set.

Postgres probably fetches everything;
I don't see how PQGetValue can work index-based otherwise.

Fore interbase/firebird, I don't know.

>
> I will need to do some more investigation in a different way if I am to work round this one.
>
> I apologise for potentially wasting people's time. It might be good for people to know the above issues (limitations?) anyway.

Please report your findings, if any :)

Michael.
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
12
Loading...