Multi-threaded project with few locks (no Thread.waitfor). Memory consumption keeps increasing on Ubuntu 10.10 x64

Andrew Brunner
After reading up on multi-core systems and memory barriers, I decided
to do a few FPC stress tests on a brand-new 6-core x64 machine.
I tested this app on the Ubuntu 10.10 x64 desktop release, using today's
FPC/Lazarus from svn.

The test case (attached) requires that you initialise the engine before
using it. The destroy button terminates the engine and all threads. The
engine can be commanded to start, stop, and set the thread count.

If you initialise and start, you will see memory go up as expected.
The problem is that when you destroy the engine instance and then
initialise and start again, not all memory is reclaimed.

uThreads.TTestThread.Execute loops 0.5 billion times, allocating
random string entries in a TStringArray. Inside the loop the engine
calls Empty(StringArray) to empty the simulated log entry and reclaim
its memory. The Empty procedure calls Finalize on each element of the
string array and then SetLength(list, 0), to be sure there are no leaks
there.
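A minimal sketch of what the Empty procedure described above might look like, assuming TStringArray is a dynamic array of string (uThreads.pas itself is not shown here, so this is a hypothetical reconstruction, not the author's code). Note that SetLength(List, 0) on a dynamic array of managed types already finalizes the discarded elements, so the explicit Finalize loop is extra caution rather than a requirement.

```pascal
program EmptyDemo;

{$mode objfpc}{$H+}

uses
  SysUtils;

type
  TStringArray = array of string;  // assumption: dynamic array of AnsiString

// Hypothetical reconstruction of Empty(): release every element, then the array.
procedure Empty(var List: TStringArray);
var
  I: Integer;
begin
  for I := 0 to High(List) do
    Finalize(List[I]);  // drop this element's string reference
  SetLength(List, 0);   // free the array storage (this alone would also finalize)
end;

var
  Log: TStringArray;
  I: Integer;
begin
  SetLength(Log, 1000);
  for I := 0 to High(Log) do
    Log[I] := 'log entry ' + IntToStr(Random(1000000));
  Empty(Log);
  WriteLn('Entries left: ', Length(Log));
end.
```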

The problem is noticeable when you create and destroy the engine over
and over again. I used the Task Manager in Ubuntu for system resource
monitoring. It shows the ThreadMemTest app consuming more and more
resident memory each time the system is initialised. Starting and
stopping the engine also increases it a bit, but not nearly as badly.

The application runs stably. It can be started, stopped, freed, and
created on the fly, and when you exit, the program closes gracefully.
I don't get any read access violations, meaning code execution order
is certainly not a suspect here.

This test case illustrates a FPC memory leak. What I don't get is:
where are the access violations? Why is Linux reporting that memory as
still in use?

The interesting thing I have noticed is that arrays (Arrays[n]) of
boolean can be used without memory barriers. There is not one lock
associated with the boolean arrays, and it always proves
non-problematic on a 6-core system with 4 GB of RAM. I put boolean
value checks inside the loops to see whether any values were assigned
out of order, and over hours of tests with up to 1200 threads... not
one false positive!

Any memory sleuths interested in helping? Is anyone interested in
helping me isolate the memory leak so I can submit a bug report? Does
anyone have comments on this test case, which I'd like to submit?

If you could, please compile this on your box and report back whether
you get any read access violations, or suggest ways to work around or
resolve the memory issue I'm seeing here. Since the program was written
to determine the stability of FPC-compiled apps, I didn't have time to
add logging features or anything like that.

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal

ThreadMemTest.zip (9K)

Re: Multi-threaded project with few locks (no Thread.waitfor). Memory consumption keeps increasing on Ubuntu 10.10 x64

Florian Klämpfl
On 13.10.2010 00:51, Andrew Brunner wrote:
>
> This test case illustrates a FPC memory leak.  

What makes you think so? Internally freed memory is not immediately
released to the OS either.
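To see Florian's point in practice, here is a sketch (not part of the original test case) that reads FPC's own heap statistics with GetFPCHeapStatus from the System unit before and after freeing a batch of strings. CurrHeapUsed (bytes in live allocations) should drop after the free, while CurrHeapSize (bytes the heap manager still holds from the OS) may not shrink by the same amount; that retained memory is the kind a process monitor keeps counting against the process. This assumes the default FPC heap manager, so run it without cmem.

```pascal
program HeapStatusDemo;

{$mode objfpc}{$H+}

const
  Count = 100000;

var
  Strings: array of string;
  I: Integer;
  AfterAlloc, AfterFree: TFPCHeapStatus;
begin
  SetLength(Strings, Count);
  for I := 0 to Count - 1 do
    Strings[I] := StringOfChar('x', 100);  // 100,000 separate heap blocks
  AfterAlloc := GetFPCHeapStatus;

  SetLength(Strings, 0);                   // frees every string and the array
  AfterFree := GetFPCHeapStatus;

  // CurrHeapUsed: bytes in live allocations.
  // CurrHeapSize: bytes the heap manager currently holds from the OS.
  WriteLn('CurrHeapUsed: ', AfterAlloc.CurrHeapUsed, ' -> ', AfterFree.CurrHeapUsed);
  WriteLn('CurrHeapSize: ', AfterAlloc.CurrHeapSize, ' -> ', AfterFree.CurrHeapSize);
end.
```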

Re: Multi-threaded project with few locks (no Thread.waitfor). Memory consumption keeps increasing on Ubuntu 10.10 x64

Andrew Brunner
Since the number of threads is set at compile time, I didn't expect to
see that much memory creep. The resource monitor shows about 15 MB of
memory used by the process. When the first group of 50 threads was
created and run, that climbed by another 5 MB to about 20 MB. Then every
time the system created, destroyed, and re-created itself, it added
another 100 KB.

So I ran an automated version of this last night. I was expecting to
see the system well into hundreds of MB of RAM and the virtual memory
way past the reported 3 GB. That was not the case. Physical memory
stopped creeping up at 45 MB, and virtual memory peaked at 4 GB over
the course of the entire night, with threads being created, destroyed,
and re-created over and over again. The program also exited gracefully
this morning.

>> This test case illustrates a FPC memory leak.

I realised when I wrote that that it may not be entirely accurate,
which is why I ran the automation all night to see how bad things might
get. It turns out that things remained stable and the creep levelled
out, but it was very strange to see.

On Wed, Oct 13, 2010 at 1:58 AM, Florian Klaempfl
<[hidden email]> wrote:
> On 13.10.2010 00:51, Andrew Brunner wrote:
>>
>> This test case illustrates a FPC memory leak.
>
> What makes you think so? Internally freed memory is not immediately
> released to the OS as well.

On Windows I know there is a call to flush all virtual memory out. Is
there a wrapper in FPC for both *nix and Windows?

Re: Multi-threaded project with few locks (no Thread.waitfor). Memory consumption keeps increasing on Ubuntu 10.10 x64

Andrew Brunner
In reply to this post by Andrew Brunner
On Tue, Oct 12, 2010 at 5:51 PM, Andrew Brunner
<[hidden email]> wrote:

Another problem this application demonstrates is that thread creation
is a limiting factor. I'd like to make a complaint using this code as
well. Change the number of threads to 3000: the system gets to about
1,000 and starts to "bog down". I have a significantly fast computer
here, and I can tell that thread creation is not supposed to be this
slow.

Under Delphi 2006 (Windows) I was able to create up to 3000 threads,
and I recall stories of other programs in Java running well past 3000.
Why does FPC handle threads in a way that makes creation slow down the
more threads you have? As they approach 1,000, things slow to a stall.

Can someone explain the difference? It seems to me that either FPC is
tracking a list of threads, or the memory manager is using locks that
the other threads contend for, causing a significant reduction in
performance compared with competing development platforms (such as C#,
Java, or Delphi).

I'd like to get this resolved because my main system will potentially
have tens of thousands of threads on serious hardware.

Re: Multi-threaded project with few locks (no Thread.waitfor). Memory consumption keeps increasing on Ubuntu 10.10 x64

Jonas Maebe-2
In reply to this post by Andrew Brunner

On 13 Oct 2010, at 00:51, Andrew Brunner wrote:

> The interesting thing I have noticed was that Arrays[n] of boolean can
> be used without memory barriers.  There is not one lock associated
> with the boolean arrays and it always proves non-problematic on a 6
> core system with 4gig ram.  There are boolean value checks that I did
> inside the loops to see if any values were assigned out-of-order and
> over the hours of tests I ran across up to 1200 threads... not one
> false positive!

See also http://en.wikipedia.org/wiki/Memory_ordering#cite_note-table-2
for an overview of what kinds of memory reordering are performed by
different architectures. It shows that x86 CPUs perform only one kind
of memory reordering (unless the CPU supports oostore mode and is
explicitly put into it). The reordering it performs by default can
execute stores that come before a load in the program code after that
load instead. This means that if you use a regular variable (such as a
boolean) for synchronisation:

1) on entry of the "critical section" protected by this variable, you  
can have problems, because this sequence:

locked:=true;
local:=shared_global_var;

may actually be executed in this order:

local:=shared_global_var;
locked:=true;

So you can get speculative reads into the "critical section"

2) when exiting the "critical section", there are no problems, because  
none of the loads or stores before the one that sets the boolean  
"lock" variable to false, can be moved past that store.


In summary, the fact that a particular program runs fine on your  
particular machine does not mean anything:
a) your particular machine may not perform any kind of reordering that  
results in problems
b) your particular program may not expose any kind of reordering that  
results in problems

That does not mean that automatically the program "can be used without  
memory barriers". It is virtually impossible to prove correctness of  
multi-threaded code running on multi-cores through testing, and it is  
literally impossible to prove it for all possible machines by testing  
on a single machine (even if that machine has 4096 cores and runs  
16000 threads), simply because other machines may use different memory  
consistency models.


Jonas
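Jonas's point 1 is exactly why Peterson's classic two-thread lock, which uses nothing but plain boolean flags, still needs a full barrier between its flag stores and the following loads on x86. The sketch below is the textbook algorithm, not code from this thread; it uses FPC's ReadWriteBarrier as the full fence. Without the fence after the stores, the store-to-load reordering described above could let both threads see the other's flag as false and enter at the same time.

```pascal
program PetersonDemo;

{$mode objfpc}{$H+}

uses
  {$ifdef unix}cthreads,{$endif}
  Classes;

const
  Iterations = 100000;

var
  Flag: array[0..1] of Boolean;  // Flag[i]: thread i wants to enter
  Turn: LongInt;                 // which thread has to wait if both want in
  Counter: LongInt;              // shared data guarded by the lock

procedure Lock(Me: LongInt);
var
  Other: LongInt;
begin
  Other := 1 - Me;
  Flag[Me] := True;
  ReadWriteBarrier;    // keep the two stores in program order on any CPU
  Turn := Other;
  // Full fence: the loads below must not be performed before the stores
  // above are visible (the store->load reordering that x86 does allow).
  ReadWriteBarrier;
  while Flag[Other] and (Turn = Other) do
    ReadWriteBarrier;  // the call also forces fresh reads of Flag/Turn
  ReadWriteBarrier;    // critical-section accesses stay after the wait
end;

procedure Unlock(Me: LongInt);
begin
  ReadWriteBarrier;    // critical-section accesses complete before release
  Flag[Me] := False;
end;

type
  TWorker = class(TThread)
  private
    FId: LongInt;
  protected
    procedure Execute; override;
  public
    constructor Create(AId: LongInt);
  end;

constructor TWorker.Create(AId: LongInt);
begin
  FId := AId;
  inherited Create(False);
end;

procedure TWorker.Execute;
var
  I: Integer;
begin
  for I := 1 to Iterations do
  begin
    Lock(FId);
    Inc(Counter);      // plain read-modify-write, safe only while locked
    Unlock(FId);
  end;
end;

var
  A, B: TWorker;
begin
  A := TWorker.Create(0);
  B := TWorker.Create(1);
  A.WaitFor;
  B.WaitFor;
  A.Free;
  B.Free;
  WriteLn('Counter = ', Counter, ' (expected ', 2 * Iterations, ')');
end.
```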

Re: Multi-threaded project with few locks (no Thread.waitfor). Memory consumption keeps increasing on Ubuntu 10.10 x64

Michael Van Canneyt
In reply to this post by Andrew Brunner


On Wed, 13 Oct 2010, Andrew Brunner wrote:

> On Tue, Oct 12, 2010 at 5:51 PM, Andrew Brunner
> <[hidden email]> wrote:
>
> Another problem demonstrated with this application is the limiting
> factor of thread creation.  I'd like to make a complaint using this
> code as well.  Change the number of threads to 3000.  The system gets
> to about 1,000 and starts to "bog down".  I have a significantly fast
> computer here and I can tell that thread creation is not supposed to
> be this slow.
>
> Under delphi 2006 (windows) I was able to create up to 3000 threads.
> I recall stories of other programs in Java running well past that
> 3000.  Why does fpc handle threads in a way that causes creation to
> slow down the more you have (as they approach 1,000) things slow to a
> stall.

Probably because it uses a heap manager per thread.

You may try to use 'cmem', which will replace the heap manager with the C
memory manager (one for the whole app, not per thread). That will allow you
to test this hypothesis.

Michael.
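A minimal sketch of the uses-clause ordering Michael describes: cmem has to be the very first unit so that nothing is allocated through the FPC heap manager before it is replaced, with cthreads following it on Unix. The thread and the string work are illustrative only.

```pascal
program CMemOrderDemo;

{$mode objfpc}{$H+}

uses
  cmem,                            // first: swap the FPC heap manager for libc malloc/free
  {$ifdef unix}cthreads,{$endif}   // then: the pthread-based thread manager
  Classes, SysUtils;

var
  TotalLen: Integer = 0;

type
  TAllocThread = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TAllocThread.Execute;
var
  I: Integer;
begin
  // Every IntToStr result is a heap-allocated string; with cmem these
  // allocations go through the single process-wide C allocator.
  for I := 1 to 1000 do
    Inc(TotalLen, Length(IntToStr(I)));
end;

var
  T: TAllocThread;
begin
  T := TAllocThread.Create(False);
  T.WaitFor;
  T.Free;
  WriteLn('Total characters: ', TotalLen);
end.
```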

Re: Multi-threaded project with few locks (no Thread.waitfor). Memory consumption keeps increasing on Ubuntu 10.10 x64

Jonas Maebe-2
In reply to this post by Jonas Maebe-2

On 13 Oct 2010, at 15:27, Jonas Maebe wrote:

> 1) on entry of the "critical section" protected by this variable,  
> you can have problems, because this sequence:
>
> locked:=true;
> local:=shared_global_var;

Of course, you normally need an atomic operation here to set "locked"  
to true (otherwise multiple threads can set it true at the same time),  
which is presumably why the x86 performs this kind of reordering (it  
will not reorder past atomic loads/stores). And you don't need an  
atomic operation to unlock, which is why it presumably does not  
perform any reordering in that situation.


Jonas
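Jonas's follow-up can be illustrated with a minimal test-and-set spin lock (a generic sketch, not code from the test case). Acquiring uses an atomic exchange; FPC's InterLockedExchange returns the previous value, and on x86 it is typically an implicitly locked xchg that loads and stores are not reordered across. Releasing is a plain store. ThreadSwitch just gives up the time slice while spinning, and the ReadWriteBarrier before the releasing store only matters on CPUs with a weaker memory model than x86.

```pascal
program SpinLockDemo;

{$mode objfpc}{$H+}

uses
  {$ifdef unix}cthreads,{$endif}
  Classes;

const
  ThreadCount = 4;
  Iterations = 100000;

var
  LockWord: LongInt = 0;  // 0 = free, 1 = held
  Counter: LongInt = 0;   // shared data guarded by LockWord

procedure SpinLock;
begin
  // Atomic test-and-set: keep trying until the previous value was 0 (free).
  while InterLockedExchange(LockWord, 1) <> 0 do
    ThreadSwitch;         // lock is held: give up the time slice and retry
end;

procedure SpinUnlock;
begin
  ReadWriteBarrier;       // CPU fence not needed on x86; keeps weaker CPUs correct
  LockWord := 0;          // plain store releases the lock
end;

type
  TWorker = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TWorker.Execute;
var
  I: Integer;
begin
  for I := 1 to Iterations do
  begin
    SpinLock;
    Inc(Counter);
    SpinUnlock;
  end;
end;

var
  Workers: array[1..ThreadCount] of TWorker;
  I: Integer;
begin
  for I := 1 to ThreadCount do
    Workers[I] := TWorker.Create(False);
  for I := 1 to ThreadCount do
  begin
    Workers[I].WaitFor;
    Workers[I].Free;
  end;
  WriteLn('Counter = ', Counter, ' (expected ', ThreadCount * Iterations, ')');
end.
```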

Re: Multi-threaded project with few locks (no Thread.waitfor). Memory consumption keeps increasing on Ubuntu 10.10 x64

Andrew Brunner
In reply to this post by Jonas Maebe-2
On Wed, Oct 13, 2010 at 8:27 AM, Jonas Maebe <[hidden email]> wrote:

>
> 1) on entry of the "critical section" protected by this variable, you can
> have problems, because this sequence:
>
> locked:=true;
> local:=shared_global_var;
>
> may actually be executed in this order:
>
> local:=shared_global_var;
> locked:=true;

Thanks, by the way... Yes, I didn't know that fact until you posted a
link in another thread. I had a curious problem with pointers and a
linked list that a manager thread managed but another server thread had
read/write access to. Once in a while the order of operations would
change and cause read access violations.

> So you can get speculative reads into the "critical section"
>
> 2) when exiting the "critical section", there are no problems, because none
> of the loads or stores before the one that sets the boolean "lock" variable
> to false, can be moved past that store.

By "into", do you mean inside or outside the section? I was under the
assumption that, inside the section, operations were safe from reads by
other threads. But on multi-core systems, I'd bet the order can be
executed differently.

> In summary, the fact that a particular program runs fine on your particular
> machine does not mean anything:
> a) your particular machine may not perform any kind of reordering that
> results in problems
> b) your particular program may not expose any kind of reordering that
> results in problems

After reading the Wikipedia article and an AMD engineer's blog
postings with suggested code, and considering I'm exclusively using
AMD CPUs, I would say this is true. Problems could certainly prove
difficult to resolve in cases involving worker objects that wait for
other workers to finish (recursion or the like) and act as a logic
gate, which is potentially a serious issue.

> That does not mean that automatically the program "can be used without
> memory barriers". It is virtually impossible to prove correctness of
> multi-threaded code running on multi-cores through testing, and it is
> literally impossible to prove it for all possible machines by testing on a
> single machine (even if that machine has 4096 cores and runs 16000 threads),
> simply because other machines may use different memory consistency models.

After reading up, I would say that memory barriers can be avoided only
under certain circumstances, by engineering for thread isolation (see
my commands in uThreads.pas) and limited access (indexed boolean arrays
in uThreads.pas, where no ordering is necessary because of polling), and
that a good understanding of the challenges is required when coding
multi-threaded apps for multi-core systems. Sometimes memory barriers
aren't even needed or germane to a particular feature or piece of
functionality of an application. Knowing which approach fits which
aspect is what makes for stable and fast applications. Lastly, because
the polling concept was already established, I would say that the order
of execution in the architecture set forth in my test case proves just
that. Polling for all true or all false does not require concern about
premature positives or false positives. As designed, it evaluates to
true if and only if all threads are complete. And these facts will
remain true on all CPUs.
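A minimal sketch of per-thread boolean "done" flags checked by polling, as described above (all names are hypothetical; uThreads.pas itself is not shown). Each worker is the only writer of its own flag, so no lock is needed. Following Jonas's caveat, the sketch still pairs a WriteBarrier before setting the flag with a ReadBarrier after all flags have been seen, so the results the workers wrote are guaranteed visible even on CPUs that reorder more than x86 does.

```pascal
program DoneFlagsDemo;

{$mode objfpc}{$H+}

uses
  {$ifdef unix}cthreads,{$endif}
  Classes, SysUtils;

const
  WorkerCount = 8;

var
  Done: array[0..WorkerCount - 1] of Boolean;  // Done[i] written only by worker i
  Results: array[0..WorkerCount - 1] of Int64; // Results[i] written only by worker i

type
  TWorker = class(TThread)
  private
    FIndex: Integer;
  protected
    procedure Execute; override;
  public
    constructor Create(AIndex: Integer);
  end;

constructor TWorker.Create(AIndex: Integer);
begin
  FIndex := AIndex;
  inherited Create(False);
end;

procedure TWorker.Execute;
var
  I: Integer;
  Sum: Int64;
begin
  Sum := 0;
  for I := 1 to 1000000 do
    Inc(Sum, I);
  Results[FIndex] := Sum;
  WriteBarrier;          // make Results[FIndex] visible before the flag
  Done[FIndex] := True;  // single writer per element: no lock required
end;

// Polling check: true only when every worker has set its flag.
function AllDone: Boolean;
var
  I: Integer;
begin
  for I := 0 to WorkerCount - 1 do
    if not Done[I] then
      Exit(False);
  ReadBarrier;           // do not read Results[] ahead of the flags
  Result := True;
end;

var
  Workers: array[0..WorkerCount - 1] of TWorker;
  I: Integer;
begin
  for I := 0 to WorkerCount - 1 do
    Workers[I] := TWorker.Create(I);
  while not AllDone do   // poll instead of calling WaitFor
    Sleep(1);
  for I := 0 to WorkerCount - 1 do
    WriteLn('Worker ', I, ': ', Results[I]);
  for I := 0 to WorkerCount - 1 do
  begin
    Workers[I].WaitFor;  // still needed before Free, to reap the OS thread
    Workers[I].Free;
  end;
end.
```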

Thanks for the info, help, and discussion.

Re: Multi-threaded project with few locks (no Thread.waitfor). Memory consumption keeps increasing on Ubuntu 10.10 x64

Andrew Brunner
In reply to this post by Michael Van Canneyt
On Wed, Oct 13, 2010 at 8:28 AM, Michael Van Canneyt
<[hidden email]> wrote:
> Probably because it uses a heap manager per thread.
>
> You may try to use 'cmem', which will replace the heap manager with the C
> memory manager (one for the whole app, not per thread). That will allow you
> to test this hypothesis.

OK. Trying to speed up thread creation, I have enabled cmem by putting
it as the first unit in the project; cmem and cthreads are the first two
units, to be exact.

Looking at the status of the process in the system monitor, I see
that the process switches between ptrace_stop and futex_wait_queue_me
(cycling). None of my CPUs look fully consumed, so this is clearly an
inefficiency in the way threads are created and added internally to
the FPC RTL. I tried setting the nice level to -15, and while it does
speed up a little, it still takes minutes to create anything over 1,500
threads. Futexes can be avoided with interlocked calls to assign pointer
values. I really hate to see the application thread slowed down to a
crawl just because of memory or class creation.
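As an illustration of "interlocked calls to assign pointer values", here is a sketch of a lock-free push onto a shared linked list using compare-and-swap through the pointer overload of FPC's InterLockedCompareExchange. This is a generic technique (a Treiber-stack push), not what the FPC RTL does internally. Only push is shown; a concurrent pop needs extra care (the ABA problem), and a contended CAS loop still costs time, even though no thread ever sleeps on a futex here.

```pascal
program LockFreePushDemo;

{$mode objfpc}{$H+}

uses
  {$ifdef unix}cthreads,{$endif}
  Classes;

const
  ThreadCount = 4;
  PerThread = 10000;

type
  PNode = ^TNode;
  TNode = record
    Next: PNode;
    Value: LongInt;
  end;

var
  Head: Pointer = nil;   // top of the shared list (really a PNode)
  Pushed: Integer = 0;

// Lock-free push: retry until Head is swapped from the value we read.
procedure Push(Node: PNode);
var
  OldHead: Pointer;
begin
  repeat
    OldHead := Head;
    Node^.Next := PNode(OldHead);
  until InterLockedCompareExchange(Head, Pointer(Node), OldHead) = OldHead;
end;

type
  TPusher = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TPusher.Execute;
var
  I: Integer;
  Node: PNode;
begin
  for I := 1 to PerThread do
  begin
    New(Node);
    Node^.Value := I;
    Push(Node);
  end;
end;

// Single-threaded cleanup once all pushers have finished.
function CountAndFree: Integer;
var
  Node, NextNode: PNode;
begin
  Result := 0;
  Node := PNode(Head);
  while Node <> nil do
  begin
    NextNode := Node^.Next;
    Dispose(Node);
    Inc(Result);
    Node := NextNode;
  end;
  Head := nil;
end;

var
  Pushers: array[1..ThreadCount] of TPusher;
  I: Integer;
begin
  for I := 1 to ThreadCount do
    Pushers[I] := TPusher.Create(False);
  for I := 1 to ThreadCount do
  begin
    Pushers[I].WaitFor;
    Pushers[I].Free;
  end;
  Pushed := CountAndFree;
  WriteLn(Pushed, ' nodes pushed');
end.
```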

Please (anyone is welcome) help me speed up FPC thread creation. I
might be able to suggest improvements to the threading system. Does
anyone who intimately knows FPC and its Linux internals with respect to
TThread want to take a look at this with me?

Re: Multi-threaded project with few locks (no Thread.waitfor). Memory consumption keeps increasing on Ubuntu 10.10 x64

Michael Van Canneyt


On Wed, 13 Oct 2010, Andrew Brunner wrote:

> On Wed, Oct 13, 2010 at 8:28 AM, Michael Van Canneyt
> <[hidden email]> wrote:
>> Probably because it uses a heap manager per thread.
>>
>> You may try to use 'cmem', which will replace the heap manager with the C
>> memory manager (one for the whole app, not per thread). That will allow you
>> to test this hypothesis.
>
> Ok.  Trying to speed up the creating of threads I have enabled cmem by
> putting it as the first unit in the project.  cmem,cthreads, are the
> first two units to be exact.
>
> looking at the status of the process using the system monitor it shows
> me that process switches from ptrace_stop to futex_wait_queue_me
> (cycling).  None of my CPUs look to be all consumed so this is clearly
> an inefficiency with the way threads are created and added internally
> to the fpc rtl.

FPC doesn't have anything to say about CPU allocation.
The threads are created by the C pthread library and Linux kernel.
They do the heavy work.

> I tried upping the nice level to -15 and it does
> speed up a little, it still takes minutes to create anything over
> 1,500 threads. Futexs can be avoided with interlocked calls to assign
> pointer values. I really hate to see the application thread slowed
> down to a crawl just because of memory or class creation.
>
> Please (anyone is welcome) help me speed fpc thread creation up.  I
> might be able to suggest improvements to the threading system.  Anyone
> intimately know FPC and it's linux guts w/r/t/ TThread that is willing
> to take a look at this with me?

Why do you think I answered you? :-)

Michael.

Re: Multi-threaded project with few locks (no Thread.waitfor). Memory consumption keeps increasing on Ubuntu 10.10 x64

Andrew Brunner
On Wed, Oct 13, 2010 at 2:12 PM, Michael Van Canneyt
<[hidden email]> wrote:

> FPC doesn't have anything to say about CPU allocation. The threads are
> created by the C pthread library and Linux kernel.
> They do the heavy work.

Is it possible that I have the pthread library in some sort of debug
mode that is slowing down thread creation?

What is needed to go directly to the kernel API to queue events and
create/destroy/suspend/resume threads?

Are there any alternatives to the cthreads unit?

Thanks for the help, Michael.

Re: Multi-threaded project with few locks (no Thread.waitfor). Memory consumption keeps increasing on Ubuntu 10.10 x64

Michael Van Canneyt


On Wed, 13 Oct 2010, Andrew Brunner wrote:

> On Wed, Oct 13, 2010 at 2:12 PM, Michael Van Canneyt
> <[hidden email]> wrote:
>
>> FPC doesn't have anything to say about CPU allocation. The threads are
>> created by the C pthread library and Linux kernel.
>> They do the heavy work.
>
> Is it possible I have the pthread library in some sort of debug mode
> that is slowing down the process of thread creation?

I seriously doubt it.
What you could do to test, is write your program using direct Pthread
calls. This way we can ascertain whether it is the FPC or Pthread code
which is the bottleneck.

If need be, I can dig up my texts for the Kylix book, it describes how
to do this in Object Pascal. I suspect the code would still be pretty
much the same.

>
> What is needed to get directly to kernel api to /queue
> events/create/destroy/suspend/resume threads?
>
> Are there any alternatives to the cthreads unit?

Yes, write an object pascal version of it, which accesses the kernel
directly and bypasses the C library.

It has been on many people's wish list for ages.

Michael.
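A sketch of the kind of direct-pthread test Michael suggests, timing pthread_create and pthread_join with no TThread involved. The two declarations are hand-written against the POSIX signatures and assume Linux/glibc, where pthread_t is an unsigned long (PtrUInt here); cthreads is in the uses clause only to make sure the C and pthread libraries get linked. As Sven notes later in the thread, a thread started this way has no FPC per-thread RTL setup, so the thread function only touches plain memory. ThreadCount = 200 is an arbitrary illustrative size.

```pascal
program RawPthreadTiming;

{$mode objfpc}{$H+}

uses
  cthreads,               // only to make sure libc/libpthread are linked
  ctypes, SysUtils, DateUtils;

type
  TPthread = PtrUInt;     // assumption: pthread_t is an unsigned long (Linux/glibc)
  PPthread = ^TPthread;
  TStartRoutine = function(Arg: Pointer): Pointer; cdecl;

// Hand-written declarations of the POSIX calls.
function pthread_create(Thread: PPthread; Attr: Pointer; Start: TStartRoutine;
  Arg: Pointer): cint; cdecl; external 'pthread' name 'pthread_create';
function pthread_join(Thread: TPthread; RetVal: PPointer): cint; cdecl;
  external 'pthread' name 'pthread_join';

const
  ThreadCount = 200;      // illustrative size

var
  Started: LongInt = 0;

function ThreadFunc(Arg: Pointer): Pointer; cdecl;
begin
  // Plain pthread: no FPC RTL per-thread setup, so only touch plain memory.
  InterLockedIncrement(Started);
  Result := nil;
end;

var
  Threads: array[0..ThreadCount - 1] of TPthread;
  I: Integer;
  StartTime: TDateTime;
begin
  StartTime := Now;
  for I := 0 to ThreadCount - 1 do
    if pthread_create(@Threads[I], nil, @ThreadFunc, nil) <> 0 then
    begin
      WriteLn('pthread_create failed at thread ', I);
      Halt(1);
    end;
  for I := 0 to ThreadCount - 1 do
    pthread_join(Threads[I], nil);
  WriteLn(ThreadCount, ' pthreads created and joined in ',
    MilliSecondsBetween(Now, StartTime), ' ms');
end.
```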

Re: Multi-threaded project with few locks (no Thread.waitfor). Memory consumption keeps increasing on Ubuntu 10.10 x64

Andrew Brunner
On Wed, Oct 13, 2010 at 3:24 PM, Michael Van Canneyt
<[hidden email]> wrote:

>> Is it possible I have the pthread library in some sort of debug mode
>> that is slowing down the process of thread creation?
>
> I seriously doubt it.
> What you could do to test, is write your program using direct Pthread calls.
> This way we can ascertain whether it is the FPC or Pthread code which is the
> bottleneck.

Right. I'm going to do more reading on the POSIX threading system. I
did get to trace into the threading unit under Linux, and the first
thing I noticed was that a mutex was used to suspend and create the
thread instead of the ThreadManager.Suspend and Resume features. I
removed the semaphore in my local copy and instantly noticed a speed
increase in thread creation, due to the lack of the extra semaphore
per thread.

Based on what I see performance-wise, the current version of threading
under Unix takes two semaphores per thread: one used in the threads.inc
file and at least one in pthreads.so!

> If need be, I can dig up my texts for the Kylix book, it describes how to do
> this in Object Pascal. I suspect the code would still be pretty much the
> same.

I would say let's try to obtain the source to pthreads or something.
I'd bet we can take a straight shot at something from there. If it's
open source, I'd bet we could perhaps ask them for a newer version that
is friendlier to high-scale use.

> Yes, write an object pascal version of it, which accesses the kernel
> directly and bypasses the C library.

That's exactly what I'm thinking. There are only about 36 methods to
implement, depending on how hard it will be to hook into the kernel...
The only thing is, we're not going to have the diversity found in
pthreads.so. I'd bet they have tons of code for Darwin, Debian, Red Hat,
etc. I guess it's unknown at this point, but well worth the time to
explore.

> It has been on many people's wish list for ages.

I presently can create 1,500 threads in 2 min 27 seconds. That's 2 1/2
threads per second... 25 min! Pathetic...
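For reference, taking the quoted time at face value, the average creation rate works out as follows; a rate of 2.5 threads per second would instead correspond to 600 s (10 min) for 1,500 threads.

$$
\frac{1500\ \text{threads}}{2\ \text{min}\ 27\ \text{s}} = \frac{1500\ \text{threads}}{147\ \text{s}} \approx 10.2\ \text{threads/s},
\qquad
\frac{1500\ \text{threads}}{2.5\ \text{threads/s}} = 600\ \text{s} = 10\ \text{min}
$$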

Re: Multi-threaded project with few locks (no Thread.waitfor). Memory consumption keeps increasing on Ubuntu 10.10 x64

Sven Barth-2
On 14.10.2010 04:15, Andrew Brunner wrote:

> On Wed, Oct 13, 2010 at 3:24 PM, Michael Van Canneyt
> <[hidden email]>  wrote:
>
>>> Is it possible I have the pthread library in some sort of debug mode
>>> that is slowing down the process of thread creation?
>>
>> I seriously doubt it.
>> What you could do to test, is write your program using direct Pthread calls.
>> This way we can ascertain whether it is the FPC or Pthread code which is the
>> bottleneck.
>
> Right.  I'm going to do more reading on the POSIX threading system.  I
> did get to trace into the threading unit under linux and the first
> thing I noticed was that a mutex was used to suspend and create the
> thread instead of using the ThreadManager.Suspend and Resume features.
>   My local copy has removed the semaphore and I instantly noticed a
> speed increase in thread creation due to the lack of the extra
> semaphore per thread.
>
> Based on what I see as far as performance goes, the current version of
> threading under Unix takes 2 semaphores per thread. One by use in the
> threads.inc file and at least one by the pthreads.so !
>

It could be that the "RTL initialization" of the thread slows things
down as well (just a possibility). You might want to disable the calls
to "InitThread" and "DoneThread" in "ThreadMain" in rtl/unix/cthreads.pp.
But be careful: you then must not use Pascal's I/O, heap, or exceptions
inside your thread function, because they aren't initialized anymore
(use direct syscalls like fpwrite and allocate the memory manually).

>> If need be, I can dig up my texts for the Kylix book, it describes how to do
>> this in Object Pascal. I suspect the code would still be pretty much the
>> same.
>
> I would say let's try to obtain source to pthreads or something.  I'd
> bet we can just do a straight shoot into something from there.  If
> it's open source i'd bet we can bother them perhaps for a newer
> version more high-scale friendly.
>
>> Yes, write an object pascal version of it, which accesses the kernel
>> directly and bypasses the C library.
>
> That's exactly what I'm thinking.  There are only like 36 methods to
> implement.  Depending on how hard will be to hook into the kernel...
> The only thing is we're going not going to have diversity as found in
> the pthreads.so.  I'd bet they have tons of code for Darwin, Debian,
> Redhat, etc.  I guess it's unknown at this point but well worth the
> time to explore.

The problem with our own version of pthreads is that those threads
would be limited to Pascal code only, because other (C-based) libraries
will still use pthreads.

An interesting read on this topic is in the Wine wiki, because they had
to write their own threading implementation as well (the old pthreads
library wasn't capable enough for the needs of the WinAPI). You can
find the article here:
http://www.winehq.org/docs/winedev-guide/threading
(the interesting paragraph is "POSIX threading vs. kernel threading").

Regards,
Sven

Re: Multi-threaded project with few locks (no Thread.waitfor). Memory consumption keeps increasing on Ubuntu 10.10 x64

Michael Van Canneyt
In reply to this post by Andrew Brunner


On Wed, 13 Oct 2010, Andrew Brunner wrote:

> On Wed, Oct 13, 2010 at 3:24 PM, Michael Van Canneyt
> <[hidden email]> wrote:
>
>>> Is it possible I have the pthread library in some sort of debug mode
>>> that is slowing down the process of thread creation?
>>
>> I seriously doubt it.
>> What you could do to test, is write your program using direct Pthread calls.
>> This way we can ascertain whether it is the FPC or Pthread code which is the
>> bottleneck.
>
> Right.  I'm going to do more reading on the POSIX threading system.  I
> did get to trace into the threading unit under linux and the first
> thing I noticed was that a mutex was used to suspend and create the
> thread instead of using the ThreadManager.Suspend and Resume features.
> My local copy has removed the semaphore and I instantly noticed a
> speed increase in thread creation due to the lack of the extra
> semaphore per thread.
>
> Based on what I see as far as performance goes, the current version of
> threading under Unix takes 2 semaphores per thread. One by use in the
> threads.inc file and at least one by the pthreads.so !
>
>> If need be, I can dig up my texts for the Kylix book, it describes how to do
>> this in Object Pascal. I suspect the code would still be pretty much the
>> same.
>
> I would say let's try to obtain source to pthreads or something.  I'd
> bet we can just do a straight shoot into something from there.  If
> it's open source i'd bet we can bother them perhaps for a newer
> version more high-scale friendly.
>
>> Yes, write an object pascal version of it, which accesses the kernel
>> directly and bypasses the C library.
>
> That's exactly what I'm thinking.  There are only like 36 methods to
> implement.  Depending on how hard will be to hook into the kernel...

The kernel thing is easy, just a vfork() call, if I'm correct.
It's all the rest that is difficult: synchronization, mutex,
semaphores :)

> The only thing is we're going not going to have diversity as found in
> the pthreads.so.  I'd bet they have tons of code for Darwin, Debian,
> Redhat, etc.  I guess it's unknown at this point but well worth the
> time to explore.

I don't think the various Linux distros work differently in this respect.
They all use pthreads now, and they all use the same kernel interface.

>> It has been on many people's wish list for ages.
>
> I presently can create 1,500 threads in 2 min 27seconds. That's 2 1/2
> threads per second... 25min! Pathetic...

That leaves the question of why anyone would want to create 1500
threads. Any number of threads above the number of CPUs is no longer
efficient anyway (give or take some corner cases with lots of I/O).

Michael.
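Michael's point about CPU count suggests the usual alternative to thousands of threads: a fixed pool of workers that pull work items from a shared counter. A minimal sketch, assuming 6 workers to match the 6-core machine from the start of the thread and 3000 jobs standing in for what would otherwise have been 3000 threads (both numbers are illustrative). FPC's InterLockedIncrement returns the incremented value, so each job index is claimed by exactly one worker.

```pascal
program WorkerPoolDemo;

{$mode objfpc}{$H+}

uses
  {$ifdef unix}cthreads,{$endif}
  Classes;

const
  WorkerCount = 6;    // assumption: one worker per core on a 6-core machine
  JobCount = 3000;    // illustrative: 3000 work items instead of 3000 threads

var
  NextJob: LongInt = 0;    // next unclaimed job index
  Processed: LongInt = 0;  // jobs completed so far
  Squares: array[0..JobCount - 1] of Int64;

type
  TPoolWorker = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TPoolWorker.Execute;
var
  Job: LongInt;
begin
  repeat
    // InterLockedIncrement returns the new value; subtract 1 for a 0-based index.
    Job := InterLockedIncrement(NextJob) - 1;
    if Job >= JobCount then
      Break;
    Squares[Job] := Int64(Job) * Job;  // stand-in for real work
    InterLockedIncrement(Processed);
  until False;
end;

var
  Pool: array[1..WorkerCount] of TPoolWorker;
  I: Integer;
begin
  for I := 1 to WorkerCount do
    Pool[I] := TPoolWorker.Create(False);
  for I := 1 to WorkerCount do
  begin
    Pool[I].WaitFor;
    Pool[I].Free;
  end;
  WriteLn(Processed, ' jobs done by ', WorkerCount, ' threads');
end.
```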

Re: Multi-threaded project with few locks (no Thread.waitfor). Memory consumption keeps increasing on Ubuntu 10.10 x64

Sven Barth-2
On 14.10.2010 09:28, Michael Van Canneyt wrote:
>>> Yes, write an object pascal version of it, which accesses the kernel
>>> directly and bypasses the C library.
>>
>> That's exactly what I'm thinking. There are only like 36 methods to
>> implement. Depending on how hard will be to hook into the kernel...
>
> The kernel thing is easy, just a vfork() call, if I'm correct.
> It's all the rest that is difficult: synchronization, mutex, semaphores :)
>

I don't know about other Unix variants, but for Linux one should use
"clone" to create a thread.

See also this example by - I believe - Linus himself:
http://www.ibiblio.org/pub/Linux/docs/faqs/Threads-FAQ/html/clone.c

Regards,
Sven
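Following Sven's pointer to clone, here is a minimal sketch of creating a thread-like task directly with Linux clone(), called through glibc's clone wrapper (declared by hand here; the CLONE_* values are the ones from <sched.h>). CLONE_VM is what makes the child share the parent's memory. Real NPTL threads also pass CLONE_THREAD plus several TLS and thread-id flags, which this sketch deliberately leaves out so the parent can simply wait for the child with fpWaitPid. As Sven warns elsewhere in the thread, code running this way has no FPC RTL setup, so the child only writes plain memory.

```pascal
program CloneDemo;

{$mode objfpc}{$H+}

uses
  BaseUnix, ctypes;

const
  // Flag values from <sched.h> on Linux.
  CLONE_VM      = $00000100;  // share the address space
  CLONE_FS      = $00000200;  // share cwd/root/umask
  CLONE_FILES   = $00000400;  // share the file descriptor table
  CLONE_SIGHAND = $00000800;  // share signal handlers (requires CLONE_VM)
  ChildStackSize = 64 * 1024;

type
  TCloneFn = function(Arg: Pointer): cint; cdecl;

// glibc's clone() wrapper, declared by hand.
function clone(Fn: TCloneFn; ChildStack: Pointer; Flags: cint; Arg: Pointer): cint;
  cdecl; varargs; external 'c' name 'clone';

var
  Shared: LongInt = 0;  // written by the child, read by the parent

function ChildMain(Arg: Pointer): cint; cdecl;
begin
  // No FPC RTL thread setup here: no heap, I/O or exceptions.
  Shared := PLongInt(Arg)^;
  Result := 0;          // becomes the child's exit status
end;

var
  Stack: Pointer;
  Value: LongInt;
  Pid: cint;
begin
  Value := 42;
  GetMem(Stack, ChildStackSize);
  // The stack grows downwards on x86, so pass the end of the block.
  Pid := clone(@ChildMain, Pointer(PtrUInt(Stack) + ChildStackSize),
    CLONE_VM or CLONE_FS or CLONE_FILES or CLONE_SIGHAND or SIGCHLD, @Value);
  if Pid = -1 then
  begin
    WriteLn('clone failed');
    Halt(1);
  end;
  fpWaitPid(Pid, nil, 0);  // SIGCHLD as exit signal lets a plain waitpid work
  FreeMem(Stack);
  WriteLn('Shared = ', Shared);
end.
```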

Re: Multi-threaded project with few locks (no Thread.waitfor). Memory consumption keeps increasing on Ubuntu 10.10 x64

Michael Van Canneyt
In reply to this post by Sven Barth-2


On Thu, 14 Oct 2010, Sven Barth wrote:

> The problem with an own version of pthreads is that those threads will be
> limited to Pascal code only, cause other (C based) libraries will still use
> pthreads.

This is not a problem for pascal-only libs.

>
> An interesting lecture on this topic is in the wiki of Wine, cause they had
> to implement their own threading implementation as well (the old pthreads
> library wasn't capable enough for the needs of the WinAPI). You can find the
> article here:
> http://www.winehq.org/docs/winedev-guide/threading
> (the interesting paragraph is "POSIX threading vs. kernel threading").

It makes you wonder how Andrew's programs would behave under wine :-)

Michael.

Re: Multi-threaded project with few locks (no Thread.waitfor). Memory consumption keeps increasing on Ubuntu 10.10 x64

Henry Vermaak
In reply to this post by Andrew Brunner
On 14/10/10 03:15, Andrew Brunner wrote:
>
> I would say let's try to obtain source to pthreads or something.  I'd
> bet we can just do a straight shoot into something from there.  If
> it's open source i'd bet we can bother them perhaps for a newer
> version more high-scale friendly.

NPTL source is in glibc.

> I presently can create 1,500 threads in 2 min 27seconds. That's 2 1/2
> threads per second... 25min! Pathetic...

Ingo Molnar started and stopped 100,000 threads in less than 2
seconds on a dual P4 machine in some of the early NPTL tests.

Henry

Re: Multi-threaded project with few locks (no Thread.waitfor). Memory consumption keeps increasing on Ubuntu 10.10 x64

Sven Barth-2
In reply to this post by Michael Van Canneyt
On 14.10.2010 09:35, Michael Van Canneyt wrote:

>
>
> On Thu, 14 Oct 2010, Sven Barth wrote:
>
>> The problem with an own version of pthreads is that those threads will
>> be limited to Pascal code only, cause other (C based) libraries will
>> still use pthreads.
>
> This is not a problem for pascal-only libs.
>

As long as you don't need to use non-Pascal libraries, you are correct.
But according to the Wine article it is already a problem if you use
libc, because it only enables its thread safety routines if pthreads is
detected... :(

>>
>> An interesting lecture on this topic is in the wiki of Wine, cause
>> they had to implement their own threading implementation as well (the
>> old pthreads library wasn't capable enough for the needs of the
>> WinAPI). You can find the article here:
>> http://www.winehq.org/docs/winedev-guide/threading
>> (the interesting paragraph is "POSIX threading vs. kernel threading").
>
> It makes you wonder how Andrew's programs would behave under wine :-)

Indeed ^^

Regards,
Sven

Re: Multi-threaded project with few locks (no Thread.waitfor). Memory consumption keeps increasing on Ubuntu 10.10 x64

Michael Van Canneyt
In reply to this post by Henry Vermaak


On Thu, 14 Oct 2010, Henry Vermaak wrote:

> On 14/10/10 03:15, Andrew Brunner wrote:
>>
>> I would say let's try to obtain source to pthreads or something.  I'd
>> bet we can just do a straight shoot into something from there.  If
>> it's open source i'd bet we can bother them perhaps for a newer
>> version more high-scale friendly.
>
> NPTL source is in glibc.
>
>> I presently can create 1,500 threads in 2 min 27seconds. That's 2 1/2
>> threads per second... 25min! Pathetic...
>
> Ingo Molnar have started and stopped 100,000 threads in less than 2 seconds
> on a dual P4 machine in some of the early NPTL tests.

One after the other? That is not a meaningful test in this case.
You should know how many threads existed in parallel.

Michael.
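To follow up on Michael's question of how many threads existed in parallel, here is a sketch of a test that keeps every thread alive until all of them are running, then reports that count and the elapsed time. Each thread parks on a manual-reset TEvent from SyncObjs; ThreadCount = 500 is an arbitrary illustrative size. Unlike a create-then-exit loop, this measures creation while all the threads coexist.

```pascal
program ParallelThreadsDemo;

{$mode objfpc}{$H+}

uses
  {$ifdef unix}cthreads,{$endif}
  Classes, SysUtils, SyncObjs, DateUtils;

const
  ThreadCount = 500;    // illustrative size

var
  Alive: LongInt = 0;   // threads that have started running
  InParallel: LongInt = 0;
  GoEvent: TEvent;      // every thread waits on this until all are alive

type
  TParkedThread = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TParkedThread.Execute;
begin
  InterLockedIncrement(Alive);
  GoEvent.WaitFor(INFINITE);  // stay alive until the main thread says go
end;

var
  Threads: array of TParkedThread;
  I: Integer;
  StartTime: TDateTime;
begin
  GoEvent := TEvent.Create(nil, True, False, '');  // manual reset, not signalled
  SetLength(Threads, ThreadCount);
  StartTime := Now;
  for I := 0 to ThreadCount - 1 do
    Threads[I] := TParkedThread.Create(False);
  while Alive < ThreadCount do  // wait until every thread is actually running
    Sleep(1);
  InParallel := Alive;
  WriteLn(InParallel, ' threads alive at once after ',
    MilliSecondsBetween(Now, StartTime), ' ms');
  GoEvent.SetEvent;
  for I := 0 to ThreadCount - 1 do
  begin
    Threads[I].WaitFor;
    Threads[I].Free;
  end;
  GoEvent.Free;
end.
```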