Hi, reading the recent discussion about Lazarus memory consumption, I thought about collecting some statistics. The idea is similar to disk utilities that collect and sort the sizes of directories: every folder on the computer is scanned, and the resulting paths are sorted by their summed sizes. Such utilities usually help you find space to free on the hard drive.
With memory, for every allocation we can treat the "path" as the addresses of the procedures on the stack. So when a function func() is called from the function parentfunc(), and GetMem is called inside func(), we should add the size to the corresponding entries for both func() and parentfunc(). Of course, for every FreeMem we should subtract from every entry related to the earlier GetMem (which may be a challenge, since FreeMem does not necessarily follow the same calling pattern). Anyway, if such a system is implemented, at any moment during the program run the developer can query the N most memory-hungry procedures so far. This would include not only the procedures that call GetMem, New, etc., but also higher-level ones, giving better hints for further optimization.
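[Editor's note: the hooking half of this idea can be sketched with FPC's real GetMemoryManager/SetMemoryManager API. This is a minimal, non-thread-safe sketch: the per-call-site accounting is only hinted at in a comment, the names HookGetMem/AllocCalls are made up for illustration, and the TMemoryManager field signatures follow current FPC (older releases differ slightly).]

```pascal
program memhookdemo;
{$mode objfpc}

{ Sketch: install a replacement memory manager that forwards to the
  original one and counts requests.  A real statistics unit would, at
  the commented point, walk the stack and credit Size to every caller. }

var
  OldMM, NewMM: TMemoryManager;
  AllocCalls: PtrUInt = 0;
  AllocBytes: PtrUInt = 0;

function HookGetMem(Size: PtrUInt): Pointer;
begin
  Inc(AllocCalls);
  Inc(AllocBytes, Size);
  { here: scan/unwind the stack and add Size to each frame's entry }
  Result := OldMM.GetMem(Size);
end;

var
  p: Pointer;
begin
  GetMemoryManager(OldMM);
  NewMM := OldMM;               // keep FreeMem/ReallocMem/... unchanged
  NewMM.GetMem := @HookGetMem;
  SetMemoryManager(NewMM);

  GetMem(p, 1000);
  FreeMem(p);

  SetMemoryManager(OldMM);      // restore before program shutdown
  WriteLn('hooked: ', AllocCalls >= 1);
end.
```

A real unit must also hook AllocMem, ReallocMem and FreeMem, or the bookkeeping drifts.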
Do tools (units) like this already exist? If not, is developing such a unit technically possible/hard with the currently available debug information? Thanks, Max
_______________________________________________
fpc-pascal maillist - [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
In our previous episode, Max Vlasov said:
> ... if this system is implemented, at any moment during the program run
> the developer can query the N most memory-hungry procedures so far. ...
>
> Do tools (units) like this already exist? If not, is developing such a
> unit technically possible/hard with the currently available debug
> information?
This is basically what valgrind (or the FullDebugMode of FastMM) does. But they do this by parsing the stack on each call to the memory manager and then keeping track of it.
Note that all these techniques can be very, very slow. E.g. I tried to debug the CHM support with valgrind, and I terminated the valgrind process after 5 hours because it was not even halfway to where the bug was.
Without valgrind the program reached that point in 1-2 minutes.
> In our previous episode, Max Vlasov said:
> > Do tools (units) like this already exist? If not, is developing such a
> > unit technically possible/hard with the currently available debug
> > information?
>
> This is basically what valgrind (or the FullDebugMode of FastMM) does. But
> they do this by parsing the stack on each call to the memory manager and
> then keeping track of it.
>
> Note that all these techniques can be very, very slow. E.g. I tried to
> debug the CHM support with valgrind, and I terminated the valgrind process
> after 5 hours because it was not even halfway to where the bug was.
>
> Without valgrind the program reached that point in 1-2 minutes.
The valgrind massif tool (valgrind --tool=massif) does the same, but reports on a sampling basis. It looks at the heap used and reports who allocated what on a cumulative basis. It won't trace every call or report leaks, but it lets you track the big users quite well. I'm currently tracking the memory "eaters" in Lazarus with it. It slows the program down, but by a factor of 5 to 10, which makes it more practical than plain valgrind (memcheck) itself.
Ludo
On Mon, Aug 15, 2011 at 1:03 PM, Marco van de Voort <[hidden email]> wrote:
> In our previous episode, Max Vlasov said:
> > The idea is similar to disk utilities that collect and sort the sizes
> > of directories. ...
>
> This is basically what valgrind (or the FullDebugMode of FastMM) does.
> But they do this by parsing the stack on each call to the memory manager
> and then keeping track of it.
>
> Note that all these techniques can be very, very slow. E.g. I tried to
> debug the CHM support with valgrind, and I terminated the valgrind
> process after 5 hours because it was not even halfway to where the bug
> was. Without valgrind the program reached that point in 1-2 minutes.
Thanks, I suspected that it was already implemented and that it would be very slow :) But looking at your numbers, if one implements something and finds a way to reduce that 150x factor (300 min / 2 min = 150), the tool will be more useful.
On Mon, Aug 15, 2011 at 1:17 PM, Ludo Brands <[hidden email]> wrote:
> The valgrind massif tool (valgrind --tool=massif) does the same, but
> reports on a sampling basis. It looks at the heap used and reports who
> allocated what on a cumulative basis. It won't trace every call or report
> leaks, but it lets you track the big users quite well. I'm currently
> tracking the memory "eaters" in Lazarus with it. It slows the program
> down, but by a factor of 5 to 10, which makes it more practical than
> plain valgrind itself.
If there's a common ancestor allocating something (let's call it NewBlock(), and it uses GetMem) and I have many descendants using this method, is there a chance to find the "eater" among these descendants, or will the massif tool attribute all the stats to NewBlock()?
Max
> If there's a common ancestor allocating something (let's call it
> NewBlock(), and it uses GetMem) and I have many descendants using this
> method, is there a chance to find the "eater" among these descendants, or
> will the massif tool attribute all the stats to NewBlock()?
>
> Max
Here is part of the massif output from Lazarus memory usage that shows what you are looking for. The ms_print tool that comes with valgrind does the actual formatting.
massif and ms_print both have a threshold value, the percentage below which they don't report details. This is very helpful for quickly finding problem areas.
100.00% (288,964,608B) (page allocation syscalls) mmap/mremap/brk, --alloc-fns, etc.
->37.20% (107,483,136B) 0x80C7DD3: SYSTEM_FPSYSCALL$LONGINT$LONGINT$$LONGINT (in /home/ludo/lazarus/lazarus)
| ->37.20% (107,483,136B) 0x80C86DF: SYSTEM_FPMMAP$POINTER$LONGWORD$LONGINT$LONGINT$LONGINT$INT64$$POINTER (in /home/ludo/lazarus/lazarus)
|   ->37.20% (107,483,136B) 0x80DE2EE: SYSTEM_SYSOSALLOC$LONGWORD$$POINTER (in /home/ludo/lazarus/lazarus)
|     ->13.06% (37,748,736B) 0x80DEFB7: SYSTEM_ALLOC_OSCHUNK$PFREELISTS$LONGWORD$LONGWORD$$POINTER (in /home/ludo/lazarus/lazarus)
|     | ->13.06% (37,748,736B) 0x80DF396: SYSTEM_SYSGETMEM_VAR$LONGWORD$$POINTER (in /home/ludo/lazarus/lazarus)
|     |   ->10.52% (30,408,704B) 0x8382C3A: LCONVENCODING_ISO_8859_1TOUTF8$ANSISTRING$$ANSISTRING (lconvencoding.pas:4619)
|     |     ->10.52% (30,408,704B) 0x8389C25: LCONVENCODING_CONVERTENCODING$ANSISTRING$ANSISTRING$ANSISTRING$$ANSISTRING (lconvencoding.pas:6389)
|     |       ->10.52% (30,408,704B) 0x81593E7: MAIN_TMAINIDE_$__ONCODEBUFFERDECODELOADED$TCODEBUFFER$ANSISTRING$ANSISTRING$ANSISTRING$ANSISTRING (main.pp:14590)
|     |         ->10.52% (30,408,704B) 0x8368709: CODECACHE_TCODECACHE_$__DECODELOADED$TCODEBUFFER$ANSISTRING$ANSISTRING$ANSISTRING$ANSISTRING (codecache.pas:784)
|     |           ->10.52% (30,408,704B) 0x8369C80: CODECACHE_TCODEBUFFER_$__DECODELOADED$ANSISTRING$ANSISTRING$ANSISTRING$ANSISTRING (codecache.pas:1153)
|     |             ->10.52% (30,408,704B) 0x836D3D9: SOURCELOG_TSOURCELOG_$__LOADFROMFILE$ANSISTRING$$BOOLEAN (sourcelog.pas:825)
|     |               ->09.07% (26,214,400B) 0x83697B0: CODECACHE_TCODEBUFFER_$__LOADFROMFILE$ANSISTRING$$BOOLEAN (codecache.pas:1045)
|     |               | ->09.07% (26,214,400B) 0x83671DF: CODECACHE_TCODECACHE_$__LOADFILE$ANSISTRING$$TCODEBUFFER (codecache.pas:347)
|     |               |   ->09.07% (26,214,400B) 0x83A63E1: CODETOOLMANAGER_TCODETOOLMANAGER_$__LOADFILE$ANSISTRING$BOOLEAN$BOOLEAN$$TCODEBUFFER (codetoolmanager.pas:1109)
|     |               |     ->09.07% (26,214,400B) 0x85A0B18: DIALOGPROCS_LOADCODEBUFFER$TCODEBUFFER$ANSISTRING$TLOADBUFFERFLAGS$BOOLEAN$$TMODALRESULT (dialogprocs.pas:224)
|     |               |       ->09.07% (26,214,400B) 0x813E5B7: MAIN_TMAINIDE_$__DOOPENUNKNOWNFILE$ANSISTRING$TOPENFLAGS$TUNITINFO$BOOLEAN$$TMODALRESULT (main.pp:6395)
|     |               |         ->09.07% (26,214,400B) 0x8147F58: MAIN_TMAINIDE_$__DOOPENEDITORFILE$crc4B94F828 (main.pp:9380)
|     |               |           ->09.07% (26,214,400B) 0x8147565: MAIN_TMAINIDE_$__DOOPENEDITORFILE$ANSISTRING$LONGINT$LONGINT$TOPENFLAGS$$TMODALRESULT (main.pp:9123)
|     |               |             ->09.07% (26,214,400B) 0x812EBE3: MAIN_TMAINIDE_$__SETUPSTARTPROJECT (main.pp:2253)
|     |               |               ->09.07% (26,214,400B) 0x812B951: MAIN_TMAINIDE_$__STARTIDE (main.pp:1456)
|     |               |                 ->09.07% (26,214,400B) 0x80C7AFF: main (lazarus.pp:113)
|     |               ->01.45% (4,194,304B) 0x83698A2: CODECACHE_TCODEBUFFER_$__REVERT$$BOOLEAN (codecache.pas:1081)
|     |                 ->01.45% (4,194,304B) 0x83A63F6: CODETOOLMANAGER_TCODETOOLMANAGER_$__LOADFILE$ANSISTRING$BOOLEAN$BOOLEAN$$TCODEBUFFER (codetoolmanager.pas:1112)
|     |                   ->01.45% (4,194,304B) 0x85A0B18: DIALOGPROCS_LOADCODEBUFFER$TCODEBUFFER$ANSISTRING$TLOADBUFFERFLAGS$BOOLEAN$$TMODALRESULT (dialogprocs.pas:224)
|     |                     ->01.45% (4,194,304B) 0x813E5B7: MAIN_TMAINIDE_$__DOOPENUNKNOWNFILE$ANSISTRING$TOPENFLAGS$TUNITINFO$BOOLEAN$$TMODALRESULT (main.pp:6395)
|     |                       ->01.45% (4,194,304B) 0x8147F58: MAIN_TMAINIDE_$__DOOPENEDITORFILE$crc4B94F828 (main.pp:9380)
|     |                         ->01.45% (4,194,304B) 0x8147565: MAIN_TMAINIDE_$__DOOPENEDITORFILE$ANSISTRING$LONGINT$LONGINT$TOPENFLAGS$$TMODALRESULT (main.pp:9123)
|     |                           ->01.45% (4,194,304B) 0x812EBE3: MAIN_TMAINIDE_$__SETUPSTARTPROJECT (main.pp:2253)
|     |                             ->01.45% (4,194,304B) 0x812B951: MAIN_TMAINIDE_$__STARTIDE (main.pp:1456)
|     |                               ->01.45% (4,194,304B) 0x80C7AFF: main (lazarus.pp:113)
Ludo
On 15 Aug 2011, at 12:03, Ludo Brands wrote:
> 100.00% (288,964,608B) (page allocation syscalls) mmap/mremap/brk, --alloc-fns, etc.
> ->37.20% (107,483,136B) 0x80C7DD3: SYSTEM_FPSYSCALL$LONGINT$LONGINT$$LONGINT (in /home/ludo/lazarus/lazarus)
> | ->37.20% (107,483,136B) 0x80C86DF: SYSTEM_FPMMAP$POINTER$LONGWORD$LONGINT$LONGINT$LONGINT$INT64$$POINTER (in /home/ludo/lazarus/lazarus)
> |   ->37.20% (107,483,136B) 0x80DE2EE: SYSTEM_SYSOSALLOC$LONGWORD$$POINTER (in /home/ludo/lazarus/lazarus)
Note that if you want to see what part of the memory allocation is due to the program and what part is due to internal fragmentation in the memory manager, you should compile with -gv (or add the "cmem" unit as the first unit in the uses clause of Lazarus). Then the program will use libc's malloc/free/realloc/..., and Valgrind will be able to track every allocation/deallocation individually.
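[Editor's note: for reference, a minimal program showing the cmem arrangement Jonas describes. The only requirement is that cmem comes first in the uses clause; the program itself is an illustrative sketch.]

```pascal
program cmemdemo;
{$mode objfpc}
uses
  cmem;  // must be FIRST: replaces the FPC heap manager with libc malloc/free

var
  p: Pointer;
begin
  { From here on, every GetMem/FreeMem (and string/dynarray allocation)
    goes through libc, so valgrind can track each block individually. }
  GetMem(p, 256);
  FillChar(p^, 256, 0);
  FreeMem(p);
  WriteLn('ok');
end.
```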
Jonas
On Mon, Aug 15, 2011 at 2:03 PM, Ludo Brands <[hidden email]> wrote:
> > If there's a common ancestor allocating something (let's call it
> > NewBlock(), and it uses GetMem) and I have many descendants using this
> > method, is there a chance to find the "eater" among these descendants,
> > or will the massif tool attribute all the stats to NewBlock()?
>
> Here is part of the massif output from Lazarus memory usage that shows
> what you are looking for. The ms_print tool that comes with valgrind does
> the actual formatting. massif and ms_print both have a threshold value,
> the percentage below which they don't report details. This is very
> helpful for quickly finding problem areas.
Thanks, Ludo. I think the other questions (if they appear) will come once I have really used it on a real project.
Max
> On 15 Aug 2011, at 12:03, Ludo Brands wrote:
>
> > 100.00% (288,964,608B) (page allocation syscalls) mmap/mremap/brk, --alloc-fns, etc.
> > ->37.20% (107,483,136B) 0x80C7DD3: SYSTEM_FPSYSCALL$LONGINT$LONGINT$$LONGINT (in /home/ludo/lazarus/lazarus)
> > | ->37.20% (107,483,136B) 0x80C86DF: SYSTEM_FPMMAP$POINTER$LONGWORD$LONGINT$LONGINT$LONGINT$INT64$$POINTER (in /home/ludo/lazarus/lazarus)
> > |   ->37.20% (107,483,136B) 0x80DE2EE: SYSTEM_SYSOSALLOC$LONGWORD$$POINTER (in /home/ludo/lazarus/lazarus)
>
> Note that if you want to see what part of the memory allocation is due to
> the program and what part is due to internal fragmentation in the memory
> manager, you should compile with -gv (or add the "cmem" unit as the first
> unit in the uses clause of Lazarus). Then the program will use libc's
> malloc/free/realloc/..., and Valgrind will be able to track every
> allocation/deallocation individually.
Yes, that is what I'm doing lately. This is an older trace I took to illustrate to Max how the memory allocated by "custom" memory allocators such as SYSTEM_ALLOC_OSCHUNK is split up by the massif tool.
Ludo
On Mon, Aug 15, 2011 at 1:03 PM, Marco van de Voort <[hidden email]> wrote:
> In our previous episode, Max Vlasov said:
> > The idea is similar to disk utilities that collect and sort the sizes
> > of directories. ...
> > Do tools (units) like this already exist? If not, is developing such a
> > unit technically possible/hard with the currently available debug
> > information?
>
> This is basically what valgrind (or the FullDebugMode of FastMM) does.
> But they do this by parsing the stack on each call to the memory manager
> and then keeping track of it.
>
> Note that all these techniques can be very, very slow. E.g. I tried to
> debug the CHM support with valgrind, and I terminated the valgrind
> process after 5 hours because it was not even halfway to where the bug
> was. Without valgrind the program reached that point in 1-2 minutes.
Based on this discussion and the research that followed, I made an attempt to implement something like this for FPC/Lazarus. The result is a couple of units and a dialog that lets you see the statistics at any time inside the program. You can see a real example in the screenshot:
http://www.maxerist.net/downloads/procmemstat_ss.png
The download link for the Pascal sources: http://www.maxerist.net/downloads/procmemstat.zip (5k)
The unit traditionally installs its memory-manager replacement procs and collects statistics. To keep this step fast, it just detects and saves the addresses on the stack that fall into the code-segment range of the main module. When the dialog is actually requested, it parses all the allocated data and resolves the addresses to symbols with the GetLineInfo proc. This step takes much longer; the dialog in the screenshot took 15 seconds to appear (on my 1.7 GHz Celeron).
Currently the monitor is win32-only, but the only platform-specific code gets the current stack range (I did it according to this information: http://en.wikipedia.org/wiki/Win32_Thread_Information_Block) and the code-segment range, which is obtained with toolhelp32 snapshots and VirtualQuery. Finding a way to do the same on Linux would probably make it Linux-compatible as well.
The usage:
- add uProcMemMon first in the .lpr
- define -dPROCMEMMON in Project Options -> Other -> Custom Options
- add uProcMemDlg to the unit (form) from which you want to show the statistics
- call ProcMemShowDialog from any click handler; you can call the dialog as many times as you want to see what's wrong or right with different states of your program
- if you don't want a GUI, skip uProcMemDlg and just call PMMCollectStat, which returns a TStringList with the sizes as objects (unsorted)
Current limitations:
- as I already said, it's currently win32-only
- I assume it's not thread-safe because of global-variable usage
- the first several lines of the dialog are not very useful, since they are either the GetMem-related chain or main-related procedures that are always on the stack
- the speed of collecting is OK, but since it uses a linked list it can drop with many allocations; resolving to symbols takes some time by default (GetLineInfo is not very good for thousands of queries), so further optimization might be possible in these areas
- no dynamic-loading support; this is because the monitor uses only the address range of the main module
- initially I got numbers for the winners that were bigger than the total allocated memory; this was because of exception blocks, so from the parser's point of view the same GetMem appeared to be called at once from multiple lines of the same function. I fixed this by ignoring function duplicates while parsing the same memory block. It seems to work, but maybe there are cases where some new trick will be necessary
If someone finds time to test this approach on real projects, that would be great. As for usefulness, time will tell; for now I'd just like to know that it at least provides sane results :) Thanks, Max Vlasov
On Fri, Aug 19, 2011 at 3:26 PM, Ludo Brands <[hidden email]> wrote:
> Where can I find units uNtSnapshot, uMem32, uUniCompat?
> Looking at the screenshot, I'm confused: memory allocated by FreeMem. How
> should that be interpreted?
> Thanks, Ludo
Sorry about that; I uploaded the archive again with only the uNtSnapshot dependency: http://www.maxerist.net/downloads/procmemstat.zip
As for the numbers, they're not meant to be pieces that sum up. I mentioned utilities for finding space on a drive; for example, in my FindFatFolder (http://www.maxerist.net/main/soft-for-win/find-fat-folder) every reachable folder (nodes too, not only leaves) is listed with its calculated size; to find the eater you just look from top to bottom and wait for the 'bingo'. In other words, if you see a big number and the name of the function looks like a strange source of such a big (cumulative) number, you look deeper. For folders it works; whether it works for functions I still don't know :)
Max
OK, it compiles now. I tested this with 2 programs, but when I include uProcMemMon as the first unit in the .lpr and define -dPROCMEMMON, they both crash with a SIGSEGV. Undefine PROCMEMMON and they run fine. I've attached a small test program that crashes in bufdataset.
> As for the numbers, they're not meant to be pieces that sum up. I
> mentioned utilities for finding space on a drive; for example, in my
> FindFatFolder (http://www.maxerist.net/main/soft-for-win/find-fat-folder)
> every reachable folder (nodes too, not only leaves) is listed with its
> calculated size; to find the eater you just look from top to bottom and
> wait for the 'bingo'. In other words, if you see a big number and the
> name of the function looks like a strange source of such a big
> (cumulative) number, you look deeper. For folders it works; whether it
> works for functions I still don't know :)
I can understand that the numbers are not cumulative. My point was about FreeMem allocating memory: FreeMem should free memory, not allocate it.
Ludo
> OK, it compiles now. I tested this with 2 programs, but when I include
> uProcMemMon as the first unit in the .lpr and define -dPROCMEMMON, they
> both crash with a SIGSEGV. Undefine PROCMEMMON and they run fine. I've
> attached a small test program that crashes in bufdataset.
Just a small copy-paste problem: change line 295
Result := OldMemMgr.GetMem(Size);
into
Result := OldMemMgr.AllocMem(Size);
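[Editor's note: the distinction matters because AllocMem, unlike GetMem, returns zero-initialised memory; callers such as bufdataset rely on that, so a hook that forwards AllocMem to GetMem hands them garbage and a SIGSEGV follows. A small demonstration of the difference:]

```pascal
program allocmemdemo;
{$mode objfpc}

var
  p: PByte;
begin
  p := AllocMem(64);         // AllocMem zero-fills the block
  WriteLn('zeroed: ', (p[0] = 0) and (p[63] = 0));
  FreeMem(p);

  GetMem(p, 64);             // GetMem leaves the contents undefined
  FreeMem(p);
  WriteLn('done');
end.
```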
Ludo
On Fri, Aug 19, 2011 at 10:36 PM, Ludo Brands <[hidden email]> wrote:
> OK, it compiles now. I tested this with 2 programs, but when I include
> uProcMemMon as the first unit in the .lpr and define -dPROCMEMMON, they
> both crash with a SIGSEGV. Undefine PROCMEMMON and they run fine. I've
> attached a small test program that crashes in bufdataset.
>
> Just a small copy-paste problem: change line 295
>   Result := OldMemMgr.GetMem(Size);
> into
>   Result := OldMemMgr.AllocMem(Size);
>
> Ludo
Ludo, thanks, that's good news, and one more reason to rank this type of error among the most frequent causes of bugs :) I uploaded the zip with the fix. If you meant MMM_FreeMem, which took about 200k according to the screenshot, I think it can be related to saving deallocated blocks, i.e. the memory manager needing new memory to record the locations of freed blocks. Just a guess that possibly needs further investigation.
Max
Attached is a patch for the following:
- ported to win64
- added a Depth parameter to PMMCollectStat (default 15). When the LCL is built with debug line info, the output from PMMCollectStat is cluttered with LCL functions such as message loop/handling functions. Depth is the maximum stack "backtrace" depth used to create the list.
Ludo
On Sun, Aug 21, 2011 at 3:06 PM, Ludo Brands <[hidden email]> wrote:
> Attached is a patch for the following:
> - ported to win64
> - added a Depth parameter to PMMCollectStat (default 15). When the LCL is
> built with debug line info, the output from PMMCollectStat is cluttered
> with LCL functions such as message loop/handling functions. Depth is the
> maximum stack "backtrace" depth used to create the list.
These are useful additions, thanks. I suppose the limit automatically solves the problem of the first winners, which are not usable in real life.
I tried to use it with Lazarus (opening 150 files afterwards), and the results were bad in terms of speed: 30 minutes just to start and a couple of hours to work (1.7 GHz Celeron); obviously something like a hash table is needed. Also, showing the overall number of memory requests for a particular function could be useful too; it could add more sense to some puzzling entries.
As for the Lazarus results, some entries are still unexplained to me (the first ~300 are in the attachment):
[ 19,832,531 ] TMAINIDE__DOOPENEDITORFILE
[ 19,801,582 ] TWIN32WIDGETSET__SETFOREGROUNDWINDOW - the code inside is only Windows.SetForegroundWindow(HWnd), but maybe Windows forces a message to the tab, and this forces the tab (SynEdit) to do postponed operations. It's the only explanation I currently have.
[ 19,749,554 ] GETWIN32WINDOWINFO - no explanation
[ 19,686,289 ] CLEANANDEXPANDFILENAME - no explanation
[ 14,367,684 ] FILEGETATTRUTF8 - no explanation
Possibly those were false positives, i.e. some numbers on the stack that fall into the usable range but are actually something else.
> These are useful additions, thanks. I suppose the limit automatically
> solves the problem of the first winners, which are not usable in real
> life.
That's the purpose of the limit. In a lot of GUI programs, actions, and thus memory allocations, are triggered by events (button/menu clicks, callback functions for all kinds of window updates, ...). So there are always a lot of message-related functions on the stack. Since the unit records only addresses in the main program, all OS/graphics-related functions aren't shown, and interpreting the stack can be really challenging.
IMO, a big improvement would be to unwind the stack instead of checking every value found on the stack. This would avoid false positives caused by random data falling into the main address range or by function addresses passed as parameters. You could then even drop the check against the main program's address range and include dynamically loaded libraries. The downside of unwinding a stack is that it becomes difficult when stack frames are omitted. But for debugging purposes, forcing stack frames is always a good thing.
> I tried to use it with Lazarus (opening 150 files afterwards), and the
> results were bad in terms of speed: 30 minutes just to start and a couple
> of hours to work (1.7 GHz Celeron); obviously something like a hash table
> is needed. Also, showing the overall number of memory requests for a
> particular function could be useful too; it could add more sense to some
> puzzling entries.
A hash table for MonBlocks and AddrList would help a lot. Keeping AddrList and FuncNames outside PMMCollectStat would also help for subsequent uses.
The number of requests is also what I miss most in the valgrind massif tool. You get a detailed stack of the memory allocations, but you need to check the code at all levels to find where the loops are and figure out their counts. 10 MB allocated by a function executed 10,000 times or 10 times changes a lot when trying to find memory "eaters".
> As for the Lazarus results, some entries are still unexplained to me
> (the first ~300 are in the attachment):
> [ 19,832,531 ] TMAINIDE__DOOPENEDITORFILE
> [ 19,801,582 ] TWIN32WIDGETSET__SETFOREGROUNDWINDOW - the code inside is
> only Windows.SetForegroundWindow(HWnd), but maybe Windows forces a
> message to the tab, and this forces the tab (SynEdit) to do postponed
> operations. It's the only explanation I currently have.
> [ 19,749,554 ] GETWIN32WINDOWINFO - no explanation
> [ 19,686,289 ] CLEANANDEXPANDFILENAME - no explanation
> [ 14,367,684 ] FILEGETATTRUTF8 - no explanation
> Possibly those were false positives, i.e. some numbers on the stack that
> fall into the usable range but are actually something else.
Were these results obtained with or without the Depth patch?
Ludo
On Mon, Aug 22, 2011 at 1:30 PM, Ludo Brands <[hidden email]> wrote:
> IMO, a big improvement would be to unwind the stack instead of checking
> every value found on the stack. This would avoid false positives caused
> by random data falling into the main address range or by function
> addresses passed as parameters. You could then even drop the check
> against the main program's address range and include dynamically loaded
> libraries.
Interesting suggestion; can you point me to some code, maybe in Lazarus, that does a similar job? Does GetLineInfo do something like this internally?
> > I tried to use it with Lazarus (opening 150 files afterwards), and the
> > results were bad in terms of speed: 30 minutes just to start and a
> > couple of hours to work (1.7 GHz Celeron); obviously something like a
> > hash table is needed. ...
>
> A hash table for MonBlocks and AddrList would help a lot. Keeping
> AddrList and FuncNames outside PMMCollectStat would also help for
> subsequent uses.
You're right about FuncNames. I checked GetLineInfo's speed: about 100 lookups per second on my machine. That's fine for the Lazarus "call stack" window, but for resolving thousands of addresses, caching the results may become a necessity.
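[Editor's note: a caching wrapper along the lines Max proposes might look like this. It uses the real GetLineInfo from FPC's lineinfo unit and an fgl map; the names ResolveAddr/Cache are made up for illustration, and without -gl the resolved strings will simply be empty of symbol data.]

```pascal
program lineinfocache;
{$mode objfpc}
uses
  SysUtils, fgl, lineinfo;

type
  TAddrNameMap = specialize TFPGMap<PtrUInt, string>;

var
  Cache: TAddrNameMap;

function ResolveAddr(Addr: PtrUInt): string;
var
  func, source: string;
  line: LongInt;
  idx: Integer;
begin
  if Cache.Find(Addr, idx) then
    Exit(Cache.Data[idx]);                // cache hit: skip the slow lookup
  func := ''; source := ''; line := 0;
  GetLineInfo(Addr, func, source, line);  // slow path, done once per address
  Result := Format('%s (%s:%d)', [func, source, line]);
  Cache.Add(Addr, Result);
end;

begin
  Cache := TAddrNameMap.Create;
  Cache.Sorted := True;                   // required for binary-search Find
  WriteLn(Length(ResolveAddr(PtrUInt(@ResolveAddr))) > 0);
  WriteLn(Length(ResolveAddr(PtrUInt(@ResolveAddr))) > 0); // second call: cache hit
  Cache.Free;
  WriteLn('done');
end.
```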
> > As for the Lazarus results, some entries are still unexplained to me
> > (the first ~300 are in the attachment) ... Possibly those were false
> > positives, i.e. some numbers on the stack that fall into the usable
> > range but are actually something else.
>
> Were these results obtained with or without the Depth patch?
That was before the patch, and looking at how much CPU time it took, I will probably be ready for the next measurement only after the hashing is implemented :). In your question I sense a guess that the smaller the depth, the less likely false positives get into view, right?
Max
> Interesting suggestion; can you point me to some code, maybe in Lazarus,
> that does a similar job? Does GetLineInfo do something like this
> internally?
I'm currently experimenting with a variation of dump_stack in system.inc. The first try already skips a lot of the functions found by your initial implementation. Still looking into it.
> That was before the patch, and looking at how much CPU time it took, I
> will probably be ready for the next measurement only after the hashing is
> implemented :). In your question I sense a guess that the smaller the
> depth, the less likely false positives get into view, right?
Right. Statistically, the smaller the stack you look at, the fewer false positives you get.
One reason for false positives is also that local variables aren't initialised. Procedures declaring local variables like buf: array[0..1000] of char can cause a lot of false positives, since the stack space used by the array can contain return addresses of previously executed functions. Here the stack unwinding should help a lot, once it works.
Ludo
On Mon, Aug 22, 2011 at 4:08 PM, Ludo Brands <[hidden email]> wrote:
> Right. Statistically, the smaller the stack you look at, the fewer false
> positives you get.
>
> One reason for false positives is also that local variables aren't
> initialised. Procedures declaring local variables like buf: array[0..1000]
> of char can cause a lot of false positives, since the stack space used by
> the array can contain return addresses of previously executed functions.
> Here the stack unwinding should help a lot, once it works.
I never thought about this, but local data as you describe it looks like a very natural source of false positives :). Returning to stack frames: can we assume the stack frames form a linked list starting at the current EBP, with the return address always sitting next to the saved frame pointer on the stack? I could not check this in the Lazarus CPU window (I could not find a stack view), but in Delphi the rule seems to hold. Hmm... except for short procedures that use no local variables at all. On the other hand, if some sub-functions are skipped because they omit the stack frame, that is still better than false positives.
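[Editor's note: the EBP-chain walk Max describes is essentially what FPC's dump_stack does, and the RTL exposes the needed primitives as get_frame, get_caller_addr and get_caller_frame. A depth-limited sketch follows; the limit of two frames is an illustrative safety margin so the walk never follows the chain into frames we can't vouch for, and with omitted stack frames or optimisation the chain may be shorter or wrong.]

```pascal
program framewalkdemo;
{$mode objfpc}

{ Walk the saved-frame-pointer linked list: each frame's saved base
  pointer links to the caller's frame, with the return address beside it. }
procedure Walk;
var
  bp: Pointer;
  depth: Integer;
begin
  bp := get_frame;                       // current frame pointer
  depth := 0;
  while (bp <> nil) and (depth < 2) do   // stay on frames we know are valid
  begin
    WriteLn('return addr: $',
            HexStr(PtrUInt(get_caller_addr(bp)), SizeOf(Pointer) * 2));
    bp := get_caller_frame(bp);          // follow the linked list upward
    Inc(depth);
  end;
end;

procedure Outer;
begin
  Walk;
end;

begin
  Outer;   // Walk reports its caller (Outer) and Outer's caller (main)
  WriteLn('done');
end.
```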
Max