FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?

Bruce Tulloch
After a random but very long period of time (i.e. very many successful calls) I get a SEGV in the built-in function fpc_AnsiStr_Decr_Ref.

GDB reports that the argument to fpc_AnsiStr_Decr_Ref (the string whose reference count is to be decremented) is nil (i.e. 0x0).

Prima facie, that's the reason for the SEGV, but how is it possible that the compiler would pass a nil pointer to this function in the first place?

To put this into context, I'm running FPC 2.6.2 on a 32-bit Linux system, in a multi-threaded application (which uses Python threads and FPC threads). I have not found obvious evidence of memory corruption from other execution contexts or shared memory handling problems.

The SEGV occurs when called from a function, let's call it foo, that looks like this:

function foo : AnsiString;
begin
  Result := '';
  <other stuff>
end;

The AnsiString pointer on which fpc_AnsiStr_Decr_Ref throws the SEGV is Result, at the first line of the function foo.

It appears the compiler is passing Result to fpc_AnsiStr_Decr_Ref even though Result (at this point in the function) must be nil (having only just come into scope).

How is it possible that fpc_AnsiStr_Decr_Ref is being called at all?

Any and all advice gratefully received.

Cheers, Bruce.

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/mailman/listinfo/fpc-pascal

Re: FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?

Bruce Tulloch
I should clarify, foo is a virtual method of an object, not a regular function. -b



Re: FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?

Michael Van Canneyt
In reply to this post by Bruce Tulloch


On Wed, 8 May 2013, Bruce Tulloch wrote:

>  After a random but very long period of time (i.e. very many successful calls) I get a SEGV in the built-in function fpc_AnsiStr_Decr_Ref.
>
> GDB reports the argument to fpc_AnsiStr_Decr_Ref (the string whose reference count is to be decremented) is nil (i.e. 0x0).
>
> Prima facie, that's the reason for the SEGV, but how is it possible that the compiler would pass a nil pointer to this function in the first place?
>
> To put this into context, I'm running FPC 2.6.2 on a 32 bit Linux system executing in a multi-threaded application (which uses python threads and fpc threads). I have not found obvious
> evidence of memory corruption from other execution contexts or shared memory handling problems.
>
> The SEGV occurs when called from a function, let's call it foo, that looks like this:
>
> function foo : AnsiString;
> begin
>   Result := '';
>  <other stuff>
> end;
>
> The AnsiString pointer on which fpc_AnsiStr_Decr_Ref throws the SEGV is Result, at the first line of the function foo.
>
> It appears the compiler is passing Result to fpc_AnsiStr_Decr_Ref even though Result (at this point in the function) must be nil (having only just come into scope).

This is not correct. Result is NOT guaranteed to be nil.

About a year ago, I was as surprised as you are to discover this, but it is so.
It is even so in Delphi.

> How is it possible that fpc_AnsiStr_Decr_Ref is being called at all?

Roughly:

What happens is that the caller passes the address of the location where the result must go.
The function receives this address and then treats it as a normal variable, meaning that
as soon as it is used, fpc_AnsiStr_Decr_Ref and friends come into play.

The exact behaviour also depends on the compiler version.

One of the compiler maintainers can describe this in more detail.

Michael.

Re: FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?

Jonas Maebe-2
In reply to this post by Bruce Tulloch

On 08 May 2013, at 08:13, Bruce Tulloch wrote:

> After a random but very long period of time (i.e. very many successful
> calls) I get a SEGV in the built-in function fpc_AnsiStr_Decr_Ref.
>
> GDB reports the argument to fpc_AnsiStr_Decr_Ref (the string whose
> reference count is to be decremented) is nil (i.e. 0x0).
>
> Prima facie, that's the reason for the SEGV, but how is it possible that
> the compiler would pass a nil pointer to this function in the first place?

The first thing fpc_AnsiStr_Decr_Ref does is check whether its
parameter is nil, and if so it immediately exits. It can be nil in
case the ansistring contains an empty string.

That routine itself also sets its argument to nil in case this was not  
the case initially (it's a var-parameter), and I assume your crash  
happens after this has been done.
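
The control flow described above can be sketched as a small stand-alone program. This is only a simplified, single-threaded model, assuming an 8-byte header with the reference count first; TModelHeader, ModelDecrRef and the other names are made up for the illustration and are not the RTL's own declarations.

```pascal
program DecrRefModel;
{$mode objfpc}

type
  // Hypothetical stand-in for the header stored in front of the string data.
  PModelHeader = ^TModelHeader;
  TModelHeader = record
    Ref: LongInt;  // reference count; a negative value marks a constant string
    Len: LongInt;  // string length
  end;

// Simplified model of the routine's control flow (an assumption, not RTL code).
procedure ModelDecrRef(var S: Pointer);
var
  Hdr: PModelHeader;
begin
  if S = nil then
    Exit;                        // first read of S: empty string, nothing to do
  // Second read of S: step back from the data to the header in front of it.
  Hdr := PModelHeader(PByte(S) - SizeOf(TModelHeader));
  if Hdr^.Ref < 0 then
    Exit;                        // constant string, never released
  Dec(Hdr^.Ref);                 // the RTL uses a locked decrement at this point
  if Hdr^.Ref = 0 then
    FreeMem(Hdr);                // last reference gone: release the block
  S := nil;                      // the caller's variable no longer refers to it
end;

var
  Block: PModelHeader;
  Data: Pointer;
begin
  // Build one fake string block holding a single reference.
  Block := GetMem(SizeOf(TModelHeader) + 1);
  Block^.Ref := 1;
  Block^.Len := 0;
  Data := PByte(Block) + SizeOf(TModelHeader);

  ModelDecrRef(Data);
  WriteLn('Data is nil after release: ', Data = nil);
  ModelDecrRef(Data);            // a nil argument returns immediately
end.
```

Note that the model reads S twice, once for the nil check and once to locate the header. If something else changes the variable between those two reads, the nil check no longer protects the dereference.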

> To put this into context, I'm running FPC 2.6.2 on a 32 bit Linux system
> executing in a multi-threaded application (which uses python threads and
> fpc threads). I have not found obvious evidence of memory corruption from
> other execution contexts or shared memory handling problems.

It's nevertheless most likely memory corruption. You can try compiling  
with -gv and running your program under valgrind to see whether it  
finds anything (you will probably get some false positives about  
certain RTL pchar routines such as strscan and strlen, but you can  
ignore those).


Jonas

Re: FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?

Bruce Tulloch
Thanks Jonas, that confirms what I suspected. Next time I trap an instance of this (rare) fault I will inspect exactly which CPU instruction raised the SEGV inside fpc_AnsiStr_Decr_Ref in search of a source of memory corruption.


Bruce.



Re: FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?

Bruce Tulloch
In reply to this post by Michael Van Canneyt
Michael, thanks for your feedback.

One thing confuses me in light of Jonas' reply. If what you say is correct (that local variables that have just come into scope are not guaranteed to be nil), then the assignment Result := ''; at the first line of foo may arbitrarily SEGV, because fpc_AnsiStr_Decr_Ref will interpret the (possibly) non-nil value of Result as an AnsiString. Being a random uninitialized value, it will likely be invalid and blow up.

Surely the semantics of string handling rely on FPC guaranteeing that automatic variables are always preassigned nil when they come into scope?

Put another way, how do fpc_AnsiStr_Decr_Ref and friends, which receive the address of the caller's Result variable via their var parameter, know whether the value of this parameter (which may not be initialized if what you say is correct) is a valid string?

Bruce.


Re: FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?

Bruce Tulloch
In reply to this post by Bruce Tulloch
I've not managed to trap it again, but based on the information I have from the last time it occurred I can say the error happened here:

--- a/rtl/i386/i386.inc
+++ b/rtl/i386/i386.inc
@@ -1523,7 +1523,7 @@
         movl    (%eax),%edx
         subl    $8,%edx
 // [102] If l^<0 then exit;
         cmpl    $0,(%edx) <-- SEGV OCCURS HERE
         jl      .Lj3596
 .Lj3603:
 // [104] If declocked(l^) then

That is, when testing the string length, the address of the length variable appears to be duff.

I don't know what %edx was pointing to at the time (I hope to know next time I trap it) but it was obviously wrong.

-b



Re: FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?

Bruce Tulloch
So here's some more diagnostic output at the point of the SEGV:

(gdb) disass 
Dump of assembler code for function _$SYSTEM$_Ll1637:
=> 0x0118ace1 <+0>:     cmpl   $0x0,(%edx)
End of assembler dump.
(gdb) i reg
eax            0xb6c77158       -1228443304
ecx            0xb6c76c04       -1228444668
edx            0xfffffff8       -8
ebx            0x12adbf8        19586040
esp            0xb6c75f5c       0xb6c75f5c
ebp            0xb6c75f70       0xb6c75f70
esi            0xb6c77020       -1228443616
edi            0xb6c77020       -1228443616
eip            0x118ace1        0x118ace1 <_$SYSTEM$_Ll1637>
eflags         0x210293 [ CF AF SF IF RF ID ]
cs             0x73     115
ss             0x7b     123
ds             0x7b     123
es             0x7b     123
fs             0x0      0
gs             0x33     51
(gdb) p $eax^
$4 = 0

This tells me that the test at the top of fpc_AnsiStr_Decr_Ref:

        cmpl $0,(%eax)
        jne .Ldecr_ref_continue
        ret
.Ldecr_ref_continue:

passed (i.e. (%eax) was NOT nil) but sometime during the execution of the following code:

// Temps allocated between ebp-24 and ebp+0
        subl    $4,%esp
// Var S located in register
// Var l located in register
        movl    %eax,(%esp)
// [101] l:=@PAnsiRec(S-FirstOff)^.Ref;
        movl    (%eax),%edx
        subl    $8,%edx
// [102] If l^<0 then exit;
        cmpl    $0,(%edx)

the variable (%eax) MUST have been changed (to nil) BY ANOTHER THREAD.

Is there any other plausible explanation I may have missed?

If there is no other explanation, then it means I need to find out how the string variable referred to by (%eax) could have been accessed (or even known to exist) by any other thread in the same address space.

If that variable is local to a function (i.e. foo's Result, with the SEGV occurring upon its assignment immediately after it first comes into scope, per my earlier email) then, absent a bug in FPC's handling of string references and allocation, it seems impossible that it could be known to or referenced by any other thread.

I'm reasonably confident there's no other way it could be overwritten by another thread (i.e. I don't think there are any range or buffer pointer errors anywhere else) so logic tells me I must have the wrong thesis or there's a string handling error in FPC.

Any clues or insight gratefully received :-)

Cheers, Bruce.

PS: I can't use valgrind in practice for a variety of reasons, not least that I'm unlikely to see the error for an extraordinarily long time: slight changes to the (execution time of the) code made so far have had a dramatic effect on the likelihood of the problem occurring at all. It's clearly some sort of race condition over unprotected memory somewhere.




Re: FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?

Ludo Brands
On 05/09/2013 05:19 AM, Bruce Tulloch wrote:

>
> This tells me that the test at the top of fpc_AnsiStr_Decr_Ref:
>
>         cmpl $0,(%eax)
>         jne .Ldecr_ref_continue
>         ret
> .Ldecr_ref_continue:
>
> passed (i.e. (%eax) was NOT nil) but sometime during the execution of
> the following code:
>
> // Temps allocated between ebp-24 and ebp+0
>         subl    $4,%esp
> // Var S located in register
> // Var l located in register
>         movl    %eax,(%esp)
> // [101] l:=@PAnsiRec(S-FirstOff)^.Ref;
>         movl    (%eax),%edx
>         subl    $8,%edx
> // [102] If l^<0 then exit;
>         cmpl    $0,(%edx)
>
> the variable (%eax) MUST have been changed (to nil) BY ANOTHER THREAD.
>
> Is there any other plausible explanation I may have missed?
>

SIGSEGV is caused by an access to any memory outside the process address
space. Not only nil. So the first test only checks if the address is not
nil but will let other, even invalid, addresses pass on.


> If there is no other explanation, then it means I need to find out how
> the string variable referred to by (%eax) could have been accessed
> (or even known to exist) by any other thread in the same address space.
>
> If that variable is local to a function (i.e. foo's Result with SEGV
> upon its assignment immediately it first comes into scope, per my
> earlier email) then absent a bug in FPC's handling string references and
> allocation, it seems impossible that it could be known or referenced by
> any other thread.
>
> I'm reasonably confident there's no other way it could be overwritten by
> another thread (i.e. I don't think there are any range or buffer pointer
> errors anywhere else) so logic tells me I must have the wrong thesis or
> there's a string handling error in FPC.
>
> Any clues or insight, gratefully received :-)
>

Result in foo is initialized with the address of the left side variable
in the call to foo. If you have
  s:=foo;
result will point to s. If you just call
  foo;
and drop the result, the compiler will create and use a hidden temp
string variable. Strings are managed types and initialized to nil.

So you are looking at the wrong location for your bug. You should look
at what has corrupted the string variable that receives the result of foo.
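
A small experiment along these lines might make the point visible. It is a sketch only: Foo and Demo are made-up names, and reading Result before assigning it is done here purely to peek at what the caller handed in. What the first line prints depends on the compiler version and on whether the compiler passes S directly or a hidden temp.

```pascal
program ResultAlias;
{$mode objfpc}{$H+}

function Foo: AnsiString;
begin
  // Deliberately reads Result before assigning it (the compiler may warn).
  // Do not rely on this in real code; the value seen here is not guaranteed.
  WriteLn('Result on entry: "', Result, '"');
  Result := 'new';
end;

procedure Demo;
var
  S: AnsiString;
begin
  S := 'old';
  S := Foo;  // the caller supplies the location that Foo sees as Result
  WriteLn('S after the call: "', S, '"');
end;

begin
  Demo;
end.
```

Whatever the first line shows, the variable that matters for the crash is the one on the caller's side of the assignment.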

Ludo

Re: FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?

José Mejuto
In reply to this post by Bruce Tulloch
El 09/05/2013 5:19, Bruce Tulloch escribió:

> If there is no other explanation, then it means I need to find out how
> the string variable referred to by (%eax) could have been accessed
> (or even known to exist) by any other thread in the same address space.

Hello,

In the past I suffered a problem like yours, and the culprit was a
different function that passed its result (a string) as a parameter
in a function call without initializing it first, something like this:

function foo(var para: string): string;
begin
   //Something with para
end;

function bar(): string;
begin
   result:=foo(result);
end;

I hope this helps...

Re: FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?

Bruce Tulloch
In reply to this post by Ludo Brands
Thanks Ludo, but I know the value in (%eax) in this case is nil (see the cpu register dump in my email) because the address of the string length (in edx) is 0xfffffff8 (which is "8 less than nil") per the instruction just before the one that fails with SEGV. The SEGV itself is caused by an attempt to read the address in edx, i.e. 0xfffffff8 at the instruction cmpl $0,(%edx).
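
For anyone checking the arithmetic, here is a tiny sketch of how a nil string pointer turns into the address in edx on a 32-bit target. The variable names are made up for the illustration.

```pascal
program NilMinusEight;
{$mode objfpc}

var
  StringPtr: Int64;  // models the value loaded from (%eax): nil
  RefAddr: Int64;    // models edx after "subl $8,%edx"
begin
  StringPtr := 0;
  // Subtract the 8-byte offset and wrap to 32 bits, as the CPU register would.
  RefAddr := (StringPtr - 8) and $FFFFFFFF;
  WriteLn(HexStr(RefAddr, 8));  // prints FFFFFFF8
end.
```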

The corruption is not occurring when the return value of foo is used, it's occurring when the Result variable in foo is first assigned (a valid string, '') when Result first appears in scope of the body of the function foo.

Thanks for your feedback. Cheers, Bruce.



Re: FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?

Bruce Tulloch
In reply to this post by José Mejuto
Thanks José, I can see that might cause a problem given bar passes result by reference to foo without initializing result first. My question to Jonas or others more knowledgeable than me about what the compiler does, is whether result (in your example and my own case) is guaranteed to be initialized to nil when it first appears in scope (i.e. before it's been assigned any value in our code). If it is initialized to nil, then foo would receive a reference to bar's result variable (via para) and the value of that variable would be nil (and all would be okay). If it isn't initialized to nil, the same rule applies but the value of result (as seen by foo via para) would likely be invalid and would probably blow up in foo when dereferenced (as a string).

My problem is similar except that I know it's not nil when passed in (because the initial test in fpc_AnsiStr_Decr_Ref looking for nil passes) but that it becomes nil very soon afterward (because the SEGV arises as an indirect result of it being nil, as I explained in my reply to Ludo just now).

I'm pretty sure I have a shared memory problem somewhere between threads in my code but I can't understand how this could be given the "erroneously shared" variable appears to be an automatic variable (i.e. Result) that has just been created on the stack in the function foo that calls fpc_AnsiStr_Decr_Ref where the SEGV occurs.

I'll keep looking :-) Bruce.



Re: FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?

Jonas Maebe-2

On 09 May 2013, at 14:39, Bruce Tulloch wrote:

> Thanks José, I can see that might cause a problem given bar passes result
> by reference to foo without initializing result first. My question to Jonas
> or others more knowledgeable than me about what the compiler does, is
> whether result (in your example and my own case) is guaranteed to be
> initialized to nil when it first appears in scope (i.e. before it's been
> assigned any value in our code).

Every instance of an automated type, whether it was explicitly declared or implicitly created as a temp, initially gets the value "nil".

However, as Michael and Ludo explained, the "result" variable of a function returning an ansistring/unicodestring is not created inside that function itself. The compiler turns such functions into procedures with an implicit var-parameter and the *caller* passes the location where the function result should go via that parameter. This location can be a temporary location, but the compiler can also optimize this by directly passing the location of the variable to which you assign the result of that function call. Such optimizations only occur in safe situations (e.g., not when assigning to a global variable, because otherwise assigning something to the function result would immediately change the value of that global variable too), but as Ludo explains this means that you are looking in the wrong place for the data race.

So you are probably writing in two threads to whatever you are assigning the result of that function to.
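
As a rough illustration of the transformation described above, here is a conceptual sketch. FooLowered and HiddenResult are made-up names, and the code the compiler really generates differs in detail; the point is only that Result lives in a location supplied by the caller.

```pascal
program ResultAsVarParam;
{$mode objfpc}{$H+}

{ What the programmer writes. }
function Foo: AnsiString;
begin
  Result := '';            // releases whatever the caller's location held
  Result := Result + 'abc';
end;

{ Roughly how the compiler treats it (hypothetical name and parameter):
  a procedure that writes through a location passed in by the caller. }
procedure FooLowered(var HiddenResult: AnsiString);
begin
  HiddenResult := '';
  HiddenResult := HiddenResult + 'abc';
end;

var
  S: AnsiString;
begin
  S := Foo;          // the caller supplies the result location
  WriteLn(S);
  FooLowered(S);     // the same effect, written out by hand
  WriteLn(S);
end.
```

Both calls leave 'abc' in S. If another thread writes to the caller-supplied location while the callee is running, the string can be corrupted inside the callee even though the callee itself looks entirely local.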


Jonas

Re: FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?

Sven Barth-2
In reply to this post by Bruce Tulloch
On 09.05.2013 14:39, Bruce Tulloch wrote:

> Thanks José, I can see that might cause a problem given bar passes
> result by reference to foo without initializing result first. My
> question to Jonas or others more knowledgeable than me about what the
> compiler does, is whether result (in your example and my own case) is
> guaranteed to be initialized to nil when it first appears in scope (i.e.
> before it's been assigned any value in our code). If it is initialized
> to nil, then foo would receive a reference to bar's result variable (via
> para) and the value of that variable would be nil (and all would be
> okay). If it isn't initialized to nil, the same rule applies but the
> value of result (as seen by foo via para) would likely be invalid and
> would probably blow up in foo when dereferenced (as a string).
>
> My problem is similar except that I know it's not nil when passed in
> (because the initial test in fpc_AnsiStr_Decr_Ref looking for nil
> passes) but that it becomes nil very soon afterward (because the SEGV
> arises as an indirect result of it being nil, as I explained in my reply
> to Ludo just now).
>
> I'm pretty sure I have a shared memory problem somewhere between threads
> in my code but I can't understand how this could be given the
> "erroneously shared" variable appears to be an automatic variable (i.e.
> Result) that has just been created on the stack in the function foo that
> calls fpc_AnsiStr_Decr_Ref where the SEGV occurs.
>
> I'll keep looking :-) Bruce.

Do you play around with pointers anywhere? I once overwrote something
in a parent stack frame by mistake, so maybe you are accidentally
accessing the memory location of the Result variable...

Regards,
Sven


Re: FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?

Bruce Tulloch
In reply to this post by Jonas Maebe-2
> The compiler turns such functions into procedures with an implicit var-parameter
> and the *caller* passes the location where the function result should go via that
> parameter.

Okay, thanks, that clarifies things. Now I understand how a variable in the caller's scope can be affected by assignments to Result in the callee's scope BEFORE the callee has finished executing.

Another way of stating this: Result is a local variable of a function, initialized to nil and passed by value to the caller upon completion ONLY if Result is not a reference to a dynamic type; otherwise it's an implicit var argument with scope beyond that of the function.

Is that correct? If so, it would seem to be a bit of a semantic trap for the unwary :-)

> Such optimizations only occur in safe situations (e.g., not when assigning to a
> global variable...

Does the compiler consider ANY non-local variable to be global?

For example, fields of an object?

> So you are probably writing in two threads to whatever you are assigning the
> result of that function to.

Yep, makes sense, we will look carefully to see if that's what we're doing.

The functions concerned are actually methods of the TBlockSocket class of the Synapse library. We use an instance of this class in two threads: one sending, the other receiving.

These threads have full shared memory protection in our own code, but looking at the TBlockSocket implementation I can see at least one suspect: FLastErrorDesc.

This field is changed by methods that send and receive on the socket which means it's assigned values in the context of two different threads (given our usage). Indeed it suggests TBlockSocket is not thread safe as currently coded. Looks like a smoking gun to me.
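
To make the suspected race concrete, here is a minimal sketch of two threads writing to one AnsiString field, with the writes serialized by a critical section. TSharedState and TWriter are hypothetical stand-ins and this is not the Synapse code; only the field name FLastErrorDesc is borrowed from the discussion above.

```pascal
program SharedFieldSketch;
{$mode objfpc}{$H+}

uses
  {$ifdef unix}cthreads,{$endif}
  Classes, SysUtils;

type
  { Hypothetical stand-in for an object whose string field is written
    from more than one thread. Not the Synapse implementation. }
  TSharedState = class
  private
    FLock: TRTLCriticalSection;
    FLastErrorDesc: AnsiString;
  public
    constructor Create;
    destructor Destroy; override;
    procedure SetLastErrorDesc(const AValue: AnsiString);
    function GetLastErrorDesc: AnsiString;
  end;

  TWriter = class(TThread)
  private
    FState: TSharedState;
    FTag: AnsiString;
  protected
    procedure Execute; override;
  public
    constructor Create(AState: TSharedState; const ATag: AnsiString);
  end;

constructor TSharedState.Create;
begin
  inherited Create;
  InitCriticalSection(FLock);
end;

destructor TSharedState.Destroy;
begin
  DoneCriticalSection(FLock);
  inherited Destroy;
end;

procedure TSharedState.SetLastErrorDesc(const AValue: AnsiString);
begin
  // Without the lock, two threads could release the old string at the
  // same time, which can corrupt its reference count.
  EnterCriticalSection(FLock);
  try
    FLastErrorDesc := AValue;
  finally
    LeaveCriticalSection(FLock);
  end;
end;

function TSharedState.GetLastErrorDesc: AnsiString;
begin
  EnterCriticalSection(FLock);
  try
    Result := FLastErrorDesc;
  finally
    LeaveCriticalSection(FLock);
  end;
end;

constructor TWriter.Create(AState: TSharedState; const ATag: AnsiString);
begin
  inherited Create(True);   // start suspended so the fields are set first
  FreeOnTerminate := False;
  FState := AState;
  FTag := ATag;
end;

procedure TWriter.Execute;
var
  I: Integer;
begin
  for I := 1 to 100000 do
    FState.SetLastErrorDesc(FTag + ' ' + IntToStr(I));
end;

var
  State: TSharedState;
  A, B: TWriter;
begin
  State := TSharedState.Create;
  A := TWriter.Create(State, 'send');
  B := TWriter.Create(State, 'recv');
  A.Start;
  B.Start;
  A.WaitFor;
  B.WaitFor;
  WriteLn('final value: ', State.GetLastErrorDesc);
  A.Free;
  B.Free;
  State.Free;
end.
```

Which thread's value ends up last is not deterministic; the lock only guarantees that each assignment to the field completes before the next one starts. Removing the Enter/Leave calls turns this into the kind of unsynchronized write suspected above.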

Thanks one and all for all your helpful feedback!

Bruce.






Re: FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?

Jonas Maebe-2

On 10 May 2013, at 03:19, Bruce Tulloch wrote:

>> The compiler turns such functions into procedures with an implicit
>> var-parameter and the *caller* passes the location where the function
>> result should go via that parameter.
>
> Okay, thanks, that clarifies things. Now I understand how a variable
> in the caller's scope can be affected by assignments to Result in the
> callee's scope BEFORE the callee has finished executing.
>
> Another way of stating this: Result is a local variable of a function,
> initialized to nil and passed by value to the caller upon completion
> ONLY if Result is not a reference to a dynamic type; otherwise it's an
> implicit var argument with scope beyond that of the function.
>
> Is that correct?

Yes, apart from the fact that result is never initialized to nil.
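
A small sketch of what "never initialized to nil" can mean in practice. AppendX and Demo are made-up names, and what gets printed depends on whether the compiler reuses the caller's variable or a temporary as the result location, so no particular output is claimed here.

```pascal
program ResultNotClearedSketch;
{$mode objfpc}{$H+}

function AppendX: AnsiString;
begin
  // Deliberately no "Result := ''" here: Result refers to a location
  // supplied by the caller, so it may still hold that location's old value.
  Result := Result + 'x';
end;

procedure Demo;
var
  S: AnsiString;
  I: Integer;
begin
  S := '';
  for I := 1 to 3 do
    S := AppendX;
  // The length of S shows whether an old value was carried into the calls.
  WriteLn(S);
end;

begin
  Demo;
end.
```

The compiler may warn that the function result does not seem to be initialized. Starting such functions with an explicit assignment to Result removes the dependence on whatever the caller's location happened to contain.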

> If so, it would seem to be a bit of a semantic trap for the unwary :-)

Differences in execution because of the above change can only occur
if you have memory corruption. On the other hand, in that case anything
is possible, regardless of which optimisations have or have not been
performed by the compiler.

>> Such optimizations only occur in safe situations (e.g., not when
>> assigning to a global variable...
>
> Does the compiler consider ANY non-local variable to be global?
>
> For example, fields of an object?

These are indeed global. And so are e.g. local variables whose address  
has been taken, that are used in assembler code, or that have been  
passed to a var-parameter (because the called routine may then have  
stored its address). There are no cases that I know of where the  
compiler can perform that optimisation in an unsafe scenario.
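
A minimal sketch of the global-variable case, assuming the compiler behaves as described above. Probe and G are made-up names.

```pascal
program ResultDestinationSketch;
{$mode objfpc}{$H+}

var
  G: AnsiString = 'old';

function Probe: AnsiString;
begin
  Result := 'new';
  // G is a global, so per the explanation above the caller passes a
  // temporary as the result location. G should therefore still hold
  // its previous value at this point.
  WriteLn('inside Probe, G = ', G);
end;

begin
  G := Probe;
  WriteLn('after call, G = ', G);
end.
```

If the result were written straight into G, the first WriteLn would already show the new value, which is exactly the visible change in behaviour that the compiler avoids by using a temporary for globals, object fields and locals whose address may have escaped.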


Jonas