13 Replies Latest reply on Mar 16, 2005 12:08 AM by 807575

    memory access violation caused in string class

    807575
      Hello all,
      I have a multithreaded application. In my main thread I create a linked list using the STL list container. The list is a doubly linked list of structs, and each struct contains a string* that holds the name of the struct.
      The string* is allocated in the main thread using new as shown below:-
      typedef struct{
      string* name;
      ..
      ..
      }GRM;

      GRM* lgrm = new GRM;
      //I have a pre-allocated GRM struct with values filled in and I am making a local copy of it here..
      _localCopy(GRM &lgrm, const GRM &grm);
      ..
      ..
      ..
      mylist.push_back(*lgrm);
      }

      void _localCopy(GRM &lgrm, const GRM &grm)
      {
      ..
      ...
      lgrm.name = new string(grm.name->c_str());
      ...
      ..
      }

      Somewhere else I start multiple threads, each of which tries to print the name of a single struct.
      The problem is that even before the threads are started, the address pointed to by lgrm.name changes, and I am not sure why. When I trace this variable (lgrm.name) inside the list, it remains intact until I start allocating memory for my thread objects:
      pthread_t *child = new pthread_t[maxThreads];

      The problem completely disappears when I use char* instead of string*.
      Please help!
      Does this have to do with some STL library requirement? I compile with the -mt and -D_REENTRANT options and link with -lpthread.
        • 1. Re: memory access violation caused in string class
          807575
          There isn't quite enough detail about how _localCopy works. If you just copy the string pointer member of GRM, that might be your problem. When a string is deleted from the heap or a string object goes out of scope, you can be left with a dangling pointer if you made a copy of the string pointer instead of the string object.

          Another possibility is a known bug in the std::string class that has recently been fixed in Sun Studio 8 (C++ 5.5) and later. The bug shows up under the following circumstances:

          A std::string object is shared by multiple threads, the code is compiled using the default -xarch=v7 or -xarch=v8 , the program uses the default libCstd, and the code is run on an UltraSPARC system.

          In this one case, thread locking for the std::string class does not work properly, and programs can crash or behave strangely.

          The bug does not occur if you compile on x86 systems, or use -xarch=v8plus[ab] or -xarch=v9[ab], or if you use the optional STLport library instead of the default libCstd.

          The workaround is to use -xarch=v8plus on every CC command line, compiling and linking. The resulting code can be run only on UltraSPARC systems. (Pre-Ultra systems have been EOL for years, will not run current versions of Solaris, and are no longer supported by Sun.) This workaround will result in smaller and faster code, so if you do not deploy on obsolete sparcstations, this is the preferred fix. You won't need compiler or runtime library patches.
          • 2. Re: memory access violation caused in string class
            807575
            I am the original author; my old username was not recognized when I logged in the next time.
            Thanks for the reply, and sorry for not being clear about the _localCopy function. All the function does is create a local copy of the string name from the global GRM grm.

            lgrm.Name =new string(grm.Name->c_str());
            So the question is: if grm goes out of scope, why should lgrm.Name point to an invalid address? lgrm.Name has a new memory allocation and is initialized with the value pointed to by grm.Name->c_str().

            To visualize: think of grm as a global struct with a data member called name that points to a string allocated on the heap with a value. The contents of grm are then copied into local structs of the same type by creating new memory locations to hold the name (using lgrm.name = new string(grm.name->c_str());). Therefore, even if grm goes out of scope, the local lgrm should still contain the value. Am I correct? Please help!
            • 3. Re: memory access violation caused in string class
              807575
              I did add -xarch=v8plus, but it did not help. Here is the output from the debugger:
              -------------------------------------------------------------------
              (/opt/SUNWspro/bin/../WS6U2/bin/sparcv9/dbx) print *clist.__node.next.data.gs
              >
              *clist.__node->next->data.gslist.__node->next->data.groupName = {
              __data_ = {
              __data_ = 0x153068 "239.2.2.2"
              }
              __nullref = struct __rwstd::__null_string_ref_rep<char,std::char_traits<char>,std::allocator<char>,__rwstd::__string_ref_rep<std::allocator<char> > > /* STATIC CLASS */
              }
              (/opt/SUNWspro/bin/../WS6U2/bin/sparcv9/dbx) next
              t@1 (l@1) stopped in MgroupManager::forkChildren at line 2579 in file "MgroupManager.cc"
              2579 pthread_t *child = new pthread_t[maxThreads];
              (/opt/SUNWspro/bin/../WS6U2/bin/sparcv9/dbx) next
              t@1 (l@1) stopped in MgroupManager::forkChildren at line 2580 in file "MgroupManager.cc"
              2580 int *timeArray = new int[maxThreads +1];
              (/opt/SUNWspro/bin/../WS6U2/bin/sparcv9/dbx) next
              t@1 (l@1) stopped in MgroupManager::forkChildren at line 2581 in file "MgroupManager.cc"
              2581 int *index = new int[maxThreads +1];
              (/opt/SUNWspro/bin/../WS6U2/bin/sparcv9/dbx) print *clist.__node.next.data.gs >
              *clist.__node->next->data.gslist.__node->next->data.groupName = {
              __data_ = {
              __data_ = 0x13e6b0 ""
              }
              __nullref = struct __rwstd::__null_string_ref_rep<char,std::char_traits<char>,std::allocator<char>,__rwstd::__string_ref_rep<std::allocator<char> > > /* STATIC CLASS */


              -----------------------------------------------------------------------
              As you can see above, the data looks valid until the new thread objects are created. In fact, the __data_ memory location itself changes. I am kind of stuck here!
              • 4. Re: memory access violation caused in string class
                807575
                Are you compiling and linking with "-mt" on every CC command line?
                The option is required for multithreaded programs, and must be used consistently.
                • 5. Re: memory access violation caused in string class
                  807575
                  Yes, I am compiling and linking with the -mt option. Here is what is happening, along with the debug output. I have a function called readXMLInput() where I create a struct called grm of type GSstruct.
                  typedef struct{
                  string* groupName;
                  ..
                  ..
                  }_GSstruct;
                  typedef list<_GSstruct> GSlist;
                  typedef struct{
                  string *rName;
                  ..
                  GSlist gslist;
                  } Combo;

                  list<Combo> clist;
                  --------------------------debug output--------------------------
                  2836 _grm.groupName = new string(uc.groupAddress);
                  (/opt/SUNWspro/bin/../WS6U2/bin/sparcv9/dbx) next
                  t@1 (l@1) stopped in MgroupManager::readXMLInput at line 2837 in file "MgroupManager.cc"
                  2837 _grm.problemCondition=0;
                  (/opt/SUNWspro/bin/../WS6U2/bin/sparcv9/dbx) print _grm.groupName
                  _grm.groupName = 0x14d6c0
                  (/opt/SUNWspro/bin/../WS6U2/bin/sparcv9/dbx) print _grm.groupName
                  _grm.groupName = {
                  __data_ = {
                  __data_ = 0x14d280 "239.2.2.2"
                  }
                  __nullref = struct __rwstd::__null_string_ref_rep<char,std::char_traits<char>,std::allocator<char>,__rwstd::__string_ref_rep<std::allocator<char> > > /* STATIC CLASS */
                  }
                  -----------------------debug output end---------------------
                  Now, further down, I pass a pointer to this struct to another function called
                  insertToComboList(&grm), where I copy the contents of _grm into another struct, lgrm, of the same type but allocated on the heap. See below:
                  ---------------------------debug output-------------------------
                  GSstruct* lgrm   = new GSstruct;
                  _localGSCopy(*lgrm,grm);
                  .....
                  localGSCopy(GSstruct&lgrm, GSstruct &grm)
                  {
                  595 lgrm.groupName =new string();
                  (/opt/SUNWspro/bin/../WS6U2/bin/sparcv9/dbx) next
                  t@1 (l@1) stopped in McastTopo::_localGSCopy at line 596 in file "McastTopo.cc"
                  596 lgrm.groupName->assign(*(grm.groupName));
                  (/opt/SUNWspro/bin/../WS6U2/bin/sparcv9/dbx) next
                  t@1 (l@1) stopped in McastTopo::_localGSCopy at line 597 in file "McastTopo.cc"
                  597 lgrm.sourceLen = grm.sourceLen;
                  (/opt/SUNWspro/bin/../WS6U2/bin/sparcv9/dbx) print lgrm.groupName
                  lgrm.groupName = 0x14d6d0
                  (/opt/SUNWspro/bin/../WS6U2/bin/sparcv9/dbx) print *lgrm.groupName
                  *lgrm.groupName = {
                  __data_ = {
                  __data_ = 0x14d188 "239.2.2.2"
                  }
                  __nullref = struct __rwstd::__null_string_ref_rep<char,std::char_traits<char>,std::allocator<char>,__rwstd::__string_ref_rep<std::allocator<char> > > /* STATIC CLASS */
                  }
                  -------------------------debug output end-------------------
                  3. Now this struct is inserted into a list of type GSstruct, which belongs to another struct of type Combo as mentioned in step 1. Sorry if this is getting complicated; unfortunately, it is complicated.
                  Combo *rs = new Combo;
                  combo.gslist.pushback(*lgrm);
                  ..
                  ..

                  867 rs->gslist.push_back(*lgrm);
                  (/opt/SUNWspro/bin/../WS6U2/bin/sparcv9/dbx) next
                  t@1 (l@1) stopped in McastTopo::_insertToComboList at line 868 in file "McastTopo.cc"
                  868 rlist.push_back(*rs);
                  (/opt/SUNWspro/bin/../WS6U2/bin/sparcv9/dbx) st.__node.next.data.groupName <print
                  *clist.__node->next->data.gslist.__node->next->data.groupName = {
                  __data_ = {
                  __data_ = 0x14d188 "239.2.2.2"
                  }
                  __nullref = struct __rwstd::__null_string_ref_rep<char,std::char_traits<char>,std::allocator<char>,__rwstd::__string_ref_rep<std::allocator<char> > > /* STATIC CLASS */
                  }
                  -----------------------end of debug output------------------
                  4. as you can see from above, the struct still contains the groupName with valid data content.

                  5. Further down, we stop before starting the threads to see whether the contents of the above memory location have changed. See below; we are in a method called forkChildren().
                  ----------------------------debug output-----------------------
                  t@1 (l@1) stopped in MgroupManager::forkChildren at line 2584 in file "MgroupManager.cc"
                  2584 cout << "printing list 1 in fork children.."<<endl<<endl;
                  (/opt/SUNWspro/bin/../WS6U2/bin/sparcv9/dbx) print *clist.__node.next.data.gs >
                  *clist.__node->next->data.gslist.__node->next->data.groupName = {
                  __data_ = {
                  __data_ = 0x14d188 "239.2.2.2"
                  }
                  __nullref = struct __rwstd::__null_string_ref_rep<char,std::char_traits<char>,std::allocator<char>,__rwstd::__string_ref_rep<std::allocator<char> > > /* STATIC CLASS */
                  }
                  ----------------------output end---------------------------------
                  6. Now we trace through this method to see where the contents change. I am about to allocate memory for pthread_t, and I check the data content of groupName inside the list before executing this line:
                  ---------------------------debug out start------------------
                  (/opt/SUNWspro/bin/../WS6U2/bin/sparcv9/dbx)next
                  2620 pthread_t *child = new pthread_t[maxThreads];
                  (/opt/SUNWspro/bin/../WS6U2/bin/sparcv9/dbx) print *clist.__node.next.data.gs >
                  *clist.__node->next->data.gslist.__node->next->data.groupName = {
                  __data_ = {
                  __data_ = 0x14d188 "239.2.2.2"
                  }
                  __nullref = struct __rwstd::__null_string_ref_rep<char,std::char_traits<char>,std::allocator<char>,__rwstd::__string_ref_rep<std::allocator<char> > > /* STATIC CLASS */
                  }
                  --------------------------debug output end-------------
                  7. As you can see from above, it is still intact. Now I will go ahead and execute this line and look at the contents of my clist.
                  ----------------------------debug starts------------------
                  (/opt/SUNWspro/bin/../WS6U2/bin/sparcv9/dbx)next
                  2621 int *timeArray = new int[maxThreads +1];
                  (/opt/SUNWspro/bin/../WS6U2/bin/sparcv9/dbx) print *clist.__node.next.data.gs >
                  *clist.__node->next->data.gslist.__node->next->data.groupName = {
                  __data_ = {
                  __data_ = 0x14d708 ""
                  }
                  __nullref = struct __rwstd::__null_string_ref_rep<char,std::char_traits<char>,std::allocator<char>,__rwstd::__string_ref_rep<std::allocator<char> > > /* STATIC CLASS */
                  }
                  ------------------------debug ends here.----------------------

                  As you can see from above, as I moved from line 2620 to line 2621, the clist memory holding the groupName got erased. I am not sure how!

                  Please help. I am very tempted to switch to char* rather than using the STL string, as I have spent enough time on this.
                  Thank you all in advance for any help here. I am frustrated! It could be a silly error on my part, but I am clueless.
                  • 6. Re: memory access violation caused in string class
                    807575
                    My problems do not seem to end. I moved a lot of my code from string to plain char* to see whether my problems would disappear, but they are now taking different forms. Now mallocs are failing in places where they never failed before. I am very sure that I have enough heap to spare. I installed the Sun-recommended patch
                    108434-18 for string class problems, but it did not solve my problem at all. I am using the -mt option. The code uses localtime_r for time. I tried the -xarch=v8plus option, but it did not help either. I am very sure that it is a libC-related issue. Please help! This is really driving me nuts!
                    The funny part is that it used to work except for this string memory corruption, and I thought that if I followed these good practices it would go away:
                    - use the -mt option
                    - use the -xarch=v8plus option
                    - use localtime_r instead of localtime
                    - get the correct patch
                    Nothing helped! Finally I took out the string and moved to char*. Now I do not have string problems, but I get a wide variety of segmentation faults in ifstream and operator new, such as:
                    [1] t_splay(0xa0142378, 0x7987c, 0x0, 0x0, 0x0, 0x0), at 0xfe942b24
                    [2] t_delete(0x160890, 0xfe9bc008, 0x0, 0x160840, 0x1610b0, 0x48), at 0xfe9427d8
                    [3] realfree(0x160888, 0xfe9c2858, 0xfe9bc008, 0x160840, 0x49, 0x160848), at 0xfe9423dc
                    [4] cleanfree(0x0, 0xfe9bc008, 0xfe9c27cc, 0xfe9c284c, 0xfe9c27d0, 0x0), at 0xfe942cb0
                    [5] mallocunlocked(0x1, 0x0, 0xfe9bc008, 0x8, 0x0, 0x0), at 0xfe941de4
                    [6] malloc(0x1, 0x46e58, 0x0, 0x0, 0x0, 0x0), at 0xfe941cd8
                    [7] operator new(0x1, 0x9c55c, 0x0, 0x9c4ff, 0xe67e8, 0xe6710), at 0xff239c88
                    =>[8] threadExecute(inst = 0xc7850), line 195 in "MgroupManager.cc"

                    I am crying out loud here. Please help!
                    • 7. Re: memory access violation caused in string class
                      807575
                      Just to let you know, I use Solaris 8.
                      • 8. Re: memory access violation caused in string class
                        807575
                        Something that happened to me on two different OSes (VxWorks and Windows) and caused absolutely weird things to happen (variables changing without reason, jumping out of function calls too soon, etc.) was running out of stack space. Double-check your stack allocation per thread/process and verify that you aren't using any huge memory structures/arrays that are not malloc'ed/new'ed.

                        Just a stab.
                        • 9. Re: memory access violation caused in string class
                          807575
                          Thanks for the reply.
                          It compiles and works perfectly fine on the HP-UX platform. Only on Solaris does it fail to run. It is a case of heap memory corruption during allocation.
                          How do you control the stack size? Any tips?
                          • 10. Re: memory access violation caused in string class
                            807575
                            Well, it doesn't sound like small stacks are to blame if malloc/new fails. However, stack size is set when you create your threads. The default for Solaris threads is about 1 MB, I believe, but this can be modified with a call to pthread_attr_setstacksize() before the pthread_create(). You must initialize the attr variable by calling pthread_attr_init() before trying to change any thread attributes.

                            Try this link for Sun's thread attributes topic:
                            http://docs.sun.com/app/docs/doc/806-6867/6jfpgdcnc?a=view
                            • 11. Re: memory access violation caused in string class
                              807575
                              I am doing that, but it now fails in tipthread_attr_init with the following errors:
                              [1] realfree(0x161218, 0xfe9c2858, 0xfe9bc008, 0x1611d8, 0x41, 0x1611e0), at 0xfe9424ec
                              [2] mallocunlocked(0x24, 0x0, 0xfe9bc008, 0x28, 0x1611a8, 0x0), at 0xfe941f40
                              [3] malloc(0x24, 0xfe9bc008, 0x1611b0, 0xfe9bc008, 0x1, 0x0), at 0xfe941cd8
                              [4] allocattr(0x24, 0x176cc, 0x6d, 0xff239cf8, 0xfe9fc000, 0x0), at 0xfe9e426c
                              [5] tipthread_attr_init(0xffbef0e0, 0xfe9fc000, 0xff288b90, 0xff288b90, 0x0, 0xff280ac4), at 0xfe9e4944
                              =>[6] MgroupManager::forkChildren(this = 0xc7ad0), line 2660 in "MgroupManager.cc"

                              As you can see, it is ALWAYS during a malloc.
                              • 12. Re: memory access violation caused in string class
                                807575
                                Well, as you know, mallocs and news should not fail like that, so my suggestion would be to step back and re-evaluate the code leading up to your allocation failure. Try commenting out large sections of code to see if it makes a difference. I know you stated that it works on HP-UX, but some subtle difference between OSes could be the culprit. Sorry I couldn't be of more assistance. Make sure to post if you do find the solution.
                                • 13. Re: memory access violation caused in string class
                                  807575
                                  The same code runs on HP-UX with exactly the same data as its Solaris counterpart. I strongly suspect that the problem lies with the libC version. I am not sure whether we need the following patch (just released):
                                  108993-43