OGG 12.3 - terminate called after throwing an instance of 'std::bad_alloc'

User_ZCQ67, Jun 15 2020 (edited Jun 22 2020)

Team,

I recently implemented parallel replicat in my environment and have started seeing the error below in the report (rpt) file.

terminate called after throwing an instance of 'std::bad_alloc'

  what():  std::bad_alloc

I have 33G of SGA and 6G of streams_pool_size. I previously had 2 integrated replicats and was hitting performance issues due to the large number of transactions, so I changed one integrated replicat to parallel integrated. It ran fine for a day, but now it abends with the above error after processing a few trails. Can someone please help if you have come across this error? Thank you.

Comments

ORASCN

Hi,

Please share the following:

Report file with the Error message in it.

Full version of OGG

Full version of DB

Regards,

Veera

ORASCN

Hi,

After analyzing the error below further:

terminate called after throwing an instance of 'std::bad_alloc'

  what():  std::bad_alloc

The Replicat process (Integrated / Parallel Integrated) reads the trail files and keeps transactions in memory in batches. As it applies each batch, it flushes the applied data from memory. Logically, a huge transaction may have come through and the server may have failed to allocate the requested memory.

This can happen when a huge amount of data is being processed and a single large allocation is requested. There may still be plenty of memory available, but if that memory is fragmented, no single chunk is large enough to hold the requested amount of data. The failure is then reported by throwing std::bad_alloc.

I suspect this is a bug in the OGG Parallel Replicat process. Please log an SR with Oracle Support to have it checked.

As a first try, we can increase streams_pool_size to give the Parallel Integrated Replicat process more memory and check whether that gets rid of the issue.

Regards,

Veera

User_ZCQ67

Report File Content:

2020-06-13 07:43:24  INFO    OGG-02232  Switching to next trail file /ogg/ggtrails/rmttrails/adwrppg1/AA003754077 at 2020-06-13 07:43:24.038899 due to EOF. with current RBA 9,998,467.

2020-06-13 07:43:24  INFO    OGG-02232  Switching to next trail file /ogg/ggtrails/rmttrails/adwrppg1/AA003754078 at 2020-06-13 07:43:24.327021 due to EOF. with current RBA 9,999,010.

2020-06-13 07:43:24  INFO    OGG-02232  Switching to next trail file /ogg/ggtrails/rmttrails/adwrppg1/AA003754079 at 2020-06-13 07:43:24.614270 due to EOF. with current RBA 9,999,019.

2020-06-13 07:43:24  INFO    OGG-02232  Switching to next trail file /ogg/ggtrails/rmttrails/adwrppg1/AA003754080 at 2020-06-13 07:43:24.902967 due to EOF. with current RBA 9,999,276.

2020-06-13 07:43:25  INFO    OGG-02232  Switching to next trail file /ogg/ggtrails/rmttrails/adwrppg1/AA003754081 at 2020-06-13 07:43:25.192873 due to EOF. with current RBA 9,999,103.

2020-06-13 07:43:25  INFO    OGG-02232  Switching to next trail file /ogg/ggtrails/rmttrails/adwrppg1/AA003754082 at 2020-06-13 07:43:25.485184 due to EOF. with current RBA 9,999,796.

2020-06-13 07:43:25  INFO    OGG-02232  Switching to next trail file /ogg/ggtrails/rmttrails/adwrppg1/AA003754083 at 2020-06-13 07:43:25.776070 due to EOF. with current RBA 9,999,082.

2020-06-13 07:43:26  INFO    OGG-02232  Switching to next trail file /ogg/ggtrails/rmttrails/adwrppg1/AA003754084 at 2020-06-13 07:43:26.076796 due to EOF. with current RBA 9,998,799.

2020-06-13 07:43:26  INFO    OGG-02232  Switching to next trail file /ogg/ggtrails/rmttrails/adwrppg1/AA003754085 at 2020-06-13 07:43:26.381597 due to EOF. with current RBA 9,998,550.

2020-06-13 07:43:26  INFO    OGG-02232  Switching to next trail file /ogg/ggtrails/rmttrails/adwrppg1/AA003754086 at 2020-06-13 07:43:26.685572 due to EOF. with current RBA 9,999,027.

terminate called after throwing an instance of 'std::bad_alloc'

  what():  std::bad_alloc
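
To see which trail the Replicat had just switched to when it died, the report file can be scanned for the last OGG-02232 line before the std::bad_alloc line. This is only a minimal Python sketch, assuming the report lines look like the excerpt above; the function name is made up for illustration and is not part of GoldenGate.

```python
import re

# Matches the OGG-02232 trail-switch lines as they appear in the excerpt above.
SWITCH_RE = re.compile(r"OGG-02232\s+Switching to next trail file (\S+) at ")


def last_trail_before_bad_alloc(report_lines):
    """Return the last trail file switched to before the first std::bad_alloc line.

    Returns None if no bad_alloc line is found or no trail switch precedes it.
    """
    last_trail = None
    for line in report_lines:
        match = SWITCH_RE.search(line)
        if match:
            last_trail = match.group(1)
        elif "std::bad_alloc" in line:
            return last_trail
    return None
```

It can be fed an open report file directly, for example `last_trail_before_bad_alloc(open(path, errors="replace"))`, where `path` is wherever the rpt file lives on your system.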

User_ZCQ67

OGG Version -  OGGCORE_12.3.0.1.0_180415.0359

DB Version -  12.2.0.1.0

User_ZCQ67

I am using below parameters for parallel replicat:

MAP_PARALLELISM 2

MIN_APPLY_PARALLELISM 2

MAX_APPLY_PARALLELISM 10

I am not using SPLIT_TRANS_RECS. Do you think that is making a difference here?

ORASCN

I don't see this reported as an ERROR. These are just INFO messages.

std::bad_alloc is a C++ exception type, and it may simply be getting printed in the report file. Do you see any considerable lag in the Parallel IR due to this?

Regards,

Veera

ORASCN

Yes, you can try that, as it splits huge transactions into multiple chunks. Also, I would suggest applying the latest patch available for your OGG 12.3.
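
As a rough picture of what the splitting means, a transaction with many records is cut into pieces of at most N records. The Python below is only a conceptual sketch, not how Replicat is implemented; `split_transaction` and the record list are made up for the example.

```python
def split_transaction(records, split_size):
    """Cut one large transaction into consecutive pieces of at most split_size records."""
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    return [records[i:i + split_size] for i in range(0, len(records), split_size)]


# A hypothetical 2,500-record transaction split with a piece size of 1000.
pieces = split_transaction(list(range(2500)), 1000)
print([len(piece) for piece in pieces])  # [1000, 1000, 500]
```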

Regards,

Veera

User_ZCQ67

I agree that it appears as an INFO message in the rpt file, but the replicat abends soon after it appears.

I have just added SPLIT_TRANS_RECS 1000 to the parameter file and will monitor to see if that helps. I will also apply the latest OGG 12.3 patch.

ORASCN

Sure. That is why I asked for the full report file: to see when and where it is abending.

Please monitor it and let me know if there are any issues.

Cheers,

Veera

User_ZCQ67

When I set SPLIT_TRANS_RECS to 1000, my replicat was initially very slow and then abended with the error below:

ERROR   OGG-01029  Oracle GoldenGate Delivery for Oracle, adwr1.prm.backup:  Extract reposition err -

After that I repositioned the replicat to the appropriate seqno and RBA, but it was still processing very slowly. Out of curiosity, I increased the value to 5000, and the replicat abended again, this time with OGG-01668. I then changed the value to 3000 and to 2000, but it continued to abend with the same error code, OGG-01668.

2020-06-17T01:54:38.518+0100  ERROR   OGG-01668  Oracle GoldenGate Delivery for Oracle, adwr1.prm:  PROCESS ABENDING.

I don't see any further detail in the rpt file or the ggserr log to investigate what is causing this OGG error.

I have currently set SPLIT_TRANS_RECS to 500. So far it is running, but it is very slow compared with integrated replicat. I am unsure whether a misconfiguration is causing the slowness or whether we need to include any additional parameters.

I am using BATCHSQL Mode with below parameters:

BATCHSQL BATCHESPERQUEUE 300, OPSPERBATCH 2000

MAP_PARALLELISM 2

MIN_APPLY_PARALLELISM 2

MAX_APPLY_PARALLELISM 10

SPLIT_TRANS_RECS 500

ORASCN

OGG-01029 mostly occurs because Replicat is unable to position itself at the trail record recorded in the checkpoint file.

Remove or comment out the parameter below:

BATCHSQL BATCHESPERQUEUE 300, OPSPERBATCH 2000

Parallel Replicat was introduced in OGG 12.3 and initially had many bugs. Oracle is now asking customers to move to Parallel Replicat because it has become stable, but that still needs to be tested.

Regards,

Veera

User_ZCQ67

Hi Veera,

Parallel integrated replicat has started hitting the error below:

2020-06-22 05:17:52  INFO    OGG-02232  Switching to next trail file /ogg/ggtrails/rmttrails/adwrppg1/AA003865143 at 2020-06-22 05:17:52.867380 due to EOF. with current RBA 9,999,653.

Source Context :

  SourceModule            : [er.replicat.coord.master]

  SourceID                : [/scratch/aime/adestore/views/aime_adc00jza/oggcore/OpenSys/src/app/er/replicat/coord/Master.cpp]

  SourceMethod            : [StartWorker]

  SourceLine              : [544]

  ThreadBacktrace         : [14] elements

                          : [/ogg/oggbase/product/ogg/G12.3/O12.1/libgglog.so(CMessageContext::AddThreadContext())]

                          : [/ogg/oggbase/product/ogg/G12.3/O12.1/libgglog.so(CMessageFactory::CreateMessage(CSourceContext*, unsigned int, ...))]

                          : [/ogg/oggbase/product/ogg/G12.3/O12.1/libgglog.so(_MSG_String_String(CSourceContext*, int, char const*, char const*, CMessageFactory::MessageDisposition))]

                          : [/ogg/oggbase/product/ogg/G12.3/O12.1/replicat(ggs::Coord::MasterWithWorkerList<ggs::Coord::SchedulerApplyThread>::CreateWorker(int))]

                          : [/ogg/oggbase/product/ogg/G12.3/O12.1/replicat(ggs::Coord::MasterWithWorkerList<ggs::Coord::SchedulerApplyThread>::CreateWorker(int))]

                          : [/ogg/oggbase/product/ogg/G12.3/O12.1/replicat(ggs::Coord::Scheduler::ActivateApplier())]

                          : [/ogg/oggbase/product/ogg/G12.3/O12.1/replicat(ggs::Coord::Scheduler::CheckUpdateParams())]

                          : [/ogg/oggbase/product/ogg/G12.3/O12.1/replicat(ggs::Coord::Scheduler::MainLoopIter())]

                          : [/lib64/libpthread.so.0()]

                          : [/ogg/oggbase/product/ogg/G12.3/O12.1/replicat(ggs::Coord::GroupController::Run())]

                          : [/ogg/oggbase/product/ogg/G12.3/O12.1/replicat(ggs::Coord::Scheduler::Main(void*))]

                          : [/ogg/oggbase/product/ogg/G12.3/O12.1/replicat(ggs::gglib::MultiThreading::Thread::RunThread(ggs::gglib::MultiThreading::Thread::ThreadArgs*))]

                          : [/lib64/libpthread.so.0()]

                          : [/lib64/libc.so.6(clone)]

2020-06-22 05:21:50  ERROR   OGG-06000  Replicat Coordinator failed to start Replicat thread ADWR1A01.  Reason Cannot create process '/ogg/oggbase/product/ogg/G12.3/O12.1/replicat'. fork() failed creating new process.

ORASCN

Hey,

Did the restart work?

Also, please check the OS-level ulimit value for nproc.

Regards,

Veera

User_ZCQ67

Hi Veera,

The restart worked, but it failed again with a similar error for another Replicat thread:

Source Context :

  SourceModule            : [er.replicat.coord.master]

  SourceID                : [/scratch/aime/adestore/views/aime_adc00jza/oggcore/OpenSys/src/app/er/replicat/coord/Master.cpp]

  SourceMethod            : [StartWorker]

  SourceLine              : [544]

  ThreadBacktrace         : [14] elements

                          : [/ogg/oggbase/product/ogg/G12.3/O12.1/replicat(ggs::Coord::Scheduler::Main(void*))]

                          : [/ogg/oggbase/product/ogg/G12.3/O12.1/libgglog.so(CMessageFactory::CreateMessage(CSourceContext*, unsigned int, ...))]

                          : [/ogg/oggbase/product/ogg/G12.3/O12.1/libgglog.so(_MSG_String_String(CSourceContext*, int, char const*, char const*, CMessageFactory::MessageDisposition))]

                          : [/ogg/oggbase/product/ogg/G12.3/O12.1/replicat(ggs::Coord::MasterWithWorkerList<ggs::Coord::SchedulerApplyThread>::StartWorker(int))]

                          : [/ogg/oggbase/product/ogg/G12.3/O12.1/replicat(ggs::Coord::MasterWithWorkerList<ggs::Coord::SchedulerApplyThread>::CreateWorker(int))]

                          : [/ogg/oggbase/product/ogg/G12.3/O12.1/replicat(ggs::Coord::Scheduler::Main(void*))]

                          : [/ogg/oggbase/product/ogg/G12.3/O12.1/replicat(ggs::Coord::Scheduler::Main(void*))]

                          : [/ogg/oggbase/product/ogg/G12.3/O12.1/replicat(ggs::Coord::Scheduler::Main(void*))]

                          : [/lib64/libpthread.so.0()]

                          : [/ogg/oggbase/product/ogg/G12.3/O12.1/replicat(ggs::Coord::Scheduler::Main(void*))]

                          : [/ogg/oggbase/product/ogg/G12.3/O12.1/replicat(ggs::Coord::Scheduler::Main(void*))]

                          : [/ogg/oggbase/product/ogg/G12.3/O12.1/replicat(ggs::gglib::MultiThreading::Thread::RunThread(ggs::gglib::MultiThreading::Thread::ThreadArgs*))]

                          : [/lib64/libpthread.so.0()]

                          : [/lib64/libc.so.6(clone)]

2020-06-22 07:15:23  ERROR   OGG-06000  Replicat Coordinator failed to start Replicat thread ADWR1A04.  Reason Cannot create process '/ogg/oggbase/product/ogg/G12.3/O12.1/replicat'. fork() failed creating new process.

The vm.max_map_count and ulimit stack size values are below:

[ggownerXXXXXXX:/etc]$ cat /proc/sys/vm/max_map_count

65530

[ggownerXXXXXXX:/etc]$ ulimit -s

8192

ORASCN

Hey,

Could you please provide the output of ulimit -a?

Check the nproc value in it.

Regards,

Veera

User_ZCQ67

[ggowner@XXXXXXX:/ogg/oggbase/product/ogg/G12.3/O12.1]$ ulimit -a

address space limit (Kibytes)  (-M)  unlimited

core file size (blocks)        (-c)  0

cpu time (seconds)             (-t)  unlimited

data size (Kibytes)            (-d)  unlimited

file size (blocks)             (-f)  unlimited

locks                          (-x)  unlimited

locked address space (Kibytes) (-l)  64

message queue size (Kibytes)   (-q)  800

nice                           (-e)  0

nofile                         (-n)  1024

nproc                          (-u)  4096

pipe buffer size (bytes)       (-p)  4096

max memory size (Kibytes)      (-m)  unlimited

rtprio                         (-r)  0

socket buffer size (bytes)     (-b)  4096

sigpend                        (-i)  127973

stack size (Kibytes)           (-s)  8192

swap size (Kibytes)            (-w)  not supported

threads                        (-T)  not supported

process size (Kibytes)         (-v)  unlimited

ORASCN

Can you try increasing the nproc value to a larger number and check whether the issue persists?
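
To see how close the GoldenGate OS user is to the limit before changing it, something like the following Python sketch may help. It assumes Linux with a readable /proc. On Linux the nproc limit counts threads for the real user ID, so the sketch sums the Threads: field of every process owned by the user. The function names are made up for this example.

```python
import os
import resource


def parse_status(status_text):
    """Return (real_uid, thread_count) from the text of a /proc/<pid>/status file."""
    real_uid, threads = None, 0
    for line in status_text.splitlines():
        if line.startswith("Uid:"):
            # Fields after "Uid:" are real, effective, saved and filesystem UIDs.
            real_uid = int(line.split()[1])
        elif line.startswith("Threads:"):
            threads = int(line.split()[1])
    return real_uid, threads


def count_user_threads(uid, proc_root="/proc"):
    """Sum the threads of every process whose real UID is uid (Linux only)."""
    total = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as status:
                real_uid, threads = parse_status(status.read())
        except OSError:
            # The process exited while we were scanning, or is not readable.
            continue
        if real_uid == uid:
            total += threads
    return total


def nproc_headroom():
    """Return (soft nproc limit, threads currently in use) for the current user."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return soft, count_user_threads(os.getuid())
```

Calling `nproc_headroom()` as the GoldenGate OS user returns the soft limit and the current thread count. A soft limit equal to `resource.RLIM_INFINITY` means unlimited.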

Regards,

Veera

User_ZCQ67

Hi Veera,

Increasing nproc will be a very complex process for me because of restrictions on server-level changes. It may take anywhere from a couple of days to a week.

Rather than raising the limit on system processes and resources through nproc, is there any way to limit the demand from integrated parallel replicat so that it fits within the available processes and resources?

It keeps failing frequently for different replicat threads after each restart, and processing is also very slow.

ORASCN

Could you please share the current parameter file of the replicat process?

Regards,

Veera

ORASCN

Did you comment out or remove the parameter below?

BATCHSQL BATCHESPERQUEUE 300, OPSPERBATCH 2000

Also, please reduce MAX_APPLY_PARALLELISM to 6 and then check whether you still hit the same issue.

Regards,

Veera

User_ZCQ67

I am not sure how to attach a file in this portal, so I am pasting the content here.

GGSCI (XXXXXXX) 14> view param ADWR1

REPLICAT adwr1

MAP_PARALLELISM 2

MIN_APPLY_PARALLELISM 2

MAX_APPLY_PARALLELISM 10

SPLIT_TRANS_RECS 1000

include ./dirprm/ext_env_peg15p.inc

--include ./dirmac/exception_tab.mac

DISCARDFILE ./dirrpt/adwr1.dsc, APPEND, MEGABYTES 500

--SOURCECHARSET PASSTHRU

DDL INCLUDE MAPPED

--EXCLUDE INSTR 'ALTER TABLE CLM_OWNER.CONS_MKTG_HISTORY DROP PARTITION' &

--EXCLUDE INSTR 'ALTER TABLE CLM_OWNER.CONS_MKTG_HISTORY ADD PARTITION'

DDLOPTIONS REPORT

DDLERROR DEFAULT IGNORE

--BATCHSQL

--INSERTMISSINGUPDATES

--HANDLECOLLISIONS

REPERROR (DEFAULT, IGNORE)

--REPERROR (DEFAULT, EXCEPTION)

--REPERROR (DEFAULT, IGNORE)

MAP SMV_OWNER.SVOC_SERVICE_PRODUCT_NRT, TARGET CUS_OMNI_DATA.SVOC_SERVICE_PRODUCT;

MAP SMV_OWNER.SVOC_CONV_KEY_MATCHING, TARGET CUS_OMNI_DATA.SVOC_CONV_KEY_MATCHING;

MAP SMV_OWNER.SVOC_CONVERGED_CUSTOMER, TARGET CUS_OMNI_DATA.SVOC_CONVERGED_CUSTOMER;

MAP SMV_OWNER.SVOC_PRODUCT_HOLDING_NRT, TARGET CUS_OMNI_DATA.SVOC_PRODUCT_HOLDING;

MAP SMV_OWNER.SVOC_ACCOUNT_NRT, TARGET CUS_OMNI_DATA.SVOC_ACCOUNT;

User_ZCQ67

Hi Veera,

I commented out BATCHSQL when you mentioned it last week.

I have reduced MAX_APPLY_PARALLELISM to 5. I will monitor and let you know, but my overall experience with integrated parallel has not been good so far. It barely processes 400-500 trails an hour, and I expect it to process 3000-5000 trails per hour.
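
A trails-per-hour figure like this can be estimated from the report file by timing the OGG-02232 trail-switch lines like those posted earlier in this thread. Below is a minimal Python sketch under that assumption; the function name is made up, and a short window of lines can give a misleading rate.

```python
import re
from datetime import datetime

# Captures the switch timestamp from OGG-02232 trail-switch lines.
SWITCH_AT_RE = re.compile(
    r"OGG-02232\s+Switching to next trail file \S+ at "
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)"
)


def trails_per_hour(report_lines):
    """Estimate trail files processed per hour from the first and last switch times.

    Returns None if there are fewer than two switch lines or no time elapsed.
    """
    times = [
        datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        for match in map(SWITCH_AT_RE.search, report_lines)
        if match
    ]
    if len(times) < 2:
        return None
    elapsed = (times[-1] - times[0]).total_seconds()
    if elapsed <= 0:
        return None
    # N switch lines span N - 1 intervals.
    return (len(times) - 1) * 3600 / elapsed
```

Running it over the rpt file before and after a parameter change gives a like-for-like number to compare.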

ORASCN

I would suggest that, instead of Parallel Integrated, you try Parallel Non-Integrated Replicat.

That said, after reducing the parallelism I believe you should not face any further issues. Let us monitor it.

Regards,

Veera
