C++ Extend Client - Request Time-out - How to recover

JoeHolder
JoeHolder Member Posts: 54
edited Dec 20, 2017 4:31PM in Coherence Support

We normally set our request timeout to 5 seconds via the Extend XML config file.

We have some aggregator calls that can take longer than 5 seconds across large caches.

For these calls we therefore want a longer request timeout, say 30 seconds.  Afterwards we want to revert the timeout to 5 seconds, so that problems are detected early.

To test this we have created an aggregator that 'wastes time', so that we can simulate a very large cache in our unit tests.

In the unit test I set the timeout to 5 seconds and call the time-wasting aggregator, and the call times out as expected (see the stack trace below).

However, I then find that all subsequent calls to Coherence also time out, including calls to regular caches, aggregators etc. that should not time out.

I need to be able to recover from a timeout and continue using Coherence without restarting the process.  Can you help?

Thanks

Joe

[2017-12-15 11:59:35.369] <D9> (thread=Thread-1)  -->Throwing: coherence::net::RequestTimeoutException: request timed out after 5000 millis

    at class coherence::lang::TypedHandle<class coherence::net::messaging::Response> __thiscall coherence::component::net::extend::AbstractPofRequest::Status::getResponse(void)(AbstractPofRequest.cpp:203)

    at class coherence::lang::TypedHandle<class coherence::net::messaging::Response> coherence::component::net::extend::AbstractPofRequest::Status::getResponse(void)

    at class coherence::lang::TypedHandle<class coherence::net::messaging::Response> coherence::component::net::extend::AbstractPofRequest::Status::waitForResponse(__int64)

    at class coherence::lang::TypedHolder<class coherence::lang::Object> coherence::component::net::extend::PofChannel::request(class coherence::lang::TypedHandle<class coherence::net::messaging::Request>,__int64)

    at class coherence::lang::TypedHolder<class coherence::lang::Object> coherence::component::net::extend::PofChannel::request(class coherence::lang::TypedHandle<class coherence::net::messaging::Request>)

    at class coherence::lang::TypedHolder<class coherence::lang::Object> coherence::component::net::extend::RemoteNamedCache::BinaryCache::aggregate(class coherence::lang::TypedHandle<class coherence::util::Filter const >,class coherence::lang::TypedHandle<class coherence::util::InvocableMap::EntryAggregator>)

    at class coherence::lang::TypedHolder<class coherence::lang::Object> coherence::util::ConverterCollections::ConverterInvocableMap::aggregate(class coherence::lang::TypedHandle<class coherence::util::Filter const >,class coherence::lang::TypedHandle<class coherence::util::InvocableMap::EntryAggregator>)

    at class coherence::lang::TypedHolder<class coherence::lang::Object> coherence::util::ConverterCollections::ConverterNamedCache::aggregate(class coherence::lang::TypedHandle<class coherence::util::Filter const >,class coherence::lang::TypedHandle<class coherence::util::InvocableMap::EntryAggregator>)

    at class coherence::lang::TypedHolder<class coherence::lang::Object> coherence::component::net::extend::RemoteNamedCache::aggregate(class coherence::lang::TypedHandle<class coherence::util::Filter const >,class coherence::lang::TypedHandle<class coherence::util::InvocableMap::EntryAggregator>)

    at class coherence::lang::TypedHolder<class coherence::lang::Object> coherence::component::util::SafeNamedCache::aggregate(class coherence::lang::TypedHandle<class coherence::util::Filter const >,class coherence::lang::TypedHandle<class coherence::util::InvocableMap::EntryAggregator>)

    at transactions::dsp::grid::common::call_test_time_wasting_aggegator_impl

    at transactions::dsp::grid::test_conversation_aggregators_t::call_test_time_wasting_aggregator

    at transactions::dsp::grid::test_conversation_aggregators_t::get_orphan_conversation_ids

    at transactions::dsp::housekeeping::expiry_mgr_t::do_orphan_conversation_expiry

    at transactions::dsp::housekeeping::expiry_mgr_t::do_maintenance_window_expiry_jobs

    at transactions::dsp::housekeeping::expiry_mgr_t::performCacheExpiry

    at std::_Pmf_wrap<void (__thiscall transactions::dsp::housekeeping::expiry_mgr_t::*)(std::shared_ptr<transactions::dsp::housekeeping::active_schedule_t>),void,transactions::dsp::housekeeping::expiry_mgr_t,std::shared_ptr<transactions::dsp::housekeeping::active_schedule_t>,std::_Nil,std::_Nil,std::_Nil,std::_Nil,std::_Nil,std::_Nil>::operator()

    at std::_Bind<1,void,std::_Pmf_wrap<void (__thiscall transactions::dsp::housekeeping::expiry_mgr_t::*)(std::shared_ptr<transactions::dsp::housekeeping::active_schedule_t>),void,transactions::dsp::housekeeping::expiry_mgr_t,std::shared_ptr<transactions::dsp::housekeeping::active_schedule_t>,std::_Nil,std::_Nil,std::_Nil,std::_Nil,std::_Nil,std::_Nil>,transactions::dsp::housekeeping::expiry_mgr_t * const,std::shared_ptr<transactions::dsp::housekeeping::active_schedule_t> &,std::_Nil,std::_Nil,std::_Nil,std::_Nil,std::_Nil>::operator()

    at std::_Callable_obj<std::_Bind<1,void,std::_Pmf_wrap<void (__thiscall transactions::dsp::housekeeping::expiry_mgr_t::*)(std::shared_ptr<transactions::dsp::housekeeping::active_schedule_t>),void,transactions::dsp::housekeeping::expiry_mgr_t,std::shared_ptr<transactions::dsp::housekeeping::active_schedule_t>,std::_Nil,std::_Nil,std::_Nil,std::_Nil,std::_Nil,std::_Nil>,transactions::dsp::housekeeping::expiry_mgr_t * const,std::shared_ptr<transactions::dsp::housekeeping::active_schedule_t> &,std::_Nil,std::_Nil,std::_Nil,std::_Nil,std::_Nil>,0>::_ApplyX<void>

    at std::_Func_impl<std::_Callable_obj<std::_Bind<1,void,std::_Pmf_wrap<void (__thiscall transactions::dsp::housekeeping::expiry_mgr_t::*)(std::shared_ptr<transactions::dsp::housekeeping::active_schedule_t>),void,transactions::dsp::housekeeping::expiry_mgr_t,std::shared_ptr<transactions::dsp::housekeeping::active_schedule_t>,std::_Nil,std::_Nil,std::_Nil,std::_Nil,std::_Nil,std::_Nil>,transactions::dsp::housekeeping::expiry_mgr_t * const,std::shared_ptr<transactions::dsp::housekeeping::active_schedule_t> &,std::_Nil,std::_Nil,std::_Nil,std::_Nil,std::_Nil>,0>,std::allocator<std::_Func_class<void,std::_Nil,std::_Nil,std::_Nil,std::_Nil,std::_Nil,std::_Nil,std::_Nil> >,void,std::_Nil,std::_Nil,std::_Nil,std::_Nil,std::_Nil,std::_Nil,std::_Nil>::_Do_call

    at std::_Func_class<void,std::_Nil,std::_Nil,std::_Nil,std::_Nil,std::_Nil,std::_Nil,std::_Nil>::operator()

    at transactions::utilb::async_io_t::on_timer

    at transactions::utilb::async_io_timer_t::_on_timer

    at boost::_mfi::mf2<void,transactions::utilb::async_io_timer_t,boost::system::error_code const &,std::shared_ptr<transactions::utilb::async_io_timer_t> >::operator()

    at boost::_bi::list3<boost::_bi::value<transactions::utilb::async_io_timer_t *>,boost::arg<1>,boost::_bi::value<std::shared_ptr<transactions::utilb::async_io_timer_t> > >::operator()<boost::_mfi::mf2<void,transactions::utilb::async_io_timer_t,boost::system::error_code const &,std::shared_ptr<transactions::utilb::async_io_timer_t> >,boost::_bi::list1<boost::system::error_code const &> >

    at boost::_bi::bind_t<void,boost::_mfi::mf2<void,transactions::utilb::async_io_timer_t,boost::system::error_code const &,std::shared_ptr<transactions::utilb::async_io_timer_t> >,boost::_bi::list3<boost::_bi::value<transactions::utilb::async_io_timer_t *>,boost::arg<1>,boost::_bi::value<std::shared_ptr<transactions::utilb::async_io_timer_t> > > >::operator()<boost::system::error_code>

    at boost::asio::detail::binder1<boost::_bi::bind_t<void,boost::_mfi::mf2<void,transactions::utilb::async_io_timer_t,boost::system::error_code const &,std::shared_ptr<transactions::utilb::async_io_timer_t> >,boost::_bi::list3<boost::_bi::value<transactions::utilb::async_io_timer_t *>,boost::arg<1>,boost::_bi::value<std::shared_ptr<transactions::utilb::async_io_timer_t> > > >,boost::system::error_code>::operator()

    at boost::asio::asio_handler_invoke<boost::asio::detail::binder1<boost::_bi::bind_t<void,boost::_mfi::mf2<void,transactions::utilb::async_io_timer_t,boost::system::error_code const &,std::shared_ptr<transactions::utilb::async_io_timer_t> >,boost::_bi::list3<boost::_bi::value<transactions::utilb::async_io_timer_t *>,boost::arg<1>,boost::_bi::value<std::shared_ptr<transactions::utilb::async_io_timer_t> > > >,boost::system::error_code> >

    at boost_asio_handler_invoke_helpers::invoke<boost::asio::detail::binder1<boost::_bi::bind_t<void,boost::_mfi::mf2<void,transactions::utilb::async_io_timer_t,boost::system::error_code const &,std::shared_ptr<transactions::utilb::async_io_timer_t> >,boost::_bi::list3<boost::_bi::value<transactions::utilb::async_io_timer_t *>,boost::arg<1>,boost::_bi::value<std::shared_ptr<transactions::utilb::async_io_timer_t> > > >,boost::system::error_code>,boost::_bi::bind_t<void,boost::_mfi::mf2<void,transactions::utilb::async_io_timer_t,boost::system::error_code const &,std::shared_ptr<transactions::utilb::async_io_timer_t> >,boost::_bi::list3<boost::_bi::value<transactions::utilb::async_io_timer_t *>,boost::arg<1>,boost::_bi::value<std::shared_ptr<transactions::utilb::async_io_timer_t> > > > >

    at boost::asio::detail::wait_handler<boost::_bi::bind_t<void,boost::_mfi::mf2<void,transactions::utilb::async_io_timer_t,boost::system::error_code const &,std::shared_ptr<transactions::utilb::async_io_timer_t> >,boost::_bi::list3<boost::_bi::value<transactions::utilb::async_io_timer_t *>,boost::arg<1>,boost::_bi::value<std::shared_ptr<transactions::utilb::async_io_timer_t> > > > >::do_complete

    at boost::asio::detail::win_iocp_operation::complete

    at boost::asio::detail::win_iocp_io_service::do_one

    at boost::asio::detail::win_iocp_io_service::run

    at boost::asio::io_service::run

    at boost::_mfi::mf0<unsigned int,boost::asio::io_service>::operator()

    at boost::_bi::list1<boost::_bi::value<boost::asio::io_service *> >::operator()<unsigned int,boost::_mfi::mf0<unsigned int,boost::asio::io_service>,boost::_bi::list0>

    at boost::_bi::bind_t<unsigned int,boost::_mfi::mf0<unsigned int,boost::asio::io_service>,boost::_bi::list1<boost::_bi::value<boost::asio::io_service *> > >::operator()

    at boost::detail::thread_data<boost::_bi::bind_t<unsigned int,boost::_mfi::mf0<unsigned int,boost::asio::io_service>,boost::_bi::list1<boost::_bi::value<boost::asio::io_service *> > > >::run

    at boost::`anonymous namespace'::thread_start_function

    at _callthreadstartex

    at _threadstartex

    at BaseThreadInitThunk

    at RtlInitializeExceptionChain

    at RtlInitializeExceptionChain

    on thread "Thread-1"

Answers

  • Mfalco-Oracle
    Mfalco-Oracle Member Posts: 503
    edited Dec 15, 2017 11:21AM

    Hi Joe,

    So I assume you have an aggregator which either spins or sleeps.  When we time out an operation, that just means that the client which invoked the operation regains control; it does not mean that the long-running operation has actually been interrupted.  As such, even after the aggregator times out (from the client's perspective), it is still using resources server side.  Depending on how your proxies and caches are configured, there may not be available threads to process your subsequent operations since the aggregator is still running, and thus they will time out as well.  If you want your long-running aggregator to be interrupted, it must implement the PriorityTask interface, and the value returned from getExecutionTimeoutMillis will determine how long it can run before we interrupt it server side.  Also ensure you have the dynamic thread pool enabled on the proxy and cache servers so that there are free threads to process requests concurrently.
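The effect described above can be reproduced without Coherence at all. The following is a minimal sketch in standard C++, not Coherence code: `SingleWorkerServer`, `call_with_timeout`, `Outcome` and `demonstrate` are hypothetical names, and the single worker thread is an assumed stand-in for a proxy or cache service that has no free threads. It shows a client-side timeout returning control to the caller while the slow task keeps running, so a later fast request also times out until the worker is free again.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical stand-in for a service that has exactly ONE worker thread.
class SingleWorkerServer {
public:
    SingleWorkerServer() : worker_([this] { run(); }) {}

    ~SingleWorkerServer() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();  // the worker drains any queued tasks before exiting
    }

    // Queue a task that takes `work` to execute on the worker thread.
    std::future<void> submit(std::chrono::milliseconds work) {
        std::packaged_task<void()> task([work] { std::this_thread::sleep_for(work); });
        std::future<void> done = task.get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
        return done;
    }

private:
    void run() {
        for (;;) {
            std::packaged_task<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to do
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // runs to completion; nothing here can interrupt it
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::packaged_task<void()>> tasks_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so the other members exist before it starts
};

// Client-side view: wait up to `timeout`, then give up.
// Giving up does NOT cancel the task on the server.
bool call_with_timeout(SingleWorkerServer& server,
                       std::chrono::milliseconds work,
                       std::chrono::milliseconds timeout) {
    assert(timeout.count() > 0);
    std::future<void> done = server.submit(work);
    return done.wait_for(timeout) == std::future_status::ready;
}

struct Outcome {
    bool slow_ok;              // slow request finished within its timeout?
    bool fast_ok_while_busy;   // fast request issued while the slow one still runs
    bool fast_ok_after_drain;  // fast request issued once the worker is idle
};

Outcome demonstrate() {
    using std::chrono::milliseconds;
    SingleWorkerServer server;
    Outcome out{};
    out.slow_ok = call_with_timeout(server, milliseconds(600), milliseconds(100));
    out.fast_ok_while_busy = call_with_timeout(server, milliseconds(1), milliseconds(100));
    std::this_thread::sleep_for(milliseconds(800));  // let the slow task finish
    out.fast_ok_after_drain = call_with_timeout(server, milliseconds(1), milliseconds(500));
    return out;
}
```

Calling `demonstrate()` issues three requests: a slow one that times out, a fast one that is stuck behind it, and a fast one sent after the worker has drained. The durations are arbitrary values chosen only to keep the demonstration short.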

    thanks,

    mark

  • JoeHolder
    JoeHolder Member Posts: 54
    edited Dec 18, 2017 8:28AM

    Hi Mark,

    I think that the problem might have been that we were using cache references that predated the last calls to shutdown() and configure().

    (That is coherence::net::CacheFactory::shutdown() and coherence::net::CacheFactory::configure(cache_config_xml_handle, operational_config_xml_handle))

    We want to change the request timeout from 5 seconds to 30 seconds for the duration of calls to these aggregators, and then change it back to 5 seconds again.

    The only way we know of doing that is to call shutdown() and then configure() again with a different piece of XML.

    However doing it this way invalidates all the open caches we have in the program.

    Is there a better way of changing the request timeout without having to call shutdown() or configure()?  (Are we right in thinking we have to call shutdown() before configure()?)

    Thanks

    Joe

  • Mfalco-Oracle
    Mfalco-Oracle Member Posts: 503
    edited Dec 18, 2017 10:39AM

    Hi Joe,

    Your idea of invoking CacheFactory::shutdown, followed by reconfiguring, seems as though it should have worked, but that approach has an awfully high price, as you end up having to establish new connections to the proxy each time you reconfigure things.  Also, if your application has multiple threads accessing these cache references, then it would seem difficult to orchestrate when it is safe to do the shutdown.  An alternate version of that approach would be to use the CacheFactory singleton as your "default" means of obtaining a cache, but also have an auxiliary instance of DefaultConfigurableCacheFactory which has been configured with the longer timeout.  You would then essentially have two independent connections into the Coherence cluster, each with its own independent configuration.

    If you are using the 12.2.1 or later Coherence C++ client, there is a potentially more elegant option to consider: our new COH_TIMEOUT_AFTER construct.  See coherence/lang/TimeoutBlock.hpp for full details.  Basically this lets you define a custom timeout for a given block of code; if the block cannot complete within the timeout specified for the block, then the calling thread will be interrupted.  Note that if you specify a shorter timeout via configuration, that will still be honored, so you want the configured timeout to be longer.

    try
        {
        COH_TIMEOUT_AFTER(5000)
            {
            cache->get("foo");
            }
        }
    catch (InterruptedException::View vex)
        {
        // thread was interrupted due to timeout or manual interrupt
        }
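The note that a shorter configured timeout is still honored can be pictured as taking the minimum of the two values. The following is a simplified model in standard C++, not the Coherence implementation: `ScopedTimeout`, `effective_timeout`, `timeout_inside_block` and `g_block_timeout` are hypothetical names, and the model assumes plain durations rather than remaining time and does not try to reproduce how nested blocks interact.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

using std::chrono::milliseconds;

// Hypothetical model: the timeout requested by the active block on this
// thread, or milliseconds::max() when no block is active.
thread_local milliseconds g_block_timeout = milliseconds::max();

// RAII guard playing the role of a COH_TIMEOUT_AFTER-style block in this model.
class ScopedTimeout {
public:
    explicit ScopedTimeout(milliseconds timeout) : previous_(g_block_timeout) {
        g_block_timeout = timeout;
    }
    ~ScopedTimeout() { g_block_timeout = previous_; }  // restore on scope exit

    ScopedTimeout(const ScopedTimeout&) = delete;
    ScopedTimeout& operator=(const ScopedTimeout&) = delete;

private:
    milliseconds previous_;
};

// The wait a request gets in this model: the shorter of the configured
// request timeout and the block timeout (if a block is active).
milliseconds effective_timeout(milliseconds configured) {
    return std::min(configured, g_block_timeout);
}

// Convenience helper: the effective timeout for a request issued inside a block.
milliseconds timeout_inside_block(milliseconds configured, milliseconds block) {
    ScopedTimeout guard(block);
    return effective_timeout(configured);
}
```

Under this model, a 30-second block around a request configured for 5 seconds still yields 5 seconds, which is why the configured value would need to be the longer one, with the short 5-second limit applied per block to the ordinary calls.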

    A final thing to consider is why aggregations are taking 30s to begin with.  This certainly seems to suggest that you are either pulling back a massive amount of data or missing an index in the associated query; adding the index should allow the aggregation to complete much faster.

    thanks,

    Mark

  • JoeHolder
    JoeHolder Member Posts: 54
    edited Dec 19, 2017 11:46AM

    Hi Mark,

    It is taking about 7 seconds and the timeout is 5 seconds ordinarily. 

    How about using PriorityAggregator?  See https://docs.oracle.com/middleware/1221/coherence/cplus-reference/class_priority_aggregator.html

    It seems to be what we want...

    Thanks

    Joe

  • Mfalco-Oracle
    Mfalco-Oracle Member Posts: 503
    edited Dec 19, 2017 12:38PM

    Hi Joe,

    Yes, PriorityAggregator implements PriorityTask and would fit the bill.

    thanks,

    Mark

    Oracle Coherence

  • JoeHolder
    JoeHolder Member Posts: 54
    edited Dec 20, 2017 5:19AM

    Hi Mark,

    I have tried using the priority aggregator in my unit test, but it is still timing out after 5 seconds:

    aggregator::AbstractAggregator::Handle hAggregator = test_time_wasting_aggregator_t::create(millisecs_to_waste);
    aggregator::PriorityAggregator::Handle aggrPriority = aggregator::PriorityAggregator::create(hAggregator);
    aggrPriority->setExecutionTimeoutMillis(30000L); // PriorityTask::timeout_none
    aggrPriority->setRequestTimeoutMillis(30000L);   // PriorityTask::timeout_none
    Object::Holder rawObjectHolder = conversation_cache_handle->aggregate((Filter::View) hFilter, aggrPriority);

    The aggregator class is defined as below:

    class test_time_wasting_aggregator_t
        : public coherence::lang::class_spec<test_time_wasting_aggregator_t,
              coherence::lang::extends<coherence::util::aggregator::AbstractAggregator>,
              ::coherence::lang::implements<::coherence::io::pof::PortableObject> >
    {
        ....

    Any ideas why it is not working?

    Thanks

    Joe

  • P Fry-Oracle
    P Fry-Oracle Member Posts: 80 Employee
    edited Dec 20, 2017 4:31PM

    Hi Joe,

    I am investigating the PriorityAggregator issue.  Will get back to you soon.

    Thanks,

    Patrick

This discussion has been closed.