6 Replies Latest reply: Sep 20, 2013 4:57 PM by GregoryG. RSS

    UEK3/Beta 1 - Using USDT stucks mysqld to 100% CPU

    GregoryG.

      Hello,

       

      To follow up with some DTrace testing, I've enabled USDT with MySQL 5.6.13  CE on Oracle Linux 6.4 with UEK3/DTrace and this is what I get:

       

      1/- To install MySQL:

      • I've changed the cmake/dtrace.cmake file as below:
      git diff dtrace.cmake
      diff --git a/dtrace.cmake b/dtrace.cmake
      index 771a28b..5539368 100644
      --- a/dtrace.cmake
      +++ b/dtrace.cmake
      @@ -93,7 +93,7 @@ FUNCTION(DTRACE_INSTRUMENT target)
          ADD_DEPENDENCIES(${target} gen_dtrace_header)
      
          # Invoke dtrace to generate object file and link it together with target.
      -    IF(CMAKE_SYSTEM_NAME MATCHES "SunOS")
      +    IF(CMAKE_SYSTEM_NAME MATCHES "SunOS" OR CMAKE_SYSTEM_NAME MATCHES "Linux")
            SET(objdir ${CMAKE_CURRENT_BINARY_DIR}/CMakeFiles/${target}.dir)
            SET(outfile ${objdir}/${target}_dtrace.o)
            GET_TARGET_PROPERTY(target_type ${target} TYPE)
      @@ -151,7 +151,7 @@ ENDFUNCTION()
      # run them again through dtrace -G to generate an ELF file that links
      # to mysqld.
      MACRO (DTRACE_INSTRUMENT_STATIC_LIBS target libs)
      -IF(CMAKE_SYSTEM_NAME MATCHES "SunOS" AND ENABLE_DTRACE)
      +IF((CMAKE_SYSTEM_NAME MATCHES "SunOS" OR CMAKE_SYSTEM_NAME MATCHES "Linux") AND ENABLE_DTRACE)
        # Filter out non-static libraries in the list, if any
        SET(static_libs)
        FOREACH(lib ${libs})
      

       

      2/- Probes are available from mysqld:

      dtrace -c /usr/local/mysql/bin/mysqld -l |grep "query-exec-start"
        123 mysql21269            mysqld _ZN13sp_instr_stmt9exec_coreEP3THDPj query-exec-start
        124 mysql21269            mysqld _Z11mysql_parseP3THDPcjP12Parser_state query-exec-start
        125 mysql21269            mysqld _ZN18Prepared_statement7executeEP6Stringb query-exec-start
        126 mysql21269            mysqld _Z17mysql_open_cursorP3THDP13select_resultPP18Server_side_cursor query-exec-start
        923 mysql21349            mysqld _ZN13sp_instr_stmt9exec_coreEP3THDPj query-exec-start
        924 mysql21349            mysqld _Z11mysql_parseP3THDPcjP12Parser_state query-exec-start
        925 mysql21349            mysqld _ZN18Prepared_statement7executeEP6Stringb query-exec-start
        926 mysql21349            mysqld _Z17mysql_open_cursorP3THDP13select_resultPP18Server_side_cursor query-exec-start
      

       

      3/- MySQL is working; however when we start DTrace

      ps -fu mysql
      UID        PID  PPID  C STIME TTY          TIME CMD
      mysql    21269 21166  0 21:38 pts/3    00:00:01 /usr/local/mysql/bin/mysqld --basedir=/usr/
      
      # dtrace -n :mysqld::'{ trace(arg0); }'
      dtrace: description ':mysqld::' matched 162 probes
      

       

      It becomes impossible to use the server; one of the mysqld thread uses 100% CPU like in the top below:

      top - 21:59:34 up  3:56,  4 users,  load average: 1.00, 0.98, 1.09
      Tasks:  22 total,  1 running,  21 sleeping,  0 stopped,  0 zombie
      Cpu(s):  0.0%us, 50.2%sy,  0.0%ni, 49.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.2%st
      Mem:  2040288k total,  1654172k used,  386116k free,    26216k buffers
      Swap:  4194300k total,    2624k used,  4191676k free,  878564k cached
      
      
        PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                             
      21362 mysql    20  0 1013m 438m 5616 R 100.2 22.0  4:56.39 mysqld                                                             
      21269 mysql    20  0 1013m 438m 5616 S  0.0 22.0  0:00.81 mysqld                                                             
      21273 mysql    20  0 1013m 438m 5616 S  0.0 22.0  0:00.01 mysqld                                                             
      21274 mysql    20  0 1013m 438m 5616 S  0.0 22.0  0:00.02 mysqld                                                             
      21275 mysql    20  0 1013m 438m 5616 S  0.0 22.0  0:00.01 mysqld                                                             
      
      
      
      
      

       

      Trying to strace to process just kills it.

       

      It might be that the build process is just plain wrong considering USDT use is not documented yet. However, you might want to have a look.

       

      Best Regards,

       

      Gregory

        • 1. Re: UEK3/Beta 1 - Using USDT stucks mysqld to 100% CPU
          GregoryG.

          Hello,

          For more information, below is the strace output when I attach to the process:

          # strace -f -p 21269 
          Process 21269 attached with 22 threads - interrupt to quit
          [pid 21293] rt_sigtimedwait([HUP QUIT ALRM TERM TSTP], NULL, NULL, 8 <unfinished ...>[pid 21292] restart_syscall(<... resuming interrupted call ...> <unfinished ...>[pid 21291] restart_syscall(<... resuming interrupted call ...> <unfinished ...>[pid 21290] futex(0x1688b84, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>[pid 21362] --- SIGILL (Illegal instruction) @ 0 (0) ---
          [pid 21289] select(0, NULL, NULL, NULL, {0, 2862} <unfinished ...>[pid 21288] futex(0x16889d4, FUTEX_WAIT_PRIVATE, 3, NULL <unfinished ...>[pid 21287] select(0, NULL, NULL, NULL, {0, 966463} <unfinished ...>[pid 21286] restart_syscall(<... resuming interrupted call ...> <unfinished ...>[pid 21285] restart_syscall(<... resuming interrupted call ...> <unfinished ...>[pid 21284] restart_syscall(<... resuming interrupted call ...> <unfinished ...>[pid 21282] io_getevents(139912766763008, 1, 256,  <unfinished ...>[pid 21281] io_getevents(139912766775296, 1, 256,  <unfinished ...>[pid 21280] io_getevents(139912766787584, 1, 256,  <unfinished ...>[pid 21279] io_getevents(139912766799872, 1, 256,  <unfinished ...>[pid 21278] io_getevents(139912767021056, 1, 256,  <unfinished ...>[pid 21277] io_getevents(139912767033344, 1, 256,  <unfinished ...>[pid 21276] io_getevents(139913278910464, 1, 256,  <unfinished ...>[pid 21275] io_getevents(139913278922752, 1, 256,  <unfinished ...>[pid 21274] io_getevents(139912767008768, 1, 256,  <unfinished ...>[pid 21362] write(2, "20:01:59 UTC - mysqld got signal"..., 37 <unfinished ...>[pid 21273] io_getevents(139912766996480, 1, 256,  <unfinished ...>[pid 21362] <... write resumed> )       = 37
          [pid 21269] restart_syscall(<... resuming interrupted call ...> <unfinished ...>[pid 21362] write(2, "This could be because you hit a "..., 226) = 226
          [pid 21362] write(2, "We will try our best to scrape u"..., 177) = 177
          [pid 21362] write(2, "key_buffer_size=8388608\n", 24) = 24
          [pid 21362] write(2, "read_buffer_size=131072\n", 24) = 24
          [pid 21362] write(2, "max_used_connections=1\n", 23) = 23
          [pid 21362] write(2, "max_threads=151\n", 16) = 16
          [pid 21362] write(2, "thread_count=1\n", 15) = 15
          [pid 21362] write(2, "connection_count=1\n", 19) = 19
          [pid 21362] write(2, "It is possible that mysqld could"..., 139) = 139
          [pid 21362] write(2, "Hope that's ok; if not, decrease"..., 66) = 66
          [pid 21362] write(2, "Thread pointer: 0x166a170\n", 26) = 26
          [pid 21362] write(2, "Attempting backtrace. You can us"..., 159) = 159
          [pid 21362] futex(0x3a1d191600, FUTEX_WAKE_PRIVATE, 2147483647) = 0
          [pid 21289] <... select resumed> )      = 0 (Timeout)
          [pid 21289] gettimeofday({1378670519, 5535}, NULL) = 0
          [pid 21289] gettimeofday({1378670519, 5692}, NULL) = 0
          [pid 21289] select(0, NULL, NULL, NULL, {1, 0} <unfinished ...>[pid 21362] write(2, "stack_bottom = 7f3fed4a6e28 thre"..., 49) = 49
          [pid 21362] writev(2, [{"/usr/local/mysql/bin/mysqld", 27}, {"(", 1}, {"my_print_stacktrace", 19}, {"+0x", 3}, {"35", 2}, {")", 1}, {"[0x", 3}, {"8d8ce5", 6}, {"]\n", 2}], 9) = 64
          [pid 21362] writev(2, [{"/usr/local/mysql/bin/mysqld", 27}, {"(", 1}, {"handle_fatal_signal", 19}, {"+0x", 3}, {"40b", 3}, {")", 1}, {"[0x", 3}, {"64df1b", 6}, {"]\n", 2}], 9) = 65
          [pid 21362] writev(2, [{"/lib64/libpthread.so.0", 22}, {"[0x", 3}, {"3a1d20f500", 10}, {"]\n", 2}], 4) = 37
          [pid 21362] writev(2, [{"/lib64/libc.so.6", 16}, {"(", 1}, {"shutdown", 8}, {"+0x", 3}, {"b", 1}, {")", 1}, {"[0x", 3}, {"3a1cee99cb", 10}, {"]\n", 2}], 9) = 45
          [pid 21362] writev(2, [{"/usr/local/mysql/bin/mysqld", 27}, {"(", 1}, {"vio_shutdown", 12}, {"+0x", 3}, {"16d", 3}, {")", 1}, {"[0x", 3}, {"b4098d", 6}, {"]\n", 2}], 9) = 58
          [pid 21362] writev(2, [{"/usr/local/mysql/bin/mysqld", 27}, {"(", 1}, {"_ZN3THD10disconnectEv", 21}, {"+0x", 3}, {"50", 2}, {")", 1}, {"[0x", 3}, {"69af80", 6}, {"]\n", 2}], 9) = 66
          [pid 21362] writev(2, [{"/usr/local/mysql/bin/mysqld", 27}, {"(", 1}, {"_Z16close_connectionP3THDj", 26}, {"+0x", 3}, {"21", 2}, {")", 1}, {"[0x", 3}, {"579ef1", 6}, {"]\n", 2}], 9) = 71
          [pid 21362] writev(2, [{"/usr/local/mysql/bin/mysqld", 27}, {"(", 1}, {"_Z24do_handle_one_connectionP3TH"..., 33}, {"+0x", 3}, {"132", 3}, {")", 1}, {"[0x", 3}, {"69fac2", 6}, {"]\n", 2}], 9) = 79
          [pid 21362] writev(2, [{"/usr/local/mysql/bin/mysqld", 27}, {"(", 1}, {"handle_one_connection", 21}, {"+0x", 3}, {"42", 2}, {")", 1}, {"[0x", 3}, {"69fbd2", 6}, {"]\n", 2}], 9) = 66
          [pid 21362] writev(2, [{"/usr/local/mysql/bin/mysqld", 27}, {"(", 1}, {"pfs_spawn_thread", 16}, {"+0x", 3}, {"127", 3}, {")", 1}, {"[0x", 3}, {"912e27", 6}, {"]\n", 2}], 9) = 62
          [pid 21362] writev(2, [{"/lib64/libpthread.so.0", 22}, {"[0x", 3}, {"3a1d207851", 10}, {"]\n", 2}], 4) = 37
          [pid 21362] writev(2, [{"/lib64/libc.so.6", 16}, {"(", 1}, {"clone", 5}, {"+0x", 3}, {"6d", 2}, {")", 1}, {"[0x", 3}, {"3a1cee894d", 10}, {"]\n", 2}], 9) = 43
          [pid 21362] write(2, "\nTrying to get some variables.\nS"..., 89) = 89
          [pid 21362] write(2, "Query (0): ", 11) = 11
          [pid 21362] gettid()                    = 21362
          [pid 21362] open("/proc/self/task/21362/mem", O_RDONLY) = -1 EACCES (Permission denied)
          [pid 21362] write(2, "is an invalid pointer\n", 22) = 22
          [pid 21362] write(2, "Connection ID (thread ID): 1\n", 29) = 29
          [pid 21362] write(2, "Status: KILL_CONNECTION\n\n", 25) = 25
          [pid 21362] write(2, "The manual page at http://dev.my"..., 145) = 145
          [pid 21362] exit_group(1)               = ?
          Process 21362 detached

          I wish it can help you. Best Regards, Gregory

          • 2. Re: UEK3/Beta 1 - Using USDT stucks mysqld to 100% CPU
            LenzGrimmer

            Hi Gregory, thanks a lot for the detailed report. I've asked the Linux DTrace developers to take a look.

            • 3. Re: UEK3/Beta 1 - Using USDT stucks mysqld to 100% CPU
              NickAlcock

              As an aside, stracing a process that dtrace is running over is not going to work, just as you can't strace a process that is running under gdb. (Equally, you can't gdb a process that's being dtraced, dtrace a process that's being gdb'd, or gdb a process that's being straced.)

              • 4. Re: UEK3/Beta 1 - Using USDT stucks mysqld to 100% CPU
                GregoryG.

                Nick,

                 

                So, strace was a bad idea; good to know!

                 

                Did you manage to reproduce the issue? Do you want I share the built project?

                 

                Thank you

                 

                Gregory

                • 5. Re: UEK3/Beta 1 - Using USDT stucks mysqld to 100% CPU
                  kvh

                  Thanks again for your feedback.  We are in the process (together with the MySQL team) of preparing a release of MySQL for testing purposes, with USDT probes.  The problems you were seeing are a combination of several bugs, and some interaction related to the action you were using.  Due to a memory leak issue, performance would get degraded over time with MySQL due to the impact of multiple forks (for worker tasks).  Combine this with a reference counting problem in the handling of meta providers, and you got a perfect recipe for quite noticeable performance impact.  And then, because your action results in output for every single probe firing, this combination of problems would become quite obvious very quickly.

                   

                  Fortunately, the upcoming release has fixes for all these rather obscure problems

                   

                  Thanks again for showing why beta testing "outside the lab" is so crucial in software development!

                  • 6. Re: UEK3/Beta 1 - Using USDT stucks mysqld to 100% CPU
                    GregoryG.

                    Hi Kris,

                     

                    I've read Chris Jones Blog's about PHP and DTrace and I was expecting the "fork fix" would have somehow, at least in part, address this problem... as you've confirmed!

                     

                    To provide another feedback that is not directly related with the technology: (1) I've been working with Oracle for a very long time and (2) I've made presentation about DTrace 0.4 "Beta 1" at the Paris Oracle Meetup yesterday:

                    - I've never experienced any commitment and feedback like the one from your team at Oracle. And there are many great teams at Oracle

                    - I've never seen any technology that people was expecting more than DTrace on Linux (the Solaris team deserves this honor, I guess)

                     

                    So not sure this is any good news for Solaris but we are waiting eagerly to get more...

                     

                    Best Regards,

                     

                    Gregory