How does the variable rpl_semi_sync_master_wait_point work in MySQL 5.7.2 onwards?

2784987
2784987 Member Posts: 54
edited Apr 7, 2016 9:01PM in MySQL Community Space

In MySQL 5.7.2 onwards, the variable rpl_semi_sync_master_wait_point controls the point at which a transaction on the master waits for a response from one of the slaves under semi-sync replication.

If we set the variable to AFTER_SYNC, then according to the documentation:

Upon receiving acknowledgment, the master commits the transaction to the storage engine and returns a result to the client, which then can proceed.

That means that if the master can't receive a response from any of its slaves, and the master crashes before the timeout defined by rpl_semi_sync_master_timeout expires, the transaction should fail.

But my test shows that this is not the case.

I set rpl_semi_sync_master_timeout to a bigger value such as 10000, stopped the IO threads on all slaves, and launched a transaction. The transaction then waited for a response from one of the slaves, and I crashed the master by killing the mysqld process. After I restarted mysqld, the transaction turned out to have succeeded, although it should have failed as described in the doc.

Why?

thanks

Answers

  • Matt Lord-Oracle
    Matt Lord-Oracle Member Posts: 5
    edited Feb 15, 2016 4:02PM

    You are correct in assuming that when using rpl_semi_sync_master_wait_point=AFTER_SYNC (which is the default value), the transaction should only succeed when it's successful on both master and slave.

    rpl_semi_sync_master_timeout=10000 is also the default value, and it's important to note that this is milliseconds. I assume that your intention was to set this to such a large value so as to preclude that from being a potential factor in your tests? If so, then you should instead try using rpl_semi_sync_master_timeout=10000000 (over 2.5 hours).
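Since rpl_semi_sync_master_timeout is given in milliseconds, a quick conversion makes the two values above easier to compare. This is only a minimal shell sketch; the helper name ms_to_hms is made up for illustration and is not part of MySQL:

```shell
# Convert a millisecond timeout into hours, minutes and seconds.
# ms_to_hms is a hypothetical helper used only for this illustration.
ms_to_hms() {
  total_s=$(( $1 / 1000 ))
  printf '%d ms = %dh %dm %ds\n' "$1" \
    $(( total_s / 3600 )) $(( total_s % 3600 / 60 )) $(( total_s % 60 ))
}

ms_to_hms 10000      # the value used in the original test
ms_to_hms 10000000   # the larger value suggested above
```

The first call should print 0h 0m 10s and the second 2h 46m 40s, so the larger value leaves plenty of time to crash the master before any timeout can fire.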


    If you can repeat the failure--in this context meaning that the transaction succeeded (meaning it was persisted and externalized) on the master without reaching the slave--then it would definitely be a bug. I would encourage you to open a bug report and let us know all of the details so that we can verify it, and then fix it.


    Thank you! 

  • 2784987
    2784987 Member Posts: 54
    edited Feb 15, 2016 9:11PM


    Hi Matt,

    Thanks for your reply. In fact, my test was very simple:

    1. I set rpl_semi_sync_master_timeout=10000, because 10 seconds is long enough for me to crash the master before it reverts to async replication.

    2. I stopped the IO threads of all slaves.

    3. I launched a transaction as below:

    insert into test.t5 select now();

    Then the above query hung.

    4. I killed the mysqld process of the master, and in the window I noticed an error saying that the connection was closed.

    [Attached screenshot: 1.jpg]

    5. Then I restarted the mysqld process of the master, checked the table, and found that the row I had just inserted existed.

    thanks

  • 2784987
    2784987 Member Posts: 54
    edited Feb 15, 2016 9:14PM


    Can you please run the same simple test that I did and check whether the result is the same?

    thanks a lot

  • Matt Lord-Oracle
    Matt Lord-Oracle Member Posts: 5
    edited Feb 15, 2016 9:27PM

    Hi,

    "

    1. I set rpl_semi_sync_master_timeout=10000, because 10 seconds is long enough for me to crash the master before it reverts to async replication

    2. I stopped the IO threads of all slaves

    3. I launched a transaction as below:

    ...

    "


    I'm sorry, I had completely glossed over point #2 in your steps above.


    If you stopped all IO threads of all slaves, then from the master's point of view there are no slaves at the time you executed step #3, and thus the write succeeded as it was a "stand-alone" instance at that point. You don't want your production system to block all writes if there are no slaves.


    If you're interested in multi-master, then I would encourage you to look at MySQL Group Replication: Group Replication — an Overview | MySQL High Availability


    Best Regards

  • 2784987
    2784987 Member Posts: 54
    edited Feb 15, 2016 9:37PM


    Why did I stop the IO threads of all slaves? Because I wanted the transaction on the master to hang.

    Otherwise the binlog would be transferred to the slaves and I couldn't check whether AFTER_SYNC works.

    In fact, the transaction on the master really did hang. If the master were working as a stand-alone instance, it should not hang, am I right?

    thanks

  • Matt Lord-Oracle
    Matt Lord-Oracle Member Posts: 5
    edited Feb 15, 2016 10:35PM

    Hi,

    Sorry, after looking into it further I can see that I was wrong:

    http://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_rpl_semi_sync_master_wait_for_slave_count

    http://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_rpl_semi_sync_master_wait_no_slave

    So assuming rpl_semi_sync_master_wait_no_slave was ON, then the master would still wait rpl_semi_sync_master_timeout milliseconds for an ACK from 0 slaves (only entering "stand-alone" mode after the first timeout).
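To see what the master believes about its slaves while reproducing this, the wait-related variables and the semi-sync status counters can be inspected. This is a minimal sketch, assuming a `mysql` client on PATH with credentials already configured and the semi-sync master plugin installed; the function name semi_sync_state_sql is hypothetical and only prints the statements so that they can be piped to the client:

```shell
# Print the statements that show the master's semi-sync state.
# semi_sync_state_sql is a hypothetical helper, not a MySQL tool.
semi_sync_state_sql() {
  cat <<'SQL'
SHOW GLOBAL VARIABLES LIKE 'rpl_semi_sync_master_wait%';
SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_master_clients';
SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_master_status';
SQL
}

# Show the statements; against a live master, pipe them to the client
# instead:  semi_sync_state_sql | mysql
semi_sync_state_sql
```

Rpl_semi_sync_master_clients reports how many semi-sync slaves are connected, and Rpl_semi_sync_master_status reports whether semi-sync is currently operational on the master.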

    I created a test environment to verify (highlights my own):

    mysql> show global variables like "rpl%";
    +-------------------------------------------+------------+
    | Variable_name                             | Value      |
    +-------------------------------------------+------------+
    | rpl_semi_sync_master_enabled              | ON         |
    | rpl_semi_sync_master_timeout              | 10000      |
    | rpl_semi_sync_master_trace_level          | 32         |
    | rpl_semi_sync_master_wait_for_slave_count | 1          |
    | rpl_semi_sync_master_wait_no_slave        | ON         |
    | rpl_semi_sync_master_wait_point           | AFTER_SYNC |
    | rpl_stop_slave_timeout                    | 31536000   |
    +-------------------------------------------+------------+
    7 rows in set (0.00 sec)

    mysql> insert into gcoltest (lon, lat) select lon, lat from gcoltest;
    Query OK, 1 row affected (10.01 sec)
    Records: 1  Duplicates: 0  Warnings: 0

    And in the MySQL error log (highlights my own):

    2015-12-24T22:58:12.193543Z 4 [Note] Semi-sync replication initialized for transactions.
    2015-12-24T22:58:12.193584Z 4 [Note] Semi-sync replication enabled on the master.
    2015-12-24T22:58:12.193802Z 0 [Note] Starting ack receiver thread
    2015-12-24T23:00:10.129022Z 4 [Warning] Timeout waiting for reply of binlog (file: hanode4-bin.000007, pos: 890), semi-sync up to file , position 0.
    2015-12-24T23:00:10.129064Z 4 [Note] Semi-sync replication switched OFF.

    So you should check your MySQL error log. I suspect that you will see the same Timeout warning. You noted that you "killed mysqld", but assuming you simply did "kill <PID>" then you simply sent the process a SIGTERM, which tells it to start a normal shutdown. That shutdown process can take many seconds to complete as it tries to terminate various internal processes gracefully. If you want the process to terminate immediately, then you should send it a SIGKILL or "kill -9 <PID>".
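The difference between the two signals can be seen without touching a MySQL server. This is a minimal sketch using a stand-in shell worker (not mysqld) that traps SIGTERM, much as a daemon runs its shutdown routine; the function name worker is hypothetical:

```shell
# Stand-in for a server process (NOT mysqld): it traps SIGTERM and exits
# cleanly, the way a daemon would run its shutdown routine.
worker() {
  trap 'echo "worker: SIGTERM received, shutting down cleanly"; exit 0' TERM
  while :; do sleep 0.1; done
}

# Case 1: plain `kill <PID>` sends SIGTERM, so the handler runs.
worker &
pid=$!
sleep 0.5                      # give the worker time to install its trap
kill "$pid"                    # same as: kill -TERM "$pid"
term_status=0
wait "$pid" || term_status=$?
echo "after SIGTERM: exit status $term_status"

# Case 2: `kill -9 <PID>` sends SIGKILL, which cannot be trapped.
worker &
pid=$!
sleep 0.5
kill -9 "$pid"
kill_status=0
wait "$pid" || kill_status=$?
echo "after SIGKILL: exit status $kill_status"
```

The first case should report exit status 0 because the handler ran; the second should report 137 (128 + 9) because SIGKILL ended the process before any cleanup could happen.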

    If you see something else, then you should file a bug report so that we can look into it further.

    I will talk to the documentation team, as these behaviors could certainly be better documented.

    Best Regards

  • 2784987
    2784987 Member Posts: 54
    edited Feb 15, 2016 10:37PM


    I did use "kill -9 <PID>".

    Additionally, there were no other errors while the transaction hung, but after the mysqld process restarted, the transaction was recovered:

    thanks

  • 2784987
    2784987 Member Posts: 54
    edited Feb 15, 2016 10:38PM


    Please refer to the following picture:

    2.JPG

  • Matt Lord-Oracle
    Matt Lord-Oracle Member Posts: 5
    edited Feb 15, 2016 10:51PM

    Hi,

    I was able to repeat the issue, and it certainly seems to be a bug. I will talk with the Replication team and ensure that it's addressed.

    Thank you!

  • 2784987
    2784987 Member Posts: 54
    edited Feb 15, 2016 10:55PM


    Thanks a lot.

    Please let me know if you get any response.