Latest reply on Mar 13, 2012 6:22 AM by 923410

    Injecting a bug into the L2 cache of OpenSPARC T1

      As part of my research work, I need to inject a bug into the operation of the L2 cache of OpenSPARC and then analyze the RTL code based on the outcome of that bug. I have been working on OpenSPARC T1 and decided to target the writeback operation of the L2 cache.

      Here is what I have done so far:
      1) I have written a SPARC assembly program that accesses the L2 cache repeatedly. Specifically, I change the page size to 4 MB and perform 16 write accesses (sth) to addresses 262144 bytes apart (64 * 4 * 1024) so that they all hit set 0 of bank 0 of the L2 cache, then 16 such accesses to set 1 and 16 to set 2. Once the first 12 accesses have filled a set, the last 4 accesses to that set cause 4 writebacks on average. I then do a load (lduh) from one of the addresses that was previously written back, and my program checks whether it reads the value I had written to that address.
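The access pattern in step 1 can be checked with a short Python sketch. It assumes the OpenSPARC T1 L2 organization of 64-byte lines, four banks selected by PA[7:6], 1024 sets per bank indexed by PA[17:8], and 12 ways. The base address used for each set is hypothetical, and replacement is simplified to LRU; that does not change the count here, because the 16 tags are distinct and the set is assumed to start empty.

```python
from collections import OrderedDict

# Assumed OpenSPARC T1 L2 geometry (see lead-in).
LINE_BITS = 6           # 64-byte lines -> PA[5:0] is the byte offset
BANK_BITS = 2           # 4 banks       -> PA[7:6]
SET_BITS = 10           # 1024 sets     -> PA[17:8]
WAYS = 12
STRIDE = 64 * 4 * 1024  # 262144 = 2**18, the stride used by the test program


def bank_of(pa: int) -> int:
    return (pa >> LINE_BITS) & ((1 << BANK_BITS) - 1)


def set_of(pa: int) -> int:
    return (pa >> (LINE_BITS + BANK_BITS)) & ((1 << SET_BITS) - 1)


def addresses_for(set_index: int, count: int = 16, bank: int = 0) -> list:
    """Hypothetical addresses STRIDE apart that all map to (bank, set_index)."""
    base = (set_index << (LINE_BITS + BANK_BITS)) | (bank << LINE_BITS)
    return [base + i * STRIDE for i in range(count)]


def evictions(addrs: list) -> int:
    """Count evictions when the addresses are written into one empty set (LRU)."""
    lines = OrderedDict()  # tag -> None, oldest first
    evicted = 0
    for pa in addrs:
        tag = pa >> (LINE_BITS + BANK_BITS + SET_BITS)
        if tag in lines:
            lines.move_to_end(tag)  # hit: mark as most recently used
            continue
        if len(lines) == WAYS:
            lines.popitem(last=False)  # set full: evict the LRU line
            evicted += 1
        lines[tag] = None
    return evicted


for s in range(3):
    addrs = addresses_for(s)
    assert all(bank_of(a) == 0 and set_of(a) == s for a in addrs)
    print(f"set {s}: {evictions(addrs)} evictions")
```

Because the stride is exactly 2**18, only tag bits change between accesses, so all 16 writes compete for the same 12 ways.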

      2) Normally, when my LOAD PCX packet arrives, it misses in the L2 cache but hits in the writeback buffer (that is, the address I am trying to load is in the writeback buffer at that time), so the LOAD waits in the miss buffer. When the writeback buffer receives the DRAM write ack for this address, it wakes the corresponding miss buffer entry; the load then goes through the L2 cache pipeline, gets the correct value, and my assembly program passes.
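A toy Python model of the normal path in step 2: the load misses in L2, hits in the writeback buffer (WBB), and parks in the miss buffer until the DRAM write ack wakes it. The names L2Model, MissEntry and dram_write_ack are hypothetical, and the model only mirrors the behaviour described above; it is not derived from the sctag RTL.

```python
from dataclasses import dataclass


@dataclass
class MissEntry:
    addr: int
    waiting_on_wbb: bool  # True when the WBB reported a hit for this address


class L2Model:
    """Hypothetical behavioural model; not derived from sctag_*.v."""

    def __init__(self, dram):
        self.dram = dram      # address -> value, stands in for DRAM contents
        self.wbb = {}         # address -> dirty data still to be written to DRAM
        self.miss_buffer = []
        self.completed = {}   # address -> value returned to the core

    def load(self, addr):
        entry = MissEntry(addr, waiting_on_wbb=addr in self.wbb)
        self.miss_buffer.append(entry)
        if not entry.waiting_on_wbb:
            self._replay(entry)  # true miss: no dependency, go to DRAM now

    def dram_write_ack(self, addr):
        # The writeback reached DRAM: retire it and wake dependent loads.
        self.dram[addr] = self.wbb.pop(addr)
        for entry in list(self.miss_buffer):
            if entry.addr == addr and entry.waiting_on_wbb:
                entry.waiting_on_wbb = False
                self._replay(entry)

    def _replay(self, entry):
        # Pass through the pipeline again and fetch the now-current data.
        self.completed[entry.addr] = self.dram[entry.addr]
        self.miss_buffer.remove(entry)


l2 = L2Model(dram={0x40000: 0x1111})
l2.wbb[0x40000] = 0x2222   # dirty line evicted, DRAM write still pending
l2.load(0x40000)           # misses in L2, hits in the WBB, waits
assert 0x40000 not in l2.completed
l2.dram_write_ack(0x40000)
assert l2.completed[0x40000] == 0x2222
```

The key design point is the dependency flag on the miss buffer entry: the load is never sent to DRAM on its own while the WBB still holds the newer data.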

      3) Now I inject a simple bug: I make the wbctl_hit_unqual_c2 signal in file "sctag_wbctl.v" stuck at 0. This is the signal from the writeback buffer to the miss buffer that tells the miss buffer a particular access hit in the writeback buffer. With this signal stuck at 0, I expect the miss buffer to insert this LOAD as a true miss (one that does not depend on any miss buffer, fill buffer, or writeback buffer entry), so the miss is issued to the L2 cache pipeline independently and receives the old value of this address from DRAM. My assembly program should therefore fail. That is my expectation.
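The expectation in step 3 can be written down as a comparison of two DRAM request orders. This sketch assumes a single in-order DRAM request stream, which is a simplification (the DRAM monitor message in step 4 refers to separate read and write queues), and OLD and NEW are hypothetical values.

```python
OLD, NEW = 0x1111, 0x2222  # hypothetical stale and freshly written values


def run(hit_unqual_stuck_at_0: bool) -> int:
    """Return the value the load sees, given the order requests reach DRAM."""
    dram = {"A": OLD}
    wbb_hit = not hit_unqual_stuck_at_0  # the address really is in the WBB

    requests = []  # (kind, addr, data) in arrival order at DRAM
    if not wbb_hit:
        # Inserted as a true miss: the read is issued immediately, which the
        # expectation assumes is ahead of the pending writeback.
        requests.append(("RD", "A", None))
    requests.append(("WR", "A", NEW))  # writeback buffer drains to DRAM
    if wbb_hit:
        # Correct path: the load waits for the write ack, then reads.
        requests.append(("RD", "A", None))

    value = None
    for kind, addr, data in requests:
        if kind == "WR":
            dram[addr] = data
        else:
            value = dram[addr]
    return value


print(hex(run(False)), hex(run(True)))
```

The RAW hazard only becomes visible to the program if the read is actually serviced before the write, which is exactly the ordering this model takes for granted.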

      4) What actually happens is that the miss buffer does treat the LOAD as a true miss and issues the READ to DRAM independently, but the read request reaches DRAM just after the writeback buffer's write request for the same address. As a result, my assembly program terminates with the following error:
      "ERROR : In dram channel 0
      At time 13116747 rd entry 0 which is address = 800086000, has a match with incoming write entry at WR Q location 4
      13116747 ERROR: DRAM Channel 0 RD/WR Sequencing violation
      ERROR: cmp_top.cmp_dram.cmp_dram_mon.dram_mon0: DRAM monitor exited"
      I do not see the RAW hazard failure I was expecting (a clean exit of my program with a fail: inside the program, the value read is compared against the expected value with CMP, and that comparison should fail). Instead, I get the error above from the DRAM monitor code. Is this what I should be seeing? Is this read/write sequencing error equivalent to the RAW hazard I am trying to create?
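One way to read the monitor message in step 4 is sketched below. The check is modeled only on the wording of the log (a pending read entry matching an incoming write-queue entry); it is a hypothetical stand-in, not the cmp_dram_mon source, and the four unrelated write addresses are made up so that the conflicting write lands at "WR Q location 4". With a check like this, the simulation stops as soon as the conflicting write is queued, before the load's data could return to the test program.

```python
class SequencingViolation(Exception):
    """Raised when a write arrives for an address with a read still queued."""


class DramQueues:
    def __init__(self):
        self.rd_q = []  # addresses of reads not yet serviced
        self.wr_q = []  # addresses of writes not yet serviced

    def push_read(self, addr):
        self.rd_q.append(addr)

    def complete_read(self):
        return self.rd_q.pop(0)  # oldest read is serviced first

    def push_write(self, addr):
        wr_loc = len(self.wr_q)
        for rd_entry, rd_addr in enumerate(self.rd_q):
            if rd_addr == addr:
                raise SequencingViolation(
                    f"rd entry {rd_entry} which is address = {addr:x}, has a "
                    f"match with incoming write entry at WR Q location {wr_loc}"
                )
        self.wr_q.append(addr)


# Buggy case: a read to the same address is still queued when the
# writeback's write arrives behind four unrelated (hypothetical) writes.
q = DramQueues()
for other in (0x100, 0x140, 0x180, 0x1C0):
    q.push_write(other)
q.push_read(0x800086000)
try:
    q.push_write(0x800086000)
    violation = None
except SequencingViolation as err:
    violation = str(err)
print(violation)

# If the read had already been serviced, the same write is accepted.
q2 = DramQueues()
q2.push_read(0x800086000)
q2.complete_read()
q2.push_write(0x800086000)
```

Under this reading, the sequencing error and the intended RAW hazard come from the same ordering problem, but a monitor of this kind reports it before the program's own comparison can run.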

      5) I tried to delay the write request to DRAM for this address slightly, so that my read request would reach DRAM first and be serviced with the old value, giving the bug manifestation I want. I added a delay to the continuous assignment of the signal "can_req_dram" in file "sctag_wbctl.v", so that the write request from the writeback buffer to DRAM would be issued after the read request that the miss buffer issues for the true miss on that address. That did not work; this RD/WR sequencing error is all I can get.
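The timing question in step 5 can be framed as how far the write has to be pushed back so that it no longer overlaps the read. This is a cycle-based sketch: the delay is counted in whole clock cycles because requests are sampled on clock edges, and every cycle count and latency below is hypothetical, not measured from the T1 DRAM controller. The functions outcome and min_write_delay are illustrative names.

```python
def outcome(rd_issue: int, rd_latency: int, wr_issue: int,
            wr_latency: int, wr_delay: int = 0) -> str:
    """Classify one read and one write to the same address by how their
    residence intervals in the DRAM queues overlap (all values in cycles)."""
    rd_done = rd_issue + rd_latency
    wr_enter = wr_issue + wr_delay
    wr_done = wr_enter + wr_latency
    if wr_done <= rd_issue:
        return "write first: read returns new data"
    if rd_done <= wr_enter:
        return "read first: read returns old data"
    return "overlap: same-address read and write queued together"


def min_write_delay(rd_issue: int, rd_latency: int, wr_issue: int) -> int:
    """Smallest extra write delay, in cycles, that lets the read finish first."""
    return max(0, rd_issue + rd_latency - wr_issue)


# Hypothetical numbers: the write would enter 2 cycles before the read.
print(outcome(rd_issue=100, rd_latency=20, wr_issue=98, wr_latency=10))
d = min_write_delay(rd_issue=100, rd_latency=20, wr_issue=98)
print(d, outcome(100, 20, 98, 10, wr_delay=d))
```

In this model, delaying the write until just after the read is issued is not enough; the write has to wait until the read has left the read queue, otherwise the two requests still overlap.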

      Could anyone shed some light on this? Could it be that the actual RAW hazard is happening here, but the program is terminated before it produces the expected result because the DRAM monitor is written to catch such sequencing errors and exit early? Also, can anyone suggest a way of delaying the writeback for this particular address so that the write request reaches DRAM after the read? Please help.