Using Silicon Secured Memory to Detect Memory Access Errors at Hardware Speeds

    by Raj Prakash

     

    Hardware support provided by Oracle's SPARC M7 processor can detect a wide range of memory access errors at hardware speed.

     

    Introduction

     

    In a previous article, we discussed the types and the severity of memory access errors. We also covered the tools available in Oracle Solaris Studio for detecting this class of error.

     

    In this article, we discuss the limitations of performing this kind of error checking in software. We then describe the hardware support provided by Oracle's SPARC M7 processor for detecting memory access errors, and we point out the benefits of hardware analysis over software analysis.

     

    Limitations of Dynamic Instrumentation

     

    The most obvious limitation of dynamic instrumentation is that the instrumented program runs much slower than the original program. This is because the instrumented program needs to keep a database of each allocated chunk of memory. Each memory read and write instruction has to be augmented with other instructions to read the information from the database to find out whether the memory location is valid. Instrumented programs typically run more than 30x slower.
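
    To make the source of this overhead concrete, the following is a minimal sketch of the kind of bookkeeping described above. It is not how any particular tool is implemented; the names shadow_db, shadow_register, shadow_unregister, and shadow_access_ok are hypothetical, and the fixed-size table and linear scan are simplifications chosen to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BLOCKS 64   /* capacity of the toy database (hypothetical) */

/* One record per allocation: the byte range [start, start + size). */
struct block_record {
    uintptr_t start;
    size_t    size;
    bool      live;
};

static struct block_record shadow_db[MAX_BLOCKS];

/* Record a new allocation; returns false if the toy table is full. */
bool shadow_register(const void *p, size_t size)
{
    assert(p != NULL);
    for (size_t i = 0; i < MAX_BLOCKS; i++) {
        if (!shadow_db[i].live) {
            shadow_db[i].start = (uintptr_t)p;
            shadow_db[i].size  = size;
            shadow_db[i].live  = true;
            return true;
        }
    }
    return false;
}

/* Mark the allocation that starts at p as freed. */
void shadow_unregister(const void *p)
{
    for (size_t i = 0; i < MAX_BLOCKS; i++) {
        if (shadow_db[i].live && shadow_db[i].start == (uintptr_t)p)
            shadow_db[i].live = false;
    }
}

/* The check an instrumented program would run before every load or store.
 * A linear scan keeps the sketch short; a real tool would need a faster
 * structure, but some lookup per access remains. */
bool shadow_access_ok(const void *p, size_t len)
{
    uintptr_t a = (uintptr_t)p;
    for (size_t i = 0; i < MAX_BLOCKS; i++) {
        if (shadow_db[i].live &&
            a >= shadow_db[i].start &&
            len <= shadow_db[i].size &&
            a - shadow_db[i].start <= shadow_db[i].size - len)
            return true;
    }
    return false;
}
```

    In this model, a single store such as p[i] = 0 becomes a call to shadow_access_ok(&p[i], sizeof p[i]) followed by the store itself, so every memory access pays for a table lookup in addition to its own work.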

     

    This slowdown often makes it impractical to do dynamic checking of a program with a large test suite. Consequently, dynamic checking is mostly used for debugging a specific problem or with a smaller test suite.

     

    There are also some limitations on the errors that dynamic instrumentation can detect. The ability to detect memory errors relies on being able to distinguish between valid memory accesses and invalid memory accesses. Some kinds of memory access are clearly errors; for example, a read of uninitialized memory is unambiguously a problem. Other situations are not so clear cut.

     

    As an example, consider an application that allocates a structure, uses that structure, and then frees it. If a pointer to this region of memory is used after the structure is freed, the tool can detect that it is an access to invalid memory and can report it as a "freed memory" access error. This situation is known as a dangling pointer: a pointer that points to a region of memory that is no longer valid.

     

    However, if malloc() later reuses the same region of memory, as shown in Listing 1, the memory is considered valid again. Now a memory access through the stale pointer is indistinguishable from a memory access through a legitimate pointer to the region, so a tool cannot report an error when the stale pointer is used.

     

    int *area1 = malloc(64);
    free(area1);
    char *area2 = malloc(64); // area2 gets the memory area just freed by area1
    area1[0] = 0;             // Stale-Pointer Access

    Listing 1.

     

    f1.png

    Figure 1.

     

    There is a similar situation in which a pointer gets corrupted. If the corrupted pointer happens to point to a valid region of memory, it's not possible for a tool to determine that this is a corrupted, rather than a legitimate, pointer.

     

    Hardware Support for Detecting Memory Access Errors

     

    Oracle's SPARC M7 processor provides a hardware feature called Silicon Secured Memory (SSM), previously known as Application Data Integrity (ADI) and sometimes still referred to as such. This hardware feature allows real-time detection of memory access errors.

     

    Data is stored in memory in units of 64 bytes called cache lines. So when data that consists of one or more bytes is loaded from memory, the entire block of 64 bytes containing that data is fetched. The latest SPARC processors extend this by adding four additional bits to each cache line. Fetching the 64-byte cache line also fetches these additional bits. These four bits are invisible to the application and are used to hold additional information for SSM.

     

    The best way of thinking of the bits is to imagine them containing a color. For example, a value of one could be thought of as red, a value of two as green, and so on. So a cache line of 64 bytes can be thought of as both containing 64 bytes of data and having a color.
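
    Because the color belongs to a whole cache line, it helps to see how an address maps onto 64-byte lines. The sketch below is plain address arithmetic under the 64-byte line size stated above; the function names cache_line_base and cache_lines_spanned are our own and are not part of any SSM interface.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 64u

/* Address of the first byte of the cache line that contains addr. */
uint64_t cache_line_base(uint64_t addr)
{
    return addr & ~(uint64_t)(CACHE_LINE_BYTES - 1);
}

/* Number of cache lines touched by the byte range [addr, addr + len). */
uint64_t cache_lines_spanned(uint64_t addr, uint64_t len)
{
    assert(len > 0);
    uint64_t first = cache_line_base(addr);
    uint64_t last  = cache_line_base(addr + len - 1);
    return (last - first) / CACHE_LINE_BYTES + 1;
}
```

    For example, a 64-byte buffer that starts on a line boundary occupies exactly one line and therefore has exactly one color, while a 400-byte buffer starting on a boundary spans seven lines, all of which would need to carry the same color.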

     

    Whenever we need to access a memory location, we need to have a pointer to that memory location. Pointers are 64 bits in size, which allows a 64-bit processor to potentially address 16 exbibytes (EiB) of data, or roughly 17 million tebibytes. No current system holds anywhere near this much memory: even Oracle's SPARC M7-32 system, which can contain a staggering 64 TB of memory, uses only a tiny fraction of that range. Consequently, a 64-bit processor does not need to use all 64 bits of a pointer. Normally the unused bits are constrained to be all zeros or all ones, but SSM uses them for a different purpose.

     

    Instead of requiring the most-significant four bits to be all zeros or all ones, SSM uses them to store color values. This means that all the pointers can be thought of as being colored in the same way as all the cache lines in memory are colored.

     

    SSM uses the fact that we can color both pointer and memory to check for invalid memory accesses. A "green" cache line can be accessed only through a "green" pointer. It is an error to use a "green" pointer to access a "red" cache line. The hardware will cause a trap when such a color mismatch occurs.
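
    The coloring scheme can be modeled with ordinary integer arithmetic. The following sketch treats a pointer as a 64-bit integer whose most-significant four bits hold the color, as described above. It only models the comparison; it does not use any real SSM instructions or library calls, and the names set_pointer_color, pointer_color, and colors_match are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define COLOR_SHIFT 60
#define COLOR_MASK  ((uint64_t)0xF << COLOR_SHIFT)

/* Return addr with its top four bits replaced by color (0..15). */
uint64_t set_pointer_color(uint64_t addr, unsigned color)
{
    assert(color <= 0xF);
    return (addr & ~COLOR_MASK) | ((uint64_t)color << COLOR_SHIFT);
}

/* Extract the color stored in the top four bits of a pointer value. */
unsigned pointer_color(uint64_t addr)
{
    return (unsigned)(addr >> COLOR_SHIFT);
}

/* Model of the hardware comparison: the access is allowed only when the
 * pointer's color equals the color of the cache line being accessed. */
bool colors_match(uint64_t colored_ptr, unsigned line_color)
{
    return pointer_color(colored_ptr) == line_color;
}
```

    With this model, set_pointer_color(0x21047e000, 2) yields 0x200000021047e000. That pointer matches a cache line of color 2 and mismatches a cache line of color 3, which is the case in which the hardware would trap.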

     

    f2.png

    Figure 2.

     

    Advantages of Hardware Support

     

    The most obvious advantage of hardware support for memory error detection is the massive performance advantage. The hardware takes responsibility for checking that every memory access is valid, and this usually incurs a cost only if the access is invalid and the hardware has to cause a trap to report the error. Consequently most applications run at close to their usual speeds.

     

    f3.png

    Figure 3.

     

    Another important advantage is that the software changes needed to support SSM can be provided in a library. An application does not need to have any instrumentation added in order for it to be checked. This means even existing applications for which the source code has been lost can be checked for correctness. For example, if the application is run with the command a.out, the following will enable SSM to check the application:

     

    % LD_PRELOAD_64=<compiler>/lib/compilers/sparcv9/libdiscoverADI.so a.out

     

    However, there is another advantage to SSM that is not immediately apparent. SSM can pick up a range of errors that normal instrumentation cannot identify. Earlier, we discussed that typical software instrumentation has a problem when there are stale pointers to reassigned memory locations or when a corrupted pointer happens to point to a valid memory location. It is very hard for software to handle these situations, because it has no way of knowing whether the memory location is one that the pointer is supposed to address. SSM, on the other hand, encodes the "color" of a memory location into both the pointer and the memory location. So only a "red" pointer can address a "red" memory location.

     

    Let's consider how this changes the stale-pointer situation. When a block of memory is freed, the call to free() can change the color of that block. A stale pointer to that block will then be the wrong color to access it. Using this approach, we can detect an access to that block through a pointer to freed memory.

     

    f4.png

    Figure 4.

     

    Now imagine that the block of memory is returned for new use by another call to malloc(). In this situation, we can change the color of the block again. An access through the stale pointer continues to report an error by trapping.

     

    f5.png

    Figure 5.

     

    Listing 2 shows the example in C.

     

    int *area1 = malloc(64);
    free(area1);
    char *area2 = malloc(64); // area2 gets the memory area just freed by area1
    area1[0] = 0;             // Stale Pointer Access

    Listing 2.
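
    The recoloring behavior described above can be sketched as a toy model. This is an illustration only, under our own assumptions: a single region, a pointer represented solely by the color it was issued with, and a simple "next color" policy. The names toy_region, toy_pointer, toy_malloc, toy_free, and toy_access_ok are hypothetical and do not correspond to the actual allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_COLORS 16u

/* A single memory region and the color currently stored with it. */
struct toy_region {
    unsigned color;
};

/* A pointer in this model: just the color it was issued with. */
struct toy_pointer {
    unsigned color;
};

/* Move the region to the next color, wrapping around (assumed policy). */
static void recolor(struct toy_region *r)
{
    r->color = (r->color + 1) % NUM_COLORS;
}

/* Allocation recolors the region and hands out a matching pointer. */
struct toy_pointer toy_malloc(struct toy_region *r)
{
    assert(r != NULL);
    recolor(r);
    return (struct toy_pointer){ r->color };
}

/* Freeing recolors the region so outstanding pointers stop matching. */
void toy_free(struct toy_region *r)
{
    assert(r != NULL);
    recolor(r);
}

/* Stand-in for the hardware check; false means the access would trap. */
bool toy_access_ok(const struct toy_region *r, struct toy_pointer p)
{
    return r->color == p.color;
}
```

    Replaying the sequence from the listing in this model, the pointer returned by the first toy_malloc() fails toy_access_ok() after toy_free(), and it continues to fail after the second toy_malloc() reuses the region, while the newly issued pointer succeeds.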

     

    The other situation is when a pointer is corrupted with random data and, when dereferenced, happens to point to a valid block of memory. In this case, the pointer is likely to be of the wrong color, so the error will be detected.

     

    All these situations are very hard to detect in software, but they are caught with hardware support. So hardware support is not only faster and easier to apply to an existing application, but it also identifies an additional range of problems.

     

    Listing 3 is an example program that shows the hardware detecting four types of errors: buffer overflow, freed-memory access, stale-pointer access, and freeing memory more than once.

     

      1  #include <stdlib.h>
      2  #include <stdio.h>
      3  int main() {
      4    int *area1 = malloc(sizeof(int)*16);
      5    int *area2 = malloc(sizeof(int)*100);
      6  
      7    for (int i = 0; i <= 16; i++)
      8      area1[i] = 0;     // Array Out of Bounds
      9  
     10    free(area1);
     11    area1[0] = 0;       // Freed Memory Access
     12    
     13    char *area3 = malloc(sizeof(char)*64);
     14    if ((void *)area1 == (void *)area3)
     15     printf("New area3 is same as old area1\n");
     16    area1[0] = 0;       // Stale Pointer Access
     17  
     18    free(area3);
     19    free(area3);
     20
     21    return 0;
     22  }

    Listing 3.

     

    As shown in Listing 4, the Listing 3 program can be run using SSM and the Oracle Solaris Studio discover tool by passing the flag -i adi to discover:

     

    $ cc t.c -g -m64
    $ discover -i adi -w - a.out
    $ a.out

    Listing 4.

     

    The first problem that SSM detects is the buffer overflow on line 8 of Listing 3, where the array area1 is accessed with index 16 while the highest valid index is 15.

     

    f6.png

    Figure 6.

     

    As shown in Listing 5, discover indicates the point in the code where the access occurs, plus the location in the code where the buffer is allocated.

     

    ERROR 1 (ABW): writing to memory beyond array bounds at address 0x200000021047e040:
            main() + 0x38  <t.c:8>
                     5:      int *area2 = malloc(sizeof(int)*100);
                     6:   
                     7:      for (int i = 0; i <= 16; i++)
                     8:=>      area1[i] = 0;     // Array Out of Bounds
                     9:   
                    10:      free(area1);
                    11:      area1[0] = 0;       // Freed Memory Access
            _start() + 0x108
        was allocated at (64 bytes):
            main() + 0x8  <t.c:4>
                    1:    #include <stdlib.h>
                    2:    #include <stdio.h>
                    3:    int main() {
                    4:=>    int *area1 = malloc(sizeof(int)*16);
                    5:      int *area2 = malloc(sizeof(int)*100);
                    6:   
                    7:      for (int i = 0; i <= 16; i++)
            _start() + 0x108

    Listing 5.

     

    As shown in Listing 6, the next problem detected is the write to freed memory at line 11 of Listing 3; the memory was freed earlier at line 10.

     

    ERROR 2 (FMW): writing to freed memory at address 0x200000021047e000:
            main() + 0x6c  <t.c:11>
                     8:        area1[i] = 0;     // Array Out of Bounds
                     9:   
                    10:      free(area1);
                    11:=>    area1[0] = 0;       // Freed Memory Access
                    12:   
                    13:      char *area3 = malloc(sizeof(char)*64);
                    14:      if ((void *)area1 == (void *)area3)
            _start() + 0x108
        was allocated at (64 bytes):
            main() + 0x8  <t.c:4>
                    1:    #include <stdlib.h>
                    2:    #include <stdio.h>
                    3:    int main() {
                    4:=>    int *area1 = malloc(sizeof(int)*16);
                    5:      int *area2 = malloc(sizeof(int)*100);
                    6:   
                    7:      for (int i = 0; i <= 16; i++)
            _start() + 0x108
        freed at:
            main() + 0x5c  <t.c:10>
                     7:      for (int i = 0; i <= 16; i++)
                     8:        area1[i] = 0;     // Array Out of Bounds
                     9:   
                    10:=>    free(area1);
                    11:      area1[0] = 0;       // Freed Memory Access
                    12:   
                    13:      char *area3 = malloc(sizeof(char)*64);
            _start() + 0x108

    Listing 6.

     

    There is a stale pointer access at line 16 of Listing 3. The memory pointed to by area1 was freed at line 10, but the memory was reused for area3 at line 13. As shown in Listing 7, discover reports this as a write to freed memory—even though the memory has been repurposed. This is an example of the kind of error that it is very hard for a software-only solution to detect.

     

    ERROR 3 (FMW): writing to freed memory at address 0x200000021047e000:
            main() + 0xb4  <t.c:16>
                    13:      char *area3 = malloc(sizeof(char)*64);
                    14:      if ((void *)area1 == (void *)area3)
                    15:       printf("New area3 is same as old area1\n");
                    16:=>    area1[0] = 0;       // Stale pointer access
                    17:   
                    18:      free(area3);
                    19:      free(area3);
            _start() + 0x108
        was allocated at (64 bytes):
            main() + 0x8  <t.c:4>
                    1:    #include <stdlib.h>
                    2:    #include <stdio.h>
                    3:    int main() {
                    4:=>    int *area1 = malloc(sizeof(int)*16);
                    5:      int *area2 = malloc(sizeof(int)*100);
                    6:   
                    7:      for (int i = 0; i <= 16; i++)
            _start() + 0x108
        freed at:
            main() + 0x5c  <t.c:10>
                     7:      for (int i = 0; i <= 16; i++)
                     8:        area1[i] = 0;     // Array Out of Bounds
                     9:   
                    10:=>    free(area1);
                    11:      area1[0] = 0;       // Freed Memory Access
                    12:   
                    13:      char *area3 = malloc(sizeof(char)*64);
            _start() + 0x108

    Listing 7.

     

    The final error reported by discover in Listing 8 is the double freeing of area3 at lines 18 and 19 of Listing 3.

     

    ERROR 4 (DFM): double freeing memory at address 0x300000021047e000:
            main() + 0xc8  <t.c:19>
                    16:      area1[0] = 0;       // Stale pointer access
                    17:   
                    18:      free(area3);
                    19:=>    free(area3);
                    20:   
                    21:      return 0;
                    22:    }
            _start() + 0x108
        was allocated at (64 bytes):
            main() + 0x74  <t.c:13>
                    10:      free(area1);
                    11:      area1[0] = 0;       // Freed Memory Access
                    12:   
                    13:=>    char *area3 = malloc(sizeof(char)*64);
                    14:      if ((void *)area1 == (void *)area3)
                    15:       printf("New area3 is same as old area1\n");
                    16:      area1[0] = 0;       // Stale pointer access
            _start() + 0x108
        freed at:
            main() + 0xbc  <t.c:18>
                    15:       printf("New area3 is same as old area1\n");
                    16:      area1[0] = 0;       // Stale pointer access
                    17:   
                    18:=>    free(area3);
                    19:      free(area3);
                    20:   
                    21:      return 0;
            _start() + 0x108

    DISCOVER SUMMARY:
            unique errors   : 4 (4 total)

    Listing 8.
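
    The addresses in the discover reports can be read in terms of the coloring scheme. The sketch below assumes that the leading hexadecimal digit of each reported address is the four-bit SSM color; that reading is consistent with the description earlier in this article, but it is our interpretation rather than something the tool's output states. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Top four bits of a reported address, read as the pointer's color. */
unsigned reported_color(uint64_t addr)
{
    return (unsigned)(addr >> 60);
}

/* The reported address with the top four bits cleared. */
uint64_t reported_location(uint64_t addr)
{
    return addr & 0x0FFFFFFFFFFFFFFFULL;
}

/* Byte distance from the start of a buffer to a reported address,
 * ignoring the color bits of both values. */
uint64_t offset_in_buffer(uint64_t addr, uint64_t buffer_start)
{
    assert(reported_location(addr) >= reported_location(buffer_start));
    return reported_location(addr) - reported_location(buffer_start);
}
```

    Under that assumption, the address in ERROR 1 (0x200000021047e040) has color 2 and lies 64 bytes past the start of area1 (0x200000021047e000), which is where area1[16] falls if an int is 4 bytes. The address in ERROR 4 (0x300000021047e000) names the same location with color 3, which is what we would expect if the region received a new color when it was reused for area3.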

     

    Robust Checking from Smart Algorithms and Probability

     

    SSM uses only a subset of the bits in the pointer, and it still provides very robust error detection. Using a single bit, there is a 50 percent chance of a pointer matching a region of memory when it shouldn't, so half of such errors would be caught. If we were to use two bits, we would have a 75 percent chance of catching an error. With three bits, we would have an 87.5 percent chance, and so on.
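
    The progression above follows from a simple formula: if colors are assigned at random and all 2^n values of an n-bit color are usable (an assumption made here for illustration), a wrong pointer matches by accident with probability 1/2^n, so the chance of catching the error is 1 - 1/2^n. A small sketch, with a function name of our own choosing:

```c
#include <assert.h>

/* Chance of catching a mismatch when colors are assigned at random and
 * every one of the 2^bits color values is in use: 1 - 1/2^bits. */
double detection_probability(unsigned bits)
{
    assert(bits < 64);
    return 1.0 - 1.0 / (double)(1ULL << bits);
}
```

    This gives 0.5, 0.75, and 0.875 for one, two, and three bits, and 0.9375 for four bits.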

     

    However, that is true only if the colors were assigned randomly. The memory allocation routines give different colors to adjacent areas. Therefore, a buffer overflow into the neighboring area is detected 100 percent of the time. The security vulnerabilities caused by buffer overflows, such as those exploited by Heartbleed and Venom, are stopped every time. Freed-memory access is also caught reliably.
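
    One simple way to guarantee that neighbors differ is to hand out colors in rotation. The sketch below is only an illustration of that idea; the round-robin policy and the names color_for_block and neighbors_all_differ are assumptions, not a description of the actual memory allocation routines.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_COLORS 16u

/* Color for the block at position index in a row of adjacent blocks.
 * Round-robin assignment: consecutive positions never share a color. */
unsigned color_for_block(size_t index)
{
    return (unsigned)(index % NUM_COLORS);
}

/* Returns 1 if every block in [0, count) has a different color from
 * the block immediately after it, and 0 otherwise. */
int neighbors_all_differ(size_t count)
{
    for (size_t i = 0; i + 1 < count; i++) {
        if (color_for_block(i) == color_for_block(i + 1))
            return 0;
    }
    return 1;
}
```

    Under this toy policy, an overflow that runs off the end of one block into the next always meets a different color. Colors repeat only between blocks that are 16 positions apart, so an access would have to skip over many blocks to land on a matching color by accident.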

     

    What's more, even stale-pointer access (a freed-memory access to an area that has been subsequently allocated for another purpose), which no software tool to date detects, is also caught nearly 100 percent of the time. That is because in practice, a stale-pointer access happens very soon after a reuse of an area, and the area will have been assigned a new color upon its reuse. The area does not return to the original color until the memory management routines cycle through many allocations and frees of the same area.

     

    Conclusion

     

    The hardware support provided by the Silicon Secured Memory feature of Oracle's SPARC M7 processor changes the game for application correctness. This hardware support means that applications run at nearly full speed; consequently the correctness of an application can be checked over full and extensive test suites. In this way, we can be nearly certain that all parts of the code have been exercised and tested.

     

    The hardware support for memory error checking extends beyond what can typically be achieved with software instrumentation. So the SPARC M7 processor provides a very powerful combination: testing at hardware speed and detection of a wide range of memory access errors.

     

    About the Author

     

    Raj Prakash is a senior software architect at Oracle. Currently he is the technical lead for several code analysis tools and global optimizers. His expertise is in developing tools designed to improve application security, performance, and scalability. He also writes a blog.

     

     

    Revision 1.0, 03/07/2016

     
