Detecting heap memory pitfalls

A step-by-step look at using custom taint analysis to detect heap security issues

Date: 10/12/2022

The security motivation

Auditing heap memory statically is an important part of ensuring the security and stability of a program or system. The heap is the portion of a process's memory used to store data and objects that are allocated dynamically at runtime. In a static context, auditing the heap means analyzing the allocation and deallocation paths in the source code, at or before compile time, rather than observing the program while it runs. This helps to identify and address potential vulnerabilities before the program is deployed, such as memory leaks, use-after-free, and double free, all of which can impact security and stability. Regular static audits of heap usage can also help to improve performance and efficiency by ensuring that memory is used and managed well.

Introduction

The heap is a memory region available to every program. Unlike the stack, whose memory is managed automatically as functions are called and return, heap memory is allocated dynamically: the program can allocate and release memory from the heap segment whenever required.

On Unix-like systems, we use functions such as malloc(), calloc(), realloc(), and free() to work with heap memory; in C++, the new and delete operators play the same roles. These functions provide a layer between the developer and the operating system that manages heap memory efficiently. It is the developer's responsibility to release any allocated memory exactly once after using it. Internally, these functions use system calls such as brk()/sbrk(), mmap(), and munmap() to request heap memory from the operating system and to return it.

The malloc function is a C standard library function used to allocate memory dynamically at runtime. It takes a single argument: the number of bytes to allocate. The function returns a pointer to the newly allocated memory, which can be used to store data or objects. malloc searches the heap for a contiguous block of free memory that is large enough to satisfy the request. If it finds a suitable block, it reserves that block for the caller and returns a pointer to it. If it cannot find one, it returns a null pointer to indicate that the request could not be satisfied. It is crucial to manage memory allocated with malloc properly and to deallocate it, to prevent memory leaks and other issues.
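To make the allocate, check, grow, and release cycle concrete, here is a minimal sketch in C. The function name make_squares() and its parameters are hypothetical, chosen only for illustration; the calls to malloc(), realloc(), and free() are the standard library ones described above.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical example: builds an array of `count` squares on the heap,
 * then grows it to `new_count` entries. Returns NULL on invalid input or
 * allocation failure. The caller owns the result and must free() it
 * exactly once. */
int *make_squares(size_t count, size_t new_count)
{
    /* Reject sizes whose byte count would overflow size_t. */
    if (count == 0 || new_count < count || new_count > SIZE_MAX / sizeof(int))
        return NULL;

    int *values = malloc(count * sizeof *values);
    if (values == NULL)              /* always check the return value */
        return NULL;

    for (size_t i = 0; i < count; i++)
        values[i] = (int)(i * i);

    /* Keep the old pointer until realloc() succeeds: on failure it
     * returns NULL and the original block is still ours to free. */
    int *grown = realloc(values, new_count * sizeof *grown);
    if (grown == NULL) {
        free(values);
        return NULL;
    }

    for (size_t i = count; i < new_count; i++)
        grown[i] = (int)(i * i);

    return grown;
}
```

A caller would write something like `int *v = make_squares(3, 5);`, check `v` for NULL, use it, and finish with a single `free(v);`.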

Sometimes malloc uses mmap to create a private anonymous mapping, for example, when it needs a large amount of space. The primary purpose of a private anonymous mapping is to allocate new zero-filled memory for the exclusive use of the calling process. Memory mapping is also a convenient and high-performance way to do file I/O, which is why it is used for loading dynamic libraries, but an anonymous mapping does not correspond to any file and is used for program data instead. On Linux, if we request a large block of memory via malloc(), the C library creates such an anonymous mapping instead of using the main heap. 'Large' means larger than MMAP_THRESHOLD bytes (128 KiB by default in glibc); we can adjust that threshold via mallopt().
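As a rough illustration of the two mechanisms just described, the sketch below creates a private anonymous mapping directly and adjusts the malloc threshold with mallopt(). It assumes Linux with glibc, since mallopt() and M_MMAP_THRESHOLD are glibc-specific; the wrapper names map_anonymous() and set_mmap_threshold() are hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE      /* exposes MAP_ANONYMOUS in strict C modes */
#endif
#include <malloc.h>          /* glibc: mallopt(), M_MMAP_THRESHOLD */
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical wrapper: creates a private, zero-filled anonymous mapping
 * of `size` bytes. Returns NULL on failure. Release it with munmap(). */
void *map_anonymous(size_t size)
{
    void *addr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return addr == MAP_FAILED ? NULL : addr;
}

/* Hypothetical wrapper: asks glibc malloc to serve requests of at least
 * `bytes` bytes through mmap(). mallopt() returns 1 on success and 0 on
 * error, so this maps that to the usual 0 / -1 convention. */
int set_mmap_threshold(int bytes)
{
    return mallopt(M_MMAP_THRESHOLD, bytes) == 1 ? 0 : -1;
}
```

Every successful map_anonymous() call should be paired with one munmap() of the same address and length, in the same way every malloc() is paired with one free().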

A good example of extensive use of dynamic allocation is my repository Fortress of Solitude, which has an extensive list of algorithms implemented with pointers and dynamic allocation. The essential point is that every case uses the Valgrind tool to audit the heap in a dynamic context, as in the following log:

File: https://github.com/CoolerVoid/Fortress-of-Solitude/blob/main/ice_doubly_linked_list/heap_test/test.log

==130422== Memcheck, a memory error detector
==130422== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==130422== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==130422== Command: bin/test_doubly_linked
==130422== 

---Ice doubly linked list DEBUG-START ---

 Mon Oct 12 16:00:19 2020 lib/ice_doubly_linked.c[120] fifo_dl_list_dbg(): 
 

---Ice doubly linked list DEBUG-END ---
0 2 4 6 8 10  

---Ice doubly linked list DEBUG-START ---

 Mon Oct 12 16:00:19 2020 lib/ice_doubly_linked.c[137] lifo_dl_list_dbg(): 
 

---Ice doubly linked list DEBUG-END ---
10 8 6 4 2 0  
Delete position 8
Label: hulk
 var_name: green
Label: spider
 var_name: red
Label: daredevil
 var_name: red
Label: wolverine
 var_name: yellow
Label: deadpool
 var_name: dark red
----------------------------
Label: deadpool
 var_name: dark red
Label: wolverine
 var_name: yellow
Label: daredevil
 var_name: red
Label: spider
 var_name: red
Label: hulk
 var_name: green
==130422== 
==130422== HEAP SUMMARY:
==130422==     in use at exit: 0 bytes in 0 blocks
==130422==   total heap usage: 33 allocs, 33 frees, 7,044 bytes allocated
==130422== 
==130422== All heap blocks were freed -- no leaks are possible
==130422== 
==130422== For counts of detected and suppressed errors, rerun with: -v
==130422== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

The critical point here is to look for the message "no leaks are possible" when running the program under Valgrind. If memory leaks exist, the programmer needs to investigate each sink or pitfall and release the memory in each scenario until the tests report "no leaks are possible".

When a memory leak occurs, memory that is no longer being used by a program is not properly deallocated, which can cause the program to use more and more memory over time. This can lead to the program crashing or becoming unresponsive. To protect against memory leaks, we must use strong coding practices to manage and deallocate memory properly, and regularly monitor and test our programs for leaks.

A word of advice at the end of this part: this approach is not a silver bullet for all security issues in heap usage.
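The "allocs" and "frees" balance that Valgrind reports can be illustrated with a small sketch. The counting wrappers below (counted_malloc(), counted_free(), outstanding_blocks()) are hypothetical helpers written for this example, not part of Valgrind; they only show why an early return turns into a leak and how a single exit path avoids it.

```c
#include <stdlib.h>
#include <string.h>

static size_t alloc_count;   /* successful allocations so far */
static size_t free_count;    /* non-NULL frees so far */

void *counted_malloc(size_t size)
{
    void *ptr = malloc(size);
    if (ptr != NULL)
        alloc_count++;
    return ptr;
}

void counted_free(void *ptr)
{
    if (ptr != NULL)
        free_count++;
    free(ptr);
}

/* Blocks still allocated; a leak-free run ends with 0. */
size_t outstanding_blocks(void)
{
    return alloc_count - free_count;
}

/* Leaky version: the early return for an empty string skips the free. */
int first_char_leaky(const char *input)
{
    char *copy = counted_malloc(strlen(input) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, input);
    if (copy[0] == '\0')
        return -1;               /* BUG: copy is never released here */
    int first = (unsigned char)copy[0];
    counted_free(copy);
    return first;
}

/* Fixed version: every path reaches the single counted_free() call. */
int first_char_fixed(const char *input)
{
    int first = -1;
    char *copy = counted_malloc(strlen(input) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, input);
    if (copy[0] != '\0')
        first = (unsigned char)copy[0];
    counted_free(copy);
    return first;
}
```

Calling first_char_leaky("") leaves one block outstanding, which is the kind of imbalance Valgrind would flag in its heap summary; first_char_fixed("") leaves none.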

Rules to detect Heap security pitfalls

Looking at the literature, for example the CERT C standard by Robert Seacord or the book TAOSSA by Mark Dowd, we can find an extensive list of good security practices for programming that affect heap memory usage, such as the following:

  • Never access freed memory. Use-after-free is a type of vulnerability that occurs when a program continues to use memory after it has been deallocated or freed. This can happen when a programmer manages memory improperly, for example, by trying to access or manipulate memory that has already been freed. Attackers can exploit use-after-free vulnerabilities to gain access to sensitive information or execute arbitrary code on a victim's system.

  • Always use the Valgrind tool to help detect memory pitfalls.

  • When using mmap() function, always use munmap() at the end of the task to liberate resources. In general, it is good practice to properly manage and deallocate memory that is no longer needed, to ensure the security and stability of our program or system.

  • After every free, re-assign each pointer pointing to the recently freed memory to NULL.

  • Use only the amount of memory requested from malloc(). Make sure not to read or write past either boundary of the block.

  • Always check the return value of malloc for NULL. My suggestion is to create a function that allocates and validates in one step, like the xmalloc() helper present in the Linux kernel repository.

  • Free only memory that was dynamically allocated, and free it exactly once.

  • Always release allocated storage in error handlers.

  • Zero out sensitive data before freeing it, using a function such as explicit_memset(), memset_s(), or SecureZeroMemory(), or a similar method that the compiler cannot optimise away.

  • Do not make any assumptions about the position or ordering of the addresses returned by malloc().

  • Take care when releasing memory with free() (or delete in C++) inside loops. A double free vulnerability occurs when a piece of memory is freed more than once, which can lead to memory corruption and potentially allow an attacker to execute arbitrary code or gain access to sensitive information. This can happen when a programmer manages memory improperly, for example, by freeing the same piece of memory multiple times.

  • Use a proper SAST tool to help detect possible anomalies. Interesting tools include Splint, Cppcheck, and Flawfinder. For Linux kernel module development, good choices are Coccinelle, Sparse, and Smatch. Another option is a tool that checks conformance to a coding standard; for example, the MISRA C coding standard was initially written for the automotive embedded software industry. Each scenario needs a proper SAST tool.

  • Use sanitizers such as AddressSanitizer (ASan), which is available in Clang and GCC, and MemorySanitizer, which is available in Clang.

  • When working with POSIX threads and async resources, pay attention to mutex locks and every other shared resource, and use atomicity properly to prevent race conditions like TOCTOU vulnerabilities. The Helgrind tool of Valgrind can help with that point, as can ThreadSanitizer. Race conditions are a type of vulnerability that can arise when two or more threads of execution access and manipulate shared data concurrently. If the threads are not synchronized properly, this can lead to unpredictable and potentially harmful behaviour, such as data corruption. To prevent race conditions, it is essential to implement proper synchronization mechanisms, such as locks or atomic operations, so that shared data is accessed and manipulated in a predictable and controlled manner.
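Several of the rules above can be sketched together in a few lines of C. This is only an illustration under stated assumptions: the xmalloc() here is a hypothetical abort-on-failure wrapper written for this example (not the kernel's code), FREE_AND_NULL and secure_zero() are hypothetical names, and the volatile-pointer loop is a common portable fallback for platforms that lack explicit_memset() or memset_s().

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Allocate or abort, so callers never have to handle a NULL result. */
void *xmalloc(size_t size)
{
    void *ptr = malloc(size != 0 ? size : 1);
    if (ptr == NULL) {
        fprintf(stderr, "xmalloc: failed to allocate %zu bytes\n", size);
        abort();
    }
    return ptr;
}

/* Free and reset the pointer: a repeated call becomes free(NULL), which
 * is a no-op, instead of a double free. */
#define FREE_AND_NULL(ptr) do { free(ptr); (ptr) = NULL; } while (0)

/* Wipes a buffer through a volatile pointer so the stores are not
 * treated as dead writes before free(). */
void secure_zero(void *buf, size_t len)
{
    volatile unsigned char *p = buf;
    while (len-- > 0)
        *p++ = 0;
}

/* Compares two secrets using private heap copies. Returns 1 if equal,
 * 0 if different, -1 on allocation failure. Every path goes through the
 * cleanup label, so storage is wiped and released in the error handler
 * as well. */
int secrets_match(const char *a, const char *b)
{
    int result = -1;
    size_t a_size = strlen(a) + 1;
    size_t b_size = strlen(b) + 1;
    char *a_copy = NULL;
    char *b_copy = NULL;

    a_copy = malloc(a_size);
    if (a_copy == NULL)
        goto cleanup;
    b_copy = malloc(b_size);
    if (b_copy == NULL)
        goto cleanup;            /* a_copy is still released below */

    memcpy(a_copy, a, a_size);
    memcpy(b_copy, b, b_size);
    result = (a_size == b_size && memcmp(a_copy, b_copy, a_size) == 0);

cleanup:
    if (a_copy != NULL)
        secure_zero(a_copy, a_size);
    if (b_copy != NULL)
        secure_zero(b_copy, b_size);
    FREE_AND_NULL(a_copy);
    FREE_AND_NULL(b_copy);
    return result;
}
```

The single cleanup label is a design choice: it keeps one release point per resource, which makes "free exactly once" easy to check by eye and by tools.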

Extra mitigations

Resources for hardening heap memory usage:

Preloading one of the following hardened allocators with LD_PRELOAD is a possible solution:

github.com/GrapheneOS/hardened_malloc
CMD:  LD_PRELOAD=/opt/hardened_malloc/out/libhardened_malloc.so bin/go_server_bin

github.com/microsoft/snmalloc
CMD:  LD_PRELOAD=project/libsnmalloc-checks.so ./my_app

llvm.org/docs/ScudoHardenedAllocator.html#library
CMD:  LD_PRELOAD=project/libscudo.so ./my_app

github.com/emeryberger/DieHard
Build the shared library with make. You can either link in the resulting shared object (libdiehard.so), or use DieHard by setting the LD_PRELOAD environment variable, as in:
CMD: setenv LD_PRELOAD /path/to/diehard/libdiehard.so
To use the replicated version, invoke your program with (for example):
CMD: diehard 3 /path/to/libdiehard_r.so yourapp

A word of advice at the end of this part: these mitigations are not a definitive silver bullet for all security issues.

Using my tool heap detective to detect security pitfalls

This tool uses the taint analysis technique for static analysis and aims to identify heap memory usage vulnerabilities in C and C++ code. In the first phase of static analysis, the tool follows a common approach and uses tokenization to collect information.

The second phase departs from the usual lessons of the legendary Dragon Book. The tool does not use an AST or resources like LLVM, as parsers and standard practice would suggest. Instead, the approach aims to study other ways to detect vulnerabilities, using custom vector structures and a typical recursive traversal that ranks each taint point. The result of combining these techniques is Heap_detective.
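To give a feel for AST-free detection, here is a deliberately small sketch of a token-based linear scan that tracks an allocation state per variable. It is not Heap_detective's implementation; all names (analyze(), struct report, and so on) are hypothetical, and the scan assumes straight-line code: it ignores branches, aliases, comments, string literals, and every allocator other than malloc().

```c
#include <ctype.h>
#include <string.h>

#define MAX_TOKENS 512
#define MAX_VARS 32
#define MAX_NAME 32

enum state { UNTRACKED, ALLOCATED, FREED };
struct var { char name[MAX_NAME]; enum state st; };
struct report { int double_free, use_after_free, leaks; };

/* Splits src into identifiers and single-character tokens. */
static int tokenize(const char *src, char tok[][MAX_NAME])
{
    int n = 0;
    while (*src != '\0' && n < MAX_TOKENS) {
        if (isspace((unsigned char)*src)) { src++; continue; }
        size_t len = 0;
        if (isalpha((unsigned char)*src) || *src == '_') {
            while (isalnum((unsigned char)*src) || *src == '_') {
                if (len < MAX_NAME - 1) tok[n][len++] = *src;
                src++;
            }
        } else {
            tok[n][len++] = *src++;
        }
        tok[n++][len] = '\0';
    }
    return n;
}

static int is_ident(const char *t)
{
    return isalpha((unsigned char)t[0]) || t[0] == '_';
}

/* True when tok[i] is a plain '=' and not the first half of '=='. */
static int is_assign(char tok[][MAX_NAME], int n, int i)
{
    return i < n && strcmp(tok[i], "=") == 0 &&
           (i + 1 >= n || strcmp(tok[i + 1], "=") != 0);
}

static struct var *lookup(struct var *vars, int *count, const char *name, int create)
{
    for (int i = 0; i < *count; i++)
        if (strcmp(vars[i].name, name) == 0) return &vars[i];
    if (!create || *count == MAX_VARS) return NULL;
    strcpy(vars[*count].name, name);    /* tokens are shorter than MAX_NAME */
    vars[*count].st = UNTRACKED;
    return &vars[(*count)++];
}

/* Scans tokens left to right: malloc() on the right of an assignment is
 * the taint source, free(x) and later uses of x are the sinks. */
struct report analyze(const char *src)
{
    char tok[MAX_TOKENS][MAX_NAME];
    struct var vars[MAX_VARS];
    struct report r = {0, 0, 0};
    int nvars = 0;
    const char *lhs = NULL;             /* assignment target of this statement */
    int n = tokenize(src, tok);

    for (int i = 0; i < n; i++) {
        if (strcmp(tok[i], ";") == 0) { lhs = NULL; continue; }
        if (i > 0 && is_ident(tok[i - 1]) && is_assign(tok, n, i)) {
            struct var *old = lookup(vars, &nvars, tok[i - 1], 0);
            if (old != NULL) {
                if (old->st == ALLOCATED) r.leaks++;   /* overwritten while live */
                old->st = UNTRACKED;
            }
            lhs = tok[i - 1];
            continue;
        }
        if (strcmp(tok[i], "malloc") == 0 && lhs != NULL) {
            struct var *v = lookup(vars, &nvars, lhs, 1);
            if (v != NULL) v->st = ALLOCATED;
            continue;
        }
        struct var *v = lookup(vars, &nvars, tok[i], 0);
        if (v == NULL || v->st == UNTRACKED) continue;
        int freed_here = i >= 2 && strcmp(tok[i - 1], "(") == 0 &&
                         strcmp(tok[i - 2], "free") == 0;
        if (freed_here) {
            if (v->st == FREED) r.double_free++;
            v->st = FREED;
        } else if (v->st == FREED && !is_assign(tok, n, i + 1)) {
            r.use_after_free++;
        }
    }
    for (int i = 0; i < nvars; i++)
        if (vars[i].st == ALLOCATED) r.leaks++;        /* never freed */
    return r;
}
```

For example, `analyze("char *a = malloc(8); free(a); free(a);")` reports one double free, and replacing the second free(a) with `a[0] = 1;` reports one use after free. A real tool needs far more than this, which is where the route lists and ranking described above come in.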

Features

  • C and C++ tokenizer

  • List of heap static routes for each source with taint points for analysis

  • Analyser to detect double-free vulnerability

  • Analyser to detect use after free vulnerability

  • Analyser to detect memory leak

To test, read the directory samplers to understand the context. To build and run, do the following:

$ git clone https://github.com/CoolerVoid/heap_detective

$ cd heap_detective

$ make
# to run
$ bin/heap_detective samplers/
Note: do not try "$ cd bin; ./heap_detective". The first argument is the directory for recursive analysis, here "samplers".

$ gcc --version
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

The first command-line argument is the directory to analyse recursively. The directory "samplers" contains examples of bad practices that we can study.


Future features

  • Analyser to detect off-by-one vulnerability

  • Analyser to detect wild pointer

  • Analyser to detect heap overflow vulnerability

https://github.com/CoolerVoid/heap_detective

References

The following is a technical compendium of Phrack zine articles that cover heap topics in depth. It is listed for the sake of understanding the security risks in heap scenarios.

Books

  • The CERT® C Coding Standard, Second Edition: 98 Rules for Developing Safe, Reliable, and Secure Systems (SEI Series in Software Engineering) by Robert Seacord

  • Effective C: An Introduction to Professional C Programming by Robert Seacord

  • The Art of Software Security Assessment: Identifying and Preventing Software Vulnerabilities (Volume 1 of 2), 1st Edition, by Mark Dowd, John McDonald, and Justin Schuh

Extra stuff

So all right, thank you for reading my post.

Cheers!
