Attacker Value: High (3 users assessed)
Exploitability: High (3 users assessed)
User Interaction: None
Privileges Required: Low
Attack Vector: Local

CVE-2021-33909

Disclosure Date: July 20, 2021

Description

fs/seq_file.c in the Linux kernel 3.16 through 5.13.x before 5.13.4 does not properly restrict seq buffer allocations, leading to an integer overflow, an Out-of-bounds Write, and escalation to root by an unprivileged user, aka CID-8cae8cd89f05.

Ratings
  • Attacker Value
    High
  • Exploitability
    High
Technical Analysis

An unprivileged local attacker can exploit this vulnerability by creating, mounting, and deleting a deep directory structure whose total path length exceeds 1GB, and thereby escalate privileges to root.

https://www.helpnetsecurity.com/2021/07/20/cve-2021-33909/

Ratings
  • Attacker Value
    High
  • Exploitability
    Medium
Technical Analysis

CVE-2021-33909 has high attacker value because it is a root privilege escalation in core functionality of the Linux kernel itself. Exploitability is a little lower, since it involves kernel memory corruption with particular requirements, but Qualys has reported successful exploitation of several Linux distributions and versions, noting that distributions they haven’t tested may be equally exploitable out of the box. Mitigations do exist but do not fix the root cause. You’ll want to patch this one before a full exploit drops. (A crash PoC has already been released.)

Technical Analysis

TLDR Version

  • size_t to int conversion vulnerability leading to an integer overflow in the Linux kernel’s filesystem layer.
  • Exploitable by creating, mounting, and deleting a deep directory structure whose total path length exceeds 1 GB.
  • Exploiting it allows an attacker to write the 10-byte string //deleted (9 characters plus the NULL terminator) to an offset of exactly -2GB-10B below the beginning of a vmalloc() allocated kernel buffer.
  • Qualys used this uncontrolled OOB write to obtain full root privileges on default installations of Ubuntu 20.04, Ubuntu 20.10, Ubuntu 21.04, Debian 11, and Fedora 34 Workstation. Other Linux distributions are vulnerable and likely exploitable.
  • The exploit requires about 5GB of memory and 1M inodes. According to their blog, Qualys will publish it sometime in the near future.
  • The vulnerability was introduced in July 2014 (Linux 3.16) by commit 058504ed (“fs/seq_file: fallback to vmalloc allocation”) and was fixed in Linux kernel 5.13.4 by https://github.com/torvalds/linux/commit/8cae8cd89f05f6de223d63e6d15e31c8ba9cf53b.

Preliminary Warning :)

This commentary may get a bit technical, as that is my preferred style of writing. If you want the nutshell version, take a look at @NinjaOperator’s or @wvu-r7’s reviews, or at the TLDR section above if you just need the pertinent details and aren’t interested in a deeper dive into this bug (you’ll miss out on some good info though :D). Alright, ready? Let’s dive into this bug.

Exploit Background and Some History

So according to Qualys, this bug was first introduced in July 2014 (Linux 3.16) by commit 058504ed (“fs/seq_file: fallback to vmalloc allocation”), which can be found at https://gitlab.raptorengineering.com/meklort/talos-obmc-linux/-/commit/058504edd02667eef8fac9be27ab3ea74332e9b4. This was when the original kmalloc(m->size <<= 1, GFP_KERNEL); call was switched to a seq_buf_alloc(m->size <<= 1); call.

This is interesting because when we look at the earlier source code for kmalloc in, say, version 3.15 of the Linux source, we find that the maximum size of memory that kmalloc can allocate is noted at https://elixir.bootlin.com/linux/v3.15/source/include/linux/slab.h#L455 as KMALLOC_MAX_CACHE_SIZE. This is defined at https://elixir.bootlin.com/linux/v3.15/source/include/linux/slab.h#L234 as #define KMALLOC_MAX_CACHE_SIZE (1UL << KMALLOC_SHIFT_HIGH), or the unsigned number 1 left shifted by KMALLOC_SHIFT_HIGH. KMALLOC_SHIFT_HIGH is defined in multiple ways depending on the backend allocator in use, but it is either capped at 25 (for the SLAB allocator), or set to PAGE_SHIFT (12 on x86/x64 systems) or PAGE_SHIFT+1 (13).

Or in other words, to make a long story short, the maximum size that kmalloc() may allocate is 32 MB, aka 1<<25. This is far below the largest value a signed 32 bit integer can hold. However, when the kernel changed to calling seq_buf_alloc(), it gained a fallback to vmalloc, as can be seen at https://elixir.bootlin.com/linux/v3.16/source/fs/seq_file.c#L41, which does not have this same limitation and can allocate far more memory. This means that size could technically be a number that is larger than what can be represented by a signed 32 bit integer.

This leads us into the actual code itself, which I’ll explain below.

The Vulnerable Code Explanation

The Linux kernel has a seq_file interface that produces virtual files that contain sequences of records. Each record must fit into a seq_file buffer, whose size is increased as needed: the existing allocation is freed, and a new seq_buf_alloc() call is made with the previous size bit shifted left by 1, effectively doubling the size allocated. We can see this if we take a look at https://elixir.bootlin.com/linux/v5.13.3/source/fs/seq_file.c#L242, though the relevant parts of the code are shown below:

 168 ssize_t seq_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 169 {
 170         struct seq_file *m = iocb->ki_filp->private_data;
 ...
 205         /* grab buffer if we didn't have one */
 206         if (!m->buf) {
 207                 m->buf = seq_buf_alloc(m->size = PAGE_SIZE);
 ...
 210         }
 ...
 220         // get a non-empty record in the buffer
 ...
 223         while (1) {
 ...
 227                 err = m->op->show(m, p);
 ...
 236                 if (!seq_has_overflowed(m)) // got it
 237                         goto Fill;
 238                 // need a bigger buffer
 ...
 240                 kvfree(m->buf);
 ...
 242                 m->buf = seq_buf_alloc(m->size <<= 1);
 ...
 246         }

Note that the m in the seq_read_iter() function is a seq_file structure corresponding to the virtual file that we are currently operating on. Anyway, now that we have allocated memory, the next question might be “well, if we can control m->size, couldn’t we do an overflow here?” Well, not really. As Qualys notes, either the allocation will fail or you will run out of memory long before you overflow m->size, since it is of type size_t as noted at https://elixir.bootlin.com/linux/v5.13.3/source/include/linux/seq_file.h#L18.

The problem, however, is that m->size is actually used in functions that expect an int value, aka a signed 32 bit integer, not size_t, a 64 bit unsigned integer. This leads us to the show() call at https://elixir.bootlin.com/linux/v5.13.3/source/fs/seq_file.c#L269, which according to Qualys ends up calling show_mountinfo().

You see, show_mountinfo() will end up calling seq_dentry(m, mnt->mnt_root, " \t\n\\"); as shown at https://elixir.bootlin.com/linux/v5.13.3/source/fs/proc_namespace.c#L150. Note that m here will be the seq_file object containing a buf pointer to the buffer we allocated earlier. seq_dentry() then gets the size of the allocated buffer that m->buf points to, aka the buffer we allocated earlier, and stores it in the local variable size, as can be seen at https://elixir.bootlin.com/linux/v5.13.3/source/fs/seq_file.c#L544. Note however that size is of type size_t, aka an unsigned 64 bit number.

And this is where things go really wrong, as dentry_path(dentry, buf, size); is then called, which leads to the code starting at https://elixir.bootlin.com/linux/v5.13.3/source/fs/d_path.c#L385. Note that dentry_path expects its size argument to be an int, aka a signed 32 bit integer, yet we passed it a size_t number. So if we allocated a buffer of 2GB or greater, aka 2147483648 bytes or more, this would exceed the limits of a signed 32 bit integer, as signed 32 bit numbers can only represent values in the range -2147483648 to 2147483647. In effect, a size of 2147483648 would be converted inside dentry_path to the value -2147483648. Woops!

This then leads to p = buf + buflen; where buflen is now -2147483648 (assuming that size was specified as 2GB, aka 2147483648), so the addition actually moves the pointer backwards. In effect, p will point to a memory location 2GB before the start of our vmalloc() allocated buffer.

We then end up calling prepend(&p, &buflen, "//deleted", 10), and we can see the code for prepend at https://elixir.bootlin.com/linux/v5.13.3/source/fs/d_path.c#L11, though the interesting part starts at https://elixir.bootlin.com/linux/v5.13.3/source/fs/d_path.c#L16. Here we can see that buffer, aka the pointer to the memory 2GB before the vmalloc() allocated buffer, has 10 subtracted from it, making it point an additional 10 bytes earlier in memory. Following this, the 10 bytes of //deleted, aka the //deleted string plus the null terminator, are written to memory at that location.

This effectively gives the attacker an OOB write in kernel memory, albeit one at a fixed offset (-2GB-10B) relative to the buffer. Now typically this wouldn’t lead to much; however, Qualys was able to use this OOB write to overwrite the instructions of an eBPF program after it has been validated by the kernel but before it has been JIT compiled, and used this to transform the uncontrolled OOB write into an information disclosure and then into a limited but controlled OOB write. They then used Manfred Paul’s btf and map_push_elem techniques from https://www.thezdi.com/blog/2020/4/8/cve-2020-8835-linux-kernel-privilege-escalation-via-improper-ebpf-program-verification to transform this limited controlled OOB write into a full arbitrary kernel read and write, and used that to set modprobe_path to their own executable, a technique that has been described in more detail than I can go into here in places like https://lkmidas.github.io/posts/20210223-linux-kernel-pwn-modprobe/, so that the kernel ends up running their executable as root.

Official Patch And Some Important Notes

https://github.com/torvalds/linux/commit/8cae8cd89f05f6de223d63e6d15e31c8ba9cf53b is the official patch for this issue, which fixes it by ensuring that seq_buf_alloc doesn’t allocate memory that is larger than MAX_RW_COUNT. Looking at where MAX_RW_COUNT is defined, we see https://elixir.bootlin.com/linux/v5.13.4/source/include/linux/fs.h#L2572, where it is defined as #define MAX_RW_COUNT (INT_MAX & PAGE_MASK). This rounds INT_MAX, the maximum value of a signed 32 bit integer, down to a page boundary. With the typical 4KB page size, that works out to 0x7ffff000, or INT_MAX minus 4095.

If the size is over this amount, the allocation will fail and we will never hit the vulnerable code. However, this doesn’t really solve the root issue per se. Assuming users can find another way to reach the same vulnerable code and abuse the fact that the kernel is still passing unsigned 64 bit integers to functions that expect signed 32 bit integers, it’s likely that someone could bypass this patch via alternative means. Whether or not this is possible remains to be seen, but in my opinion, whilst it would be more work, the appropriate solution would be to update the functions so they pass matching types to one another, whilst also taking care not to cast between signed and unsigned numbers without performing appropriate checks.

CVSS V3 Severity and Metrics
Base Score: 7.8 High
Impact Score: 5.9
Exploitability Score: 1.8
Vector: CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H
Attack Vector (AV): Local
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): High
Integrity (I): High
Availability (A): High

General Information

Vendors

  • debian
  • fedoraproject
  • linux
  • netapp
  • oracle
  • sonicwall

Products

  • communications session border controller 8.2
  • communications session border controller 8.3
  • communications session border controller 8.4
  • communications session border controller 9.0
  • debian linux 10.0
  • debian linux 9.0
  • fedora 34
  • hci management node -
  • linux kernel
  • sma1000 firmware
  • solidfire -
