mmap is a POSIX system call, available on Linux through a C library wrapper function declared in <sys/mman.h>, which is used to create a mapping in the virtual address space (VAS) of the calling (current) process.
-
The mapping associates a region of the VAS with, optionally, a file/device identified by a file descriptor. A mapping with no backing file is called an anonymous mapping.
-
The VAS of a process is the set of addresses that the operating system (OS) assigns to the process for its use. Addresses in the VAS do not have a one-to-one mapping with addresses in the physical address space of the computer; the latter is determined by the physical random access memory (RAM) available in the computer.
-
The contents of the file/device are mapped byte-for-byte onto a region of the process’s VAS. Such a file is referred to as a ‘memory-mapped’ file. The contents of a memory-mapped file can be accessed by dereferencing addresses within the mapped region.
Here’s a small C example which uses mmap to map the contents of a file message.txt into a read-only, private region of the process’s VAS. The mapping can be observed using pmap <pid>.
After compiling and running the executable, we use ps to get the process identifier (PID) of the process and pass it to pmap <pid>:
> ps aux | grep -i ex_mmap
shubham 1025 0.0 0.0 6336 1940 pts/5 D+ 11:07 0:00 grep -i ex_mmap
shubham 32531 0.0 0.0 2468 920 pts/4 S+ 11:03 0:00 ./ex_mmap
> pmap 32531
32531: ./ex_mmap
00005581bd62e000 4K r---- ex_mmap
00005581bd62f000 4K r-x-- ex_mmap
00005581bd630000 4K r---- ex_mmap
00005581bd631000 4K r---- ex_mmap
00005581bd632000 4K rw--- ex_mmap
00005581e444a000 132K rw--- [ anon ]
00007fa13a20c000 12K rw--- [ anon ]
00007fa13a20f000 152K r---- libc.so.6
00007fa13a235000 1364K r-x-- libc.so.6
00007fa13a38a000 332K r---- libc.so.6
00007fa13a3dd000 16K r---- libc.so.6
00007fa13a3e1000 8K rw--- libc.so.6
00007fa13a3e3000 52K rw--- [ anon ]
00007fa13a3f6000 4K r---- message.txt
00007fa13a3f7000 8K rw--- [ anon ]
00007fa13a3f9000 4K r---- ld-linux-x86-64.so.2
00007fa13a3fa000 148K r-x-- ld-linux-x86-64.so.2
00007fa13a41f000 40K r---- ld-linux-x86-64.so.2
00007fa13a429000 8K r---- ld-linux-x86-64.so.2
00007fa13a42b000 8K rw--- ld-linux-x86-64.so.2
00007ffeb810f000 136K rw--- [ stack ]
00007ffeb81be000 16K r---- [ anon ]
00007ffeb81c2000 8K r-x-- [ anon ]
total 2468K
The mapping named message.txt is visible alongside libc and ld-linux-x86-64, which are shared libraries whose functions are referenced by our program.
INFO
Why is the code of ld-linux-x86-64 (the dynamic linker) mapped into the process’s VAS?
ld-linux-x86-64 is a dynamic linker which loads shared library dependencies at runtime from persistent storage into the main memory of the computer. It runs inside the process it is linking, so its own code and data must be mapped into that process’s VAS. It not only links the target executable to the shared libraries but also places machine code functions at specific address points in memory that the target executable knows about at link time. When an executable wishes to interact with the dynamic linker, it simply executes the machine-specific call or jump instruction to one of those well-known address points.
Why do we need to mmap files?
-
Reading/writing files through fopen streams with fread and fwrite (which internally use the read and write system calls) is buffered at multiple stages between the current program and the storage device. Buffers exist in both kernel-space and user-space code, so the file's data is copied more than once. A memory-mapped file gives the process direct access to the page-cache (file-cache) of the operating system, which avoids the extra copies and significantly reduces the sys-call overhead.
-
The pages belonging to the mapped region are also loaded lazily, i.e. only when the process first accesses them (demand paging).
-
On the other hand, the page-cache may not contain every requested page, and accessing a page that is not yet resident results in a page-fault. For larger files that are accessed frequently and randomly (not sequentially), the number of page-faults can increase, hurting the performance of the operation.