11-17 10:27

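Redis 6 Source Code Series (1) - Memory Management: zmalloc (Part 2)

In Redis, memory management is handled by zmalloc, implemented in zmalloc.h and zmalloc.c: the header zmalloc.h contains the macro definitions and function declarations, and zmalloc.c holds the implementation.

zmalloc is essentially a thin wrapper over memory allocators such as jemalloc, tcmalloc and libc (ptmalloc2). It exposes a uniform set of memory management functions and hides the differences between the underlying allocators.

1. Function definitions

The header zmalloc.h declares Redis's main memory functions, covering allocation, release and usage statistics: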

// Allocate memory
void *zmalloc(size_t size);
void *zcalloc(size_t size);
void *zrealloc(void *ptr, size_t size);
// Free memory
void zfree(void *ptr);
// Get the total amount of memory currently allocated
size_t zmalloc_used_memory(void);
// Set the out-of-memory handler
void zmalloc_set_oom_handler(void (*oom_handler)(size_t));
// Other functions...

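2. Macros and global variables

zmalloc defines several macros and a global variable to keep track of state.

PREFIX_SIZE

The C standard library's malloc offers no portable way to ask how large an allocated block is. Allocators usually record the size internally, but only some of them expose a function to query it. When no such function is available, zmalloc requests a little extra space in front of each block and stores the requested size there, so that the size can be recovered later. PREFIX_SIZE is the size of this extra space.

Whether the extra space is needed depends on the allocator actually in use, and is signaled by the HAVE_MALLOC_SIZE macro.

tcmalloc, jemalloc and the malloc family on macOS all provide a function that returns the size of an allocated block, so Redis does not need the extra PREFIX_SIZE bytes and PREFIX_SIZE is 0. Otherwise, depending on the platform the Redis server is built for, a header of sizeof(long long) or sizeof(size_t) bytes stores the requested size.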

#ifdef HAVE_MALLOC_SIZE
#define PREFIX_SIZE (0)
#define ASSERT_NO_SIZE_OVERFLOW(sz)
#else
#if defined(__sun) || defined(__sparc) || defined(__sparc__)
#define PREFIX_SIZE (sizeof(long long))
#else
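// size_t is an unsigned integer type defined in <stddef.h>
// Its width depends on the platform: typically 32 bits on 32-bit systems and 64 bits on 64-bit systems
#define PREFIX_SIZE (sizeof(size_t))
#endif
// Assert that sz + PREFIX_SIZE does not wrap around, i.e. the sum is still greater than sz
#define ASSERT_NO_SIZE_OVERFLOW(sz) assert((sz) + PREFIX_SIZE > (sz))
#endif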

ASSERT_NO_SIZE_OVERFLOW

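The definitions above also include ASSERT_NO_SIZE_OVERFLOW, a macro (not a function) that validates the requested size: it asserts that size plus PREFIX_SIZE is greater than size itself. Because size_t arithmetic is unsigned and wraps around, the assertion fails exactly when the sum overflows.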
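The wrap-around check can be tried in isolation. The following is a minimal sketch, not Redis code: DEMO_PREFIX_SIZE, demo_size_overflows and demo_total_size are hypothetical names, and the header size is assumed to be sizeof(size_t).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header size standing in for PREFIX_SIZE. */
#define DEMO_PREFIX_SIZE (sizeof(size_t))

/* Returns 1 if sz + DEMO_PREFIX_SIZE wraps around, 0 otherwise.
 * Unsigned addition wraps modulo SIZE_MAX + 1, so after a wrap the
 * sum is no longer greater than sz. */
int demo_size_overflows(size_t sz) {
    return !(sz + DEMO_PREFIX_SIZE > sz);
}

/* Total number of bytes to request for a payload of sz bytes.
 * Aborts (via assert) when the sum would overflow. */
size_t demo_total_size(size_t sz) {
    assert(!demo_size_overflows(sz));
    return sz + DEMO_PREFIX_SIZE;
}
```

Only requests within DEMO_PREFIX_SIZE bytes of SIZE_MAX trip the check; any realistic size passes.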

used_memory

zmalloc keeps the total amount of currently allocated memory in the used_memory variable, and defines two macros that update it atomically:

// Atomically add the allocated size to used_memory
#define update_zmalloc_stat_alloc(__n) atomicIncr(used_memory,(__n))
// Atomically subtract the freed size from used_memory
#define update_zmalloc_stat_free(__n) atomicDecr(used_memory,(__n))

static redisAtomic size_t used_memory = 0;
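For readers unfamiliar with atomic counters, here is a minimal sketch of the same idea using C11 <stdatomic.h>. It is an illustration only: the demo_ names are hypothetical, the relaxed memory ordering is an assumption of this sketch, and it does not reproduce Redis's own atomicIncr/atomicDecr macros.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical counter playing the role of used_memory. */
static atomic_size_t demo_used_memory = 0;

/* Add n bytes to the counter; safe to call from several threads. */
void demo_stat_alloc(size_t n) {
    atomic_fetch_add_explicit(&demo_used_memory, n, memory_order_relaxed);
}

/* Subtract n bytes from the counter. */
void demo_stat_free(size_t n) {
    atomic_fetch_sub_explicit(&demo_used_memory, n, memory_order_relaxed);
}

/* Read the current value with an atomic load. */
size_t demo_stat_get(void) {
    return atomic_load_explicit(&demo_used_memory, memory_order_relaxed);
}
```

A plain `used += n` is a read-modify-write sequence, so two threads updating at once could lose an update; the atomic operations avoid that without taking a lock.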

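3. The zmalloc function

zmalloc allocates a block of the requested size. Its implementation has two parts: trying to allocate, and handling failure.

void *zmalloc(size_t size) {
    // Try to allocate memory
    void *ptr = ztrymalloc_usable(size, NULL);
    // If ptr is NULL, call the out-of-memory handler
    if (!ptr) zmalloc_oom_handler(size);
    return ptr;
}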

Trying to allocate

ztrymalloc_usable first checks the requested size and then calls malloc to allocate. On success it updates the memory statistics; on failure it returns NULL.

void *ztrymalloc_usable(size_t size, size_t *usable) {
    // Check the requested size
    ASSERT_NO_SIZE_OVERFLOW(size);
    // Call malloc; the block requested is size + PREFIX_SIZE bytes
    // The extra PREFIX_SIZE bytes hold the size of this allocation
    void *ptr = malloc(MALLOC_MIN_SIZE(size)+PREFIX_SIZE);

    if (!ptr) return NULL;
#ifdef HAVE_MALLOC_SIZE
    // Ask the allocator for the actual size of the block
    size = zmalloc_size(ptr);
    // Update used_memory
    update_zmalloc_stat_alloc(size);
    if (usable) *usable = size;
    return ptr;
#else
    // Store the requested size in the header at the start of the block
    *((size_t*)ptr) = size;
    // Update used_memory
    update_zmalloc_stat_alloc(size+PREFIX_SIZE);
    if (usable) *usable = size;
    return (char*)ptr+PREFIX_SIZE;
#endif
}

The actual allocation is done by malloc. Here the name malloc is redirected by a macro to the allocation function of whichever allocator is in use (with libc it is simply the standard malloc), which hides the differences between allocators. With tcmalloc, for example, calling malloc actually calls tc_malloc.

calloc, realloc and free are handled the same way:

#if defined(USE_TCMALLOC)
// With tcmalloc, map malloc to tc_malloc
#define malloc(size) tc_malloc(size)
#define calloc(count,size) tc_calloc(count,size)
#define realloc(ptr,size) tc_realloc(ptr,size)
#define free(ptr) tc_free(ptr)
#elif defined(USE_JEMALLOC)
// With jemalloc, map malloc to je_malloc
#define malloc(size) je_malloc(size)
#define calloc(count,size) je_calloc(count,size)
#define realloc(ptr,size) je_realloc(ptr,size)
#define free(ptr) je_free(ptr)
#define mallocx(size,flags) je_mallocx(size,flags)
#define dallocx(ptr,flags) je_dallocx(ptr,flags)
#endif

Before malloc is called, a minimum allocation size is applied: if the requested size is 0 (size_t is unsigned, so it can never be negative), the size needed to store a long is requested instead:

/* When using the libc allocator, use a minimum allocation size to match the
 * jemalloc behavior that doesn't return NULL in this case.
 */
#define MALLOC_MIN_SIZE(x) ((x) > 0 ? (x) : sizeof(long))
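The reason for the minimum is that the C standard leaves malloc(0) implementation-defined: it may return NULL or a unique pointer, and a NULL result would be indistinguishable from an allocation failure. A small sketch of the macro's effect, with hypothetical DEMO_/demo_ names:

```c
#include <stddef.h>

/* Same shape as MALLOC_MIN_SIZE, under a hypothetical name. */
#define DEMO_MALLOC_MIN_SIZE(x) ((x) > 0 ? (x) : sizeof(long))

/* Size that would actually be passed to the allocator
 * (before any header bytes are added). */
size_t demo_request_size(size_t size) {
    return DEMO_MALLOC_MIN_SIZE(size);
}
```

Only a zero-byte request is changed; every other size passes through untouched.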

OOM handling

When an allocation fails, zmalloc_oom_handler is called. By default it prints an "Out of memory" message and aborts the server process:

static void zmalloc_default_oom(size_t size) {
    fprintf(stderr, "zmalloc: Out of memory trying to allocate %zu bytes\n",
        size);
    fflush(stderr);
    // Abort the process
    abort();
}

static void (*zmalloc_oom_handler)(size_t) = zmalloc_default_oom;

zmalloc.h declares zmalloc_set_oom_handler, which lets the caller install a custom function to handle out-of-memory conditions:

void zmalloc_set_oom_handler(void (*oom_handler)(size_t)) {
    zmalloc_oom_handler = oom_handler;
}
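The handler mechanism is an ordinary function pointer that can be swapped at runtime. The sketch below illustrates the pattern under assumed names (everything prefixed demo_ is hypothetical); the custom handler here merely records the failed size so that the behavior can be observed without aborting.

```c
#include <stdio.h>
#include <stdlib.h>

/* Default handler: report the failure and abort. */
static void demo_default_oom(size_t size) {
    fprintf(stderr, "demo: out of memory allocating %zu bytes\n", size);
    abort();
}

/* The currently installed handler. */
static void (*demo_oom_handler)(size_t) = demo_default_oom;

/* Install a different handler. */
void demo_set_oom_handler(void (*handler)(size_t)) {
    demo_oom_handler = handler;
}

/* Example custom handler: remember the size instead of aborting. */
size_t demo_last_oom_size = 0;
void demo_recording_oom(size_t size) {
    demo_last_oom_size = size;
}

/* Pass an allocation result through the failure check:
 * a NULL pointer triggers the installed handler. */
void *demo_check(void *ptr, size_t size) {
    if (!ptr) demo_oom_handler(size);
    return ptr;
}
```

Separating "try to allocate" from "react to failure" keeps the allocator free of logging policy; the application decides what happens on failure.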

server.c uses zmalloc_set_oom_handler: at startup it installs redisOutOfMemoryHandler, which writes the error to the log file:

int main(int argc, char **argv) {
    ...
    setlocale(LC_COLLATE,"");
    tzset(); /* Populates 'timezone' global. */
    // Install the out-of-memory handler
    zmalloc_set_oom_handler(redisOutOfMemoryHandler);
    srand(time(NULL)^getpid());
    ...

// The handler writes the error to the log file
void redisOutOfMemoryHandler(size_t allocation_size) {
    serverLog(LL_WARNING,"Out Of Memory allocating %zu bytes!",
        allocation_size);
    // Log the error and abort the process
    serverPanic("Redis aborting for OUT OF MEMORY. Allocating %zu bytes!",
        allocation_size);
}

Putting the code together, the overall flow inside zmalloc is roughly as follows:

[Figure: zmalloc flow (zmalloc流程.png)]

4. The zcalloc function

zcalloc also allocates memory. The only difference from zmalloc is that the underlying allocation is done with calloc, which zero-initializes the block:

/* Allocate memory and zero it or panic */
void *zcalloc(size_t size) {
    // Try to allocate memory
    void *ptr = ztrycalloc_usable(size, NULL);
    // Handle out-of-memory
    if (!ptr) zmalloc_oom_handler(size);
    return ptr;
}

/* Try allocating memory and zero it, and return NULL if failed.
 * '*usable' is set to the usable size if non NULL. */
void *ztrycalloc_usable(size_t size, size_t *usable) {
    // Check the requested size
    ASSERT_NO_SIZE_OVERFLOW(size);
    // Allocate the memory; note that calloc is called here
    void *ptr = calloc(1, MALLOC_MIN_SIZE(size)+PREFIX_SIZE);
    if (ptr == NULL) return NULL;

// Update the allocated-memory statistics
#ifdef HAVE_MALLOC_SIZE
    size = zmalloc_size(ptr);
    update_zmalloc_stat_alloc(size);
    if (usable) *usable = size;
    return ptr;
#else
    *((size_t*)ptr) = size;
    update_zmalloc_stat_alloc(size+PREFIX_SIZE);
    if (usable) *usable = size;
    return (char*)ptr+PREFIX_SIZE;
#endif
}

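5. The zfree function

zfree releases memory that was allocated by zmalloc or zcalloc:

void zfree(void *ptr) {
#ifndef HAVE_MALLOC_SIZE
    void *realptr;
    size_t oldsize;
#endif
    // Return immediately if ptr is NULL
    if (ptr == NULL) return;
#ifdef HAVE_MALLOC_SIZE
    // Update used_memory; zmalloc_size provides the size of the block being freed
    update_zmalloc_stat_free(zmalloc_size(ptr));
    // Call free to release the memory
    free(ptr);
#else
    // Compute the real start address of the block
    realptr = (char*)ptr-PREFIX_SIZE;
    oldsize = *((size_t*)realptr);
    // Update used_memory
    // The freed size is the size stored in the header plus PREFIX_SIZE
    update_zmalloc_stat_free(oldsize+PREFIX_SIZE);
    // Call free to release the memory
    free(realptr);
#endif
}

The logic inside zfree also depends on the underlying library. With tcmalloc, for example, HAVE_MALLOC_SIZE is defined, so zfree calls zmalloc_size to get the size of the block ptr points to and then frees ptr directly. Otherwise it must move ptr back by PREFIX_SIZE bytes to reach the real start of the block before freeing it, and the size it subtracts from used_memory must include the PREFIX_SIZE bytes as well.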
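The size-header path (the branch taken when HAVE_MALLOC_SIZE is not defined) can be reproduced as a small standalone allocator. This is a simplified, single-threaded sketch with hypothetical demo_ names: it uses a plain counter instead of an atomic one and leaves out the zero-size and OOM handling shown earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_PREFIX_SIZE (sizeof(size_t))

/* Plain counter: this sketch is not thread-safe. */
static size_t demo_used = 0;

/* Allocate size bytes plus a header that remembers size. */
void *demo_malloc(size_t size) {
    assert(size + DEMO_PREFIX_SIZE > size);   /* overflow check */
    void *real = malloc(size + DEMO_PREFIX_SIZE);
    if (!real) return NULL;
    *((size_t *)real) = size;                 /* write the header */
    demo_used += size + DEMO_PREFIX_SIZE;
    return (char *)real + DEMO_PREFIX_SIZE;   /* hand out the payload */
}

/* Free a block returned by demo_malloc. */
void demo_free(void *ptr) {
    if (ptr == NULL) return;
    void *real = (char *)ptr - DEMO_PREFIX_SIZE;  /* step back to the header */
    size_t oldsize = *((size_t *)real);           /* read the stored size */
    demo_used -= oldsize + DEMO_PREFIX_SIZE;
    free(real);                                   /* free the real start */
}

size_t demo_used_memory(void) {
    return demo_used;
}
```

The caller never sees the header, so passing the payload pointer straight to free would hand the allocator an address it never returned; stepping back by the header size first is what keeps the pair consistent.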

6. Choosing the memory allocator

Allocation and release both rely on the underlying memory allocator, so how does Redis decide which allocator to use?

README.md describes the allocator as follows:

Allocator
---------

Selecting a non-default memory allocator when building Redis is done by setting
the `MALLOC` environment variable. Redis is compiled and linked against libc
malloc by default, with the exception of jemalloc being the default on Linux
systems. This default was picked because jemalloc has proven to have fewer
fragmentation problems than libc malloc.

To force compiling against libc malloc, use:

    % make MALLOC=libc

To compile against jemalloc on Mac OS X systems, use:

    % make MALLOC=jemalloc

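In short, Redis defaults to jemalloc on Linux, but a different allocator can be selected by setting the "MALLOC" environment variable.

To verify this, start a Redis server and inspect its memory information with the info memory command:

[root@localhost redis-6.2.6]# ./src/redis-cli info memory |grep mem_allocator
mem_allocator:jemalloc-5.1.0

This confirms that Redis uses jemalloc by default. Next, try selecting a different allocator. The allocator is fixed at compile time, so the Redis source has to be rebuilt, passing the allocator directly on the make command line:

[root@localhost redis-6.2.6]# make MALLOC=libc
...

After starting the newly built server, redis-cli shows that the allocator is now libc:

[root@localhost redis-6.2.6]# ./src/redis-cli info memory |grep mem_allocator
mem_allocator:libc

In the source code, the allocator is selected in zmalloc.h, which includes different header files depending on which macros are defined: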

// With tcmalloc, include google/tcmalloc.h
#if defined(USE_TCMALLOC)
#define ZMALLOC_LIB ("tcmalloc-" __xstr(TC_VERSION_MAJOR) "." __xstr(TC_VERSION_MINOR))
#include <google/tcmalloc.h>
#if (TC_VERSION_MAJOR == 1 && TC_VERSION_MINOR >= 6) || (TC_VERSION_MAJOR > 1)
#define HAVE_MALLOC_SIZE 1
#define zmalloc_size(p) tc_malloc_size(p)
#else
#error "Newer version of tcmalloc required"
#endif

// With jemalloc, include jemalloc/jemalloc.h
#elif defined(USE_JEMALLOC)
#define ZMALLOC_LIB ("jemalloc-" __xstr(JEMALLOC_VERSION_MAJOR) "." __xstr(JEMALLOC_VERSION_MINOR) "." __xstr(JEMALLOC_VERSION_BUGFIX))
#include <jemalloc/jemalloc.h>
#if (JEMALLOC_VERSION_MAJOR == 2 && JEMALLOC_VERSION_MINOR >= 1) || (JEMALLOC_VERSION_MAJOR > 2)
#define HAVE_MALLOC_SIZE 1
#define zmalloc_size(p) je_malloc_usable_size(p)
#else
#error "Newer version of jemalloc required"
#endif

// On macOS, include malloc/malloc.h
#elif defined(__APPLE__)
#include <malloc/malloc.h>
#define HAVE_MALLOC_SIZE 1
#define zmalloc_size(p) malloc_size(p)
#endif

// Otherwise fall back to libc; HAVE_MALLOC_SIZE is not defined in this excerpt
#ifndef ZMALLOC_LIB
#define ZMALLOC_LIB "libc"
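The selection pattern above, including the stringification trick used to build the version string, can be reduced to a few lines. In this sketch every name is hypothetical: DEMO_USE_FANCY stands in for a flag such as USE_JEMALLOC, and the version numbers are made up for illustration.

```c
#include <string.h>

/* Two-level stringification: the outer macro expands its argument
 * first, so a version macro becomes its numeric value as text. */
#define DEMO_STR(s) #s
#define DEMO_XSTR(s) DEMO_STR(s)

#if defined(DEMO_USE_FANCY)
#define DEMO_FANCY_MAJOR 5
#define DEMO_FANCY_MINOR 1
#define DEMO_LIB ("fancy-" DEMO_XSTR(DEMO_FANCY_MAJOR) "." DEMO_XSTR(DEMO_FANCY_MINOR))
#define DEMO_HAVE_MALLOC_SIZE 1
#endif

/* No allocator flag given: fall back to libc. */
#ifndef DEMO_LIB
#define DEMO_LIB "libc"
#endif

/* Name of the allocator selected at compile time. */
const char *demo_allocator_name(void) {
    return DEMO_LIB;
}

/* 1 if the selected allocator can report block sizes, 0 otherwise. */
int demo_has_size_query(void) {
#ifdef DEMO_HAVE_MALLOC_SIZE
    return 1;
#else
    return 0;
#endif
}

/* Convenience check used to show the default choice. */
int demo_is_libc(void) {
    return strcmp(demo_allocator_name(), "libc") == 0;
}
```

Compiled without flags, the name is "libc"; compiled with -DDEMO_USE_FANCY, adjacent string literals are concatenated and the name becomes "fancy-5.1". Either way the choice is baked in at build time, which is why switching allocators requires a rebuild.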

The conditional compilation shows that zmalloc branches on macros defined by the Makefile. Here is the relevant part of the Makefile:

# Default allocator defaults to Jemalloc if it's not an ARM
MALLOC=libc
ifneq ($(uname_M),armv6l)
ifneq ($(uname_M),armv7l)
ifeq ($(uname_S),Linux)
   MALLOC=jemalloc
endif
endif
endif

# To get ARM stack traces if Redis crashes we need a special C flag.
ifneq (,$(filter aarch64 armv,$(uname_M)))
        CFLAGS+=-funwind-tables
else
ifneq (,$(findstring armv,$(uname_M)))
        CFLAGS+=-funwind-tables
endif
endif

# Backwards compatibility for selecting an allocator
ifeq ($(USE_TCMALLOC),yes)
   MALLOC=tcmalloc
endif

ifeq ($(USE_TCMALLOC_MINIMAL),yes)
   MALLOC=tcmalloc_minimal
endif

ifeq ($(USE_JEMALLOC),yes)
   MALLOC=jemalloc
endif

ifeq ($(USE_JEMALLOC),no)
   MALLOC=libc
endif
...
ifeq ($(MALLOC),tcmalloc)
   FINAL_CFLAGS+= -DUSE_TCMALLOC
   FINAL_LIBS+= -ltcmalloc
endif

ifeq ($(MALLOC),tcmalloc_minimal)
   FINAL_CFLAGS+= -DUSE_TCMALLOC
   FINAL_LIBS+= -ltcmalloc_minimal
endif

ifeq ($(MALLOC),jemalloc)
   DEPENDENCY_TARGETS+= jemalloc
   FINAL_CFLAGS+= -DUSE_JEMALLOC -I../deps/jemalloc/include
   FINAL_LIBS := ../deps/jemalloc/lib/libjemalloc.a $(FINAL_LIBS)
endif

When $(USE_JEMALLOC) is "no", the Makefile selects libc's default allocator, so the following should work as well:

[root@localhost redis-6.2.6]# make USE_JEMALLOC=no
cd src && make all
make[1]: Entering directory `/home/redis-6.2.6/src'
    CC Makefile.dep
    ...
Hint: It's a good idea to run 'make test' ;)
make[1]: Leaving directory `/home/dpm/redis-6.2.6/src'

After starting the newly built server, redis-cli again shows that the allocator is libc:

[root@localhost redis-6.2.6]# ./src/redis-cli info memory |grep mem_allocator
mem_allocator:libc

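7. Memory usage and fragmentation ratio

The memory information of a Redis server includes a fragmentation ratio, mem_fragmentation_ratio, yet zmalloc has only one global variable, used_memory, which counts allocated memory. So where does the fragmentation ratio come from?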

[root@localhost redis-6.2.6]# redis-cli info memory
# Memory
used_memory:934384
used_memory_rss:2830336
...
mem_fragmentation_ratio:3.20
mem_fragmentation_bytes:1946360
...

Getting the memory usage

First, the function that returns the allocated memory, zmalloc_used_memory:

size_t zmalloc_used_memory(void) {
    size_t um;
    atomicGet(used_memory,um);
    return um;
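}

zmalloc_used_memory simply returns the value of used_memory, reading it with an atomic load; there is no other logic.

used_memory is the total that Redis itself maintains, and the memory the operating system has actually given to the Redis process is not necessarily the same number. For example, when free is called, zmalloc subtracts the freed size from used_memory, but free itself, in order to reduce system calls, may not return the memory to the operating system. The process keeps holding it so that later malloc calls can reuse it.

zmalloc therefore defines zmalloc_get_rss, which reports the memory of the current Redis process from the operating system's point of view. Since the information comes from the operating system, the implementation differs per operating system.

On Linux, the memory usage of a process can normally be read from the virtual file /proc/$pid/stat. /proc exposes an interface between the kernel and processes in the form of a file system; it is not stored on disk but reflects data held in system memory.

First find the process ID of the Redis server, then read the stat file for that process: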

// 1. Find the process ID
[root@localhost ~]# ps -ef|grep redis
root     17046     1  0 Nov08 ?        00:06:36 ./src/redis-server *:6379
// Or read it from the Redis info output
[root@localhost ~]# ./redis-6.2.6/src/redis-cli info |grep process_id
process_id:17046
// 2. Read the process information
[root@localhost ~]# cat /proc/17046/stat
17046 (redis-server) S 1 17046 14637 0 -1 4202752 778 0 0 0 19039 20606 0 0 20 0 4 0 
182674605 146497536 692 18446744073709551615 4194304 5576996 140724681065824 
140724681065280 139868670661187 0 0 4097 17642 18446744073709551615 0 0  17 1 0 0 0 0 0

The 24th field of /proc/<pid>/stat is the resident set size (RSS) of the process, that is, the amount of physical memory it currently occupies, measured in pages. Here the process occupies 692 pages.

The system page size can be obtained with getconf:

[root@localhost ~]# getconf PAGESIZE
4096

With a page size of 4096 bytes, 692 pages amount to 692 × 4096 = 2834432 bytes. The used_memory_rss value reported by Redis matches this result:

[root@localhost redis-6.2.6]# ./src/redis-cli info memory |grep mem |grep rss
used_memory_rss:2834432
used_memory_rss_human:2.70M
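The manual calculation can be written as a small helper that works on a stat line held in a string. This is a sketch with a hypothetical function name, not Redis code; it assumes fields are separated by single spaces and that the process name in parentheses contains no spaces, since a space there would shift every later field.

```c
#include <stdlib.h>
#include <string.h>

/* Return the 24th space-separated field of a /proc/<pid>/stat line
 * (the RSS in pages), or -1 if the line has fewer than 24 fields. */
long long demo_stat_rss_pages(const char *line) {
    const char *p = line;
    int skip = 23;               /* skip the first 23 fields */
    while (p && skip--) {
        p = strchr(p, ' ');      /* find the end of the current field */
        if (p) p++;              /* move to the start of the next one */
    }
    if (!p) return -1;
    return strtoll(p, NULL, 10); /* parsing stops at the next space */
}
```

Applied to the sample line above, the helper returns 692, and multiplying by the 4096-byte page size gives 2834432 bytes.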

Now look at the implementation of zmalloc_get_rss, which follows almost exactly the same logic:

#if defined(HAVE_PROC_STAT)
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

size_t zmalloc_get_rss(void) {
    int page = sysconf(_SC_PAGESIZE);
    size_t rss;
    char buf[4096];
    char filename[256];
    int fd, count;
    char *p, *x;

    snprintf(filename,256,"/proc/%ld/stat",(long) getpid());
    if ((fd = open(filename,O_RDONLY)) == -1) return 0;
    if (read(fd,buf,4096) <= 0) {
        close(fd);
        return 0;
    }
    close(fd);

    p = buf;
    count = 23; /* RSS is the 24th field in /proc/<pid>/stat */
    while(p && count--) {
        p = strchr(p,' ');
        if (p) p++;
    }
    if (!p) return 0;
    x = strchr(p,' ');
    if (!x) return 0;
    *x = '\0';

    rss = strtoll(p,NULL,10);
    rss *= page;
    return rss;
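}

If /proc/$pid/stat exists, the value is read from it. Not every operating system provides /proc/$pid/stat, so zmalloc has other implementations as well, such as reading /proc/$pid/psinfo or calling task_info. When none of these sources is available, it returns used_memory. The complete definition is as follows: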

/* Get the RSS information in an OS-specific way.
 *
 * WARNING: the function zmalloc_get_rss() is not designed to be fast
 * and may not be called in the busy loops where Redis tries to release
 * memory expiring or swapping out objects.
 *
 * For this kind of "fast RSS reporting" usages use instead the
 * function RedisEstimateRSS() that is a much faster (and less precise)
 * version of the function. */

#if defined(HAVE_PROC_STAT)
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

size_t zmalloc_get_rss(void) {
    int page = sysconf(_SC_PAGESIZE);
    size_t rss;
    char buf[4096];
    char filename[256];
    int fd, count;
    char *p, *x;

    snprintf(filename,256,"/proc/%ld/stat",(long) getpid());
    if ((fd = open(filename,O_RDONLY)) == -1) return 0;
    if (read(fd,buf,4096) <= 0) {
        close(fd);
        return 0;
    }
    close(fd);

    p = buf;
    count = 23; /* RSS is the 24th field in /proc/<pid>/stat */
    while(p && count--) {
        p = strchr(p,' ');
        if (p) p++;
    }
    if (!p) return 0;
    x = strchr(p,' ');
    if (!x) return 0;
    *x = '\0';

    rss = strtoll(p,NULL,10);
    rss *= page;
    return rss;
}
#elif defined(HAVE_TASKINFO)
#include <sys/types.h>
#include <sys/sysctl.h>
#include <mach/task.h>
#include <mach/mach_init.h>

size_t zmalloc_get_rss(void) {
    task_t task = MACH_PORT_NULL;
    struct task_basic_info t_info;
    mach_msg_type_number_t t_info_count = TASK_BASIC_INFO_COUNT;

    if (task_for_pid(current_task(), getpid(), &task) != KERN_SUCCESS)
        return 0;
    task_info(task, TASK_BASIC_INFO, (task_info_t)&t_info, &t_info_count);

    return t_info.resident_size;
}
#elif defined(__FreeBSD__) || defined(__DragonFly__)
#include <sys/types.h>
#include <sys/sysctl.h>
#include <sys/user.h>

size_t zmalloc_get_rss(void) {
    struct kinfo_proc info;
    size_t infolen = sizeof(info);
    int mib[4];
    mib[0] = CTL_KERN;
    mib[1] = KERN_PROC;
    mib[2] = KERN_PROC_PID;
    mib[3] = getpid();

    if (sysctl(mib, 4, &info, &infolen, NULL, 0) == 0)
#if defined(__FreeBSD__)
        return (size_t)info.ki_rssize * getpagesize();
#else
        return (size_t)info.kp_vm_rssize * getpagesize();
#endif

    return 0L;
}
#elif defined(__NetBSD__)
#include <sys/types.h>
#include <sys/sysctl.h>

size_t zmalloc_get_rss(void) {
    struct kinfo_proc2 info;
    size_t infolen = sizeof(info);
    int mib[6];
    mib[0] = CTL_KERN;
    mib[1] = KERN_PROC;
    mib[2] = KERN_PROC_PID;
    mib[3] = getpid();
    mib[4] = sizeof(info);
    mib[5] = 1;
    if (sysctl(mib, 4, &info, &infolen, NULL, 0) == 0)
        return (size_t)info.p_vm_rssize * getpagesize();

    return 0L;
}
#elif defined(HAVE_PSINFO)
#include <unistd.h>
#include <sys/procfs.h>
#include <fcntl.h>

size_t zmalloc_get_rss(void) {
    struct prpsinfo info;
    char filename[256];
    int fd;

    snprintf(filename,256,"/proc/%ld/psinfo",(long) getpid());

    if ((fd = open(filename,O_RDONLY)) == -1) return 0;
    if (ioctl(fd, PIOCPSINFO, &info) == -1) {
        close(fd);
        return 0;
    }

    close(fd);
    return info.pr_rssize;
}
#else
size_t zmalloc_get_rss(void) {
    /* If we can't get the RSS in an OS-specific way for this system just
     * return the memory usage we estimated in zmalloc()..
     *
     * Fragmentation will appear to be always 1 (no fragmentation)
     * of course... */
    return zmalloc_used_memory();
}
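#endif

Memory fragmentation ratio

Knowing how much memory Redis has requested and how much memory the system has actually given to the Redis process, the fragmentation ratio is easy to derive:

mem_fragmentation_ratio = used_memory_rss / used_memory

In the Redis implementation, mem_fragmentation_ratio corresponds to the total_frag field of the redisMemOverhead struct, and is computed in the getMemoryOverheadData function in object.c: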

struct redisMemOverhead *getMemoryOverheadData(void) {
    int j;
    size_t mem_total = 0;
    size_t mem = 0;
    size_t zmalloc_used = zmalloc_used_memory();
    struct redisMemOverhead *mh = zcalloc(sizeof(*mh));
    
    mh->total_allocated = zmalloc_used;
    mh->startup_allocated = server.initial_memory_usage;
    mh->peak_allocated = server.stat_peak_memory;
    // Fragmentation ratio = memory given by the system (RSS) / memory counted by Redis
    mh->total_frag =
        (float)server.cron_malloc_stats.process_rss / server.cron_malloc_stats.zmalloc_used;
    // Fragmentation bytes = memory given by the system (RSS) - memory counted by Redis
    mh->total_frag_bytes =
        server.cron_malloc_stats.process_rss - server.cron_malloc_stats.zmalloc_used;
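As a worked example, the two formulas can be applied to sample numbers from the outputs shown earlier (used_memory 934384 and used_memory_rss 2834432). Those two values were captured by separate commands, so the result is only illustrative and will not match the mem_fragmentation_ratio line printed by Redis at either moment. The demo_ functions are hypothetical, and returning 0 for an empty counter and a signed byte difference are choices of this sketch.

```c
#include <stddef.h>

/* Ratio of resident memory to memory counted by the application. */
double demo_frag_ratio(size_t rss, size_t used) {
    if (used == 0) return 0.0;      /* avoid dividing by zero */
    return (double)rss / (double)used;
}

/* Difference in bytes; negative when rss is smaller than used. */
long long demo_frag_bytes(size_t rss, size_t used) {
    return (long long)rss - (long long)used;
}
```

With these inputs the ratio is about 3.03 and the difference is 1900048 bytes. On a nearly empty instance a high ratio is normal, because the fixed cost of the process itself dominates a small used_memory.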

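Managing the fragmentation ratio

A fragmentation ratio between 1 and 1.5 is generally considered healthy.

(1) Ratio too low

A ratio below 1 means that the resident memory of the process is smaller than what Redis has allocated: physical memory cannot cover what Redis needs, and part of its data may have been swapped out to disk. Disk I/O is far slower than memory access, so frequent swapping in and out causes performance problems for Redis.

Depending on the situation, a low ratio can be addressed by adding physical memory, adjusting the maxmemory setting of the Redis instance, or disabling swap.

The configuration file redis.conf provides maxmemory, the maximum memory an instance may use. Together with maxmemory-policy it bounds the memory Redis can consume, which helps avoid having data swapped out to disk:

# When memory usage reaches this limit, keys are removed according to the policy
# If no keys can be removed to free space, Redis rejects commands that need more memory, such as set and lpush, and returns an error
maxmemory <bytes>
# Policy applied when maxmemory is reached
maxmemory-policy noeviction

Disabling swap is usually recommended for Redis in production. The swap usage of a process can be checked through the /proc/$pid/smaps file: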

// System-wide swap usage
[root@localhost redis-6.2.6]# free -h
             total       used       free     shared    buffers     cached
Mem:           23G        22G       887M       200K       585M       4.6G
-/+ buffers/cache:        17G       6.0G             
Swap:         7.7G        74M       7.7G
// Swap usage of the process
[root@localhost redis-6.2.6]# cat /proc/17046/smaps |egrep '^(Swap|Size)'
Size:               1352 kB
Swap:                  0 kB
Size:                 48 kB
Swap:                  0 kB
Size:                116 kB
...

The Swap entries show how much of the process has been swapped out to the swap area (mmap memory is not included).
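Since smaps prints one Swap line per mapping, the per-process total is the sum of those lines. The sketch below adds them up from text already held in memory; the function name is hypothetical and the input is assumed to follow the "Swap: <number> kB" layout shown above.

```c
#include <stdlib.h>
#include <string.h>

/* Sum the values of all lines that start with "Swap:" in smaps-style
 * text and return the total in kB. Other lines are ignored. */
long long demo_total_swap_kb(const char *text) {
    long long total = 0;
    const char *line = text;
    while (line && *line) {
        if (strncmp(line, "Swap:", 5) == 0)
            total += strtoll(line + 5, NULL, 10); /* skips leading spaces */
        line = strchr(line, '\n');                /* jump to the next line */
        if (line) line++;
    }
    return total;
}
```

A total of 0 kB, as in the sample output, means that none of the process's memory is currently in swap.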

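(2) Ratio too high

Conversely, a ratio above 1.5 indicates that Redis is wasting a significant amount of memory.

The cause is that Redis has freed memory but the allocator has not returned it to the operating system. From what we have seen of memory allocation, this is not a characteristic of Redis itself but a consequence of how malloc behaves.

Redis provides automatic memory defragmentation, enabled through the activedefrag option in redis.conf: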

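# Whether defragmentation is enabled
# Default is no; it can be enabled at runtime with CONFIG SET activedefrag yes
activedefrag no
# Minimum amount of fragmentation before defragmentation starts; default 100mb
active-defrag-ignore-bytes 100mb
# Minimum fragmentation percentage before defragmentation starts; default 10%
active-defrag-threshold-lower 10
# Fragmentation percentage above which maximum effort is used
active-defrag-threshold-upper 100
# Minimum percentage of resources spent on defragmentation
active-defrag-cycle-min 1
# Maximum percentage of resources spent on defragmentation
active-defrag-cycle-max 25

Defragmentation lets a Redis instance reclaim fragmented space while running, without restarting the server. Note, however, that Redis supports this feature only when it uses the jemalloc allocator.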

Author: 张小胜
Link: https://juejin.cn/post/7031116740796874760