Linux System Monitoring Example: vmstat, a Linux System Status Monitoring Tool

I. Basic demonstration:

[nwom@WLAN-linux-3 ~]$ vmstat -n 2 10        (take 10 samples, one every 2 seconds)
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
19  0    192 1120896 386040 14156336    0    0     1    18    0    0 10  8 82  0  0
14  0    192 1120976 386048 14156348    0    0     0    86 1015 114979 12 30 58  0  0
 7  0    192 1121204 386048 14156380    0    0     0    44 1001 113762 12 30 58  0  0
 7  0    192 1121448 386048 14156380    0    0     0    44 1005 116078 12 30 58  0  0
 6  0    192 1121620 386048 14156380    0    0     0    94 1008 115518 12 30 58  0  0
 5  0    192 1121744 386048 14156380    0    0     0     0 1010 112765 12 29 59  0  0
10  0    192 1121968 386048 14156388    0    0     0     0 1004 113684 12 30 58  0  0
30  0    192 1121972 386048 14156388    0    0     0   192 1012 111992 15 31 54  0  0
 6  0    192 1122200 386048 14156388    0    0     0     0 1009 112802 15 31 53  0  0
12  0    192 1121984 386048 14156388    0    0     0    38 1007 113815 12 30 58  0  0
[nwom@WLAN-linux-3 ~]$

Note: the first line of vmstat output shows averages since the last boot, so it can be ignored when looking at current activity.

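The output columns are as follows:

Process (procs)

r : number of runnable processes (running or waiting for run time).
b : number of processes in uninterruptible sleep.

Memory

swpd : amount of virtual memory used (KB).
free : amount of idle memory (KB).
buff : amount of memory used as buffers (KB).
cache : amount of memory used as cache (KB).

Swap

si : amount of memory swapped in from disk (KB/s).
so : amount of memory swapped out to disk (KB/s).

IO

bi : blocks received from a block device (blocks/s).
bo : blocks sent to a block device (blocks/s).

System

in : number of interrupts per second, including the clock interrupt.
cs : number of context switches per second.

CPU (percentages of total CPU time)

us : time spent running non-kernel code (user time, including nice time).
sy : time spent running kernel code (system time).
id : idle time. Before Linux 2.5.41, this included I/O wait time.
wa : time spent waiting for I/O. Before Linux 2.5.41, shown as 0.
st : time stolen from a virtual machine.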
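To work with these columns programmatically, the output can be split into one record per sample. The following is a minimal sketch, not part of the original article: the function name `parse_vmstat` is hypothetical, and it assumes the default layout shown above, where the column-name line starts with `r` and every data line is purely numeric.

```python
def parse_vmstat(text):
    """Parse vmstat output into a list of dicts keyed by column name."""
    rows = []
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":
            # Column-name line: r b swpd free buff cache ...
            header = fields
        elif header and len(fields) == len(header) and all(f.isdigit() for f in fields):
            # Data line: pair each number with its column name.
            rows.append(dict(zip(header, map(int, fields))))
    return rows


SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
19  0    192 1120896 386040 14156336    0    0     1    18    0    0 10  8 82  0  0
14  0    192 1120976 386048 14156348    0    0     0    86 1015 114979 12 30 58  0  0
"""

rows = parse_vmstat(SAMPLE)
print(rows[1]["cs"])  # 114979
```

The first record (`rows[0]`) is the since-boot average, so an analysis would normally start from `rows[1]`.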

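vmstat accepts many command-line options; see the man page for full details. Commonly used options:

-m : show kernel slab memory usage (slabinfo).

-a : show active and inactive memory.

-n : print the column header only once instead of repeating it periodically; very useful when storing sampled output to a file.
(For example, root# vmstat -n 2 5 takes 5 samples, one every 2 seconds.)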
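When the samples are to be collected from a script rather than typed at a prompt, the same command line can be assembled and run with the standard library. This is a rough sketch under stated assumptions: the helper names `vmstat_command` and `sample_vmstat` are made up for illustration, and the code simply returns an empty list when no `vmstat` binary is found on the PATH.

```python
import shutil
import subprocess


def vmstat_command(interval, count):
    """Build the argument list for `vmstat -n <interval> <count>`."""
    return ["vmstat", "-n", str(interval), str(count)]


def sample_vmstat(interval=1, count=2):
    """Run vmstat and return its output lines, or [] if vmstat is not installed."""
    if shutil.which("vmstat") is None:
        return []
    result = subprocess.run(vmstat_command(interval, count),
                            capture_output=True, text=True)
    return result.stdout.splitlines()


if __name__ == "__main__":
    # Two samples, one second apart; the header is printed once because of -n.
    for line in sample_vmstat():
        print(line)
```

Because of `-n`, the returned lines contain a single header followed only by data rows, which keeps a saved log easy to parse.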

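Remarks:
As a rule of thumb, if r is often greater than 4 and id is often below 40, the CPU is heavily loaded.
If idle time (id) stays at 0 and system time (sy) is twice user time (us), the system is short of CPU resources.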
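The two rules of thumb above can be written down as simple checks. This is only a sketch: the function names are hypothetical, "often" is interpreted here as "in more than half of the samples" (the `often` parameter, an assumption), and "twice" is read as "at least twice". The sample values are the r, us, sy, and id columns from the basic demonstration, with the since-boot first line skipped.

```python
def cpu_heavily_loaded(samples, r_limit=4, id_limit=40, often=0.5):
    """Rule 1: r is often above r_limit AND id is often below id_limit."""
    if not samples:
        return False
    n = len(samples)
    long_queue = sum(1 for s in samples if s["r"] > r_limit) / n
    low_idle = sum(1 for s in samples if s["id"] < id_limit) / n
    return long_queue > often and low_idle > often


def cpu_starved(samples):
    """Rule 2: id stays at 0 and sy is at least twice us in every sample."""
    return bool(samples) and all(
        s["id"] == 0 and s["sy"] >= 2 * s["us"] for s in samples
    )


columns = ("r", "us", "sy", "id")
demo = [dict(zip(columns, row)) for row in [
    (14, 12, 30, 58), (7, 12, 30, 58), (7, 12, 30, 58),
    (6, 12, 30, 58), (5, 12, 29, 59), (10, 12, 30, 58),
    (30, 15, 31, 54), (6, 15, 31, 53), (12, 12, 30, 58),
]]

print(cpu_heavily_loaded(demo))  # False: r is always above 4, but id never drops below 40
print(cpu_starved(demo))         # False: id never reaches 0
```

By these rules the demonstration host has a long run queue but is not yet considered heavily loaded, because more than half of its CPU time is still idle.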

II. System monitoring experiments:

The following experiments are taken from (http://home.lupaworld.com/home-space-uid-56821-do-blog-id-233122.html); try them hands-on if you are interested.

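Example 1: heavy arithmetic

This program enters an infinite loop that repeatedly computes a square root, simulating a compute-heavy workload.
The test source (run.c):

#include <stdio.h>
#include <math.h>
#include <unistd.h>
#include <stdlib.h>

void
run_status(void)
{
    double pi = M_PI;
    volatile double pisqrt;  /* volatile keeps the compiler from discarding the store */

    /* Spin forever computing sqrt(pi); stop the program with kill or Ctrl-C. */
    while (1) {
        pisqrt = sqrt(pi);
    }
}

int main(void)
{
    run_status();
    exit(EXIT_SUCCESS);  /* never reached */
}

Compile:
gcc run.c -o run -lm
Run:
./run &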

Monitoring:

root@debian6:~# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 772300  42420 175356    0   31     5   138   22   14  0  0 99  0
 1  0      0 772292  42420 175356    0    0     0     0   45   22  5  0 95  0
 1  0      0 772284  42420 175356    0    0     0     0  276   15 100  0  0  0
 1  0      0 772284  42420 175356    0    0     0     0  298   12 100  0  0  0
 1  0      0 772284  42420 175356    0    0     0     0  273   11 100  0  0  0
 1  0      0 772284  42420 175356    0    0     0     0  278   16 100  0  0  0
 1  0      0 772284  42420 175356    0    0     0     0  276   14 100  0  0  0
 1  0      0 772284  42420 175356    0    0     0     0  275   16 100  0  0  0
 1  0      0 772284  42420 175356    0    0     0     0  284   14 99  1  0  0
 1  0      0 772284  42420 175356    0    0     0     0  285   14 100  0  0  0
 1  0      0 772284  42420 175356    0    0     0     0  281   13 100  0  0  0
 1  0      0 772284  42420 175356    0    0     0     0  270   18 100  0  0  0
 0  0      0 772292  42420 175356    0    0     0     0   51   28  4  0 96  0
 0  0      0 772292  42420 175356    0    0     0     0   25   11  0  0 100  0

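From the output above:

1. r is the number of runnable processes (running or waiting in the run queue). It stays at 1 while the program runs, so one process is always runnable.

2. in is the number of interrupts per second, including the clock interrupt. Once a process is runnable (see r), in rises from a few dozen to roughly 270-300.

3. us is the CPU time used by user processes. As soon as r becomes 1, user CPU time goes straight to 100%.

4. id is the CPU idle time. It starts high, at 95% or more; once the program is running the CPU is constantly busy (see r and us) and id stays at 0. After the program terminates, id goes back up.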

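Example 2: heavy system calls

This script enters an infinite loop that repeatedly runs the cd command, simulating a workload dominated by system calls.
The test script (loop.sh):

#!/bin/bash

# (true) runs in a subshell, so every iteration also forks a process.
while (true)
do
    cd
done

Run:
chmod +x loop.sh
./loop.sh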

Monitoring:

root@debian6:~# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 772300  42500 175364    0   30     5   136   22   14  0  0 99  0
 0  0      0 772300  42500 175364    0    0     0     0   27   14  0  0 100  0
 1  0      0 772220  42500 175364    0    0     0     0  213 2482  6 70 24  0
 1  0      0 772204  42500 175364    0    0     0     0  283 3298  8 92  0  0
 1  0      0 772204  42500 175364    0    0     0     0  281 3343  5 95  0  0
 1  0      0 772204  42500 175364    0    0     0     0  283 3381  5 95  0  0
 1  0      0 772204  42500 175364    0    0     0     0  271 3362  8 92  0  0
 1  0      0 772204  42508 175356    0    0     0    12  267 3359  8 92  0  0
 0  0      0 772276  42508 175364    0    0     0     0  253 2883  8 76 16  0
 0  0      0 772276  42508 175364    0    0     0     0   29   12  0  0 100  0
 0  0      0 772276  42508 175364    0    0     0     0   39   18  0  0 100  0

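Conclusion:

While the script keeps calling cd, there is a runnable process in the run queue (see r), the interrupts per second (see in) and especially the context switches per second (see cs) jump sharply, the CPU time spent in the kernel (see sy) climbs above 90%, and the idle time (see id) drops to 0. When the script terminates, r, in, cs, and sy all fall back and id rises again, showing that the system is idle once more.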
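The contrast between Example 1 (time spent in us) and Example 2 (time spent in sy) can be checked mechanically by averaging the two columns over the busy interval. The sketch below is illustrative only: `dominant_cpu_mode` is a hypothetical helper, the 50% "busy" threshold is an assumption, and the (us, sy) pairs are copied from the busy rows of the two vmstat listings.

```python
def dominant_cpu_mode(samples, busy_threshold=50):
    """samples: list of (us, sy) pairs. Returns 'user', 'system', or 'mostly idle'."""
    mean_us = sum(us for us, _ in samples) / len(samples)
    mean_sy = sum(sy for _, sy in samples) / len(samples)
    if mean_us + mean_sy < busy_threshold:
        return "mostly idle"
    return "user" if mean_us >= mean_sy else "system"


# Busy rows from Example 1 (square-root loop) and Example 2 (cd loop).
arithmetic_rows = [(100, 0), (100, 0), (100, 0), (99, 1), (100, 0)]
syscall_rows = [(8, 92), (5, 95), (5, 95), (8, 92), (8, 92)]

print(dominant_cpu_mode(arithmetic_rows))  # user
print(dominant_cpu_mode(syscall_rows))     # system
print(dominant_cpu_mode([(0, 0), (4, 0)])) # mostly idle
```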

Example 3: heavy I/O

1. Use dd to read from /dev/zero and write to the file /tmp/data:

dd if=/dev/zero of=/tmp/data bs=1M count=1000

Monitoring:

root@debian6:~# vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 302160  25868 149004    0    0    77   116 1027  151 14 17 69  0  0
 1  0      0 302160  25868 149004    0    0     0     0 1018   35  0  1 99  0  0
 3  0      0 134884  26032 311628    0    0     0 109872 1423  102  0 100  0  0  0
 1  0      0  14596  26148 428808    0    0     0 117208 1372  120  0 100  0  0  0
 1  0      0   6224  22908 440592    0    0     4 64944 1305  322  0 98  0  2  0
 1  0      0   5976  21836 441016    0    0     4 79072 1447  162  0 51  0 49  0
 0  2      0   5716  21956 439672    0    0     4 79016 1431  374  0 81  0 19  0
 2  2      0   6180  22044 438064    0    0     0 61432 1392  285  0 61  0 39  0
 2  2      0   6912  22104 436828    0    0     4 73980 1486  253  1 59  0 40  0
 0  4      0   5876  14132 448856    0    0     8 63784 1378  313  0 69  0 31  0
 0  2      4   5980   4140 457860    0    0     0 46756 1399  274  0 65  0 35  0
 1  3      4   6060   3892 457580    0    0     8 69876 1398  214  0 46  0 54  0
 1  4      4   6120   2872 457348    0    0     0 59920 1364  327  0 71  0 29  0

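Note: dd keeps writing data to disk, so bo jumps sharply and the CPU I/O wait (wa) also rises, which shows that under heavy I/O the bottleneck is the slow storage device. Because the writes go through the file system, cache grows from 149004 KB to 457348 KB, and because of the many write-related interrupts, in rises from 1018 to 1364.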
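To put these numbers in more familiar units, the cache growth and the peak bo value can be converted to MiB. This is a small worked calculation, assuming that vmstat counts 1024-byte blocks (as the Linux vmstat man page states); the helper names are hypothetical.

```python
BLOCK_BYTES = 1024  # assumption: vmstat reports 1024-byte blocks on Linux


def blocks_per_s_to_mib_per_s(blocks_per_s, block_bytes=BLOCK_BYTES):
    """Convert a bi/bo value (blocks/s) to MiB/s."""
    return blocks_per_s * block_bytes / (1024 * 1024)


def kb_to_mib(kb):
    """Convert a memory column value (KB) to MiB."""
    return kb / 1024


cache_growth_kb = 457348 - 149004
print(cache_growth_kb)                                # 308344
print(round(kb_to_mib(cache_growth_kb), 1))           # 301.1
print(round(blocks_per_s_to_mib_per_s(117208), 1))    # 114.5
```

So roughly 301 MiB of the written data ended up in the page cache during the run, and the highest bo sample (117208 blocks/s) corresponds to about 114.5 MiB/s.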

2. Use dd again, this time reading from the file /tmp/data and writing to /dev/null:

dd if=/tmp/data of=/dev/null bs=1M

Monitoring:

root@debian6:~# vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0     60   7056   2492 464560    0    0   177   517 1028  116 10 12 78  1  0
 0  0     60   7056   2492 464560    0    0     0     0 1006   32  0  0 100  0  0
 0  1     60   5768   2296 465032    0    4 94340     4 1514  252  0 65 17 18  0
 1  1     60   5876   2220 466032    0    0 150148    56 1770  306  0 93  0  7  0
 0  1     60   5792   2180 467152    0    0 98872     0 1598  281  0 81  0 19  0
 0  1     60   6308    988 469816    0   52 89556    52 1722  303  0 88  0 12  0
 2  1     60   5620   1004 470488    0    0 79052     0 1671  690  0 72  0 28  0
 0  1     60   6548   1028 469540    0    0 67392     4 1535  657  1 66  0 33  0
 1  1     60   5648   1060 470588    0    0 47408    16 1400  482  0 44  0 56  0
 0  1     60   6368   1088 469836    0    0 70212     0 1561  666  0 66  0 34  0

Note: dd keeps reading from the disk file /tmp/data, so bi jumps sharply, and b (the number of processes in uninterruptible sleep) goes from 0 to 1.

3. Next, use dd once more to write data to /dev/ram1:

dd if=/dev/zero of=/dev/ram1 bs=1M count=16
16+0 records in
16+0 records out
16777216 bytes (17 MB) copied, 0.0635522 seconds, 264 MB/s

Monitoring:

root@debian6:~# vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0     60   6156   6256 466280    0    0   366   480 1029  111  9 11 79  1  0
 0  0     60   6156   6256 466280    0    0     0     0 1011   32  0  0 100  0  0
 0  0     60   6156   6256 466292    0    0    12     0 1031   65  0  3 96  1  0
 0  0     60   6156   6264 466284    0    0     0    48 1022   48  0  1 99  0  0
 0  0     60   6148  17920 454652    0    0     0     4 1021   81  0  8 92  0  0
 0  0     60   6148  17920 454652    0    0     0     0 1013   32  1  0 99  0  0
 0  0     60   6148  17920 454652    0    0     0     0 1016   36  0  1 99  0  0
 0  0     60   6148  17920 454652    0    0     0     0 1006   31  0  0 100  0  0
 0  0     60   6148  17920 454652    0    0     0     0 1026   42  0  0 100  0  0

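Note: dd reads from /dev/zero and writes to /dev/ram1. Because /dev/ram1 is a block device node, the data is accounted as buffers, so buff increases (from 6264 KB to 17920 KB).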

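Example 4: heavy memory use

This program keeps allocating memory until malloc fails or the system runs out of memory.
The test source (callmem.c):

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>  /* pause() */

int main(int argc, char *argv[])
{
    void *ptr;
    int n = 0;

    while (1) {
        /* Allocate 1 MB per iteration; the block is never freed on purpose. */
        ptr = malloc(0x100000);

        if (ptr == NULL)
            break;

        /* Touch every byte so the kernel actually backs the pages. */
        memset(ptr, 1, 0x100000);
        printf("malloced %d MB\n", ++n);
    }

    pause();  /* wait for a signal so the memory stays allocated */
    return 0;
}

Compile:
gcc callmem.c -o callmem
Run:
./callmem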

Monitoring:

root@debian6:~# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 880944  70656  51692    0    0   125    13   27   35  0  2 97  1
 0  0      0 880944  70656  51692    0    0     0     0   17   12  0  0 100  0
 1  0      0 733344  70656  51692    0    0     0     0  259  339  2 52 46  0
 1  0      0 312240  70656  51692    0    0     0     0  484  674  2 98  0  0
 1  0      0 152776  70656  51692    0    0     0     0  417  469  0 100  0  0
 0  2      0  12396  68868  45748    0    0     0     0  410  444  1 97  0  2
 1  0    652 605960  60932  39120    0  908     0   908  141  130  0 34  0 66
 0  0    524 903632  60932  39136    0    0     0     0   32   14  0  3 97  0
 0  0    524 903632  60932  39136    0    0     0     0   13    8  0  0 100  0
 0  0    524 903632  60932  39136    0    0     0     0   13    9  0  0 100  0
 0  0    524 903632  60932  39136   32    0    32     0   14   12  0  0 99  1
 0  0    524 903632  60932  39136    0    0     0     0   15   18  0  0 100  0
 0  0    524 903632  60932  39140    0    0     0     0   26    8  0  0 100  0
 0  0    524 903632  60932  39140    0    0     0     0   20    7  0  0 100  0
 0  0    524 903632  60932  39140    0    0     0     0   18    9  0  0 100  0
 0  0    524 903632  60932  39140    0    0     0     0   17   19  0  0 100  0
 0  0    524 903632  60932  39140    0    0     0     0   17   12  0  0 100  0

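Note: cache shrinks quickly (from 51692 KB to 39120 KB) while swpd grows (from 0 to 652 KB). To satisfy the new allocations, the system first reclaims space from cache (the file system cache); when that is still not enough, it starts using swap. At the same time si/so increase, especially so, and because swap lives on disk, bo increases as well.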
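The moments at which the system actually touched swap can be picked out by looking for non-zero si or so. The following is a minimal sketch with a hypothetical helper name, `swap_activity`; the four samples are the swpd, free, si, and so values copied from rows of the listing above.

```python
def swap_activity(samples):
    """Return the indexes of samples that show swap-in or swap-out traffic."""
    return [i for i, s in enumerate(samples) if s["si"] > 0 or s["so"] > 0]


samples = [
    {"swpd": 0,   "free": 12396,  "si": 0,  "so": 0},    # memory nearly exhausted
    {"swpd": 652, "free": 605960, "si": 0,  "so": 908},  # pages written out to swap
    {"swpd": 524, "free": 903632, "si": 0,  "so": 0},    # system idle again
    {"swpd": 524, "free": 903632, "si": 32, "so": 0},    # pages read back from swap
]

print(swap_activity(samples))  # [1, 3]
```

Sustained non-zero so is the stronger warning sign; a brief si burst afterwards, as in the last sample, only means previously swapped pages are being read back.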
