Administrator
Published on 2026-05-13 / 7 Visits

Building and Installing Nginx from Source (Active Health Checks and Traffic Statistics)

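1. Environment Preparation

1.1 Operating System Requirements

| Item | Requirement |
|------|-------------|
| Operating system | Ubuntu 22.04 / CentOS 7+ / Debian 11+ |
| Kernel version | ≥ 3.10 (epoll support) |
| Privileges | root, or a user with sudo privileges |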

1.2 Install Build Dependencies

# Ubuntu / Debian
apt-get update
apt-get install -y \
    build-essential \
    libpcre3 libpcre3-dev \
    zlib1g zlib1g-dev \
    libssl-dev \
    libgeoip-dev \
    libxml2-dev \
    libxslt1-dev \
    libgd-dev \
    wget \
    curl \
    patch \
    git

# CentOS / RHEL
yum groupinstall -y "Development Tools"
yum install -y \
    pcre pcre-devel \
    zlib zlib-devel \
    openssl openssl-devel \
    geoip geoip-devel \
    libxml2 libxml2-devel \
    libxslt libxslt-devel \
    gd gd-devel \
    wget curl patch git

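1.3 Install Docker

# Install Docker Engine
curl -fsSL https://get.docker.com | sh

# Start Docker and enable it at boot
systemctl enable docker
systemctl start docker

# Install the Compose plugin if the script above did not already install it
# (Debian/Ubuntu shown; on CentOS/RHEL use: yum install -y docker-compose-plugin)
apt-get install -y docker-compose-plugin

# Verify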
docker --version
docker compose version

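1.4 Directory Layout

| Purpose | Path |
|---------|------|
| Nginx installation root | /usr/local/nginx |
| Configuration directory | /usr/local/nginx/conf |
| Log directory | /var/log/nginx |
| Web root | /usr/local/nginx/html |
| Nginx source code | /opt/src/ |

mkdir -p /opt/src /var/log/nginx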

2. Build and Install Nginx

2.1 Download the Nginx Source

cd /opt/src

# Download a stable release (adjust the version number as needed)
NGINX_VERSION=1.26.3
wget https://nginx.org/download/nginx-${NGINX_VERSION}.tar.gz
tar zxvf nginx-${NGINX_VERSION}.tar.gz

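Choosing a version:

  • mainline (odd minor version, e.g. 1.27.x): has the newest features but is somewhat less stable
  • stable (even minor version, e.g. 1.26.x): the first choice for production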
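The odd/even rule above can be checked mechanically. The following is a minimal sketch, not part of the original build steps; the function name `release_channel` is made up for illustration, and it assumes a plain `major.minor.patch` version string.

```python
def release_channel(version: str) -> str:
    """Classify an nginx version as 'stable' or 'mainline' from its minor number.

    Assumes the odd/even convention described above:
    an even minor number means stable, an odd one means mainline.
    """
    major, minor, *_ = (int(part) for part in version.split("."))
    return "stable" if minor % 2 == 0 else "mainline"


if __name__ == "__main__":
    for candidate in ("1.26.3", "1.27.4"):
        print(candidate, "->", release_channel(candidate))
```

A check like this could guard a build script against accidentally pinning a mainline version for production.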

2.2 Download the Third-Party Modules

cd /opt/src

# nginx_upstream_check_module: active health checks
wget https://github.com/yaoweibin/nginx_upstream_check_module/archive/refs/tags/v0.4.0.tar.gz \
     -O nginx_upstream_check_module-0.4.0.tar.gz
tar zxvf nginx_upstream_check_module-0.4.0.tar.gz

# nginx-module-vts: traffic statistics (optional but recommended)
git clone https://github.com/vozlt/nginx-module-vts.git

2.3 Apply the Patch (required for nginx_upstream_check_module)

cd /opt/src/nginx-${NGINX_VERSION}

# List the available patch files
ls /opt/src/nginx_upstream_check_module-0.4.0/*.patch

# Pick the patch that matches your Nginx version
# Nginx 1.26.x / 1.25.x / 1.24.x / 1.22.x → check_1.20.1+.patch
patch -p1 < /opt/src/nginx_upstream_check_module-0.4.0/check_1.20.1+.patch

Patch quick reference:

| Nginx version | Patch to use |
|---------------|--------------|
| 1.20.x ~ 1.26.x | check_1.20.1+.patch |
| 1.18.x | check_1.16.1+.patch |
| 1.14.x | check_1.14.0+.patch |
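The lookup table above can be turned into a small helper for scripted builds. This is only a sketch: `PATCH_TABLE` and `patch_for` are illustrative names, the rows are copied from the table, and versions the table does not list (for example 1.16.x) deliberately return `None` rather than a guess.

```python
from typing import Optional

# Rows copied from the quick-reference table: (lowest minor, highest minor, patch file).
PATCH_TABLE = [
    (20, 26, "check_1.20.1+.patch"),
    (18, 18, "check_1.16.1+.patch"),
    (14, 14, "check_1.14.0+.patch"),
]


def patch_for(version: str) -> Optional[str]:
    """Return the patch file listed for an nginx 1.x version, or None if unlisted."""
    major, minor = (int(part) for part in version.split(".")[:2])
    if major != 1:
        return None
    for low, high, patch in PATCH_TABLE:
        if low <= minor <= high:
            return patch
    return None


if __name__ == "__main__":
    print(patch_for("1.26.3"))
```

Always confirm with `ls` that the chosen file actually exists in the module directory before running `patch`.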

2.4 Configure Build Options

cd /opt/src/nginx-${NGINX_VERSION}

./configure \
    --prefix=/usr/local/nginx \
    --user=nginx \
    --group=nginx \
    --with-http_ssl_module \
    --with-http_v2_module \
    --with-http_realip_module \
    --with-http_gzip_static_module \
    --with-http_stub_status_module \
    --with-http_sub_module \
    --with-stream \
    --with-stream_ssl_module \
    --with-pcre \
    --with-threads \
    --add-module=/opt/src/nginx_upstream_check_module-0.4.0 \
    --add-module=/opt/src/nginx-module-vts

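Common build options:

| Option | Description |
|--------|-------------|
| --with-http_ssl_module | HTTPS support |
| --with-http_v2_module | HTTP/2 support |
| --with-http_realip_module | Recover the real client IP (behind a CDN or proxy) |
| --with-http_stub_status_module | Basic status monitoring endpoint |
| --with-stream | TCP/UDP proxying (layer 4 load balancing) |
| --with-threads | Thread pool support |
| --add-module=PATH | Add a third-party module |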

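2.5 Compile and Install

# Compile (-j sets the number of parallel jobs; nproc returns the CPU core count)
make -j$(nproc)

# Install
make install

⚠️ Note: do not run nginx -t immediately after installation.
The build was configured with --user=nginx, and the nginx.conf written in section 5.2 also sets user nginx;. Both refer to the system user nginx, which is created in section 2.6.
Until that user exists, nginx fails with a getpwnam("nginx") failed error.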

2.6 Create the nginx System User

# Create a system user that cannot log in
useradd -r -s /sbin/nologin -d /usr/local/nginx nginx

# Adjust directory ownership
chown -R nginx:nginx /usr/local/nginx/html
chown -R nginx:nginx /var/log/nginx

2.7 Create the systemd Service

cat > /etc/systemd/system/nginx.service << 'EOF'
[Unit]
Description=Nginx HTTP Server
After=network.target

[Service]
Type=forking
PIDFile=/usr/local/nginx/logs/nginx.pid
ExecStartPre=/usr/local/nginx/sbin/nginx -t
ExecStart=/usr/local/nginx/sbin/nginx
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s QUIT $MAINPID
PrivateTmp=true
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable nginx

2.8 Add to PATH

ln -sf /usr/local/nginx/sbin/nginx /usr/local/bin/nginx

# Verify
nginx -v

Sample output:

nginx version: nginx/1.26.3

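3. Verify nginx_upstream_check_module

The third-party modules were compiled in with --add-module during the build; this section verifies that they are present.

# Test the configuration (once the config contains check directives, as in section 5.3,
# a successful test shows that the module was compiled in correctly)
nginx -t

# List the modules that were compiled in
nginx -V 2>&1 | tr ' ' '\n' | grep add-module

Expected output: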

--add-module=/opt/src/nginx_upstream_check_module-0.4.0
--add-module=/opt/src/nginx-module-vts
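The same check can be scripted. The sketch below parses a configure-arguments line such as the one `nginx -V` prints; `SAMPLE_CONFIGURE_LINE` is an illustrative string assembled from the options in section 2.4, not captured output, and `added_modules` is a made-up helper name.

```python
import shlex
from typing import List

# Illustrative input, shortened from the configure options used in section 2.4.
SAMPLE_CONFIGURE_LINE = (
    "configure arguments: --prefix=/usr/local/nginx --with-threads "
    "--add-module=/opt/src/nginx_upstream_check_module-0.4.0 "
    "--add-module=/opt/src/nginx-module-vts"
)


def added_modules(configure_line: str) -> List[str]:
    """Return the paths passed via --add-module=..., in the order they appear."""
    prefix = "--add-module="
    return [arg[len(prefix):] for arg in shlex.split(configure_line)
            if arg.startswith(prefix)]


if __name__ == "__main__":
    for path in added_modules(SAMPLE_CONFIGURE_LINE):
        print(path)
```

In a real check, the input line would come from running `nginx -V` and reading its standard error stream.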

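4. Start a Sample Backend Cluster with Docker

Three Docker containers simulate three backend web servers. Each returns different content so that traffic distribution is easy to tell apart.

4.1 Directory Structure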

/opt/demo/
├── docker-compose.yml
├── app1/
│   └── index.html
├── app2/
│   └── index.html
└── app3/
    └── index.html

mkdir -p /opt/demo/app{1,2,3}

4.2 Create Each Backend's Home Page

# Server 1
cat > /opt/demo/app1/index.html << 'EOF'
<!DOCTYPE html>
<html lang="zh">
<head><meta charset="UTF-8"><title>Backend Server 1</title>
<style>body{font-family:sans-serif;text-align:center;padding:60px;background:#e8f5e9;}
h1{color:#2e7d32;font-size:3em;}p{font-size:1.2em;color:#555;}</style>
</head>
<body>
  <h1>&#9989; Backend Server 1</h1>
  <p>IP: 172.20.0.11 &nbsp;|&nbsp; Port: 8080</p>
  <p>Nginx 负载均衡演示 — 你好,我是节点 <strong>A</strong></p>
</body>
</html>
EOF

# Server 2
cat > /opt/demo/app2/index.html << 'EOF'
<!DOCTYPE html>
<html lang="zh">
<head><meta charset="UTF-8"><title>Backend Server 2</title>
<style>body{font-family:sans-serif;text-align:center;padding:60px;background:#e3f2fd;}
h1{color:#1565c0;font-size:3em;}p{font-size:1.2em;color:#555;}</style>
</head>
<body>
  <h1>&#9989; Backend Server 2</h1>
  <p>IP: 172.20.0.12 &nbsp;|&nbsp; Port: 8080</p>
  <p>Nginx 负载均衡演示 — 你好,我是节点 <strong>B</strong></p>
</body>
</html>
EOF

# Server 3
cat > /opt/demo/app3/index.html << 'EOF'
<!DOCTYPE html>
<html lang="zh">
<head><meta charset="UTF-8"><title>Backend Server 3</title>
<style>body{font-family:sans-serif;text-align:center;padding:60px;background:#fff3e0;}
h1{color:#e65100;font-size:3em;}p{font-size:1.2em;color:#555;}</style>
</head>
<body>
  <h1>&#9989; Backend Server 3</h1>
  <p>IP: 172.20.0.13 &nbsp;|&nbsp; Port: 8080</p>
  <p>Nginx 负载均衡演示 — 你好,我是节点 <strong>C</strong></p>
</body>
</html>
EOF

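4.3 Health Check Endpoint (every backend needs a /health/ path)

The stock nginx image has no built-in /health route. Because each app directory is mounted into its container as the web root through a Docker volume, placing a health/index.html file in it is enough: a request for /health/ then returns the file's content.

# Run as root:
for i in 1 2 3; do
  mkdir -p /opt/demo/app${i}/health
  echo "OK" > /opt/demo/app${i}/health/index.html
done

# Alternatively, as a non-root user with sudo:
for i in 1 2 3; do
  sudo mkdir -p /opt/demo/app${i}/health
  echo "OK" | sudo tee /opt/demo/app${i}/health/index.html > /dev/null
done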

4.4 docker-compose.yml

cat > /opt/demo/docker-compose.yml << 'EOF'
version: "3.9"

networks:
  backend_net:
    driver: bridge
    ipam:
      config:
        - subnet: 172.20.0.0/24

services:
  app1:
    image: nginx:alpine
    container_name: backend_app1
    networks:
      backend_net:
        ipv4_address: 172.20.0.11
    volumes:
      - ./app1:/usr/share/nginx/html:ro
    ports:
      - "8081:80"    # host port mapping, handy for direct access while debugging
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost/health/"]
      interval: 10s
      timeout: 3s
      retries: 3

  app2:
    image: nginx:alpine
    container_name: backend_app2
    networks:
      backend_net:
        ipv4_address: 172.20.0.12
    volumes:
      - ./app2:/usr/share/nginx/html:ro
    ports:
      - "8082:80"
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost/health/"]
      interval: 10s
      timeout: 3s
      retries: 3

  app3:
    image: nginx:alpine
    container_name: backend_app3
    networks:
      backend_net:
        ipv4_address: 172.20.0.13
    volumes:
      - ./app3:/usr/share/nginx/html:ro
    ports:
      - "8083:80"
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost/health/"]
      interval: 10s
      timeout: 3s
      retries: 3
EOF

4.5 Start the Backend Containers

cd /opt/demo
docker compose up -d

# Check the running state
docker compose ps

Expected output:

NAME             IMAGE          COMMAND                  STATUS
backend_app1     nginx:alpine   "/docker-entrypoint.…"   Up (healthy)
backend_app2     nginx:alpine   "/docker-entrypoint.…"   Up (healthy)
backend_app3     nginx:alpine   "/docker-entrypoint.…"   Up (healthy)

4.6 Verify That the Backends Are Directly Reachable

curl http://172.20.0.11/
curl http://172.20.0.12/
curl http://172.20.0.13/

# Verify the health check endpoint
curl http://172.20.0.11/health/
# Expected response: OK

5. Complete Nginx Configuration

5.1 Directory Structure

/usr/local/nginx/conf/
├── nginx.conf              ← main configuration file
└── conf.d/
    ├── upstream.conf       ← upstream definitions (backend clusters + health checks)
    └── demo.conf           ← virtual hosts + location rules

mkdir -p /usr/local/nginx/conf/conf.d

5.2 Main Configuration File nginx.conf

cat > /usr/local/nginx/conf/nginx.conf << 'EOF'
# ============================================================
# nginx.conf: main configuration file
# Installation prefix: /usr/local/nginx
# ============================================================

user  nginx;
worker_processes  auto;                          # match the number of CPU cores

# Error log
error_log  /var/log/nginx/error.log  warn;
pid        /usr/local/nginx/logs/nginx.pid;

# Maximum open file descriptors per worker (keep consistent with ulimit -n)
worker_rlimit_nofile 65535;

events {
    worker_connections  10240;
    use epoll;                                   # high-performance I/O multiplexing on Linux
    multi_accept on;                             # accept as many new connections as possible at once
}

http {
    include       /usr/local/nginx/conf/mime.types;
    default_type  application/octet-stream;

    # ----------------------------------------------------------
    # Log format (includes fields for upstream troubleshooting)
    # ----------------------------------------------------------
    log_format  main  '$remote_addr - $remote_user [$time_local] '
                      '"$request" $status $body_bytes_sent '
                      '"$http_referer" "$http_user_agent" '
                      'rt=$request_time '
                      'upstream=$upstream_addr '
                      'up_status=$upstream_status '
                      'up_rt=$upstream_response_time';

    access_log  /var/log/nginx/access.log  main;

    # ----------------------------------------------------------
    # Basic tuning
    # ----------------------------------------------------------
    sendfile            on;
    tcp_nopush          on;
    tcp_nodelay         on;
    keepalive_timeout   65;
    server_tokens       off;                     # hide the version number

    # Gzip compression
    gzip                on;
    gzip_comp_level     4;
    gzip_min_length     1k;
    gzip_types          text/plain text/css application/json
                        application/javascript text/xml application/xml;

    # ----------------------------------------------------------
    # Include the per-site configuration files
    # ----------------------------------------------------------
    include /usr/local/nginx/conf/conf.d/*.conf;
}
EOF

5.3 Upstream Definition File conf.d/upstream.conf

cat > /usr/local/nginx/conf/conf.d/upstream.conf << 'EOF'
# ============================================================
# upstream.conf: backend cluster definitions + health checks
# Backends: 3 Docker containers with fixed IPs 172.20.0.11-13
# ============================================================

# ----------------------------------------------------------
# General backend cluster: weighted round robin + passive checks (max_fails / fail_timeout)
# ----------------------------------------------------------
upstream demo_backend {
    # Weighted round robin: the three backends are identical, so the weights are equal
    server 172.20.0.11:80  weight=1  max_fails=3  fail_timeout=30s;
    server 172.20.0.12:80  weight=1  max_fails=3  fail_timeout=30s;
    server 172.20.0.13:80  weight=1  max_fails=3  fail_timeout=30s;

    # ----------------------------------------------------------
    # Active health check (nginx_upstream_check_module)
    # Probe every 5 seconds; mark down after 3 consecutive failures; mark up after 2 consecutive successes
    # ----------------------------------------------------------
    check interval=5000 rise=2 fall=3 timeout=2000 type=http;
    check_http_send "GET /health/ HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n";
    check_http_expect_alive http_2xx;

    # Keep idle connections to the backends open to reduce TCP handshake overhead
    keepalive 16;
}

# ----------------------------------------------------------
# IP Hash cluster (demonstrates session persistence)
# ----------------------------------------------------------
upstream demo_iphash {
    ip_hash;
    server 172.20.0.11:80  max_fails=3  fail_timeout=30s;
    server 172.20.0.12:80  max_fails=3  fail_timeout=30s;
    server 172.20.0.13:80  max_fails=3  fail_timeout=30s;

    check interval=5000 rise=2 fall=3 timeout=2000 type=http;
    check_http_send "GET /health/ HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n";
    check_http_expect_alive http_2xx;
}

# ----------------------------------------------------------
# Least-connections cluster (demonstrates distribution of long-running requests)
# ----------------------------------------------------------
upstream demo_least_conn {
    least_conn;
    server 172.20.0.11:80  max_fails=3  fail_timeout=30s;
    server 172.20.0.12:80  max_fails=3  fail_timeout=30s;
    server 172.20.0.13:80  max_fails=3  fail_timeout=30s;

    check interval=5000 rise=2 fall=3 timeout=2000 type=http;
    check_http_send "GET /health/ HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n";
    check_http_expect_alive http_2xx;
}
EOF
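The rise/fall behaviour configured above (rise=2, fall=3) can be modelled as a small state machine. This is a simplified sketch for intuition only, not the module's actual implementation; the class name `PeerState` and the `initially_up` flag are assumptions made for this example (in the real module, the initial state is governed by the `default_down` parameter of the `check` directive).

```python
from typing import Iterable, List


class PeerState:
    """Toy model of one upstream peer under rise/fall health checking."""

    def __init__(self, rise: int = 2, fall: int = 3, initially_up: bool = True):
        self.rise = rise              # consecutive successes needed to come back up
        self.fall = fall              # consecutive failures needed to be marked down
        self.up = initially_up        # assumption of this sketch, see lead-in
        self.ok_streak = 0
        self.fail_streak = 0

    def record(self, probe_ok: bool) -> bool:
        """Feed one probe result and return whether the peer is now considered up."""
        if probe_ok:
            self.ok_streak += 1
            self.fail_streak = 0
            if not self.up and self.ok_streak >= self.rise:
                self.up = True
        else:
            self.fail_streak += 1
            self.ok_streak = 0
            if self.up and self.fail_streak >= self.fall:
                self.up = False
        return self.up


def trace(results: Iterable[bool], rise: int = 2, fall: int = 3) -> List[bool]:
    """Return the up/down state after each probe result."""
    peer = PeerState(rise=rise, fall=fall)
    return [peer.record(r) for r in results]


if __name__ == "__main__":
    # Three failed probes take the peer down; two successful probes bring it back.
    print(trace([False, False, False, True, True]))
```

A single failed probe between successes resets the streak, which is why a briefly flapping backend is not removed immediately.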

5.4 Virtual Host Configuration File conf.d/demo.conf

cat > /usr/local/nginx/conf/conf.d/demo.conf << 'EOF'
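# ============================================================
# demo.conf: virtual host configuration
# Listening ports:
#   80   → main service (round robin)
#   81   → IP Hash demo
#   82   → least-connections demo
#   8888 → health and status monitoring pages (internal network only)
# ============================================================

# ----------------------------------------------------------
# Main service: weighted round robin
# ----------------------------------------------------------
server {
    listen       80;
    server_name  _;                              # placeholder name; as the only server on port 80, this block is the default for every Host

    # Common proxy headers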
    proxy_set_header  Host              $host;
    proxy_set_header  X-Real-IP         $remote_addr;
    proxy_set_header  X-Forwarded-For   $proxy_add_x_forwarded_for;
    proxy_set_header  X-Forwarded-Proto $scheme;

    # Timeouts (important: keeps requests from hanging when a backend is down)
    proxy_connect_timeout  3s;
    proxy_send_timeout     10s;
    proxy_read_timeout     30s;

    # Use HTTP/1.1 so that connections to the upstream can be kept alive
    proxy_http_version  1.1;
    proxy_set_header    Connection "";

    # ----------------------------------------------------------
    # Reads (mostly GET): retry on other backends
    # ----------------------------------------------------------
    location / {
        proxy_pass http://demo_backend;

        proxy_next_upstream         error timeout http_502 http_503 http_504;
        proxy_next_upstream_tries   3;
        proxy_next_upstream_timeout 10s;
    }

    # ----------------------------------------------------------
    # Writes (mostly POST): disable automatic retries to prevent duplicate submissions
    # ----------------------------------------------------------
    location /api/write/ {
        proxy_pass http://demo_backend;
        proxy_next_upstream off;                 # never retry writes
    }

    # ----------------------------------------------------------
    # Fallback page (shown when all backends are down)
    # ----------------------------------------------------------
    proxy_intercept_errors on;
    error_page 502 503 504 /50x.html;

    location = /50x.html {
        root  /usr/local/nginx/html;
        internal;
    }
}

# ----------------------------------------------------------
# IP Hash demo: the same client IP always reaches the same backend
# ----------------------------------------------------------
server {
    listen  81;
    server_name _;

    proxy_set_header Host            $host;
    proxy_set_header X-Real-IP       $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_connect_timeout 3s;
    proxy_read_timeout    30s;
    proxy_http_version    1.1;
    proxy_set_header      Connection "";

    location / {
        proxy_pass http://demo_iphash;
        # With IP Hash, proxy_next_upstream is not recommended (a retry breaks the hash binding)
        proxy_next_upstream off;
    }
}

# ----------------------------------------------------------
# Least-connections demo
# ----------------------------------------------------------
server {
    listen  82;
    server_name _;

    proxy_set_header Host            $host;
    proxy_set_header X-Real-IP       $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_connect_timeout 3s;
    proxy_read_timeout    60s;
    proxy_http_version    1.1;
    proxy_set_header      Connection "";

    location / {
        proxy_pass http://demo_least_conn;
        proxy_next_upstream error timeout http_502 http_503 http_504;
        proxy_next_upstream_tries 3;
    }
}

# ----------------------------------------------------------
# Health and status monitoring pages (internal access only)
# ----------------------------------------------------------
server {
    listen  8888;
    server_name _;

    # Allow only internal IP ranges
    allow  127.0.0.1;
    allow  10.0.0.0/8;
    allow  172.16.0.0/12;
    allow  192.168.0.0/16;
    deny   all;

    # nginx_upstream_check_module status page
    location /upstream_check {
        check_status;
        access_log off;
    }

    # nginx_upstream_check_module status in JSON (suited to programmatic parsing)
    location /upstream_check_json {
        check_status json;
        access_log off;
    }

    # Basic Nginx connection status
    location /nginx_status {
        stub_status;
        access_log off;
    }

    # nginx-module-vts traffic statistics (if the module was compiled in)
    location /vts_status {
        vhost_traffic_status_display;
        vhost_traffic_status_display_format html;
        access_log off;
    }
}
EOF

5.5 Create the Fallback 50x Page

cat > /usr/local/nginx/html/50x.html << 'EOF'
<!DOCTYPE html>
<html lang="zh">
<head>
  <meta charset="UTF-8">
  <title>服务暂时不可用</title>
  <style>
    body { font-family: sans-serif; text-align: center; padding: 80px; background: #fafafa; }
    h1   { font-size: 4em; color: #d32f2f; }
    p    { font-size: 1.3em; color: #555; }
  </style>
</head>
<body>
  <h1>&#128683; 503</h1>
  <p>后端服务暂时不可用,请稍后重试</p>
</body>
</html>
EOF

6. Starting and Managing Nginx

6.1 Syntax Check and Startup

# Check the configuration syntax
nginx -t

# Expected output
# nginx: the configuration file /usr/local/nginx/conf/nginx.conf syntax is ok
# nginx: configuration file /usr/local/nginx/conf/nginx.conf test is successful

# Start through systemd
systemctl start nginx
systemctl status nginx

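6.2 Common Management Commands

# Start
systemctl start nginx
# or
/usr/local/nginx/sbin/nginx

# Graceful stop (waits for in-flight connections to finish)
nginx -s quit
# or (the unit file from section 2.7 sends QUIT on stop)
systemctl stop nginx

# Fast stop (terminates immediately)
nginx -s stop

# Graceful reload (no service interruption; use this for configuration changes in production)
nginx -s reload
# or
systemctl reload nginx

# Reopen log files (use after log rotation)
nginx -s reopen

# Show the master/worker processes
ps aux | grep nginx

# Show the listening ports
ss -tlnp | grep nginx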

6.3 Verify Start at Boot

systemctl is-enabled nginx
# Output: enabled

7. Functional Verification

7.1 Verify Round-Robin Load Balancing

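Send 6 consecutive requests and observe how they are distributed:

for i in $(seq 1 6); do
    start=$(date +%s%N)
    # Each page contains "Backend Server N" in both <title> and <h1>, so keep only the first match
    result=$(curl -s http://localhost:80/ | grep -oP 'Backend Server \K\d' | head -n1)
    end=$(date +%s%N)
    elapsed=$(( (end - start) / 1000000 ))
    echo "${result}  ← request ${i}  elapsed: ${elapsed}ms"
done

Expected output (the three backends take turns; elapsed times vary and are shown as <t>):

1  ← request 1  elapsed: <t>ms
2  ← request 2  elapsed: <t>ms
3  ← request 3  elapsed: <t>ms
1  ← request 4  elapsed: <t>ms
2  ← request 5  elapsed: <t>ms
3  ← request 6  elapsed: <t>ms

With worker_processes auto, each worker process keeps its own round-robin position, so on a multi-core host the order may not be strictly 1, 2, 3, although the overall distribution stays even.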
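To build intuition for why equal weights produce a strict rotation, here is a small simulation of smooth weighted round robin, the general technique Nginx's round-robin balancer is based on. It is a sketch of the algorithm within a single worker, not Nginx's code; the function name and the dictionary-based bookkeeping are choices made for this example.

```python
from typing import Dict, List


def smooth_weighted_round_robin(weights: Dict[str, int], n: int) -> List[str]:
    """Return the first n picks for peers with the given static weights."""
    current = {name: 0 for name in weights}    # running "current weight" per peer
    total = sum(weights.values())
    picks = []
    for _ in range(n):
        # Every peer gains its own weight on each round.
        for name, weight in weights.items():
            current[name] += weight
        # The peer with the highest current weight wins; ties go to the first listed.
        best = max(current, key=current.get)
        # The winner pays back the total weight so the others can catch up.
        current[best] -= total
        picks.append(best)
    return picks


if __name__ == "__main__":
    backends = {"172.20.0.11:80": 1, "172.20.0.12:80": 1, "172.20.0.13:80": 1}
    for address in smooth_weighted_round_robin(backends, 6):
        print(address)
```

Changing one weight to 2 in the dictionary shows how a heavier backend is interleaved with the others instead of receiving its share in a burst.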

7.2 Verify IP Hash Session Persistence

for i in $(seq 1 6); do
    curl -s http://localhost:81/ | grep -oP 'Backend Server \K\d' | head -n1
done

Expected: all 6 requests return the same node number (the same client IP always hits the same backend).
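The persistence observed above follows from hashing the client address. The sketch below illustrates the idea for IPv4 with equally weighted backends: only the first three octets are hashed, so every client in the same /24 network lands on the same backend. The constants (89, 113, 6271) are recalled from the Nginx source and should be treated as illustrative; the function name is made up, and the real module also handles weights, IPv6, and failed peers.

```python
from typing import List

BACKENDS = ["172.20.0.11:80", "172.20.0.12:80", "172.20.0.13:80"]


def ip_hash_pick(client_ip: str, backends: List[str]) -> str:
    """Pick a backend from the first three octets of an IPv4 address (simplified)."""
    octets = [int(part) for part in client_ip.split(".")[:3]]
    value = 89                                  # illustrative seed, see lead-in
    for octet in octets:
        value = (value * 113 + octet) % 6271
    # With equal weights, the hash simply indexes into the backend list.
    return backends[value % len(backends)]


if __name__ == "__main__":
    for ip in ("192.168.1.10", "192.168.1.200", "10.1.2.3"):
        print(ip, "->", ip_hash_pick(ip, BACKENDS))
```

One practical consequence: clients behind the same NAT gateway or in the same /24 are not spread across backends, which can skew load.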

7.3 Verify the Active Health Check Status Page

# HTML format
curl http://localhost:8888/upstream_check

# JSON format (suited to script parsing)
curl http://localhost:8888/upstream_check_json | python3 -m json.tool

Sample JSON output:

{
    "servers": {
        "total": 3,
        "generation": 1,
        "peers": [
            {"index": 0, "upstream": "demo_backend", "name": "172.20.0.11:80", "status": "up", ...},
            {"index": 1, "upstream": "demo_backend", "name": "172.20.0.12:80", "status": "up", ...},
            {"index": 2, "upstream": "demo_backend", "name": "172.20.0.13:80", "status": "up", ...}
        ]
    }
}
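A monitoring script usually only needs the peers that are not up. The following sketch filters them out of a payload shaped like the sample above; `SAMPLE_PAYLOAD` is hand-written test data that uses only the fields shown in the sample (with one peer set to down for illustration), and `down_peers` is a hypothetical helper name.

```python
import json
from typing import List, Tuple

# Hand-written test data using only the fields visible in the sample above.
SAMPLE_PAYLOAD = """
{"servers": {"total": 3, "generation": 1, "peers": [
  {"index": 0, "upstream": "demo_backend", "name": "172.20.0.11:80", "status": "up"},
  {"index": 1, "upstream": "demo_backend", "name": "172.20.0.12:80", "status": "down"},
  {"index": 2, "upstream": "demo_backend", "name": "172.20.0.13:80", "status": "up"}
]}}
"""


def down_peers(payload: str) -> List[Tuple[str, str]]:
    """Return (upstream, peer address) pairs whose status is anything other than 'up'."""
    data = json.loads(payload)
    return [(peer["upstream"], peer["name"])
            for peer in data["servers"]["peers"]
            if peer["status"] != "up"]


if __name__ == "__main__":
    for upstream, name in down_peers(SAMPLE_PAYLOAD):
        print(f"DOWN: {name} in {upstream}")
```

In practice the payload would be fetched from the /upstream_check_json endpoint on port 8888 instead of a constant.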

7.4 Simulate a Backend Failure and Verify Automatic Removal

# Stop the app2 container
docker stop backend_app2

# Wait about 15 seconds (5 s check interval × fall=3 = 15 s)
sleep 16

# Check the health status (app2 should be down)
curl -s http://localhost:8888/upstream_check_json | python3 -m json.tool

# Verify that requests no longer reach app2
for i in $(seq 1 9); do
    curl -s http://localhost/ | grep -oP 'Backend Server \K\d' | head -n1
done
# Expected: only 1 and 3 appear, never 2

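7.5 Simulate Backend Recovery and Verify Automatic Re-Addition

# Bring app2 back
docker start backend_app2

# Wait about 10 seconds (5 s check interval × rise=2 = 10 s)
sleep 12

# Verify again (2 should reappear in the rotation)
for i in $(seq 1 9); do
    curl -s http://localhost/ | grep -oP 'Backend Server \K\d' | head -n1
done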
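The wait times used in sections 7.4 and 7.5 come from multiplying the check interval by the fall or rise count. A tiny sketch of that arithmetic follows; the function name is invented, and the result is only an approximation, since it ignores the probe timeout and where in the interval the failure happens.

```python
def state_change_delay_s(interval_ms: int, consecutive_probes: int) -> float:
    """Approximate seconds until a state change: check interval × required probes."""
    return interval_ms * consecutive_probes / 1000


if __name__ == "__main__":
    # Values from the check directive: interval=5000 rise=2 fall=3
    print("removal after about", state_change_delay_s(5000, 3), "s")
    print("recovery after about", state_change_delay_s(5000, 2), "s")
```

Lowering the interval shortens detection time at the cost of more probe traffic to every backend.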

7.6 Inspect the Upstream Fields in access_log

tail -f /var/log/nginx/access.log

Sample log lines:

127.0.0.1 - - [11/May/2026:10:00:01 +0800] "GET / HTTP/1.1" 200 512 "-" "curl/7.81.0" rt=0.003 upstream=172.20.0.11:80 up_status=200 up_rt=0.002
127.0.0.1 - - [11/May/2026:10:00:02 +0800] "GET / HTTP/1.1" 200 512 "-" "curl/7.81.0" rt=0.002 upstream=172.20.0.12:80 up_status=200 up_rt=0.001
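Lines in this format can be split into named fields with a regular expression. The sketch below mirrors the log_format main defined in section 5.2; the group names follow the Nginx variable names, `parse_line` is a hypothetical helper, and the pattern is an assumption that should be adjusted if the log format changes. The upstream fields are matched loosely because Nginx writes several comma-separated values there when a request is retried.

```python
import re
from typing import Dict, Optional

LINE_RE = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)" '
    r'rt=(?P<request_time>\S+) '
    r'upstream=(?P<upstream_addr>.+?) '
    r'up_status=(?P<upstream_status>.+?) '
    r'up_rt=(?P<upstream_response_time>.+)$'
)

# First sample line from above.
SAMPLE_LINE = (
    '127.0.0.1 - - [11/May/2026:10:00:01 +0800] "GET / HTTP/1.1" 200 512 '
    '"-" "curl/7.81.0" rt=0.003 upstream=172.20.0.11:80 up_status=200 up_rt=0.002'
)


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Return the named fields of one access-log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    fields = parse_line(SAMPLE_LINE)
    print(fields["upstream_addr"], fields["upstream_status"], fields["upstream_response_time"])
```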

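8. Routine Maintenance Commands

8.1 Log Management

# Follow the access log in real time
tail -f /var/log/nginx/access.log

# Follow the error log in real time
tail -f /var/log/nginx/error.log

# Filter by status code (show the 20 most recent 5xx responses)
awk '$9 ~ /^5/' /var/log/nginx/access.log | tail -20

# Find records of upstream servers disabled by the passive check
grep 'upstream server temporarily disabled' /var/log/nginx/error.log

# After rotating a log file, tell Nginx to reopen its log files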
mv /var/log/nginx/access.log /var/log/nginx/access.log.$(date +%Y%m%d)
nginx -s reopen

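8.2 Configuration Change Procedure

# 1. Edit the configuration
vi /usr/local/nginx/conf/conf.d/demo.conf

# 2. Check the syntax (the service keeps running)
nginx -t

# 3. Reload gracefully (existing connections are not interrupted)
nginx -s reload

# 4. Confirm that the change took effect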
curl -I http://localhost/

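8.3 Managing the Docker Backends

cd /opt/demo

# Show the state of all containers
docker compose ps

# Stop a single backend (simulate a failure)
docker compose stop app2

# Bring a single backend back
docker compose start app2

# Recreate all containers (after a configuration change)
docker compose up -d --force-recreate

# Follow the logs of one backend
docker compose logs -f app1

# Stop and remove all containers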
docker compose down

8.4 Load Testing (basic check with ab)

# Install ab (Apache Bench)
apt-get install -y apache2-utils

# 50 concurrent clients, 5000 requests in total
ab -n 5000 -c 50 http://localhost/

# Count the requests received by each node
grep 'upstream=' /var/log/nginx/access.log | \
    grep -oP 'upstream=\K[^:]+' | \
    sort | uniq -c | sort -rn
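The same per-node tally can be computed in Python, which is convenient when the counts feed into further checks. This is a sketch equivalent to the grep pipeline above; the function name and the sample lines are made up for illustration, and only the `upstream=` field from the log format in section 5.2 is assumed.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

UPSTREAM_RE = re.compile(r"upstream=([^:\s]+)")


def requests_per_node(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Count log lines per upstream IP, most frequent first."""
    counts = Counter()
    for line in lines:
        match = UPSTREAM_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common()


if __name__ == "__main__":
    # Shortened, made-up log lines; real input would come from access.log.
    sample = [
        "... rt=0.003 upstream=172.20.0.11:80 up_status=200 up_rt=0.002",
        "... rt=0.002 upstream=172.20.0.12:80 up_status=200 up_rt=0.001",
        "... rt=0.002 upstream=172.20.0.11:80 up_status=200 up_rt=0.001",
    ]
    for ip, count in requests_per_node(sample):
        print(count, ip)
```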

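9. nginx-module-vts Traffic Statistics Demo

nginx-module-vts (Virtual Host Traffic Status) provides real-time traffic statistics per virtual host, per upstream, and per filter, with output in HTML, JSON, or Prometheus format.

9.1 Enable Statistics in nginx.conf

vhost_traffic_status_zone must be placed inside the http {} block; it tells the module to allocate shared memory:

http {
    # Enable vts statistics with a 10 MB shared memory zone
    # (without an argument the module falls back to its smaller default size)
    vhost_traffic_status_zone shared:vhost_traffic_status:10m;

    # (Optional) key server statistics by the request's Host header instead of server_name
    vhost_traffic_status_filter_by_host on;

    include /usr/local/nginx/conf/conf.d/*.conf;
}

Add these two directives to the http {} block of nginx.conf (before the include line), then reload:

nginx -t && nginx -s reload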

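9.2 Open the HTML Dashboard

# Open in a browser, or fetch with curl
curl http://localhost:8888/vts_status

The page shows:

  • Total server requests / connections / bandwidth
  • Request volume, response code distribution, and response time for each server_name
  • Request count, failure count, and response time for each upstream node

9.3 JSON Endpoint Demo

Fetch the full JSON data

curl -s http://localhost:8888/vts_status/format/json | python3 -m json.tool

JSON structure: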

{
  "nowMsec": 1746931200000,
  "connections": {
    "active": 3,
    "reading": 0,
    "writing": 1,
    "waiting": 2,
    "accepted": 12580,
    "handled": 12580,
    "requests": 38291
  },
  "serverZones": {
    "_": {
      "requestCounter": 38291,
      "inBytes": 9823442,
      "outBytes": 187634210,
      "responses": {
        "1xx": 0, "2xx": 37854, "3xx": 0, "4xx": 112, "5xx": 325
      },
      "requestMsec": 4
    }
  },
  "upstreamZones": {
    "demo_backend": [
      {
        "server": "172.20.0.11:80",
        "requestCounter": 12762,
        "inBytes": 3274481,
        "outBytes": 62544780,
        "responses": {"1xx":0,"2xx":12762,"3xx":0,"4xx":0,"5xx":0},
        "responseMsec": 2,
        "weight": 1,
        "maxFails": 3,
        "failTimeout": 30,
        "backup": false,
        "down": false
      },
      {
        "server": "172.20.0.12:80",
        "requestCounter": 12755,
        ...
      },
      {
        "server": "172.20.0.13:80",
        "requestCounter": 12774,
        ...
      }
    ]
  }
}
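One common use of this structure is to derive an error rate per server zone. The sketch below computes the share of 5xx responses from a `responses` object like the one shown above; the helper name is made up, and the input dictionary simply repeats the sample numbers.

```python
from typing import Dict


def five_xx_ratio(zone: Dict) -> float:
    """Return the fraction of responses in a server zone that were 5xx."""
    responses = zone["responses"]
    total = sum(responses.values())
    return responses["5xx"] / total if total else 0.0


if __name__ == "__main__":
    # Numbers repeated from the sample serverZones entry above.
    sample_zone = {
        "requestCounter": 38291,
        "responses": {"1xx": 0, "2xx": 37854, "3xx": 0, "4xx": 112, "5xx": 325},
    }
    print(f"5xx ratio: {five_xx_ratio(sample_zone):.2%}")
```

Because the counters are cumulative, an alerting script would normally compare two snapshots taken some time apart rather than use the absolute ratio.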

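Extracting key fields (script example, run from the shell)

⚠️ Important: the VTS down field does not reflect dynamic health check results.

The down field of each node in VTS only indicates that the server is statically marked with the down parameter in the Nginx configuration;
it is unaware of the dynamic removal and recovery performed by nginx_upstream_check_module.
So even when a node has been marked unavailable by the active health check, down in VTS is still false, and a status derived from it always reads UP.

The correct approach: take node health from /upstream_check_json, take traffic statistics from the VTS JSON, and merge the two in the output.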

python3 - << 'EOF'
import json, urllib.request

MONITOR_BASE = "http://localhost:8888"

# Fetch the real, dynamic health status from upstream_check_module
check_data = json.loads(
    urllib.request.urlopen(f"{MONITOR_BASE}/upstream_check_json").read()
)
# Build a lookup table: {upstream name: {peer address: status}}
health_map = {}
for peer in check_data["servers"]["peers"]:
    health_map.setdefault(peer["upstream"], {})[peer["name"]] = peer["status"]

# Fetch traffic statistics from VTS
vts_data = json.loads(
    urllib.request.urlopen(f"{MONITOR_BASE}/vts_status/format/json").read()
)

print(f"\n{'Health':^6} {'Node':<22} {'Requests':>10} {'2xx':>8} {'5xx':>6} {'Avg ms':>10}")
print("=" * 68)

for upstream, peers in vts_data.get("upstreamZones", {}).items():
    print(f"\n[{upstream}]")
    node_health = health_map.get(upstream, {})
    for p in peers:
        # Take the real health status from upstream_check_json
        real_status = node_health.get(p["server"], "unknown")
        tag = "✓ UP  " if real_status == "up" else "✗ DOWN"
        print(f"  {tag} {p['server']:<20} "
              f"{p['requestCounter']:>10} "
              f"{p['responses']['2xx']:>8} "
              f"{p['responses']['5xx']:>6} "
              f"{p['responseMsec']:>10}")
EOF

Sample output (app2 has been removed by the health check):

Health Node                     Requests      2xx    5xx     Avg ms
====================================================================

[demo_backend]
  ✓ UP   172.20.0.11:80            12762    12762      0          2
  ✗ DOWN 172.20.0.12:80            12755    12755      0          2
  ✓ UP   172.20.0.13:80            12774    12774      0          2

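Division of responsibilities between the two modules:

| Data source | Provides | Endpoint |
|-------------|----------|----------|
| nginx_upstream_check_module | Whether a node is healthy (dynamic removal/recovery) | /upstream_check_json |
| nginx-module-vts | Per-node traffic statistics (requests / response codes / bandwidth / response time) | /vts_status/format/json |

9.4 Prometheus Format Output (for monitoring systems)
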
curl -s http://localhost:8888/vts_status/format/prometheus

Sample output (can be scraped directly by Prometheus):

# HELP nginx_vts_server_requests_total The requests counter
nginx_vts_server_requests_total{host="_",code="2xx"} 37854
nginx_vts_server_requests_total{host="_",code="5xx"} 325

# HELP nginx_vts_upstream_requests_total Upstream requests counter
nginx_vts_upstream_requests_total{upstream="demo_backend",server="172.20.0.11:80",code="2xx"} 12762
nginx_vts_upstream_requests_total{upstream="demo_backend",server="172.20.0.12:80",code="2xx"} 12755
nginx_vts_upstream_requests_total{upstream="demo_backend",server="172.20.0.13:80",code="2xx"} 12774

# HELP nginx_vts_upstream_response_seconds Average upstream response time
nginx_vts_upstream_response_seconds{upstream="demo_backend",server="172.20.0.11:80"} 0.002
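For quick ad-hoc checks without a Prometheus server, the text format can be parsed by hand. The sketch below handles only the simple `name{labels} value` lines shown above (no timestamps, no escaped quotes in label values); the function name and both regular expressions are assumptions of this example rather than part of any official client library.

```python
import re
from typing import Dict, List, Tuple

SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_metrics(text: str) -> List[Sample]:
    """Parse simple Prometheus text lines into (name, labels, value) tuples."""
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):    # skip blanks and HELP/TYPE comments
            continue
        match = SAMPLE_RE.match(line)
        if match:
            labels = dict(LABEL_RE.findall(match.group("labels") or ""))
            samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


if __name__ == "__main__":
    text = '''# HELP nginx_vts_server_requests_total The requests counter
nginx_vts_server_requests_total{host="_",code="2xx"} 37854
nginx_vts_server_requests_total{host="_",code="5xx"} 325
'''
    for name, labels, value in parse_metrics(text):
        print(name, labels, value)
```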

Add a scrape job to Prometheus's prometheus.yml:

scrape_configs:
  - job_name: 'nginx-vts'
    static_configs:
      - targets: ['nginx-host:8888']
    metrics_path: '/vts_status/format/prometheus'

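9.5 Reset Statistics on Demand

The module resets counters through its control URI (query-string commands) rather than through the HTTP DELETE method.

# Reset all counters (no Nginx restart needed)
curl "http://localhost:8888/vts_status/control?cmd=reset&group=*"

# Reset only the statistics of the demo_backend upstream group
curl "http://localhost:8888/vts_status/control?cmd=reset&group=upstream@group&zone=demo_backend"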

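9.6 Additional vts Configuration for demo.conf

In the port 8888 server block, replace the original simple /vts_status location with the following three locations:

# nginx-module-vts status endpoint (HTML dashboard)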
location /vts_status {
    vhost_traffic_status_display;
    vhost_traffic_status_display_format html;
    access_log off;

    allow 127.0.0.1;
    allow 10.0.0.0/8;
    allow 172.16.0.0/12;
    allow 192.168.0.0/16;
    deny  all;
}

# JSON format (convenient for scripts)
location /vts_status/format/json {
    vhost_traffic_status_display;
    vhost_traffic_status_display_format json;
    access_log off;

    allow 127.0.0.1;
    allow 10.0.0.0/8;
    deny  all;
}

# Prometheus format (for Prometheus + Grafana)
location /vts_status/format/prometheus {
    vhost_traffic_status_display;
    vhost_traffic_status_display_format prometheus;
    access_log off;

    allow 127.0.0.1;
    allow 10.0.0.0/8;
    deny  all;
}

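9.7 Status Endpoint Quick Reference

| Endpoint | Format | Purpose |
|----------|--------|---------|
| /vts_status | HTML | Visual dashboard for manual inspection |
| /vts_status/format/json | JSON | Parsing by scripts and programs |
| /vts_status/format/prometheus | Prometheus | Integration with Prometheus/Grafana |
| /vts_status/control?cmd=reset&group=* | JSON | Reset all counters |
| /upstream_check | HTML | Node health from upstream_check_module |
| /upstream_check_json | JSON | Node health from upstream_check_module (JSON) |
| /nginx_status | text | Basic connection counts (Active/Reading/Writing/Waiting) |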

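Appendix: Quick Troubleshooting Checklist

| Symptom | What to check |
|---------|---------------|
| Nginx fails to start | Run nginx -t to find configuration errors |
| Backend unreachable | curl http://172.20.0.11/ to reach the container directly |
| Requests stall | Check whether proxy_connect_timeout is too large |
| Node is not removed | curl http://localhost:8888/upstream_check to view its status |
| All requests hit the same backend | Check whether ip_hash is configured and review the upstream strategy |
| No upstream fields in the log | Check that log_format includes $upstream_addr |
| Docker container health check fails | docker exec backend_app1 wget -qO- http://localhost/health/ |
| check directive not recognized after the build | Check that the nginx -V output includes nginx_upstream_check_module |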

