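1. Environment Preparation
1.1 Operating System Requirements
| Item | Requirement |
|---|---|
| Operating system | Ubuntu 22.04 / CentOS 7+ / Debian 11+ |
| Kernel version | ≥ 3.10 (with epoll support) |
| Privileges | root, or a user with sudo rights |
1.2 Install Build Dependencies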
# Ubuntu / Debian
apt-get update
apt-get install -y \
build-essential \
libpcre3 libpcre3-dev \
zlib1g zlib1g-dev \
libssl-dev \
libgeoip-dev \
libxml2-dev \
libxslt1-dev \
libgd-dev \
wget \
curl \
patch \
git
# CentOS / RHEL
yum groupinstall -y "Development Tools"
yum install -y \
pcre pcre-devel \
zlib zlib-devel \
openssl openssl-devel \
geoip geoip-devel \
libxml2 libxml2-devel \
libxslt libxslt-devel \
gd gd-devel \
wget curl patch git
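1.3 Install Docker
# Install Docker Engine
curl -fsSL https://get.docker.com | sh
# Start Docker and enable it at boot
systemctl enable docker
systemctl start docker
# Install the docker compose plugin (the get.docker.com script usually installs it already;
# this apt-get form is for Ubuntu/Debian, on CentOS use yum install -y docker-compose-plugin)
apt-get install -y docker-compose-plugin
# Verify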
docker --version
docker compose version
1.4 Directory Layout
| Purpose | Path |
|---|---|
| Nginx install root | /usr/local/nginx |
| Configuration directory | /usr/local/nginx/conf |
| Log directory | /var/log/nginx |
| Web root | /usr/local/nginx/html |
| Nginx source | /opt/src/ |
mkdir -p /opt/src /var/log/nginx
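2. Build and Install Nginx from Source
2.1 Download the Nginx Source
cd /opt/src
# Download a stable release (1.26.3 is the example here; check nginx.org for the current stable version and adjust)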
NGINX_VERSION=1.26.3
wget https://nginx.org/download/nginx-${NGINX_VERSION}.tar.gz
tar zxvf nginx-${NGINX_VERSION}.tar.gz
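Version selection tips:
- mainline (odd minor version, e.g. 1.27.x): newest features, but less proven in production
- stable (even minor version, e.g. 1.26.x): the first choice for production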
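The odd/even rule is easy to check mechanically, for instance before a build script uses NGINX_VERSION. A minimal sketch; release_line is a hypothetical helper name, and it assumes a plain three-part version string:

```python
def release_line(version: str) -> str:
    """Return "stable" or "mainline" for a version string such as "1.26.3"."""
    _major, minor, _patch = (int(part) for part in version.split("."))
    # nginx convention: even minor number = stable, odd minor number = mainline
    return "mainline" if minor % 2 == 1 else "stable"


print(release_line("1.26.3"))  # stable
print(release_line("1.27.4"))  # mainline
```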
2.2 Download Third-Party Modules
cd /opt/src
# nginx_upstream_check_module: active health checks
wget https://github.com/yaoweibin/nginx_upstream_check_module/archive/refs/tags/v0.4.0.tar.gz \
-O nginx_upstream_check_module-0.4.0.tar.gz
tar zxvf nginx_upstream_check_module-0.4.0.tar.gz
# nginx-module-vts: traffic statistics (optional but recommended)
git clone https://github.com/vozlt/nginx-module-vts.git
2.3 Apply the Patch (required for upstream_check_module)
cd /opt/src/nginx-${NGINX_VERSION}
# List the available patch files
ls /opt/src/nginx_upstream_check_module-0.4.0/*.patch
# Choose the patch that matches your Nginx version
# Nginx 1.26.x / 1.25.x / 1.24.x / 1.22.x → check_1.20.1+.patch
patch -p1 < /opt/src/nginx_upstream_check_module-0.4.0/check_1.20.1+.patch
Patch quick reference:
| Nginx version | Patch |
|---|---|
| 1.20.x ~ 1.26.x | check_1.20.1+.patch |
| 1.18.x | check_1.16.1+.patch |
| 1.14.x | check_1.14.0+.patch |
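If a build script should pick the patch automatically, the quick-reference table can be encoded directly. A sketch that trusts only the rows listed above; pick_check_patch is a hypothetical name, and any version outside the table raises an error so you fall back to inspecting the *.patch files by hand:

```python
def pick_check_patch(nginx_version: str) -> str:
    """Map an nginx version to its nginx_upstream_check_module patch file."""
    major, minor = (int(part) for part in nginx_version.split(".")[:2])
    if major == 1 and 20 <= minor <= 26:
        return "check_1.20.1+.patch"
    if major == 1 and minor == 18:
        return "check_1.16.1+.patch"
    if major == 1 and minor == 14:
        return "check_1.14.0+.patch"
    # Not covered by the table: do not guess
    raise ValueError(f"nginx {nginx_version} is not in the table; choose a patch manually")


print(pick_check_patch("1.26.3"))  # check_1.20.1+.patch
```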
2.4 Configure Build Options
cd /opt/src/nginx-${NGINX_VERSION}
./configure \
--prefix=/usr/local/nginx \
--user=nginx \
--group=nginx \
--with-http_ssl_module \
--with-http_v2_module \
--with-http_realip_module \
--with-http_gzip_static_module \
--with-http_stub_status_module \
--with-http_sub_module \
--with-stream \
--with-stream_ssl_module \
--with-pcre \
--with-threads \
--add-module=/opt/src/nginx_upstream_check_module-0.4.0 \
--add-module=/opt/src/nginx-module-vts
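Common build options:
| Option | Description |
|---|---|
| --with-http_ssl_module | HTTPS support |
| --with-http_v2_module | HTTP/2 support |
| --with-http_realip_module | Recover the real client IP (behind a CDN/proxy) |
| --with-http_stub_status_module | Basic status/monitoring endpoint |
| --with-stream | TCP/UDP proxying (layer-4 load balancing) |
| --with-threads | Thread pool support |
| --add-module=PATH | Add a third-party module |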
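2.5 Compile and Install
# Compile in parallel (-j$(nproc) runs one job per CPU core)
make -j$(nproc)
# Install
make install
⚠️ Note: do not run nginx -t right after installing!
Because configure was run with --user=nginx (and the nginx.conf in section 5.2 sets user nginx;), Nginx expects a system user named nginx. Create it in section 2.6 first,
otherwise nginx -t fails with getpwnam("nginx") failed.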
2.6 Create the nginx System User
# Create a system user that cannot log in
useradd -r -s /sbin/nologin -d /usr/local/nginx nginx
# Fix directory ownership
chown -R nginx:nginx /usr/local/nginx/html
chown -R nginx:nginx /var/log/nginx
2.7 Create a systemd Service
cat > /etc/systemd/system/nginx.service << 'EOF'
[Unit]
Description=Nginx HTTP Server
After=network.target
[Service]
Type=forking
PIDFile=/usr/local/nginx/logs/nginx.pid
ExecStartPre=/usr/local/nginx/sbin/nginx -t
ExecStart=/usr/local/nginx/sbin/nginx
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s QUIT $MAINPID
PrivateTmp=true
Restart=on-failure
RestartSec=5s
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable nginx
2.8 Add to PATH
ln -sf /usr/local/nginx/sbin/nginx /usr/local/bin/nginx
# Verify
nginx -v
Sample output:
nginx version: nginx/1.26.3
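3. Verify nginx_upstream_check_module
The third-party modules were already compiled in via --add-module during the build; this section verifies that they took effect.
# Test the configuration. The check directive is only exercised once the config from section 5 is in place; until then, nginx -V below is the reliable check
nginx -t
# List the modules compiled in
nginx -V 2>&1 | tr ' ' '\n' | grep add-module
Expected output: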
--add-module=/opt/src/nginx_upstream_check_module-0.4.0
--add-module=/opt/src/nginx-module-vts
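4. Launch a Sample Backend Cluster with Docker
Three Docker containers stand in for three backend web servers; each returns different content so you can tell how traffic is distributed.
4.1 Directory Structure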
/opt/demo/
├── docker-compose.yml
├── app1/
│ └── index.html
├── app2/
│ └── index.html
└── app3/
└── index.html
mkdir -p /opt/demo/app{1,2,3}
4.2 Create a Home Page for Each Backend
# Server 1
cat > /opt/demo/app1/index.html << 'EOF'
<!DOCTYPE html>
<html lang="en">
<head><meta charset="UTF-8"><title>Backend Server 1</title>
<style>body{font-family:sans-serif;text-align:center;padding:60px;background:#e8f5e9;}
h1{color:#2e7d32;font-size:3em;}p{font-size:1.2em;color:#555;}</style>
</head>
<body>
<h1>✅ Backend Server 1</h1>
<p>IP: 172.20.0.11 | Port: 80</p>
<p>Nginx load balancing demo: hello, I am node <strong>A</strong></p>
</body>
</html>
EOF
# Server 2
cat > /opt/demo/app2/index.html << 'EOF'
<!DOCTYPE html>
<html lang="en">
<head><meta charset="UTF-8"><title>Backend Server 2</title>
<style>body{font-family:sans-serif;text-align:center;padding:60px;background:#e3f2fd;}
h1{color:#1565c0;font-size:3em;}p{font-size:1.2em;color:#555;}</style>
</head>
<body>
<h1>✅ Backend Server 2</h1>
<p>IP: 172.20.0.12 | Port: 80</p>
<p>Nginx load balancing demo: hello, I am node <strong>B</strong></p>
</body>
</html>
EOF
# Server 3
cat > /opt/demo/app3/index.html << 'EOF'
<!DOCTYPE html>
<html lang="en">
<head><meta charset="UTF-8"><title>Backend Server 3</title>
<style>body{font-family:sans-serif;text-align:center;padding:60px;background:#fff3e0;}
h1{color:#e65100;font-size:3em;}p{font-size:1.2em;color:#555;}</style>
</head>
<body>
<h1>✅ Backend Server 3</h1>
<p>IP: 172.20.0.13 | Port: 80</p>
<p>Nginx load balancing demo: hello, I am node <strong>C</strong></p>
</body>
</html>
EOF
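4.3 Health Check Endpoint (every backend needs a /health path)
The nginx:alpine image has no built-in /health route. Because each appN directory is mounted as the container's web root (see the volumes in 4.4), creating a health/index.html inside it is enough for GET /health/ to return OK: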
for i in 1 2 3; do
mkdir -p /opt/demo/app${i}/health
echo "OK" > /opt/demo/app${i}/health/index.html
done
# Alternative for non-root users (run this instead of the loop above)
for i in 1 2 3; do
sudo mkdir -p /opt/demo/app${i}/health
echo "OK" | sudo tee /opt/demo/app${i}/health/index.html > /dev/null
done
4.4 docker-compose.yml
cat > /opt/demo/docker-compose.yml << 'EOF'
version: "3.9"  # ignored by Compose v2, which only prints an "obsolete" warning
networks:
  backend_net:
    driver: bridge
    ipam:
      config:
        - subnet: 172.20.0.0/24
services:
  app1:
    image: nginx:alpine
    container_name: backend_app1
    networks:
      backend_net:
        ipv4_address: 172.20.0.11
    volumes:
      - ./app1:/usr/share/nginx/html:ro
    ports:
      - "8081:80"  # host port mapping for direct access while debugging
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost/health/"]
      interval: 10s
      timeout: 3s
      retries: 3
  app2:
    image: nginx:alpine
    container_name: backend_app2
    networks:
      backend_net:
        ipv4_address: 172.20.0.12
    volumes:
      - ./app2:/usr/share/nginx/html:ro
    ports:
      - "8082:80"
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost/health/"]
      interval: 10s
      timeout: 3s
      retries: 3
  app3:
    image: nginx:alpine
    container_name: backend_app3
    networks:
      backend_net:
        ipv4_address: 172.20.0.13
    volumes:
      - ./app3:/usr/share/nginx/html:ro
    ports:
      - "8083:80"
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost/health/"]
      interval: 10s
      timeout: 3s
      retries: 3
EOF
4.5 Start the Backend Containers
cd /opt/demo
docker compose up -d
# Check status
docker compose ps
Expected output:
NAME IMAGE COMMAND STATUS
backend_app1 nginx:alpine "/docker-entrypoint.…" Up (healthy)
backend_app2 nginx:alpine "/docker-entrypoint.…" Up (healthy)
backend_app3 nginx:alpine "/docker-entrypoint.…" Up (healthy)
4.6 Verify the Backends Are Directly Reachable
curl http://172.20.0.11/
curl http://172.20.0.12/
curl http://172.20.0.13/
# Verify the health check endpoint
curl http://172.20.0.11/health/
# Expected response: OK
5. Complete Nginx Configuration
5.1 Directory Structure
/usr/local/nginx/conf/
├── nginx.conf ← main configuration file
└── conf.d/
├── upstream.conf ← upstream definitions (backend clusters + health checks)
└── demo.conf ← virtual hosts + location rules
mkdir -p /usr/local/nginx/conf/conf.d
5.2 Main Configuration File nginx.conf
cat > /usr/local/nginx/conf/nginx.conf << 'EOF'
# ============================================================
# nginx.conf: main configuration file
# Install prefix: /usr/local/nginx
# ============================================================
user nginx;
worker_processes auto; # one worker per CPU core
# Error log
error_log /var/log/nginx/error.log warn;
pid /usr/local/nginx/logs/nginx.pid;
# Max open file descriptors per worker (keep in line with ulimit -n)
worker_rlimit_nofile 65535;
events {
worker_connections 10240;
use epoll; # Linux high-performance I/O multiplexing
multi_accept on; # accept as many pending connections as possible at once
}
http {
include /usr/local/nginx/conf/mime.types;
default_type application/octet-stream;
# ----------------------------------------------------------
# Log format (includes upstream fields for troubleshooting)
# ----------------------------------------------------------
log_format main '$remote_addr - $remote_user [$time_local] '
'"$request" $status $body_bytes_sent '
'"$http_referer" "$http_user_agent" '
'rt=$request_time '
'upstream=$upstream_addr '
'up_status=$upstream_status '
'up_rt=$upstream_response_time';
access_log /var/log/nginx/access.log main;
# ----------------------------------------------------------
# Basic tuning
# ----------------------------------------------------------
sendfile on;
tcp_nopush on;
tcp_nodelay on;
keepalive_timeout 65;
server_tokens off; # hide the Nginx version number
# Gzip compression
gzip on;
gzip_comp_level 4;
gzip_min_length 1k;
gzip_types text/plain text/css application/json
application/javascript text/xml application/xml;
# ----------------------------------------------------------
# Include sub-configurations
# ----------------------------------------------------------
include /usr/local/nginx/conf/conf.d/*.conf;
}
EOF
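A common rule of thumb for the values above: total connections ≈ worker_processes × worker_connections, and as a reverse proxy each client connection is usually paired with an upstream connection, so roughly half of that budget is left for clients. The helper names below are hypothetical and the 4-core host is an assumption; treat the result as an upper bound, not a guarantee:

```python
def max_clients(worker_processes: int, worker_connections: int,
                reverse_proxy: bool = True) -> int:
    """Rule-of-thumb ceiling on concurrent client connections.

    worker_connections caps ALL connections a worker holds; when proxying,
    each client usually also needs one upstream connection.
    """
    total = worker_processes * worker_connections
    return total // 2 if reverse_proxy else total


def rlimit_covers(worker_rlimit_nofile: int, worker_connections: int) -> bool:
    """Each connection is one file descriptor, so the per-worker fd limit must
    be at least worker_connections (logs and served files need extra room)."""
    return worker_rlimit_nofile >= worker_connections


# Hypothetical 4-core host, so "worker_processes auto" starts 4 workers
print(max_clients(4, 10240))         # 20480
print(max_clients(4, 10240, False))  # 40960
print(rlimit_covers(65535, 10240))   # True
```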
5.3 Upstream Definitions: conf.d/upstream.conf
cat > /usr/local/nginx/conf/conf.d/upstream.conf << 'EOF'
# ============================================================
# upstream.conf: backend clusters + health checks
# Backends: 3 Docker containers with fixed IPs 172.20.0.11-13
# ============================================================
# ----------------------------------------------------------
# General backend cluster: weighted round robin + passive health checks (max_fails / fail_timeout)
# ----------------------------------------------------------
upstream demo_backend {
# Weighted round robin: the three servers are identical, so the weights are equal
server 172.20.0.11:80 weight=1 max_fails=3 fail_timeout=30s;
server 172.20.0.12:80 weight=1 max_fails=3 fail_timeout=30s;
server 172.20.0.13:80 weight=1 max_fails=3 fail_timeout=30s;
# ----------------------------------------------------------
# Active health checks (nginx_upstream_check_module)
# Probe every 5 s; remove after 3 consecutive failures; restore after 2 consecutive successes
# ----------------------------------------------------------
check interval=5000 rise=2 fall=3 timeout=2000 type=http;
check_http_send "GET /health/ HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n";
check_http_expect_alive http_2xx;
# Keep idle connections to the backends open to avoid repeated TCP handshakes
keepalive 16;
}
# ----------------------------------------------------------
# IP Hash cluster (demonstrates session persistence)
# ----------------------------------------------------------
upstream demo_iphash {
ip_hash;
server 172.20.0.11:80 max_fails=3 fail_timeout=30s;
server 172.20.0.12:80 max_fails=3 fail_timeout=30s;
server 172.20.0.13:80 max_fails=3 fail_timeout=30s;
check interval=5000 rise=2 fall=3 timeout=2000 type=http;
check_http_send "GET /health/ HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n";
check_http_expect_alive http_2xx;
}
# ----------------------------------------------------------
# Least-connections cluster (demonstrates distributing long-running requests)
# ----------------------------------------------------------
upstream demo_least_conn {
least_conn;
server 172.20.0.11:80 max_fails=3 fail_timeout=30s;
server 172.20.0.12:80 max_fails=3 fail_timeout=30s;
server 172.20.0.13:80 max_fails=3 fail_timeout=30s;
check interval=5000 rise=2 fall=3 timeout=2000 type=http;
check_http_send "GET /health/ HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n";
check_http_expect_alive http_2xx;
}
EOF
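What the check module sends every interval can be reproduced by hand, which helps when a node stays down and you want to see the backend's actual reply. A minimal Python sketch, assuming the backend is reachable from where you run it; probe and status_class are hypothetical helper names, and the module's own response parsing is more thorough than this:

```python
import socket

# Same bytes as check_http_send in upstream.conf
CHECK_REQUEST = (b"GET /health/ HTTP/1.1\r\nHost: localhost\r\n"
                 b"Connection: close\r\n\r\n")


def status_class(status_line: str) -> str:
    """Turn 'HTTP/1.1 200 OK' into 'http_2xx', the naming check_http_expect_alive uses."""
    parts = status_line.split()
    if len(parts) < 2 or not parts[0].startswith("HTTP/") or not parts[1].isdigit():
        return "invalid"
    return f"http_{parts[1][0]}xx"


def probe(host: str, port: int = 80, timeout: float = 2.0) -> bool:
    """One health probe; timeout mirrors timeout=2000 (ms) in the check directive."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(CHECK_REQUEST)
            with sock.makefile("rb") as reply:
                first_line = reply.readline().decode("latin-1").strip()
    except OSError:  # connection refused, unreachable, or timed out
        return False
    # "Alive" only for the class listed in check_http_expect_alive (http_2xx)
    return status_class(first_line) == "http_2xx"
```

For example, calling probe() for each of 172.20.0.11, 172.20.0.12 and 172.20.0.13 shows roughly what the module sees: a stopped container fails at once with connection refused, while a hung one costs the full timeout.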
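5.4 Virtual Host Configuration: conf.d/demo.conf
cat > /usr/local/nginx/conf/conf.d/demo.conf << 'EOF'
# ============================================================
# demo.conf: virtual host configuration
# Listening ports:
# 80 → main service (round robin)
# 81 → IP Hash demo
# 82 → least-connections demo
# 8888 → health/status monitoring pages (internal network only)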
# ============================================================
# ----------------------------------------------------------
# Main service: weighted round robin
# ----------------------------------------------------------
server {
listen 80;
server_name _; # placeholder name; this block acts as the default server for port 80
# Common proxy headers
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# Timeouts (important: keep requests from hanging when a backend is down)
proxy_connect_timeout 3s;
proxy_send_timeout 10s;
proxy_read_timeout 30s;
# HTTP/1.1 plus an empty Connection header enables keepalive connections to the upstream
proxy_http_version 1.1;
proxy_set_header Connection "";
# ----------------------------------------------------------
# Reads (mostly GET): retry on another server
# ----------------------------------------------------------
location / {
proxy_pass http://demo_backend;
proxy_next_upstream error timeout http_502 http_503 http_504;
proxy_next_upstream_tries 3;
proxy_next_upstream_timeout 10s;
}
# ----------------------------------------------------------
# Writes (mostly POST): no automatic retries, to avoid duplicate submissions
# ----------------------------------------------------------
location /api/write/ {
proxy_pass http://demo_backend;
proxy_next_upstream off; # never retry writes!
}
# ----------------------------------------------------------
# Fallback page (shown when all backends are down)
# ----------------------------------------------------------
proxy_intercept_errors on;
error_page 502 503 504 /50x.html;
location = /50x.html {
root /usr/local/nginx/html;
internal;
}
}
# ----------------------------------------------------------
# IP Hash demo: the same client IP (for IPv4, the same /24) always reaches the same backend
# ----------------------------------------------------------
server {
listen 81;
server_name _;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_connect_timeout 3s;
proxy_read_timeout 30s;
proxy_http_version 1.1;
proxy_set_header Connection "";
location / {
proxy_pass http://demo_iphash;
# Retries are not recommended with ip_hash: a retried request lands on another backend, breaking the hash affinity
proxy_next_upstream off;
}
}
# ----------------------------------------------------------
# Least-connections demo
# ----------------------------------------------------------
server {
listen 82;
server_name _;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_connect_timeout 3s;
proxy_read_timeout 60s;
proxy_http_version 1.1;
proxy_set_header Connection "";
location / {
proxy_pass http://demo_least_conn;
proxy_next_upstream error timeout http_502 http_503 http_504;
proxy_next_upstream_tries 3;
}
}
# ----------------------------------------------------------
# Health/status monitoring pages (internal network only)
# ----------------------------------------------------------
server {
listen 8888;
server_name _;
# Allow only loopback and private-network IPs
allow 127.0.0.1;
allow 10.0.0.0/8;
allow 172.16.0.0/12;
allow 192.168.0.0/16;
deny all;
# nginx_upstream_check_module status page
location /upstream_check {
check_status;
access_log off;
}
# nginx_upstream_check_module in JSON (easy for programs to parse)
location /upstream_check_json {
check_status json;
access_log off;
}
# Basic Nginx connection counters
location /nginx_status {
stub_status;
access_log off;
}
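# nginx-module-vts traffic statistics (needs the module, and only works once vhost_traffic_status_zone is enabled in http {}, see section 9.1)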
location /vts_status {
vhost_traffic_status_display;
vhost_traffic_status_display_format html;
access_log off;
}
}
EOF
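The allow/deny list in the 8888 server is checked top to bottom and the first matching rule wins (if no rule matched at all, access would be allowed). A small Python model of that evaluation makes it easy to check which client addresses can reach the status pages; RULES and is_allowed are illustrative names, and the standard ipaddress module does the matching:

```python
import ipaddress

# Same rules, same order as the 8888 server block
RULES = [
    ("allow", "127.0.0.1"),
    ("allow", "10.0.0.0/8"),
    ("allow", "172.16.0.0/12"),
    ("allow", "192.168.0.0/16"),
    ("deny", "all"),
]


def is_allowed(client_ip: str, rules=RULES) -> bool:
    addr = ipaddress.ip_address(client_ip)
    for action, target in rules:
        # A v6 address never matches a v4 network (the "in" test is just False)
        if target == "all" or addr in ipaddress.ip_network(target):
            return action == "allow"  # first match decides
    return True  # no rule matched: access is not restricted


print(is_allowed("172.20.0.1"))  # True: inside 172.16.0.0/12, like the demo's Docker network
print(is_allowed("8.8.8.8"))     # False: falls through to deny all
```

Note that with these rules an IPv6 loopback client (::1) would be denied; the 8888 listener here is IPv4-only, so that does not matter in this demo.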
5.5 Create the Fallback 50x Page
cat > /usr/local/nginx/html/50x.html << 'EOF'
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Service Temporarily Unavailable</title>
<style>
body { font-family: sans-serif; text-align: center; padding: 80px; background: #fafafa; }
h1 { font-size: 4em; color: #d32f2f; }
p { font-size: 1.3em; color: #555; }
</style>
</head>
<body>
<h1>🚫 503</h1>
<p>The backend service is temporarily unavailable. Please try again later.</p>
</body>
</html>
EOF
6. Start and Manage
6.1 Syntax Check and Startup
# Check configuration syntax
nginx -t
# Expected output
# nginx: the configuration file /usr/local/nginx/conf/nginx.conf syntax is ok
# nginx: configuration file /usr/local/nginx/conf/nginx.conf test is successful
# Start via systemd
systemctl start nginx
systemctl status nginx
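6.2 Common Management Commands
# Start
systemctl start nginx
# or
/usr/local/nginx/sbin/nginx
# Graceful stop (waits for in-flight connections to finish)
nginx -s quit
# or (the unit's ExecStop sends QUIT, so this is graceful too)
systemctl stop nginx
# Fast stop
nginx -s stop
# Graceful config reload (no service interruption; use this for production config changes)
nginx -s reload
# or
systemctl reload nginx
# Reopen log files (after log rotation)
nginx -s reopen
# Show master/worker processes
ps aux | grep nginx
# Show listening ports
ss -tlnp | grep nginx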
6.3 Verify Start on Boot
systemctl is-enabled nginx
# Output: enabled
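7. Functional Verification
7.1 Verify Round-Robin Load Balancing
Send 6 requests in a row and watch how they are distributed:
for i in $(seq 1 6); do
start=$(date +%s%N)
# "Backend Server N" appears in both <title> and <h1>; head -n1 keeps a single match
result=$(curl -s http://localhost:80/ | grep -oP 'Backend Server \K\d' | head -n1)
end=$(date +%s%N)
elapsed=$(( (end - start) / 1000000 ))
echo "${result} ← request ${i}, elapsed: ${elapsed}ms"
done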
Expected output: the three backends take turns. The sample run below was captured with a simpler loop (no timing and no head -n1), so each node number is printed twice, once from <title> and once from <h1>:
1
1
← request 1
2
2
← request 2
3
3
← request 3
1
1
← request 4
2
2
← request 5
3
3
← request 6
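The strict 1, 2, 3 rotation comes from the smooth weighted round-robin selection nginx uses for round-robin upstreams. A Python sketch of that selection, assuming every peer is healthy (nginx also lowers a failing peer's effective weight, which is omitted here) and a single worker: without a zone directive each worker process keeps its own rotation state, so with several workers consecutive curl requests may not follow the exact order.

```python
from typing import List, Sequence


def smooth_weighted_rr(weights: Sequence[int], n: int) -> List[int]:
    """Indices of the peers picked for n consecutive requests."""
    current = [0] * len(weights)
    total = sum(weights)
    picks = []
    for _ in range(n):
        best = None
        for i, weight in enumerate(weights):
            current[i] += weight  # every peer gains its weight
            if best is None or current[i] > current[best]:
                best = i          # strict ">" keeps the earlier peer on ties
        current[best] -= total    # the winner pays back the total weight
        picks.append(best)
    return picks


# demo_backend: three peers with weight=1 -> nodes 1, 2, 3, 1, 2, 3
print(smooth_weighted_rr([1, 1, 1], 6))  # [0, 1, 2, 0, 1, 2]
# Unequal weights are interleaved instead of sent in bursts
print(smooth_weighted_rr([5, 1, 1], 7))  # [0, 0, 1, 0, 2, 0, 0]
```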
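7.2 Verify IP Hash Session Persistence
for i in $(seq 1 6); do
curl -s http://localhost:81/ | grep -oP 'Backend Server \K\d' | head -n1
done
Expected: all 6 requests return the same node number (requests from the same client IP always hit the same backend).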
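Why does one client keep landing on the same node? For IPv4, ip_hash uses only the first three octets of the client address as the hash key, so every client in the same /24 shares a backend. The sketch below is modeled on the hash in ngx_http_upstream_ip_hash_module (start at 89, multiply by 113, modulo 6271), simplified to equal weights with all peers up; treat the exact constants as illustrative rather than authoritative, and ip_hash_index as a hypothetical name:

```python
def ip_hash_index(client_ipv4: str, peer_count: int) -> int:
    """Index of the peer chosen for an IPv4 client (equal weights, all peers up)."""
    octets = [int(part) for part in client_ipv4.split(".")]
    h = 89
    for octet in octets[:3]:  # the last octet is ignored
        h = (h * 113 + octet) % 6271
    return h % peer_count


# Two clients in the same /24 always get the same backend index
print(ip_hash_index("192.168.1.10", 3) == ip_hash_index("192.168.1.200", 3))  # True
```

This also explains why 7.2 cannot show a spread when every request comes from 127.0.0.1: there is only one key.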
7.3 Check the Active Health Check Status Page
# HTML format
curl http://localhost:8888/upstream_check
# JSON format (for scripts)
curl http://localhost:8888/upstream_check_json | python3 -m json.tool
Sample JSON output (entries for demo_iphash and demo_least_conn omitted; all three upstreams have checks enabled, so total counts all 9 peers):
{
"servers": {
"total": 9,
"generation": 1,
"server": [
{"index": 0, "upstream": "demo_backend", "name": "172.20.0.11:80", "status": "up", ...},
{"index": 1, "upstream": "demo_backend", "name": "172.20.0.12:80", "status": "up", ...},
{"index": 2, "upstream": "demo_backend", "name": "172.20.0.13:80", "status": "up", ...},
...
]
}
}
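For alerting, a script usually only needs the nodes that are not up. A sketch, assuming the JSON shape shown above; down_peers is a hypothetical helper, and it accepts the peer list under either "server" or "peers" so it does not hinge on the exact key name:

```python
import json
from typing import List, Optional


def down_peers(check_json: str, upstream: Optional[str] = None) -> List[str]:
    """Names of peers whose status is not "up", optionally for one upstream only."""
    servers = json.loads(check_json)["servers"]
    # Accept either key name for the peer list
    peers = servers.get("server", servers.get("peers", []))
    return [p["name"] for p in peers
            if p.get("status") != "up"
            and (upstream is None or p.get("upstream") == upstream)]
```

Against the live endpoint, pass urllib.request.urlopen("http://localhost:8888/upstream_check_json").read().decode() and, for example, exit non-zero from a cron job when the returned list is not empty.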
7.4 Simulate a Backend Failure and Verify Automatic Removal
# Stop the app2 container
docker stop backend_app2
# Wait ~15 s (5 s check interval × fall=3 = 15 s)
sleep 16
# Check health status (app2 should be down)
curl -s http://localhost:8888/upstream_check_json | python3 -m json.tool
# Verify that requests no longer reach app2
for i in $(seq 1 9); do
curl -s http://localhost/ | grep -oP 'Backend Server \K\d' | head -n1
done
# Expected: only 1 and 3 appear, never 2
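7.5 Simulate Backend Recovery and Verify Automatic Re-addition
# Bring app2 back
docker start backend_app2
# Wait ~10 s (5 s interval × rise=2 = 10 s)
sleep 12
# Verify again (2 should reappear in the rotation)
for i in $(seq 1 9); do
curl -s http://localhost/ | grep -oP 'Backend Server \K\d' | head -n1
done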
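The sleep values above come from interval × fall (or rise). Because the backend can change state anywhere between two probes, a slightly more careful estimate is a window rather than a single number. A rough sketch, assuming probes fire every interval and each result counts once; state_change_window is a hypothetical name, and real traffic may already be steered away earlier by the passive max_fails / fail_timeout mechanism:

```python
from typing import Tuple


def state_change_window(interval_ms: int, count: int,
                        probe_ms: int = 0) -> Tuple[float, float]:
    """(earliest, latest) seconds until `count` consecutive results (fall or rise)
    flip a peer's state; probe_ms is how long each probe itself takes."""
    # earliest: the change happens just before a probe fires
    earliest = ((count - 1) * interval_ms + probe_ms) / 1000
    # latest: the change happens just after a probe
    latest = (count * interval_ms + probe_ms) / 1000
    return earliest, latest


# fall=3 against a stopped container (connection refused, probes return at once)
print(state_change_window(5000, 3))        # (10.0, 15.0)
# fall=3 against a hung backend (each probe waits the 2000 ms timeout)
print(state_change_window(5000, 3, 2000))  # (12.0, 17.0)
# rise=2 after the container comes back
print(state_change_window(5000, 2))        # (5.0, 10.0)
```

The upper bounds (15 s and 10 s) match the sleep 16 and sleep 12 used above; a backend that hangs instead of refusing connections would need a longer wait.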
7.6 Inspect the upstream Fields in access_log
tail -f /var/log/nginx/access.log
Sample log lines:
127.0.0.1 - - [11/May/2026:10:00:01 +0800] "GET / HTTP/1.1" 200 512 "-" "curl/7.81.0" rt=0.003 upstream=172.20.0.11:80 up_status=200 up_rt=0.002
127.0.0.1 - - [11/May/2026:10:00:02 +0800] "GET / HTTP/1.1" 200 512 "-" "curl/7.81.0" rt=0.002 upstream=172.20.0.12:80 up_status=200 up_rt=0.001
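8. Common Maintenance Commands
8.1 Log Management
# Follow the access log
tail -f /var/log/nginx/access.log
# Follow the error log
tail -f /var/log/nginx/error.log
# Filter by status code (all 5xx; $9 is the status field in the main log_format)
awk '$9 ~ /^5/' /var/log/nginx/access.log | tail -20
# Find upstream failure records
grep 'upstream server temporarily disabled' /var/log/nginx/error.log
# After rotating the log, tell Nginx to reopen its log files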
mv /var/log/nginx/access.log /var/log/nginx/access.log.$(date +%Y%m%d)
nginx -s reopen
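8.2 Configuration Change Workflow
# 1. Edit the configuration
vi /usr/local/nginx/conf/conf.d/demo.conf
# 2. Check syntax (no downtime)
nginx -t
# 3. Graceful reload (existing connections are not interrupted)
nginx -s reload
# 4. Confirm the change took effect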
curl -I http://localhost/
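8.3 Docker Backend Management
cd /opt/demo
# Show the status of all containers
docker compose ps
# Stop a single backend (simulate a failure)
docker compose stop app2
# Restore a single backend
docker compose start app2
# Recreate all containers (after configuration changes)
docker compose up -d --force-recreate
# Follow one backend's logs
docker compose logs -f app1
# Stop and remove all containers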
docker compose down
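8.4 Load Testing (basic check with ab)
# Install ab (Apache Bench; on CentOS the package is httpd-tools)
apt-get install -y apache2-utils
# 5000 requests in total at a concurrency of 50
ab -n 5000 -c 50 http://localhost/
# Count requests received per node (keeps only the IP of the first address in $upstream_addr)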
grep 'upstream=' /var/log/nginx/access.log | \
grep -oP 'upstream=\K[^:]+' | \
sort | uniq -c | sort -rn
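The grep pipeline keeps only the IP of the first address in $upstream_addr, so a request that was retried on another node is counted against the node that failed. A Python alternative that counts the last address tried instead, assuming the main log_format from section 5.2; requests_per_node is a hypothetical name:

```python
import re
from collections import Counter
from typing import Iterable

# $upstream_addr sits between "upstream=" and " up_status=" in the main log_format
UPSTREAM_RE = re.compile(r"upstream=(.*?) up_status=")


def requests_per_node(lines: Iterable[str]) -> Counter:
    counts: Counter = Counter()
    for line in lines:
        match = UPSTREAM_RE.search(line)
        if match is None or match.group(1) == "-":
            continue  # request was not proxied to any upstream
        # After a retry the value looks like "172.20.0.12:80, 172.20.0.11:80";
        # the last address is the server tried last (normally the one that answered)
        counts[match.group(1).split(", ")[-1]] += 1
    return counts
```

Usage: with open("/var/log/nginx/access.log", errors="replace") as log: print(requests_per_node(log).most_common()).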
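9. nginx-module-vts Traffic Statistics Demo
nginx-module-vts (Virtual Host Traffic Status) provides real-time traffic statistics per virtual host, per upstream and per filter, with HTML / JSON / Prometheus output.
9.1 Enable Statistics in nginx.conf
vhost_traffic_status_zone must be placed inside the http {} block; it tells the module to allocate shared memory:
http {
# Enable VTS statistics with a 10 MB shared memory zone (a bare vhost_traffic_status_zone; defaults to 1m)
vhost_traffic_status_zone shared:vhost_traffic_status:10m;
# (Optional) key serverZones by the request's Host header instead of server_name (useful here because server_name is _)
vhost_traffic_status_filter_by_host on;
include /usr/local/nginx/conf/conf.d/*.conf;
}
Add these two directives to the http {} block of nginx.conf (before the include), then reload: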
nginx -t && nginx -s reload
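9.2 Open the HTML Dashboard
# Open in a browser, or fetch with curl
curl http://localhost:8888/vts_status
The page shows:
- Total server requests / connections / bandwidth
- Per server_name: request volume, response code distribution, response time
- Per upstream node: requests, failures, response time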
9.3 JSON Interface Demo
Fetch the full JSON data:
curl -s http://localhost:8888/vts_status/format/json | python3 -m json.tool
JSON structure:
{
"nowMsec": 1746931200000,
"connections": {
"active": 3,
"reading": 0,
"writing": 1,
"waiting": 2,
"accepted": 12580,
"handled": 12580,
"requests": 38291
},
"serverZones": {
"_": {
"requestCounter": 38291,
"inBytes": 9823442,
"outBytes": 187634210,
"responses": {
"1xx": 0, "2xx": 37854, "3xx": 0, "4xx": 112, "5xx": 325
},
"requestMsec": 4
}
},
"upstreamZones": {
"demo_backend": [
{
"server": "172.20.0.11:80",
"requestCounter": 12762,
"inBytes": 3274481,
"outBytes": 62544780,
"responses": {"1xx":0,"2xx":12762,"3xx":0,"4xx":0,"5xx":0},
"responseMsec": 2,
"weight": 1,
"maxFails": 3,
"failTimeout": 30,
"backup": false,
"down": false
},
{
"server": "172.20.0.12:80",
"requestCounter": 12755,
...
},
{
"server": "172.20.0.13:80",
"requestCounter": 12774,
...
}
]
}
}
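Extracting key fields (Python script example)
⚠️ Important: the VTS down field does not reflect dynamic health check results!
In VTS, each node's down field only marks servers configured with the static down parameter in nginx.conf; it knows nothing about the dynamic removal and recovery performed by nginx_upstream_check_module.
So even after active health checks have marked a node unavailable, VTS still reports down as false and the node keeps showing as UP.
Correct approach: take node health from /upstream_check_json, take traffic statistics from the VTS JSON, and merge the two.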
python3 - << 'EOF'
import json, urllib.request
MONITOR_BASE = "http://localhost:8888"
# Real, dynamic node health from nginx_upstream_check_module
check_data = json.loads(
    urllib.request.urlopen(f"{MONITOR_BASE}/upstream_check_json").read()
)
# Lookup table {upstream name: {node address: status}}
# (the module's JSON keeps the peer list under servers -> server)
health_map = {}
for peer in check_data["servers"]["server"]:
    health_map.setdefault(peer["upstream"], {})[peer["name"]] = peer["status"]
# Traffic statistics from VTS
vts_data = json.loads(
    urllib.request.urlopen(f"{MONITOR_BASE}/vts_status/format/json").read()
)
print(f"\n{'Health':^6} {'Node':<22} {'Requests':>10} {'2xx':>8} {'5xx':>6} {'Avg ms':>10}")
print("=" * 68)
for upstream, peers in vts_data.get("upstreamZones", {}).items():
    print(f"\n[{upstream}]")
    node_health = health_map.get(upstream, {})
    for p in peers:
        # Health comes from upstream_check_json, not from VTS's static down field
        real_status = node_health.get(p["server"], "unknown")
        tag = "✓ UP  " if real_status == "up" else "✗ DOWN"
        print(f"  {tag} {p['server']:<20} "
              f"{p['requestCounter']:>10} "
              f"{p['responses']['2xx']:>8} "
              f"{p['responses']['5xx']:>6} "
              f"{p['responseMsec']:>10}")
EOF
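Sample output (app2 has been removed by the health check):
Health Node                     Requests      2xx    5xx     Avg ms
====================================================================
[demo_backend]
  ✓ UP   172.20.0.11:80            12762    12762      0          2
  ✗ DOWN 172.20.0.12:80            12755    12755      0          2
  ✓ UP   172.20.0.13:80            12774    12774      0          2
Division of responsibility between the two modules:
| Data source | Provides | Endpoint |
|---|---|---|
| nginx_upstream_check_module | Whether a node is healthy (dynamic removal/recovery) | /upstream_check_json |
| nginx-module-vts | Node traffic statistics (requests / response codes / bandwidth / response time) | /vts_status/format/json |
9.4 Prometheus Output (monitoring integration)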
curl -s http://localhost:8888/vts_status/format/prometheus
Sample output (Prometheus can scrape it directly):
# HELP nginx_vts_server_requests_total The requests counter
nginx_vts_server_requests_total{host="_",code="2xx"} 37854
nginx_vts_server_requests_total{host="_",code="5xx"} 325
# HELP nginx_vts_upstream_requests_total Upstream requests counter
nginx_vts_upstream_requests_total{upstream="demo_backend",server="172.20.0.11:80",code="2xx"} 12762
nginx_vts_upstream_requests_total{upstream="demo_backend",server="172.20.0.12:80",code="2xx"} 12755
nginx_vts_upstream_requests_total{upstream="demo_backend",server="172.20.0.13:80",code="2xx"} 12774
# HELP nginx_vts_upstream_response_seconds Average upstream response time
nginx_vts_upstream_response_seconds{upstream="demo_backend",server="172.20.0.11:80"} 0.002
Add a scrape job to Prometheus's prometheus.yml (the Prometheus server's IP must also be permitted by the allow rules on the 8888 endpoints):
scrape_configs:
  - job_name: 'nginx-vts'
    metrics_path: '/vts_status/format/prometheus'
    static_configs:
      - targets: ['nginx-host:8888']
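For a quick check without a Prometheus server, the exposition text can be aggregated directly. A sketch that sums nginx_vts_upstream_requests_total over response codes per backend; the label names come from the sample output above, so verify them against your own /format/prometheus output (node_label is a parameter for that reason), and upstream_requests is a hypothetical name:

```python
import re
from collections import defaultdict
from typing import Dict

# name{labels} value   (the sample output carries no timestamps)
SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)\{(.*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def upstream_requests(text: str, node_label: str = "server") -> Dict[str, float]:
    """Total nginx_vts_upstream_requests_total per backend, summed over codes."""
    totals: Dict[str, float] = defaultdict(float)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # HELP/TYPE comments and blank lines
        match = SAMPLE_RE.match(line)
        if match is None or match.group(1) != "nginx_vts_upstream_requests_total":
            continue
        labels = dict(LABEL_RE.findall(match.group(2)))
        totals[labels.get(node_label, "?")] += float(match.group(3))
    return dict(totals)
```

Usage: feed it urllib.request.urlopen("http://localhost:8888/vts_status/format/prometheus").read().decode().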
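9.5 Reset Statistics on Demand
# Reset all counters without restarting Nginx (VTS control interface)
curl 'http://localhost:8888/vts_status/control?cmd=reset&group=*'
# Reset only the statistics of the demo_backend upstream group
curl 'http://localhost:8888/vts_status/control?cmd=reset&group=upstream@group&zone=demo_backend'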
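9.6 Additional VTS Configuration for demo.conf
In the 8888 server block of demo.conf, replace the original simple /vts_status location with the following locations:
# nginx-module-vts status endpoint (HTML by default)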
location /vts_status {
vhost_traffic_status_display;
vhost_traffic_status_display_format html;
access_log off;
allow 127.0.0.1;
allow 10.0.0.0/8;
allow 172.16.0.0/12;
allow 192.168.0.0/16;
deny all;
}
# JSON format (easy for scripts to parse)
location /vts_status/format/json {
vhost_traffic_status_display;
vhost_traffic_status_display_format json;
access_log off;
allow 127.0.0.1;
allow 10.0.0.0/8;
deny all;
}
# Prometheus format (for Prometheus + Grafana)
location /vts_status/format/prometheus {
vhost_traffic_status_display;
vhost_traffic_status_display_format prometheus;
access_log off;
allow 127.0.0.1;
allow 10.0.0.0/8;
deny all;
}
9.7 Quick Reference: Statistics Endpoints
| Endpoint | Format | Purpose |
|---|---|---|
| /vts_status | HTML | Visual dashboard for people |
| /vts_status/format/json | JSON | Parsing by scripts/programs |
| /vts_status/format/prometheus | Prometheus | Prometheus/Grafana integration |
| /vts_status/control?cmd=reset&group=* | - | Reset all counters |
| /upstream_check | HTML | upstream_check_module node health |
| /upstream_check_json | JSON | upstream_check_module node health (JSON) |
| /nginx_status | text | Basic connection counters (Active/Reading/Writing/Waiting) |
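Appendix: Quick Troubleshooting Checklist
| Symptom | What to check |
|---|---|
| Nginx fails to start | nginx -t shows the configuration error |
| Backend unreachable | curl http://172.20.0.11/ to hit the container directly |
| Requests hang | Check whether proxy_connect_timeout is too large |
| Node not removed | curl http://localhost:8888/upstream_check to see its status |
| All requests hit one server | Check whether ip_hash is configured and which upstream strategy is in use |
| No upstream field in the log | Check that log_format includes $upstream_addr |
| Docker container health check fails | docker exec backend_app1 wget -qO- http://localhost/health/ |
| check directive not recognized after the build | Check that nginx -V output includes nginx_upstream_check_module |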