Speed Up Your WordPress Site: Optimizing Nginx FastCGI Cache in BaoTa Panel with an Automatic Cache-Warming Script

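When you run a WordPress or WooCommerce site with Nginx FastCGI caching enabled, pages usually load much faster. FastCGI caching has one catch, though: a page is cached only when it is first requested. So the first visit to a page, or the first visit after its cache entry expires, still goes through the slower dynamic PHP request. This applies whether the visitor is a search engine crawler or a real user.

To avoid this, we need a cache-warming script. It periodically crawls your sitemap so that every page is cached in advance and visitors always get an "instant" load.

This article shares an optimized Python script (cache_warmer.py) and a Bash script (cache_warmer.sh) designed for WordPress sites. Together they support multiple sites, run on low-spec servers, and skip dynamic pages (such as the login page) and static files (such as images). We also cover the Nginx configuration and the deployment steps, so you can improve your site's performance with little effort.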

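Script Features

The warming script provides the following core features:

1. Sitemap parsing:

Recursively parses sitemap_index.xml
Collects all page URLs

2. Smart filtering:

Skips static assets (.jpg, .css, .js, etc.)
Skips URLs you list as not cacheable
Skips URLs whose cache file already exists

3. Concurrent warming:

Requests uncached pages in parallel with ThreadPoolExecutor
Configurable thread count (THREADS, set to 3 in the script below)

4. Multi-site support:

Configure multiple WordPress sites in sites.json.

5. Logging:

Writes clear log files that record newly cached URLs, failed URLs, and run time.
Newly cached URLs → new_urls.log
Failed URLs → failed_urls.log
Run log → cache_warmer.log

6. Statistics:

Prints the number of new, skipped, and failed URLs
Reports the total run time

7. Low resource usage:

Thread count and log writes are tuned for low-spec servers (e.g. a single-core CPU with 512 MB of RAM).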
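Feature 1 is recursive sitemap parsing. The script below decides whether to recurse by checking whether a <loc> value ends in .xml. Its XPath matches every element whose local name is loc, which includes the <image:loc> entries that some SEO plugins embed in page sitemaps.

A minimal alternative sketch checks the root element instead: <sitemapindex> lists child sitemaps, and <urlset> lists pages. It assumes standard sitemaps.org XML with its usual namespace. It uses the standard library's xml.etree.ElementTree rather than lxml. classify_sitemap and collect_urls are hypothetical names, not part of cache_warmer.py.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from typing import Callable

# Namespace used by standard sitemaps (assumption: your sitemap plugin uses it)
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def classify_sitemap(xml_bytes: bytes) -> tuple[str, list[str]]:
    """Return (root tag without namespace, <loc> URLs in the sitemap namespace)."""
    root = ET.fromstring(xml_bytes)
    kind = root.tag.rsplit("}", 1)[-1]  # "sitemapindex" or "urlset"
    # Only <loc> in the sitemap namespace, so <image:loc> entries are ignored
    locs = [el.text.strip() for el in root.iter(f"{SITEMAP_NS}loc") if el.text]
    return kind, locs


def collect_urls(url: str, fetch: Callable[[str], bytes], seen: set[str] | None = None) -> list[str]:
    """Follow sitemap indexes recursively and return page URLs; `seen` guards against loops."""
    seen = set() if seen is None else seen
    if url in seen:
        return []
    seen.add(url)
    kind, locs = classify_sitemap(fetch(url))
    if kind != "sitemapindex":
        return locs  # a <urlset>: these are page URLs
    urls: list[str] = []
    for child in locs:
        urls.extend(collect_urls(child, fetch, seen))
    return urls
```

With requests, fetch could be `lambda u: requests.get(u, timeout=15).content`. Passing it in as a parameter keeps the parsing logic testable offline.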

Implementation

1. Python script: cache_warmer.py

Below is the core Python script. It parses the sitemaps, warms the cache, and writes the logs:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from __future__ import annotations  # lets tuple[bool, str] annotations work on Python 3.7/3.8

import os
import json
import time
import hashlib
import logging
import requests
from urllib.parse import urlparse
from concurrent.futures import ThreadPoolExecutor, as_completed
from lxml import etree
from tenacity import retry, stop_after_attempt, wait_fixed

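# ================= Configuration =================
CACHE_METHOD = "GET"   # must match $request_method in fastcgi_cache_key
THREADS = 3            # concurrent requests; keep low on small servers
TIMEOUT = 15           # per-request timeout in seconds
HEADERS = {"User-Agent": "Mozilla/5.0 (CacheWarmer/1.0)"}
SITES_FILE = "sites.json"
LOG_DIR = "./logs"
FAILED_LOG_FILE = os.path.join(LOG_DIR, "failed_urls.log")
NEW_LOG_FILE = os.path.join(LOG_DIR, "new_urls.log")

# Create the log directory before the FileHandler below opens a file in it;
# otherwise running the script directly without logs/ fails at startup
os.makedirs(LOG_DIR, exist_ok=True)

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
    handlers=[
        logging.StreamHandler(),
        logging.FileHandler(os.path.join(LOG_DIR, "cache_warmer.log"), encoding="utf-8")
    ]
)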

# ================= Helpers =================
def ensure_log_dir():
    """Ensure the log directory exists and truncate the per-run log files"""
    if not os.path.exists(LOG_DIR):
        try:
            os.makedirs(LOG_DIR)
            logging.info(f"Created log directory: {LOG_DIR}")
        except OSError as e:
            logging.error(f"Cannot create log directory: {LOG_DIR}, error: {e}")
            raise
    for log_file in [FAILED_LOG_FILE, NEW_LOG_FILE]:
        with open(log_file, 'w', encoding='utf-8') as f:
            f.write("")

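def fastcgi_cache_path(site: dict, url: str, method: str = CACHE_METHOD) -> str:
    """Build the Nginx cache file path for a sitemap URL.

    Assumes fastcgi_cache_key is "$scheme$request_method$host$request_uri"
    and fastcgi_cache_path uses levels=1:2.
    """
    cache_dir = site.get("cache_dir")
    if not cache_dir:
        raise ValueError(f"Site {site.get('name', 'unknown')} has no cache_dir defined.")
    if not site.get("name"):
        raise ValueError("Site has no name defined.")

    # Percent-encode non-ASCII characters the way requests does when sending,
    # so the key matches the $request_uri that Nginx actually sees
    parsed = urlparse(requests.utils.requote_uri(url))
    scheme = parsed.scheme
    host = parsed.netloc
    request_uri = parsed.path or "/"  # a bare domain is requested as "/"
    if parsed.query:
        request_uri += f"?{parsed.query}"

    key_str = f"{scheme}{method}{host}{request_uri}"
    md5_name = hashlib.md5(key_str.encode('utf-8')).hexdigest()

    # levels=1:2 → first dir = last hex char, second dir = the two chars before it
    subdir1 = md5_name[-1]
    subdir2 = md5_name[-3:-1]
    return os.path.join(cache_dir, subdir1, subdir2, md5_name)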

def parse_sitemap(url: str) -> list:
    """Recursively parse a sitemap index or sitemap and return its URL list"""
    urls = []
    try:
        resp = requests.get(url, headers=HEADERS, timeout=TIMEOUT)
        if resp.status_code != 200:
            logging.warning(f"⚠️ Cannot fetch {url}: {resp.status_code}")
            return urls
        tree = etree.fromstring(resp.content)
        locs = tree.xpath("//*[local-name()='loc']")
        for loc in locs:
            if loc.text:
                url_text = loc.text.strip()
                if url_text.endswith('.xml') or url_text.endswith('.xml/'):
                    urls.extend(parse_sitemap(url_text))
                else:
                    urls.append(url_text)
    except Exception as e:
        logging.warning(f"⚠️ Cannot parse sitemap {url}: {e}")
    return urls

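@retry(stop=stop_after_attempt(3), wait=wait_fixed(2), reraise=True)
def warm_url(site_name: str, url: str) -> tuple[bool, str]:
    """Request a URL; raise on request errors or non-200 responses.

    tenacity retries only when the function raises, so errors must not be
    swallowed here. After 3 failed attempts, reraise=True passes the last
    exception to warm_site, which records it in failed_urls.log.
    """
    resp = requests.get(url, headers=HEADERS, timeout=TIMEOUT)
    if resp.status_code != 200:
        raise requests.exceptions.HTTPError(f"Status: {resp.status_code}", response=resp)
    return True, f"Status: {resp.status_code}"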

def warm_site(site: dict) -> dict:
    """Warm the cache for a single site"""
    site_name = site.get("name", "unknown")
    cache_dir = site.get("cache_dir")
    
    if not cache_dir or not os.path.isdir(cache_dir):
        logging.error(f"Cache directory {cache_dir} does not exist or is not accessible")
        return {"site": site_name, "total": 0, "new": 0, "skipped": 0, "failed": 0, "skipped_static": 0, "skipped_nocache": 0, "time": 0}
    if not os.access(cache_dir, os.R_OK):
        logging.error(f"Cache directory {cache_dir} is not readable")
        return {"site": site_name, "total": 0, "new": 0, "skipped": 0, "failed": 0, "skipped_static": 0, "skipped_nocache": 0, "time": 0}
    
    logging.info(f"--- 🚀 Warming {site_name} ---")
    start_time = time.time()
    
    urls = []
    for sitemap_url in site.get("sitemaps", []):
        urls.extend(parse_sitemap(sitemap_url))
    urls = list(set(urls))

    total_urls = len(urls)
    if total_urls == 0:
        logging.warning(f"⚠️ No URLs found for {site_name}. Check your sitemap configuration or network connection.")
        return {"site": site_name, "total": 0, "new": 0, "skipped": 0, "failed": 0, "skipped_static": 0, "skipped_nocache": 0, "time": 0}

    count_new, count_skipped, count_failed = 0, 0, 0
    count_skipped_static, count_skipped_nocache = 0, 0
    new_urls = []

    static_exts = ('.jpg', '.jpeg', '.png', '.gif', '.css', '.js', '.ico', '.svg', '.woff', '.woff2', '.ttf', '.webp')
    no_cache_urls = site.get("no_cache_urls", [])

    urls_to_warm = []
    for url in urls:
        if url.lower().endswith(static_exts):
            count_skipped_static += 1
            continue
        if url in no_cache_urls:
            count_skipped_nocache += 1
            continue
        if os.path.exists(fastcgi_cache_path(site, url)):
            count_skipped += 1
            continue
        urls_to_warm.append(url)

    threads_to_use = min(len(urls_to_warm), THREADS) if urls_to_warm else 1
    logging.info(f"Warming with {threads_to_use} thread(s)...")

    with ThreadPoolExecutor(max_workers=threads_to_use) as executor:
        future_to_url = {executor.submit(warm_url, site_name, url): url for url in urls_to_warm}

        for future in as_completed(future_to_url):
            url = future_to_url[future]
            try:
                success, reason = future.result()
                if success:
                    count_new += 1
                    new_urls.append(f"{time.strftime('%Y-%m-%d %H:%M:%S')} - {site_name}: {url}")
                else:
                    count_failed += 1
                    with open(FAILED_LOG_FILE, 'a', encoding='utf-8') as f:
                        f.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')} - {site_name}: {url} ({reason})\n")
            except Exception as e:
                count_failed += 1
                with open(FAILED_LOG_FILE, 'a', encoding='utf-8') as f:
                    f.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')} - {site_name}: {url} (Exception: {str(e)})\n")

    if new_urls:
        with open(NEW_LOG_FILE, 'a', encoding='utf-8') as f:
            f.write("\n".join(new_urls) + "\n")
        logging.info(f"Newly cached URLs: {len(new_urls)}")
    if count_failed:
        logging.info(f"Failed URLs saved to {FAILED_LOG_FILE}")

    elapsed = time.time() - start_time
    logging.info(f"--- ✅ {site_name} done: total {total_urls}, new {count_new}, skipped (cached) {count_skipped}, failed {count_failed}, skipped static {count_skipped_static}, skipped no-cache {count_skipped_nocache}, time {elapsed:.2f}s ---")

    return {"site": site_name, "total": total_urls, "new": count_new, "skipped": count_skipped, "failed": count_failed, "skipped_static": count_skipped_static, "skipped_nocache": count_skipped_nocache, "time": elapsed}

# ================= Main =================
def main():
    start_all = time.time()
    if not os.path.exists(SITES_FILE):
        logging.error(f"{SITES_FILE} not found")
        return
    
    try:
        ensure_log_dir()
    except Exception as e:
        logging.error(f"Script cannot run: {e}")
        return

    with open(SITES_FILE, "r", encoding="utf-8") as f:
        sites = json.load(f)

    for site in sites:
        warm_site(site)

    elapsed_all = time.time() - start_all
    logging.info(f"\n=== 🎯 All sites warmed, total time {elapsed_all:.2f}s ===")
    print("\n")  # blank separator line on the console
    with open(os.path.join(LOG_DIR, "cache_warmer.log"), 'a', encoding='utf-8') as f:
        f.write("\n")  # blank separator line in the log file

if __name__ == "__main__":
    main()

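Key points:

Threads: THREADS = 3 suits low-spec servers and balances speed against resource use.
Filtering: static files and pages that should not be cached (such as oddbbo.com/wishlist) are skipped.
Logging: output goes to logs/cache_warmer.log, and a blank line is appended after each run as a separator.
Retries: tenacity's @retry makes up to 3 attempts per URL, 2 seconds apart. It only retries when warm_url raises an exception.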
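The tenacity settings behave in two ways worth noting. First, with its default retry condition, tenacity's @retry tries again only when the decorated function raises. A return value such as (False, "Status: 503") counts as success. Second, stop_after_attempt(3) means three attempts in total, not three extra retries.

The sketch below lets you check this behaviour without network access. It is a hypothetical stdlib-only stand-in, not tenacity's implementation (call_with_retry is not a tenacity API). It mimics stop_after_attempt plus wait_fixed with reraise=True.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(func: Callable[[], T], attempts: int = 3, wait: float = 2.0) -> T:
    """Call func(); retry only if it raises, sleeping `wait` seconds between attempts."""
    for attempt in range(1, attempts + 1):
        try:
            return func()  # any normal return, even a "failure" tuple, ends the loop
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: re-raise the last error
            time.sleep(wait)
    raise ValueError("attempts must be at least 1")
```

Applied to warm_url, this means request errors and non-200 responses must be raised (for example as requests.exceptions.HTTPError), not returned as (False, ...), if they are meant to be retried.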

2. Configuration file: sites.json

Configure the sitemaps and cache paths for each WordPress site:

[
    {
        "name": "soezworld",
        "sitemaps": ["https://soez.world/sitemap_index.xml"],
        "cache_dir": "/cache/fastcgi_cache/soezworld",
        "no_cache_urls": []
    },
    {
        "name": "oddbboworld",
        "sitemaps": ["https://oddbbo.world/sitemap_index.xml"],
        "cache_dir": "/cache/fastcgi_cache/oddbboworld",
        "no_cache_urls": []
    },
    {
        "name": "websitesoez",
        "sitemaps": ["https://websitesoez.com/sitemap_index.xml"],
        "cache_dir": "/cache/fastcgi_cache/websitesoez",
        "no_cache_urls": []
    },
    {
        "name": "oddbbo",
        "sitemaps": ["https://oddbbo.com/sitemap_index.xml"],
        "cache_dir": "/cache/fastcgi_cache/oddbbo",
        "no_cache_urls": [
            "https://oddbbo.com/wishlist",
            "https://oddbbo.com/random",
            "https://oddbbo.com/my-account"
        ]
    }
]

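Note: write each no_cache_urls entry exactly as it appears in the sitemap, including or omitting the trailing slash to match. The script compares full URLs as exact strings.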
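warm_site loops over site["sitemaps"], so a plain string there would be iterated character by character. It also matches no_cache_urls as exact full URLs. A quick pre-flight check of sites.json can therefore catch typos before the cron job runs. This is a minimal sketch; validate_sites is a hypothetical helper, not part of the original script.

```python
from urllib.parse import urlsplit

REQUIRED_KEYS = ("name", "sitemaps", "cache_dir")


def validate_sites(sites: list) -> list:
    """Return human-readable problems found in the parsed sites.json list."""
    problems = []
    seen_names = set()
    for i, site in enumerate(sites):
        name = site.get("name")
        label = name or f"site #{i}"
        for key in REQUIRED_KEYS:
            if not site.get(key):
                problems.append(f"{label}: missing {key}")
        if name and name in seen_names:
            problems.append(f"{label}: duplicate name")
        seen_names.add(name)
        if not isinstance(site.get("sitemaps", []), list):
            problems.append(f"{label}: sitemaps must be a list")
        for url in site.get("no_cache_urls", []):
            # Entries are compared with full sitemap URLs, so they need a scheme and host
            parts = urlsplit(url)
            if not parts.scheme or not parts.netloc:
                problems.append(f"{label}: no_cache_urls entry is not a full URL: {url}")
    return problems
```

To use it, load the file with json.load (opened with encoding="utf-8") and print whatever validate_sites returns. An empty list means no problems were found.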

3. Bash script: cache_warmer.sh

Runs the Python script automatically:

#!/bin/bash
# ------------------------------------------
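# Cache Warmer scheduled-task script
# ------------------------------------------

# Directory containing cache_warmer.py and sites.json
SCRIPT_DIR="/cache/fastcgi_cache_warmer"
# Path to the Python interpreter
PYTHON_BIN="/usr/bin/python3"
# Log directory
LOG_DIR="$SCRIPT_DIR/logs"

# Make sure the log directory exists
mkdir -p "$LOG_DIR"

# Change to the script directory (sites.json and logs/ are resolved relative to it)
cd "$SCRIPT_DIR" || exit 1

# Run the Python script; output goes to the console
"$PYTHON_BIN" cache_warmer.py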

Note: the Bash script does not redirect output. The Python script itself writes three log files to logs/: cache_warmer.log, new_urls.log, and failed_urls.log.

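4. Nginx configuration

Make sure your Nginx FastCGI cache configuration matches the script. If the URLs in your sitemap end with a trailing slash, change the rewrite rule to rewrite ^/(.+[^/])$ /$1/ permanent;. This redirects requests without the slash to the slashed form, so only one version of each page is cached.

For the full Nginx FastCGI cache configuration, see: Configuring Nginx FastCGI Cache in BaoTa Panel: Speeding Up WordPress Site Loading Across the Board.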
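You can check the effect of that rewrite rule with Python's re module, which behaves like Nginx's PCRE for this simple pattern. Nginx applies rewrite to the normalized URI without the query string, so the sketch works on paths only. trailing_slash_target is a hypothetical helper for illustration.

```python
from __future__ import annotations

import re

# Same pattern as the Nginx directive: rewrite ^/(.+[^/])$ /$1/ permanent;
RULE = re.compile(r"^/(.+[^/])$")


def trailing_slash_target(path: str) -> str | None:
    """Return the redirect target for `path`, or None if the rule does not match."""
    match = RULE.match(path)
    return f"/{match.group(1)}/" if match else None
```

The sketch shows two things. One-character paths such as /a are not matched, because the group needs at least two characters. File-like paths such as /style.css are matched too, so check where the rule sits in your config to make sure it does not redirect static files.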

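Deployment and Verification

Deployment steps

1. Install the dependencies with the same interpreter that cache_warmer.sh uses (PYTHON_BIN):

/usr/bin/python3 -m pip install requests lxml tenacity

2. Save the scripts:

Save cache_warmer.py and sites.json to /cache/fastcgi_cache_warmer/, or to a directory of your choice. If you use another directory, change SCRIPT_DIR in cache_warmer.sh and the paths in the commands below to match.
Save cache_warmer.sh in the same directory and make it executable:

chmod +x /cache/fastcgi_cache_warmer/cache_warmer.sh

3. Set up a scheduled task:

In BaoTa Panel, open Cron (计划任务 in the Chinese UI) and add a Shell Script task. Set how often it runs, and enter the following as the script content:

/cache/fastcgi_cache_warmer/cache_warmer.sh

4. Test run (from /cache/fastcgi_cache_warmer/):

./cache_warmer.sh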

Verifying the cache

1. Check the log:

cat logs/cache_warmer.log

2. Sample output:

2025-08-22 15:00:00 [INFO] --- 🚀 Warming websitesoez ---
2025-08-22 15:00:02 [INFO] Warming with 3 thread(s)...
2025-08-22 15:00:02 [INFO] --- ✅ websitesoez done: total 192, new 0, skipped (cached) 100, failed 0, skipped static 92, skipped no-cache 0, time 1.61s ---
2025-08-22 15:00:02 [INFO] --- 🚀 Warming oddbbo ---
2025-08-22 15:00:10 [INFO] Warming with 3 thread(s)...
2025-08-22 15:00:10 [INFO] --- ✅ oddbbo done: total 137, new 0, skipped (cached) 87, failed 0, skipped static 47, skipped no-cache 3, time 7.35s ---
2025-08-22 15:00:10 [INFO] === 🎯 All sites warmed, total time 18.60s ===

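3. Check the cache files:

find /cache/fastcgi_cache/oddbbo/ -type f

4. Check the response headers (over HTTP/2, header names are lowercase, hence grep -i):

curl -sI https://oddbbo.com/some-page | grep -i x-cache

X-Cache: HIT means the response was served from the cache.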

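Optimization and Caveats

1. Low-spec server tuning:

Lower THREADS to 1 or 2 in cache_warmer.py (e.g. THREADS = 1).

Write failed URLs in one batch to reduce I/O:

failed_urls = []  # create once, before the as_completed loop

# Inside the loop, collect failures instead of opening the file for each one
# (apply the same change to the except branch)
if not success:
    count_failed += 1
    failed_urls.append(f"{time.strftime('%Y-%m-%d %H:%M:%S')} - {site_name}: {url} ({reason})")

# After the loop, write all failures in a single call
if failed_urls:
    with open(FAILED_LOG_FILE, 'a', encoding='utf-8') as f:
        f.write("\n".join(failed_urls) + "\n")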

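2. URL format consistency:

Check the sitemap. sitemap_index.xml only links to sub-sitemaps, so grep the sub-sitemap that lists your pages. With Yoast SEO and Rank Math this is usually page-sitemap.xml; yours may differ:

curl -s https://oddbbo.com/page-sitemap.xml | grep -E "wishlist|random|my-account"

If these URLs end with a trailing slash, update no_cache_urls in sites.json to match.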
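Instead of keeping no_cache_urls in sync with the sitemap's slash style by hand, the skip check could compare normalized URLs. This minimal sketch assumes that a trailing slash never distinguishes two different pages on your site. normalize_url and is_no_cache are hypothetical helpers that could replace the exact `url in no_cache_urls` test in warm_site. The cache-file lookup should still use the exact sitemap URL, because the cache key contains the exact $request_uri.

```python
from __future__ import annotations

from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lower-case the host, drop any fragment, and strip a trailing slash (root stays "/")."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, parts.query, ""))


def is_no_cache(url: str, no_cache_urls: list[str]) -> bool:
    """True if `url` matches any entry, ignoring trailing slashes and host case."""
    targets = {normalize_url(u) for u in no_cache_urls}
    return normalize_url(url) in targets
```

In warm_site it would be cheaper to build the normalized set once per site, outside the URL loop, rather than on every call.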

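Special note

In the Nginx configuration, the cache location is set by fastcgi_cache_path, typically:

fastcgi_cache_path /cache/fastcgi_cache levels=1:2 keys_zone=MYCACHE:100m inactive=1d max_size=1g;

Here, levels=1:2 means:

levels: sets the depth of the cache directory hierarchy and the length of the directory name at each level.
1:2: a colon-separated list giving the number of characters in each level's directory name.
- 1 (first level): the first-level subdirectory is named after the last character of the MD5 hash of the cache key.
- 2 (second level): the second-level subdirectory is named after the 3rd- and 2nd-to-last characters of that hash (2 characters).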

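How it works

Build the cache key: Nginx builds the cache key string from the fastcgi_cache_key directive, commonly $scheme$request_method$host$request_uri.
Compute the directory path: Nginx takes the MD5 hash of this key, a 32-character hexadecimal string such as d41d8cd98f00b204e9800998ecf8427e. This hash is also the cache file's name.
Apply the levels rule:
- Characters are taken from the end of the hash.
- The last character becomes the first-level directory name.
- The 3rd- and 2nd-to-last characters, in that order, become the second-level directory name.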
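The steps above can be sketched in Python. The sketch mirrors fastcgi_cache_path() in cache_warmer.py but generalizes it to any levels value. It makes two assumptions:

- fastcgi_cache_key is exactly $scheme$request_method$host$request_uri.
- The URL's host part stands in for $host. The two can differ, for example when a port is present.

cache_key, level_dirs and cache_file_path are hypothetical helper names. The digest b7f54b2df7773722d382f4809d65029c used in the test is the levels=1:2 example from the Nginx documentation.

```python
import hashlib
import os
from urllib.parse import urlsplit


def cache_key(url: str, method: str = "GET") -> str:
    """Build the string for fastcgi_cache_key "$scheme$request_method$host$request_uri"."""
    parts = urlsplit(url)
    request_uri = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    return f"{parts.scheme}{method}{parts.netloc}{request_uri}"


def level_dirs(digest: str, levels: str = "1:2") -> list:
    """Split an MD5 hex digest into directory names, taking characters from the end."""
    dirs, end = [], len(digest)
    for width in (int(w) for w in levels.split(":")):
        dirs.append(digest[end - width:end])
        end -= width
    return dirs


def cache_file_path(cache_root: str, key: str, levels: str = "1:2") -> str:
    """Return <cache_root>/<level dirs>/<md5(key)>, the file Nginx stores for `key`."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return os.path.join(cache_root, *level_dirs(digest, levels), digest)
```

For example, cache_file_path("/cache/fastcgi_cache/oddbbo", cache_key("https://oddbbo.com/some-page/")) names the file that find should list after that page has been warmed.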