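Before We Begin
In my earlier posts I introduced PDFMathTranslate in detail, along with common problems and practical fixes:
- [PDFMathTranslate] Common Problems and Practical Solutions - 她笑中藏泪花
- [Zotero-pdf2zh] Easy Zotero Paper Translation: A Step-by-Step Guide to Configuring the pdf2zh Plugin (PDFMathTranslate) - 她笑中藏泪花
The tool supports custom prompts. In my spare time I wrote a general-purpose prompt that now works reliably in my day-to-day translation, so I'm sharing it here. If you run into problems while using it, feel free to discuss them in the comments.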
Project repositories
- PDFMathTranslate: https://github.com/Byaidu/PDFMathTranslate
- Zotero-pdf2zh: https://github.com/guaguastandup/zotero-pdf2zh
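Notes:
Token consumption and cost
- Higher API cost: calling an LLM with this prompt noticeably increases token consumption, because the longer prompt is sent along with the text being translated, and that raises the total cost.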
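Pricing reference examples (list prices change over time, so check the provider's pricing page):
- GPT-4o Mini: $0.00015 / 1K input tokens ($0.15 / 1M at launch)
- GPT-4: $0.03 / 1K input tokens
- GPT-3.5 Turbo: $0.002 / 1K tokens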
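- Language differences: non-English text tends to tokenize less efficiently, so token usage is about 30%–50% higher on average.
- Risk of "ghosting": with this prompt, the translation may overlap or duplicate the original text, especially with complex layouts or OCR'd documents.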
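To get a feel for the cost impact, here is a minimal sketch of the arithmetic. The $0.03 per 1K figure is the illustrative GPT-4 price from the list above, not live pricing. The function name `estimate_cost` and the 200K-token workload are hypothetical, and the overhead values are the 30%–50% range quoted above.

```python
def estimate_cost(tokens: int, price_per_1k: float, overhead: float = 0.0) -> float:
    """Return the estimated cost in USD for `tokens` tokens.

    `overhead` is a fractional increase in token usage, e.g. 0.3 for +30%.
    """
    effective_tokens = tokens * (1.0 + overhead)
    return effective_tokens / 1000.0 * price_per_1k


# Hypothetical workload: a paper that needs about 200K tokens in total.
tokens = 200_000
for overhead in (0.0, 0.3, 0.5):
    cost = estimate_cost(tokens, price_per_1k=0.03, overhead=overhead)
    print(f"overhead {overhead:.0%}: ${cost:.2f}")
```

Swap in your own token count and the current price of the model you actually use; the point is only that the overhead scales the bill linearly.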
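BabelDOC API status
- Experimental feature: on 1.x, enabling BabelDOC is not recommended.
- pdf2zh 2.0: rewritten on top of BabelDOC with better compatibility; BabelDOC is enabled by default.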
Usage
1. GUI
In a terminal, run
pdf2zh -i
- Once the GUI opens, pick any LLM translation service and click "Open for More Experimental Options!".
- Paste the prompt below into the "Custom Prompt for llm" box and save.
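2. Command line
- Copy the prompt into prompt.txt, then run
pdf2zh example.pdf --prompt prompt.txt
- The --prompt option specifies the file containing the custom prompt to pass to the LLM. To use an absolute path, replace the file path directly:
pdf2zh example.pdf --prompt "C:\Users\YourName\Documents\prompt.txt"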
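If you would rather drive the command from a script, the same call can be sketched as below. It uses only the `pdf2zh <file> --prompt <path>` form shown above. The helper names `build_pdf2zh_command` and `translate` are hypothetical, and resolving the prompt path to an absolute, forward-slash form is my own choice rather than something pdf2zh requires.

```python
import subprocess
from pathlib import Path


def build_pdf2zh_command(pdf_path: str, prompt_path: str) -> list[str]:
    """Build the argv list for `pdf2zh <pdf> --prompt <prompt file>`."""
    # Resolve to an absolute path and use forward slashes so the same
    # string works on Windows and on Unix-like systems.
    prompt = Path(prompt_path).expanduser().resolve().as_posix()
    return ["pdf2zh", pdf_path, "--prompt", prompt]


def translate(pdf_path: str, prompt_path: str) -> None:
    """Run the translation; requires pdf2zh to be installed and on PATH."""
    subprocess.run(build_pdf2zh_command(pdf_path, prompt_path), check=True)


if __name__ == "__main__":
    # Only print the command here; call translate(...) to actually run it.
    print(" ".join(build_pdf2zh_command("example.pdf", "prompt.txt")))
```

Passing the arguments as a list avoids shell quoting problems when the path contains spaces.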
3. Zotero for Pdf2zh
- Download the latest server.py from the repository.
- In the file, find cmd = [ ... ]. After the line '--config', config.configPath, start a new line and append '--prompt', './prompt.txt'.
- Put prompt.txt in the same directory as server.py, or otherwise make sure the script can locate the file.
cmd = [
    config.engine,
    input_path,
    '--t', str(config.threads),
    '--output', config.outputPath,
    '--service', config.service,
    '--lang-in', config.sourceLang,
    '--lang-out', config.targetLang,
    '--config', config.configPath,
    '--prompt', './prompt.txt'  # ← set your prompt path here (note: always write / instead of \)
]
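A relative path like './prompt.txt' depends on the directory server.py is started from. If it fails to resolve, one option is to build the path from the script's own location. This is only a sketch: `config` is a stand-in `SimpleNamespace` that reuses the field names from the snippet above with made-up values, and `build_cmd` is a hypothetical helper, not a function in the real server.py.

```python
from pathlib import Path
from types import SimpleNamespace


def build_cmd(config, input_path: str, script_dir: Path) -> list[str]:
    """Assemble the pdf2zh argument list with an explicit prompt path."""
    # as_posix() emits forward slashes, which sidesteps the backslash
    # problem mentioned in the snippet above.
    prompt_path = (script_dir / "prompt.txt").as_posix()
    return [
        config.engine, input_path,
        "--t", str(config.threads),
        "--output", config.outputPath,
        "--service", config.service,
        "--lang-in", config.sourceLang,
        "--lang-out", config.targetLang,
        "--config", config.configPath,
        "--prompt", prompt_path,
    ]


# Stand-in values for illustration; the real server.py fills these in itself.
config = SimpleNamespace(
    engine="pdf2zh", threads=4, outputPath="./translated",
    service="openai", sourceLang="en", targetLang="zh",
    configPath="./config.json",
)
# Inside server.py, script_dir would be Path(__file__).resolve().parent.
cmd = build_cmd(config, "example.pdf", Path.cwd())
print(cmd)
```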
BabelDOC
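BabelDOC is an open-source PDF translation tool with both a command-line interface and a Python API, which makes it easy to integrate into scripts. It uses layout-preserving techniques so that the original layout, formulas, and figure and table formatting are retained during translation.
You may be unsure whether to enable BabelDOC. My advice: on 1.x, leave it off (the default); on 2.0 there is nothing to decide, since it is built in and on by default. My reasoning for 1.x: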
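Pros:
- Better layout: with it enabled, pages do look nicer.
Current drawbacks:
- Inconvenient copying: Chinese text copied out of the translated PDF often comes out as ASCII-encoded characters rather than readable Chinese, which makes searching very difficult. Workaround: open the file in a different PDF reader.
- Messy output: the LLM sometimes emits stray formatting such as **感受野** or v1、v2、v3. Workaround: switch to a stronger model such as deepseek-v3.
On balance, I lean toward keeping it off, for the sake of searchability and stability.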
prompt
pdf2zh 1.x
<role>
You are an expert-level academic translator. Your specialization is translating scholarly articles, technical documents, and research papers from ${lang_in} to ${lang_out}. You function with the precision of a human subject-matter expert and the rigor of a peer-reviewed journal editor. Your process is to first understand the source text's nuances, then perform a faithful and fluent translation, and finally review for consistency and adherence to all instructions.
</role>
<task_definition>
Translate the following plain text into ${lang_out}, ensuring a linguistically precise, lexically accurate, and syntactically fluent translation that preserves the original logic and structure. Output the translation only.
</task_definition>
<instructions>
1. Faithful & Unaltered Translation: Translate with absolute fidelity, never adding, omitting, or altering any information. In cases of unavoidable ambiguity in the source text, prioritize the most logical and likely meaning within the academic context while staying as close to the original phrasing as possible. The translation must derive strictly from the provided source, with no external information or commentary introduced.
2. Data & Formulas: Preserve all numerical data, units, statistical expressions (e.g., `p < 0.05`), and variable notations (`{v}`) exactly as they appear.
3. Academic Style & Tone: Adopt a formal, objective, academic register appropriate for scholarly journals in `${lang_out}`. Ensure the text is coherent, grammatically flawless, and avoids colloquialisms or contractions.
4. Terminology:
   - Consistency: Use one single, standard translation for each technical term throughout the entire document.
   - Standard Usage: First, infer the most likely academic field (e.g., medicine, physics, sociology) from the context of the source text. Then, use the accepted, standard technical translations from that inferred field in `${lang_out}`. In cases of general terms, use the most widely accepted standard translation.
   - Proper Nouns: On first use, state proper nouns (e.g., organization names) in English. If no standard translation exists in `${lang_out}`, follow the English term with an accurate translation in parentheses `()`.
5. Formatting & Citations:
   - Structure: Perfectly replicate the original document structure, including all headings, lists, tables, and figure/table references.
   - Citations/References: Retain all citation formats (e.g., `(Author, 2023)`, `[1]`) and the entire reference list exactly as written. Do not translate author names, journal titles, or any other content within citations or references.
</instructions>
<output_requirements>
Produce ONLY the translated text of the `<source_text>`. Do not output any other content.
</output_requirements>
<source_text lang="${lang_in}">
${text}
</source_text>
Translated Text:
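The `${lang_in}`, `${lang_out}`, and `${text}` placeholders use the same syntax as Python's `string.Template`. As a quick sanity check before a long translation run, the sketch below fills a shortened stand-in for the prompt in that way. It assumes pdf2zh substitutes the placeholders in a `string.Template`-compatible manner, which this post does not confirm, and `render_prompt` is a hypothetical helper.

```python
from string import Template

REQUIRED = ("lang_in", "lang_out", "text")


def render_prompt(template_text: str, lang_in: str, lang_out: str, text: str) -> str:
    """Fill the three placeholders and fail loudly if one is missing."""
    missing = [name for name in REQUIRED if "${" + name + "}" not in template_text]
    if missing:
        raise ValueError(f"prompt is missing placeholders: {missing}")
    # safe_substitute leaves anything it does not recognise (such as `{v}`) untouched.
    return Template(template_text).safe_substitute(
        lang_in=lang_in, lang_out=lang_out, text=text
    )


# Shortened stand-in for the full prompt above.
sample = (
    "Translate from ${lang_in} to ${lang_out}. Keep notations like `{v}`.\n"
    '<source_text lang="${lang_in}">\n${text}\n</source_text>\n'
    "Translated Text:"
)
print(render_prompt(sample, "en", "zh", "Attention is all you need."))
```

To check your real file, read prompt.txt and pass its contents as `template_text`; a typo such as `${lang-out}` then shows up as a `ValueError` rather than as a silently broken prompt.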
pdf2zh 2.x
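Customizing the prompt for pdf2zh 2.x is comparatively simple: you only need to customize the "role".
If you want more precise translations in a particular field, replace [academic research] in the template with your target domain, for example [biomedical engineering] or [quantitative finance]. This steers the model toward that field's knowledge and terminology, so the output is more likely to follow the field's conventions.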
You are a world-class, expert-level machine translation engine designed for academic and technical translation from English to Simplified Chinese (zh-CN). You are an expert in [academic research]. Your objective is to produce translations that meet the rigorous standards of top-tier peer-reviewed journals. You must achieve domain-expert level accuracy, faithfully and precisely reproducing the source text's nuances, tone, and complexity without any paraphrasing or omission. When translating, strictly follow the instructions below to ensure translation quality and preserve all formatting, tags, and placeholders:
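If you switch between fields often, the domain swap described above can be scripted. This is a minimal sketch: `ROLE_TEMPLATE` holds only the opening sentences of the role prompt, and `set_domain` is a hypothetical helper. It keeps the square brackets, matching the examples given earlier.

```python
ROLE_TEMPLATE = (
    "You are a world-class, expert-level machine translation engine designed "
    "for academic and technical translation from English to Simplified "
    "Chinese (zh-CN). You are an expert in [academic research]."
)
PLACEHOLDER = "[academic research]"


def set_domain(role_prompt: str, domain: str) -> str:
    """Return the role prompt with the bracketed domain swapped out."""
    if PLACEHOLDER not in role_prompt:
        raise ValueError(f"{PLACEHOLDER!r} not found in the role prompt")
    return role_prompt.replace(PLACEHOLDER, f"[{domain}]")


print(set_domain(ROLE_TEMPLATE, "biomedical engineering"))
```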
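Closing
That wraps up this post. PDFMathTranslate plus a custom prompt is a "golden combination" for speeding up PDF translation. I hope it helps you turn the foreign-language PDFs on your desk into your native language in no time, so you can focus on the content itself. If you have any questions, leave them in the comments and let's make the workflow smoother together!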