# Translating Personal Blog Posts with LLMs | Zhen Lu

## Background

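A while ago, many of you probably saw on Xiaohongshu (RedNote) how good LLM bilingual translation has become. Judged by the classic translation ideals of faithfulness, expressiveness, and elegance (信达雅), the quality is already quite high. For personal technical blogs, this means LLM Chinese-English translation now makes it practical to publish bilingual versions of posts quickly.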

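Inspired by the `Rico00121/hugo-translator` project (https://github.com/Rico00121/hugo-translator), we used Python and DeepSeek to automatically translate every post on our personal website, which is built with Quarto. The result looks like this:

![Comparison of the Chinese and English versions](https://cdn.jsdelivr.net/gh/Leslie-Lu/WeChatOfficialAccount/img_2025/aa.png)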

## Implementation

### API access

First, we connect to the models available on SiliconFlow (硅基流动) instead of using DeepSeek's official API:

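```python
import os
import sys

import openai
from dotenv import load_dotenv

ENV_PATH = 'Python/materials/.env'


def create_client(llm_type):
    """Create an OpenAI-compatible client for the chosen LLM provider."""
    # Check that the .env file exists before loading it
    if not os.path.exists(ENV_PATH):
        print(f'The file {ENV_PATH} does not exist.')
        sys.exit(1)
    load_dotenv(ENV_PATH, override=True)

    api_key = os.getenv('OPENAI_API_KEY')
    if not api_key:
        print('The OPENAI_API_KEY is not set.')
        sys.exit(1)

    if llm_type == 'openai':
        print('Using OpenAI to translate the post...')
        return openai.OpenAI(api_key=api_key)

    # Otherwise use DeepSeek through SiliconFlow's OpenAI-compatible endpoint,
    # whose base URL is stored in DEEPSEEK_API_URL
    print('Using DeepSeek to translate the post...')
    return openai.OpenAI(api_key=api_key, base_url=os.getenv('DEEPSEEK_API_URL'))
```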

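A reminder: keep API keys in a `.env` file instead of hard-coding them, so they cannot leak along with the code. For how to use the SiliconFlow API, see its official documentation or our earlier [article](https://mp.weixin.qq.com/s/aVknGB4hCdEhYxEseV1YnQ).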
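The post does not show `get_translation()`, the helper that actually sends each request. As a rough idea of what such a helper might look like, here is a hypothetical `request_translation()` that takes an OpenAI-compatible client, such as the one created above, and returns the reply text through `client.chat.completions.create` from the OpenAI Python SDK (v1+). The function name, the `temperature` default, and the offline echo stub are assumptions for illustration, not the author's code.

```python
from types import SimpleNamespace


def request_translation(client, model, messages, temperature=0.2):
    """Send one chat request and return the model's reply text.

    Hypothetical stand-in for the post's get_translation(); `client` is any
    object exposing the OpenAI SDK v1 interface client.chat.completions.create.
    """
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,  # assumed default; lower values vary less
    )
    # The v1 SDK exposes the reply text at choices[0].message.content
    return response.choices[0].message.content


class EchoCompletions:
    """Offline stub for client.chat.completions: echoes the last message back."""

    def create(self, model, messages, temperature):
        message = SimpleNamespace(content=messages[-1]['content'])
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


# Stub with the same attribute path as openai.OpenAI(): client.chat.completions
stub_client = SimpleNamespace(chat=SimpleNamespace(completions=EchoCompletions()))
print(request_translation(stub_client, 'any-model', [{'role': 'user', 'content': '你好'}]))
```

With a real client you would pass a model ID offered by your provider (for example, one of the DeepSeek models listed on SiliconFlow); the stub only exists so the sketch can run offline. A low temperature is a common choice for translation because it keeps the output more consistent between runs.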

### Reading the qmd file

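A qmd file begins with metadata (YAML front matter) that controls how the page is rendered. We translate its `title` field and write it back, so that the generated `_en.qmd` file can go straight into the static site build without manual edits. The code is as follows: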

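```python
import os

import frontmatter  # the python-frontmatter package


def translate_post(file_path, llm_type):
    """Translate one .qmd post and save it next to the original as *_en.qmd."""
    # Load the QMD file: YAML front matter -> post.metadata, body -> post.content
    with open(file_path, 'r', encoding='utf-8') as f:
        post = frontmatter.load(f)

    # Get the title and content of the post
    title = post.metadata.get('title', 'Untitled')
    content = post.content

    # Translate the title and content (translate_title is not shown in the post)
    en_title = translate_title(title, llm_type)
    en_content = translate_text(content, llm_type)

    # Keep all other metadata (date, categories, format, ...) and swap the title
    en_metadata = post.metadata.copy()
    en_metadata['title'] = en_title

    # Create a new post with translated content
    en_post = frontmatter.Post(en_content, **en_metadata)

    # Save as posts/foo_en.qmd; splitext only touches the extension,
    # unlike str.replace, which would rewrite every '.qmd' in the path
    en_file_path = os.path.splitext(file_path)[0] + '_en.qmd'
    with open(en_file_path, 'w', encoding='utf-8') as f:
        f.write(frontmatter.dumps(en_post))

    print(f'Successfully translated the post to {en_file_path}')
```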
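`python-frontmatter` does the heavy lifting above. To make the file structure it relies on explicit, the following stdlib-only sketch splits a .qmd document into its `---`-delimited front matter and body, then rewrites only the `title:` line. It is a simplified illustration under stated assumptions (the front matter starts on the very first line, and `title:` is a single-line top-level key); it does not parse the YAML at all, and the function names are hypothetical.

```python
import json


def split_front_matter(text):
    """Split a .qmd document into (front_matter, body).

    Only recognises a block that starts on the first line with '---' and ends
    at the next '---' line; the YAML inside is not parsed.
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != '---':
        return '', text  # no front matter at all
    for i in range(1, len(lines)):
        if lines[i].strip() == '---':
            return ''.join(lines[1:i]), ''.join(lines[i + 1:])
    return '', text  # opening '---' without a closing one


def replace_title(front_matter, new_title):
    """Rewrite the top-level `title:` line and keep every other line as-is."""
    out = []
    for line in front_matter.splitlines(keepends=True):
        if line.startswith('title:'):
            # json.dumps yields a double-quoted string, which YAML also accepts
            line = f'title: {json.dumps(new_title, ensure_ascii=False)}\n'
        out.append(line)
    return ''.join(out)


doc = '---\ntitle: "大模型翻译个人博客post"\ndate: 2025-04-16\n---\n## 背景\n正文\n'
fm, body = split_front_matter(doc)
en_doc = '---\n' + replace_title(fm, 'Translating blog posts with LLMs') + '---\n' + body
print(en_doc)
```

The `date` line passes through untouched, which mirrors why the post copies `post.metadata` before overwriting only the title.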

### Translation function

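Thanks again to the `Rico00121/hugo-translator` project. The original project did not handle code well during translation, so we modified its LLM prompt to support posts that contain code: the model is told to keep code in its original format and content and not to break the Markdown formatting.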

```python
# System prompt: translate Chinese to English while leaving qmd/Markdown and code intact
SYSTEM_PROMPT = (
    'You are a professional translator. '
    'Translate the following Chinese text into English. '
    'Strictly preserve the original text format in qmd file, including line breaks, '
    'indentation, and any special characters for inserting codes. '
    'Do not add, remove, or modify any content. '
    'Only return the translated text without any additional explanation, comments, or extra content.'
)


def translate_text(text, llm_type):
    """Translate text to English using an LLM, one chunk at a time."""
    total_length = len(text)
    translated_chunks = []

    print('Start translating the main text...')
    # Translate in fixed-size chunks to stay within the token limit and show progress
    chunk_size = 1000
    for i in range(0, total_length, chunk_size):
        chunk = text[i:i + chunk_size]
        messages = [
            {'role': 'system', 'content': SYSTEM_PROMPT},
            {'role': 'user', 'content': chunk},
        ]
        # get_translation (not shown in the post) sends the messages to the chosen LLM
        translated_chunks.append(get_translation(llm_type, messages))
        progress = min((i + chunk_size) / total_length * 100, 100)
        print(f'Translation progress: {progress:.2f}%')

    return ''.join(translated_chunks)
```
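The system prompt asks the model to leave code alone, but a prompt is a request rather than a guarantee. A complementary technique, not part of the original script, is to mask fenced code blocks with placeholder tokens before translation and restore them afterwards, so the model never sees the code. This is a minimal sketch assuming fences of exactly three backticks at the start of a line; the function names and the `@@CODE_BLOCK_n@@` token format are arbitrary choices for illustration.

```python
import re

# An opening fence line (``` plus an optional info string such as {python}),
# then the shortest run of text up to the next line that is just ```
FENCE_RE = re.compile(r'^```[^\n]*\n.*?^```[ \t]*$', re.MULTILINE | re.DOTALL)


def mask_code_blocks(text):
    """Replace each fenced code block with a placeholder; return (text, blocks)."""
    blocks = []

    def stash(match):
        blocks.append(match.group(0))
        return f'@@CODE_BLOCK_{len(blocks) - 1}@@'

    return FENCE_RE.sub(stash, text), blocks


def unmask_code_blocks(text, blocks):
    """Swap each placeholder back for the code block it stands for."""
    for i, block in enumerate(blocks):
        text = text.replace(f'@@CODE_BLOCK_{i}@@', block)
    return text


src = '前言\n\n```python\nprint("你好")\n```\n\n结语\n'
masked, blocks = mask_code_blocks(src)
# `masked` is what would be sent to the model; after translating, restore the code
restored = unmask_code_blocks(masked, blocks)
print(masked)
```

One trade-off: Chinese comments inside code blocks stay untranslated, and the model still has to return the placeholders unchanged, which is worth checking before calling `unmask_code_blocks`.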

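The text is translated in chunks so that each request stays within the model's token limit. The result was shown above: an English `.qmd` file with its formatting intact, which we can deploy directly to GitHub Pages. The complete Python code is available in our [星球 (Knowledge Planet) group](https://mp.weixin.qq.com/s/4IR-KMAZ-q2VbI0Fz4fYRg).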
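Slicing every 1000 characters, as `translate_text` does, can cut through the middle of a sentence or a code block, which makes it harder for the model to keep the format intact. One alternative, sketched here under the assumption that posts use ``` fences and blank lines between paragraphs, splits only at blank lines outside code fences and then packs whole paragraphs into chunks of up to `max_chars` characters. The function name and the packing rule are illustrative, not taken from the original project.

```python
def chunk_markdown(text, max_chars=1000):
    """Split Markdown into chunks of about max_chars characters.

    Breaks only at blank lines outside ``` fences, so paragraphs and code
    blocks are never cut. A single block longer than max_chars becomes its
    own oversized chunk instead of being split.
    """
    # Pass 1: group lines into blocks that end at a blank line outside a fence
    blocks, current, in_fence = [], [], False
    for line in text.splitlines(keepends=True):
        if line.lstrip().startswith('```'):
            in_fence = not in_fence  # opening or closing fence
        current.append(line)
        if not in_fence and line.strip() == '':
            blocks.append(''.join(current))
            current = []
    if current:
        blocks.append(''.join(current))

    # Pass 2: pack consecutive blocks into chunks without exceeding max_chars
    chunks, buf = [], ''
    for block in blocks:
        if buf and len(buf) + len(block) > max_chars:
            chunks.append(buf)
            buf = ''
        buf += block
    if buf:
        chunks.append(buf)
    return chunks


sample = 'para one\n\n```python\nx = 1\n\ny = 2\n```\n\npara two\n'
for piece in chunk_markdown(sample, max_chars=10):
    print(repr(piece))
```

In `translate_text`, the loop could iterate over `chunk_markdown(text)` instead of `range(0, total_length, chunk_size)` and report progress per chunk. Joining the chunks reproduces the input exactly, so nothing is lost at the boundaries, although a model that trims trailing blank lines from its reply may still need them restored.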

## Takeaway

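With this approach, we can quickly produce Chinese and English versions of API documentation, technical tutorials, blog posts, papers, and other content that needs to be bilingual. For content that is updated regularly, a scheduled job can keep the bilingual versions in sync automatically.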
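For the scheduled-update case, a job only needs to retranslate posts that changed since their last translation. A minimal sketch, assuming the `foo.qmd` / `foo_en.qmd` naming used above and that file modification times are reliable; the function name is hypothetical.

```python
from pathlib import Path


def posts_needing_translation(root):
    """Return source .qmd files whose *_en.qmd is missing or older than the source.

    Assumes the foo.qmd -> foo_en.qmd naming from this post; files whose stem
    already ends in '_en' are treated as translations and skipped.
    """
    stale = []
    for src in sorted(Path(root).rglob('*.qmd')):
        if src.stem.endswith('_en'):
            continue
        dst = src.with_name(f'{src.stem}_en.qmd')
        # Retranslate when there is no translation yet or the source was edited later
        if not dst.exists() or dst.stat().st_mtime < src.stat().st_mtime:
            stale.append(src)
    return stale
```

A scheduler such as cron could run a script that calls this function and passes each returned path to the translation code.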

Published: 2025-04-16