The blog has been up for more than half a year now. I did some SEO work early on, but it didn't seem to help much: Google and Bing had barely indexed anything beyond the homepage. A look at the crawler logs showed the crawl frequency was extremely low, basically no crawling at all. No way around it: time to submit URLs proactively.
The blog runs on the hexo framework; the article sources live on GitHub, and each push triggers a Cloudflare Pages deployment. With that setup in mind, my first thought was to find a plugin that could submit URLs automatically. After searching around, the ones mentioned most often online are:
After deploying per the official docs and running a test, the submission never went through. My guess is that these plugins only fire on `hexo d`, but my deployment flow never runs `hexo d`: on the Cloudflare side it simply runs `hexo g` and serves the `public` directory. So I had to find another way, and decided to trigger the submissions directly from GitHub Actions. This approach has an extra advantage: with a plugin, submitting to Google from inside mainland China requires a proxy, since Google isn't directly reachable there, which is a hassle, and I don't have an overseas server anyway. Why not just use GitHub? Enough preamble, here's the tutorial.
## Getting the required keys

### Bing
1. Register for and sign in to the new Bing Webmaster Tools.
2. Add your site.
3. On the site management page, go to Settings → API access → API key, and note down the API key (a quick sanity check of the key follows this list).
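Before wiring anything into CI, it's worth sanity-checking the key. A minimal sketch, assuming the key is exported as `BING_API_KEY` and using Bing's single-URL `SubmitUrl` endpoint; the `example.com` URLs are placeholders for your own site:

```python
import os
import requests

# Single-URL submission endpoint (the batch variant is used in the script later).
api_url = "https://ssl.bing.com/webmaster/api.svc/json/SubmitUrl"

# Placeholders: use your verified site and one real page on it.
payload = {
    "siteUrl": "https://www.example.com",
    "url": "https://www.example.com/some-post/",
}

resp = requests.post(
    api_url,
    params={"apikey": os.environ["BING_API_KEY"]},
    json=payload,
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # a successful call typically returns {"d": null}
```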
### Google

You can follow the official guide: Indexing API quickstart | Google Search Central | Google for Developers.
I won't repeat the steps here. The important artifact is the exported JSON key file; keep it safe.
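Once you have the JSON key file, you can optionally verify that the service account authenticates before setting anything else up. A minimal sketch using the same libraries as the script further below; the URL is a placeholder:

```python
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from oauth2client.service_account import ServiceAccountCredentials

SCOPES = ["https://www.googleapis.com/auth/indexing"]

# service-account.json is the JSON key file exported above.
credentials = ServiceAccountCredentials.from_json_keyfile_name(
    "service-account.json", scopes=SCOPES)
service = build("indexing", "v3", credentials=credentials)

try:
    # Placeholder URL: ask for the notification status of any page on your site.
    meta = service.urlNotifications().getMetadata(
        url="https://www.example.com/some-post/").execute()
    print(meta)
except HttpError as e:
    # A 404 just means this URL was never submitted; auth itself worked.
    # A 403 usually means the service account is not an owner of the site
    # in Search Console.
    print(e)
```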
## GitHub Actions setup

In the blog source directory, open the `.github` folder, create a `workflows` folder inside it, and in that folder create `bing-submission.yml` and `google-indexing.yml`. The structure looks like this:
```
repo root
└── .github
    └── workflows
        ├── bing-submission.yml
        └── google-indexing.yml
```
The contents of `bing-submission.yml`:
```yaml
name: Bing URL Submission

on:
  schedule:
    - cron: '0 0 * * *'
  workflow_dispatch:

jobs:
  submit-urls:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.x'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install requests

      - name: Run Bing submission script
        env:
          BING_API_KEY: ${{ secrets.BING_API_KEY }}
        run: python bing_submission.py
```
The contents of `google-indexing.yml`:
```yaml
name: Submit URLs to Google Indexing API

on:
  workflow_dispatch:   # allow manual triggering
  schedule:            # adjust as needed, e.g. run daily at 12:00 UTC
    - cron: '0 12 * * *'

env:
  CREDENTIALS_JSON: ${{ secrets.GOOGLE_CREDENTIALS_JSON }}

jobs:
  submit-urls:
    name: Submit URLs to Google Indexing API
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.x'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install google-api-python-client oauth2client requests

      - name: Create credentials file
        run: echo "$CREDENTIALS_JSON" > service-account.json

      - name: Run Google Indexing Script
        run: python google_indexing.py  # make sure this matches the actual script name
        env:
          # set any other environment variables here if needed
          # (note: the script below currently hardcodes its sitemap list)
          SITEMAP_URL: 'https://www.flyday.top/sitemap.xml'

      - name: Upload results as artifact
        uses: actions/upload-artifact@v4
        with:
          name: submission-results
          path: submission_results.txt
```
Notice the two environment variables above, which carry the keys. We inject them as secrets rather than writing them directly into the files, for safety's sake.
Go to the GitHub repository, click Settings at the top, find Secrets and variables → Actions in the left sidebar, click New repository secret, and add the following two entries:
| Name | Value |
| --- | --- |
| BING_API_KEY | the Bing API key obtained earlier |
| GOOGLE_CREDENTIALS_JSON | the full contents of the exported JSON key file |
With that, the Actions setup is complete.
## Submission scripts

Create the following two scripts directly in the repository root:

- `bing_submission.py`
- `google_indexing.py`
The contents of `bing_submission.py`:
```python
import requests
import re
import random
import os
from typing import List

# Configuration: replace the xx.top placeholders with your own site.
sitemap_url = "https://www.xx.top/sitemap.xml"
bing_api_url = "https://ssl.bing.com/webmaster/api.svc/json/SubmitUrlbatch"
apikey = os.getenv("BING_API_KEY")  # injected by the workflow from the repo secret
site_url = "https://www.xx.top"
n = 10  # number of URLs to submit per run (Bing's daily quota is tiny, see notes below)


def fetch_sitemap_urls(sitemap_url: str) -> List[str]:
    """Extract the links from the sitemap."""
    response = requests.get(sitemap_url, timeout=30)
    response.raise_for_status()
    sitemap_content = response.text
    urls = re.findall(r"<loc>(.*?)</loc>", sitemap_content)
    return urls


def submit_urls_to_bing(api_url: str, apikey: str, site_url: str, url_list: List[str]):
    """Submit a batch of links to the Bing API."""
    headers = {
        "Content-Type": "application/json; charset=utf-8",
    }
    payload = {
        "siteUrl": site_url,
        "urlList": url_list,
    }
    params = {
        "apikey": apikey,
    }
    response = requests.post(api_url, headers=headers, params=params, json=payload)
    response.raise_for_status()
    return response.json()


def main():
    try:
        if not apikey:
            print("Error: BING_API_KEY environment variable is not set")
            return
        print("Fetching URLs from sitemap...")
        all_urls = fetch_sitemap_urls(sitemap_url)
        print(f"Fetched {len(all_urls)} URLs from sitemap.")
        if not all_urls:
            print("No URLs found in sitemap.")
            return
        # Randomly pick at most n URLs so we stay within the daily quota.
        urls_to_submit = random.sample(all_urls, min(n, len(all_urls)))
        print(f"Randomly selected {len(urls_to_submit)} URLs for submission.")
        for url in urls_to_submit:
            print(f" - {url}")
        print("Submitting URLs to Bing...")
        response = submit_urls_to_bing(bing_api_url, apikey, site_url, urls_to_submit)
        print(f"URLs submitted successfully: {response}")
    except Exception as e:
        print(f"An error occurred: {e}")
        raise


if __name__ == "__main__":
    main()
```
The contents of `google_indexing.py`:
```python
from googleapiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials
import requests
import xml.etree.ElementTree as ET
import time
import random


def get_urls_from_sitemap(sitemap_url):
    """Fetch all URLs from a sitemap."""
    try:
        print(f"Fetching sitemap: {sitemap_url}")
        response = requests.get(sitemap_url, timeout=30)
        response.raise_for_status()
        root = ET.fromstring(response.content)
        namespaces = {'ns': 'http://www.sitemaps.org/schemas/sitemap/0.9'}
        urls = []
        for url_element in root.findall('ns:url', namespaces):
            loc = url_element.find('ns:loc', namespaces)
            if loc is not None and loc.text:
                urls.append(loc.text.strip())
        print(f"Fetched {len(urls)} URLs from sitemap")
        return urls
    except requests.exceptions.RequestException as e:
        print(f"Network request failed: {e}")
        return []
    except ET.ParseError as e:
        print(f"XML parsing failed: {e}")
        return []
    except Exception as e:
        print(f"Failed to parse sitemap: {e}")
        return []


def publish_batch(urls_batch, credentials_file):
    """Submit one batch of URLs."""
    successful = []
    if not urls_batch:
        return successful
    requests_dict = {url: "URL_UPDATED" for url in urls_batch}
    SCOPES = ["https://www.googleapis.com/auth/indexing"]
    try:
        credentials = ServiceAccountCredentials.from_json_keyfile_name(
            credentials_file, scopes=SCOPES)
        service = build('indexing', 'v3', credentials=credentials)

        def index_api_callback(request_id, response, exception):
            # Called once per request in the batch.
            if exception is not None:
                print(f'Request failed - ID: {request_id}, error: {exception}')
            else:
                successful_url = response['urlNotificationMetadata']['url']
                successful.append(successful_url)
                print(f'Submitted successfully: {successful_url}')

        batch = service.new_batch_http_request(callback=index_api_callback)
        for url, api_type in requests_dict.items():
            batch.add(service.urlNotifications().publish(
                body={"url": url, "type": api_type}))
        print("Executing batch submission...")
        batch.execute()
    except Exception as e:
        print(f"API call failed: {e}")
    return successful


def publish():
    """Main entry: fetch URLs from the sitemap and randomly pick up to 200 to submit."""
    sitemap_urls = [
        'https://www.xxx.top/sitemap.xml',
    ]
    credentials_file = 'service-account.json'
    max_urls_to_submit = 200   # stay within the Indexing API's daily free quota
    batch_size = 100
    delay_between_batches = 2

    all_urls = []
    for sitemap_url in sitemap_urls:
        urls = get_urls_from_sitemap(sitemap_url)
        all_urls.extend(urls)

    if not all_urls:
        print("No URLs fetched from sitemap, exiting")
        return

    print(f"Fetched {len(all_urls)} URLs in total")

    if len(all_urls) > max_urls_to_submit:
        selected_urls = random.sample(all_urls, max_urls_to_submit)
        print(f"Randomly selected {len(selected_urls)} URLs for submission")
    else:
        selected_urls = all_urls
        print(f"Fewer than {max_urls_to_submit} URLs, submitting all {len(selected_urls)}")

    all_successful = []
    total_batches = (len(selected_urls) + batch_size - 1) // batch_size
    print(f"Processing in {total_batches} batches of up to {batch_size} URLs")

    for i in range(0, len(selected_urls), batch_size):
        batch_num = i // batch_size + 1
        batch_urls = selected_urls[i:i + batch_size]
        print(f"\n=== Batch {batch_num}/{total_batches} ===")
        print(f"This batch contains {len(batch_urls)} URLs")
        successful = publish_batch(batch_urls, credentials_file)
        all_successful.extend(successful)
        print(f"Successfully submitted {len(successful)} URLs in this batch")
        if batch_num < total_batches and delay_between_batches > 0:
            print(f"Waiting {delay_between_batches} seconds before the next batch...")
            time.sleep(delay_between_batches)

    print("\n=== Done ===")
    print(f"Total submitted successfully: {len(all_successful)}/{len(selected_urls)} URLs")
    if len(all_successful) < len(selected_urls):
        failed_count = len(selected_urls) - len(all_successful)
        print(f"Failed: {failed_count} URLs")


if __name__ == "__main__":
    print("Fetching URLs from sitemap and randomly submitting up to 200 to the Google Indexing API...")
    publish()
    print("Finished")
```
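One thing to watch: the `google-indexing.yml` workflow above uploads `submission_results.txt` as an artifact, but the script as written never creates that file, so the upload step will find nothing. A minimal way to close the gap is to write the successful URLs out at the end of `publish()` (a sketch; the file name just has to match the `path:` in the workflow):

```python
# At the end of publish(): persist the results so the
# "Upload results as artifact" step has something to collect.
with open("submission_results.txt", "w", encoding="utf-8") as f:
    f.write(f"Submitted {len(all_successful)}/{len(selected_urls)} URLs\n")
    for url in all_successful:
        f.write(url + "\n")
```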
Finally, commit all of these changes to the repository. The workflows will then run automatically every day. For the first run you can trigger them manually as a test: open the repository, click Actions at the top, and the two new workflows appear under All workflows in the sidebar:
Pick either one and click Run workflow to execute it:
And that's it. You can of course adapt the submission logic in the scripts to your needs. Two caveats:
- Bing caps daily submissions; for most sites the quota seems to be only 10 URLs per day (absurdly low).
- The Google Indexing API's free quota is 200 requests per day; don't exceed it, as going over apparently incurs charges.
One odd thing, though: the site isn't indexed, yet the traffic numbers look decent. I haven't dug into the traffic in detail yet:
Top countries/regions by traffic:

| Country/Region | Traffic |
| --- | --- |
| Netherlands | 2,004 |
| United States | 1,488 |
| China | 995 |
| Korea, South | 450 |
| Japan | 419 |