Today's reading:
- Speeding up blog load times: PNG image compression
  Notes: Previously I'd compress images with whatever online service was handy, or run them through PP压, my usual tool, without ever comparing these tools side by side, let alone looking at their underlying implementations.
- A Meituan delivery rider's career-change guide
  Notes: I enjoy this kind of first-person account of switching roles.
Today's software:
- Quartz 4
  The framework this blog now runs on. I originally planned to build the site with Jekyll, since I had prior experience with it. But I found that adding features like a Graph View myself would be a real hassle, and since I'm used to journaling in Obsidian, I also wanted to make good use of backlinks when organizing notes later. Then I discovered Quartz 4, a publishing tool designed specifically for Obsidian and the officially recommended framework for it.
- Pixzip
  The image compression tool Pixzip. There are plenty of image compressors out there, but few support AVIF. Apart from some animated images, every image on my new blog is AVIF, which makes pages load faster.
- voice-models
  An aggregator site for VITS-style voice models, with thorough parameter annotations for each model.
Today's code:

```python
from bs4 import BeautifulSoup
import requests
import os
from urllib.parse import urljoin
import html2text
import logging
from retrying import retry

# Logging setup
logging.basicConfig(level=logging.INFO)

# Constants and configuration
BASE_URL = "http://example.com"
HEADERS = {'Cookie': 'your_cookie_value'}
SESSION = requests.Session()
SESSION.headers.update(HEADERS)

# Retry decorator: up to 3 attempts, 2 s pause between tries
@retry(stop_max_attempt_number=3, wait_fixed=2000)
def get_with_retry(url):
    response = SESSION.get(url, timeout=10)
    response.raise_for_status()
    return response

def download_image(img_url, save_folder, image_num):
    try:
        os.makedirs(save_folder, exist_ok=True)
        local_path = os.path.join(save_folder, f'image{image_num}.png')
        response = get_with_retry(img_url)
        with open(local_path, 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)
        return local_path
    except Exception as e:
        logging.error(f"Error downloading image: {e}")
        return None

def html_to_markdown(html, title, base_image_folder='downloaded_images'):
    try:
        soup = BeautifulSoup(html, 'html.parser')
        content_div = soup.find(id="main-content")
        if not content_div:
            return "No content found in 'main-content'"
        image_num = 1
        title_for_path = title.replace('/', '-').replace('\\', '-')
        image_folder = os.path.join(base_image_folder, title_for_path)
        for img_tag in content_div.find_all('img', class_="confluence-embedded-image"):
            img_src = img_tag.get('data-image-src') or img_tag.get('src')
            if img_src:
                img_src = urljoin(BASE_URL, img_src) if not img_src.startswith('http') else img_src
                local_image_path = download_image(img_src, image_folder, image_num)
                if local_image_path:
                    relative_image_path = os.path.relpath(local_image_path, os.path.dirname(image_folder)).replace('\\', '/')
                    # Swap the <img> tag for a markdown image reference
                    img_markdown = f"![image{image_num}]({relative_image_path})"
                    img_tag.replace_with(BeautifulSoup(img_markdown, 'html.parser'))
                    image_num += 1
        h = html2text.HTML2Text()
        h.ignore_links = False
        return h.handle(str(content_div))
    except Exception as e:
        logging.error(f"Error converting HTML to Markdown: {e}")
        return ""

def download_page(page, title):
    try:
        url = f"{BASE_URL}/{page}"
        response = get_with_retry(url)
        html_content = response.text
        markdown = html_to_markdown(html_content, title)
        filename = f"{title}.md"
        with open(filename, 'w', encoding='utf-8') as f:
            f.write(markdown)
        logging.info(f"Markdown content saved to {filename}.")
    except Exception as e:
        logging.error(f"Error downloading page: {e}")

def get_page_list(month, year, key):
    try:
        url = f"{BASE_URL}/rest/ia/1.0/pagetree/blog/subtree?spaceKey={key}&groupType=2&groupValue={month}%2F{year}"
        response = get_with_retry(url)
        json_content = response.json()
        if not json_content:
            logging.info(f"No blog data for {year}-{month}, skipping.")
            return
        for page_info in json_content:
            download_page(page_info['url'], page_info['title'])
    except Exception as e:
        logging.error(f"Error getting page list: {e}")

if __name__ == "__main__":
    for year in range(2023, 2025):
        for month in range(1, 13):
            get_page_list(f"{month:02}", year, "your_space_key")
```
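The only third-party piece of the retry logic above is the `retrying` package. If you'd rather avoid that dependency, the same idea can be sketched with a small stdlib-only decorator; note this is my own sketch, and the parameter names (`attempts`, `wait_seconds`) are mine, not `retrying`'s:

```python
import functools
import time

def retry(attempts=3, wait_seconds=2.0, exceptions=(Exception,)):
    """Stdlib-only retry decorator: re-run the wrapped function up to
    `attempts` times, sleeping between tries; re-raise the last error."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last_exc = None
            for attempt in range(attempts):
                try:
                    return func(*args, **kwargs)
                except exceptions as exc:
                    last_exc = exc
                    if attempt < attempts - 1:
                        time.sleep(wait_seconds)
            raise last_exc
        return wrapper
    return decorator
```

With this in place, `@retry(attempts=3, wait_seconds=2.0)` on `get_with_retry` behaves like the `@retry(stop_max_attempt_number=3, wait_fixed=2000)` used above.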
Today's observations:
Too many to list, so I won't.
Today's rambling:
Migrating a blog is exhausting, about as tiring as running 1,500 meters on a treadmill.
Doing the migration right after actually running 1,500 meters is even worse!
I bought a pre-built NAS, a 极空间 Q2C that comes with a 4 TB drive. The built-in NAT traversal was a pleasant surprise: even before going in to apply for the 随性私网 service, the speeds are already decent.
The only pity is that I bought too early, before the promotion on the higher-spec Z2 Pro came out. Since it was an internal company purchase, returns and exchanges aren't supported, so I'm stuck with this Docker-less NAS and will just have to tinker with it slowly.