Fetching Remote Info
Get the latest commit dates for your repos each time you re-build a Jekyll site.
Intro
Most of these posts about code have a front matter key that points to the public repository (like this one) and it’s easy to then use that and make the nice link you see in the bit above this.
But that felt incomplete - as the only date available to me was the default date of the post, anything written years ago looks like I’ve not touched the repo in that long either. (Which is admittedly true for some of them, but I do occasionally update things!)
So this is a quick plugin to get Jekyll retrieving the last commit date for a repository linked to a post. And, as I was/am in the process of moving from GitHub to Codeberg, I thought I’d base it around Codeberg’s RSS feeds.
Requirements
You’ll need to install the down gem for this plugin. As far as I know, the old warnings about open-uri being iffy still apply, and I’m not writing a safer wrapper around it when one already exists.
Usage
You can change the plugin’s defaults by adding this to your _config.yml file:
get_remote_repo_info:
  categories: ['code']
  fm_key: 'repository'
  cache: '_cache/GetRemoteRepoInfo/'
  cache_time: 604800
Which will let you change what categories of post are checked to see if they have a repository front matter key, the name of that key, and the cache location and maximum age (one week by default).
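To make the override behaviour concrete, here is a minimal sketch of how settings from _config.yml end up layered over the defaults. It is a stand-in rather than the plugin itself: it uses plain Hash#merge instead of Jekyll's deep merge, and the effective_config helper and the "projects" category are made up for illustration.

```ruby
# Defaults mirroring the plugin's configurable settings (symbol keys).
DEFAULTS = {
  categories: ["code"],
  fm_key: "repository",
  cache: "_cache/GetRemoteRepoInfo/",
  cache_time: 7 * 86_400
}.freeze

# Hypothetical helper: merge user settings over the defaults.
# YAML hands us string keys, and the whole section may be missing (nil).
def effective_config(user_config)
  DEFAULTS.merge((user_config || {}).transform_keys(&:to_sym))
end

# Hypothetical override: also check "projects" posts, and refresh daily.
config = effective_config("categories" => ["code", "projects"], "cache_time" => 86_400)
puts config.inspect
```

Anything you leave out of _config.yml keeps its default, so you only need to list the settings you actually want to change.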
By default, for each post in the code category, the plugin fetches the URL from the repository front matter key with .rss appended, so long as that URL starts with “https://codeberg.org/”.
It’ll save the returned file, using that instead of re-downloading it each time you build the site.
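The two decisions described above (which feed to fetch, and whether the cached copy is still good) can be sketched on their own. This is a simplified illustration rather than the plugin's code: feed_url, stale? and the example/project repository are hypothetical names, and nothing is downloaded here.

```ruby
# Only repositories hosted here are assumed to have an RSS feed.
SUPPORTED_HOSTS = ["https://codeberg.org/"].freeze

# The feed is the repository URL with ".rss" appended.
# Returns nil for anything that doesn't start with a supported host.
def feed_url(repo_url)
  return nil unless SUPPORTED_HOSTS.any? { |prefix| repo_url.start_with?(prefix) }
  "#{repo_url}.rss"
end

# A cached file is stale when it is missing or older than max_age seconds.
def stale?(path, max_age, now = Time.now)
  !File.file?(path) || (now - File.mtime(path)) > max_age
end

# Hypothetical repository URL, purely for illustration.
puts feed_url("https://codeberg.org/example/project")
```

With the default cache_time of 604800 seconds, a feed saved today gets reused for every build over the next week and is only fetched again after that.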
The Plugin
require "jekyll"

# standard libraries
require "date"
require "fileutils"
require "pathname"
require "rexml/document"

# gem
require "down"

module Jekyll
  class GetRemoteRepoInfo < Generator
    DEFAULT = {
      # which categories of post to check for the key
      :categories => ["code"],
      # and the name of the front matter key that contains the repo url for that post
      :fm_key => "repository",
      # relative path to cache the remote info
      :cache => "_cache/GetRemoteRepoInfo/",
      # cached files are valid for 1 week
      :cache_time => 7*86400,
      # 1 MiB max size for whatever info we're retrieving remotely
      :remote_max => 1048576,
    }

    # generate log messages
    LOG = true

    # which remote repos this plugin supports - ie. only ones with a .rss feed for now
    SUPPORTED = [
      "https://codeberg.org/"
    ]

    def generate(site)
      @site = site
      # the config section is optional, so fall back to an empty hash before symbolising its keys
      user_config = (@site.config['get_remote_repo_info'] || {}).transform_keys(&:to_sym)
      @config = Jekyll::Utils.deep_merge_hashes DEFAULT, user_config
      log "Using config of #{@config}"

      # make sure we've got a cache directory to save the info in
      FileUtils.mkdir_p(Pathname.pwd + @config[:cache])

      # loop through all the posts
      @site.collections["posts"].docs.each do |post|
        # check the post has a category that intersects with our config
        if post.data.key?("category") and not (Array(post.data["category"]) & @config[:categories]).empty?
          log "#{post.data["title"]} is in the right category"

          # and if it does, then check to see if it also has the right front matter key too
          if post.data.key?(@config[:fm_key]) and not (post.data[@config[:fm_key]].nil? or post.data[@config[:fm_key]].empty?)
            repo_url = post.data[@config[:fm_key]]

            # the url has to start with one of the supported hosts
            if SUPPORTED.none? { |url| repo_url.start_with?(url) }
              log "#{post.data["title"]} has an UNSUPPORTED repo at #{repo_url}"
              next
            else
              log "#{post.data["title"]} has a supported repo at #{repo_url}"
              parse repo_url
              post.data["#{@config[:fm_key]}_latest_date"] = @result[:date]
              post.data["#{@config[:fm_key]}_latest_link"] = @result[:link]
            end
          end
        end
      end
    end

    def parse(repo_url)
      # our local copy
      cached = Pathname.pwd + @config[:cache] + sanitize_filename(repo_url)

      # go get the remote repo info feed (or use the cached copy)
      if not File.file?(cached) or (Time.now - File.stat(cached).mtime).to_i > @config[:cache_time]
        log "Downloading repo info for #{repo_url}", "info"
        Down.download("#{repo_url}.rss", destination: cached, max_redirects: 1, max_size: @config[:remote_max])
      else
        log "Repo info for #{repo_url} exists in cache"
      end

      # now parse that saved file for the latest date we've done something to the repo
      # NOTE: a fairly big assumption of that "something" being a commit to the main branch
      # and not anything else
      # (File.read rather than open, so we don't leave a file handle lying around)
      document = REXML::Document.new(File.read(cached))
      latest = document.elements().to_a("//item").first
      @result = {
        :date => DateTime.parse(REXML::XPath.first(latest, "pubDate").text),
        :link => REXML::XPath.first(latest, "link").text
      }
      log "Have a latest date of #{@result[:date]} for #{repo_url}", "info"
    end

    private

    # utility function to save typing
    def log(message, type = "debug")
      case type
      when "info"
        Jekyll.logger.info "GetRemoteRepoInfo: #{message}"
      when "error"
        Jekyll.logger.error "GetRemoteRepoInfo: #{message}"
      when "warn"
        Jekyll.logger.warn "GetRemoteRepoInfo: #{message}"
      else
        Jekyll.logger.debug "GetRemoteRepoInfo: #{message}"
      end if LOG
    end

    # pretty much just for stripping the slashes etc out of a url
    def sanitize_filename(filename)
      fn = Array(filename)
      fn.map! { |s| s.gsub(/[^a-z0-9\-]+/i, '_') }
      return fn.first
    end
  end
end
The latest commit date (and a link to that commit) is then available to each post under the {fm_key}_latest_date and {fm_key}_latest_link keys. For example, by default {{ page.repository_latest_date }} holds a DateTime object that you can format however you want on the front end.
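If you're curious what actually ends up in those two keys, here is a minimal sketch of the parsing step run against a made-up feed. The feed contents, the example/project repository and the latest_item helper are all invented for illustration, and a real Codeberg feed will carry more fields than this. The sketch assumes, as the plugin does, that the first item in the feed is the newest.

```ruby
require "date"
require "rexml/document"

# Made-up feed for illustration only.
SAMPLE_FEED = <<~XML
  <?xml version="1.0" encoding="utf-8"?>
  <rss version="2.0">
    <channel>
      <title>example/project</title>
      <item>
        <title>Newest commit</title>
        <link>https://codeberg.org/example/project/commit/abc123</link>
        <pubDate>Tue, 02 Jan 2024 10:30:00 +0000</pubDate>
      </item>
      <item>
        <title>Older commit</title>
        <link>https://codeberg.org/example/project/commit/def456</link>
        <pubDate>Mon, 01 Jan 2024 09:00:00 +0000</pubDate>
      </item>
    </channel>
  </rss>
XML

# Hypothetical helper: pull the date and link out of the first <item>,
# or return nil if the feed has no items at all.
def latest_item(xml)
  item = REXML::XPath.first(REXML::Document.new(xml), "//item")
  return nil if item.nil?

  {
    date: DateTime.parse(item.elements["pubDate"].text),
    link: item.elements["link"].text
  }
end

latest = latest_item(SAMPLE_FEED)
puts latest[:date].strftime("%-d %B %Y")
puts latest[:link]
```

The strftime format string here is the same style of format that Liquid's date filter takes, so whatever you settle on in Ruby should carry over to your templates.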