Fetching Remote Info

Get the latest commit dates for your repos each time you re-build a Jekyll site.

7 minute reading time of 1424 words (inc code) Code available at codeberg.org and the last commit was

Intro

Most of these posts about code have a front matter key that points to the public repository (like this one) and it’s easy to then use that and make the nice link you see in the bit above this.

But that felt incomplete - as the only date available to me was the default date of the post, anything written years ago looks like I’ve not touched the repo in that long either. (Which is admittedly true for some of them, but I do occasionally update things!)

So this is a quick plugin to get Jekyll retrieving the last commit date for a repository linked to a post. And, as I was/am in the process of moving from GitHub to Codeberg, I thought I’d base it around Codeberg’s RSS feeds.

Requirements

You’ll need to install the down gem for this plugin. I’ve no reason to doubt that the warnings about open-uri being iffy are still valid, and I’m not writing a safer wrapper around it when one already exists.
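
Assuming your site manages its gems with Bundler (the usual Jekyll setup), installing it should be one extra line in the Gemfile. No version constraint is shown because the post doesn't state one:

```ruby
# Gemfile (sketch): add this alongside your existing Jekyll gems
gem "down"
```

Then run `bundle install` before the next build.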

Usage

You can configure the plugin away from the defaults by adding to your _config.yml file:

get_remote_repo_info:
  categories: ['code']
  fm_key: 'repository'
  cache: '_cache/GetRemoteRepoInfo/'
  cache_time: 604800

Which will let you change what categories of post are checked to see if they have a repository front matter key, the name of that key, and the cache location and maximum age (one week by default).
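
As a rough sketch of how those overrides combine with the defaults: anything you set replaces the default, and anything you leave out keeps it. The names `DEFAULTS` and `effective_config` are made up for this illustration, and it uses a plain `Hash#merge` rather than Jekyll's own deep merge (the two behave the same here because the config is flat):

```ruby
# Hypothetical stand-ins: DEFAULTS mirrors the plugin's defaults, and
# user_config is what Jekyll hands back from _config.yml (string keys).
DEFAULTS = {
  categories: ["code"],
  fm_key: "repository",
  cache: "_cache/GetRemoteRepoInfo/",
  cache_time: 7 * 86_400
}.freeze

def effective_config(user_config)
  # A missing config section comes back as nil, so fall back to an
  # empty hash before converting the string keys to symbols.
  DEFAULTS.merge((user_config || {}).transform_keys(&:to_sym))
end

# Override two keys, leave the rest alone
overridden = effective_config("categories" => ["code", "projects"], "cache_time" => 86_400)

# No get_remote_repo_info section at all
untouched = effective_config(nil)
```

Here `overridden` keeps the default `fm_key` and `cache` but checks two categories with a one day cache, while `untouched` is just the defaults.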

By default, for each post in the code category, the plugin takes the URL in the repository front matter key, appends .rss, and fetches the result, so long as that URL starts with “https://codeberg.org/”.
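
That rule boils down to a prefix check followed by a string append. A minimal sketch, where `SUPPORTED_PREFIXES`, `feed_url_for` and the example repository URLs are all hypothetical names for illustration:

```ruby
# Hypothetical list of supported hosts, matching the one described above
SUPPORTED_PREFIXES = ["https://codeberg.org/"].freeze

# Returns the feed URL for a supported repo, or nil for anything else
def feed_url_for(repo_url)
  return nil unless SUPPORTED_PREFIXES.any? { |prefix| repo_url.start_with?(prefix) }

  "#{repo_url}.rss"
end

supported   = feed_url_for("https://codeberg.org/example-user/example-repo")
unsupported = feed_url_for("https://github.com/example-user/example-repo")
```

`supported` ends up as the repository URL with `.rss` on the end, and `unsupported` is `nil`, so that post would be skipped.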

It’ll save the returned file and use that cached copy instead of re-downloading it each time you build the site, until the copy is older than cache_time.
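
The cache decision is just "does the file exist, and is it young enough?". A standalone sketch of that check, with `cache_fresh?` and its `now:` parameter being names invented here so the age comparison can be tried without waiting a week:

```ruby
require "tempfile"

# True when the cached file exists and was modified within the last
# max_age_seconds. `now` is injectable purely to make this easy to try out.
def cache_fresh?(path, max_age_seconds, now: Time.now)
  File.file?(path) && (now - File.mtime(path)) <= max_age_seconds
end

one_week = 7 * 86_400
file = Tempfile.new("feed")

fresh   = cache_fresh?(file.path, one_week)                             # just written
stale   = cache_fresh?(file.path, one_week, now: Time.now + 700_000)    # pretend it's 8+ days later
missing = cache_fresh?("/nonexistent/GetRemoteRepoInfo/feed", one_week) # never downloaded

file.close! # close and delete the temporary file
```

Only the first case would reuse the cache; the other two would trigger a download.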

The Plugin

require "jekyll"

# standard libraries
require "date"
require "fileutils"
require "pathname"
require "rexml/document"

# gem
require "down"

module Jekyll
  class GetRemoteRepoInfo < Generator
    
    DEFAULT = {
      # which categories of post to check for the key
      :categories => ["code"],
      # and the name of that front matter key that contains the repo url for that post
      :fm_key     => "repository",
      # relative path to cache the remote info
      :cache      => "_cache/GetRemoteRepoInfo/",
      # cached files are valid for 1 week
      :cache_time => 7*86400,
      # 1 MiB (1048576 bytes) max size for whatever info we're retrieving remotely
      :remote_max => 1048576,
    }
    
    # generate log messages
    LOG = true
    
    # which remote repos this plugin supports - ie. only ones with a .rss feed for now
    SUPPORTED = [
      "https://codeberg.org/"
    ]
    
    def generate(site)
      @site = site
      # a missing config section is nil, so fall back to an empty hash before symbolizing the keys
      @config = Jekyll::Utils.deep_merge_hashes(DEFAULT, (@site.config['get_remote_repo_info'] || {}).transform_keys(&:to_sym))
      log "Using config of #{@config}"
      
      # make sure we've got a cache directory to save the info in
      FileUtils.mkdir_p(Pathname.pwd + @config[:cache])
      
      # loop through all the posts
      @site.collections["posts"].docs.each do |post|
        
        # check the post has a category that intersects with our config
        if post.data.key?("category") and not (Array(post.data["category"]) & @config[:categories]).empty?
          log "#{post.data["title"]} is in the right category"
          
          # and if it does, then check to see if it also has the right front matter key too
          if post.data.key?(@config[:fm_key]) and not (post.data[@config[:fm_key]].nil? or post.data[@config[:fm_key]].empty?)
            repo_url = post.data[@config[:fm_key]]
            # the repo url has to start with one of the supported hosts
            if SUPPORTED.none? { |url| repo_url.start_with?(url) }
              log "#{post.data["title"]} has an UNSUPPORTED repo at #{post.data[@config[:fm_key]]}"
              next
            else
              log "#{post.data["title"]} has a supported repo at #{post.data[@config[:fm_key]]}"
              parse repo_url
              post.data["#{@config[:fm_key]}_latest_date"] = @result[:date]
              post.data["#{@config[:fm_key]}_latest_link"] = @result[:link]
            end
          end
        end
        
      end
    end
    
    def parse(repo_url)
      # our local copy
      cached = Pathname.pwd + @config[:cache] + sanitize_filename(repo_url)
      
      # go get the remote repo info feed (or use the cached copy)
      if not File.file?(cached) or (Time.now - File.stat(cached).mtime).to_i > @config[:cache_time]
        log "Downloading repo info for #{repo_url}", "info"
        Down.download("#{repo_url}.rss", destination: cached, max_redirects: 1, max_size: @config[:remote_max])
      else
        log "Repo info for #{repo_url} exists in cache"
      end
        
      # now parse that saved file for the latest date we've done something to the repo
      # NOTE: a fairly big assumption of that "something" being a commit to the main branch
      # and not anything else
      # read the file's contents rather than leaving an open file handle behind
      document = REXML::Document.new(File.read(cached))
      
      latest = document.elements().to_a("//item").first
      @result = {
        :date => DateTime.parse(REXML::XPath.first(latest, "pubDate").text),
        :link => REXML::XPath.first(latest, "link").text
      }
      
      log "Have a latest date of #{@result[:date]} for #{repo_url}", "info"
    end
    
    private
    
    # utility function to save typing
    def log(message, type = "debug")
      case type
        when "info"
          Jekyll.logger.info  "GetRemoteRepoInfo: #{message}"
        when "error"
          Jekyll.logger.error "GetRemoteRepoInfo: #{message}"
        when "warn"
          Jekyll.logger.warn  "GetRemoteRepoInfo: #{message}"
        else
          Jekyll.logger.debug "GetRemoteRepoInfo: #{message}"
      end if LOG
    end
    
    # pretty much just for stripping the slashes etc out of a url
    def sanitize_filename(filename)
      filename.to_s.gsub(/[^a-z0-9\-]+/i, "_")
    end
  end
  
end

The latest commit date (and a link to that commit) is then available to each post under the {fm_key}_latest_date and {fm_key}_latest_link keys. For example, by default {{ page.repository_latest_date }} will hold a DateTime object that you can format however you want on the front end.
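
To see what the parsing step pulls out, here is a standalone sketch run against a made-up feed. The `SAMPLE_FEED` contents, the `latest_item` helper and the repository URLs are all invented for illustration, and it leans on the same assumption as the plugin: that the first item in the feed is the most recent commit.

```ruby
require "date"
require "rexml/document"

# A made-up, two-item RSS feed in the general shape the plugin expects
SAMPLE_FEED = <<~XML
  <?xml version="1.0" encoding="utf-8"?>
  <rss version="2.0">
    <channel>
      <title>example-repo</title>
      <item>
        <title>Newest commit</title>
        <link>https://codeberg.org/example-user/example-repo/commit/abc123</link>
        <pubDate>Mon, 06 May 2024 10:30:00 +0000</pubDate>
      </item>
      <item>
        <title>Older commit</title>
        <link>https://codeberg.org/example-user/example-repo/commit/def456</link>
        <pubDate>Mon, 01 Jan 2024 09:00:00 +0000</pubDate>
      </item>
    </channel>
  </rss>
XML

# Returns the date and link of the first <item>, or nil if the feed has none
def latest_item(xml)
  document = REXML::Document.new(xml)
  item = REXML::XPath.first(document, "//item")
  return nil if item.nil?

  {
    date: DateTime.parse(item.elements["pubDate"].text),
    link: item.elements["link"].text
  }
end

latest = latest_item(SAMPLE_FEED)

# Format the DateTime however you like, eg. day, month name, year
formatted = latest[:date].strftime("%-d %B %Y")
```

In a template, the equivalent formatting would likely go through Liquid's date filter with the same format string, eg. {{ page.repository_latest_date | date: "%-d %B %Y" }}.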

