I’m not very happy with the speed of Jekyll, and with the lack of interaction with the parts of the blog made in Rmarkdown. I also do not like that Jekyll is written in Ruby, one of the languages I do not want to learn.
I have been looking at Hakyll, which is an alternative based on Haskell. The advantages are:
- It is made in the same language as Pandoc, so Pandoc's Markdown is the default flavor.
- It is written in Haskell, which is one of the languages I would like to learn
- It is fast, since it can do things in parallel. The configuration file is in fact a Haskell program using some ad hoc libraries.
- It can be extended to do interesting things.
The bad news is:
- It is written in Haskell, which is one of the languages I still have to learn.
- It cannot be extended unless I learn Haskell.
- In particular there is no easy way to create contexts that provide the data available in the `_data` folder.
So I was preparing myself to learn Haskell when I realized that the key ideas of Hakyll can be done in a makefile. The components of Hakyll are:
- Patterns that determine what has to be processed. Stuff like `match 'post/*'`. I was unable (so far) to process files in subfolders of `post`.
- Routes, that determine the name of each output file. These routes can be as simple as keeping the same name (for static elements), or replacing the extension with `.html`.
- Rules, that determine how to make a new file based on the existing file.
These three elements can be done in a makefile.
- Patterns can be specified with `$(wildcard)`, or with a small program that fills a `SRC` and a `TGT` variable.
- Routes are just makefile rules. In most cases it will be straightforward. In some cases it can be done with a program. I usually do that with a Python template library called Jinja, which is similar to Liquid. In particular, right now I have the issue of getting rid of the date in the post filenames.
- Rules are just the commands that we need to execute to transform SRC into TGT. Most of them will be just `pandoc`. One particularly useful rule is `relativizePaths`. I’m doing that with `awk` right now; maybe there are better ways to do it.
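The "small program that fills SRC and TGT" could look roughly like the sketch below. It is only an illustration under some assumptions: the output folder is called `_site`, posts carry a Jekyll-style `YYYY-MM-DD-` prefix, and the names `route` and `make_variables` are made up for this example.

```python
import re
from pathlib import PurePosixPath

# Assumption: posts are named like "2019-03-14-my-title.md" (Jekyll style).
DATE_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2}-")


def route(src: str, strip_date: bool = True) -> str:
    """Map a source path to its target path inside the (assumed) _site folder."""
    path = PurePosixPath(src)
    name = path.name
    if strip_date:
        # Drop the leading date, if there is one.
        name = DATE_PREFIX.sub("", name)
    if path.suffix in {".md", ".Rmd"}:
        # Documents become HTML; everything else keeps its name.
        name = str(PurePosixPath(name).with_suffix(".html"))
    return str(PurePosixPath("_site") / path.parent / name)


def make_variables(sources) -> str:
    """Return the SRC and TGT lines that a makefile could include."""
    sources = sorted(sources)
    targets = [route(s) for s in sources]
    return "SRC := " + " ".join(sources) + "\nTGT := " + " ".join(targets) + "\n"
```

The list of sources could come from something like `Path("post").rglob("*.md")`, which also reaches subfolders, and the returned text could be written to a file that the main makefile pulls in with `include`.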
Types of rules
- Static files, such as `images`, `static`, and `css`, should be copied, or linked to their target. Content and name are the same.
- Some files need an easy and deterministic process. For example: SCSS, SASS, ipynb and dot.
- Pages and posts depend on a few templates (see below) and on YAML files that do not change often, such as the site configuration.
- Since all compilation will be done from the root of the blog (as it is done today), bibliographies should work.
- In a second stage we can add some ad hoc variables, such as prev and next. These can be given as `pandoc` command-line options.
- Another variable in the command line is production, which should be non-null when processing for the real public website. I think it can be defined in a makefile that includes the development makefile. One set of rules, different parameters.
- The most complex case is the indices, or files that depend on the content of other files. For these I think I can use the `_data/auto` folder. Probably it will be wise to separate the YAML of previous years, which should change little but depends on several files, from the current year's YAML, which changes a lot but depends on a few files. We can even be more precise about the data dependencies.
- In this scheme it is trivial to produce several output types. It will be easy to produce PDF or DOCX versions of some pages.
- The output folder structure depends on the input folder structure, and not on the header of the files. I personally like this. The tag ‘published’ will lose its meaning. If it is in the `post` folder, it is published. If it is in `draft`, then it is not. Category can be automatic, from the filename.
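To make the idea of passing prev, next and production on the command line more concrete, here is a minimal sketch that only builds the argument list for one page and executes nothing. The function name `pandoc_command` and the file paths in the example are hypothetical. `--template`, `--output` and `--variable` are regular pandoc options.

```python
def pandoc_command(src, tgt, template, prev=None, next_=None, production=False):
    """Build the pandoc argument list for one page; nothing is run here."""
    cmd = ["pandoc", src, "--template", template, "--output", tgt]
    # The variable names follow the text above; a template would test
    # them with $if(prev)$, $if(next)$ or $if(production)$.
    if prev:
        cmd += ["--variable", f"prev={prev}"]
    if next_:
        cmd += ["--variable", f"next={next_}"]
    if production:
        # Only set for the public site, so it is undefined in development.
        cmd += ["--variable", "production=true"]
    return cmd
```

For example, `pandoc_command("post/b.md", "_site/post/b.html", "templates/post.html", prev="a.html")` gives a list that can be handed to `subprocess.run`, or printed as a makefile recipe line.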
Date format
This is the hard part so far. Files can have different date formats, and we need to translate them. I think this will be handled by the same script that produces `_data/auto`.
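One possible way to translate the dates is to try a list of known formats until one fits. The sketch below does that. The formats in `KNOWN_FORMATS` are guesses (the real list depends on what the existing posts contain), and `normalize_date` is a made-up name.

```python
from datetime import datetime

# Assumed formats; ambiguous ones such as 03/04/2019 are read as day/month/year.
KNOWN_FORMATS = ["%Y-%m-%d", "%Y-%m-%d %H:%M:%S", "%d/%m/%Y", "%B %d, %Y"]


def normalize_date(text: str) -> str:
    """Return the date as YYYY-MM-DD, trying each known format in turn."""
    cleaned = text.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # this format does not match, try the next one
    raise ValueError(f"unrecognized date format: {text!r}")
```

Failing loudly on an unknown format seems safer than guessing, since a wrong date would silently put a post in the wrong yearly index.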
Templates
Pandoc can use templates, but they cannot use include. Moreover, I like that in Jekyll “specialized” layouts can inherit from “parent” layouts. Keeping several templates coherent can become a headache. Therefore I think there must also be some rule to make templates from smaller pieces.
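I think this can be done in AWK. Maybe the inheritance can follow the order of the files on the command line, with each file wrapping the result of the previous ones:
# usage (file names are just examples): awk -f layout.awk post.html default.html > merged.html
BEGIN { old = "$body$\n" }                # the first file keeps its own $body$
FNR == 1 && NR > 1 { old = a; a = "" }    # new file: what was built so far becomes the body
$0 == "$body$" { a = a old; next }        # splice the previous result in place of $body$
{ a = a $0 "\n" }                         # any other line is copied as-is
END { printf "%s", a }
# plus some rule to handle include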
Tidying up
The last part is to get rid of files that are no longer valid. Since we can compute `TGT`, and maybe some specific extras, we can have a command that removes all “extra” files before compiling.
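The clean-up can be sketched as a set difference between what is in the output folder and what the rules would produce. The names `extra_files` and `tidy` are hypothetical, and the targets are assumed to be written with the output folder as prefix (for example `_site/post/a.html`).

```python
from pathlib import Path


def extra_files(existing, targets):
    """Files present in the output tree that no rule would produce."""
    return sorted(set(existing) - set(targets))


def tidy(out_dir, targets, dry_run=True):
    """List (and, unless dry_run, delete) the stale files under out_dir."""
    existing = [p.as_posix() for p in Path(out_dir).rglob("*") if p.is_file()]
    stale = extra_files(existing, targets)
    if not dry_run:
        for name in stale:
            Path(name).unlink()
    return stale
```

Keeping `dry_run=True` as the default makes it possible to check the list first, which matters for the Rmarkdown outputs mentioned below.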
Care must be taken with the files produced by Rmarkdown. Right now, with intermediate steps, this is not an issue.
Extensions
Pandoc also allows us to use filters to change the meaning of some markup. I think this can be used to transform footnotes into margin notes. Maybe it can also do some computing, like running Python chunks.
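As a rough idea of what such a filter could be, the sketch below rewrites pandoc's JSON AST using only the standard library. It turns every footnote (`Note`) that holds a single paragraph into a `Span` with the class `marginnote`. The class name is an assumption (the CSS would have to place it in the margin), and the function names are made up for this example.

```python
import json
import sys


def note_to_span(node):
    """Turn a Note holding a single paragraph into a Span of class "marginnote"."""
    blocks = node["c"]
    if len(blocks) == 1 and blocks[0]["t"] in ("Para", "Plain"):
        # A Span is [attributes, inlines]; attributes are [id, classes, key-values].
        return {"t": "Span", "c": [["", ["marginnote"], []], blocks[0]["c"]]}
    return node  # multi-block footnotes are left alone in this sketch


def walk(node):
    """Recursively rewrite every Note found in a pandoc JSON AST fragment."""
    if isinstance(node, list):
        return [walk(item) for item in node]
    if isinstance(node, dict):
        if node.get("t") == "Note":
            return note_to_span(node)
        return {key: walk(value) for key, value in node.items()}
    return node


def run_filter(source=sys.stdin, sink=sys.stdout):
    """Read the JSON document pandoc sends and write back the modified one."""
    json.dump(walk(json.load(source)), sink)
```

Saved as an executable script that ends with a call to `run_filter()`, this could be passed to pandoc with `--filter`.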
Let’s see.