Detecting and installing blog post dependencies on GitHub Actions using regex and base R

Before you start

In this post I assume you already have some familiarity with GitHub Actions and Blogdown. If you are interested in a gentle introduction on how to deploy your Blogdown website using GitHub Actions, I recommend you read this post from my great friend Adson.

My first GitHub Actions “workflow”

GitHub Actions is a (relatively) new continuous integration tool that can be used for several purposes. Recently, with some help from this book and this presentation, I set up my blogdown website using GitHub Actions to avoid some other (IMO) annoying workarounds and to make the source of my website easily available. Unfortunately (for me), most of the material available online on deploying Blogdown does not cover my setup: since I prefer to deploy my page using GitHub Pages, I had to dig a little deeper to find the needed answers.

That being said, let’s take a look at my first GitHub Actions workflow [1]. The code is shown below and is stored in a .yml file inside a directory called .github/workflows [2].

When I first used this code, I thought the Rscript -e 'blogdown::build_site(local = FALSE)' part would compile the blog posts every time I made a new commit. I was amazed, because I thought it would “guess” (in some sense) the system (and R package) dependencies and install them. However, when I tried to push a new post last week, it was not being shown on the web page. What I realized was that I had been pushing posts compiled locally, so the workflow never rendered them itself. A simple change from blogdown::build_site(local = FALSE) to blogdown::build_site(local = FALSE, build_rmd = TRUE) would break the GA (GitHub Actions) workflow, because the runner does not have the packages the posts depend on.

on:
  push:
    branches:
      - source

name: deployblog

jobs:
  blogdown:
    name: Render and deploy blogdown
    runs-on: ubuntu-18.04
    steps:
      - uses: actions/checkout@v2
        with:
          submodules: true
          fetch-depth: 0
      - uses: r-lib/actions/setup-r@v1
      - uses: r-lib/actions/setup-pandoc@v1

      - name: Install r packages for blogdown
        run: |
          Rscript -e 'install.packages(c("remotes", "rmarkdown"))' \
                  -e 'remotes::install_github("rstudio/blogdown")'

      - name: install hugo
        run: Rscript -e 'blogdown::install_hugo(extended = TRUE, version = "0.78.2")'

      - name: Get themes
        run: git submodule update --remote

      - name: Look at files
        run: ls ./public

      - name: Render blog
        run: Rscript -e 'blogdown::build_site(local = FALSE)'

      - name: Deploy
        uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_branch: master
          publish_dir: ./public
          cname: lcgodoy.me

A workaround to install dependencies

Since I did not want to rewrite the workflow file before committing every new post, I tried to come up with a workaround. This workaround only installs the R packages needed by the posts; it does not install system dependencies such as those needed by sf. So, if I write a new post that relies on a system dependency other than the ones already being installed, I will have to update the workflow file.

So, the new workflow file is the one below. You may notice that there are several differences compared to the previous one. Firstly, there is a step installing the Linux system dependencies, including the ones needed by the sf package. These are needed to compile some of my old posts. Right after that, we install the packages pdftools and tinytex, and run source("pkgs.R"). The first two packages are needed to compile some tikz figures I made for some posts. Lastly, in the “Render blog” step, we use blogdown::build_site(local = FALSE, build_rmd = TRUE) instead of blogdown::build_site(local = FALSE). That is, every time I make a new commit, all the blog posts are recompiled (I still have to optimize this part, maybe with another workflow file).

on:
  push:
    branches:
      - source

name: deployblog

jobs:
  blogdown:
    name: Render and deploy blogdown
    runs-on: ubuntu-18.04
    steps:
      - uses: actions/checkout@v2
        with:
          submodules: true
          fetch-depth: 0
      - uses: r-lib/actions/setup-r@v1
      - uses: r-lib/actions/setup-pandoc@v1

      - name: Install r packages for blogdown
        run: |
          Rscript -e 'install.packages(c("remotes", "rmarkdown"))' \
                  -e 'remotes::install_github("rstudio/blogdown")'
  
      - name: Install system dependencies
        if: runner.os == 'Linux'
        run: |
          # install spatial dependencies
          # sudo add-apt-repository ppa:ubuntugis/ubuntugis-unstable
          sudo apt update
          sudo apt install \
            libudunits2-dev \
            libgdal-dev \
            libgeos-dev \
            libproj-dev \
            libmagick++-dev \
            imagemagick \
            ffmpeg \
            libpoppler-cpp-dev
                      
      - name: Install r packages for posts
        run: |
          Rscript -e 'install.packages("pdftools")' \
                  -e 'install.packages("tinytex")' \
                  -e 'tinytex::install_tinytex()' \
                  -e 'source("pkgs.R")'
                  
      - name: install hugo
        run: Rscript -e 'blogdown::install_hugo(extended = TRUE, version = "0.78.2")'

      - name: Get themes
        run: git submodule update --remote

      - name: Look at files
        run: ls ./public

      - name: Render blog
        run: Rscript -e 'blogdown::build_site(local = FALSE, build_rmd = TRUE)'

      - name: Deploy
        uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_branch: master
          publish_dir: ./public
          cname: lcgodoy.me
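
One thing to keep in mind about the “Install r packages for posts” step: the pkgs.R script (shown in the next section) calls install.packages() on every package it detects, including the ones already installed earlier in the workflow. A minimal sketch of how the list could be filtered first is given below. The helper missing_pkgs is hypothetical and not part of the original script; it only assumes that the detected names arrive as a character vector.

```r
## Hypothetical helper (not part of the original pkgs.R): keep only the
## package names that are not yet available in the library.
missing_pkgs <- function(pkgs,
                         installed = rownames(installed.packages())) {
  ## drop empty strings and duplicates before comparing
  pkgs <- unique(pkgs[nzchar(pkgs)])
  setdiff(pkgs, installed)
}

## "stats" ships with R, so only the made-up name is reported as missing
missing_pkgs(c("stats", "", "aPackageThatDoesNotExist"))
```

With something like this in place, the last line of the script could become install.packages(missing_pkgs(to_inst)), which would skip packages such as rmarkdown that the workflow has already installed.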

A dependency-free script to install the needed packages

The pkgs.R script mentioned above is a dependency-free script that needs to be placed in the root (“main”) directory of the repo. The script is displayed below. First, we define a helper function used to extract patterns from a string (regex_extract). Next, we “detect” all the Rmd files within the repository. The lines of code after that read all the Rmd files and find every line where these files call library or require, or use the :: operator. Finally, from this last operation, we can easily identify the packages being used by each post and, consequently, install them.

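## helper function: extract the first match of `pattern` from each
## element of `x` (elements without a match are dropped)
regex_extract <- function(x, pattern)
    regmatches(x = x,
               m = regexpr(pattern = pattern,
                           text    = x))

## paths to every Rmd file in the repository
rmd_files <-
  list.files(path       = ".",
             pattern    = "\\.Rmd$",
             full.names = TRUE,
             recursive  = TRUE)

## content of each file, one character vector per file
rmd_lines <- lapply(rmd_files, readLines)

## keep only the lines that mention `::`, `library(`, or `require(`
packages_1 <-
  lapply(rmd_lines,
         function(x) {
           grep(pattern = "(::|library\\(|require\\()",
                x = x, value = TRUE)
         })

## packages attached with library() or require() at the start of a line;
## the call itself, the parentheses, and any quotes are stripped
lib_or_req <-
  lapply(packages_1,
         function(x) {
           gsub(pattern = "(require|library|\\(|\\)|\"|')", "",
                regex_extract(x,
                              "(^library|^require)\\((.*?)\\)"))
         })

lib_or_req <- unlist(lib_or_req)

## packages used through the `::` operator (first use on each line);
## dots are allowed so that names such as data.table are kept whole
colon_lib <-
  lapply(packages_1,
         function(x) {
           pkgs <- regex_extract(x,
                                 "\\s*([[:alnum:]._]*)::")
           trimws(gsub("::", "", pkgs))
         })

colon_lib <- unlist(colon_lib)

to_inst <- unique(c(lib_or_req, colon_lib))

## install everything that was detected, ignoring empty matches
install.packages(to_inst[to_inst != ""])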
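
To see what the detection step produces without touching any files, here is a small sketch that applies the same idea to a few made-up lines. The function detect_pkgs and the example lines are hypothetical, and the patterns are slightly simplified versions of the ones in the script, so take it as an illustration rather than a replacement for pkgs.R.

```r
## Hypothetical wrapper around the steps of pkgs.R, applied to a character
## vector of lines instead of files on disk.
detect_pkgs <- function(lines) {
  ## first match of `pattern` in each line; lines without a match are dropped
  first_match <- function(x, pattern)
    regmatches(x, regexpr(pattern, x))

  ## library(pkg) / require(pkg) at the start of a line
  calls      <- first_match(lines, "^(library|require)\\([^)]*\\)")
  lib_or_req <- gsub("(require|library|\\(|\\))", "", calls)

  ## pkg::fun, allowing dots in the package name
  colon     <- first_match(lines, "[[:alnum:]._]+::")
  colon_lib <- gsub("::", "", colon)

  pkgs <- unique(c(lib_or_req, colon_lib))
  pkgs[pkgs != ""]
}

## made-up lines standing in for the content of a post
post <- c("library(sf)",
          "require(ggplot2)",
          "x <- dplyr::filter(df, a > 1)",
          "dt <- data.table::data.table(a = 1)",
          "plain text, no R code here")

detect_pkgs(post)
```

For these lines the result is the vector "sf", "ggplot2", "dplyr", "data.table". Note that only the first :: on each line is picked up, and that library() calls that are indented or wrapped in another function are not detected.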

  1. The workflow here is the set of instructions for the deployment of the website.

  2. As I said before, I’ll omit most of the details about the tool itself; this post can be seen as a complement to Adson’s post.

Lucas Godoy
PhD Candidate / TA /GA

I’m a PhD Candidate in Stats interested in R, Open Data, and the most diverse applications of statistics.