Debugging slow bash startup files

Recently I found that opening a new bash session (e.g. when opening a new terminal window) was getting a bit slow on my machine. I take reasonable care to make sure my dotfiles don’t get too crufty, and I keep them all in version control.

The following is a walk through of how I went about debugging the issue.


So, how does one go about profiling what bash is doing when starting a login shell/interactive shell?

My initial thought was to use some kind of system call tracing to see what files were being opened/executed. dtrace exists on OS X, so let’s try that:

sudo dtruss -ef bash

Sadly, the output isn’t overly useful due to System Integrity Protection. I don’t want to boot into recovery mode, so what are our options?

I regularly add set -o xtrace to my bash scripts … would the same thing work for my .bashrc? I added the line, and executed bash:

+ source /Users/mnewman/.bash_profile
++ export PATH=/Users/mnewman/bin:/Users/mnewman/perl5/bin:/Users/mnewman/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Users/mnewman/.rvm/bin
++ PATH=/Users/mnewman/bin:/Users/mnewman/perl5/bin:/Users/mnewman/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Users/mnewman/.rvm/bin
++ for file in ~/.{path,bash_prompt,exports,aliases,functions,extra}
++ '[' -r /Users/mnewman/.path ']'
++ for file in ~/.{path,bash_prompt,exports,aliases,functions,extra}
++ '[' -r /Users/mnewman/.bash_prompt ']'
...

It looks like that works (the above is showing the start of my .bash_profile, which is sourced from .bashrc). There is a lot of output there though, and we still don’t have any timing information. A little searching for variants of bash add timestamp to each line led me to an SO answer recommending ts. Looking at the manual page for ts:

$ man ts

NAME
       ts - timestamp input

SYNOPSIS
       ts [-r] [-i | -s] [format]

DESCRIPTION
       ts adds a timestamp to the beginning of each line of input.

       The optional format parameter controls how the timestamp is formatted, as used by strftime(3). The default format is "%b %d %H:%M:%S". In addition to the regular strftime
       conversion specifications, "%.S" and "%.s" are like "%S" and "%s", but provide subsecond resolution (ie, "30.00001" and "1301682593.00001").

       If the -r switch is passed, it instead converts existing timestamps in the input to relative times, such as "15m5s ago". Many common timestamp formats are supported. Note that
       the Time::Duration and Date::Parse perl modules are required for this mode to work. Currently, converting localized dates is not supported.

       If both -r and a format is passed, the existing timestamps are converted to the specified format.

       If the -i or -s switch is passed, ts timestamps incrementally instead. In case of -i, every timestamp will be the time elapsed since the last timestamp. In case of -s, the time
       elapsed since start of the program is used.  The default format changes to "%H:%M:%S", and "%.S" and "%.s" can be used as well.

So far so good, it looks like we could use ts -i and get the duration of every command! I’d like to try this out, but how can we redirect the xtrace output to ts?

Some further Googling led me to this SO answer, which suggests using the BASH_XTRACEFD variable to tell bash where to write its xtrace output. After some trial and error, I added a few lines to my .bashrc:

# open file descriptor 5 such that anything written to /dev/fd/5
# is piped through ts and then to /tmp/timestamps
exec 5> >(ts -i "%.s" >> /tmp/timestamps)

# https://www.gnu.org/software/bash/manual/html_node/Bash-Variables.html
export BASH_XTRACEFD="5"

# Enable tracing
set -x

# Source my .bash_profile script, as usual
[ -n "$PS1" ] && source ~/.bash_profile;

Upon restarting bash, this produces (a lot of) output in /tmp/timestamps, and each line contains an incremental timestamp, like so:

0.000046 ++ which brew
0.003437 +++ brew --prefix
0.025518 ++ '[' -f /usr/local/share/bash-completion/bash_completion ']'
0.000741 +++ brew --prefix

These particular lines tell me that a brew --prefix command executed and took 20ms.

With output like the above, I had enough info to track down a couple of slow loading scripts (like sourcing nvm.sh) and remove them from my .bashrc/.bash_profile.

Denver and the Rocky Mountains

Aspen trees in the Rocky Mountains

Last week I went to ExploreDDD (which I would highly recommend) in Denver on Thursday and Friday, joined a tour to the Rocky Mountains on Saturday, and explored Denver on Sunday.

Content of the conference aside, even the view from the venue was pretty amazing:

View of Denver buildings with Rocky Mountains in the background

Sadly the weather was pretty poor on Saturday for the Roucky Mountains visit, and there wasn’t lots to see out of the lookout 12,000 ft (~3600m) up:

Photo of Matt trying to see a view, but all there is outside is fog

Denver is a cool city, figuratively and literally. It’s quite high, almost exactly one mile above sea level. The Colorado capital building is in Denver; the stairs out the front have a plaque to show which one is exactly one mile above sea level, but measurements kept getting more and more accurate, so there are actually three plaques!

A few random facts from the visit:

  • The trees with yellow leaves in the picture above are aspens, and they actually grow in clumps all connected via a common root system
  • Denver convention center is the 13th largest building in the world
  • The Broncos (the Denver NFL team) have sold out every home game for the last 23 years
  • If attacked by a black bear, fight back
  • Up high, where the trees can only grow for a few months a year, a tree thats 2-3 inches thick might be 80 years old
  • A building that’s part of the Denver art gallery contains no right angles
  • The eagle atop the Denver local government building was made by someone who hated the mayor who planned the construction of the whole building. Apparently you only see eagles with their wings in that position when they’re a) about to take off, or b) defacating
  • In Denver, 1% of every public construction project must be set aside for a public art installation nearby the building
  • The Denver mint makes coins for China
  • This art piece was meant to be for a primary school and was trying to make a statement about the perspective of children (the horse atop the chair is actually a full size horse). However, the school didn’t like it and it ended up at the library. Another installation at the convention center pokes a bit of fun at this piece.
  • ‘Iseewhatyoumean’ is the name of a huge blue bear who’s always trying too see what’s happening in the Denver convention center, and is consistently voted the favorite public are installation

Despite the weather it was a fun weekend, I’m looking forward to going to ExploreDDD again in 2018.

Migrating a GitHub Pages blog with a custom domain to HTTPS

At the time of writing, this blog is hosted on GitHub Pages, which has been working well since I set it up a few years back.

The only thing that has bugged me for a while now is that the whole site was served over HTTP, rather than HTTPS.

I wanted to move to move this blog to HTTPS, but with some constraints:

  • Continue using GitHub pages (it’s free and easy, I don’t want to manage a server)
  • No certificate renewal (smart me plans for stupid me, who would surely forget to renew a cert)
  • Continue using my domain (mdjnewman.me)
  • No cost

GitHub pages doesn’t support HTTPS for custom domain names, as it uses a certificate with a wildcard SAN of *.github.io.

CloudFlare offers HTTPS on a free plan, which Troy Hunt has written about before.

It looks like this will meet my constraints above - I get to keep using GitHub Pages, I don’t have to manage a cert (CloudFlare takes care of this), and I can keep using my custom domain.

The steps I followed to do this were relatively simple:

  1. Exported a zone file from current nameservers
  2. Completed the CloudFlare onboarding, during which I imported the above zone file
  3. Updated the authoritative DNS servers for my domain to the *.ns.cloudflare.com name servers: Update name servers
  4. Tested the site out, fixed a CSS link that was loaded over HTTP
  5. Forced HTTPS in CloudFlare: Enforcing HTTPS with CloudFlare

… and that was it. I finished this in part of an afternoon.

Shortcomings

There is one major shortcoming with this setup: there is no certificate validation between CloudFlare and GitHub (CloudFlare supports fetching from an origin without validating certificates, which is the option I’ve chosen - ‘strict’ HTTPS can be enabled for most use cases).

As we mentioned before, the GitHub cert is valid for *.github.io, and we’re using my custom domain, which is mdjnewman.me.

If we switched off the custom domain on GitHub, and did some smarts in CloudFront to rewrite requests so that the request to GitHub was using mdjnewman.github.io, then we’d get HTTPS all the way to GitHub servers.

CloudFlare does support rewriting HTTP Host headers , but it’s an enterprise feature.

I could switch to using CloudFront with an AWS Certificate Manager cert, which would meet all the above constraints except for ‘no cost’ (admittedly, my tiny blog doesn’t get much traffic, so the cost would be minimal).

Given that most of the shenanigans with injecting content into web sites happens at the last leg of a connection (I’m looking at you, dodgy internet cafe), I’m happy that the new setup for this blog mitigates that problem and am willing to accept the cost/security trade-off. While it’s possible for someone to perform a man in the middle attack and impersonate GitHub, given my site has no sensitive information I’m not too worried about this threat model (Troy Hunt also wrote about this trade-off).

When should I force push after rebasing?

tl;dr: assuming it’s your own branch, immediately.


On the last few projects I have been a part of, I have been the defacto Git problem solver.

XKCD Git Comic
(obligatory xkcd)

On more than one occasion, I was presented with something like the following:

$ git l master
* 232e985 (origin/master, origin/HEAD, master) Feature 4
* 94989d0 Feature 3
* 296d0b3 Feature 2
* 4b856cd Feature 1

$ git l feature-5
* 5c31f6d (HEAD -> feature-5) Feature 4
* 420f83b Feature 3
* 724236a (origin/feature-5) Feature 5
* 296d0b3 Feature 2
* 4b856cd Feature 1

In the above, the commits adding features 3 and 4 on each branch are logically the same, but on the feature-5 branch they’ve somehow ended up after the commit adding feature 5.

The team was using feature branches, and the author of feature 5 above was trying to rebase their changes onto master, but somehow ended up inserting their commit between commits on master. At this stage, you’re in a pretty bad place, as you’ve diverged from master.

How did this happen? I’m guessing it was the following series of events:

  • Developer branches from master to create feature-5
  • Features 3 & 4 are pushed to master
  • Feature 5 is committed to the feature-5 branch
  • Developer runs git pull --rebase origin master or similar
  • … some time passes …
  • Developer runs git pull --rebase without really thinking about it

After the first few steps above, we have something like the following:

Commits after creating feature branch

So far, so good. We want to rebase our changes onto master, so that we can test and push our code. After git pull --rebase origin master:

Commits after rebasing onto master

Still looking good. At this stage, we could git push --force origin feature-5 and all would be well in the world.

But what happens if we go for a tea and forget what we were doing (or we use an overzealous git GUI tool), and we try to rebase onto origin/feature-5?

Commits after rebasing from our remote branch - shows duplicate commits

What we see above is the result of Git rebasing commits 94989d0, 232e985 and 44e1c44 on top of origin/feature-5. As the commit ID is computed by hashing of the contents of a commit and its parent, the same logical commits from master now exist on our branch with different IDs.

This could have been avoided if we followed this rule of thumb:

If you’re working on your own branch, always push immediately after rebasing.

Some people on the team were seeing a message like the following:

Your branch and 'origin/feature-5' have diverged,
and have 3 and 1 different commits each, respectively.
  (use "git pull" to merge the remote branch into yours)

and assuming that they should git pull --rebase, which in this case is exactly what you don’t want.

If you’re looking for further reading Getting Started - Git Basics covers the fundamentals and LearnGitBranching is a great interactive tool.

Diagrams were created using Gitgraph.js, and this script creates an example repository.

Monad transformers

I’ve been spending some time working through Haskell Programming From First Principles recently which is a very comprehensive introduction to functional programming and Haskell.

One of the exercises to implement a simplified command line version of Morra, which involves keeping track of scores and reading user input.

My initial method to play a round looked something like this:

playRound :: Config -> Scores -> IO (Bool, Scores)
  playRound config scores = do
  putStr "P1: "
  p1HandMaybe <- getHumanHand
  case p1HandMaybe of
    Nothing -> return (True, scores)               -- error case 1
    Just x -> do
      putStr "P2: "
      p2HandMaybe <- getHumanHand
      case p2HandMaybe of
        Nothing -> return (True, scores)           -- error case 2
        Just y -> do
          let evenWins = (x + y) `mod` 2 == 0
          return (False, updateScores config evenWins scores)

playRound takes the configuration for the game and the current score, and returns a side effecting computation that will return a tuple with the new scores and a boolean indicating if the game is finished.

The method getHumanHand used above returns a IO (Maybe Int), which can be interpreted as a side effecting action that might return an integer (in this case, the side effect is reading from the console and we can’t trust the user to enter an integer, hence the Maybe).

The problem then is that we’re then manually unpacking these Maybe Int values, which leads to the ugly nesting and case statements. However, we can see on the lines marked ‘error case’ above that the handling for both cases is the same - we assume that if the user has entered something other than an Int that they want to end the game.

I recently learned about Monad transformers, which allow you to compose monads. In this case, we want to compose the Maybe monad with the IO monad, so we will use MaybeT.

Rewriting getHumanHand to return a MaybeT IO Int and rewriting playRound results in the following:

playRound' :: Config -> Scores -> IO (Bool, Scores)
playRound' config scores = do

  newScores <- runMaybeT $ do
    liftIO $ putStr "P1: "
    p1Hand <- getHumanHand'
    liftIO $ putStr "P2: "
    p2Hand <- getHumanHand'
    let evenWins = (p1Hand + p2Hand) `mod` 2 == 0
    return $ updateScores config evenWins scores

  return $ case newScores of
    Nothing -> (True, scores)
    Just x -> (False, x)

The nice thing about this implementation is that we’ve avoided the need for pattern matching, as the do block above where we’re dealing with the potentially failing computations will immediately short circuit and return a Nothing if either user fails to provide a legitimate value.

This is the first time I’ve actually used a Monad transformer, and it was good to see how it cleans up the implementation of playRound.