Vetted - importing data into an event sourced system

See Vetted - a new project for some background.

A task I’m going to have to tackle sooner or later is importing data from the existing Access database. As I’m going to try my hand at event sourcing, this raises an interesting question:

Given an existing application with data stored in a relational database & no notion of events, how do you go about importing data into an event sourced system?


At a high level, the initial options seem to be:

  1. Try to reverse engineer/map the existing state to events
  2. Have some sort of migration event in your domain model (e.g. FooMigrated) which acts as a snapshot
  3. Run everything through your new API as commands, and allow the API implementation to take care of creating the relevant events like normal

‘Recreating’ domain events

Option one above would be nice, but seems impractical at best and is more likely impossible. For every domain object (client, patient, payment, invoice, vaccination etc) I’d need to try and reverse engineer the real-world happenings that occurred to transition the object into its current state.

A ‘migration event’ as a snapshot

Originally, when a colleague suggested this, it conflicted with my understanding of the term ‘snapshot’. To me a ‘snapshot’ has always been about collapsing an event stream into a single event for performance reasons. When using this kind of snapshot, the original stream of events is still available.

The second kind of snapshot (which I didn’t see immediately) is a snapshot which is used as base data. When using a snapshot as base data, the collapsed state of the aggregate at the time the snapshot was taken is all the information you have about the history of the aggregate.

It could also be argued that the migration is a meaningful domain event in its own right, and should be captured explicitly. A CustomerMigratedEvent could result in the creation of a new customer aggregate root in the same way that a CustomerRegisteredEvent does.
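To make option two more concrete, here is a minimal sketch. It is not how Vetted actually does this. It assumes the legacy customers table has been exported from Access as CSV with the hypothetical header id,name,phone and no embedded commas. Each row becomes one CustomerMigrated event, written as a line of JSON. The event shape and the events.jsonl file are also illustrative assumptions, not a real event store.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of option 2: one CustomerMigrated event per legacy row.
# Assumes a CSV export with the header id,name,phone and no embedded commas.
customer_migrated_events() {
  awk -F',' '
    # Escape backslashes and double quotes so each value is a valid JSON string
    function json_escape(s,    out, i, c) {
      out = ""
      for (i = 1; i <= length(s); i++) {
        c = substr(s, i, 1)
        if (c == "\\" || c == "\"") out = out "\\"
        out = out c
      }
      return out
    }
    { sub(/\r$/, "") }   # tolerate CRLF line endings from Windows exports
    NR == 1 { next }     # skip the header row
    {
      # The event carries the whole current state: it is the base data
      printf "{\"type\":\"CustomerMigrated\",\"customerId\":\"%s\",\"name\":\"%s\",\"phone\":\"%s\"}\n",
        json_escape($1), json_escape($2), json_escape($3)
    }
  ' "$@"
}

# Usage (file names are illustrative):
#   customer_migrated_events customers.csv >> events.jsonl
```

Each event carries the aggregate's full state at migration time, so there is no history to reconstruct. That is exactly the "snapshot as base data" trade-off.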

Run all existing data through the new API

It should be possible to write a script that reads data from the existing database, creates commands and posts those to the appropriate API. The relevant events would ultimately be created off the back of processing the commands, so all ‘legacy’ data should look exactly the same as anything created going forward.

The outcome is probably close to option one above, but with less manual work.
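Option three could be sketched as a small bash importer. It reads a CSV export of the legacy data and sends one command per row to the new API. The API hasn't been written yet, so everything about it here is an assumption. That includes the VETTED_API base URL, the /customers/register route, the RegisterCustomer command and its form-encoded fields. The point is that the script only builds and sends commands. The API's normal command handling would create the events.

```shell
#!/usr/bin/env bash
# Hypothetical base URL; the new API does not exist yet.
VETTED_API="${VETTED_API:-http://localhost:8080}"

# Hypothetical sketch of option 3: read a CSV export (assumed header
# id,name,phone, no embedded commas) and send one RegisterCustomer command
# per row. The API, not this script, decides which events to record.
# Set DRY_RUN=1 to print the commands instead of sending them.
import_customers() {
  local _header id name phone
  {
    read -r _header  # skip the header row
    # The "|| [ -n "$id" ]" keeps the last row even without a trailing newline
    while IFS=',' read -r id name phone || [ -n "$id" ]; do
      phone=${phone%$'\r'}  # tolerate CRLF line endings from Windows exports
      if [ "${DRY_RUN:-0}" = 1 ]; then
        printf 'RegisterCustomer id=%s name=%s phone=%s\n' "$id" "$name" "$phone"
      else
        # --fail turns HTTP errors into a non-zero exit, which stops the import
        curl --silent --show-error --fail \
          --data-urlencode "customerId=$id" \
          --data-urlencode "name=$name" \
          --data-urlencode "phone=$phone" \
          "$VETTED_API/customers/register" || return 1
      fi
    done
  } < "${1:-/dev/stdin}"
}

# Usage: DRY_RUN=1 import_customers customers.csv   # check the commands first
#        import_customers customers.csv
```

Stopping at the first rejected command means a failed import can be fixed and re-run from the failing row. Whether the API should treat re-sent commands idempotently is a separate design question.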

Next steps

So far I’ve been spending a lot of time on the technical concepts & design of an event sourced system, without doing much on the implementation side.

It’s hard to build a useful conceptual domain model without considering implementation issues, so I think it’s time I stopped debating concepts and wrote some code.

I’m planning to explore a little and gain an understanding of how building and executing commands would differ in practice from the ‘import event’ option above.

Thanks to Roman Safronov, Chris Rowe, Martin Fowler, Mariano Giuffrida, Jim Barritt and Nouman Memon for taking the time to reply and/or chat about event sourcing! Any good ideas are theirs, and errors are mine.

Vetted - choosing an appropriate license

See Vetted - a new project for some background.

I’m hoping to start pushing some code soon. Before I do, it’s a good opportunity to do some reading into a topic that I am more ignorant of than I should be: software licensing. The following is what I’ve learnt so far.

Disclaimer: I am not a lawyer and this may be totally incorrect, and as such should not be used as the basis for any decision ever.

Once code is released as open source software, most common OSS licenses (GPL, BSD, MIT, Apache etc) do not allow for the revocation of rights granted under the license. This is a good thing. Imagine having to be prepared for any open source library/framework you’re currently using to become proprietary software with no warning. Such uncertainty would severely limit the utility of open source software.

It is possible for the current copyright owners to relicense their creations. So, theoretically, any OSS software can be relicensed if all copyright owners agree. The important part is that this relicensing does not revoke the rights assigned under the previous license. So if I’ve released some software as open source software, I can decide a year later to relicense it and create a commercial version, with the following caveats:

  • I still own the copyright for the entire project
    • I need to have been the sole contributor to the project, or to have ensured that contributors have assigned copyright to me for their work
  • The existing rights assigned under the OSS license remain in place
    • If the license permits, anyone can fork the project at this point and develop/use their own version

Given what I’ve learned above, I’m planning to license the project under an OSS license, but I won’t accept any contributions until I’ve got some kind of Contributor License Agreement (CLA) in place. This is a common approach; for an example, take a look at the GitHub CLA.

This is likely the first of many posts where it might seem I’m researching a topic and deliberating a little excessively, given I have no working software or even a particularly interesting idea. It’s fairly premature to assume that there are going to be any contributors to this project other than myself. I don’t believe veterinary practice management software is so exciting that I am going to be swamped with contributions. However, as I mentioned earlier, this whole project is mainly a learning opportunity for me.

Vetted - a new project

When I was in high school, I created a fairly basic application for managing a small veterinary practice. It’s written in Microsoft Access and is used by my parents to manage their mobile veterinary business.

I’m toying with rebuilding it as a web application. For its current users, the main benefits of this would be:

  1. client contact details could be made available on a mobile device;
  2. there will be no more (or at least fewer) issues with concurrent modification (merging two Access database files that have been modified independently on different computers because Dropbox didn’t sync is no fun); and
  3. it will be accessible anywhere with internet access, so that my dad could do accounts when he is away from home and has some downtime.

For me, it would mainly be a learning opportunity.

Naming things is not my strong suit, so I’m going with ‘Vetted’ for now.

I’m going to try to do the following throughout the project:

  • Apply domain driven design rigorously
  • Apply functional programming principles
  • Document failures and successes
  • Document my design heuristics
  • Develop in the open
  • Deploy continuously to production somewhere
  • Focus on adding the most valuable parts first (e.g. make phone numbers available online) & delivering vertical slices

I’m thinking that the basic architecture for now will be:

  • Single page application
  • Elm frontend
  • Kotlin backend
  • Maybe some kind of event sourced data store, as I’d like to see how badly I can shoot my foot off

While I want to build something functional, I also want to learn about a few techniques/patterns/tools that would be applicable on larger projects, so I might be making some choices which seem strange. I’ll try to call these out as they happen. I need to remember that I am not Google.

Design Heuristics

I attended the excellent Explore DDD conference this year and one of my favourite talks was Cultivating Your Design Heuristics by Rebecca Wirfs-Brock.

As defined in the talk, a heuristic is:

anything that provides a plausible aid (not a guaranteed aid) or direction in defining a solution but is ultimately irrelevant to the final product

Rebecca encourages everyone to consciously document and cultivate their heuristics, learn others’ heuristics and discuss them, and ultimately adapt (or wholesale replace) their own heuristics when appropriate.

It’s a great talk, and I’m going to try to document the heuristics that I find myself using while working on this project.

Develop in the open

I’m going to be writing about what I’m building, and the code will be available on GitHub. However, I haven’t yet determined how to license the project. My basic requirement is that I retain copyright and can relicense the project if that’s ever required.

Look out for a post on this in the near future.

Debugging slow bash startup files

Recently I found that opening a new bash session (e.g. when opening a new terminal window) was getting a bit slow on my machine. I take reasonable care to make sure my dotfiles don’t get too crufty, and I keep them all in version control.

The following is a walk through of how I went about debugging the issue.

So, how does one go about profiling what bash is doing when starting a login shell/interactive shell?

My initial thought was to use some kind of system call tracing to see what files were being opened/executed. dtrace exists on OS X, so let’s try that:

sudo dtruss -ef bash

Sadly, the output isn’t overly useful due to System Integrity Protection. I don’t want to boot into recovery mode, so what are our options?

I regularly add set -o xtrace to my bash scripts … would the same thing work for my .bashrc? I added the line, and executed bash:

+ source /Users/mnewman/.bash_profile
++ export PATH=/Users/mnewman/bin:/Users/mnewman/perl5/bin:/Users/mnewman/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Users/mnewman/.rvm/bin
++ PATH=/Users/mnewman/bin:/Users/mnewman/perl5/bin:/Users/mnewman/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Users/mnewman/.rvm/bin
++ for file in ~/.{path,bash_prompt,exports,aliases,functions,extra}
++ '[' -r /Users/mnewman/.path ']'
++ for file in ~/.{path,bash_prompt,exports,aliases,functions,extra}
++ '[' -r /Users/mnewman/.bash_prompt ']'

It looks like that works (the above is showing the start of my .bash_profile, which is sourced from .bashrc). There is a lot of output there though, and we still don’t have any timing information. A little searching for variants of “bash add timestamp to each line” led me to an SO answer recommending ts (part of the moreutils package). Looking at the manual page for ts:

$ man ts

       ts - timestamp input

       ts [-r] [-i | -s] [format]

       ts adds a timestamp to the beginning of each line of input.

       The optional format parameter controls how the timestamp is formatted, as used by strftime(3). The default format is "%b %d %H:%M:%S". In addition to the regular strftime
       conversion specifications, "%.S" and "%.s" are like "%S" and "%s", but provide subsecond resolution (ie, "30.00001" and "1301682593.00001").

       If the -r switch is passed, it instead converts existing timestamps in the input to relative times, such as "15m5s ago". Many common timestamp formats are supported. Note that
       the Time::Duration and Date::Parse perl modules are required for this mode to work. Currently, converting localized dates is not supported.

       If both -r and a format is passed, the existing timestamps are converted to the specified format.

       If the -i or -s switch is passed, ts timestamps incrementally instead. In case of -i, every timestamp will be the time elapsed since the last timestamp. In case of -s, the time
       elapsed since start of the program is used.  The default format changes to "%H:%M:%S", and "%.S" and "%.s" can be used as well.

So far so good, it looks like we could use ts -i and get the duration of every command! I’d like to try this out, but how can we redirect the xtrace output to ts?

Some further Googling led me to this SO answer, which suggests using the BASH_XTRACEFD variable to tell bash where to write its xtrace output. After some trial and error, I added a few lines to my .bashrc:

# Open file descriptor 5 such that anything written to it
# is piped through ts and appended to /tmp/timestamps
exec 5> >(ts -i "%.s" >> /tmp/timestamps)

# Tell bash to write its xtrace output to file descriptor 5
export BASH_XTRACEFD="5"

# Enable tracing
set -x

# Source my .bash_profile script, as usual
[ -n "$PS1" ] && source ~/.bash_profile;

# Stop tracing once startup is done, so commands typed later aren't logged
set +x

Upon restarting bash, this produces (a lot of) output in /tmp/timestamps, and each line contains an incremental timestamp, like so:

0.000046 ++ which brew
0.003437 +++ brew --prefix
0.025518 ++ '[' -f /usr/local/share/bash-completion/bash_completion ']'
0.000741 +++ brew --prefix

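These particular lines tell me that the first brew --prefix command took about 25ms. With ts -i, each timestamp is the time elapsed since the previous line. So the 0.025518 on the line after brew --prefix is the time spent running it.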
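A real startup trace runs to hundreds of lines, so it helps to rank them. Because ts -i stamps each line with the time since the previous line, the delta on line N is really the cost of the command traced on line N-1. Below is a sketch of a helper that pairs them up and sorts by duration. The slowest_steps name and its defaults are my own and are not part of ts.

```shell
#!/usr/bin/env bash
# Rank traced commands by how long they took, from a file written by
# `ts -i` (default: /tmp/timestamps). With -i, the delta on each line is the
# time since the previous line, so it is attributed to the previous command.
slowest_steps() {
  local trace_file="${1:-/tmp/timestamps}"
  local count="${2:-10}"
  awk '
    # From line 2 on, print this delta next to the previous traced command
    NR > 1 { printf "%s\t%s\n", $1, prev }
    {
      # Remember this line minus its leading timestamp field
      prev = $0
      sub(/^[^ ]+ /, "", prev)
    }
  ' "$trace_file" | LC_ALL=C sort -rn | head -n "$count"
}

# Usage: slowest_steps /tmp/timestamps 5
```

The last traced command never gets a duration, because no line follows it. For startup profiling that is usually fine.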

With output like the above, I had enough info to track down a couple of slow-loading scripts that were being sourced, and remove them from my .bashrc/.bash_profile.

Migrating a GitHub Pages blog with a custom domain to HTTPS

At the time of writing, this blog is hosted on GitHub Pages, which has been working well since I set it up a few years back.

The only thing that has bugged me for a while now is that the whole site was served over HTTP, rather than HTTPS.

I wanted to move this blog to HTTPS, but with some constraints:

  • Continue using GitHub Pages (it’s free and easy, I don’t want to manage a server)
  • No certificate renewal (smart me plans for stupid me, who would surely forget to renew a cert)
  • Continue using my custom domain
  • No cost

GitHub Pages doesn’t support HTTPS for custom domain names, as it uses a certificate with a wildcard SAN of *

CloudFlare offers HTTPS on a free plan, which Troy Hunt has written about before.

It looks like this will meet my constraints above - I get to keep using GitHub Pages, I don’t have to manage a cert (CloudFlare takes care of this), and I can keep using my custom domain.

The steps I followed to do this were relatively simple:

  1. Exported a zone file from current nameservers
  2. Completed the CloudFlare onboarding, during which I imported the above zone file
  3. Updated the authoritative DNS servers for my domain to the * name servers: Update name servers
  4. Tested the site out, fixed a CSS link that was loaded over HTTP
  5. Forced HTTPS in CloudFlare: Enforcing HTTPS with CloudFlare
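Step 4 is the one most likely to bite. Once the page is served over HTTPS, browsers block or warn about assets still referenced over plain http:// (mixed content). A quick way to hunt for them is to grep the generated site before enforcing HTTPS. This sketch assumes a Jekyll build, whose default output directory is _site. The find_mixed_content name is my own.

```shell
#!/usr/bin/env bash
# Print file:line:match for every src/href attribute or CSS url() that still
# points at a plain http:// URL. grep exits non-zero when nothing is found.
# Assumption: a Jekyll build with its default output directory, _site.
find_mixed_content() {
  local site_dir="${1:-_site}"
  grep -rnoE \
    -e "(src|href)=[\"']http://[^\"']+" \
    -e "url\\([\"']?http://[^)\"']+" \
    --include='*.html' --include='*.css' \
    "$site_dir"
}

# Usage: jekyll build && find_mixed_content _site
```

This only catches URLs written into the generated files. Anything injected by JavaScript at runtime still needs a check in the browser's developer tools.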

… and that was it. I finished this in part of an afternoon.


There is one major shortcoming with this setup: there is no certificate validation between CloudFlare and GitHub. CloudFlare supports fetching from an origin without validating its certificate, which is the option I’ve chosen. ‘Strict’ HTTPS, which does validate the origin certificate, can be enabled for most use cases.

As mentioned before, the GitHub cert is valid for *, but this site is served from my custom domain, which that certificate doesn’t cover.

If we switched off the custom domain on GitHub, and did some smarts in CloudFlare to rewrite requests so that requests to GitHub used the GitHub Pages hostname instead, then we’d get validated HTTPS all the way to GitHub’s servers.

CloudFlare does support rewriting HTTP Host headers, but it’s an enterprise feature.

I could switch to using CloudFront with an AWS Certificate Manager cert, which would meet all the above constraints except for ‘no cost’ (admittedly, my tiny blog doesn’t get much traffic, so the cost would be minimal).

Given that most of the shenanigans with injecting content into web sites happen on the last leg of a connection (I’m looking at you, dodgy internet cafe), I’m happy that the new setup for this blog mitigates that problem, and I’m willing to accept the cost/security trade-off. While it’s possible for someone to perform a man-in-the-middle attack and impersonate GitHub, given my site has no sensitive information I’m not too worried about this threat model (Troy Hunt also wrote about this trade-off).