Washington DC

I’m at the airport about to head back to Cinci after a couple of days in the US capital.

It was a good trip, though I almost missed my flight thanks to leaving a bag at security and only realising when boarding had already started. The 1 km run to security and back while wearing boots and wondering if I was going to make it got my heart rate up!

The moment I left the airport in DC, a motorcade stopped all the traffic, which I guess is a fairly regular occurrence here.

On Friday night I walked about 8 km from my hotel past the Marine Corps Memorial, around Arlington National Cemetery (you’ve probably seen this in movies: rows and rows of white tombstones), across into DC, to the Lincoln Memorial and Washington Monument, north to the White House, past the Treasury and to a local pub for a nightcap. It’s crazy how close all these attractions are. While I was outside the White House a guy next to me said to his mate, not too quietly, “I think I see Donald standing at the window in his underwear, crying”, at which point I burst out laughing.

The big green area between the Lincoln Memorial and the Washington Monument is called the National Mall, and it extends even further east to the US Capitol building. It’s surrounded by museums, monuments, memorials and other federal buildings.

On Saturday I caught the Metro and started at the Smithsonian National Air and Space Museum, which someone from work recommended. It was filled with really cool stuff. There was also a fighter jet simulator there which would roll upside down and everything. It was pretty great! It was a short run, but I think I’d have made myself sick if it had been much longer.

The National Gallery of Art was on the other side of the National Mall, so I went there next. There was lots of … art. I appreciated a few things but I can only handle so many portraits of stuffy old white dudes. The oldest painting I saw was from 1247, and there were loads of pretty old sculptures. There was a painting by da Vinci and more recent names I know like Andy Warhol.

From there it was a short walk to the United States Capitol, which is the most impressive building around. I was able to go inside the rotunda, but the full tours were all booked out.

The Library of Congress is reachable from the Capitol via a tunnel, so I went there next - another super impressive building. Jefferson was apparently a big reader and his collection of books was there. They weren’t all dated, but some were super old.

Another Metro ride home and a run closed out Saturday. I managed 9 km at just over 12 km/h (with a few stops for stretching and photos) despite the temperature sitting at around 30°C. Theodore Roosevelt Island was really nice.

New York City

I spent last weekend in New York City, doing the tourist thing. I went with very few plans and met up with another guy from work who has spent a year in NYC. Despite a delay of several hours and a change of airline & airport on the flight there, it was a successful trip!

Saturday played out as a busy day! In order:

  • Run in Central Park
  • Grand Central Station
  • The High Line - a park that’s been built on decommissioned overhead train tracks
  • Oculus - a $4 billion transit station
  • 9/11 Memorial - pretty sobering to think there were two massive buildings full of people where the monuments now stand
  • One World Observatory, 102 floors up
  • Stone Street - a street in the historic district with lots of old bars; it was absolutely packed, though, because of Cinco de Mayo
  • Times Square
  • Dinner at Ipanema
  • A Housewarming in Brooklyn, where we ended the day

Sunday was a quieter day. We spent most of our time in Dumbo (Down Under the Manhattan Bridge Overpass), an area in Brooklyn with pretty great views of the city. We went to Grimaldi’s pizza, which is one of the old, amazing pizza places in NYC. A walk across the Brooklyn Bridge closed out the weekend:

Miami, Florida Keys & Shark Valley in the Everglades

I spent last weekend in Miami! South Beach/South Pointe Pier was the first stop, and the weather played along nicely.

The nearby Art Deco Historic District had some classic Miami colours:

I spent Friday evening wandering around Wynwood. The urban art there is amazing, the below is just a tiny sample.

On Saturday I drove the Overseas Highway from my hotel to Key West.

The drive down was beautiful, though it’s a little bizarre being on some of the really long bridges.

I had to take the obligatory selfie at the Southernmost Point of the Continental US in Key West.

There was loads more to do in Key West, but I didn’t have loads of time given I was doing the drive back to Miami the same day - I definitely have to visit again!

On Sunday I went to Shark Valley in the Everglades where I hired a bike and rode a 24 km loop. There were so many alligators!

The tower at the halfway point of the loop provided a good view, and was my last stop for the day before heading to MIA and back to CVG.

Dallas, Trinity River/Bishop Arts/Downtown

Yesterday I went for a walk through parts of Dallas that I hadn’t explored yet, despite technically living here for about 9 months (I say technically, as a lot of that time was spent travelling for work). I made a rough map of my walk using On the Go Map.

Following are a few notable points on the trip.

Trinity River and the Margaret Hunt Hill Bridge:

Landscape shot of Trinity River
Helix-inspired arches of the Margaret Hunt Hill Bridge

Lockhart Smokehouse - a reasonably famous BBQ place, where the ‘Texan Vegetarian’ section consists of chicken/turkey, instead of beef 🤣:

Lockhart smokehouse menu

The delicious taster flight at Bishop Cider Co:

Cider menu with 4 ciders

A 30-ft eyeball in downtown Dallas, which is quite hard to miss:

Large plastic eyeball in a park

And finally, this sign, which I feel captures so much of what it is like walking in a US city other than New York:

Sign in the middle of a huge puddle saying 'pavement ends' despite there being no discernable pavement

I never did spot the pavement in that area, but at least I know not to expect one to appear now that it’s ‘ended’.

The lack of people walking here continues to blow my mind. I walked an hour from the Bishop Arts District to downtown, and didn’t walk past a single person. A group of young guys went past in a car and one of them yelled “ain’t y’all heard of Uber?”, further affirming my belief that everyone thought I was crazy for walking. I guess it’s not that surprising given a couple of times the pavement I was walking on came to a major road and just stopped, leaving me to backtrack and find somewhere else to go.

Vetted - tweaking importer performance

Since the last post, I’ve started working on an importer to load data from the existing Access database. Work to date is on GitHub.

In the current domain model, there is a single aggregate root, the Client. The importer is written as a command line application which interacts directly with the domain, assuming an empty database (I might get to incremental imports in the future). At a high level, the importer currently:

  • Creates the clients
  • Adds any existing ‘notes’ about the client
    • Notes are freeform text about a client, unrelated to any particular patient or transaction
    • The existing application is a little limited in what can be entered into the main form, so notes have been used to make up the slack (e.g. in the existing data, there are numerous clients which have an email address or fax number in the notes field, as there is no first class input for these values)
  • Adds home and mobile phone numbers
  • Adds the ‘most common travel distance’ as a note

These steps are visible in the implementation of the importer:

override fun run(vararg args: String?) {
    val rows = accessDb.clientTableRows

    val newClientIds: Map<String, UUID> = createClients(
        rows,
        accessDb::postCodeFor,
        accessDb::stateFor,
        commandGateway
    )

    allOf(
        addClientNotes(rows, newClientIds, commandGateway),
        addPhoneNumbers(rows, newClientIds, commandGateway),
        addMostCommonDistance(rows, newClientIds, commandGateway)
    ).get()
}

First, clients are created, producing a Map of the old client ID to the new client ID. Once all clients have been created, all the other updates are applied (potentially concurrently).
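The two-phase shape described above (build the ID map first, then fan out the follow-up updates) can be sketched using only the JDK. This is an illustration and not the importer's actual code: LegacyRow, the string "events" and the helper bodies are hypothetical stand-ins for the Access rows and the Axon commands, and it assumes the allOf in the importer behaves like CompletableFuture.allOf.

```kotlin
import java.util.UUID
import java.util.concurrent.CompletableFuture
import java.util.concurrent.ConcurrentLinkedQueue

// Hypothetical stand-in for a row of the legacy client table.
data class LegacyRow(val oldId: String, val note: String?, val homePhone: String?)

// Phase 1: create every client and remember old ID -> new ID.
fun createClients(rows: List<LegacyRow>): Map<String, UUID> =
    rows.associate { it.oldId to UUID.randomUUID() }

// Phase 2 helpers: each returns a future so the updates can overlap.
// The "events" are plain strings collected in a thread-safe queue.
fun addNotes(
    rows: List<LegacyRow>,
    ids: Map<String, UUID>,
    events: ConcurrentLinkedQueue<String>
): CompletableFuture<Void> = CompletableFuture.runAsync {
    rows.filter { it.note != null }
        .forEach { events.add("note:${ids.getValue(it.oldId)}") }
}

fun addPhones(
    rows: List<LegacyRow>,
    ids: Map<String, UUID>,
    events: ConcurrentLinkedQueue<String>
): CompletableFuture<Void> = CompletableFuture.runAsync {
    rows.filter { it.homePhone != null }
        .forEach { events.add("phone:${ids.getValue(it.oldId)}") }
}

fun importAll(rows: List<LegacyRow>): List<String> {
    val ids = createClients(rows)          // must finish before phase 2 starts
    val events = ConcurrentLinkedQueue<String>()
    CompletableFuture.allOf(               // wait for every phase 2 update
        addNotes(rows, ids, events),
        addPhones(rows, ids, events)
    ).get()
    return events.toList()
}
```

The key design point is that nothing in the second phase can run until the full ID map exists, while the second-phase updates have no ordering requirement between each other.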

Without giving away too much information about the existing data, the number of existing clients is on the order of 10^3 (thousands), and the total number of events generated with the current importer implementation is at most 5x the number of clients (one ClientMigratedEvent, up to two ClientNoteAddedEvents and up to two ClientPhoneNumberAddedEvents).

My first pass at the importer was taking around 80 seconds to import everything into a PostgreSQL database. I know that premature optimization is the root of all evil, and that I don’t have anything resembling a working product at the moment, but this seemed far too high. Also, it was impacting my ability to iterate quickly with ‘production’ data, which is enough of a reason to look for improvements.

After looking at the generated schema and doing some sampling with VisualVM, I decided there were three options to investigate:

  • Asynchronous processing of commands
  • Serialisation format changes
  • Generated schema changes

In order to compare a full run of the importer pre and post optimisations, I want to be able to toggle the optimisations on/off from the command line. The following script has the toggle properties in place, and in the sections below I will use Spring config management to read these properties.

# Generate a throwaway password for this run
PASSWORD=$(uuidgen)

# Remove any container left over from a previous run
docker stop vetted-postgres ; docker rm vetted-postgres

docker run \
    --publish 5432:5432/tcp \
    --name vetted-postgres \
    --env POSTGRES_PASSWORD="$PASSWORD" \
    --detach \
    postgres

./gradlew build

# Toggle properties for the optimisations are passed on the command line
java \
    -jar importer/build/libs/vetted-importer-0.0.1-SNAPSHOT.jar \
    --axon.use-async-command-bus=false \
    --axon.use-cbor-serializer=false \
    --spring.jpa.database-platform=org.hibernate.dialect.PostgreSQL95Dialect \
    --spring.datasource.password="$PASSWORD"

The script above will allow me to evaluate the impact of any changes I make in a repeatable fashion.

Option 1 - Asynchronous processing of commands

I’m using the Axon framework, which handles a lot of the plumbing of building an application based on DDD & CQRS principles. By default when using the Spring auto-configuration, a SimpleCommandBus is used which processes commands on the calling thread.

I added some configuration to use an AsynchronousCommandBus with a configurable number of threads:

@Bean
@ConditionalOnProperty(
    value = ["axon.use-async-command-bus"],
    matchIfMissing = true
)
fun bus(
    transactionManager: TransactionManager,
    @Value("\${axon.command-bus.executor.pool-size}") poolSize: Int
): CommandBus {
    val bus = AsynchronousCommandBus(
        Executors.newFixedThreadPool(poolSize)
    )

    val tmi = TransactionManagingInterceptor(transactionManager)
    bus.registerHandlerInterceptor(tmi)

    return bus
}

I initially tried this configuration out with a pool size of 10. This reduced the time for the import to around 30 seconds, which is an improvement on 80 seconds (roughly 2.7x) but well short of the order of magnitude improvement that 10 threads should make possible. This led me to believe that there was either contention somewhere else, or that some of the constant factors are just too high at the moment.
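As a rough sanity check on the contention theory, Amdahl's law can be turned around to estimate how much of the run behaves as if it were serial. This is a back-of-the-envelope sketch and not a measurement: it assumes the work splits cleanly into a perfectly parallel part and a strictly serial part, which real contention (database locks, connection limits) rarely does. The function name is made up for illustration.

```kotlin
// Amdahl's law: speedup = 1 / (s + (1 - s) / n), where s is the serial
// fraction and n the number of threads. Solving for s gives:
//   s = (n / speedup - 1) / (n - 1)
fun impliedSerialFraction(speedup: Double, threads: Int): Double {
    require(threads > 1) { "need more than one thread" }
    require(speedup > 0.0) { "speedup must be positive" }
    val n = threads.toDouble()
    return (n / speedup - 1.0) / (n - 1.0)
}

// Figures from the text: 80 s down to 30 s with a pool of 10 threads.
val observedSpeedup = 80.0 / 30.0
val serialFraction = impliedSerialFraction(observedSpeedup, 10)  // about 0.31
```

Under those assumptions, roughly 30% of the original run time is not benefiting from the extra threads, which fits the suspicion that something other than command handling is the bottleneck.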

Option 2 - Serialisation format changes

By default, Axon will use XStream to serialise events, which uses an XML representation. XML is quite verbose, and the Axon documentation even suggests using a different serializer.

Overriding the serializer is thankfully quite easy:

@Primary
@Bean
@ConditionalOnProperty(
    value = ["axon.use-cbor-serializer"],
    matchIfMissing = true
)
fun serializer(): Serializer {
    val objectMapper = ObjectMapper(CBORFactory())
    objectMapper.findAndRegisterModules()
    objectMapper.setSerializationInclusion(NON_ABSENT)
    return JacksonSerializer(objectMapper)
}

I opted for using Jackson with a ‘Concise Binary Object Representation’ (CBOR) JsonFactory. This resulted in a ~70% reduction in size for the serialized payload for most events. With XML:

postgres=# select avg(length(loread(lo_open(payload::int, x'40000'::int), x'40000'::int))) from domain_event_entry;
     avg
--------------
 433.69003053

and with CBOR:

     avg
--------------
 111.54379774

Going by those averages, the payload shrank by about 74% (from roughly 434 to 112 bytes). This didn’t have a huge impact on the run time of the importer, but is still a worthwhile optimisation.

Option 3 - Generated schema changes

You may have noticed in the SQL statements above that the current schema is using the PostgreSQL large objects functionality. From the PostgreSQL docs:

PostgreSQL has a large object facility, which provides stream-style access to user data that is stored in a special large-object structure. Streaming access is useful when working with data values that are too large to manipulate conveniently as a whole.

If we inspect the schema that’s being generated:

postgres=# \d domain_event_entry
     Table "public.domain_event_entry"
      Column      | Type | Nullable | Default
------------------+------+----------+---------
 meta_data        | oid  |          |
 payload          | oid  | not null |
 ...

The oid type here is an object identifier - a reference to a large object which is stored externally from the table. The events we’re writing are small enough that the overhead of reading them as separate streams is hurting performance rather than helping.

At least two people have had the same issue when using Axon with PostgreSQL, as evidenced by the questions on Google Groups and Stack Overflow. The suggestion to customise the PostgreSQL dialect used by Hibernate seems to work, and further reduced the runtime to around 8 seconds.
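For reference, a customised dialect of the kind usually suggested for this problem looks roughly like the following. Treat it as a sketch under assumptions and not as the code in the repository: the class name is hypothetical, and it assumes Hibernate 5.x (where PostgreSQL95Dialect and the SqlTypeDescriptor types exist). The idea is to map BLOB columns to bytea so payloads are stored inline in the table instead of as large objects.

```kotlin
import java.sql.Types
import org.hibernate.dialect.PostgreSQL95Dialect
import org.hibernate.type.descriptor.sql.BinaryTypeDescriptor
import org.hibernate.type.descriptor.sql.SqlTypeDescriptor

// Hypothetical class name; assumes Hibernate 5.x on the classpath.
class ByteaPostgreSQLDialect : PostgreSQL95Dialect() {
    init {
        // Generate BLOB-mapped columns as bytea instead of oid
        registerColumnType(Types.BLOB, "bytea")
    }

    // Bind and read BLOB values as plain binary rather than large-object handles
    override fun remapSqlTypeDescriptor(sqlTypeDescriptor: SqlTypeDescriptor): SqlTypeDescriptor =
        if (sqlTypeDescriptor.sqlType == Types.BLOB) BinaryTypeDescriptor.INSTANCE
        else super.remapSqlTypeDescriptor(sqlTypeDescriptor)
}
```

A dialect like this would be selected by setting spring.jpa.database-platform to its fully qualified class name; passing the stock org.hibernate.dialect.PostgreSQL95Dialect switches back to the large-object behaviour.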

Conclusion

Based on my very rough benchmarking, the three changes above have reduced the run time of the importer from around 80 seconds to 8 seconds. The code is all at the link above, and the optimisations are on by default.

There is surely more that can be done to improve performance, but that’s fast enough for now!