This is my architecture - how I built my blog

Aug 2, 2019

I mentioned in the last post that I would talk a little bit about how I built the engine that runs this blog, and there are plenty of cool bits under the hood that I want to talk about. I’ll start with the high-level architecture for today’s post and build from there. To start with, here is a visual of the main site’s architecture.

CloudShot architecture

There are a few moving parts involved here, so I’ll break it down a little. I’ll dive deeper on most of the topics and services I’m using here in future posts; this one is more of a high-level introduction to what I’ve done.

Admin website

The admin website is a static HTML/JavaScript application, so I don’t need any servers to run it, which is great. I store the files in an S3 bucket and put a CloudFront distribution in front of it. That gives me a full CDN to keep it available and close to my “users” (just me!), and it costs me very little to keep it there. The admin application itself uses a few AWS services. Firstly, Amazon Cognito is how I authenticate and make sure that only I can log in to the application. It manages my user credentials for me and lets me run a forgotten password flow, and eventually I’ll enable some more of Cognito’s functionality like MFA. To store content, the admin app talks to Amazon S3 for the main files like Markdown (which gets converted to HTML), images, etc. It also tracks some metadata in a DynamoDB table, which is fronted by AWS AppSync so I can interact with it as a GraphQL endpoint.

Generating the site

My admin site saves pages and posts as Markdown, which means I need to convert them to HTML to make a functional web page. For this I used Jekyll, which has all sorts of cool plugins and features that I wouldn’t need to write myself, so it was a no-brainer to use it. What I needed to give some thought to, though, was how I would incorporate it into my process. There were two options here: a Lambda function that had the Jekyll runtime included in a layer, or using CodeBuild to install Jekyll on the fly and then execute it. I ended up taking the CodeBuild option, for a couple of reasons:

  1. I would always get the latest version of Jekyll, which meant any bug fixes or improvements would instantly be in my pipeline (although that also means I could inherit new bugs, so it’s a double-edged sword to an extent)
  2. It was a lot easier to call the AWS CLI to do the “sync” command to S3 for the output HTML. This means that only the files that changed are written to the bucket, and any pages or files that need to be deleted get cleaned up. The sync command only exists in the CLI, not in the JavaScript SDK, which is what I would have written the Lambda function in.
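To make the second point a bit more concrete, here is a rough sketch of the kind of decision a sync has to make: which files to upload and which to delete. This is not how the AWS CLI is implemented (as far as I know, `aws s3 sync` compares size and modification time, and only deletes when you pass `--delete`). Comparing content hashes, and the `plan_sync` name itself, are assumptions made purely for illustration.

```python
import hashlib


def plan_sync(
    local_files: dict[str, bytes], bucket_digests: dict[str, str]
) -> tuple[list[str], list[str]]:
    """Return (keys to upload, keys to delete) so the bucket mirrors the build output.

    local_files maps a key to the file body produced by the build.
    bucket_digests maps a key already in the bucket to a SHA-256 hex digest
    (a hypothetical bookkeeping format, not something S3 gives you directly).
    """
    uploads = []
    for key, body in local_files.items():
        digest = hashlib.sha256(body).hexdigest()
        # Upload when the key is new or its content differs from what is stored.
        if bucket_digests.get(key) != digest:
            uploads.append(key)
    # Anything in the bucket that the build no longer produces is stale.
    deletes = [key for key in bucket_digests if key not in local_files]
    return sorted(uploads), sorted(deletes)
```

Writing and testing this logic by hand in a Lambda function is exactly the work that calling the CLI from CodeBuild avoids.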

To get this working smoothly, I’m using AWS CodePipeline to orchestrate things. The flow looks like this: when a file in the author bucket changes, a Lambda function runs that creates a zip file of the entire bucket. CodePipeline watches for this file as its trigger (as opposed to CodeCommit or something like that). CodePipeline passes the zip into CodeBuild, which then runs Jekyll and copies the output to the live site bucket.
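The zipping step can be sketched roughly like this. It only shows the in-memory packaging: in a real function the `objects` dictionary would be filled by reading every object from the author bucket, and the resulting bytes would be uploaded to the location CodePipeline watches. Both of those parts are left out here, and `zip_objects` is a hypothetical name rather than the function actually running behind this blog.

```python
import io
import zipfile


def zip_objects(objects: dict[str, bytes]) -> bytes:
    """Pack a mapping of object key -> body into a single zip archive in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Sorted keys keep the archive layout stable between runs.
        for key in sorted(objects):
            archive.writestr(key, objects[key])
    return buffer.getvalue()
```

Because the whole bucket is packed on every change, the cost of this step grows with the size of the site rather than with the size of the edit.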

The live site

Given that I’ve just generated a static HTML site from Jekyll (which includes an RSS feed), it really just needed another S3 bucket/CloudFront distribution to host it for the public. There is some fun stuff I’ve slipped in here around how I’m managing caching and invalidation when content changes, but again that’s a deep-dive topic for another post. The thing that ultimately mattered to me here was the time it took from me pressing “save” in my admin site to the new content being available on the internet for people like you to read. Right now it’s running at about 90 seconds or so. That’s not a number I’m unhappy with given the entirely serverless nature of this. Sure, I could have built something serverless that would take dynamic requests and simulate the way a CMS works, but then my serverless code would need to run every time it got a request to look at the site (or at minimum whenever the cache was stale). Under this model, I’m just running a static HTML site that happens to be super easy for me to edit and re-publish, which simplifies the main site dramatically and takes a lot of cost out of the operation too.

Where are the weak points in this approach?

Every architecture has trade-offs in it and my approach here is no exception. There are some points in here that I need to firm up, and some that are just a factor of my design that won’t change. For example, if I realise there is a mistake on the main website I’m always a minimum of 90 seconds away from removing it, whereas if I did something that more closely resembled a traditional CMS that could be down to nothing. The benefit of my approach though is that the whole thing costs less and has fewer moving parts to maintain. I also know that my approach of zipping the whole site up to push into CodePipeline won’t scale well - the bigger the site, the longer that process will take. With the code the way it is now, I also know that I’ll only get the first 1000 files from S3 into the zip - that’s not an issue for me anytime soon, but it’s something to consider. I’m giving some thought to having that Lambda function just call CodeBuild directly and then letting CodeBuild copy the site contents down to its local environment when it runs. That has some limitations too, which will start to show if a really large site (thousands of pages) comes into play, because Jekyll will take some time to generate it in full. I know Jekyll has some support for differential runs, but I haven’t looked into that yet to figure out what it would look like. Again, it’s a future Brian problem and it’s a long way off, since I’m not likely to be writing 500 posts that have 1 image on them any time soon, but it’s something I’ll be going back to re-visit once I get all of my core functionality delivered.
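On the 1000 file limit: assuming it comes from making a single S3 list call (each call returns at most 1000 keys), the usual fix is to keep asking for the next page until the response says it is no longer truncated. Here is a minimal sketch of that loop. The `list_page` callable is a hypothetical stand-in for the real SDK call, and it is assumed to return the same fields the S3 list API uses (`Contents`, `IsTruncated`, `NextContinuationToken`).

```python
from typing import Callable, Optional


def list_all_keys(list_page: Callable[[Optional[str]], dict]) -> list[str]:
    """Collect every key by following continuation tokens until the listing ends.

    list_page(token) is assumed to return one page shaped like an S3
    ListObjectsV2 response; token is None for the first request.
    """
    keys: list[str] = []
    token: Optional[str] = None
    while True:
        page = list_page(token)
        # An empty bucket (or empty page) has no "Contents" entry at all.
        keys.extend(item["Key"] for item in page.get("Contents", []))
        if not page.get("IsTruncated"):
            return keys
        token = page["NextContinuationToken"]
```

Taking the page fetcher as a parameter keeps the loop testable without touching AWS. It fixes the missing files, but not the underlying problem that the zip gets slower as the site grows.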

That’s my architecture at a glance. There are some other components that run alongside this which I haven’t covered here, but I’ll introduce them as I write more posts to talk through them in detail. One of the other features that I haven’t implemented yet is comments, so if you’ve got anything you want to share with me for the time being, Twitter is your best bet. I’m @BrianFarnhill there, so let me know what you think!

 
