Like so many others, most open source developers run their own public blog. But there is a huge difference between open source developers and other people: while most “normal” people have to decide what kind of blogging software to use, developers tend to decide what kind of blogging software to implement. The reason? Blogging software is actually extremely simple to implement if all you want is a single, simple workflow. This makes a custom blog engine far more appealing. And doesn’t it seem like a small and fun project?
Let’s build on what we’ve learned in the recent chapters about modular applications and Rack to build our own little blog engine.
As programmers, we can get quite passionate about what text editor or IDE to use for writing code. Why should we be less opinionated when it comes to writing blog posts? In-browser editors have become quite impressive as of late. But embedding one in a blog engine just to get syntax highlighting? Let’s instead sidestep authorization, and the security issues that come with implementing and configuring an in-browser editor, by simply not having one at all.
What if we could use our favorite editor to write blog posts and use Git to keep track of versioning? This would, among other things, give us all the features Git comes with: tracking changes made to the posts, creating different branches for working on not-yet-released posts, even easing collaboration for blogs written by more than one author.
For simplicity, let’s store articles in the same repository we store our blog engine in. It should be fairly easy for you to separate it into two repositories, in case you favor that approach. To keep things organized, we’ll store those articles in the articles folder and the Ruby code in lib.
We’ll use the popular Markdown markup language for formatting the blog posts. It will also allow you to embed arbitrary HTML in your articles, which comes in handy if you want to embed media like a YouTube movie.
Since we will rely on Sinatra for translating our Markdown articles into HTML, you can easily choose any other template language supported by Sinatra, like Textile. You can learn more about Markdown at http://daringfireball.net/projects/markdown/.
Like most other text-based blog engines, let’s store each post’s metadata in YAML format at the beginning of the article, separated from the body by a blank line. We should probably stick to the minimum of data for this exercise, but what we do need is the date the article was published and the title. A simple blog post is shown in Example 5-1.
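To make the format concrete, here is a minimal sketch of how such a file can be split into metadata and body. The sample post, its title, and its date are made up for illustration, and the `parse_post` helper is hypothetical. It uses `YAML.safe_load` with `permitted_classes` so that an unquoted date also loads on newer Ruby versions, where loading dates has to be allowed explicitly.

```ruby
require 'yaml'
require 'date'
require 'time'

# A made-up post: YAML metadata, a blank line, then the Markdown body.
SAMPLE_POST = <<~POST
  title: Hello World
  date: 2011-09-18

  This is my *first* post.

  It has two paragraphs.
POST

# Hypothetical helper: split a post into its metadata and its body.
def parse_post(source)
  # Split on the first blank line only, so blank lines in the body survive.
  meta, content = source.split("\n\n", 2)

  # An unquoted YAML date becomes a Date object, which has to be permitted.
  metadata = YAML.safe_load(meta, permitted_classes: [Date])

  {
    title: metadata['title'],
    date: Time.parse(metadata['date'].to_s),
    content: content
  }
end

post = parse_post(SAMPLE_POST)
puts post[:title]                      # prints "Hello World"
puts post[:date].strftime('%Y/%m/%d')  # prints "2011/09/18"
```

Limiting the split to two parts is what keeps the rest of the article intact, however many paragraphs it has.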
If you feel like experimenting even further, once you have your blog running, try removing all metadata from the files and parsing the title from the generated HTML and the date from the Git log.
We still want the workflow to be as simple as possible. In an ideal scenario we could simply git push to our blog in order to update it, similar to how you would deploy on Heroku. And since it would be cool to avoid vendor lock-in, let’s try to implement that feature on our own, without having to host our application on Heroku.
If you are new to Git, you can learn more about it at http://git-scm.com/. It is essentially a distributed version control system.
For this tutorial it is important to know that Git commits are explicitly pushed to a remote repository and are identified by a commit hash.
Ideally, we want to be able to push our updates to GitHub and that’s it. That would also allow us to use the embedded web editor GitHub offers, shown in Figure 5-1, in case we run into a situation where we don’t have Git installed locally or don’t want to clone the repository just for editing a blog post. We will look into how to implement this without writing any code that is GitHub specific, again to avoid a vendor lock-in.
Apart from updating or adding an article, a blog is actually a rather static website. While it seems a bit boring at first, we can easily use that to our advantage. An important goal for most websites out there is to serve as many requests per second as possible in the most efficient way. While it is probably true that developer time is more expensive than server time, it seems rather unreasonable to serve a simple blog from more than one machine. Even serving it from more than one process on that machine seems unreasonable in most cases.
Since articles will not change unless we push an update upstream, there is no reason to read them from disk more often than that. The general idea is to spend as little time as possible in the Ruby process when a request comes in. We will look into how to reduce that time even further and how to keep requests from reaching the Ruby process in the first place by setting the proper HTTP headers.
Reloading the application on changes and actually displaying the articles are two separate concerns not tightly coupled to each other. As software developers we have learned to reflect such separations in the code to keep it clean and flexible. So, why not create two separate Sinatra applications for those tasks? It seems reasonable to serve the posts from a Rack endpoint. Now we could set up the update logic as another endpoint, but for such a simple app it’s easier to create a middleware for it.
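The difference between an endpoint and a middleware can be sketched without any framework. The following is a bare-bones illustration using only the Rack calling convention; the names `BLOG_ENDPOINT`, `UpdateHook`, and `STACK` are made up for this sketch and are not part of the blog engine.

```ruby
# The endpoint: any object that responds to call(env) and returns
# [status, headers, body] is a valid Rack application.
BLOG_ENDPOINT = lambda do |_env|
  [200, { 'content-type' => 'text/html' }, ['<h1>Blog</h1>']]
end

# The middleware wraps another app. It answers POST /update itself and
# hands every other request on to the wrapped app untouched.
class UpdateHook
  def initialize(app)
    @app = app
  end

  def call(env)
    if env['REQUEST_METHOD'] == 'POST' && env['PATH_INFO'] == '/update'
      [200, { 'content-type' => 'text/plain' }, ['ok']]
    else
      @app.call(env)
    end
  end
end

STACK = UpdateHook.new(BLOG_ENDPOINT)

status, _headers, body = STACK.call({ 'REQUEST_METHOD' => 'GET', 'PATH_INFO' => '/' })
puts "#{status} #{body.first}"  # prints "200 <h1>Blog</h1>"
```

Because the middleware holds a reference to the app it wraps, it is in a good position to reload that app, which is exactly what the update logic needs.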
First we need to create a modular application for serving the articles, as shown in Example 5-2. Since we don’t want to store views and static assets like stylesheets and images inside lib, we have to make sure that we set the root property properly.
    require 'sinatra/base'

    class Blog < Sinatra::Base
      # File.expand_path generates an absolute path.
      # It also takes a path as second argument. The
      # generated path is treated as being relative
      # to that path.
      set :root, File.expand_path('../../', __FILE__)
    end
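To see what that `File.expand_path` call resolves to, here is a small sketch with a made-up file location. The path `/srv/blog/lib/blog.rb` and the `blog_root` helper are assumptions for illustration only.

```ruby
# Hypothetical helper mirroring the expression used for the root setting.
def blog_root(app_file)
  # The second argument is the starting point. Each '..' strips one
  # trailing segment: first the file name, then the lib directory.
  File.expand_path('../../', app_file)
end

root = blog_root('/srv/blog/lib/blog.rb')
puts root                           # prints "/srv/blog"
puts File.join(root, 'views')       # prints "/srv/blog/views"
puts File.join(root, 'articles')    # prints "/srv/blog/articles"
```

With the root pointing at the project directory, views, static assets, and articles can all live next to lib rather than inside it.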
Sinatra supports quite a large number of rendering engines. We already used the erb method for rendering ERB templates. Sinatra offers a similar method called markdown for (surprise, surprise) Markdown templates. As we’ve seen before, we can pass symbols to those methods to render files from the views directory. But you don’t always have the source stored in a view file. That’s why you can pass a string to those methods instead, and Sinatra will treat that string as the template source code. See Example 5-3.
Since these rendering methods simply return strings, you can easily embed the result in another template, as seen in Example 5-4.
    <h1>Markdown in ERB</h1>
    <%= markdown("This is *Markdown*!") %>
While ERB ships with Ruby, there is no Markdown implementation in the standard library. Sinatra will automatically pick up any implementation you have installed on your system. However, if you have none, you need to install one: gem install rdiscount.
Since we do not care about Git updates in the blog logic, let’s load all the articles when loading the application. See Example 5-5.
    require 'sinatra/base'
    require 'ostruct'
    require 'time'
    require 'yaml'

    class Blog < Sinatra::Base
      set :root, File.expand_path('../../', __FILE__)

      # loop through all the article files
      Dir.glob "#{root}/articles/*.md" do |file|
        # parse meta data and content from file;
        # the first blank line separates the two
        meta, content = File.read(file).split("\n\n", 2)

        # generate a metadata object
        article = OpenStruct.new YAML.load(meta)

        # convert the date to a time object
        article.date = Time.parse article.date.to_s

        # add the content
        article.content = content

        # generate a slug for the url
        article.slug = File.basename(file, '.md')

        # set up the route
        get "/#{article.slug}" do
          erb :post, :locals => { :article => article }
        end
      end
    end
We are using the ostruct library that comes with Ruby. It is a small wrapper around hashes, exposing setters and getters for all the hash entries.
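A quick sketch of what that wrapper gives us. The metadata hash below is made up, and `article_from` is a hypothetical helper used only in this illustration.

```ruby
require 'ostruct'

# Hypothetical helper: wrap a metadata hash in an OpenStruct.
def article_from(metadata)
  OpenStruct.new(metadata)
end

# Parsed YAML metadata is a plain hash with string keys, for example:
article = article_from('title' => 'Hello World', 'date' => '2011-09-18')

puts article.title           # prints "Hello World"

# New attributes can be added at any time, as the blog does for
# content and slug.
article.slug = 'hello-world'
puts article.slug            # prints "hello-world"

# Attributes that were never set simply return nil.
puts article.author.inspect  # prints "nil"
```

This is why the loading code can attach date, content, and slug to the article without declaring those attributes anywhere.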
We still need a view for rendering these articles, shown in Example 5-6. We’ll use HTML5 tags, since we like to use all the fancy new technology out there to keep up with recent development.
    <article>
      <header>
        <h1>
          <a href="<%= url(article.slug) %>"><%= article.title %></a>
        </h1>
        <time class="timeago" datetime="<%= article.date.xmlschema %>">
          <%= article.date.strftime "%Y/%m/%d" %>
        </time>
      </header>
      <section class="content">
        <%= markdown article.content %>
      </section>
    </article>
We also want a home page displaying all the articles. Since we keep the articles in-process, there is no reason not to do the same for the list of articles; Example 5-7 shows the Sinatra wiring necessary for this, and Example 5-8 provides an ERB template for rendering.
    require 'sinatra/base'
    require 'ostruct'
    require 'time'
    require 'yaml'

    class Blog < Sinatra::Base
      set :root, File.expand_path('../../', __FILE__)
      set :articles, []

      Dir.glob "#{root}/articles/*.md" do |file|
        # the first blank line separates metadata from content
        meta, content = File.read(file).split("\n\n", 2)

        article = OpenStruct.new YAML.load(meta)
        article.date = Time.parse article.date.to_s
        article.content = content
        article.slug = File.basename(file, '.md')

        get "/#{article.slug}" do
          erb :post, :locals => { :article => article }
        end

        # Add article to list of articles
        articles << article
      end

      # Sort articles by date, display new articles first
      articles.sort_by! { |article| article.date }
      articles.reverse!

      get '/' do
        erb :index
      end
    end
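The sorting and the list rendering can be tried in isolation with the ERB library from Ruby's standard library. This is only a sketch: the `Post` struct, the sample posts, and the template are invented here, and inside Sinatra the index view would read the list from the articles setting and build links with the url helper instead.

```ruby
require 'erb'
require 'time'

Post = Struct.new(:title, :date, :slug)

# Newest first: the same result as sort_by! followed by reverse!.
def newest_first(posts)
  posts.sort_by(&:date).reverse
end

# A made-up index template listing one link per post.
INDEX_TEMPLATE = <<~HTML
  <ul>
  <% posts.each do |post| %>
    <li><a href="/<%= post.slug %>"><%= post.title %></a></li>
  <% end %>
  </ul>
HTML

def render_index(posts)
  ERB.new(INDEX_TEMPLATE).result_with_hash(posts: newest_first(posts))
end

posts = [
  Post.new('First post',  Time.parse('2011-09-18'), 'first-post'),
  Post.new('Second post', Time.parse('2011-10-02'), 'second-post')
]

puts render_index(posts)
```

The more recent post ends up at the top of the list, regardless of the order in which the files were read from disk.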
The pages we’re generating at the moment are more or less incomplete HTML documents. A simple layout file fixes this. See Example 5-9.
    <!DOCTYPE html>
    <html>
      <head>
        <title>My Blog</title>
        <link rel="stylesheet" media="screen" href="/css/blog.css" />
        <script type="text/javascript" src="/js/jquery.min.js"></script>
        <script type="text/javascript" src="/js/jquery.timeago.js"></script>
        <script type="text/javascript" src="/js/blog.js"></script>
      </head>
      <body>
        <%= yield %>
      </body>
    </html>
As you can see in Example 5-10, we added the timeago jQuery plug-in to automatically format our date strings. You can learn more about that plug-in at http://timeago.yarp.com/.
And to make a nicer first impression, let’s add some CSS right away. This will also give you a good starting point for adding a better layout later on. See Example 5-11 for the CSS and Figure 5-2 for a first look at the blog.
    body {
      font-family: "Helvetica Neue", Arial, Helvetica, sans-serif;
    }

    article {
      min-width: 300px;
      max-width: 700px;
      margin: 50px auto;
      padding: 0 50px;
    }

    header h1 {
      margin: 0;
    }

    header a {
      color: #000;
      text-decoration: none;
      text-shadow: 1px 1px 2px #555;
    }

    header a:hover {
      text-decoration: underline;
    }

    header time {
      font-size: 80%;
      color: #555;
    }
As mentioned before, the goal is to automatically update the blog whenever pushing to the blog repository. Most hosting sites, like GitHub or Bitbucket, offer service hooks: they will trigger a request to a custom URL whenever someone pushes new commits to the repository. Even if you host the repository on your own server, you can easily set up a so-called post-receive hook there. But let’s first look into the implementation before we go into setting everything up.
To regenerate the content, all we have to do is reload our application. We could do that by restarting the process. However, that might be complicated to implement and cause our website to be down for a moment.
Another idea would be to simply load lib/blog.rb again. However, that would append the routes to the list of already defined routes rather than overriding existing routes. That approach works for adding new posts, but would prohibit editing existing posts. Moreover, it would leak memory, since old routes would never be removed.
We need to remove all the routes before loading the file again. But it doesn’t stop there: we also need to get rid of all the filters, middleware, error handlers, and so on. We are not using all those features at the moment, but we don’t want to break our app later on if we add a middleware or error handler. Luckily Sinatra has a mechanism for doing exactly that: the reset! method.
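The append-versus-reset behavior is easy to reproduce with a toy route table. The `ToyApp` class below is not Sinatra's implementation, only a simplified stand-in that, like Sinatra, lets the first matching route win; `define_routes` plays the role of loading the application file.

```ruby
# A toy stand-in for a class-level route table (NOT Sinatra's code).
class ToyApp
  @routes = []

  class << self
    attr_reader :routes

    # Defining a route appends it to the table.
    def get(path, &block)
      @routes << [path, block]
    end

    # The first matching route wins, so a stale definition shadows
    # any newer one that was appended after it.
    def dispatch(path)
      route = @routes.find { |candidate, _block| candidate == path }
      route && route.last.call
    end

    # Throw away everything that was defined so far.
    def reset!
      @routes = []
    end
  end
end

# Stands in for loading the application file with the current content.
def define_routes(content)
  ToyApp.get('/hello') { content }
end

define_routes('first version')
define_routes('edited version')   # a "reload" without reset!
puts ToyApp.routes.size           # prints 2
puts ToyApp.dispatch('/hello')    # prints "first version"

ToyApp.reset!
define_routes('edited version')   # a "reload" after reset!
puts ToyApp.routes.size           # prints 1
puts ToyApp.dispatch('/hello')    # prints "edited version"
```

Without the reset, the table grows with every reload and the edited post never becomes visible; with it, both problems disappear.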
Let’s assume that in the middleware we’re creating, the wrapped endpoint (stored in app) is the Sinatra class we want to wrap. In that case we have both reset! and the file that we want to reload available. The file is stored in the app_file setting. Sinatra takes care of setting it to the correct value. Example 5-12 demonstrates how to do this.
    require 'sinatra/base'
    require 'time'

    class GithubHook < Sinatra::Base
      post '/update' do
        # drop all routes, filters, and so on, then load them again
        app.settings.reset!
        load app.settings.app_file

        content_type :txt
        "ok"
      end
    end
The above middleware will reload our application whenever a POST request is made to /update. We can use that when setting up a hook later on.
When running on the server, we also want to automatically trigger a git pull to fetch the commits we just pushed from our local development machine to our source code repository and deploy them on our production server. However, we probably don’t want to trigger a pull while in development. That way we can easily trigger a reload while working on a post without causing trouble with Git trying to pull in changes, as seen in Example 5-13.
Let’s introduce a setting called :autopull that specifies whether or not to trigger a pull on a reload and make that setting dependent on the current environment.
    require 'sinatra/base'
    require 'time'

    class GithubHook < Sinatra::Base
      # Only pull automatically when running in production.
      set(:autopull) { production? }

      post '/update' do
        content_type :txt

        # Pull before reloading, so the reload picks up the new commits.
        # Pipe stderr to stdout to make sure we display everything.
        output = settings.autopull? ? `git pull 2>&1` : "ok"

        app.settings.reset!
        load app.settings.app_file

        output
      end
    end
We want our page to render as quickly as possible and at the same time keep the load on our server as low as we can. Fortunately HTTP comes with a handful of headers to aid us here. We covered the basics of HTTP caching in Chapter 2; let’s see how best to utilize them.
First of all, we want to avoid outdated caches at any cost. We also want to allow public caching, since our blog is public. We’ll therefore call cache_control :public, :must_revalidate. To allow revalidation, we need to set at least either an ETag or a Last-Modified header. Let’s do both.
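To illustrate what revalidation means, here is a deliberately simplified sketch of the decision a server makes when a cache asks whether its copy is still fresh. The `revalidate` function is hypothetical and ignores details that real implementations handle (weak validators, lists of ETags, and so on); in the blog itself, Sinatra's helpers take care of this.

```ruby
require 'time'

# Return the status code for a request, given the validators of the
# current content: 304 if the cached copy is still good, 200 otherwise.
def revalidate(request_headers, etag:, last_modified:)
  # An ETag check takes precedence when the client sends one.
  if (candidate = request_headers['If-None-Match'])
    return candidate == %("#{etag}") ? 304 : 200
  end

  # Otherwise fall back to comparing modification times.
  if (since = request_headers['If-Modified-Since'])
    return last_modified <= Time.httpdate(since) ? 304 : 200
  end

  # No validators sent: deliver the full response.
  200
end

published = Time.utc(2011, 9, 18, 12, 0, 0)

puts revalidate({ 'If-None-Match' => '"1a2b3c4"' },
                etag: '1a2b3c4', last_modified: published)  # prints 304
puts revalidate({ 'If-None-Match' => '"0ld0ne5"' },
                etag: '1a2b3c4', last_modified: published)  # prints 200
```

A 304 response carries no body, so a revalidated request costs the Ruby process next to nothing.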
Since our blog is Git-based, we can simply ask Git when the content was last modified, and we can use the commit hash as the ETag. And since we know when new commits are coming in, we only have to ask Git for the information whenever the update hook is triggered. Example 5-14 demonstrates how to probe Git for update information.
    require 'sinatra/base'
    require 'time'

    class GithubHook < Sinatra::Base
      def self.parse_git
        # Parse hash and date from the git log command.
        sha1, date = `git log HEAD~1..HEAD --pretty=format:%h^%ci`.strip.split('^')
        set :commit_hash, sha1
        set :commit_date, Time.parse(date)
      end

      set(:autopull) { production? }
      parse_git

      before do
        cache_control :public, :must_revalidate
        etag settings.commit_hash
        last_modified settings.commit_date
      end

      post '/update' do
        content_type :txt

        # Pull first, so that both the Git information and the
        # reloaded application reflect the new commits.
        output = settings.autopull? ? `git pull 2>&1` : "ok"

        settings.parse_git
        app.settings.reset!
        load app.settings.app_file

        output
      end
    end
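The parsing step can be tried without a repository at hand. In this sketch, the sample line imitates what the `%h^%ci` format produces (abbreviated hash, a caret, and the committer date); the hash and date are invented, and `parse_git_line` is a hypothetical helper.

```ruby
require 'time'

# Split one "%h^%ci" line into the commit hash and a Time object.
def parse_git_line(line)
  sha1, date = line.strip.split('^')
  [sha1, Time.parse(date)]
end

# Made-up output in the shape git log prints for this format.
sample = "1a2b3c4^2011-09-18 14:30:00 +0200\n"

sha1, date = parse_git_line(sample)
puts sha1               # prints "1a2b3c4"
puts date.utc.httpdate  # prints "Sun, 18 Sep 2011 12:30:00 GMT"
```

One thing to keep in mind: the range HEAD~1..HEAD needs at least two commits in the repository. Running git log -1 with the same format is an alternative that also works right after the initial commit.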
What we still need to do is actually set up the GithubHook middleware in our Blog application. As with all middleware, we do that with the use method in Example 5-15.
    require 'sinatra/base'
    require 'github_hook'
    require 'ostruct'
    require 'time'
    require 'yaml'

    class Blog < Sinatra::Base
      use GithubHook

      set :root, File.expand_path('../../', __FILE__)
      set :articles, []
      set :app_file, __FILE__

      Dir.glob "#{root}/articles/*.md" do |file|
        # the first blank line separates metadata from content
        meta, content = File.read(file).split("\n\n", 2)

        article = OpenStruct.new YAML.load(meta)
        article.date = Time.parse article.date.to_s
        article.content = content
        article.slug = File.basename(file, '.md')

        get "/#{article.slug}" do
          erb :post, :locals => { :article => article }
        end

        articles << article
      end

      articles.sort_by! { |article| article.date }
      articles.reverse!

      get '/' do
        erb :index
      end
    end
For deployment we’ll write a config.ru, as described earlier. To show a use case of the caching headers, let’s add the Rack::Cache library (as shown in Example 5-16), which implements an HTTP cache as a Rack middleware. This is, of course, completely optional.
    $LOAD_PATH.unshift 'lib'

    # this is optional
    require 'rack/cache'
    use Rack::Cache

    require 'blog'
    run Blog
As discussed in Chapter 3, you can now start the server with the rackup command.
When deploying on your production system, make sure you set the environment to production.
Optionally, you can start rackup as a daemon, so it will run in the background: rackup -E production -D -s thin.
Now that we’re done with implementing our blog, let’s publish it on GitHub. Log in with your GitHub account. On the Dashboard, click on New Repository and follow the instructions.
Once you have your repository set up, go to its Admin section, navigate to Service Hooks, and add a Post-Receive URL pointing to the /update endpoint of your blog. See Figure 5-3.
Bitbucket used to be a hosting site for Mercurial only, but it recently added support for Git. In contrast to GitHub, Bitbucket offers an unlimited number of private repositories for free. So, if you don’t want anyone to see the code of your little blog engine, Bitbucket might be an interesting alternative.
After logging in with your account, you can create a new repository by clicking the create repository link from the Repositories pop-up menu. Make sure you choose Git as Repository type. Again, just follow the instructions displayed after creating the repository.
Once you have your repository set up, go to its Admin section, navigate to Services, and add a POST URL pointing to the /update endpoint of your blog. See Figure 5-4.
If you are already an experienced Git user, you might be tempted to set up your own repository somewhere. To set up a post-receive hook with your own repository, navigate to your repository on your server, create a file called .git/hooks/post-receive (see Example 5-17), and make that file executable, for instance by running chmod +x .git/hooks/post-receive on a Unix system. In this file we can add the logic for triggering such an update. Interestingly, if you add a magic comment called a shebang or hashbang, you can easily write that hook in Ruby.
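A hook along those lines could look like the following sketch. It assumes the blog is reachable at http://localhost:9292/update (rackup's default port on the same machine); that URL and the helper names are placeholders to adapt, not something the blog engine prescribes.

```ruby
#!/usr/bin/env ruby
require 'net/http'
require 'uri'

# Hypothetical address of the running blog; adjust host and port.
UPDATE_URL = 'http://localhost:9292/update'

# Build the POST request the update middleware is waiting for.
def build_update_request(uri)
  Net::HTTP::Post.new(uri)
end

# Send the request with short timeouts so a push never hangs on the hook.
def trigger_update(uri)
  Net::HTTP.start(uri.host, uri.port, open_timeout: 2, read_timeout: 10) do |http|
    http.request(build_update_request(uri))
  end
end

# Git executes this file after it has received a push.
if __FILE__ == $PROGRAM_NAME
  begin
    response = trigger_update(URI.parse(UPDATE_URL))
    puts "blog update: #{response.code}"
  rescue SystemCallError, SocketError, Timeout::Error => e
    # A failed notification should not make the push look broken.
    warn "could not reach the blog: #{e.message}"
  end
end
```

Since the hook only sends an HTTP request, it contains nothing specific to the blog's internals; the same /update endpoint serves hosted repositories and self-hosted ones alike.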
You might want to deploy your application on Heroku. Heroku has a read-only filesystem, therefore you cannot have the blog automatically pull changes. Instead you have to push to Heroku explicitly. But it would be nice if we didn’t have to throw away all of the GithubHook middleware. Since the only real issue is not being able to pull in changes, we can solve this situation by disabling the autopull setting we introduced earlier. And since Heroku comes with an HTTP cache, there is no need for setting up Rack::Cache.
Heroku sets a few environment variables to avoid any additional configuration, therefore the variables URL and DATABASE_URL are indicators for running on Heroku.
This configuration is not really part of the application logic. The best place to store it is probably in the config.ru, which you can see in Example 5-18.
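As a sketch of that idea, the check can be wrapped in a small predicate. The `on_heroku?` helper is hypothetical, and treating the presence of both variables as the signal is an assumption made for this illustration. The comments outline how a config.ru might use it.

```ruby
# Assumption: both variables being set means we are running on Heroku.
def on_heroku?(env = ENV)
  !env['URL'].nil? && !env['DATABASE_URL'].nil?
end

# In config.ru this could drive the setup roughly as follows (shown as
# comments, because `use` and `run` only exist inside a rackup file):
#
#   $LOAD_PATH.unshift 'lib'
#   require 'blog'
#
#   if on_heroku?
#     # read-only filesystem: never try to pull
#     GithubHook.disable :autopull
#   else
#     # optional HTTP cache for other hosts
#     require 'rack/cache'
#     use Rack::Cache
#   end
#
#   run Blog

puts on_heroku?({ 'URL' => 'example.herokuapp.com',
                  'DATABASE_URL' => 'postgres://example' })  # prints true
puts on_heroku?({})                                          # prints false
```

Keeping this decision in config.ru leaves both the Blog class and the GithubHook middleware free of any hosting-specific code.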
Hopefully you have a running blog by now. With this tutorial we demonstrated how to use some of the tools Sinatra has to offer and how to address real problems that come up when implementing a web application.