Chapter 5. Hands On: Your Own Blog Engine

Like many other people, most open source developers run their own public blog. But there is a big difference between open source developers and everyone else: while most “normal” people have to decide which blogging software to use, developers tend to decide which blogging software to implement. Why? Blogging software is actually extremely simple to implement if all you want is a single, simple workflow for blogging. That makes a custom blog engine far more appealing. And doesn’t it seem like a small and fun project?

Let’s build on what we’ve learned in the recent chapters about modular applications and Rack to build our own little blog engine.

Workflow Concept

As programmers, we can get quite passionate about which text editor or IDE we use to write code. Why should we be any less opinionated when it comes to writing blog posts? In-browser editors have become quite impressive lately, but is it worth embedding one in a blog engine just to get syntax highlighting? Instead, let’s sidestep authorization, the security issues that come with an in-browser editor, and the work of setting one up properly, by simply not building one at all.

What if we could use our favorite editor to write blog posts and use Git to keep track of versions? This would give us all the features Git comes with: tracking changes made to posts, keeping not-yet-released posts on separate branches, and even easing collaboration on blogs written by more than one author.

File-Based Posts

For simplicity, let’s store the articles in the same repository as our blog engine. It should be fairly easy to split them into two repositories if you prefer that approach. To keep things organized, we’ll store the articles in the articles folder and the Ruby code in lib.

We’ll use the popular Markdown markup language for formatting blog posts. Markdown also lets you embed arbitrary HTML in your articles, which comes in handy if you want to embed media like a YouTube video.

Note

Since we will rely on Sinatra to translate our Markdown articles into HTML, you can easily switch to any other template language Sinatra supports, such as Textile. You can learn more about Markdown at http://daringfireball.net/projects/markdown/.

Like most other text-based blog engines, we’ll store each post’s metadata in YAML format at the beginning of the article, separated from the article body by a blank line. We’ll stick to the minimum of data for this exercise; all we really need is the title and the date the article was published. A simple blog post is shown in Example 5-1.

Note

If you feel like experimenting further once your blog is running, try removing all metadata from the files and parsing the title and date from the generated HTML and the Git log, respectively.

Example 5-1. A typical blog post: articles/updated.md
title: Updated
date: 2011-09-25

Hello friends! Sorry, I haven't blogged in quite a while. I was busy reading
[a book](http://oreilly.com/catalog/0636920019664/) to improve my Sinatra
skills. I will blog more from now on, I promise.

Git for Distribution

We still want the workflow to be as simple as possible. Ideally, we could simply git push to our blog to update it, similar to how you deploy on Heroku. And since it would be nice to avoid vendor lock-in, let’s implement that feature ourselves, without having to host our application on Heroku.

Note

If you are new to Git, you can learn more about it at http://git-scm.com/. It is a distributed version control system.

For this tutorial it is important to know that Git commits are explicitly pushed to a remote repository and are identified by a commit hash.

Ideally, we want to push our updates to GitHub and be done. That would also let us use GitHub’s embedded web editor, shown in Figure 5-1, when we don’t have Git installed locally or don’t want to clone the repository just to edit a blog post. We will implement this without writing any GitHub-specific code, again to avoid vendor lock-in.

Note

We use GitHub as an example here, as it is rather popular among Ruby developers and you will find almost all Ruby-related projects there. You could, of course, use any alternative platform; popular ones include Bitbucket and Gitorious.

Figure 5-1. Being able to edit posts on GitHub

Semistatic Pages

Apart from the occasional new or updated article, a blog is actually a rather static website. While that seems a bit boring at first, we can easily use it to our advantage. An important goal for most websites is to serve as many requests per second as possible, as efficiently as possible. While it is probably true that developer time is more expensive than server time, it seems unreasonable to serve a simple blog from more than one machine, and in most cases even from more than one process on that machine.

Since articles will not change unless we push an update upstream, there is no reason to read them from disk more often than that. The general idea is to spend as little time as possible in the Ruby process when a request comes in. We will look into how to reduce that time even further and how to keep requests from reaching the Ruby process in the first place by setting the proper HTTP headers.

The Implementation

Reloading the application on changes and displaying the articles are two separate concerns that aren’t tightly coupled. As software developers, we have learned to reflect such separations in the code to keep it clean and flexible. So why not create two separate Sinatra applications for those tasks? It makes sense to serve the posts from a Rack endpoint. We could set up the update logic as another endpoint, but for such a simple app it’s easier to write it as a middleware.
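The split between an endpoint and a middleware can be shown in plain Ruby, without Sinatra or the rack gem. The sketch relies only on the Rack convention: an endpoint responds to call(env) and returns a status, a headers hash, and a body. The UpdateHook class, the blog lambda, and the reloads counter are hypothetical stand-ins. In the chapter, Sinatra classes play both roles.

```ruby
# Stand-in blog endpoint: any object responding to call(env) will do.
blog = lambda do |env|
  [200, { 'content-type' => 'text/html' }, ["<h1>#{env['PATH_INFO']}</h1>"]]
end

# Hypothetical middleware in the spirit of the update hook: it answers
# POST /update itself and passes every other request to the wrapped app.
class UpdateHook
  def initialize(app, &on_update)
    @app = app
    @on_update = on_update
  end

  def call(env)
    if env['REQUEST_METHOD'] == 'POST' && env['PATH_INFO'] == '/update'
      @on_update.call
      [200, { 'content-type' => 'text/plain' }, ['ok']]
    else
      @app.call(env)
    end
  end
end

reloads = 0
stack = UpdateHook.new(blog) { reloads += 1 }

get_status, _, get_body = stack.call('REQUEST_METHOD' => 'GET', 'PATH_INFO' => '/updated')
puts "#{get_status} #{get_body.join}"   # prints 200 <h1>/updated</h1>

post_status, _, post_body = stack.call('REQUEST_METHOD' => 'POST', 'PATH_INFO' => '/update')
puts "#{post_status} #{post_body.join}" # prints 200 ok
puts reloads                            # prints 1
```

The middleware only needs a reference to the next app in the chain. That is what keeps the update logic out of the blog itself.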

Displaying Blog Posts

First we need to create a modular application for serving the articles, as shown in Example 5-2. Since we don’t want to store views and static assets like stylesheets and images inside lib, we have to make sure that we set the root property properly.

Example 5-2. Setting up a modular application (lib/blog.rb)
require 'sinatra/base'

class Blog < Sinatra::Base
  # File.expand_path generates an absolute path.
  # It also takes a path as second argument. The
  # generated path is treated as being relative
  # to that path.
  set :root, File.expand_path('../../', __FILE__)
end
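The snippet below checks what the root setting resolves to. The path /srv/blog/lib/blog.rb is a made-up location standing in for __FILE__. File.expand_path treats its second argument as a directory, even when it names a file. That is why two levels of .. are needed to reach the project root.

```ruby
# Hypothetical absolute path standing in for __FILE__ inside lib/blog.rb
blog_file = '/srv/blog/lib/blog.rb'

# The first '..' strips "blog.rb", the second one leaves "lib".
root = File.expand_path('../../', blog_file)
puts root           # prints /srv/blog

# An equivalent, arguably more explicit spelling
explicit_root = File.expand_path('..', File.dirname(blog_file))
puts explicit_root  # prints /srv/blog
```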

Rendering Markdown

Sinatra supports quite a large number of rendering engines. We already used the erb method for rendering ERB templates. Sinatra offers a similar method called markdown for (surprise, surprise) Markdown templates. As we’ve seen before, we can pass symbols to those methods to render files from the views directory. But you don’t always have the source stored in a view file. That’s why you can pass a string to those methods instead, and Sinatra will treat that string as the template source code. See Example 5-3.

Example 5-3. Rendering Markdown from a string
get('/') { markdown "# A Headline" }

Since these rendering methods simply return strings, you can easily embed the result in another template, as seen in Example 5-4.

Example 5-4. Embedding Markdown in ERB
<h1>Markdown in ERB</h1>
<%= markdown("This is *Markdown*!") %>

Warning

While ERB ships with Ruby, there is no Markdown implementation in the standard library. Sinatra will automatically pick up any implementation you have installed on your system. However, if you have none, you need to install one: gem install rdiscount.

Generating articles

Since the blog logic doesn’t need to know about Git updates, let’s load all the articles when the application is loaded. See Example 5-5.

Example 5-5. Loading articles (lib/blog.rb)
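require 'sinatra/base'
require 'ostruct'
require 'yaml'
require 'time'

class Blog < Sinatra::Base
  set :root, File.expand_path('../../', __FILE__)

  # loop through all the article files
  Dir.glob "#{root}/articles/*.md" do |file|
    # parse meta data and content from file:
    # everything before the first blank line is YAML
    meta, content   = File.read(file).split("\n\n", 2)
    
    # generate a metadata object (unquoted dates load as Date
    # objects, which safe loading has to be told to accept)
    article         = OpenStruct.new YAML.safe_load(meta, permitted_classes: [Date])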

    # convert the date to a time object
    article.date    = Time.parse article.date.to_s

    # add the content
    article.content = content

    # generate a slug for the url
    article.slug    = File.basename(file, '.md')

    # set up the route
    get "/#{article.slug}" do
      erb :post, :locals => { :article => article }
    end
  end
end

Note

We are using the ostruct library that comes with Ruby. It is a small wrapper around hashes, exposing setters and getters for all the hash entries.
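The parsing step can be tried on its own, outside Sinatra. The sketch below uses a hypothetical parse_article helper that mirrors the loop body of Example 5-5. It returns a plain Hash instead of an OpenStruct so it needs nothing beyond the standard library. It assumes the metadata contains no blank lines, because the first blank line marks the start of the article body.

```ruby
require 'yaml'
require 'date'
require 'time'
require 'tmpdir'

# Hypothetical helper mirroring the loop body of Example 5-5.
def parse_article(path)
  # Everything before the first blank line is YAML metadata.
  meta, content = File.read(path).split("\n\n", 2)
  # Unquoted dates load as Date objects, so Date must be permitted.
  data = YAML.safe_load(meta, permitted_classes: [Date])
  {
    title:   data['title'],
    date:    Time.parse(data['date'].to_s),
    content: content,
    slug:    File.basename(path, '.md')
  }
end

# Usage: write the post from Example 5-1 to a temporary directory and parse it.
article = Dir.mktmpdir do |dir|
  path = File.join(dir, 'updated.md')
  File.write(path, "title: Updated\ndate: 2011-09-25\n\nHello friends!\n")
  parse_article(path)
end

puts article[:title] # prints Updated
puts article[:slug]  # prints updated
```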

We still need a view for rendering these articles, shown in Example 5-6. We’ll use HTML5 tags, since we like to keep up with all the fancy new technology out there.

Example 5-6. views/post.erb
<article>
  <header>
    <h1>
      <a href="<%= url(article.slug) %>"><%= article.title %></a>
    </h1>
    <time class="timeago" datetime="<%= article.date.xmlschema %>">
      <%= article.date.strftime "%Y/%m/%d" %>
    </time>
  </header>
  <section class="content">
    <%= markdown article.content %>
  </section>
</article>
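The view formats each date twice. The datetime attribute gets a machine-readable XML Schema timestamp, which the timeago plugin reads later on. The element content gets a human-readable fallback. A quick check of both formats, using a fixed UTC time as a stand-in for article.date:

```ruby
require 'time'

# A fixed UTC timestamp standing in for article.date
date = Time.utc(2011, 9, 25)

# Machine-readable value for the datetime attribute
puts date.xmlschema            # prints 2011-09-25T00:00:00Z
# Human-readable fallback shown inside the time element
puts date.strftime('%Y/%m/%d') # prints 2011/09/25
```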

Adding an index

We also want a home page that displays all the articles. Since we keep the articles in-process, there is no reason not to do the same for the list of articles. Example 5-7 shows the Sinatra wiring necessary for this, and Example 5-8 provides an ERB template for rendering it.

Example 5-7. Loading articles (lib/blog.rb)
require 'sinatra/base'
require 'ostruct'
require 'yaml'
require 'time'

class Blog < Sinatra::Base
  set :root, File.expand_path('../../', __FILE__)
  set :articles, []

  Dir.glob "#{root}/articles/*.md" do |file|
    meta, content   = File.read(file).split("\n\n", 2)
    article         = OpenStruct.new YAML.safe_load(meta, permitted_classes: [Date])
    article.date    = Time.parse article.date.to_s
    article.content = content
    article.slug    = File.basename(file, '.md')

    get "/#{article.slug}" do
      erb :post, :locals => { :article => article }
    end

    # Add article to list of articles
    articles << article
  end

  # Sort articles by date, display new articles first
  articles.sort_by! { |article| article.date }
  articles.reverse!

  get '/' do
    erb :index
  end
end

Example 5-8. views/index.erb
<% settings.articles.each do |article| %>
  <%= erb :post, :locals => { :article => article } %>
<% end %>
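The ordering trick from Example 5-7 can be checked in isolation. The version below uses a hypothetical Article Struct and made-up posts in place of the OpenStruct objects. Sorting by date puts the oldest post first, and reversing the array then puts the newest post first.

```ruby
# Minimal stand-in for the article objects (hypothetical Struct)
Article = Struct.new(:title, :date)

articles = [
  Article.new('Hello World', Time.utc(2011, 8, 1)),
  Article.new('Updated',     Time.utc(2011, 9, 25)),
  Article.new('Second post', Time.utc(2011, 8, 15))
]

# Same two steps as Example 5-7: oldest first, then flip to newest first.
articles.sort_by! { |article| article.date }
articles.reverse!

p articles.map(&:title) # prints ["Updated", "Second post", "Hello World"]
```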

Adding a basic layout

The pages we’re generating at the moment are more or less incomplete HTML documents. A simple layout file fixes this. See Example 5-9.

Example 5-9. views/layout.erb
<!DOCTYPE html>
<html>
  <head>
    <title>My Blog</title>
    <link rel="stylesheet" media="screen" href="/css/blog.css" />
    <script type="text/javascript" src="/js/jquery.min.js"></script>
    <script type="text/javascript" src="/js/jquery.timeago.js"></script>
    <script type="text/javascript" src="/js/blog.js"></script>
  </head>
  <body>
    <%= yield %>
  </body>
</html>

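The layout loads the timeago jQuery plugin, and Example 5-10 applies it to every time element with the timeago class. The plugin replaces our date strings with relative descriptions of how long ago each post was published. You can learn more about that plugin at http://timeago.yarp.com/.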

Example 5-10. public/js/blog.js
$(document).ready(function() {
  $("time.timeago").timeago();
});

And to make a nicer first impression, let’s add some CSS right away. This also gives you a good starting point for a better layout later on. See Example 5-11 for the CSS and Figure 5-2 for a first look at the blog.

Example 5-11. public/css/blog.css
body {
  font-family: "Helvetica Neue", Arial, Helvetica, sans-serif;
}

article {
  min-width: 300px;
  max-width: 700px;
  margin: 50px auto;
  padding: 0 50px;
}

header h1 {
  margin: 0;
}

header a {
  color: #000;
  text-decoration: none;
  text-shadow: 1px 1px 2px #555;
}

header a:hover {
  text-decoration: underline;
}

header time {
  font-size: 80%;
  color: #555;
}

Figure 5-2. A first look at the blog

Git Integration

As mentioned before, the goal is to update the blog automatically whenever we push to the blog repository. Most hosting sites, like GitHub or Bitbucket, offer service hooks: they send a request to a custom URL whenever someone pushes new commits to the repository. Even if you host the repository on your own server, you can easily set up a so-called post-receive hook there. But let’s look at the implementation first, before we set everything up.

Regenerating content

To regenerate the content, all we have to do is reload our application. We could do that by restarting the process. However, that might be complicated to implement and cause our website to be down for a moment.

Another idea would be to simply load lib/blog.rb again. However, that would append the routes to the list of already defined routes rather than replacing the existing ones. Sinatra uses the first route that matches a request. That approach therefore works for new posts, but edits to existing posts would never show up, because the old route for that URL would still win. Moreover, it would leak memory, since old routes would never be removed.

We need to remove all the routes before loading the file again. And it doesn’t stop there: we also need to get rid of all the filters, middleware, error handlers, and so on. We aren’t using all of those features at the moment, but we don’t want our app to break later if we add a middleware or an error handler. Luckily, Sinatra has a method for exactly that: reset!.

Let’s assume that in the middleware we’re creating, the wrapped endpoint (available as app) is an instance of the Sinatra application we want to reload. Its class, which we reach via app.settings, gives us reset! and also the file we want to reload, which is stored in the app_file setting. Sinatra takes care of setting it to the correct value. Example 5-12 demonstrates how to do this.
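The reasoning behind reset! can be modeled with a toy router. The TinyRouter class and the load_blog method below are hypothetical and much simpler than Sinatra. They only share its "first matching route wins" rule, which is enough to show why reloading without a reset serves stale content.

```ruby
# Hypothetical, heavily simplified router: routes are kept in definition
# order and the first route with a matching path wins.
class TinyRouter
  def initialize
    @routes = []
  end

  def get(path, &handler)
    @routes << [path, handler]
  end

  # Forget every route, roughly what reset! does for routes.
  def reset!
    @routes.clear
  end

  def route_count
    @routes.size
  end

  def call(path)
    _, handler = @routes.find { |route_path, _| route_path == path }
    handler ? handler.call : 'Not Found'
  end
end

# Simulates loading lib/blog.rb with a given article body.
def load_blog(router, body)
  router.get('/updated') { body }
end

router = TinyRouter.new
load_blog(router, 'first draft')

# Reloading without a reset appends a second route for the same path...
load_blog(router, 'edited version')
puts router.call('/updated') # prints first draft (the old route still wins)
puts router.route_count      # prints 2 (the stale route is never freed)

# ...whereas resetting first leaves only the fresh route.
router.reset!
load_blog(router, 'edited version')
puts router.call('/updated') # prints edited version
```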

Example 5-12. Regenerating content (lib/github_hook.rb)
require 'sinatra/base'
require 'time'

class GithubHook < Sinatra::Base
  post '/update' do
    app.settings.reset!
    load app.settings.app_file

    content_type :txt
    "ok"
  end
end

The middleware above reloads our application whenever a POST request is sent to /update. We’ll point a hook at that URL later on.

Pulling changes

When running on the server, we also want to trigger a git pull automatically. The pull fetches the commits we just pushed from our development machine to the source code repository and deploys them on our production server. However, we probably don’t want to pull while in development: that way we can trigger a reload while working on a post without Git trying to pull in changes.

Let’s introduce a setting called :autopull that specifies whether or not to pull on a reload, and make that setting depend on the current environment, as seen in Example 5-13. The reload has to see the freshly pulled files, so the pull must happen before the application is reloaded.

Example 5-13. Pulling changes (lib/github_hook.rb)
require 'sinatra/base'
require 'time'

class GithubHook < Sinatra::Base
  set(:autopull) { production? }

  post '/update' do
    content_type :txt

    # Pull first, so that the reload below picks up the new commits.
    # Pipe stderr to stdout to make sure we display everything.
    output = settings.autopull? ? `git pull 2>&1` : "ok"

    app.settings.reset!
    load app.settings.app_file

    output
  end
end
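Backticks return the command’s output but make it easy to overlook whether the pull actually succeeded. One possible alternative uses Open3.capture2e from the standard library, which merges stderr into stdout much like 2>&1 and also returns the exit status. The run_and_capture helper below is hypothetical. To keep the example runnable anywhere, it runs the current Ruby interpreter instead of git.

```ruby
require 'open3'
require 'rbconfig'

# Hypothetical helper: runs a command, merging stderr into stdout like
# `git pull 2>&1`, and also reports whether the command succeeded.
def run_and_capture(*cmd)
  output, status = Open3.capture2e(*cmd)
  [output, status.success?]
end

# In the hook this would be run_and_capture('git', 'pull').
output, ok = run_and_capture(RbConfig.ruby, '-e', 'puts "out"; warn "err"')
puts ok     # prints true
puts output # contains both "out" and "err" (order not guaranteed)
```

Inside the route, a failed pull could then be reported with status 500 before reloading, rather than only being visible in the response body.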

Proper cache headers

We want our pages to render as quickly as possible while keeping the load on our server as low as we can. Fortunately, HTTP comes with a handful of headers to help us here. We covered the basics of HTTP caching in Chapter 2; let’s see how best to use them.

First of all, we want to avoid serving outdated content at any cost. We also want to allow public caching, since our blog is public. We’ll therefore call cache_control :public, :must_revalidate. To allow revalidation, we need to set at least one of the ETag and Last-Modified headers. Let’s do both.
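Sinatra’s etag and last_modified helpers perform the comparison for us. The sketch below only illustrates the idea of a conditional GET, in simplified form: it ignores weak ETags, for example. The revalidate helper, the commit hash, and the date are all made up. When the client’s validators match, the server can answer 304 Not Modified without a body.

```ruby
require 'time'

# Hypothetical helper: decides between 200 OK and 304 Not Modified.
# If-None-Match takes precedence; the date is only consulted when the
# client sent no ETag to compare against.
def revalidate(etag, last_modified, request_headers)
  if (if_none_match = request_headers['If-None-Match'])
    tags = if_none_match.split(',').map(&:strip)
    return (tags.include?(%("#{etag}")) || tags.include?('*')) ? 304 : 200
  end

  if (if_modified_since = request_headers['If-Modified-Since'])
    # HTTP dates have one-second resolution, so compare whole seconds.
    return 304 if last_modified.to_i <= Time.httpdate(if_modified_since).to_i
  end

  200
end

etag     = 'a1b2c3d'                        # made-up commit hash
modified = Time.utc(2011, 9, 25, 12, 3, 12) # made-up commit date

headers = {
  'Cache-Control' => 'public, must-revalidate',
  'ETag'          => %("#{etag}"),
  'Last-Modified' => modified.httpdate
}

puts headers['Last-Modified']                                   # prints Sun, 25 Sep 2011 12:03:12 GMT
puts revalidate(etag, modified, 'If-None-Match' => %("#{etag}")) # prints 304
puts revalidate(etag, modified, 'If-None-Match' => '"0000000"')  # prints 200
```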

Since our blog is Git-based, we can simply ask Git when the content was last modified, and we can use the commit hash as the ETag. And since we know when new commits are coming in, we only have to ask Git for this information at startup and whenever the update hook is triggered. Example 5-14 demonstrates how to probe Git for update information.

Example 5-14. lib/github_hook.rb
require 'sinatra/base'
require 'time'

class GithubHook < Sinatra::Base
  def self.parse_git
    # Parse hash and date of the latest commit from git log.
    # (-1 also works in a repository with a single commit.)
    sha1, date = `git log -1 --pretty=format:%h^%ci`.strip.split('^')
    set :commit_hash, sha1
    set :commit_date, Time.parse(date)
  end

  set(:autopull) { production? }
  parse_git

  before do
    cache_control :public, :must_revalidate
    etag settings.commit_hash
    last_modified settings.commit_date
  end

  post '/update' do
    content_type :txt

    # Pull first, then read the new commit info and reload the blog.
    output = settings.autopull? ? `git pull 2>&1` : "ok"
    settings.parse_git

    app.settings.reset!
    load app.settings.app_file

    output
  end
end
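The parsing in parse_git can be checked without a repository by feeding it a sample line in the same shape as the git log output: %h is the abbreviated commit hash and %ci the committer date in an ISO 8601-like format. The hash and date below are made up. Splitting on ^ works because neither field contains that character.

```ruby
require 'time'

# Made-up sample of: git log -1 --pretty=format:%h^%ci
line = "a1b2c3d^2011-09-25 14:03:12 +0200\n"

sha1, date = line.strip.split('^')
commit_date = Time.parse(date)

puts sha1                 # prints a1b2c3d
puts commit_date.httpdate # prints Sun, 25 Sep 2011 12:03:12 GMT
```

The time zone offset from Git is preserved, so the Last-Modified header comes out correctly in GMT.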

Glueing Everything Together

What we still need to do is mount the GithubHook middleware in our Blog application. As with any middleware, we do that with the use method, as shown in Example 5-15.

Example 5-15. lib/blog.rb
require 'sinatra/base'
require 'github_hook'
require 'ostruct'
require 'yaml'
require 'time'

class Blog < Sinatra::Base
  use GithubHook

  set :root, File.expand_path('../../', __FILE__)
  set :articles, []
  set :app_file, __FILE__

  Dir.glob "#{root}/articles/*.md" do |file|
    meta, content   = File.read(file).split("\n\n", 2)
    article         = OpenStruct.new YAML.safe_load(meta, permitted_classes: [Date])
    article.date    = Time.parse article.date.to_s
    article.content = content
    article.slug    = File.basename(file, '.md')

    get "/#{article.slug}" do
      erb :post, :locals => { :article => article }
    end

    articles << article
  end

  articles.sort_by! { |article| article.date }
  articles.reverse!

  get '/' do
    erb :index
  end
end

Rack It Up!

For deployment we’ll write a config.ru, as described earlier. To show a use case of the caching headers, let’s add the Rack::Cache library (as shown in Example 5-16), which implements an HTTP cache as a Rack middleware. This is, of course, completely optional.

Example 5-16. config.ru
$LOAD_PATH.unshift 'lib'

# this is optional
require 'rack/cache'
use Rack::Cache

require 'blog'
run Blog

As discussed in Chapter 3, you can now start the server with the rackup command. When deploying on your production system, make sure you set the environment to production. Optionally, you can start rackup as a daemon so it runs in the background. The following command does both and uses the Thin web server, which has to be installed separately: rackup -E production -D -s thin.

Setting it up on GitHub

Now that we’ve implemented our blog, let’s publish it on GitHub. Log in with your GitHub account. On the Dashboard, click on New Repository and follow the instructions.

Once you have your repository set up, go to its Admin section, navigate to Service Hooks, and add a Post-Receive URL pointing to the /update endpoint of your blog. See Figure 5-3.

Figure 5-3. Setting up a service hook on GitHub

Setting it up on Bitbucket

Bitbucket used to be a hosting site for Mercurial only, but it recently added support for Git. In contrast to GitHub, Bitbucket offers an unlimited number of private repositories for free. So, if you don’t want anyone to see the code of your little blog engine, Bitbucket might be an interesting alternative.

After logging in with your account, you can create a new repository by clicking the create repository link from the Repositories pop-up menu. Make sure you choose Git as Repository type. Again, just follow the instructions displayed after creating the repository.

Once you have your repository set up, go to its Admin section, navigate to Services, and add a POST URL pointing to the /update endpoint of your blog. See Figure 5-4.

Figure 5-4. Setting up a service hook on Bitbucket

Using a post-receive hook

If you are already an experienced Git user, you might prefer to host your own repository somewhere. To set up a post-receive hook for your own repository, navigate to the repository on your server. Create a file called .git/hooks/post-receive (hooks/post-receive in a bare repository; see Example 5-17) and make it executable, for instance by running chmod +x .git/hooks/post-receive on a Unix system. Git runs this file after every push the repository receives, so that is where we add the logic for triggering an update. Since Git simply executes the file, you can write the hook in Ruby by starting it with a shebang (or hashbang) line.

Example 5-17. .git/hooks/post-receive
#!/usr/bin/env ruby
require 'net/http'
require 'uri'

# place your own URL here; the /update route only answers POST requests
Net::HTTP.post_form(URI('http://localhost:4567/update'), {})
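Git passes a post-receive hook one line per updated ref on standard input, in the form "<old-sha> <new-sha> <ref-name>". That lets the hook skip updates when you only pushed to a drafts branch. Below is a minimal sketch of that filter, assuming the blog is served from master (adjust BLOG_BRANCH if your default branch has another name). The blog_branch_updated? helper and the sample hashes are hypothetical.

```ruby
# Hypothetical branch the blog is served from
BLOG_BRANCH = 'refs/heads/master'

# Returns true if the blog branch is among the refs Git reports as
# updated; each input line reads "<old-sha> <new-sha> <ref-name>".
def blog_branch_updated?(input, branch = BLOG_BRANCH)
  input.each_line.any? do |line|
    _old_sha, _new_sha, ref = line.split
    ref == branch
  end
end

# Sample input with made-up, abbreviated hashes
sample = "1111111 2222222 refs/heads/drafts\n" \
         "3333333 4444444 refs/heads/master\n"

puts blog_branch_updated?(sample)                                # prints true
puts blog_branch_updated?("1111111 2222222 refs/heads/drafts\n") # prints false
```

In the actual hook file, you would make the request from Example 5-17 conditional on blog_branch_updated?($stdin.read).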

What about Heroku?

You might want to deploy your application on Heroku. Heroku has a read-only filesystem, so the blog cannot pull in changes by itself; instead, you have to push to Heroku explicitly. Still, it would be nice not to throw away the whole GithubHook middleware. The only real issue is that we can’t pull in changes, so we can simply disable the autopull setting we introduced earlier. And since Heroku comes with an HTTP cache, there is no need to set up Rack::Cache.

Heroku sets a few environment variables so that apps need no additional configuration. The presence of the URL and DATABASE_URL variables therefore indicates that we are running on Heroku.

This configuration is not really part of the application logic. The best place to store it is probably in the config.ru, which you can see in Example 5-18.

Example 5-18. config.ru
$LOAD_PATH.unshift 'lib'
require 'blog'

if ENV['URL'] and ENV['DATABASE_URL']
  # we're on heroku, no cache needed
  # also, it's a read-only file system
  GithubHook.disable :autopull
elsif Blog.production?
  require 'rack/cache'
  use Rack::Cache
end

run Blog

Summary

Hopefully you have a running blog by now. With this tutorial we demonstrated how to use some of the tools Sinatra has to offer and how to address real problems that come up when implementing a web application.
