Backup package

We are first going to write the backup package, which we will become the first customer of when we write the associated tools. The package will be responsible for deciding whether directories have changed and need backing up, as well as for actually performing the backup.

Obvious interfaces?

The first thing to think about when embarking on a new Go program is whether any interfaces stand out to you. We don't want to over-abstract or waste too much time up front designing something that we know will change as we start to code, but that doesn't mean we shouldn't look for obvious concepts that are worth pulling out. Since our code will archive files, the Archiver interface pops out as a candidate.

Create a new folder inside your GOPATH called backup, and add the following archiver.go code:

package backup

type Archiver interface {
  Archive(src, dest string) error
}

The Archiver interface specifies a method called Archive that takes source and destination paths and returns an error. Implementations of this interface will be responsible for archiving the source folder and storing the result at the destination path.

Note

Defining an interface up front is a nice way to get some concepts out of our heads and into code. It doesn't mean the interface can't change as our solution evolves; just remember the power of simple interfaces. The core interfaces in the io package, such as io.Reader and io.Writer, expose only a single method.

From the very beginning, we have made the case that while we are going to use ZIP files as our archive format, we could easily swap this out later for another Archiver implementation.

Implementing ZIP

Now that we have the interface for our Archiver types, we are going to implement one that uses the ZIP file format.

Add the following struct definition to archiver.go:

type zipper struct{}

We are not going to export this type, which might lead you to conclude that users outside the package won't be able to use it. In fact, we are going to provide them with a ready-made instance of the type, so they don't have to worry about creating and managing their own.

Add the following exported implementation:

// ZIP is an Archiver that zips files.
var ZIP Archiver = (*zipper)(nil)

This curious snippet of Go voodoo is actually a very interesting way of expressing our intent to the compiler without allocating anything. We are defining a variable called ZIP of type Archiver, so from outside the package it's pretty clear that we can use that variable wherever an Archiver is needed, if we want to zip things. We then assign it nil converted to the type *zipper. A nil pointer points at nothing, and since our zipper struct has no fields, there is no state for it to point at anyway: all we need is the method set of *zipper, and calling Archive through a nil *zipper is safe because the method never dereferences its receiver. (The ZIP variable itself is an ordinary interface value a couple of machine words in size; it is the zipper that costs nothing.) This hides the complexity of the code, and indeed the actual implementation, from outside users. Nobody outside the package needs to know about our zipper type at all, which frees us up to change the internals at any time without touching the externals. That is the true power of interfaces.

Another handy side benefit to this trick is that the compiler will now be checking whether our zipper type properly implements the Archiver interface or not, so if you try to build this code you'll get a compiler error:

./archiver.go:10: cannot use (*zipper)(nil) (type *zipper) as type Archiver in assignment:
  *zipper does not implement Archiver (missing Archive method)

We see that our zipper type does not implement the Archive method as mandated in the interface.

Note

You can also use this trick in test code to ensure that your types implement the interfaces they should. If you don't need the variable, you can throw it away by assigning to an underscore, and you'll still get the compiler's help:

var _ Interface = (*Implementation)(nil)

To make the compiler happy, we are going to add the implementation of the Archive method for our zipper type.

Add the following code to archiver.go:

func (z *zipper) Archive(src, dest string) (err error) {
  // Make sure the folder that will hold the archive exists.
  if err := os.MkdirAll(filepath.Dir(dest), 0777); err != nil {
    return err
  }
  out, err := os.Create(dest)
  if err != nil {
    return err
  }
  defer out.Close()
  w := zip.NewWriter(out)
  defer func() {
    // Closing the writer writes the ZIP central directory, so report
    // its error unless an earlier error is already being returned.
    if cerr := w.Close(); err == nil {
      err = cerr
    }
  }()
  return filepath.Walk(src, func(path string, info os.FileInfo, err error) error {
    // Check err first: info may be nil when err is non-nil.
    if err != nil {
      return err
    }
    if info.IsDir() {
      return nil // skip folders; the file paths already encode the tree
    }
    in, err := os.Open(path)
    if err != nil {
      return err
    }
    defer in.Close() // runs when this callback returns
    f, err := w.Create(path)
    if err != nil {
      return err
    }
    // Stream the file's bytes into its compressed entry.
    _, err = io.Copy(f, in)
    return err
  })
}

You will also have to import the archive/zip, io, os, and path/filepath packages from the Go standard library. In our Archive method, we take the following steps to prepare for writing to a ZIP file:

  • Use os.MkdirAll to ensure the destination directory exists. The 0777 argument gives the permission bits for any missing directories it creates (before the process umask is applied).
  • Use os.Create to create a new file at the dest path.
  • If the file is created without error, defer closing it with defer out.Close().
  • Use zip.NewWriter to create a new zip.Writer that writes to the file we just created, and defer closing the writer. Closing a zip.Writer writes the archive's central directory, so the deferred function passes any error from Close back through the named err result instead of silently dropping it.

Once we have a zip.Writer ready to go, we use the filepath.Walk function to iterate over the source directory src.

The filepath.Walk function takes two arguments: the root path, and a callback function to be called for every item (files and folders) it encounters while walking the file system. The walk is recursive, so it will travel deep into subfolders too. The callback itself takes three arguments: the path of the item (which begins with the root we passed in), the os.FileInfo object that describes the file or folder, and an error; it also returns an error of its own. If the callback returns an error (apart from special sentinel values such as filepath.SkipDir), the walk stops and filepath.Walk returns that error. We simply pass it up to the caller of Archive and let them worry about it, since there's nothing more we can do.

For each item in the tree, our code takes the following steps:

  • If an error is passed in (via the third argument), something went wrong when trying to access information about the item. We check this first, because info may be nil in that case, and simply return the error, which will eventually be passed out to the caller of Archive.
  • If the info.IsDir method tells us that the item is a folder, we just return nil, effectively skipping it. There is no need to add folders to ZIP archives, because the paths of the files already encode that information for us.
  • Use os.Open to open the source file for reading, and if successful defer closing it.
  • Call Create on the zip.Writer to add a new compressed file to the archive, giving it the file's path, which includes the directories it is nested inside.
  • Use io.Copy to read all of the bytes from the source file and write them through the zip.Writer into the ZIP file we opened earlier.
  • Return the error from io.Copy, which is nil on success.

This chapter will not cover unit testing or Test-driven Development (TDD) practices, but feel free to write a test to ensure that our implementation does what it is meant to do.

Tip

Since we are writing a package, spend some time commenting the exported pieces so far. You can use golint to help you find any exported pieces you may have missed.

Has the filesystem changed?

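One of the biggest problems our backup system has is deciding whether a folder has changed or not in a cross-platform, predictable, and reliable way. A few things spring to mind when we think about this problem: should we just check the last modified date on the top-level folder? Should we use system notifications to be informed whenever a file we care about changes? There are problems with both of these approaches. On most file systems, a folder's modification time changes only when entries directly inside it are added, removed, or renamed, not when a nested file is edited, and file system notification APIs differ from one operating system to the next. It turns out this is not a trivial problem to solve.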

We are instead going to generate an MD5 hash made up of all of the information that we care about when considering whether something has changed or not.

Looking at the os.FileInfo type, we can see that we can find out a lot of information about a file:

type FileInfo interface {
  Name() string       // base name of the file
  Size() int64        // length in bytes for regular files; system-dependent for others
  Mode() FileMode     // file mode bits
  ModTime() time.Time // modification time
  IsDir() bool        // abbreviation for Mode().IsDir()
  Sys() interface{}   // underlying data source (can return nil)
}

To ensure we are aware of a variety of changes to any file in a folder, the hash will be made up of the filename and path (so if a file is renamed, the hash will be different), the size (if a file changes size, it's obviously different), the last modified date, whether the item is a file or folder, and the file mode bits. Even though we won't be archiving the folders, we still care about their names and the tree structure of the folder.

Create a new file called dirhash.go and add the following function:

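package backup

import (
  "crypto/md5"
  "fmt"
  "io"
  "os"
  "path/filepath"
)

// DirHash returns a lowercase hex MD5 hash summarizing the path, type,
// modification time, mode, name, and size of every file and folder
// under path, including path itself.
func DirHash(path string) (string, error) {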
  hash := md5.New()
  err := filepath.Walk(path, func(path string, info os.FileInfo, err error) error {
    if err != nil {
      return err
    }
    io.WriteString(hash, path)
    fmt.Fprintf(hash, "%v", info.IsDir())
    fmt.Fprintf(hash, "%v", info.ModTime())
    fmt.Fprintf(hash, "%v", info.Mode())
    fmt.Fprintf(hash, "%v", info.Name())
    fmt.Fprintf(hash, "%v", info.Size())
    return nil
  })
  if err != nil {
    return "", err
  }
  return fmt.Sprintf("%x", hash.Sum(nil)), nil
}

We first create a new hash.Hash that knows how to calculate MD5s, then use filepath.Walk to iterate over all of the files and folders inside the specified path directory. For each item, assuming there are no errors, we write its identifying information to the hash. A hash.Hash is an io.Writer, so we can use io.WriteString, which writes a string to an io.Writer, and fmt.Fprintf, which does the same but with formatting, letting us write the default format of each value using the %v verb.

Once each file has been processed, and assuming no errors occurred, we use fmt.Sprintf to generate the result string. The Sum method on a hash.Hash appends the current hash value to the byte slice it is given and returns the result. We only want the hash itself, so we pass nil. The %x format verb indicates that we want the value represented in hexadecimal (base 16) with lowercase letters, which is the usual way of representing an MD5 hash.

Checking for changes and initiating a backup

Now that we have the ability to hash a folder and to perform a backup, we are going to put the two together in a new type called Monitor. The Monitor type will have a map of paths with their associated hashes, an Archiver (of course, we'll use backup.ZIP for now), and a destination string representing where to put the archives.

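Create a new file called monitor.go and add the following code:

package backup

// Monitor checks a set of folders and backs up any that have changed.
type Monitor struct {
  Paths       map[string]string // folder path -> hash from the last check
  Archiver    Archiver          // how to archive a changed folder, e.g. ZIP
  Destination string            // root folder where archives are written
}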

In order to trigger a check for changes, we are going to add the following Now method:

func (m *Monitor) Now() (int, error) {
  var counter int
  for path, lastHash := range m.Paths {
    newHash, err := DirHash(path)
    if err != nil {
      return counter, err // still report the backups already made
    }
    if newHash != lastHash {
      err := m.act(path)
      if err != nil {
        return counter, err
      }
      // Updating the value of an existing key while ranging is safe in Go.
      m.Paths[path] = newHash
      counter++
    }
  }
  return counter, nil
}

The Now method iterates over every path in the map and generates the latest hash of that folder. If the hash does not match the hash from the map (generated the last time it checked), the folder is considered to have changed and needs backing up again. We do this with a call to the as yet unwritten act method, before updating the hash in the map with the new one.

To give our users a high-level indication of what happened when they called Now, we also maintain a counter, which we increment every time we back up a folder. We will use this later to keep our end users up to date on what the system is doing without bombarding them with information.

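If you try to build the package at this point, the compiler will complain:

m.act undefined (type *Monitor has no field or method act)

The compiler is helping us again, reminding us that we have yet to add the act method. Add it to monitor.go, importing the fmt, path/filepath, and time packages: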

func (m *Monitor) act(path string) error {
  dirname := filepath.Base(path)
  filename := fmt.Sprintf("%d.zip", time.Now().UnixNano())
  return m.Archiver.Archive(path, filepath.Join(m.Destination, dirname, filename))
}

Because we have done the heavy lifting in our ZIP Archiver type, all we have to do here is generate a filename, decide where the archive will go, and call the Archive method.

Tip

If the Archive method returns an error, act returns it, and Now in turn returns it to its caller. Passing errors up the chain like this is very common in Go: it lets you handle the cases where you can do something useful to recover, and otherwise leave the problem to the caller.

The act method in the preceding code uses time.Now().UnixNano() to generate a timestamp filename and hardcodes the .zip extension.

Hardcoding is OK for a short while

Hardcoding the file extension like we have is OK in the beginning, but if you think about it, we have mixed concerns a little here. If we change the Archiver implementation to use RAR or a compression format of our own making, the .zip extension would no longer be appropriate.

Tip

Before reading on, think about what steps you might take to avoid hardcoding. Where does the filename extension decision live? What changes would you need to make in order to avoid hardcoding properly?

The right place for the filename extension decision is probably the Archiver itself, since each implementation knows what kind of archive it produces. So we could add an Ext() string method to the interface and call it from our act method. But with not much extra work, we can add a little extra power by instead letting Archiver authors specify the entire filename format, rather than just the extension.

Back in archiver.go, update the Archiver interface definition:

type Archiver interface {
  DestFmt() string
  Archive(src, dest string) error
}

Our zipper type needs to now implement this:

func (z *zipper) DestFmt() string {
  return "%d.zip"
}

Now that the act method can get the whole format string from the Archiver, update it to use DestFmt:

func (m *Monitor) act(path string) error {
  dirname := filepath.Base(path)
  filename := fmt.Sprintf(m.Archiver.DestFmt(), time.Now().UnixNano())
  return m.Archiver.Archive(path, filepath.Join(m.Destination, dirname, filename))
}