We are first going to write the backup package, of which we will become the first customer when we write the associated tools. The package will be responsible for deciding whether directories have changed and need backing up, as well as performing the backup itself.
The first thing to think about when embarking on a new Go program is whether any interfaces stand out to you. We don't want to over-abstract or waste too much time up front designing something that we know will change as we start to code, but that doesn't mean we shouldn't look for obvious concepts that are worth pulling out. Since our code will archive files, the Archiver
interface pops out as a candidate.
Create a new folder inside your GOPATH called backup, and add the following archiver.go code:
package backup

type Archiver interface {
	Archive(src, dest string) error
}
An Archiver
interface will specify a method called Archive
that takes source and destination paths and returns an error. Implementations of this interface will be responsible for archiving the source folder, and storing it in the destination path.
Defining an interface up front is a nice way to get some concepts out of our heads and into code; it doesn't mean this interface can't change as we evolve our solution as long as we remember the power of simple interfaces. Also, remember that most of the I/O interfaces in the io
package expose only a single method.
From the very beginning, we have made the case that while we are going to implement ZIP files as our archive format, we could easily swap this out later with another kind of Archiver
format.
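To make the idea of swapping implementations concrete, here is a minimal sketch of a second implementation. The recorder type and the backUp helper are hypothetical (they are not part of the backup package); recorder archives nothing and only remembers what it was asked to do, which can be handy in tests.

```go
package main

import "fmt"

// Archiver mirrors the interface defined in archiver.go.
type Archiver interface {
	Archive(src, dest string) error
}

// recorder is a hypothetical Archiver that archives nothing and only
// records the calls it receives.
type recorder struct {
	calls []string
}

// Archive notes the requested source and destination and reports success.
func (r *recorder) Archive(src, dest string) error {
	r.calls = append(r.calls, src+" -> "+dest)
	return nil
}

// backUp depends only on the interface, so any Archiver can be passed in.
func backUp(a Archiver, src, dest string) error {
	return a.Archive(src, dest)
}

func main() {
	r := &recorder{}
	if err := backUp(r, "photos", "archive/photos.zip"); err != nil {
		fmt.Println("backup failed:", err)
		return
	}
	fmt.Println(r.calls)
}
```

Because backUp only knows about the Archiver interface, it does not need to change when a different implementation is passed in.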
Now that we have the interface for our Archiver
types, we are going to implement one that uses the ZIP file format.
Add the following struct definition to archiver.go:
type zipper struct{}
We are not going to export this type, which might make you jump to the conclusion that users outside of the package won't be able to make use of it. In fact, we are going to provide them with an instance of the type for them to use, to save them from having to worry about creating and managing their own types.
Add the following exported implementation:
// ZIP is an Archiver that zips files.
var ZIP Archiver = (*zipper)(nil)
This curious snippet of Go voodoo is actually a very interesting way of exposing the intent to the compiler without allocating any memory for a zipper value. We are defining a variable called ZIP
of type Archiver
, so from outside the package it's pretty clear that we can use that variable wherever Archiver
is needed, if you want to zip things. Then we assign it nil converted to the type *zipper. A nil pointer allocates no zipper value, and because our zipper struct has no fields, calling methods on that nil pointer is safe as long as they never dereference the receiver. This hides the complexity of the code (and indeed the actual implementation) from outside users. There is no reason anybody outside of the package needs to know about our zipper type at all, which frees us up to change the internals without touching the externals at any time; this is the true power of interfaces.
Another handy side benefit to this trick is that the compiler will now be checking whether our zipper type properly implements the Archiver
interface or not, so if you try to build this code you'll get a compiler error:
./archiver.go:10: cannot use (*zipper)(nil) (type *zipper) as type Archiver in assignment: *zipper does not implement Archiver (missing Archive method)
We see that our zipper
type does not implement the Archive
method as mandated in the interface.
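The same pattern can be tried in isolation. The following sketch uses a hypothetical noop type in place of zipper; because its Archive method never touches the receiver, calling it through a nil pointer is safe, and deleting the method would make the var line fail to compile in the same way as the error above.

```go
package main

import "fmt"

// Archiver mirrors the interface from archiver.go.
type Archiver interface {
	Archive(src, dest string) error
}

// noop is a hypothetical stand-in for zipper: an empty struct with no state.
type noop struct{}

// Archive never dereferences n, so it works on a nil *noop receiver.
func (n *noop) Archive(src, dest string) error {
	if src == "" {
		return fmt.Errorf("noop: empty source path")
	}
	return nil
}

// Noop is exported as the interface type. The assignment doubles as a
// compile-time check that *noop implements Archiver.
var Noop Archiver = (*noop)(nil)

func main() {
	// The method call succeeds even though the underlying pointer is nil.
	fmt.Println(Noop.Archive("photos", "photos.zip"))
	// The interface value holds a typed nil pointer, so it is not itself nil.
	fmt.Println(Noop == nil)
}
```

Note that the interface value compares unequal to nil even though the pointer inside it is nil, which is why callers can use it like any other Archiver.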
To make the compiler happy, we are going to add the implementation of the Archive
method for our zipper
type.
Add the following code to archiver.go:
func (z *zipper) Archive(src, dest string) error {
	if err := os.MkdirAll(filepath.Dir(dest), 0777); err != nil {
		return err
	}
	out, err := os.Create(dest)
	if err != nil {
		return err
	}
	defer out.Close()
	w := zip.NewWriter(out)
	defer w.Close()
	return filepath.Walk(src, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err // info may be nil when err is set, so check err first
		}
		if info.IsDir() {
			return nil // skip
		}
		in, err := os.Open(path)
		if err != nil {
			return err
		}
		defer in.Close()
		f, err := w.Create(path)
		if err != nil {
			return err
		}
		_, err = io.Copy(f, in) // report copy failures instead of dropping them
		return err
	})
}
You will also have to import the archive/zip, io, os, and path/filepath packages from the Go standard library. In our Archive method, we take the following steps to prepare writing to a ZIP file:
- Use os.MkdirAll to ensure the destination directory exists. The 0777 code represents the file permissions (before the umask is applied) with which to create any missing directories.
- Use os.Create to create a new file as specified by the dest path.
- If the file is created without error, defer its closing with defer out.Close().
- Use zip.NewWriter to create a new zip.Writer type that will write to the file we just created, and defer the closing of the writer.
Once we have a zip.Writer type ready to go, we use the filepath.Walk function to iterate over the source directory src.
The filepath.Walk
function takes two arguments: the root path, and a callback function func
to be called for every item (files and folders) it encounters while iterating over the file system. The filepath.Walk
function is recursive, so it will travel deep into subfolders too. The callback function itself takes three arguments: the full path of the file, the os.FileInfo
object that describes the file or folder itself, and an error describing any problem encountered while reaching that item (it also returns an error in case something goes wrong). If any calls to the callback function result in an error being returned (other than the special filepath.SkipDir value), the operation will be aborted and filepath.Walk
returns that error. We simply pass that up to the caller of Archive
and let them worry about it, since there's nothing more we can do.
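To see the callback in action outside of the archiver, here is a small sketch that walks a throwaway directory tree and collects the files it finds. The listFiles and demo helpers and the file names are hypothetical, and os.MkdirTemp and os.WriteFile assume Go 1.16 or later.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// listFiles returns the paths of all non-directory entries under root,
// relative to root and using forward slashes.
func listFiles(root string) ([]string, error) {
	var files []string
	err := filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err // info may be nil here, so check err first
		}
		if info.IsDir() {
			return nil // nothing to record for folders
		}
		rel, err := filepath.Rel(root, path)
		if err != nil {
			return err
		}
		files = append(files, filepath.ToSlash(rel))
		return nil
	})
	return files, err
}

// demo builds a small temporary tree, lists it, and cleans up afterwards.
func demo() ([]string, error) {
	root, err := os.MkdirTemp("", "walkdemo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)
	if err := os.MkdirAll(filepath.Join(root, "sub"), 0777); err != nil {
		return nil, err
	}
	for _, name := range []string{"a.txt", filepath.Join("sub", "b.txt")} {
		if err := os.WriteFile(filepath.Join(root, name), []byte("hi"), 0666); err != nil {
			return nil, err
		}
	}
	return listFiles(root)
}

func main() {
	files, err := demo()
	fmt.Println(files, err)
}
```

filepath.Walk visits entries in lexical order, so the file in the root folder is reported before the one in the subfolder.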
For each item in the tree, our code takes the following steps:
- If an error is passed in as the third argument, something went wrong while accessing the item, so we return it and let it reach the caller of Archive.
- If the info.IsDir method tells us that the item is a folder, we just return nil, effectively skipping it. There is no reason to add folders to ZIP archives, because the paths of the files already encode that information.
- Use os.Open to open the source file for reading, and if successful defer its closing.
- Call Create on the zip.Writer object to indicate that we want to create a new compressed file, and give it the full path of the file, which includes the directories it is nested inside.
- Use io.Copy to read all of the bytes from the source file, and write them through the zip.Writer object to the ZIP file we opened earlier.
- Return nil when everything succeeded.
This chapter will not cover unit testing or Test-driven Development (TDD) practices, but feel free to write a test to ensure that our implementation does what it is meant to do.
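As a starting point for such a check, the following sketch writes a single entry through a zip.Writer and reads it back, entirely in memory. It is not the chapter's Archive method; zipOne and unzipAll are hypothetical helpers, and io.ReadAll assumes Go 1.16 or later.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
)

// zipOne writes a single entry called name with the given content into an
// in-memory ZIP archive and returns the archive bytes.
func zipOne(name string, content []byte) ([]byte, error) {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	f, err := w.Create(name)
	if err != nil {
		return nil, err
	}
	if _, err := io.Copy(f, bytes.NewReader(content)); err != nil {
		return nil, err
	}
	// Close writes the central directory; the archive is incomplete without it.
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// unzipAll reads every entry back into a map of entry name to content.
func unzipAll(data []byte) (map[string]string, error) {
	r, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return nil, err
	}
	out := make(map[string]string)
	for _, f := range r.File {
		rc, err := f.Open()
		if err != nil {
			return nil, err
		}
		b, err := io.ReadAll(rc)
		rc.Close()
		if err != nil {
			return nil, err
		}
		out[f.Name] = string(b)
	}
	return out, nil
}

func main() {
	data, err := zipOne("docs/readme.txt", []byte("hello"))
	if err != nil {
		fmt.Println("zip failed:", err)
		return
	}
	files, err := unzipAll(data)
	fmt.Println(files, err)
}
```

A test for the real Archive method could follow the same shape: archive a temporary folder, open the result with the archive/zip reader, and compare names and contents.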
One of the biggest problems our backup system has is deciding whether a folder has changed or not in a cross-platform, predictable, and reliable way. A few things spring to mind when we think about this problem: should we just check the last modified date on the top-level folder? Should we use system notifications to be informed whenever a file we care about changes? There are problems with both of these approaches, and it turns out it's not a trivial problem to solve.
We are instead going to generate an MD5 hash made up of all of the information that we care about when considering whether something has changed or not.
Looking at the os.FileInfo
type, we can see that we can find out a lot of information about a file:
type FileInfo interface {
	Name() string       // base name of the file
	Size() int64        // length in bytes for regular files; system-dependent for others
	Mode() FileMode     // file mode bits
	ModTime() time.Time // modification time
	IsDir() bool        // abbreviation for Mode().IsDir()
	Sys() interface{}   // underlying data source (can return nil)
}
To ensure we are aware of a variety of changes to any file in a folder, the hash will be made up of the filename and path (so if they rename a file, the hash will be different), size (if a file changes size, it's obviously different), last modified date, whether the item is a file or folder, and file mode bits. Even though we won't be archiving the folders, we still care about their names and the tree structure of the folder.
Create a new file called dirhash.go and add the following function:
package backup

import (
	"crypto/md5"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// DirHash returns a hex-encoded MD5 hash built from the metadata of every
// file and folder under path.
func DirHash(path string) (string, error) {
	hash := md5.New()
	err := filepath.Walk(path, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		io.WriteString(hash, path)
		fmt.Fprintf(hash, "%v", info.IsDir())
		fmt.Fprintf(hash, "%v", info.ModTime())
		fmt.Fprintf(hash, "%v", info.Mode())
		fmt.Fprintf(hash, "%v", info.Name())
		fmt.Fprintf(hash, "%v", info.Size())
		return nil
	})
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%x", hash.Sum(nil)), nil
}
We first create a new hash.Hash
that knows how to calculate MD5s, before using filepath.Walk
to iterate over all of the files and folders inside the specified path directory. For each item, assuming there are no errors, we write the differential information to the hash generator using io.WriteString
, which lets us write a string to an io.Writer
, and fmt.Fprintf
, which does the same but exposes formatting capabilities at the same time, allowing us to generate the default value format for each item using the %v
format verb.
Once each file has been processed, and assuming no errors occurred, we then use fmt.Sprintf to generate the result string. The Sum method on a hash.Hash appends the current hash value to the byte slice it is given and returns the result; it does not add that slice to the data being hashed. In our case, we do not want anything in front of the hash, so we just pass nil. The %x format verb indicates that we want the value to be represented in hex (base 16) with lowercase letters. This is the usual way of representing an MD5 hash.
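The behavior of Sum and %x is easy to check on a known input. In this sketch, hexMD5 is a hypothetical helper that feeds strings to an MD5 hash in the same way DirHash does.

```go
package main

import (
	"crypto/md5"
	"fmt"
	"io"
)

// hexMD5 writes each part to an MD5 hash in order and returns the digest
// as lowercase hex.
func hexMD5(parts ...string) string {
	h := md5.New()
	for _, p := range parts {
		io.WriteString(h, p) // writes to a hash.Hash never return an error
	}
	return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
	fmt.Println(hexMD5("hello")) // 5d41402abc4b2a76b9719d911017c592

	// Writing in pieces gives the same digest as writing the whole string.
	fmt.Println(hexMD5("hel", "lo") == hexMD5("hello"))

	// Sum appends the digest to its argument rather than hashing it:
	// 6 prefix bytes followed by the 16 digest bytes.
	h := md5.New()
	fmt.Println(len(h.Sum([]byte("prefix"))))
}
```

The second line of output is a reminder that the hash only sees a stream of bytes, so the order in which values are written matters but the boundaries between them do not.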
Now that we have the ability to hash a folder, and to perform a backup, we are going to put the two together in a new type called Monitor
. The Monitor
type will have a map of paths with their associated hashes, a reference to any Archiver
type (of course, we'll use backup.ZIP
for now), and a destination string representing where to put the archives.
Create a new file called monitor.go and add the following definition:
type Monitor struct {
	Paths       map[string]string
	Archiver    Archiver
	Destination string
}
In order to trigger a check for changes, we are going to add the following Now method:
func (m *Monitor) Now() (int, error) {
	var counter int
	for path, lastHash := range m.Paths {
		newHash, err := DirHash(path)
		if err != nil {
			return 0, err
		}
		if newHash != lastHash {
			err := m.act(path)
			if err != nil {
				return counter, err
			}
			m.Paths[path] = newHash // update the hash
			counter++
		}
	}
	return counter, nil
}
The Now
method iterates over every path in the map and generates the latest hash of that folder. If the hash does not match the hash from the map (generated the last time it checked), then it is considered to have changed, and needs backing up again. We do this with a call to the as yet unwritten act
method, before then updating the hash in the map with this new hash.
To give our users a high-level indication of what happened when they called Now
, we are also maintaining a counter which we increment every time we back up a folder. We will use this later to keep our end users up-to-date on what the system is doing without bombarding them with information.
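The change-detection loop can be exercised without touching the disk. The following sketch is a simplified, hypothetical variant of the Now method: checkAll takes the hashing and backup steps as function arguments, whereas the real method calls DirHash and act directly.

```go
package main

import (
	"errors"
	"fmt"
)

// checkAll compares the stored hash of each path with a freshly computed one,
// backs up the paths that differ, and returns how many were backed up.
func checkAll(paths map[string]string, hashOf func(string) (string, error), backUp func(string) error) (int, error) {
	var counter int
	for path, lastHash := range paths {
		newHash, err := hashOf(path)
		if err != nil {
			return counter, err
		}
		if newHash == lastHash {
			continue // unchanged since the last check
		}
		if err := backUp(path); err != nil {
			return counter, err
		}
		paths[path] = newHash // remember the hash we just backed up
		counter++
	}
	return counter, nil
}

func main() {
	paths := map[string]string{"docs": "aaa", "photos": "bbb"}
	current := map[string]string{"docs": "aaa", "photos": "ccc"}
	hashOf := func(p string) (string, error) {
		h, ok := current[p]
		if !ok {
			return "", errors.New("unknown path " + p)
		}
		return h, nil
	}
	backUp := func(p string) error {
		fmt.Println("backing up", p)
		return nil
	}

	fmt.Println(checkAll(paths, hashOf, backUp)) // one folder changed
	fmt.Println(checkAll(paths, hashOf, backUp)) // nothing changed this time
}
```

Because the stored hash is updated after a successful backup, a second check of an untouched folder finds nothing to do.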
If you try to build the package at this point, the compiler reports:
m.act undefined (type *Monitor has no field or method act)
The compiler is helping us again and reminding us that we have yet to add the act
method:
func (m *Monitor) act(path string) error {
	dirname := filepath.Base(path)
	filename := fmt.Sprintf("%d.zip", time.Now().UnixNano())
	return m.Archiver.Archive(path, filepath.Join(m.Destination, dirname, filename))
}
Because we have done the heavy lifting in our ZIP Archiver
type, all we have to do here is generate a filename, decide where the archive will go, and call the Archive
method.
The act
method in the preceding code uses time.Now().UnixNano()
to generate a timestamp filename and hardcodes the .zip
extension.
Hardcoding the file extension like we have is OK in the beginning, but if you think about it we have blended concerns a little here. If we change the Archiver
implementation to use RAR or a compression format of our making, the .zip
extension would no longer be appropriate.
The right place for the filename extensions decision is probably in the Archiver
interface, since it knows the kind of archiving it will be doing. So we could add an Ext()
string method and access that from our act
method. But we can add a little extra power with not much extra work by instead allowing Archiver
authors to specify the entire filename format, rather than just the extension.
Back in archiver.go, update the Archiver interface definition:
type Archiver interface {
	DestFmt() string
	Archive(src, dest string) error
}
Our zipper
type needs to now implement this:
func (z *zipper) DestFmt() string {
	return "%d.zip"
}
Now that the Archiver interface can supply the whole format string, update the act method to ask for it:
func (m *Monitor) act(path string) error {
	dirname := filepath.Base(path)
	filename := fmt.Sprintf(m.Archiver.DestFmt(), time.Now().UnixNano())
	return m.Archiver.Archive(path, filepath.Join(m.Destination, dirname, filename))
}
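To illustrate what the format string buys us, here is a minimal sketch with a second, hypothetical Archiver. The tarball type, its backup-%d.tar.gz format, and the destPath helper are assumptions made for the example; destPath builds the path the same way act does, but takes the timestamp as a parameter so the result is predictable.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// Archiver is the updated interface from archiver.go.
type Archiver interface {
	DestFmt() string
	Archive(src, dest string) error
}

// tarball is a hypothetical Archiver used only to show a different filename
// format; its Archive method is a stub that does nothing.
type tarball struct{}

func (t *tarball) DestFmt() string { return "backup-%d.tar.gz" }

func (t *tarball) Archive(src, dest string) error { return nil }

// destPath joins the destination folder, the base name of the source folder,
// and a filename produced from the Archiver's format string.
func destPath(a Archiver, destination, src string, stamp int64) string {
	filename := fmt.Sprintf(a.DestFmt(), stamp)
	return filepath.Join(destination, filepath.Base(src), filename)
}

func main() {
	p := destPath(&tarball{}, "archive", "/home/me/photos", 1500000000)
	fmt.Println(filepath.ToSlash(p)) // archive/photos/backup-1500000000.tar.gz
}
```

An Archiver author controls the prefix as well as the extension this way, while the monitor keeps deciding which folder the archive lands in.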