Defining projects

Once you have your task or problem, you need to figure out how you are going to approach it. Personally, I start by writing down the problem in a notebook and taking notes about what I know, what I do not know, and how I might try to solve it. This research phase often involves searching online to learn how other people have solved the problem, whether anyone has already released open source tools in the problem space, and, if no one else seems to have this problem, why that might be.

Evaluating other people's solutions teaches you how the problem surfaces in different environments. It also helps you find out whether a piece of open source software could solve your problem, or nearly solve it. Every situation is different, but you can often find a project that does 90% of what you want, and you can either extend it or work with its maintainer to see whether it could fit your situation. All of this research will save you work and make sure that you know what you are getting yourself into. Do not let your research discourage you, though! If you find a problem that no one else has solved yet, that does not mean it is too hard: it just means there are more unknowns going into the project.

Once you have done your research, it is time to figure out the scale of the project. Is this something small that you can do in a few hours or days? If so, I usually follow what is called Readme Driven Development (RDD). If it is going to be larger, you should probably write a design document. The larger the project (in terms of scope, people, or time), the more planning you should do, so that there are fewer surprises along the way. I also suggest writing a design document if people you do not know are going to use the software, or if you will be building it with others, so that they can understand what you are doing and how you would like it to be done.

RDD

I first heard about RDD from Tom Preston-Werner, one of the founders of GitHub. The general premise is that every project should have a README.md file in its root directory that describes the project. The README should be the first thing you write, and it should describe the project's basic requirements and how it is going to work. The idea is that anyone who stumbles across the project should be able to gain a solid understanding of what problem you are trying to solve, how you are trying to solve it, and anything else you were thinking about along the way.

Note

.md is the typical extension for a Markdown file. Markdown is a lightweight plain-text markup language. The README could just as easily be written in any markup language you want, but Markdown is simple and pleasant to work with.

RDD is the casual version of the design document, which we will describe next. It does not need to be formal, but it should be easy to understand. Focus on succinct statements, along with the details you find important. Bullet points, links to external resources, and drawings are all useful. I enjoy writing, so you may be rolling your eyes right now, but hear me out. Describing what you are going to do is good for you because it forces your brain to work through the problem in a space that is not code. This matters because we need to communicate with others: at the end of the day, our code does not exist in a vacuum, and it affects everyone around us. The document also leaves a record for the future, which, as we discussed in Chapter 4, Postmortems, is valuable. It will help whoever has to reverse engineer your project during an outage involving it, when you and the others involved are not around to help.

Example

The following is a fictitious example. It describes a project that I have wanted to exist for a long time but have never built, so feel free to build it if you are looking for a side project. The document is written in Markdown, so it can easily be converted to PDF or HTML.

# enki.garden

Enki Garden is a personal internet archive. Its goal is to create a searchable index of the websites and documents a user likes, one that can be hosted on a laptop or cheaply on a personal server.

## Background

I love the Internet Archive. It is an archive of digital culture. Everything from old video games to old websites and books is stored in its vaunted public archives online at archive.org. I think its hoarder approach to the Internet is the only way we can preserve history for the future. We are already getting to the point where there are programs we cannot run, art we cannot see and machines we cannot fix. We need to preserve all of this so that our children's children can understand how we got to where we are today.

Due to copyright law, non-disclosure agreements (NDAs), and general privacy concerns, you cannot upload everything to the Internet Archive, nor would you want everyone to have access to it. For example, I want my tax documents to be searchable, but I do not want them on the Internet Archive for everyone to see. Also, the Internet Archive respects robots.txt, so it will stop sharing archives of URLs that a site blocks in its robots.txt, even if you want them archived.

> robots.txt is a file served from the root of a website that tells robots (also known as spiders, web crawlers, indexers, or scrapers) which of its pages they should not crawl. In effect, it is a blocklist of paths, grouped by the bots that should stay away from them.

## Minimum Viable Product

While the eventual goal is to search all files across all of my computers and every website I have ever visited, I believe a good starting place is to give the tool a starting URL, have it scrape that page's contents and take a screenshot, and then scrape every page it links to.

Every day, I would like to re-scrape and screenshot every page I have stored, without also scraping the pages they link to.

I want to store everything with a cloud service because I use several different computers, and I think that is a good start toward solving the problem.

## Alternatives

- https://archive.org/
- https://archive.is/
- https://webrecorder.io/
- https://github.com/marvelm/erised
- https://github.com/rahiel/archiveror

## Inspiration

- https://www.gwern.net/Archiving-URLs
- https://www.archiveteam.org/

## Implementation

There are three main functions:

- Users submitting URLs and scanning for linked pages
- Scraping and screenshotting each URL
- Browsing historical archives

There will be a relational database with two tables: one for URLs and one for scrape history.

We will store all scraped assets in a cloud object store and provide a link to them in the scrape history.

There will be four processes. The first will be the web UI for browsing and adding new URLs. The second will screenshot URLs. The third will capture HTML and related assets and compress them into an archive. The fourth will scan user-submitted URLs for links and add those links to the database.
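The README's robots.txt note can be made concrete with a few lines of Python, using the standard library's `urllib.robotparser` module. This is a minimal sketch, assuming a made-up robots.txt that is parsed from a string rather than downloaded; the bot names `ExampleBot` and `OtherBot`, the example.com URLs, and the `build_parser` helper are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt: the bot "ExampleBot" is asked not to crawl anything,
# and every other bot ("*") is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt):
    """Parse robots.txt text that has already been retrieved (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    checks = [
        ("ExampleBot", "https://example.com/index.html"),
        ("OtherBot", "https://example.com/index.html"),
        ("OtherBot", "https://example.com/private/taxes.pdf"),
    ]
    for agent, url in checks:
        # can_fetch() answers: may this user agent crawl this URL?
        print(agent, url, parser.can_fetch(agent, url))
```

The three checks should print False, True, and False. Note that robots.txt is advisory: it asks well-behaved crawlers to stay away, but it does not technically block access. Against a live site, `RobotFileParser.set_url()` followed by `read()` fetches and parses the real file instead of a string.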

You could expand this example over time by adding details on how to run the application, descriptions of important files, links to code documentation, or screenshots of the UI.
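For instance, the README's Implementation section mentions a relational database with one table for URLs and one for scrape history. A later revision could spell that out. The sketch below is one possible version using Python's built-in `sqlite3` module; SQLite, every column name, and the helper functions are assumptions for illustration, since the README does not name a database. The `urls_due_for_rescrape` query encodes one reading of the daily re-scrape requirement: every stored page whose newest scrape is missing or more than a day old, without following its links.

```python
import sqlite3

# Hypothetical schema for the README's two tables; column names are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS urls (
    id             INTEGER PRIMARY KEY,
    url            TEXT NOT NULL UNIQUE,
    user_submitted INTEGER NOT NULL DEFAULT 0  -- 1 if a user added it directly
);

CREATE TABLE IF NOT EXISTS scrape_history (
    id              INTEGER PRIMARY KEY,
    url_id          INTEGER NOT NULL REFERENCES urls(id),
    scraped_at      TEXT NOT NULL,  -- UTC, 'YYYY-MM-DD HH:MM:SS'
    archive_link    TEXT,           -- object-store link to compressed HTML/assets
    screenshot_link TEXT            -- object-store link to the screenshot
);
"""


def connect(path=":memory:"):
    """Open a database and create the tables if they do not exist yet."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_url(conn, url, user_submitted=False):
    """Store a URL once and return its id, even if it was already stored."""
    conn.execute(
        "INSERT OR IGNORE INTO urls (url, user_submitted) VALUES (?, ?)",
        (url, int(user_submitted)),
    )
    conn.commit()
    (url_id,) = conn.execute("SELECT id FROM urls WHERE url = ?", (url,)).fetchone()
    return url_id


def record_scrape(conn, url_id, archive_link, screenshot_link, scraped_at=None):
    """Add one scrape-history row; scraped_at defaults to the current UTC time."""
    conn.execute(
        "INSERT INTO scrape_history (url_id, scraped_at, archive_link, screenshot_link) "
        "VALUES (?, COALESCE(?, datetime('now')), ?, ?)",
        (url_id, scraped_at, archive_link, screenshot_link),
    )
    conn.commit()


def urls_due_for_rescrape(conn):
    """Stored URLs never scraped, or last scraped more than a day ago.

    This only revisits known pages; it does not discover new links.
    """
    rows = conn.execute(
        """
        SELECT u.url
        FROM urls AS u
        LEFT JOIN scrape_history AS h ON h.url_id = u.id
        GROUP BY u.id
        HAVING MAX(h.scraped_at) IS NULL
            OR MAX(h.scraped_at) < datetime('now', '-1 day')
        ORDER BY u.url
        """
    ).fetchall()
    return [url for (url,) in rows]


if __name__ == "__main__":
    conn = connect()
    home = add_url(conn, "https://example.com/", user_submitted=True)
    about = add_url(conn, "https://example.com/about")
    add_url(conn, "https://example.com/contact")  # stored but never scraped
    # Placeholder strings stand in for real object-store links.
    record_scrape(conn, home, "archive-home", "screenshot-home")
    record_scrape(conn, about, "archive-about", "screenshot-about",
                  scraped_at="2000-01-01 00:00:00")
    print(urls_due_for_rescrape(conn))
    # ['https://example.com/about', 'https://example.com/contact']
```

Storing timestamps as `YYYY-MM-DD HH:MM:SS` text keeps them in the same format as SQLite's `datetime()` function, so plain string comparison orders them correctly.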

Note that the level of detail is fairly shallow. The goal is to provide broad strokes while still giving direction. Notice also that I suggested four separate pieces of work, which gives anyone who wants to contribute some structure to work within.
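To show how one of those pieces could be picked up on its own, here is a minimal sketch of the core step of the fourth process: pulling the links out of a page's HTML so they can be added to the database. It uses only Python's standard library (`html.parser` and `urllib.parse`); the names `LinkCollector` and `extract_links` are hypothetical, and fetching the page and writing to the database are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href targets of <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []      # unique links, in the order they appear
        self._seen = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            if name != "href" or not value:
                continue
            # Resolve relative links and drop #fragments so that
            # page.html#intro and page.html count as the same page.
            absolute, _fragment = urldefrag(urljoin(self.base_url, value))
            # Skip mailto:, javascript:, and other non-web schemes.
            if urlparse(absolute).scheme not in ("http", "https"):
                continue
            if absolute not in self._seen:
                self._seen.add(absolute)
                self.links.append(absolute)


def extract_links(base_url, html):
    """Return the unique http(s) links found in an HTML document."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    page = (
        '<a href="/about">About</a> <a href="post.html#comments">Post</a> '
        '<a href="mailto:me@example.com">Mail</a>'
    )
    print(extract_links("https://example.com/blog/", page))
    # ['https://example.com/about', 'https://example.com/blog/post.html']
```

Keeping only http and https links and dropping fragments are choices made for this sketch. It also ignores a `<base href>` tag and any links that JavaScript adds at runtime, both of which a real scanner might need to handle.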

This is good for a smaller project or a personal project, but what about something larger?

Design documents

There is often a fear of design documents in smaller organizations. They feel like a formal requirements document, which software developers everywhere dread because of its association with the waterfall development process. In the past, a requirements document was written up front, and the software could not deviate from it. Teams on this path sometimes delivered, two years after starting a project, software that did not actually do what the customer wanted. However, because it matched the initial requirements document, it could be called a success. Situations like this are a big part of why people complained that United States government software contracting was broken before the creation of 18F and the USDS (United States Digital Service): to even put out a contract, you needed a full requirements document.

The software world has moved past strict requirements documents into a world of constant iteration. That being said, defining a starting point is still a good idea. As I said above, by writing an initial description of the work you will be doing, you can circulate it to surface possible concerns and existing alternatives, and then make small modifications that improve your design.

When writing a more formal design document that a lot of people might see, I try to follow the structure below. It is not a hard-and-fast outline, but it at least gives me a starting point:

  • Summary: The executive summary, a single paragraph that describes what the document covers.
  • Background: The problem we are trying to solve, including past attempts to solve it and other relevant details.
  • Overview: A high-level, one-to-two paragraph description of our solution.
  • Detailed design: An in-depth dive into how to solve the problem. It starts with the architecture and then has subsections covering things other people might be concerned about. The goal is to be as descriptive as possible, so it includes architecture diagrams and pseudocode for potentially complicated sections. Some subsections might include the following:
    • Testing
    • Migration
    • Security
    • Dependencies
    • Privacy
    • Billing
    • Logging
    • Monitoring
  • Alternatives considered: A list of alternatives, with descriptions of why you are not using them.
  • Implementation plan: A rough timeline of how you will build this, which is especially important if the project has many phases.
  • Production impact: A description of what people should watch out for when this goes to production. Could this triple the number of requests to the database? Does it encrypt a subset of internal network traffic? These are questions people will want answered.

As you can see, with all of these sections, this could be a very long document. When writing, remember that the goal is not to explain everything you might do, but to raise possible issues with others. The more explicit you are, the more detailed reviewers' questions will be, but the more specific you are, the less flexibility you may feel you have later. Just remember that this is not the end.

You will need to constantly evaluate whether the project you are working on solves the problem you are trying to solve (and whether the problem is still the same). Also, this document is not a contract: it is more of a directional statement.
