CHAPTER 2

image

Introduction to Requirements

Being busy does not always mean real work. The object of all work is production or accomplishment and to either of these ends there must be forethought, system, planning, intelligence, and honest purpose, as well as perspiration. Seeming to do is not doing.

—Thomas Edison, American Inventor and Businessman

If there is anything worse than doing a simple task without fully understanding the requirements for success, it is doing a complex one. It happens every day, computer projects are created with only a shimmer of an idea of what the end goal is. Sometimes this is really interesting, and leads you to something interesting. . .sometimes, but almost never when the goal of that project is to deliver value to a customer who knows what they want. Hence, someone involved in every software project must take time to figure out what the customer wants before any software designing starts. If you are very lucky, this will not be you, as capturing requirements is considerably more difficult than any task that will follow in later chapters.

However, for the rest of this chapter, we are going to assume that we have been given a business analyst hat to wear and we need to gather the requirements before we start to design a database. We will keep it very simple, covering only the minimal amount that can be gotten away with.

The first thing one generally does when starting a computer project is to interview users and ask a question along the lines of “Just what is it that you want?” And then listen, being mindful that users often aren’t technologists. Some of them are. Still others completely believe they are and are not. So be sure and get lots of people involved and truly take the time to understand the customer’s problems before you try to solve them. A major issue from my own history was that I would try to rapidly get past understanding the customer needs and right into design of the internals of a database.

The problem is that users don’t think about databases; they think about user interfaces (UIs) and reports and how they want things to look and feel. Of course, a lot of what the user specifies for a UI or report format is actually going to end up reflected in the database design; it is up to you to be certain that there is enough in the requirements to design storage without too much consideration about how it will be displayed, processed, or used. The data has an essence of its own that must be obeyed at this point in the process, or you will find yourself in a battle with the structures you concoct. In this chapter, we will go through some of the basic sorts of data you want to get and locations to look to make sure you are getting the right kinds of requirements to begin the database design process from your business analyst.

Of course, if you are a newbie, you are probably thinking that this all sounds like a lot of writing and not a lot of designing (and coding). No matter how you slice it, planning every project is like this. If you are lucky, you will have analysts who do the requirements gathering so you can design and code software. However, the importance of making sure someone gathers requirements cannot be understated. During a software project (and really any project, but let’s focus on software projects), the following phases are common:

  • Requirements gathering: Document what a system is to be, and identify the criteria that will make the project a success.
  • Design: Translate the requirements into a plan for implementation.
  • Implementation: Code the software.
  • Testing: Verify that the software does what it is supposed to do.
  • Maintenance: Make changes to address problems not caught in testing.

Each phase of the project after requirements gathering relies on the requirements to make sure that the target is met. Requirements are like a roadmap, giving you the direction and target to get there. Trying to build your database without first outlining requirements is like taking a trip without a map. The journey may be fun, but you may find you should have taken that left turn at Albuquerque, so instead of sunning your feathers on Pismo beach, you have to fight an abominable snowman. Without decent requirements, a very large percentage of projects fail to meet user’s needs. A very reasonable discussion that needs to be considered is how many requirements are enough. In the early days of software development, these phases were done one at a time for the entire project, so you gathered all requirements that would ever be needed and then designed the entire software project before any coding started, and so on. This method of arranging a project has been given the name of “waterfall method” because the output of one step flowed into another. Waterfall has been derided as generally terrible (certainly uncool), because each phase is done to completion before moving to the next. More and more teams are using “agile” project development methods, but the phases are the same as in the waterfall method, the major differences being in the iterations and time slices. You still need to know what you are doing before you design, and design before you code.

The important point I want to make clear in this chapter is simple: each of these phases will be performed whether you like it or not. I have been on projects where we started implementation almost simultaneously with the start of the project. Eventually, we had to go back to gather requirements to find out why the user wasn’t happy with our output. And the times when we jumped directly from gathering requirements to implementation were a huge mess, because every programmer did his or her own thing, and eventually every database, every object, and every interface in the system looked completely different. It is a mess that is probably still being dug out from today.

This book is truly about design and implementation (although there is a section about data quality and testing in Chapter 13). However, after this chapter, I am going to assume requirements are finished, and the design phase has begun in earnest. Many books have been written about the software requirements gathering and documenting process, so I am not going to even attempt to come up with a deep example of requirements. Rather, I’ll just make a quick list of what I look for in requirements. As writer Gelett Burress once said about art, “I don’t know anything about art, but I know what I like,” and the same is really quite true when it comes to requirements. In any case, requirements should be captured, and you can generally tell the good from the bad by a few key criteria:

  • Requirements should generally include very few technical details about how a problem will be solved; they should contain only the definition of the problem and success criteria. For example, a good requirements document might say “the clerks have to do all of their adding in their heads, and this is slow and error prone. For project success, we would prefer the math in a manner that avoids error.” A poor requirements document would exchange the last phrase for “. . . we would prefer the math be done using a Texas Instruments TI3243 calculator.” A calculator might be the solution, and that may be a great one, but the decision should be left to the technologist to avoid overly limiting the final solution.
  • The language used should be as specific as possible. As an example, consider a statement like “we only pay new-hire DBAs $20,000 a year, and the first raise is after six months.” If this was the actual requirement, the company could never hire a qualified DBA—ever. And if you implemented this requirement in the software as is, the first time the company wanted to break the rule (like if Paul Nielsen became available), that user would curse your name, hire Paul as a CEO, in title only, and after six months, change his designation to DBA. (Users will find a way to get their job done!) If the requirement was written specifically enough, it would have said “We usually only. . .”, which is implemented much differently.
  • Requirements should be easily read and validated by customers. Pure and simple, use language the users can understand, not technical jargon that they just gloss over so they don’t realize that you were wrong until their software fails to meet their needs. Simple diagrams and pictures also work nicely as communication devices.

For my mind, it really doesn’t matter how you document requirements, just as long as they get written down. Write them down. Write them down. Hopefully, if you forget the rest of what I said in this chapter, you’ll remember that. If you are married or have parents, you have probably made the mistake of saying, “Yes ______, I promise I will get that done for you” and then promptly forgetting what was said exactly so an argument eventually occurs. “Yes, you did say that you wanted the screen blue!” you say to your customers. At this point, you have just called them liars or stupid, and that is not a great business practice. On the other hand, if you forward the document in which they agreed to color the screen blue, taking responsibility for their mistake is in their court.

Finally, how will we use written requirements in the rest of the software creation process? In the design phase, requirements are your guide to how to mold your software. The technical bits are yours (or corporate standards) to determine: two tables or three, stored procedures or ad hoc access, C# or VB? But the final output should be verifiable by comparing the design to the requirements. And when it is time to do the overall system tests, you will use the requirements as the target for success. Later, you will use requirements to test your design and code to make sure you have met them.

In this chapter, I will cover two particular parts of the requirements gathering process:

  • Documenting requirements: I’ll briefly introduce the types of concerns you’ll have throughout the project process in terms of documenting requirements.
  • Looking for requirements: Here, I’ll talk about the places to find information and some techniques for mining that information.

Requirements are not a trivial part of a project, and most certainly should not be omitted, but like anything, they can be overdone. This chapter will give you a bit of advice on where to look or, if you are in the happy programmer position of not being the one gathering requirements, what to make sure has been looked at. The sad reality of programming is that if the system you create stinks because the requirements that you were given stink, it won’t be the requirements gatherer who has to recode.

Documenting Requirements

If you’ve ever traveled to a place where no one speaks the same language as you, you know the feeling of being isolated based solely on communication. Everything everyone says sounds incomprehensible to you, and no matter how often you ask where the bathroom is, all you get is this blank look back. It has nothing to do with intelligence; it’s because you aren’t speaking the same language. This sounds really obvious to say, but you can’t expect the entire population of another country to learn your language perfectly just so you can get what you need. It works better if you learn their language. Even when two people speak the same basic language, often there can be dialects and phrasing that can be confusing. But that is what we often expect of our users.

Information technology professionals and our clients tend to have these sorts of communication issues, because frequently, we technology types don’t speak the same dialect or even the same language as our clients. Clients tend to think in the language of their industry, and we tend to think in terms of computer solutions. You probably have the same feelings when you are the user as they do. For example, think about SQL Server’s tools. We relational programmers have trouble communicating to the tool designers what we want in SQL Server’s tools. They do an adequate job for most tasks, but clearly, they aren’t completely on the same page as the users.

During the process of analysis, you should adopt one habit early on: document, document, document as much of the information that you acquire as reasonably possible. Sometimes people take vacations, departing with vast amounts of job knowledge that is in their head only, and sometimes they don’t come back from vacation (you know, because they get offered a job as a Jungle Cruise captain at Disneyland!). Many variations on this scenario exist, not all pleasant to contemplate. Without documentation, you will quickly risk losing vital details. It’s imperative that you don’t try to keep everything in your head, because even people with the best memories tend to forget the details of a project (especially if they end up staying in the Magic Kingdom).

The following are a few helpful tips as you begin to take notes on users’ needs:

  • Try to maintain a set of documents that will share system requirement and specification information. Important documents to consider include design-meeting notes, documents describing verbal change requests, and sign-offs on all specifications, such as functional, technical, testing, and so on.
  • Beyond formal documentation, it’s important to keep the members of your design team up to date and fully informed. Develop and maintain a common repository for all the information, and keep it up to date.
  • Note anywhere that you add information that the users haven’t given you or outwardly agreed to.
  • Set the project’s scope early on, and do your best to scope the requirements the same. This will prevent the project from getting too big or diverse to be achievable within a reasonable period of time and within the budget. Hashing out changes that affect the budget, particularly ones that will increase the budget, early in the process will avoid future animosity.
  • Be succinct, but thorough. A document can be 1000 pages or 10 and say the same thing. The big test: Can the users, architects, and programmers use it and produce the end goal?

Once you produce a document, a crucial step follows: make sure the client agrees with your version of their goals. As you go through the entire system design process, the clients will no doubt change their minds on entities, data points, business rules, user interface, colors—just about anything they can—and you have to prepare yourself for this. Whatever the client wants or needs is what you have to endeavor to accomplish, as they are ultimately in control of the project, which unfortunately often means communicating through a third party like a project manager and being flexible enough to run with any proposed changes, whether minor or major. This setup initially sounds great, because you think the project manager will translate for you and be on the side of quality and correctness, and sometimes this is true. But often, the manager will mistranslate a client desire into something quite odd and then insist that it is the client’s desire. “I need all of the data on one screen” gets translated into “I need all of the data in one table.” Best case is that the manager realizes who the technical people are and who have business needs. If you have a typical job, worst case is probably closer to reality.

In addition to talking to the client, it’s important to acquire as many notes, printouts, screenshots, portable drives, etc. loaded with spreadsheets, database backups, Word documents, e-mails, handwritten notes, and so on that exist for any current solution to the problem. This data will be useful in the process of discovering data elements, screens, reports, and other elements that you’ll need to design into your applications. Often, you’ll find information in the client’s artifacts that’s invaluable when putting together the data model.

Image Tip  Throughout the process of design and implementation, you’ll no doubt find changes to the original requirements. Make sure to continue to update your documentation, because the most wonderfully written and formatted documentation in the world is useless if it’s out of date.

Gathering Requirements

Gathering requirements can be a daunting task, so start small and keep adding until you and your customer agree that you understand the problem well enough to move past it. It isn’t necessary to gather every requirement about every area of a large system initially; the system can be broken down into portions, often referred to as subject areas. The size of the subject area is based on the needs of the team and development methodology used. For example, an Agile approach, like Scrum, breaks down everything into small units for designing, coding, testing, and releasing frequently (for example, every two weeks), while something like the waterfall methodology would expect you to design the entire system first and then start coding. No matter what your methodology, the eventual outcome should at least resemble the same thing (with allowances for change in technology and corporate needs). The important thing is that all development methodologies will tell you one thing: design what you are going to code before you code.

For gathering requirements, there are many tools and methodologies for documenting processes, business rules, and database structures. The Unified Modeling Language (UML) is one possible choice; the Microsoft Solutions Framework (which employs UML) and Rational Unified Process are others. There are also several model types in the IDEF family of methods for business process modeling as well; we will cover their data modeling technique in Chapter 3. I’ll employ the Entity-Relationship (E-R) modeling method IDEF1X to model databases. I won’t be covering any of the other modeling languages for the nondatabase structure parts of the project but will rather be using a simple manual spreadsheet method, which is by far the most common method of documenting requirements—even in medium-sized organizations where spending money on documentation tools can be harder than teaching your pet half-bee Eric to make good word choices when playing Words with Friends (“Buzz Again?”).

Regardless of the tools used to document the requirements, the needs for the database design process are the same. Specifications need to be acquired that will lead to you as the database designer to discover all of the following:

  • Entities and relationships
  • Attributes and domains
  • Business rules that can be enforced in the database
  • Processes that require the use of the database

Without these specifications, you’ll either have to constantly go back to the clients and ask a bunch of questions (which they will sometimes answer three different ways for every two times they are asked, teaching you discernment skills) or start making guesses. Although guessing wrong a few times is a good education in how not to do things, it’s certainly no way to work efficiently (unless you happen to be the Amazing Kreskin and guess right 99.9% of the time, though I am pretty sure it was a trick and he had done his requirements gathering as well).

As a major part of the process of implementing a database system, the data architect’s goal will be to produce a graphical model of the database, which we will be covering in the following chapters.

Image Tip  During the early parts of a project, figure out the “what” and “why” first; then you can work on the “how.” Once you know the details of what needs to be built, the process to get it built will be reasonably natural, and you can possibly apply preexisting patterns to the solution.

Vagueness may cause unnecessary discussions, fights, or even lawsuits later in the process. So, make sure your clients understand what you’re going to do for them, and use language that will be clearly understood but that’s specific enough to describe what you learn in the information gathering process.

Throughout the process of discovery, artifacts will be gathered and produced that will be used throughout the process of implementation as reference materials. Artifacts are any kind of documents that will be important to the design, for example, interview notes, e-mails, sample documents, and so on. In this section, I’ll discuss the some of the main types of activities that you will need to be very interested in as a database architect:

  • Interviewing clients
  • Asking the right questions
  • Working with existing systems and prototypes

By no means is this an exhaustive list of where to find and acquire documentation; in fact, it’s far from it. The goal is simply to get your mind clicking and thinking of information to get from the client so your job will be easier.

Interviewing Clients

It might be the case that the person designing the data storage (commonly referred as the data architect) will never meet the user, let alone be involved in formal interviews. The project manager, business analyst, and/or system architect might provide all the required information. Other projects might involve only a data architect or a single person wearing more hats than the entire Fourth Army on maneuvers. I’ve done it both ways: I’ve been in the early design sessions, and I’ve worked from documentation. The better the people you work with, the more favorable the latter option is. In this section, I’ll talk quickly about the basics of client interviews, because on almost any project, you’ll end up doing some amount of interviewing the client.

Client interviews are commonly where the project really gets started. It’s where the free, unstructured flow of information starts. However, it’s also where the communication gap starts. Many clients generally think visually—in terms of forms, web pages, and perhaps simple user interfaces. Users also tend to think solely from their own perspective. For example, they may use the word “error” to denote why a process did not run as they expected. These error conditions may be not only actual errors but choices the user makes. So a value like “scheduled maintenance” might be classified as an error condition. It is very much up to the people with “analyst” embroidered on the back of their hats to analyze what users are actually asking for.

As such, the job is to balance the customers’ perceived wants and needs with their real need: a properly structured database that sits nicely behind a user interface and captures what they are really after, specifically information to make their business lives easier and more lucrative. Changing a form around to include a new text box, label, or whatever is a relatively simple task, giving the user the false impression that creating the entire application is an easy process. If you want proof, make the foolish mistake of demonstrating a polished-looking prototype application with non-hard-coded values that makes the client think it actually works. The clients might be impressed that you’ve put together something so quickly and expect you to be nearly done. Rarely will they understand that what exists under the hood—namely, the database and other layers of business and interface objects—is where all the main work takes place.

Image Tip  While visual elements are great places to find a clue to what data a user will want as you go along in the process, you’ll want to be careful not to center your database design too heavily around a particular interface. The structure of the data needs to be dictated by what the data means, not on how it will be presented. Presentation is more of an interface design task, not a database design one.

Brainstorming sessions with users can also yield great results for gathering a lot of information at once, as long as the group doesn’t grow too large (if your meeting requires an onsite caterer for lunch, you are not going to make any great decisions). The key here is to make sure that someone is facilitating the meeting and preventing the “alpha” person from beating up on the others and giving only his or her own opinion (it is even worse if you are that alpha person!). Treat information from every person interviewed as important, because each person will likely have a different, yet valuable viewpoint. Sometimes (OK, usually) the best information comes not from the executive, but from the person who does the work. Don’t assume that the first person speaks for the rest, even if they’re all working on the same project or if this individual is the manager (or even president or owner of a major corporation, though a great amount of tact is required sometimes to walk that tightrope).

In many cases, when the dominant person cannot be controlled or the mousey person cannot be prodded into getting involved, one-on-one sessions should be employed to allow all clients to speak their minds, without untimely interruptions from stronger-willed (though sometimes not stronger-minded) colleagues. Be mindful of the fact that the loudest and boldest people might not have the best ideas and that the quiet person who sits at the back and says nothing might have the key to the entire project. Make sure to at least consider everybody’s opinions.

This part of the book is written with the most humility, because I’ve made more mistakes in this part of the design process than any other (and like anyone with lots of experience, I have made my fair share of mistakes in all levels of software engineering). The client interview is one of the most difficult parts of the process that I’ve encountered. It might not seem a suitable topic for experienced analysts, but even the best of us need to be reminded that jumping the gun, bullying the clients, telling them what they want before they tell you, and even failing to manage the user’s expectations can lead to the ruin of even a well-developed system. If you have a shaky foundation, the final product will likely be shaky as well.

Asking the Right Questions

Before painting the interior of any house, there are a set of questions that the painting company’s representative will ask every single one of their clients (colors to use? rooms to paint? children’s room? write on/wipe off use?). The same can go for almost any computer software project. In the following sections are some questions that are going to be important to the database design aspects of a system’s development. Clearly, this is not going to be an exhaustive list, but it’s certainly enough to get you started, so at a minimum, you won’t have to sit in a room one day with no idea about what to say.

What Data Is Needed?

If the data architect is part of the project team, some data is clearly needed for the system. Most users, at a high level, know what data they want to see out of the system. For example, if they’re in accounting, they want to see dollars and cents summarized by such-and-such groupings. It will be very important at some time in your process to differentiate between what data is needed and what would just be nice to have. It is obviously a really straightforward question, and the customer may have no idea what they need, but many users already work with data, either on existing systems (that they either hate and will be glad you are replacing for them; or will hate you for changing things) or in a spreadsheet system that they have been using since VisiCalc.

How Will the Data Be Used?

Knowing what your client is planning to use the data in the system for is an important piece of information indeed. Not only will you understand the processes that you will be trying to model, but you can also begin to get a good picture of the type of data that needs to be stored.

For example, imagine you’re asked to create a database of contacts for a dental office. You might want to know the following:

  • Will the contact names be used just to make phone calls, like a quick phone book?
  • Will the client be sending e-mail or posting to the members of the contact lists? Should the names be subdivided into groups for this purpose?
  • Will the client be using the names to solicit a response from the mail, such as appointment reminders?
  • Is it important to have family members documented? Do they want to send cards to the person on important dates?

Usage probably seems like it would be out of bounds early in the design process, and in some ways, you would be right. But in broad strokes, usage information is definitely useful. Information about the types of processes where data might be used is important, but what screen it might show up on is less so. For example, take addresses. If you just capture them for infrequent usage, you might only need to give the user a single string to input an entire address. But if your business is mailing, you may need to format it to your post office’s exact specifications, so you don’t have to pay the same postage rates as the normal human beings.

What Rules Govern the Use of the Data?

Almost every piece of data you are going to want to store will have rules that govern how it is stored, used, and accessed. These rules will provide a lot of guidance to the model that you will produce. As an example, taking our previous example of contacts, you might discover the following:

  • Every contact must have a valid e-mail address.
  • Every contact must have a valid street address.
  • The client checks every e-mail address using a mail routine, and the contact isn’t a valid contact until this routine has been successfully executed.
  • Contacts must be subdivided by the type of issues they have.
  • Only certain users can access the e-mail addresses of the contacts.

It’s important to be careful with the verbiage of the rules gathered early in the process. Many times, the kinds of rules you get seem pretty straightforward when they are written down, but the reality is quite often not so simple. It is really important as you are reviewing rules to confirm them with the analyst and likely directly with the client before assuming them to be true.

As a case in point, what is a “valid” e-mail address? Well, it’s the e-mail address that accurately goes with the contact. Sure, but how on earth do you validate that? The fact is that in many systems you don’t. Usually, this is implemented to mean that the string meets the formatting for an e-mail address, in that it has an ampersand character between other characters and a dot (.) between one or more alphanumeric values (such as %@%.%, plus all characters between A and Z, 0 and 9, an underscore, and so on), but the value is completely up to interpretation. On the other hand, in other types of systems, you actually require the user to pick up some information from the e-mail to validate that it is, indeed, a working e-mail address and that the person who entered the data has rights to it. It is very much up to the needs of the system, but the English question can easily be put using the exact same words.

The real problem comes when you too-strictly interpret rules and your final product ends up unacceptable because you’ve placed an overly restrictive rule on the data that the client doesn’t want or you’ve missed a rule that the client truly needs. I made this mistake in a big way once, which torpedoed a system for several weeks early in its life. Rules that the clients had seemingly wanted to be strictly enforced needed to be occasionally overridden on a case-by-case basis, based on their clients’ desires. Unfortunately, our program didn’t make it possible for the user to override these rules, and they never tried to simulate this condition in their user acceptance testing, so teeth were gnashed and sleep was lost fixing the problem.

Some rules might have another problem: the client wants the rule, but implementing it isn’t possible or practical. For example, the client might request that all registered visitors of a web site have to insert a valid mobile phone number, but is it certain that visitors would provide this data? And what exactly is a valid mobile number? Can you validate that by format alone, or does the number have to be validated by calling it or checking with the phone company? What if users provide a landline instead? Implementability is of limited concern at this point in the process. Someone will have to enforce the rule, and that will be ironed out later in the process.

What Data Is Reported On?

Reports are often one of the most frequently forgotten parts of the design process, yet in reality, they are almost certainly the most important part of the project to the client. Usually, the thing that makes a computer system profitable is the ability to report on all of the activity in the system in great detail. How productive different parts of the organization are, how effective salespersons are (on the second Tuesday after the first Wednesday, because, well, just because every business does stuff that seems wacky to outsiders. . .and some insiders). All of these questions are a large part of the “why” that make computer systems worthwhile for even very small companies.

Many novice developers leave designing and implementing reports until the last minute (a mistake I’ve made more than once over the years, and have suffered through many, many times). For the user, reports are where data becomes information and are used as the basis of vital decision making and can make or break a company. Note that it isn’t a tools problem. Reporting Services, Power BI, etc. all make the tasks easier. The problem lies in knowing what data to capture. If you need to know the temperature of a freezer over the last five days, a part of the initial design is to capture the temperature in a temporal manner, over at least five days. If you didn’t know that requirement, you might just have a single value for the current freezer temperature and not be able to meet the reporting needs. That would be an easy problem to solve, but often there are many little annoying problems like this that come up late in the process that end up implemented in a less than awesome manner. Getting it right the first time is definitely better than not.

Looking back at the contact example, what name does the client want to see on the reports? The following items come to mind:

  • First name, last name
  • First name, middle name, last name
  • Last name, first name
  • Nickname

It’s important to try to nail down such issues early, no matter how small or silly they seem to you at this point. They’re important to the client, who you should always remember is paying the bill. And frankly, the most important rule for reporting is that you cannot report on data that you do not capture.

From a database design standpoint, the content of reports is extremely important, because it will likely help to discover data requirements that aren’t otherwise thought of. Avoid being concerned with the ascetics of the reports yet, because that might lead to the temptation of coding and away from modeling.

Image Tip  Don’t overlook any existing reports that might have a corresponding report in the new system. Rarely should you just duplicate old reports, but the content will likely include data that the client may never even think about when they’re expressing needs. There will often be hundreds of reports currently in production, and in the new system, there is little doubt that the number will go up, unless many of the reports can be consolidated. Specific reporting architecture is covered in more detail in Chapter 14.

Where Is the Data Now?

It is nice once in a while to have the opportunity to create a totally new database with absolutely no preexisting data. This makes life so easy and your job a lot of fun. Unfortunately, as years pass, finding a completely new system to implement gets less likely than the Browns winning the Super Bowl (no offense, Browns fans, but it is what it is). The only likely exception is when building a product to be sold to end users in a turnkey fashion (then the preexisting data is their problem, or yours if you purchase their system). For almost every system I have worked on, I was creating a better version of some other system, so we had to consider converting existing data that’s important to the end users. (Several have been brand new systems. These were wonderful experiences for many reasons; not only didn’t we have to deal with data conversion but we didn’t have to deal with existing processes and code either.)

Every organization is different. Some have data in one centralized location, while others have it scattered in many (many) locations. Rarely, if ever, is the data already in one well-structured database that you can easily access. If that were the case, why would the client come to you at all? Clients typically have data in the following sundry locations:

  • Mainframe or legacy servers: Millions of lines of active COBOL still run many corporations.
  • Spreadsheets: Spreadsheets are wonderful tools to view, slice, and dice data but are wildly inappropriate places to maintain complex data. Most users know how to use a spreadsheet as a database but, unfortunately, are not so experienced in ensuring the integrity of their data, so this data is undoubtedly going to give you a major headache.
  • Desktop databases such as Microsoft Access: Desktop databases are great tools and are easy to deploy and use. However, this ease of use often means that these databases are constructed and maintained by nontechnical personnel and are poorly designed, potentially causing many problems when the databases have to be enlarged or modified.
  • Filing cabinets: Even now, in the twenty-first century, many companies still have few or no computers used for anything other than playing solitaire and instead maintain stockpiles of paper documents. Your project might simply be to replace a filing cabinet with a computer-based system or to supply a simple database that logs the physical locations of the existing paper documents.

Data that you need to include in the SQL Server database you’re designing will come from these and other weird and wonderful sources that you discover from the client (truth is commonly stranger than fiction). Even worse, spreadsheets, filing cabinets, and poorly designed computerized databases don’t enforce data integrity (and often desktop databases, mainframe applications, and even existing SQL Server databases don’t necessarily do such a perfect job either), so always be prepared for dirty data that will have to be cleaned up before storage in your nice new database.

Will the Data Need to Be Integrated with Other Systems?

Once you have a good idea of where the client’s important data is located, you can begin to determine how the data in your new SQL Server solution will interact with the data that will stay in its original format. This might include building intricate gateway connections to mainframes, linking server connections to other SQL Servers, Oracle boxes, or Hadoop systems, or even linking to spreadsheets. You can’t make too many assumptions about this topic at this point in your design. Just knowing the architecture you’ll need to deal with can be helpful later in the process.

Image Tip  Never expect that the data you will be converting or integrating with is going to have any quality. Too many projects get their start with poor guesses about the effort required, and data cleanup has been the least well-guessed part of them all. It will be hard enough to understand what is in a database to start with, but if the data is bad, it will make your job orders of magnitude more difficult

How Much Is This Data Worth?

It’s important to place value judgments on data. In some cases, data will have great value in the monetary sense. For example, in the dental office example that will be presented later in Chapter 4, the value lies in the record of what has been done to the patient and how much has been billed to the patient and his or her insurance company. Without this documentation, digging out this data to eventually get paid for the work done might take hours and days. This data has a specific monetary value, because the quicker the payment is received, the more interest is drawn, meaning more profits. If the client shifts the turnover of payments from one month to one week because of streamlining the process, this might be worth quite a bit more money.

On the other hand, just because existing data is available doesn’t necessarily mean that it should be included in the new database. The client needs to be informed of all the data that’s available and should be provided with a cost estimate of transferring it into the new database. The cost of transferring legacy data can be high, and the client should be offered the opportunity to make decisions that might conserve funds for more important purposes.

Who Will Use the Data?

Who is going to use the data probably doesn’t instantly jump out at you as a type of data that needs to be considered during the early stages of requirements gathering. When designing an interface, usually who is going to actually be pushing the button probably doesn’t make a lot of difference to the button design (unless disabilities are involved in the equation perhaps). Yet, the answer to the question of “who” can start to answer several different types of questions:

  • Security: “Who will use the data?” can be taken two ways. First, these are the only people who care about the data. Second, these are the only people who are privileged to use the data. The latter will require you to create boundaries to utilization. For fun, add in privacy laws like the Health Insurance Portability and Accountability Act (HIPAA) or Sarbanes–Oxley Act (SOX) or any of 100 other well-meaning laws around the world that punish the DBA more than the offenders.
  • Structure: If multiple user groups need the same data, but for particularly different needs, this could lead to different possible design outcomes later in the process.
  • Concurrency: The design considerations for a system that has one simultaneous user are different from those for a system that has ten, or a hundred, thousand, and so on. The number of users should not change our conceptual or logical designs, but it will certainly change how we design the physical layer. Concurrency is something we won’t make a lot of reference to until very late in this book (Chapter 11), but this is the point in time when you are doing the asking and likely specifying the future hardwaresoftware, so it doesn’t hurt to find out now.

This choice of who will use the data goes hand in hand with all of the other questions you have gotten answered during the process of gathering requirements. Of course, these questions are just the start of the information gathering process, but there is still a lot more work to go before you can start building a database, so you are going to have to cool your jets a bit longer.

Are Existing Systems Being Replaced?

If you’re writing a new version of a current database system, access to the existing system is going to be a blessing and a curse. Obviously, the more information you can gather about how any previous system and its data was previously structured, the better. All the screens, data models, object models, user documents, and so on are important to the design process.

However, unless you’re simply making revisions to an existing system, often the old database system is reasonable only as a reference point for completeness, not as an initial blueprint. On most occasions, the existing system you’ll be replacing will have many problems that need to be fixed, not emulated. If the system being replaced had no problems, why is the client replacing it? Possibly just to move to newer technology, but no one replaces a working system just for kicks. The worst part of replacing an existing system? Testing. Old System says OrderCount: 100; New System says it is 102. Which is right? Someone will need to prove the new system is right, no matter how well the new code has been tested.

Another thing that makes existing systems complicated is exiting data. If you have 20 years of previous data, perhaps based on a proprietary hierarchical file–based mainframe application, two problems often arise. First: the new system “needs” to work like the old system. Second: 20 years of data often presents tons of structural issues. So part of the requirements, design, and plan for implementation needs to address how to get the old data in.

Prototypes from the early design process might also exist. Prototypes can be useful tools to communicate how to solve a real-world problem using a computer or when you’re trying to reengineer how a current process is managed. Their role is to be a proof of concept—an opportunity to flesh out with the design team and the end users the critical elements of the project on which success or failure will depend.

Utilizing Other Types of Documentation

Apart from interviews and existing systems, you can look to other sources to find data rules and other pieces of information relevant to the design project. Often, the project manager will obtain these documents; sometimes, they will not be available to you, and you just have to take someone else’s word for what is in them. In these cases, I find it best to put into writing your understanding and make sure it is clear who said what about the meaning of documentation you cannot see. And as always, the following list is certainly not exclusive but should kick-start your thinking about where to get existing documentation for a system you are creating or replacing.

Early Project Documentation

If you work for a company that is creating software for other companies, you’ll find that early in the project there are often documents that get created to solicit costs and possible solutions, for example:

  • Request for quote (RFQ): A document with a fairly mature specification that an organization sends out to determine how much a solution would cost
  • Request for proposal (RFP): For less mature ideas for which an organization wants to see potential solutions and get an idea about its costs

Each of these documents contains valuable information that can help you design a solution, because you can get an idea of what the client wanted before you got involved. Things change, of course, and not always will the final solution resemble the original request, but a copy of an RFP or an RFQ should be added to the pile of information that you’ll have available later in the process. Although these documents generally consist of sketchy information about the problem and the desired solution, you can use them to confirm the original reason for wanting the database system and for getting a firmer handle on what types of data are to be stored within it.

No matter what, if you can get a copy of these documents, you’ll be able to see the client’s thought pattern and why the client wants a system developed.

Contracts or Client Work Orders

Getting copies of the contract can seem like a fairly radical approach to gathering design information, depending on the type of organization you’re with. Frankly, in a corporate structure, you’ll likely have to fight through layers of management to make them understand why you need to see the contract at all. Contracts can be inherently difficult to read because of the language they’re written in (sort of like a terse version of a programming language, with intentional vagueness tossed in to give lawyers something to dispute with one another later). However, be diligent in filtering out the legalese, and you’ll uncover what amounts to a basic set of requirements for the system—often the requirements that you must fulfill exactly or not get paid. Even more fun is the stuff you may learn that has been promised that the implementation team has never heard of.

What makes the contract so attractive is simple. It is, generally speaking, the target you’ll be shooting at. No matter what the client says, or what the existing system was, if the contract specifies that you deliver some sort of watercraft and you deliver a Formula 1 race car because the lower-level clients change their minds without changing the contract, you might not get paid because your project is deemed a failure (figuratively speaking, of course, since maybe they will let you keep the Formula 1 car?).

Level of Service Agreement

One important section of contracts that’s also important to the design process is the required level of service. This might specify the number of web pages per minute, the number of rows in the database, and so on. All this needs to be measured, stored, tested for, and so on. When it comes to the testing and optimization phases, knowing the target level of service can be of great value. You may also find some data that needs to be stored to validate that a service level is being met.

Audit Plans

Don’t forget about audits. When you build a system, you must consider whether the system is likely to be audited in the future and by whom. Government agencies, ISO 9000 clients, and other clients that are monitored by standards organizations are likely to have strict audit requirements. Other clients may also have financial audit processes. Of particular concern are all the various privacy policies, child data restrictions, credit card encryption rules, and so on. All of these will require not only that you follow rules that regulatory bodies set but that you document certain parts of your operation. These audit plans might contain valuable information that can be used in the design process.

Prototypes

A prototype is kind of a live document that gets created so that the user can get a feel for how software might work for them. Prototypes are fantastic communication devices, but they are focused on visuals, not internals. The real problem with prototypes is that if a database was created for the prototype, it is rarely going to be worth anything. So, by the time database design starts, you might be directed to take a prototype database that has been hastily developed and “make it work” or, worse yet, “polish it up.” Indeed, you might inherit an unstructured, unorganized prototype, and your task will be to turn it into a production database in no time flat (loosely translated, that means to have it done early yesterday).

It may be up to you, at times, to remind customers to consider prototypes only as interactive pictures to get the customer to try out a concept, often to get your company to sign a contract. As a data architect, you must work as hard as possible to use prototype code only as a working document that you use to inform your own design. Prototypes help you to be sure you’re not going to miss out on any critical pieces of information that the users need—such as a name field, a search operation, or even a button (which might imply a data element)—but they may not tell you anything about the eventual database design at all.

Following Best Practices

The following list of some best practices can be useful to follow when dealing with and gathering requirements:

  • Be diligent: Look through everything to make sure that what’s being said makes sense. Be certain to understand as many of the business rules that bind the system as possible before moving on to the next step. Mistakes made early in the process can mushroom later in the process.
  • Document: The format of the documentation isn’t really all that important, only that you get documented as much of what the client wants as possible. Make sure that the documentation is understandable by all parties involved and that it will be useful going forward toward implementation.
  • Communicate: Constant communication with clients is essential to keep the design on track. The danger is that if you start to get the wrong idea of what the client needs, every decision past that point might be wrong. Get as much face time with the client as possible.

Summary

In this chapter, I’ve touched on some of the basics of documentation and requirements gathering. This is one of the most important parts of the process of creating software, because it’s the foundation of everything that follows. If the foundation is solid, the rest of the process has a chance. If the foundation is shoddy, the rest of the system that gets built will likely be the same. The purpose of this process is to acquire as much information as possible about what the clients want out of their system. As a data architect, this information might be something that’s delivered to you, or at least most of it. Either way, the goal is to understand the users’ needs.

Once you have as much documentation as possible from the users, the real work begins. Through all this documentation, the goal is to prepare you for the next step of producing a data model that will document in a very formal manner all of the following:

  • Entities and relationships
  • Attributes and domains
  • Business rules that can be enforced in the database
  • Processes that require the use of the database

From this, a conceptual data model will emerge that has many of the characteristics that will exist in the actual implemented database. In the upcoming chapters, the database design will certainly change from this conceptual model, but it will share many of the same characteristics.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset