Data access design strategies

The last bit of analysis that we need to undertake before we can start writing out the stories for this iteration involves determining where the responsibility for object data access is going to live. In a script or another purely procedural context, it would probably suffice to simply connect to a data source, read the data from it as needed, modify it as needed, and write any changes back out again, but that would only be viable because the entire procedure would be relatively static.

In an application or service such as hms_sys, data use is very much a random-access scenario—there may be common procedures that look a lot like a simple script's step-by-step implementations, but those processes could (and will) be initiated in a fashion that is totally unpredictable.

That, then, means that we need to have data access processes that are easily called and repeatable with minimal effort. Given that we already know that at least two different data storage mechanisms will be in play, it would also make future support and development a lot easier if we could design these processes so that the exact same method calls could be used, no matter what the underlying data store looks like—again, abstracting the processes, and allowing code to use interfaces, not implementations.

One option that would accomplish this sort of abstraction starts at the data source, making each data source aware of the object-types that are in play, and storing the information that it needs to be able to perform CRUD operations for each object-type somewhere. That's technically a workable implementation, but it will get very complicated very quickly, because each combination of data store and business object type needs to be accounted for and maintained. Even if the initial class set is limited to three data store variants (the file system data store of the Artisan Application, a generic RDBMS data store, and a generic NoSQL data store), that's four operations (CRUD) across three data store types for four business objects, for a total of 48 permutations (4 × 3 × 4) that have to be built, tested, and maintained. Each new operation added into the mix, such as, say, the ability to search a business object data store, as well as each new business object type to be persisted and each new data store type, increases that permutation count multiplicatively—adding one of each increases the count to 75 items (5 × 3 × 5) that have to be dealt with—which could easily get out of control.

If we take a step back and think about what we actually need for all of those combinations, a different and more manageable solution is possible. For each and every business object that needs to be persisted, we need to be able to do the following:

  1. Create a record for a new object.
  2. Read a record for a single object, identified somehow, and return an instance for that item.
  3. Update the record for a single object after changes have been made to it.
  4. Delete the record for a single object.
  5. Find and return zero-to-many objects based on matches to some criteria.

It might also be useful to be able to flag objects as being in specific states—active versus inactive, and deleted (without actually deleting the underlying record), perhaps. Tracking created and/or updated dates/times is also a common practice—it's sometimes useful for sorting purposes, if nothing else.

All of the CRUD operations relate directly to the object type itself—that is, we need to be able to create, read, update, delete, and find Artisan objects in order to work with them. The various object properties of those instances can be created as part of the instance's creation process, or retrieved, populated, and updated later, either along with the owning instance or individually, as needed. With those subordinate actions in mind, keeping track of whether an object's record needs to be created or updated will probably be useful as well. Finally, we'll need to keep track of some unique identifier for each object's state data record in the data store. Putting all of those together, the following describes what a BaseDataObject ABC might look like:

The properties are all concrete, with implementations baked in at the BaseDataObject level:

  • oid is the unique identifier of the object, and is a UUID value that will be stored as, and converted from, a string during data access.
  • created and modified are Python datetime objects, and may also need to be converted to and from string-value representations during data access.
  • is_active is a flag that indicates whether or not a given record should be considered active, which allows for some management of active/inactive state for records and thus for objects that those records represent.
  • is_deleted is a similar flag, indicating whether the record/object should be considered as deleted, even if it really still exists in the database.
  • is_dirty and is_new are flags that keep track of whether an object's corresponding record needs to be updated (because it's been changed) or created (because it's new), respectively. They are local properties, and will not be stored in a database.
Using a UUID instead of a numeric sequence requires a bit more work, but has some security advantages, especially in web application and service implementations—UUID values are not easily predictable, and have 16³² possible values, making automated exploits against them much more time-consuming.
There may be requirements (or at least a desire) to not really delete records, ever. It's not unusual in certain industries, or for publicly traded companies who are required to meet certain data-audit criteria, to want to keep all data, at least for some period of time.
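The property list above can be sketched as a Python ABC. This is a minimal illustration of how those concrete properties might be implemented, not the project's actual code—only oid and is_active are shown with full property machinery, for brevity, and the constructor signature is an assumption:

```python
import abc
import uuid
from datetime import datetime

class BaseDataObject(metaclass=abc.ABCMeta):
    """Baseline state-data properties for persistable business objects."""

    def __init__(self, oid=None, created=None, modified=None,
                 is_active=True, is_deleted=False,
                 is_dirty=False, is_new=True):
        # oid is stored as a UUID, converted to/from str during data access
        self._oid = uuid.UUID(oid) if isinstance(oid, str) else (oid or uuid.uuid4())
        self._created = created or datetime.now()
        self._modified = modified or datetime.now()
        self._is_active = is_active
        self._is_deleted = is_deleted
        # Local state-tracking flags; never written to the database
        self._is_dirty = is_dirty
        self._is_new = is_new

    @property
    def oid(self):
        """The unique identifier of the object's state data record."""
        return self._oid

    @property
    def is_active(self):
        """Whether the record should be considered active."""
        return self._is_active

    @is_active.setter
    def is_active(self, value):
        self._is_active = bool(value)
        # Any property change marks the record as needing an update
        self._is_dirty = True
```

A new instance starts with is_new set and is_dirty unset; changing a tracked property flips is_dirty, which is what the save method (described next) keys off of.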

BaseDataObject defines two concrete and three abstract instance methods:

  • create (abstract and protected) will require derived classes to implement a process for creating and writing a state data record to the relevant database.
  • matches (concrete) will return a Boolean value if the property values of the instance that it's called from match the corresponding values of the criteria passed to it. This will be instrumental in implementing criteria-based filtering in the get method, which will be discussed shortly.
  • save (concrete) will check the instance's is_dirty flag and, if it is True, call the instance's update method and exit; otherwise, it will check the is_new flag and, if that is True, call the instance's create method. The net result of this is that any object deriving from BaseDataObject can simply be told to save itself, and the appropriate action will be taken, even if it's no action.

  • to_data_dict (abstract) will return a dict representation of the object's state data, with values in formats and of types that can be written to the database that state data records live in.
  • update (abstract and protected) is the update implementation counterpart to the create method, and is used to update an existing state data record for an object.
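The interaction between the two concrete methods and the abstract create/update pair can be shown with a minimal sketch; the underscore-prefixed method names and the flag resets after a successful call are assumptions made for illustration:

```python
import abc

class BaseDataObject(metaclass=abc.ABCMeta):

    def __init__(self):
        self._is_dirty = False
        self._is_new = True

    @abc.abstractmethod
    def _create(self):
        """Creates and writes a new state data record for the instance."""
        raise NotImplementedError()

    @abc.abstractmethod
    def _update(self):
        """Updates the existing state data record for the instance."""
        raise NotImplementedError()

    def matches(self, **criteria) -> bool:
        # True only if every criteria value equals the named property
        return all(
            getattr(self, name, None) == value
            for name, value in criteria.items()
        )

    def save(self):
        # An existing, changed record is updated...
        if self._is_dirty:
            self._update()
            self._is_dirty = False
        # ...a new one is created. If neither flag is set, save is a no-op.
        elif self._is_new:
            self._create()
            self._is_new = False
```

Calling code never needs to know which operation applies—it just calls save(), and the flags route the call to the right implementation.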

BaseDataObject also defines four class methods, all of which are abstract—each of these methods, then, is bound to the class itself, not to instances of the class, and must be implemented by other classes that derive from BaseDataObject:

  • delete performs a physical record deletion for each record identified by the provided *oids.
  • from_data_dict returns an instance of the class, populated with the state data in the data_dict provided, which will usually result from a query against the database that those records live in. It's the counterpart of the to_data_dict method, which we already described.
  • get is the primary mechanism for returning objects with state data retrieved from the database. It's been defined to allow both specific records (the *oids argument list) and filtering criteria (in the **criteria keyword arguments, which is expected to be the criteria argument passed to matches for each object), and will return an unsorted list of object instances according to those values.
  • sort accepts a list of objects and sorts them using a callback function or method passed in sort_by.
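Declared as abstract class methods, those four might look like the following sketch; the signatures are inferred from the descriptions above, and any concrete subclass would have to supply real implementations:

```python
import abc

class BaseDataObject(metaclass=abc.ABCMeta):

    @classmethod
    @abc.abstractmethod
    def delete(cls, *oids):
        """Physically deletes the records identified by the provided oids."""
        raise NotImplementedError()

    @classmethod
    @abc.abstractmethod
    def from_data_dict(cls, data_dict):
        """Returns an instance populated with the state data in data_dict."""
        raise NotImplementedError()

    @classmethod
    @abc.abstractmethod
    def get(cls, *oids, **criteria):
        """Returns a list of instances, by oid and/or matching criteria."""
        raise NotImplementedError()

    @classmethod
    @abc.abstractmethod
    def sort(cls, objects, sort_by):
        """Sorts objects using the sort_by callback provided."""
        raise NotImplementedError()
```

Because the methods are abstract, any attempt to instantiate a subclass that hasn't implemented all of them raises a TypeError, which is exactly the enforcement we want from the ABC.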

BaseDataObject captures all of the functional requirements and common properties that would need to be present in order to let the business object classes and instances take responsibility for their data storage interactions. Setting aside any database engine concerns for the moment, defining a data persistence-capable business object class such as an Artisan in the Artisan Application becomes very simple—the final, concrete Artisan class just needs to inherit from BaseArtisan and BaseDataObject, and then implement the nine abstract methods that those parent classes require.
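The shape of that concrete class might be sketched as follows; BaseArtisan is reduced to a one-method stand-in here (its real definition, elsewhere in the project, has more members), and only two of the required implementations are shown:

```python
import abc

class BaseArtisan(metaclass=abc.ABCMeta):
    # Stand-in stub for the real BaseArtisan ABC defined earlier
    @abc.abstractmethod
    def add_product(self, product):
        raise NotImplementedError()

class BaseDataObject(metaclass=abc.ABCMeta):
    # Stand-in stub; the real ABC defines the full CRUD interface
    @abc.abstractmethod
    def to_data_dict(self):
        raise NotImplementedError()

class Artisan(BaseArtisan, BaseDataObject):
    """A final, concrete, data-persistence-capable business object."""

    def add_product(self, product):
        # Concrete implementation required by BaseArtisan
        pass

    def to_data_dict(self):
        # Concrete implementation required by BaseDataObject
        return {}
```

With both parents satisfied, Artisan can be instantiated, and every instance carries both its business interface and its persistence interface.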

This approach would suffice if it could be safely assumed that any given application or service instance will always use the same data store backend for each business object type. Any engine-specific needs or capabilities could simply be added to each final concrete class. It would also be possible, though, to collect any properties needed by specific data store engines (MongoDB and MySQL, for example) into an additional layer of abstraction, then have the final concrete objects derive from one of those instead.

In this scenario, the final Artisan class could derive from either MongoDataObject or MySQLDataObject, and those could enforce the provision of any data required to execute the data access methods against those specific backend engines. Those middle-layer ABCs might also provide some helper methods for tasks that are relevant for each engine type—taking the template SQL in the create_sql class attribute and populating it with instance data values from to_data_dict(), for example, to build the final SQL for a MySQL call that creates an instance's record. This approach would keep most of the data access information needed by any given business object class in that class, and associated with the business object itself, which doesn't feel like a bad idea, though it has the potential to get complex if a lot of combinations need to be supported. It would also keep the level of effort involved in adding new functionality to all data objects (at the BaseDataObject level of the class tree) more manageable—the addition of new abstract functionality would still require implementation in all derived concrete classes, but any concrete changes would simply be inherited and immediately available.
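A middle-layer ABC along those lines might be sketched like this. The create_sql and to_data_dict names come from the text; everything else (the helper name, the table layout) is assumed for illustration, and real code would use parameterized queries rather than string formatting, which is vulnerable to SQL injection:

```python
import abc

class BaseDataObject(metaclass=abc.ABCMeta):
    # Stand-in stub; the real ABC defines the full CRUD interface
    @abc.abstractmethod
    def to_data_dict(self):
        raise NotImplementedError()

class MySQLDataObject(BaseDataObject):
    """Middle-layer ABC collecting MySQL-specific needs and helpers."""

    # Template SQL each concrete class would define for its own table
    create_sql = None

    def _build_create_sql(self):
        # Helper: populate the template with the instance's state data.
        # Illustration only -- use parameterized queries in real code.
        return self.create_sql.format(**self.to_data_dict())

class Artisan(MySQLDataObject):
    create_sql = (
        "INSERT INTO artisans (oid, name) VALUES ('{oid}', '{name}')"
    )

    def __init__(self, oid, name):
        self.oid = oid
        self.name = name

    def to_data_dict(self):
        return {'oid': self.oid, 'name': self.name}
```

A MongoDataObject sibling would carry MongoDB-specific members instead, and the final Artisan class would simply pick its parent according to the backend in play.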
