For detecting non-admissible and inconsistent data and for preventing such data from being added to an application’s database, we need to define suitable integrity constraints that can be used by the application’s data validation mechanisms for catching these cases of flawed data. Integrity constraints are logical conditions that must be satisfied by the data entered by a user and stored in the application’s database.
For instance, if an application is managing data about persons including their birth dates and their death dates, then we must make sure that for any person record with a death date, this date is not before that person’s birth date.
Since integrity maintenance is fundamental in database management, the data definition language part of the relational database language SQL supports the definition of integrity constraints in various forms. However, there is hardly any built-in support for integrity constraints and data validation in common programming languages such as PHP, Java, C# or JavaScript. It is therefore important to take a systematic approach to constraint validation in web application engineering, for instance by choosing an application development framework that provides sufficient support for it.
Unfortunately, many web application development frameworks do not provide sufficient support for defining integrity constraints and performing data validation. Integrity constraints should be defined in one (central) place in an app, and then be used for configuring the user interface and for validating data in different parts of the app, such as in the user interface and in the database. In terms of usability, the goals should be:
HTML5 provides support for validating user input in an HTML-forms-based user interface (UI). Here, the goal is to provide immediate feedback to the user whenever invalid data has been entered into a form field. This UI mechanism of responsive validation is an important feature of modern web applications. In traditional web applications, the back-end component validates the data and returns the validation results in the form of a set of error messages to the front-end. Only then, often several seconds later, and in the hard-to-digest form of a bulk message, does the user get the validation feedback.
Integrity constraints (or simply constraints) are logical conditions on the data of an app. They may take many different forms. The most important type of constraints, property constraints, define conditions on the admissible property values of an object. They are defined for an object type (or class) such that they apply to all objects of that type. We concentrate on the most important cases of property constraints:
String Length Constraints | require that the length of a string value for an attribute is less than a certain maximum number, or greater than a minimum number. |
Mandatory Value Constraints | require that a property must have a value. For instance, a person must have a name, so the name attribute must not be empty. |
Range Constraints | require that an attribute must have a value from the value space of the type that has been defined as its range. For instance, an integer attribute must not have the value “aaa”. |
Interval Constraints | require that the value of a numeric attribute must be in a specific interval. |
Pattern Constraints | require that a string attribute’s value must match a certain pattern defined by a regular expression. |
Cardinality Constraints | apply to multi-valued properties, only, and require that the cardinality of a multi-valued property’s value set is not less than a given minimum cardinality or not greater than a given maximum cardinality. |
Uniqueness Constraints | require that a property’s value is unique among all instances of the given object type. |
Referential Integrity Constraints | require that the values of a reference property refer to an existing object in the range of the reference property. |
Frozen Value Constraints | require that the value of a property must not be changed after it has been assigned initially. |
The visual language of UML class diagrams supports defining integrity constraints either in a special way for special cases (like with predefined keywords), or, in the general case, with the help of invariants, which are conditions expressed either in plain English or in the Object Constraint Language (OCL) and shown in a special type of rectangle attached to the model element concerned. We use UML class diagrams for modeling constraints in design models that are independent of a specific programming language or technology platform.
UML class diagrams provide special support for expressing multiplicity (or cardinality) constraints. This type of constraint allows specifying a lower multiplicity (minimum cardinality) or an upper multiplicity (maximum cardinality), or both, for a property or an association end. In UML, this takes the form of a multiplicity expression l..u where the lower multiplicity l is a non-negative integer and the upper multiplicity u is either a positive integer not smaller than l or the special value * standing for unbounded. For showing property multiplicity (or cardinality) constraints in a class diagram, multiplicity expressions are enclosed in brackets and appended to the property name, as shown in the Person class rectangle below.
In the following sections, we discuss the different types of property constraints listed above in more detail. We also show how to express some of them in computational languages such as UML class diagrams, SQL table creation statements, JavaScript model class definitions, or the annotation-based languages Java Bean Validation annotations and ASP.NET Data Annotations.
Any systematic approach to constraint validation also requires defining a set of error (or ’exception’) classes, including one for each of the standard property constraints listed above.
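A minimal sketch of such a set of error classes in JavaScript is given below. All class names are assumptions chosen to mirror the constraint types listed above; they are not part of any standard library, and the NoConstraintViolation class is included only as a convenient "everything is fine" result.

```javascript
// Hypothetical class names, one per standard property constraint type.
class ConstraintViolation extends Error {
  constructor(message, value) {
    super(message);
    this.name = this.constructor.name; // e.g. "RangeConstraintViolation"
    this.value = value;                // the offending value, if any
  }
}

// Result object for the case that a value is admissible.
class NoConstraintViolation {
  constructor() {
    this.message = "";
  }
}

class MandatoryValueConstraintViolation extends ConstraintViolation {}
class RangeConstraintViolation extends ConstraintViolation {}
class StringLengthConstraintViolation extends ConstraintViolation {}
class IntervalConstraintViolation extends ConstraintViolation {}
class PatternConstraintViolation extends ConstraintViolation {}
class CardinalityConstraintViolation extends ConstraintViolation {}
class UniquenessConstraintViolation extends ConstraintViolation {}
class ReferentialIntegrityConstraintViolation extends ConstraintViolation {}
class FrozenValueConstraintViolation extends ConstraintViolation {}
```

With such a hierarchy, calling code can test for the general case with `instanceof ConstraintViolation` and still distinguish the specific kind of violation by its class.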
The length of a string value for a property such as the title of a book may have to be constrained, typically by a maximum length, but possibly also by a minimum length. In an SQL table definition, a maximum string length can be specified in parentheses appended to the SQL datatype CHAR or VARCHAR, as in VARCHAR(50).
UML does not define any special way of expressing string length constraints in class diagrams. Of course, we always have the option to use an invariant for expressing any kind of constraint, but it seems preferable to use a simpler form of expressing these property constraints. One option is to append a maximum length, or both a minimum and a maximum length, in parentheses to the datatype name, like so:
Another option is to use min/max constraint keywords in the property modifier list:
A mandatory value constraint requires that a property must have a value. This can be expressed in a UML class diagram with the help of a multiplicity constraint expression where the lower multiplicity is 1. For a single-valued property, this would result in the multiplicity expression 1..1, or the simplified expression 1, appended to the property name in brackets. For example, the following class diagram defines a mandatory value constraint for the property name:
Whenever a class rectangle does not show a multiplicity expression for a property, the property is mandatory (and single-valued), that is, the multiplicity expression 1 is the default for properties.
In an SQL table creation statement, a mandatory value constraint is expressed in a table column definition by appending the key phrase NOT NULL to the column definition as in the following example:
According to this table definition, any row of the persons table must have a value in the column name, but not necessarily in the column age.
In JavaScript, we can code a mandatory value constraint by a class-level check function that tests if the provided argument evaluates to a value, as illustrated in the following example:
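A minimal sketch of such a check is shown below. The Person class, the name checkName and the convention of returning an empty string for "no violation" are assumptions made for illustration only.

```javascript
class Person {
  constructor({ name, age }) {
    this.name = name;
    this.age = age; // optional in this sketch
  }

  // Class-level (static) check function for the mandatory property "name".
  // Returns "" if a value is provided, otherwise a violation message.
  static checkName(name) {
    if (name === undefined || name === null || name === "") {
      return "A name must be provided!";
    }
    return "";
  }
}
```

Because the check is a static function, it can be invoked before any Person object exists, for instance on raw form input.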
With Java Bean Validation, a mandatory property like name is annotated with NotNull in the following way:
The equivalent ASP.NET Data Annotation is Required as shown in
A range constraint requires that a property must have a value from the value space of the type that has been defined as its range. This is implicitly expressed by defining a type for a property as its range. For instance, the attribute age defined for the object type Person in the class diagram above has the range Integer, so it must not have a value like “aaa”, which does not denote an integer. However, it may have values like -13 or 321, which also do not make sense as the age of a person. In a similar way, since its range is String, the attribute name may have the value “” (the empty string), which is a valid string that does not make sense as a name.
We can avoid allowing negative integers like -13 as age values, and the empty string as a name, by assigning more specific datatypes as range to these attributes, such as NonNegativeInteger to age, and NonEmptyString to name. Notice that such more specific datatypes are neither predefined in SQL nor in common programming languages, so we have to implement them either in the form of user-defined types, as supported in SQL-99 database management systems such as PostgreSQL, or by using suitable additional constraints such as interval constraints, which are discussed in the next section. In a UML class diagram, we can simply define NonNegativeInteger and NonEmptyString as custom datatypes and then use them in the definition of a property, as illustrated in the following diagram:
In JavaScript, we can code a range constraint by a check function, as illustrated in the following example:
This check function detects and reports a constraint violation if the given value for the name property is not of type “string” or is an empty string.
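A check function of the kind described here might look as follows. This is a sketch under assumptions: the function name and the message text are hypothetical, and treating whitespace-only strings as empty is an extra choice not stated in the text.

```javascript
// Range check for the "name" property with range NonEmptyString.
// Returns "" if the value is admissible, otherwise a violation message.
function checkName(name) {
  // Assumption: strings consisting only of whitespace count as empty.
  if (typeof name !== "string" || name.trim() === "") {
    return "The name must be a non-empty string!";
  }
  return "";
}
```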
In a Java EE web app, for declaring empty strings as non-admissible user input we must set the corresponding context parameter to true in the web deployment descriptor file web.xml.
In ASP.NET, empty strings are non-admissible by default.
An interval constraint requires that an attribute’s value must be in a specific interval, which is specified by a minimum value or a maximum value, or both. Such a constraint can be defined for any attribute having an ordered type, but normally we define them only for numeric datatypes or calendar datatypes. For instance, we may want to define an interval constraint requiring that the age attribute value must be in the interval [25,70]. In a class diagram, we can define such a constraint by using the property modifiers min and max, as shown for the age attribute of the Driver class in the following diagram.
In an SQL table creation statement, an interval constraint is expressed in a table column definition by appending a suitable CHECK clause to the column definition as in the following example:
In JavaScript, we can code an interval constraint in the following way:
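One possible way, sketched for the age interval [25,70] mentioned above, is the following. The function name checkAge and the message texts are assumptions for illustration.

```javascript
// Interval check for the "age" attribute of a Driver, interval [25,70].
// Returns "" if the value is admissible, otherwise a violation message.
function checkAge(age) {
  const MIN_AGE = 25;
  const MAX_AGE = 70;
  // Range constraint first: the value must be an integer at all.
  if (!Number.isInteger(age)) {
    return "The age must be an integer!";
  }
  // Interval constraint: both bounds are inclusive.
  if (age < MIN_AGE || age > MAX_AGE) {
    return `The age must be between ${MIN_AGE} and ${MAX_AGE}!`;
  }
  return "";
}
```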
In Java Bean Validation, we express this interval constraint by adding the annotations Min(25) and Max(70) to the property age in the following way:
The equivalent ASP.NET Data Annotation is Range(25,70) as shown in
A pattern constraint requires that a string attribute’s value must match a certain pattern, typically defined by a regular expression. For instance, for the object type Book we define an isbn attribute with the datatype String as its range and add a pattern constraint, requiring that the isbn attribute value must be a 10-digit string or a 9-digit string followed by “X”, to the Book class rectangle shown in the following diagram.
In an SQL table creation statement, a pattern constraint is expressed in a table column definition by appending a suitable CHECK clause to the column definition as in the following example:
The ~ (tilde) symbol denotes the regular expression matching predicate, and the regular expression ^\d{9}(\d|X)$ follows the syntax of the POSIX standard (see, e.g., the PostgreSQL documentation).
In JavaScript, we can code a pattern constraint by using the built-in regular expression function test, as illustrated in the following example:
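The following sketch applies RegExp.prototype.test to the ISBN-10 pattern described above. The function name checkIsbn, the constant name and the message text are assumptions.

```javascript
// ISBN-10 pattern: 10 digits, or 9 digits followed by "X".
const ISBN10_PATTERN = /^\d{9}(\d|X)$/;

// Returns "" if the value matches the pattern, otherwise a violation message.
function checkIsbn(isbn) {
  if (typeof isbn !== "string" || !ISBN10_PATTERN.test(isbn)) {
    return "The ISBN must be a 10-digit string or a 9-digit string followed by 'X'!";
  }
  return "";
}
```

Note that the pattern is anchored with ^ and $, so that a longer string merely containing a valid ISBN is rejected.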
In Java EE Bean Validation, this pattern constraint for isbn is expressed with the annotation Pattern in the following way:
The equivalent ASP.NET Data Annotation is RegularExpression as shown in
A cardinality constraint requires that the cardinality of a multi-valued property’s value set is not less than a given minimum cardinality or not greater than a given maximum cardinality. In UML, cardinality constraints are called multiplicity constraints, and minimum and maximum cardinalities are expressed with the lower bound and the upper bound of the multiplicity expression, as shown in the following diagram, which contains two examples of properties with cardinality constraints.
The attribute definition nickNames[0..3] in the class Person specifies a minimum cardinality of 0 and a maximum cardinality of 3, with the meaning that a person may have no nickname or at most 3 nicknames. The reference property definition members[3..5] in the class Team specifies a minimum cardinality of 3 and a maximum cardinality of 5, with the meaning that a team must have at least 3 and at most 5 members.
It’s not obvious how cardinality constraints could be checked in an SQL database, as there is no explicit concept of cardinality constraints in SQL, and the generic form of constraint expressions in SQL, assertions, is not supported by available DBMSs. However, it seems that the best way to implement a minimum (or maximum) cardinality constraint is an on-delete (or on-insert) trigger that tests the number of rows with the same reference as the deleted (or inserted) row.
In JavaScript, we can code a cardinality constraint validation for a multi-valued property by testing the size of the property’s value set, as illustrated in the following example:
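A minimal sketch for the two examples nickNames[0..3] and members[3..5] is given below. The generic helper checkCardinality and its signature are hypothetical, and the sketch assumes that the value set is passed as an array whose distinct entries are counted.

```javascript
// Generic cardinality check for a multi-valued property (hypothetical helper).
// Returns "" if min <= number of distinct values <= max, else a message.
function checkCardinality(values, min, max = Infinity) {
  if (!Array.isArray(values)) {
    return "The value must be a collection of values!";
  }
  // A Set drops duplicates, so only distinct values are counted.
  const size = new Set(values).size;
  if (size < min) {
    return `At least ${min} value(s) required, but got ${size}!`;
  }
  if (size > max) {
    return `At most ${max} value(s) allowed, but got ${size}!`;
  }
  return "";
}

// Property-specific checks for the two examples from the class diagram.
const checkNickNames = (nickNames) => checkCardinality(nickNames, 0, 3);
const checkMembers = (members) => checkCardinality(members, 3, 5);
```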
With Java Bean Validation annotations, we can specify
A uniqueness constraint (or key constraint) requires that a property’s value (or the value list of a list of properties in the case of a composite key constraint) is unique among all instances of the given object type. For instance, in a UML class diagram with the object type Book we can define the isbn attribute to be unique, or, in other words, a key, by appending the (user-defined) property modifier keyword key in curly braces to the attribute’s definition in the Book class rectangle shown in the following diagram.
In an SQL table creation statement, a uniqueness constraint is expressed by appending the keyword UNIQUE to the column definition as in the following example:
In JavaScript, we can code this uniqueness constraint by a check function that tests if there is already a book with the given isbn value in the books table of the app’s database.
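Such a check could be sketched as follows. Here, a Map keyed by ISBN stands in for the books table; this in-memory stand-in, the function name checkIsbnAsId and the sample record are assumptions made only for illustration.

```javascript
// Stand-in for the app's books table, keyed by ISBN (assumption for this sketch).
const books = new Map([
  ["006251587X", { isbn: "006251587X", title: "Weaving the Web" }]
]);

// Uniqueness check: the given ISBN must not be used by an existing record.
// Returns "" if the value is still unused, otherwise a violation message.
function checkIsbnAsId(isbn) {
  if (books.has(isbn)) {
    return `There is already a book record with ISBN ${isbn}!`;
  }
  return "";
}
```

With a real database, the lookup would typically be asynchronous, so the check function would have to return a promise instead of a plain string.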
A unique attribute (or a composite key) can be declared to be the standard identifier for objects of a given type, if it is mandatory (or if all attributes of the composite key are mandatory). We can indicate this in a UML class diagram with the help of the property modifier id appended to the declaration of the attribute isbn as shown in the following diagram.
Notice that such a standard identifier declaration implies both a mandatory value and a uniqueness constraint on the attribute concerned.
Standard identifiers are called primary keys in relational databases. We can declare an attribute to be the primary key in an SQL table creation statement by appending the phrase PRIMARY KEY to the column definition as in the following example:
In JavaScript, we cannot easily code a standard identifier declaration, because this would have to be part of the metadata of the class definition, and there is no standard support for such metadata in JavaScript. However, we should at least check if the given argument violates the implied mandatory value or uniqueness constraints by invoking the corresponding check functions discussed above.
A referential integrity constraint requires that the values of a reference property refer to an object that exists in the population of the property’s range class. Since we do not deal with reference properties in this chapter, we postpone the discussion of referential integrity constraints to Volume 2.
A frozen value constraint defined for a property requires that the value of this property must not be changed after it has been assigned. This includes the special case of read-only value constraints on mandatory properties that are initialized at object creation time.
Typical examples of properties with a frozen value constraint are standard identifier attributes and event properties. In the case of events, the semantic principle that the past cannot be changed prohibits changing the property values of events. In the case of a standard identifier attribute we may want to prevent users from changing the ID of an object since this requires that all references to this object using the old ID value are changed as well, which may be difficult to achieve (even though SQL provides special support for such ID changes by means of its ON UPDATE CASCADE clause for the change management of foreign keys).
The following diagram shows how to define a frozen value constraint for the isbn attribute:
In Java, a read-only value constraint can be enforced by declaring the property to be final. In JavaScript, a read-only property slot can be implemented as in the following example:
where the property slot obj.teamSize is made unwritable. An entire object obj can be frozen with Object.freeze(obj).
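The read-only slot and the frozen object described above can be sketched as follows. The initial value 5 and the helper function tryToChangeTeamSize are assumptions added to make the effect observable.

```javascript
const obj = { teamSize: 5 };
// Make the existing property slot read-only; its value 5 is retained.
Object.defineProperty(obj, "teamSize", { writable: false });

// Freezing makes all property slots of an object read-only at once.
const frozenObj = Object.freeze({ teamSize: 5 });

// Hypothetical helper: tries to assign and reports whether it succeeded.
function tryToChangeTeamSize(o, newSize) {
  "use strict"; // in strict mode, writing to a read-only slot throws a TypeError
  try {
    o.teamSize = newSize;
    return true;
  } catch (e) {
    return false;
  }
}
```

In non-strict code, the same assignment does not throw an error but is silently ignored, which is why the helper opts into strict mode.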
We can implement a frozen value constraint for a property in the property’s setter method like so:
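A minimal sketch of such a setter, assuming a Book class with a private field for the ISBN, is shown below. The class layout and the message text are hypothetical.

```javascript
class Book {
  #isbn; // private field holding the ISBN; undefined until first assignment

  constructor(isbn) {
    this.isbn = isbn; // the initial assignment goes through the setter
  }

  get isbn() {
    return this.#isbn;
  }

  set isbn(value) {
    // Frozen value constraint: reject any change after the first assignment.
    if (this.#isbn !== undefined) {
      throw new Error("The ISBN is frozen and must not be changed!");
    }
    this.#isbn = value;
  }
}
```

In contrast to a read-only slot, this approach allows the initial assignment to happen at any time, not only at object creation time.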
So far, we have only discussed how to define and check property constraints. However, in certain cases there may be also integrity constraints that do not just depend on the value of a particular property, but rather on
OCL
The Object Constraint Language (OCL) was defined in 1997 as a formal logic language for expressing integrity constraints in UML version 1.1. Later, it was extended to also allow defining (1) derivation expressions for derived properties, and (2) preconditions and postconditions for operations, in a class model.
In a class model, property constraints can be expressed within the property declaration line in a class rectangle (typically with keywords, such as id, max, etc.). For expressing more complex constraints, such as object-level or type-level constraints, we can attach an invariant declaration box to the class rectangle(s) concerned and express the constraint either in (unambiguous) English or in the Object Constraint Language (OCL). A simple example of an object-level constraint expressed as an OCL invariant is shown in Figure 7.1.
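A general approach for implementing object-level constraint validation consists of taking the following steps:
1.Choose a fixed name for an object-level constraint validation function, such as validate.
2.For any class that needs object-level constraint validation, define a validate function returning either a ConstraintViolation or a NoConstraintViolation object.
3.Call this function, if it exists,
a.in the UI/view, on form submission;
b.in the model class, before save, both in the create and in the update method.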
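The steps above can be sketched for the birth date and death date example from the beginning of this chapter. Apart from validate, ConstraintViolation and NoConstraintViolation, all names are assumptions, including the Map that serves as a stand-in datastore.

```javascript
class ConstraintViolation {
  constructor(message) { this.message = message; }
}
class NoConstraintViolation {
  constructor() { this.message = ""; }
}

class Person {
  constructor({ name, dateOfBirth, dateOfDeath }) {
    this.name = name;
    this.dateOfBirth = dateOfBirth;
    this.dateOfDeath = dateOfDeath; // optional
  }

  // Object-level check: it relates two properties of the same object.
  static validate(person) {
    if (person.dateOfDeath && person.dateOfDeath < person.dateOfBirth) {
      return new ConstraintViolation(
        "The death date must not be before the birth date!");
    }
    return new NoConstraintViolation();
  }

  // Step 3b: validate before save; "store" is a hypothetical Map-based datastore.
  static create(slots, store) {
    const person = new Person(slots);
    const result = Person.validate(person);
    if (result instanceof NoConstraintViolation) {
      store.set(person.name, person);
    }
    return result;
  }
}
```

An update method would call Person.validate in the same way on the object with the changed property values before writing it to the datastore.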
Constraints affecting two or more model classes could be defined in the form of static methods (in a model layer method library) that are invoked from the object-level validation methods of the affected model classes.
This problem is well-known from classical web applications where the front-end component submits the user input data via HTML form submission to a back-end component running on a remote web server. Only this back-end component validates the data and returns the validation results in the form of a set of error messages to the front-end. Only then, often several seconds later, and in the hard-to-digest form of a bulk message, does the user get the validation feedback. This approach is no longer considered acceptable today. Rather, in a responsive validation approach, the user should get immediate validation feedback on each single data input. Technically, this can be achieved with the help of event handlers for the user interface events input or change.
Responsive validation requires a data validation mechanism in the user interface (UI), such as the HTML5 form validation API. Alternatively, the jQuery Validation Plugin can be used as a (non-HTML5-based) form validation API.
The HTML5 form validation API essentially provides new types of input fields (such as number or date) and a set of new attributes for form control elements for the purpose of supporting responsive validation performed by the browser. Since using the new validation attributes (like required, min, max and pattern) implies defining constraints in the UI, they are not really useful in a general approach where constraints are only checked, but not defined, in the UI.
Consequently, we only use two methods of the HTML5 form validation API for validating constraints in the HTML-forms-based user interface of our app. The first of them, setCustomValidity, allows marking a form field as either valid or invalid by assigning either an empty string or a non-empty (constraint violation) message string.
The second method, checkValidity, is invoked on a form before user input data is committed or saved (for instance with a form submission). It tests whether all fields have a valid value. To have the browser automatically display any constraint violation messages, we need to have a submit event, even if we don’t really submit the form, but just use a save button.
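The use of these two methods can be sketched as follows. The helper names attachResponsiveValidation and handleSave are hypothetical; the sketch assumes that field is an HTML input element, that form is an HTML form element, and that check is a model-level check function returning an empty string or a violation message.

```javascript
// Responsive validation: re-validate a field on each "input" event.
function attachResponsiveValidation(field, check) {
  field.addEventListener("input", () => {
    // "" marks the field as valid, any other string as invalid.
    field.setCustomValidity(check(field.value));
  });
}

// Validation on save: only commit the data if all fields are valid.
function handleSave(form, save) {
  if (form.checkValidity()) {
    save();
    return true;
  }
  return false;
}
```

In a browser, these helpers would be wired up with something like attachResponsiveValidation(formEl.elements["isbn"], checkIsbn), where formEl and checkIsbn again are placeholder names for the form element and a model-level check function.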
See this Mozilla tutorial or this HTML5Rocks tutorial for more about the HTML5 form validation API.
Integrity constraints should be defined in the model classes of an MVC app since they are part of the business semantics of a model class (representing a business object type). However, a more difficult question is where to perform data validation? In the database? In the model classes? In the controller? Or in the user interface (“view”)? Or in all of them?
A relational database management system (DBMS) performs data validation whenever there is an attempt to change data in the database, provided that all relevant integrity constraints have been defined in the database. This is essential since we want to avoid, under all circumstances, that invalid data enters the database. However, it requires that we somehow duplicate the code of each integrity constraint, because we want to have it also in the model class to which the constraint belongs.
Also, if the DBMS were the only application component that validates the data, this would create a latency, and hence usability, problem in distributed applications because the user would not get immediate feedback on invalid input data. Consequently, data validation needs to start in the user interface (UI).
However, it is not sufficient to perform data validation in the UI. We also need to do it in the model classes, and in the database, for making sure that no flawed data enters the application’s persistent data store. This creates the problem of how to maintain the constraint definitions in one place (the model), but use them in two or three other places (at least in the model classes and in the UI code, and possibly also in the database). We call this the multiple validation problem. This problem can be solved in different ways. For instance:
The simplest, and most responsive, solution is the third one, using only JavaScript both for the back-end and front-end components of a web app.
We again consider the book data management problem that was considered in Part 1. But now we also consider the data integrity rules (or ’business rules’) that govern the management of book data. These integrity rules, or constraints, can be expressed in a UML class diagram as shown in Figure 7.2 below.
In this model, the following constraints have been expressed:
1.Since the isbn attribute is declared to be the standard identifier of Book, it is mandatory and unique.
2.The isbn attribute has a pattern constraint requiring its values to match the ISBN-10 format that admits only 10-digit strings or 9-digit strings followed by “X”.
3.The title attribute is mandatory, as indicated by its multiplicity expression [1], and has a string length constraint requiring its values to have at most 50 characters.
4.The year attribute is mandatory and has an interval constraint, however, of a special form since the maximum is not fixed, but provided by the calendar function nextYear(), which we implement as a utility function.
Notice that the edition attribute is not mandatory, but optional, as indicated by its multiplicity expression [0..1]. In addition to the constraints described in this list, there are the implicit range constraints defined by assigning the datatype NonEmptyString as range to isbn and title, Integer to year, and PositiveInteger to edition. In our plain JavaScript approach, all these property constraints are coded in the model class within property-specific check functions.
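As an illustration, two such property-specific check functions, for title and year, might look as follows. This is a sketch under assumptions: the function names and message texts are hypothetical, and the lower bound of the year interval is passed as a parameter because it is not stated in the text above.

```javascript
// Utility function providing the (moving) upper bound for the year attribute.
function nextYear() {
  return new Date().getFullYear() + 1;
}

// title: mandatory, range NonEmptyString, at most 50 characters.
function checkTitle(title) {
  if (typeof title !== "string" || title === "") {
    return "A title must be provided as a non-empty string!";
  }
  if (title.length > 50) {
    return "The title must not have more than 50 characters!";
  }
  return "";
}

// year: mandatory, range Integer, interval [minYear, nextYear()].
// minYear is an assumed parameter; the model's actual minimum is not shown here.
function checkYear(year, minYear) {
  if (!Number.isInteger(year)) {
    return "The year must be an integer!";
  }
  if (year < minYear || year > nextYear()) {
    return `The year must be between ${minYear} and ${nextYear()}!`;
  }
  return "";
}
```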
The meaning of the design model can be illustrated by a sample data population respecting all constraints:
Table 7.1 Sample data for Book
1.Constraints are logical conditions on the data of an app. The simplest, and most important, types of constraints are property constraints and object-level constraints.
2.Constraints should be defined in the model classes of an MVC app, since they are part of their business semantics.
3.Constraints should be checked in various places of an MVC app: in the UI/view code, in model classes, and possibly in the database.
4.Software applications that include CRUD data management need to perform two kinds of bi-directional object-to-string type conversions:
a.Between the model and the UI: converting model object property values to UI widget values, and, the other way around, converting input widget values to property values. Typically, widgets are form fields that have string values.
b.Between the model and the datastore: converting model objects to storage data sets (called serialization), and, the other way around, converting storage data sets to model objects (called de-serialization). This involves converting property values to storage data values, and, the other way around, converting storage data values to property values. Typically, datastores are either JavaScript’s local storage or IndexedDB, or SQL databases, and objects have to be mapped to some form of table rows. In the case of an SQL database, this is called “Object-Relational Mapping” (ORM).
5.Do not perform any string-to-property-value conversion in the UI code. Rather, this is the business of the model code.
6.For being able to observe how an app works, or, if it does not work, where it fails, it is essential to log all critical application events, such as data retrieval, save and delete events, at least in the JavaScript console.
7.Responsive validation means that the user, while typing, gets immediate validation feedback on each input (keystroke), and when requesting to save the new data.
The support of MVC frameworks for constraint validation can be evaluated according to the following criteria. Does the framework support
If you would like to look up the answers for the following quiz questions, you can check our discussion forum. If you don’t find an answer in the forum, you may create a post asking for an answer to a particular question.
Where in the application code should the constraints be checked? In the …
☐controller code
☐model classes
☐user interface code
☐underlying DBMS (if it supports constraint validation)
Where in the application code should the constraints be defined? In the …
☐controller code
☐model classes
☐user interface HTML5 code
☐user interface JavaScript code
How many constraints are specified by the class model shown in the diagram? Enter a number: _____
Which of the following constraints are specified by the class model?
☐A range constraint for the property name.
☐A uniqueness constraint for the property name.
☐A referential integrity constraint for the property name.
☐A uniqueness constraint for the property age.
☐A mandatory value constraint for the property name.
☐A mandatory value constraint for the property age.
Which of the following objects do not represent admissible instances (or, in other words, violate a constraint) of the object type Person?