The construction of property attribute queries is critical to your success in creating geoprocessing scripts that query data from feature classes and tables. All attribute queries that you execute against feature classes and tables will need to have the correct SQL syntax and also follow various rules depending upon the data type that you execute the queries against.
Creating the syntax for attribute queries is one of the most difficult and time-consuming tasks that you'll need to master when creating Python scripts that incorporate the use of the Select by Attributes tool. These queries are basically SQL statements along with a few idiosyncrasies that you'll need to master. If you already have a good understanding of creating queries in ArcMap or perhaps experience in creating SQL statements in other programming languages, then this will be a little easier for you. In addition to creating valid SQL statements, you also need to be aware of some specific Python syntax requirements and some data type differences that will result in a slightly altered formatting of your statements for some data types. In this recipe, you'll learn how to construct valid query syntax and understand the nuances of how different data types alter the syntax as well as some Python-specific constructs.
Initially, we're going to take a look at how queries are constructed in ArcMap, so that you can get a feel of how they are structured.
C:ArcpyBookCh7Crime_Ch7.mxd
.Notice the Select * FROM Burglary WHERE: statement on the query dialog box (shown in the following screenshot). This is a basic SQL statement that will return all the columns from the attribute table for Burglary that meet the condition that we define through the query builder. The asterisk (*) simply indicates that all fields will be returned:
Many people mistakenly assume that you can simply take a query that has been generated in this fashion and paste it into a Python script. That is not the case. There are some important differences that we'll cover next.
arcpy
:import arcpy
qry = "SVCAREA" = 'North'
Runtime error SyntaxError: can't assign to literal (<string>, line 1)
Python interprets SVCAREA and North as strings, but the equal to sign between the two is not part of the string used to set the qry
variable. There are several things we need to do to generate a syntactically correct statement for the Python interpreter.
One important thing has already been taken care of though. Each field name used in a query needs to be surrounded by double quotes. In this case, SVCAREA is the only field used in the query and it has already been enclosed by double quotes. This will always be the case when you're working with shapefiles, file geodatabases, or ArcSDE geodatabases. Here is where it gets a little confusing though. If you're working with data from a personal geodatabase, the field names will need to be enclosed by square brackets instead of double quotes, as shown in the following code example. This can certainly leads to confusion for script developers:
qry = [SVCAREA] = 'North'
Now, we need to deal with the single quotes surrounding 'North'. When querying data from fields that have a text
data type, the string being evaluated must be enclosed by quotes. If you examine the original query, you'll notice that we have in fact already enclosed North
with quotes, so everything should be fine, right? Unfortunately, it's not that simple with Python. Quotes along with a number of other characters must be escaped with a forward slash followed by the character being escaped. In this case, the escape sequence would be '
as shown in the following steps:
qry = "SVCAREA" = 'North'
qry = '"SVCAREA" = 'North''
In addition to the =
sign, which tests for equality, there are a number of additional operators that you can use with strings and numeric data, including not equal (< >
), greater than (>
), greater than or equal to (>=
), less than (<
), and less than or equal to (<=
).
Wildcard characters, including %
and _
, can also be used for shapefiles, file geodatabases, and ArcSDE geodatabases. These include %
that represents any number of characters. The LIKE
operator is often used with wildcard characters to perform partial string matching. For example, the following query would find all records with a service area that begins with N
and has any number of characters after it:
qry = '"SVCAREA" LIKE 'N%''
The (_
) underscore character can be used to represent a single character. For personal geodatabases, the (*
) asterisk is used to represent a wildcard character for any number of characters, while (?
) represents a single character.
You can also query for the absence of data, also known as NULL
values. A NULL
value is often mistaken for a value of zero, but this does not always hold true. The NULL
values indicate the absence of data, which is different from a value of zero. Null operators include IS NULL and IS NOT NULL. The following code example will find all the records where the SVCAREA
field contains no data:
qry = '"SVCAREA" IS NULL'
The final topic that we'll cover in this section are operators used to combine expressions where multiple query conditions need to be met. The AND
operator requires that both query conditions be met for the query result to be true, resulting in selected records. The OR
operator requires that at least one of the conditions be met.
The creation of syntactically correct queries is one of the most challenging aspects of programming ArcGIS with Python. However, once you understand some basic rules, it gets a little easier. In this section, we'll summarize these rules. One of the more important things to keep in mind is that field names must be enclosed with double quotes for all datasets, with the exception of personal geodatabases, which require braces surrounding field names.
There is also an AddFieldDelimiters()
function that you can use to add the correct delimiter to a field based on the datasource supplied as a parameter to the function. The syntax for this function is as follows:
AddFieldDelimiters(dataSource,field)
Additionally, most people, especially those new to programming with Python, struggle with the issue of adding single quotes to string values being evaluated by the query. In Python, quotes have to be escaped with a single forward slash followed by the quote. Using this escape sequence will ensure that Python does in fact see this as a quote rather than the end of the string.
Finally, take some time to familiarize yourself with the wildcard characters. For datasets other than personal geodatabases, you'll use the %
character for multiple characters and an underscore character for a single character. If you're using a personal geodatabase, the *
character is used to match multiple characters and the ?
character is used to match a single character. Obviously, the syntax differences between personal geodatabases and all other types of datasets can lead to some confusion.