Accumulo offers the ability to apply cell visibility labels for each unique key/value in a table, which is arguably its most distinguishing feature from other BigTable implementations. This recipe will demonstrate one way to apply cell-level security. The code in this recipe will write several mutations that can only be scanned and read with the proper authorizations.
This recipe is easiest to test on a pseudo-distributed Hadoop cluster with Accumulo 1.4.1 and ZooKeeper 3.3.3 installed. The shell script in this recipe assumes that ZooKeeper is running on the host localhost and on port 2181; you can change this to suit your environment. The Accumulo installation's bin folder needs to be on your environment path.
For this recipe, you'll need to create an Accumulo instance named test with the user root and the password password.
You will need a table named acled in the configured Accumulo instance. If you have an existing table by that name from an earlier recipe, delete and recreate it.
It is also highly recommended that you go through the recipe "Using MapReduce to bulk import geographic event data into Accumulo" earlier in this chapter. This will give you some sample data to experiment with.
The following are the steps to read/write data to Accumulo using cell visibility controls:
Set up a Java project that builds a JAR file named accumulo-examples.jar.
Create the package examples.accumulo and add the class SecurityScanMain.java with the following content:

    package examples.accumulo;

    import org.apache.accumulo.core.client.*;
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Mutation;
    import org.apache.accumulo.core.data.Value;
    import org.apache.accumulo.core.security.Authorizations;
    import org.apache.accumulo.core.security.ColumnVisibility;
    import org.apache.hadoop.io.Text;

    import java.util.Map;

    public class SecurityScanMain {

        public static final long MAX_MEMORY = 10000L;
        public static final long MAX_LATENCY = 1000L;
        public static final int MAX_WRITE_THREADS = 4;
        public static final String TEST_TABLE = "acled";
        public static final Text COLUMN_FAMILY = new Text("cf");
        public static final Text THREAT_QUAL = new Text("trt_lvl");

        public static void main(String[] args) throws Exception {
            if (args.length < 4) {
                System.err.println("usage: <instance name> <user> <password> <zookeepers>");
                // Exit with a non-zero status on a usage error.
                System.exit(1);
            }
            String instanceName = args[0];
            String user = args[1];
            String pass = args[2];
            String zooQuorum = args[3];
Create a Connector instance for the test Accumulo instance using the user and pass variables:

            ZooKeeperInstance ins = new ZooKeeperInstance(instanceName, zooQuorum);
            Connector connector = ins.getConnector(user, pass);
            // Create the table if it does not exist yet.
            if (!connector.tableOperations().exists(TEST_TABLE)) {
                connector.tableOperations().create(TEST_TABLE);
            }
            // Look up the scan authorizations currently granted to the user.
            Authorizations allowedAuths = connector.securityOperations().getUserAuthorizations(user);
            BatchWriter writer = connector.createBatchWriter(TEST_TABLE, MAX_MEMORY, MAX_LATENCY, MAX_WRITE_THREADS);

            // eventA is visible to scans that present p1, p2, or p3.
            Mutation m1 = new Mutation(new Text("eventA"));
            m1.put(COLUMN_FAMILY, THREAT_QUAL, new ColumnVisibility("(p1|p2|p3)"), new Value("moderate".getBytes()));
            // eventB is visible to scans that present p4 or p5.
            Mutation m2 = new Mutation(new Text("eventB"));
            m2.put(COLUMN_FAMILY, THREAT_QUAL, new ColumnVisibility("(p4|p5)"), new Value("severe".getBytes()));
            writer.addMutation(m1);
            writer.addMutation(m2);
            writer.close();
Scan the table with the user's authorizations and print any threat levels that are visible:

            Scanner scanner = connector.createScanner(TEST_TABLE, allowedAuths);
            scanner.fetchColumn(COLUMN_FAMILY, THREAT_QUAL);
            boolean found = false;
            for (Map.Entry<Key, Value> item : scanner) {
                System.out.println("Scan found: " + item.getKey().getRow().toString()
                        + " threat level: " + item.getValue().toString());
                found = true;
            }
            if (!found) {
                System.out.println("No threat levels are visible with your current user auths: "
                        + allowedAuths.serialize());
            }
        }
    }
Build the JAR file accumulo-examples.jar.
In the folder where accumulo-examples.jar is located, create a new shell script named run_security_auth_scan.sh with the following commands. Be sure to change ACCUMULO_LIB, HADOOP_LIB, and ZOOKEEPER_LIB to match your local paths.

    ACCUMULO_LIB=/opt/cloud/accumulo-1.4.1/lib/*
    HADOOP_LIB=/Applications/hadoop-0.20.2-cdh3u1/*:/Applications/hadoop-0.20.2-cdh3u1/lib/*
    ZOOKEEPER_LIB=/opt/cloud/zookeeper-3.4.2/*
    # Quote the classpath so the shell passes the * wildcards through to Java.
    java -cp "$ACCUMULO_LIB:$HADOOP_LIB:$ZOOKEEPER_LIB:accumulo-examples.jar" examples.accumulo.SecurityScanMain test root password localhost:2181
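Run the script run_security_auth_scan.sh. Because the root user has no authorizations yet, you should see the following output:

    No threat levels are visible with your current user auths:

Launch the Accumulo shell:

    accumulo shell -u root -p password

Run the setauths command to see a list of options:

    $ root@test> setauths

Set the authorizations for the root user to p1:

    $ root@test> setauths -s p1

Rerun run_security_auth_scan.sh. You should see the following output:

    Scan found: eventA threat level: moderate

Set the authorizations for the root user to p1 and p4:

    $ root@test> setauths -s p1,p4

Rerun run_security_auth_scan.sh. You should see the following output:

    Scan found: eventA threat level: moderate
    Scan found: eventB threat level: severe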
The class SecurityScanMain reads the required arguments to connect to Accumulo and instantiates a BatchWriter instance to write test data to the acled table. We write two mutations to the table. The first is for the rowID eventA with the column visibility expression (p1|p2|p3). The second is for the rowID eventB with the column visibility expression (p4|p5).
The column visibility expressions are very simple Boolean expressions. Before a scan can occur over an Accumulo table, the client must supply authorization tokens for the connected user. Accumulo compares the given tokens against the column visibility label on each key to determine whether that user can see the given key/value. The expression (p1|p2|p3) means that a scanner reading the key must present an Authorizations object that supplies p1, p2, or p3.
By default, the root user does not have any scanning authorization tokens, so the call to the getUserAuthorizations(user) method initially returns no authorization tokens. To view eventA, we need to present p1, p2, or p3, none of which are currently listed for the root user. To view eventB, we need to present p4 or p5, which the root user also does not have. Once we go into the shell and add p1 for the root user, our scan presents the authorization p1 and finds a successful Boolean match to eventA. Once we set the scan tokens for the root user to p1,p4, we can view both eventA and eventB.
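To make the matching described above concrete, here is a minimal sketch in plain Java. It is not Accumulo's implementation: it assumes that a label which is a pure disjunction, such as (p1|p2|p3), can be modeled as a set of tokens, and the class DisjunctionSketch and its methods are hypothetical names introduced only for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative only: models a pure-disjunction label such as (p1|p2|p3)
// as a set of tokens. This is not how Accumulo evaluates labels internally.
class DisjunctionSketch {

    // Returns true if the user holds at least one token named in the label.
    static boolean canSee(Set<String> labelTokens, Set<String> userAuths) {
        for (String token : labelTokens) {
            if (userAuths.contains(token)) {
                return true;
            }
        }
        return false;
    }

    // Convenience helper that builds a token set from the given names.
    static Set<String> tokens(String... names) {
        return new HashSet<String>(Arrays.asList(names));
    }

    public static void main(String[] args) {
        Set<String> eventA = tokens("p1", "p2", "p3"); // label (p1|p2|p3)
        Set<String> eventB = tokens("p4", "p5");       // label (p4|p5)

        // The three authorization states the recipe walks through.
        String[][] authSets = { {}, {"p1"}, {"p1", "p4"} };
        for (String[] auths : authSets) {
            Set<String> userAuths = tokens(auths);
            System.out.println("auths=" + Arrays.toString(auths)
                    + " eventA=" + canSee(eventA, userAuths)
                    + " eventB=" + canSee(eventB, userAuths));
        }
    }
}
```

With no authorizations neither event is visible, with p1 only eventA is visible, and with p1 and p4 both are visible, which mirrors the three runs of the script.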
Cell visibility is a feature with more complexity than you might think. Here are some things to know about cell security in Accumulo:
Authorization tokens restrict what users can see during scans, but not what column visibility expressions they can write on mutations.
This is the default behavior and, for many systems, is undesirable. If you would like to prevent users from writing labels they cannot read, you can add the Constraint class implementation org.apache.accumulo.core.security.VisibilityConstraint as a system-wide constraint. Once it is applied to the Accumulo installation, users are barred from writing mutations containing column visibility labels they themselves are not authorized to read.
Different keys containing the exact same rowID, column family, and qualifier may have different ColumnVisibility labels. If the most recent timestamped version of a key has a ColumnVisibility label that is not viewable by the current scan, the user will see the next most recent version of that key whose label matches the scan's authorizations, or nothing if they are not authorized to see any of the versions.
The normal scanning logic for key/value presentation has the scanner returning the most recent version of a given key. The cell visibility system adjusts that logic with one additional condition. The scanner will return the most recently timestamped version of a given key that matches the supplied authorization tokens.
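The version-selection rule can be sketched in plain Java as follows. This is an illustration under stated assumptions, not Accumulo's scan code: each version's label is simplified to a single required token, the timestamps and values are made-up sample data, and VersionSketch, Version, and newestVisible are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: picks the newest version of one key that the scan may see.
class VersionSketch {

    // One stored version of a key. The label is simplified to one required token.
    static final class Version {
        final long timestamp;
        final String requiredToken;
        final String value;

        Version(long timestamp, String requiredToken, String value) {
            this.timestamp = timestamp;
            this.requiredToken = requiredToken;
            this.value = value;
        }
    }

    // Returns the value of the most recent visible version, or null if none is visible.
    static String newestVisible(List<Version> versions, String... auths) {
        Set<String> held = new HashSet<String>(Arrays.asList(auths));
        Version best = null;
        for (Version v : versions) {
            boolean visible = held.contains(v.requiredToken);
            if (visible && (best == null || v.timestamp > best.timestamp)) {
                best = v;
            }
        }
        return best == null ? null : best.value;
    }

    public static void main(String[] args) {
        // Two versions of the same key: the newer one requires p4, the older one p1.
        List<Version> versions = Arrays.asList(
                new Version(100L, "p1", "moderate"),
                new Version(200L, "p4", "severe"));

        System.out.println(newestVisible(versions));             // no authorizations
        System.out.println(newestVisible(versions, "p1"));       // only the older version matches
        System.out.println(newestVisible(versions, "p1", "p4")); // the newest version matches
    }
}
```

A scan holding only p1 falls back to the older value, while a scan holding both tokens sees the newest one.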
This recipe shows two very simple disjunction examples of the ColumnVisibility Boolean expression. You can apply more complicated expressions, should your application require them. For example, (((A & B)|C) & D) would match authorizations that supply the label D and either the label C or both labels A and B.
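To check how a nested expression such as (((A & B)|C) & D) plays out against a set of authorizations, a small recursive-descent evaluator is enough. The sketch below is plain Java and does not use or reproduce Accumulo's ColumnVisibility parser. As assumptions of the sketch, tokens are limited to letters and digits, spaces are ignored, and mixing & and | at the same nesting level without parentheses is rejected. VisibilityExprSketch and matches are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative only: evaluates a visibility-style Boolean expression
// against a set of authorization tokens.
class VisibilityExprSketch {
    private final String expr;
    private final Set<String> auths;
    private int pos = 0;

    private VisibilityExprSketch(String expr, Set<String> auths) {
        this.expr = expr.replace(" ", "");
        this.auths = auths;
    }

    // Returns true if the given authorizations satisfy the expression.
    static boolean matches(String expr, String... auths) {
        VisibilityExprSketch p =
                new VisibilityExprSketch(expr, new HashSet<String>(Arrays.asList(auths)));
        boolean result = p.parseExpr();
        if (p.pos != p.expr.length()) {
            throw new IllegalArgumentException("unexpected input at position " + p.pos);
        }
        return result;
    }

    // expr := term (op term)*, where every op at one level must be the same.
    private boolean parseExpr() {
        boolean result = parseTerm();
        char op = 0;
        while (pos < expr.length() && (expr.charAt(pos) == '&' || expr.charAt(pos) == '|')) {
            char next = expr.charAt(pos++);
            if (op != 0 && op != next) {
                throw new IllegalArgumentException("mixing & and | needs parentheses");
            }
            op = next;
            boolean rhs = parseTerm();
            result = (op == '&') ? (result && rhs) : (result || rhs);
        }
        return result;
    }

    // term := '(' expr ')' | token
    private boolean parseTerm() {
        if (pos < expr.length() && expr.charAt(pos) == '(') {
            pos++;
            boolean inner = parseExpr();
            if (pos >= expr.length() || expr.charAt(pos) != ')') {
                throw new IllegalArgumentException("missing closing parenthesis");
            }
            pos++;
            return inner;
        }
        int start = pos;
        while (pos < expr.length() && Character.isLetterOrDigit(expr.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("expected a token at position " + pos);
        }
        return auths.contains(expr.substring(start, pos));
    }

    public static void main(String[] args) {
        String label = "(((A & B)|C) & D)";
        System.out.println(matches(label, "C", "D"));      // D and C
        System.out.println(matches(label, "A", "B", "D")); // D and both A and B
        System.out.println(matches(label, "A", "B", "C")); // D is missing
        System.out.println(matches(label, "A", "D"));      // neither C nor both A and B
    }
}
```

The first two calls print true and the last two print false, matching the reading of the expression given above: D is always required, together with either C or both A and B.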