Security at the level of key-value pairs in a NoSQL database

August 31, 2016



Adam Fuchs is CTO of Sqrrl.

Adam Fuchs of Sqrrl describes the benefits of data-centric security analytics.


PwC: What does Sqrrl do?

Adam Fuchs: We are a big data analytics company focused on cybersecurity investigations. We came out of the intelligence community where we were looking at a huge variety of big data applications, all of which had multilevel security concerns. We encountered a lot of security requirements common to other industries, such as healthcare—which has HIPAA [Health Insurance Portability and Accountability Act] and other restrictions on data use—or banking—which has various data privacy requirements, data sharing agreements, and privacy policies.

All of these things restrict how data can be used, and it’s a challenge to perform analysis across many large data sets. The more data sets, the more complex the policy becomes in many cases.

So we’re trying to provide an element of data-centric security that still allows our analytics solution to scale up to the petabyte range and across thousands of server nodes—and still allows users to ask a broad variety of questions on top of it. Given that we have the security restriction and that we’re trying to scale, we still want to be able to search, aggregate, graph, and perform other kinds of analytics.

PwC: And the Sqrrl platform works on top of a NoSQL wide-column store?

Adam Fuchs: It works with Apache Accumulo, a clone of Google's Bigtable. We have a series of layers, some of which are open source. At the bottom we have HDFS [the Hadoop Distributed File System]. On top of HDFS sits Accumulo. It’s open source up to that point.

PwC: Then the Sqrrl platform sits on top of that?

Adam Fuchs: Then Sqrrl software sits on top of that and provides linked data analysis capabilities, which enable analysts to find patterns and trends hidden in data sets. Although we do a lot of Accumulo development and we provide some support of Accumulo in operational instances, our company is really tailored to sell the Sqrrl product.

PwC: How does the access control work?

Adam Fuchs: Across a large data set, there could be hundreds of trillions of key-value pairs. Each one has a label that’s derived from the provenance of that data. That provenance allows us to determine, at query time, who can access each pair. We try to make that security filtering very efficient.

Text search also ties into Accumulo’s key-value pairs.
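To make the idea of per-pair labels concrete, here is a minimal sketch of label-based filtering at query time. It is not Sqrrl's or Accumulo's code: the `Entry` layout, the label names, and the `scan` helper are all hypothetical, and the expression syntax (terms joined by `&` and `|`, with parentheses) only approximates the style of label such a store might use. As a simplification, this sketch lets `&` bind tighter than `|`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One key-value pair carrying a visibility label (hypothetical layout)."""
    row: str
    column: str
    visibility: str  # e.g. "hr&pii"; an empty label means unrestricted
    value: str


def tokenize(expr):
    """Split a label expression into terms and the operators & | ( )."""
    tokens, term = [], ""
    for ch in expr:
        if ch in "&|()":
            if term:
                tokens.append(term)
                term = ""
            tokens.append(ch)
        elif not ch.isspace():
            term += ch
    if term:
        tokens.append(term)
    return tokens


def is_visible(expr, auths):
    """Return True if the set of authorizations satisfies the label."""
    tokens = tokenize(expr)
    if not tokens:
        return True  # unlabeled data is visible to everyone
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            rhs = parse_and()  # always parse, so tokens are consumed
            result = result or rhs
        return result

    def parse_and():
        nonlocal pos
        result = parse_atom()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            rhs = parse_atom()
            result = result and rhs
        return result

    def parse_atom():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            result = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return result
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok in auths  # a plain term: does the user hold it?

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens in expression")
    return result


def scan(entries, auths):
    """Return only the entries the caller is authorized to see."""
    return [e for e in entries if is_visible(e.visibility, auths)]


# Hypothetical sample data: four pairs about one user, with different labels.
ENTRIES = [
    Entry("user42", "netflow:bytes", "netflow", "1048576"),
    Entry("user42", "hr:title", "hr&pii", "Engineer"),
    Entry("user42", "intel:score", "intel|(hr&pii)", "0.9"),
    Entry("user42", "meta:source", "", "proxy"),
]

frontline = scan(ENTRIES, {"netflow", "intel"})
investigator = scan(ENTRIES, {"netflow", "intel", "hr", "pii"})
```

In this sketch the front-line set of authorizations yields three of the four pairs (the `hr&pii` pair is filtered out), while the investigator set yields all four. The filter runs per pair at read time, so the same stored data serves both audiences.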

PwC: And then Sqrrl offers other modes or data models in addition to the wide-column mode, yes?

Adam Fuchs: That’s correct. The whole package leverages a multimodal database. We have document store capabilities. So we can do JSON [JavaScript Object Notation] input and output, and we can dynamically update documents. We do a form of online aggregation that is essentially an aggregated, persistent view, but it’s supported inside of the key-value store.

Then there’s the graph structure and the ability to do graph analytics. The graph structure and the document structure are both built on top of that key-value store. And there’s also our visualization layer that lets an analyst access these search techniques with point and click functionality.
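One way to picture document and graph structures layered on a key-value store is sketched below. This is an illustration under assumptions, not Sqrrl's actual encoding: the `(row, column)` key shape, the path-style column names, and the `out/` and `in/` edge prefixes are all hypothetical, and a plain Python dict stands in for the store.

```python
import json


def flatten(doc_id, node, path=""):
    """Yield ((row, column), value) pairs for every leaf of a JSON value."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from flatten(doc_id, child, f"{path}/{key}")
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from flatten(doc_id, child, f"{path}/{index}")
    else:
        # A leaf: the document id is the row, the JSON path is the column.
        yield (doc_id, path), node


def edge_entries(src, label, dst):
    """Store one directed edge twice so either endpoint can find it by row."""
    yield (src, f"out/{label}/{dst}"), ""
    yield (dst, f"in/{label}/{src}"), ""


def neighbors(store, node, direction="out"):
    """List the nodes adjacent to `node` by scanning its row for edge columns."""
    prefix = direction + "/"
    return sorted(
        col.split("/", 2)[2]
        for (row, col) in store
        if row == node and col.startswith(prefix)
    )


# Hypothetical input document.
doc = json.loads('{"user": "alice", "logins": [{"host": "host7", "ok": true}]}')

store = dict(flatten("doc1", doc))           # document model
store.update(edge_entries("alice", "login", "host7"))  # graph model

out_of_alice = neighbors(store, "alice")         # nodes alice points to
into_host7 = neighbors(store, "host7", "in")     # nodes pointing at host7
```

Because each field and each edge becomes its own key-value pair, updating one field of a document touches a single pair rather than rewriting the whole document, and any per-pair access label would apply to individual fields and edges.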

PwC: What’s an example of how Sqrrl might be used inside an enterprise?

Adam Fuchs: Sqrrl is used for cybersecurity investigations. An investigation could be preventive, such as when an analyst proactively examines high-risk users or assets and looks for suspicious activity associated with them. Or, an investigation could occur after an incident and focus on finding the root cause of the incident. For these types of investigations, Sqrrl is ingesting very large, disparate cybersecurity data sets, such as NetFlow, log files, threat intelligence, e-mails, and even HR information.

Sqrrl fuses this information together under a common data model, and analysts use our solution to look for patterns in the data. However, when we start working with these different data sets we naturally start running into privacy issues, because some of these data sets contain sensitive data, such as personally identifiable information [PII], financial data, or trade secrets. Data-centric security comes into play here, as we can control access to specific pieces of data in a very fine-grained way.

PwC: How do you support different user groups?

Adam Fuchs: The users of our tool include both front-line analysts in a security operations center and more advanced security investigators and incident handlers. Often, certain sensitive types of data are not available to the front-line analysts, but the more advanced investigators would be able to see all the data.

PwC: What are some of the specific analytics capabilities?

Adam Fuchs: Accumulo has a pretty abstract interface, a low-level interface. We have extended Accumulo to provide more advanced discovery analytics. Search is in there, and we have a subset of SQL to do transformations and aggregations distributed throughout the cluster, and then some graph analytics to support subgraph extraction. We also have added some machine learning capabilities to help analysts auto-detect specific portions of a subgraph that are statistically anomalous.
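As a rough illustration of an aggregation distributed across a cluster, the sketch below computes partial sums on each partition and then merges them, which is the general pattern behind a query such as a grouped sum. The partition contents, the field meanings (source host and byte count), and the function names are hypothetical; Python lists stand in for the servers.

```python
from collections import Counter
from functools import reduce


def partial_aggregate(records):
    """Runs where the data lives: sum bytes per source host for local records."""
    totals = Counter()
    for src, nbytes in records:
        totals[src] += nbytes
    return totals


def merge(left, right):
    """Combine two partial results into a new one.

    Addition is associative and commutative, so partial results can be
    merged in any order as they arrive.
    """
    merged = Counter(left)
    merged.update(right)  # adds counts key by key
    return merged


# Hypothetical flow records, split across three partitions.
PARTITIONS = [
    [("10.0.0.5", 1200), ("10.0.0.9", 300)],
    [("10.0.0.5", 800)],
    [("10.0.0.9", 50), ("10.0.0.7", 4000)],
]

# Roughly: SELECT src, SUM(bytes) ... GROUP BY src (illustrative only).
totals = reduce(merge, map(partial_aggregate, PARTITIONS), Counter())
top_talker, top_bytes = totals.most_common(1)[0]
```

Only the small per-partition summaries cross the network in this pattern, not the raw records, which is what lets a grouped aggregation scale with the number of servers.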

PwC: How does your visualization work?

Adam Fuchs: Sqrrl’s primary visualization organizes data into connected nodes and edges via a linked data property graph. This visualization technique goes beyond basic histograms and bar charts and aims to present data with high dimensionality in a compact way. Using linked data diagrams, an analyst can quickly assess which clusters of data are important to focus on.



