August 31, 2016
Ritesh Ramesh is a chief technologist for the global data and analytics organization at PwC.
Ritesh Ramesh describes how NoSQL and Hadoop are used in retail environments.
PwC: You’ve been the technical lead on a number of big data projects at retailers. What’s a typical database challenge they’re encountering?
Ritesh Ramesh: Some clients run both traditional and nontraditional databases, and they use Hadoop and NoSQL databases for ingestion and pre-processing. Their customer analytics may run on a traditional database management system, for instance for financial reporting to the CFO on sales. And they might have a NoSQL solution or perhaps Hadoop [an open-source software framework], which they use to acquire and process clickstream data. These clients do a lot of pre-processing on their clickstream data. They’ve learned that a traditional database cannot handle their typical daily acquisition of tens of gigabytes of files, rising to terabytes every month.
For other retailers, recommendation engines and the personalization of websites are classic reasons to use NoSQL databases. Everyone wants to personalize customer-facing portals and other interfaces for specific customers. I’ve seen clients use NoSQL when their websites are critical to their business models.
In addition, a lot of people are using Hadoop and NoSQL for innovation pilots. For example, someone wants to build a mobile app with a new organization or wants to do something edgy, and they don’t want to work with a traditional relational-database management system and database administrators. They do these pilots and, before you know it, they’re using NoSQL for some customer-facing apps.
I don’t really see NoSQL being used directly for enterprise reporting and dashboards. I think the traditional databases that host enterprise data warehouses are going to be used for that.
PwC: What are some of the other advantages of NoSQL in that context?
Ritesh Ramesh: The fact that you can start with a data model that doesn’t have a schema increases your flexibility during app-design iterations. That cuts your time to market. Developers typically work in an object-oriented environment when they’re trying to build an application, whether it’s a mobile or online app. NoSQL schema flexibility also aligns nicely with object orientation. As a result, you can build powerful customer-facing applications a lot faster.
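As a rough illustration of the schema flexibility described here, the following Python sketch uses a plain dictionary as a stand-in for a document collection. The `Customer` class, the `save` helper, and the field names are hypothetical and chosen only for this example; a real NoSQL product would have its own client API.

```python
from dataclasses import dataclass, asdict, field

@dataclass
class Customer:
    """Application object as a developer might model it (hypothetical)."""
    customer_id: str
    name: str
    preferences: dict = field(default_factory=dict)

# Hypothetical in-memory stand-in for a schema-less document collection.
customers = {}

def save(doc):
    """Store a document under its customer_id; no schema is enforced."""
    customers[doc["customer_id"]] = doc

# First design iteration: the object maps directly onto a document.
save(asdict(Customer("c-1", "Ana")))

# Later iteration: a new field appears on new documents only.
# Existing documents are left untouched and no table migration is needed.
later = asdict(Customer("c-2", "Ben", {"channel": "email"}))
later["loyalty_tier"] = "gold"
save(later)
```

The two documents sit side by side with different shapes, which is the property that lets the application's objects and the stored data evolve together between design iterations.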
It’s really just a case of simplicity with NoSQL. For example, NoSQL doesn’t need a normalized data model, which makes it possible for developers to focus on building solutions and to own the process of storing and retrieving data themselves. On top of that, NoSQL emerged in the cloud computing era, so most NoSQL options are cloud-ready.
PwC: Can you give us an example?
Ritesh Ramesh: Sure. Let’s say I’m developing an e-commerce site and want to give customers the option of receiving their receipts by e-mail.
In that case, NoSQL lends itself to what I call storing data by aggregate points. If a retailer is using a key-value or document database, that retailer just needs some kind of identifier, or simply the customer’s name, as the key to the data. Once that’s done, the retailer quickly gets a complete receipt for the order in one chunk. Then it’s easy to send that receipt to the customer’s mobile device or e-mail address.
Now, in the traditional world, that same data will be modeled in 20 to 25 tables, which is good if you need to slice and dice the data. For slicing and dicing, it’s probably better to start with a relational model. But a key-value or document database is best for pushing out entire-order receipts.
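The contrast between the two models can be sketched in Python. This is a simplified illustration, not a description of any client system: the relational side uses SQLite with three tables rather than 20 to 25, the aggregate side uses a dictionary in place of a key-value or document database, and all table, column, and function names are assumptions.

```python
import json
import sqlite3

# Relational side: one order is spread across several tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id TEXT PRIMARY KEY, email TEXT);
    CREATE TABLE orders (order_id TEXT PRIMARY KEY, customer_id TEXT, ordered_on TEXT);
    CREATE TABLE order_lines (order_id TEXT, sku TEXT, qty INTEGER, unit_price REAL);
    INSERT INTO customers VALUES ('c-1', 'ana@example.com');
    INSERT INTO orders VALUES ('o-100', 'c-1', '2016-08-31');
    INSERT INTO order_lines VALUES ('o-100', 'sku-1', 2, 9.50);
    INSERT INTO order_lines VALUES ('o-100', 'sku-2', 1, 20.00);
""")

def receipt_from_tables(order_id):
    """Rebuild the full receipt by joining the normalized tables."""
    rows = conn.execute("""
        SELECT c.email, o.ordered_on, l.sku, l.qty, l.unit_price
        FROM orders o
        JOIN customers c ON c.customer_id = o.customer_id
        JOIN order_lines l ON l.order_id = o.order_id
        WHERE o.order_id = ?
        ORDER BY l.sku
    """, (order_id,)).fetchall()
    return {
        "order_id": order_id,
        "email": rows[0][0],
        "ordered_on": rows[0][1],
        "lines": [{"sku": s, "qty": q, "unit_price": p} for _, _, s, q, p in rows],
    }

# Aggregate side: the whole receipt is one value stored under the order ID.
receipts = {"o-100": json.dumps(receipt_from_tables("o-100"))}

def receipt_from_store(order_id):
    """Fetch the complete receipt in one lookup, with no joins."""
    return json.loads(receipts[order_id])

# Slicing and dicing stays easy on the relational side.
revenue_by_sku = dict(conn.execute(
    "SELECT sku, SUM(qty * unit_price) FROM order_lines GROUP BY sku"))
```

Fetching a receipt to send is a single key lookup in the aggregate store, while an ad hoc question such as revenue by SKU is a one-line query against the tables. That is the trade-off described in the answer above.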
PwC: What’s your view of the new database environment that’s emerging?
Ritesh Ramesh: It will be a polyglot environment going forward. Clients will need a tightly integrated heterogeneous set of both emerging and traditional technology components to manage all types of internal and external data. NoSQL is not going to be this alien technology coming into the enterprise and then destroying everything else. People will be forced to manage a hybrid environment. They won’t say, “Oh, I’m going to standardize my enterprise data warehouse on NoSQL or Hadoop.” That’s not going to happen.
That’s why I say that NoSQL is likely better for operational applications. For example, companies that have brick-and-mortar stores and an online e-commerce presence want their offline and online data in the same place. NoSQL can be used as a back-end operational data store to funnel in all their point-of-sale data from their stores, together with all the purchasing data from their websites. They can also do this at a very low cost. NoSQL brings the price point down so companies can scale up their operations. When that happens, a company’s price-to-performance ratio decreases over the long term.
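A minimal sketch of the operational data store idea, assuming purchases arrive as simple records from two channels. The `ingest` and `total_spend` functions, the channel labels, and the record fields are all hypothetical, and a dictionary stands in for the NoSQL store.

```python
from collections import defaultdict

# Hypothetical stand-in for a NoSQL operational data store,
# keyed by customer so offline and online purchases sit together.
purchases_by_customer = defaultdict(list)

def ingest(channel, record):
    """Append a purchase from any channel, keeping channel-specific fields as-is."""
    doc = {"channel": channel, **record}
    purchases_by_customer[record["customer_id"]].append(doc)

def total_spend(customer_id):
    """Sum a customer's purchases across every channel."""
    return sum(p["amount"] for p in purchases_by_customer.get(customer_id, []))

# Point-of-sale and website records have different shapes but share one store.
ingest("store", {"customer_id": "c-1", "store_id": "s-12", "register": 4, "amount": 25.0})
ingest("web", {"customer_id": "c-1", "session_id": "abc", "amount": 40.0})
```

Because no common schema is imposed, the point-of-sale record can carry a register number and the web record a session ID, yet both are reachable through the same customer key.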
PwC: Some people we’ve interviewed question NoSQL’s consistency.
Ritesh Ramesh: If you think about it, NoSQL provides you with implicit transactional consistency. In the retail example I mentioned earlier, if I use an OLTP [online transaction processing] database, such as Oracle, the order data will be spread across several tables. So when I write the transaction, I have to worry about who might be trying to read it while I’m writing it.
In the case of NoSQL, consistency is implicit, because you’ll use the order ID as a key to write the entire receipt into the associated value field. It’s almost like you’re writing the transaction in a single action, because that’s how you’re going to retrieve it. It’s already ID’d and date-stamped.
Enterprises will soon figure out that NoSQL delivers implicit consistency. In Cassandra, for instance, it’s very unlikely that you’ll write an order receipt in six different places. Instead, that receipt will occupy only one place.
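The single-write pattern described in this answer can be sketched as follows. This is an illustration of the single-key case only, using a Python dictionary as a hypothetical key-value store; it does not model replication or the consistency settings of Cassandra or any other product, and the function and field names are assumptions.

```python
from datetime import datetime, timezone

# Hypothetical stand-in for a key-value store.
store = {}

def put_receipt(order_id, lines):
    """Build the whole receipt first, then write it under the order ID in one step."""
    receipt = {
        "order_id": order_id,
        "written_at": datetime.now(timezone.utc).isoformat(),  # date stamp
        "lines": lines,
        "total": round(sum(l["qty"] * l["unit_price"] for l in lines), 2),
    }
    store[order_id] = receipt  # one write: a reader sees all of it or none of it
    return receipt

def get_receipt(order_id):
    """Return the complete receipt, or None if it has not been written yet."""
    return store.get(order_id)

put_receipt("o-100", [
    {"sku": "sku-1", "qty": 2, "unit_price": 9.50},
    {"sku": "sku-2", "qty": 1, "unit_price": 20.00},
])
```

Because the receipt is assembled before the write and stored under one key, there is no intermediate state in which a reader could find the order header without its lines. That is the sense in which the consistency is implicit here.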
PwC: What about BI [business intelligence] and NoSQL?
Ritesh Ramesh: I ask for the client’s definition of BI and analytics before we take on any BI strategy project. We see a trend that BI requirements at the business function level now range across a whole spectrum of sophistication. Our team even created a taxonomy for BI by business role. We said BI for digital marketing means these things, BI for store operations means these things, and BI for supply chain means these things.
Different roles require different definitions, of course. So it’s no surprise that a hybrid SQL [structured query language]/NoSQL environment is often the outcome of a BI strategy project. The managers of store operations, responsible for keeping track of inventory, ask that only real-time information in specific formats be sent to their tablets. They don’t need or want the rest of the BI data, because they don’t have time to deal with it. NoSQL is a great solution for these managers. Operational NoSQL solutions are becoming an efficient way to enable access to near-real-time information for internal and external stakeholders.
PwC: What role is NoSQL playing in the data-integration challenge at retailers? What is the NoSQL access strategy without a query language that is standard across NoSQL technologies?
Ritesh Ramesh: Data access in NoSQL is often through an API [application programming interface]. If done the right way, it works well. When companies are using NoSQL to run their websites or mobile services, they just put in the data. APIs give you flexibility. Some APIs can have a customer ID with seven columns. Other APIs can specify a customer ID with six columns, or whatever meets the business need. So NoSQL access through the API is probably a good way to go, compared to the way things are done with a traditional relational database—especially for people who use NoSQL for data integration.
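One way to picture API-based access is a set of small functions over the same stored document, each exposing only the columns a given consumer needs. The sketch below is an assumption-laden illustration: `make_customer_api`, the field names, and the two example APIs are invented for this purpose and do not correspond to any specific NoSQL product.

```python
# Hypothetical customer document held in a dictionary-backed store.
customers = {
    "c-1": {
        "name": "Ana",
        "email": "ana@example.com",
        "phone": "555-0100",
        "postal_code": "10001",
        "loyalty_tier": "gold",
        "preferred_store": "s-12",
        "last_order_id": "o-100",
    }
}

def make_customer_api(fields):
    """Return an accessor that exposes only the named fields for a customer ID."""
    def api(customer_id):
        doc = customers[customer_id]
        return {"customer_id": customer_id, **{f: doc.get(f) for f in fields}}
    return api

# One API returns seven columns for the customer ID, another returns six.
marketing_api = make_customer_api(
    ["name", "email", "phone", "postal_code", "loyalty_tier",
     "preferred_store", "last_order_id"])
service_api = make_customer_api(
    ["name", "email", "phone", "postal_code", "loyalty_tier", "last_order_id"])
```

Both accessors read the same underlying record, so each consumer gets the shape that meets its business need without a change to how the data is stored.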