Big Data

The realities of polyglot persistence in mainstream enterprises

Ritesh Ramesh describes how NoSQL and Hadoop get used in retail environments.

Solving a familiar e-commerce search problem with a NoSQL document store

Mark Unak and Sanjay Agarwal explain how document stores can help deliver precise e-commerce catalog search results.

Security at the level of key-value pairs in a NoSQL database

Adam Fuchs of Sqrrl describes the benefits of data-centric security analytics.

How NoSQL is changing enterprise data management

Oliver Halter discusses how CEOs and CIOs are being forced to tolerate some data inconsistency.

Scaling online ad innovations with the help of a NoSQL wide-column database

Vaibhav Puranik and Ken Weiner of GumGum discuss the challenges and benefits of open source databases for in-image advertising.

How enterprise graph databases are maturing

Martin Van Ryswyk and Marko Rodriguez of DataStax explore the challenges and benefits of big data analytics with graphs.

Filling in the gaps in NoSQL document stores and data lakes

Matthias Brantner describes the role database virtualization and a business-user query interface can play in heterogeneous environments.

Creating a body language of online learning with graph databases

Sean York of Pearson discusses how graph technology becomes a medium for enriching online environments.

Database futures: How Apache Spark fits in to a larger unified data architecture

Mike Franklin of the University of California, Berkeley, discusses the goals behind Spark and a more unified cloud-data ecosystem.

Creating a big data canvas with NoSQL

Tom Foth describes how analytics platforms can benefit from a blend of database types.

Intelligent, context-aware data and analytics technologies widen the decision-making aperture

These four innovations can help companies achieve an optimal mind-machine balance when it comes to business decision making.

Why machines need humans to learn

Devices get smarter all the time, but for them to become truly intelligent, humans must create the models that gadgets “learn” from.

Demystifying machine learning part 4: Image and video applications

Some firms are using machine learning to process large amounts of unstructured data, but it’s not widespread—yet.

How NoSQL key-value and wide-column stores make in-image advertising possible

Online ad innovators must process hundreds of terabytes a day at the lowest possible cost. How do they do it?

The promise of graph databases in public health

One of the main advantages of a NoSQL graph store is web-scale discovery. The graph store is one of many innovations creating a sea change in database technology: explore the promise and upheaval caused by these new technologies.

5 leading practices for data lakes

The right approach to data lakes is essential to ensure you get the most useful insights from your massive data stores.

In database evolution, two directions of development are better than one

NoSQL database technology is maturing, but the newest Apache analytics stacks have triggered another wave of database innovation.

The data lake – No longer a pipe dream for today’s enterprises

How data lakes can help reduce costs, increase efficiency, and boost innovation in the enterprise.

Demystifying machine learning: Part 3 – Exploring deep learning

What exactly is “deep learning” and what accounts for its rapid rise in popularity and media coverage?

The rise of immutable data stores

Some innovators are abandoning long-held database principles. Why?

Using document stores in business model transformation

Healthcare providers are finding they need data collection and analysis capabilities that are different from those that relational databases deliver.

Enterprises hedge their bets with NoSQL databases

Without a scalable data architecture, the customer experience suffers. Imagine you’re a retailer offering tens of thousands of products online. You have rich descriptions that include numerous attributes for each product. In standard relational databases, these attributes exist in silos, are poorly described, and cannot be indexed for maximum usefulness. So, if you’re using only a standard relational database and a conventional enterprise search engine, customers who search “17-inch laptop” will retrieve many false positive results that aren’t laptops. NoSQL document databases provide the capability to address this problem. With the help of data tagged in Extensible Markup Language (XML) or structured JavaScript Object Notation (JSON), NoSQL offerings such as MarkLogic and MongoDB enable more refined indexing by attribute. Dozens, hundreds, or even thousands of product attributes can serve as search filters (or facets) that a query engine such as XQuery can use to deliver far more relevant results. “Usually the product has features that can easily be attributed with standard values,” says Mark Unak, CTO of Codifyd, a consultancy that helps clients to optimize their use of e-commerce website data. For example, a customer can filter on the brand name of the laptop, and the results will include only …

The capabilities and limitations of video analytics

Video analytics promise to help retailers better understand customers. Here are three issues to keep in mind.

5 growing pains for Chief Data Science Officers

These best practices can help Chief Data Science Officers demonstrate their value and achieve success.

Data lakes and the promise of unsiloed data

Data lakes that can scale at the pace of the cloud remove integration barriers and clear a path for more timely and informed business decisions.

Do you need a Chief Data Scientist?

How to determine if your company needs a chief data scientist to help improve its business decision making.

Interview: Will data lake advocates repeat the mistakes of data warehousing?

PwC’s Technology Forecast recently addressed the topic of data lakes. The coverage included research and interviews on data lakes and how they can help enterprises remove integration barriers and clear a path for more timely and informed business decisions. To continue the discussion and look at some of the challenges enterprises can face in implementing a shift to data lakes, we are sharing an excerpt of a conversation between Technology Forecast’s Alan Morrison and Terry Retter, president of small business consultancy BrightZone, in Reno, Nevada; a former VP/CIO of Grubb & Ellis, and a PwC alumnus. AM: Terry, you were a CIO. Some companies say they’ve created a data lake. In reality, they’ve built a single-purpose sandbox. How can CIOs get their organizations to commit to the strategic, long-term vision of a true data lake? TR: By dealing with real problems and real users. They should focus on a service or a perception problem among customers they must resolve to avoid losing profits or market share. They should start small, but think big, in data lake terms. They shouldn’t collect data just around a single process. Instead, they should gather everything they can think of while using the lake at first to solve a particular problem. …

The future of collaboration: Large-scale visualization

Why large-scale visualization may be the key to success for improving business decision making with data analytics.

The high value of advanced data visualization

Compelling visualizations are necessary for big data to make an impact on enterprise decision making.

The future of big data: Data lakes

Data lakes pose a number of opportunities—and challenges—for companies looking to make best use of their big data.

Finding a home for the Chief Data Science Officer

Where the chief data science officer fits in your org chart depends on your analytics strategy. Here are some models to consider.

The End of Data Standardization

We can no longer deny the drive to diversify data management technology that began in the mid-90s. The aspiration to achieve one single and simple database management system has died. I grew up with the advent of commercial relational databases in the late 80s and early 90s. At the time, the promise was clear: you could store everything in a relational database that was carefully modeled and expandable. And in doing so, you acquired the ability to access, govern and securely manage every bit of data in a single technology environment. Most companies decided on a relational database standard and ported some or all of their applications towards that single database backend. All the principles of good architecture, including cost and skill optimization, played out, until they didn’t. All seemed to be going swimmingly until one of my clients – a major European railway operator – wanted to geocode every bit of equipment and every centimeter of their railway network. As hard as we tried, we couldn’t meet the client’s demands well with a relational database. The advent of spatial data management systems came to the rescue. Questions like ‘What is the total book value of all assets deployed within …

The 5 Dimensions of the So-Called Data Scientist

What is “data science”? Is it really a new emerging discipline as some claim it to be; or is it the emperor in new clothes – data mining, statistics, business intelligence or analytics re-branded? Moreover, is it possible that one person can fulfil the role of a data scientist? Rather than answering this question directly, let’s review some of the skills required for someone to be a “data scientist.” First and foremost, a “data scientist” is a business or domain expert: Someone who has to have the ability to articulate how information, insights, and analytics can help business leadership answer key questions – and even determine which questions need answering – and make appropriate decisions. The data scientist will need a thorough understanding of the business across the value chain (from marketing, sales, distribution, operations, pricing, products, finance, risk, etc.) to do this well. Second, a “data scientist” is a statistics expert: Someone who has to have the ability to determine the most appropriate statistical techniques for addressing different classes of problems, apply the relevant techniques, and translate the results and generate insights in such a way that the businesses can understand the value. This will be predicated on a …

Mining Customer Insights with Speech-to-Text Technology

From touch and gesture interfaces to advanced facial recognition, our computers are communicating with us on an increasingly human level. One technology that is showing particular promise is a computer’s ability to recognize human speech or Speech-to-Text (STT). Applications such as Apple’s Siri, Google Now, and Nuance’s Dragon have brought voice-activated commands to the masses, while enterprise companies are employing the technology to discover new insights from previously untapped audio and video data sources. One of the greatest benefits of STT is the ability to bridge the gap between unstructured audio/video data and advanced analytics such as machine learning, natural language processing (NLP), and graph analysis. A company’s ability to understand its most vocal customers, whether within its call centers or on video sharing sites, can lead to a better view of customers and their experiences. Call center logs can reveal interesting patterns and trends in the quality of customer agent call handling and (when combined with other data) call center operational costs. These insights could then be used to retrain customer service agents, identify and stop a poorly conceived marketing campaign, or quickly understand the root cause of a spike in call center volume. For example, PwC’s Emerging Tech …

The Potential of Context Aware Computing

Imagine driving home late at night after work when an inattentive driver plows into the side of your vehicle. Before your wheels stop spinning, a sensor in your vehicle recognizes the severity of the impact and contacts 911 so emergency vehicles can be dispatched. While the ambulance is en route, your medical records and insurance information are communicated to the receiving hospital, your driver’s history is forwarded to the police, your auto insurance company is notified, and your vital signs are sent from your body sensor to the approaching rescue vehicle. The EMTs know exactly how to treat you even before they arrive at the crash site, and the police have contacted your family to let them know you have been in an accident. Does this sound like science fiction? Actually, today some of it is already happening and some of it is coming soon. Welcome to your new “best friend” – context aware computing. Context aware computing refers to a style of computing in which situational and environmental information is used to anticipate immediate needs and proactively offer enriched, situation-aware responses. Instead of being a singular technology, it exists as a result of combining four disruptive technologies that are reaching …


Chris Curran

Principal and Chief Technologist, PwC US Tel: +1 (214) 754 5055

Vicki Huff Eckert

Global New Business & Innovation Leader Tel: +1 (650) 387 4956

Pierre-Alain Sur

US Technology Industry Leader Tel: +1 (646) 471 6973