Taking a New Approach to Unstructured Data Management

Written by: Steven Hill – Senior Analyst, Applied Infrastructure and Storage Technologies – 451 Research

Enterprise storage has never been easy. Business depends on data, and all things data begin and end at storage, but the way we handle data in general, and unstructured data in particular, hasn't evolved at the same pace as other segments of the IT industry. Sure, we've made storage substantially faster and higher in capacity, but we haven't dealt with the real problems of storage growth caused by that increased performance and density, much less the challenges of managing data growth that now spans multiple, hybrid storage environments across the world. The truth is, you can't control what you can't see, and as a result a growing number of businesses are paying a great deal of money to store multiple copies of the same data over and over, or, perhaps even worse, keeping multiple versions of that same data without any references between them at all.
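To make the duplicate-copy problem concrete, here is a minimal sketch (not from the article) of how exact duplicates can be surfaced by hashing file contents; the scan root is a hypothetical path:

```python
# Illustrative sketch: group files by content hash; any group with more
# than one member is the same data stored more than once. The scan root
# is a hypothetical path.
import hashlib
from collections import defaultdict
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file's contents in chunks so large files don't exhaust memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group files under `root` by content hash; groups > 1 are exact copies."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}

if __name__ == "__main__":
    for digest, copies in find_duplicates(Path("/mnt/shared")).items():
        print(f"{digest[:12]}  {len(copies)} copies: {copies}")
```

Note that a hash-based scan only catches byte-identical copies; the divergent "versions without references between them" the author describes are harder still, which is the point.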

This massive data fragmentation across multiple storage platforms can be one of the major sources of unchecked storage growth, and added to it are the new risks of a "keep everything" approach to data management. Privacy initiatives like GDPR in the EU and California's CCPA (2018) require a complete reevaluation of storage policies across many vertical markets to ensure compliance with new rules for securing, protecting, delivering, redacting, anonymizing and verifiably deleting data containing personally identifiable information (PII) on demand. While this can be a manageable problem for database information, it's a far greater challenge for the unstructured data, such as documents, video and images, that makes up a growing majority of enterprise data storage. Without some form of identification, this data goes "dark" soon after it leaves the direct control of its creator, and regulations like GDPR don't make a distinction between structured and unstructured data.
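As a rough illustration of what "identification" can mean for unstructured data, the sketch below scans a document for candidate PII using deliberately simplified patterns; production classifiers are far more sophisticated, and the file path is hypothetical:

```python
# Illustrative sketch: a naive PII scan over an unstructured text file.
# The regexes are deliberately simplified placeholders, not a compliance
# tool, and the scanned path is hypothetical.
import re
from pathlib import Path

PII_PATTERNS = {
    "email":  re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone":  re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def scan_for_pii(path: Path) -> dict[str, int]:
    """Return a count of candidate PII matches per category in one document."""
    text = path.read_text(errors="ignore")
    return {name: len(rx.findall(text)) for name, rx in PII_PATTERNS.items()}

if __name__ == "__main__":
    hits = scan_for_pii(Path("contracts/q3_addendum.txt"))
    flagged = {k: v for k, v in hits.items() if v}
    print("needs review:" if flagged else "clean:", flagged)
```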

There can be a number of perfectly good reasons for maintaining similar or matching data sets at multiple locations, such as data protection or increased availability. The real challenge lies in maintaining policy-based control of that data regardless of physical location, while at the same time making it available to the right people for the right reasons. Documents and media such as images, audio and video make up a growing percentage of overall business data, and companies have a vested interest in continuing to make use of that data. At the same time, failing to manage all this data properly can carry serious legal ramifications that could cost companies millions.

The cloud has changed the IT delivery model forever, and with a hybrid infrastructure, business IT is no longer limited by space, power and capital investment. Decisions about workload and data placement can now be based on the best combination of business needs, economics, performance and availability rather than on location alone, but with that freedom comes a need to extend visibility, governance and policy to data wherever it may be. In this context, data fragmentation across multiple systems is almost inevitable, so it really comes down to accepting this as a new challenge and adopting next-generation storage management based on an understanding of what our data is, rather than where it is.

Mass data fragmentation is a problem that existed before the cloud, but fortunately the technology needed to fix it is already available. From an unstructured data perspective, we believe this involves embracing a modern approach that can span data silos for backups, archives, file shares, test and development data sets and object stores, bridging on-premises, public cloud and edge environments. A platform-based approach can give you visibility into your data wherever it resides and, more importantly, can help you maintain greater control by reducing the number of data copies, managing storage costs, and ensuring your data stays compliant and properly backed up. We also think an ideal solution seamlessly blends legacy, file-based storage with the management flexibility and scalability offered by metadata-based object storage. This requires a fundamental shift in the way we've addressed unstructured data management in the past, but it's a change that offers greater data availability and storage-level automation, and provides a new set of options for controlling and protecting business data that is both a major business asset and a potential liability if not handled correctly.
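As one hypothetical illustration of metadata-based object storage, the sketch below uses boto3 to attach descriptive metadata to an object at write time, so policy can key off what the data is rather than where it lives. The bucket name, key, tags and local file are invented for the example, and credentials are assumed to be configured already:

```python
# Illustrative sketch: write an object with descriptive metadata that
# travels with it, then read that metadata back from any tier. Bucket,
# key, file and tag values are hypothetical.
import boto3

s3 = boto3.client("s3")

with open("board_minutes_2019-04.pdf", "rb") as body:
    s3.put_object(
        Bucket="corp-records",          # hypothetical bucket
        Key="records/board_minutes_2019-04.pdf",
        Body=body,
        Metadata={                      # stored with the object itself
            "owner": "corp-secretary",
            "classification": "confidential",
            "contains-pii": "true",
            "retention": "7y",
        },
    )

# Later, any environment (on-premises, cloud or edge) can inspect the same
# metadata and apply retention, deletion or access policy without needing
# to know anything about file paths.
head = s3.head_object(Bucket="corp-records",
                      Key="records/board_minutes_2019-04.pdf")
print(head["Metadata"])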

Creating Value with an Enterprise Data Bazaar

Lead researcher: Katy Ring, Research Director – IT Services

At 451 Research, we believe 'the enterprise data bazaar' can help organizations that aim to become more agile by using data to inform the direction and development of their businesses. The term 'enterprise data bazaar' describes an environment where many people across an organization can access and leverage its data to build data-driven products.

To achieve this, businesses need unified data management layers, so that data scientists and subject-matter experts can decide how to deal with the stored data. These layers enable the use of datasets – or a data lake – to provide value without siloing information within the organization. However, many organizations have ended up with what could be described as a 'data swamp': a single environment housing large volumes of raw data that cannot be easily accessed for any purpose, let alone multiple uses. Creating a data bazaar with these management layers would break apart the swamp, putting security at the foundation of the approach and building out data governance and self-service data preparation functionality. A minimal sketch of such a layer follows this paragraph.
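The sketch below, with invented dataset names and roles, registers each dataset with an owner, a sensitivity label and the roles allowed to use it; that registration step is what separates a navigable bazaar from a swamp:

```python
# Illustrative sketch: a minimal governed catalog over a raw data lake.
# All dataset names, paths and roles are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Dataset:
    path: str                 # where the raw files live in the lake
    owner: str                # accountable subject-matter expert
    sensitivity: str          # e.g. "public", "internal", "pii"
    allowed_roles: set[str] = field(default_factory=set)

CATALOG: dict[str, Dataset] = {}

def register(name: str, ds: Dataset) -> None:
    """Add a dataset to the governed catalog."""
    CATALOG[name] = ds

def can_access(name: str, role: str) -> bool:
    """Only cataloged datasets are reachable, and only by permitted roles."""
    ds = CATALOG.get(name)
    return ds is not None and role in ds.allowed_roles

register("customer_orders",
         Dataset("s3://lake/raw/orders/", "sales-ops", "pii",
                 {"data-scientist", "cdo"}))

print(can_access("customer_orders", "data-scientist"))  # True
print(can_access("customer_orders", "marketing"))       # False
```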

When speaking to our clients that have data lakes, many are struck by the realization that they did not fully comprehend the risks associated with what they have built. Companies struggle to audit their lakes as part of compliance measures because each source system has different governance and security policies. The struggle is compounded by the self-service nature of a data lake, where data can be accessed for nearly any purpose, making it unclear whether a company has protected PII as required by regulations like GDPR.

When companies find themselves in this scenario, vendors and service providers often recommend opening an internal chief data officer (CDO) role that can help get the business back on track; together, this group can work out a remedy for the situation. One solution is to build a 'sandbox' environment that combines company-wide policy, controls and metadata management with a 'citizen' data integrator tool, which lets users contribute back or develop analytics on the data they are using. With this type of tool, users can still access data in a self-service way, while the IT group or CDO oversees that access before anything moves to production as a data product.
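A minimal sketch of such a promotion gate, with hypothetical role and dataset names, might look like this: self-service preparation happens freely in the sandbox, but a governance sign-off is required before anything reaches production:

```python
# Illustrative sketch: a sandbox-to-production gate. Promotion of a
# prepared dataset requires sign-off by the IT group or CDO. Paths,
# names and approver roles are hypothetical.
from dataclasses import dataclass

APPROVERS = {"cdo", "it-governance"}

@dataclass
class DataProduct:
    name: str
    sandbox_path: str
    approved_by: set          # approver roles that have signed off

def promote(product: DataProduct) -> str:
    """Move a prepared dataset to production only once governance signs off."""
    if not product.approved_by & APPROVERS:
        raise PermissionError(f"{product.name}: awaiting CDO/IT approval")
    return product.sandbox_path.replace("/sandbox/", "/production/")

draft = DataProduct("churn_features", "s3://lake/sandbox/churn_features/", set())
try:
    promote(draft)
except PermissionError as e:
    print(e)                  # churn_features: awaiting CDO/IT approval

draft.approved_by.add("cdo")
print(promote(draft))         # s3://lake/production/churn_features/
```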

In addition to this self-service 'sandbox' data preparation layer, IT service providers can help companies with data governance and the data supply chain. Such providers assist in sourcing, managing and enriching the data, and sell managed services for policing data consumption. In an audit, for example, organizations need to know what data they hold, who uses it and for what purpose. Regulation of this kind provides a strong opportunity for developing the enterprise data bazaar.
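One hedged sketch of the bookkeeping behind those audit questions, with invented field names, is an append-only access log recording which dataset was used, by whom, and for what purpose:

```python
# Illustrative sketch: an append-only access log that answers the audit
# questions above. Storage format and field names are hypothetical.
import json, time
from pathlib import Path

AUDIT_LOG = Path("audit_log.jsonl")

def record_access(dataset: str, user: str, purpose: str) -> None:
    """Append one immutable line per access; auditors replay the file."""
    entry = {"ts": time.time(), "dataset": dataset,
             "user": user, "purpose": purpose}
    with AUDIT_LOG.open("a") as f:
        f.write(json.dumps(entry) + "\n")

def who_used(dataset: str) -> list[dict]:
    """Everything an auditor needs for one dataset, straight from the log."""
    with AUDIT_LOG.open() as f:
        return [e for line in f
                if (e := json.loads(line))["dataset"] == dataset]

record_access("customer_orders", "k.ring", "quarterly churn analysis")
print(who_used("customer_orders"))
```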

Furthermore, the self-service analytics and governance layers need to be architected the right way to enable a range of use cases over time, and this is often not what results from a single-use-case project. This is why the CDO role is so important: this individual is the internal champion with the authority to win agreement on a company-wide strategy for the capture, management and sharing of data.

Katy Ring, research director of IT services at 451 Research, examines the benefits of the enterprise data bazaar, along with the technologies, service providers and strategies used to enable it, in her Technology and Business Impact report on the Enterprise Data Bazaar. Learn more about this report.

Assessing the Impact of Data Science on the Analytics Landscape

The evolution of data science – including machine learning, deep learning, and other forms of artificial intelligence – has had a significant impact on the data analytics landscape in recent years, and looks set to drive considerable change in the market in the coming years.

In response, 451 Research’s new Data Management and Analytics Market Map 2018 includes a complete re-categorization of our Analytics Market Map to reflect the realities of analytics users and use cases today, dividing the analytics market into four key sectors:
  • Analytics tools
  • Analytics platforms
  • Data science tools
  • Data science platforms
During this webinar, Matt Aslett, 451 Research's Research Director for Data Platforms and Analytics, will explain the rationale and definitions behind the new categorization, identify the key challenges and innovations that will shape the analytics and data science market, and discuss revenue and growth expectations.

The webinar will also touch on other aspects of note delivered with the Data Management and Analytics Market Map 2018, including:
  • The ongoing evolution of Corporate Performance Management
  • The addition of Data Science Management to the Data Management Market Map