If you work in the IT industry, there is no escape from statements about how much data we’re creating these days. According to IBM, we now create 2.5 quintillion bytes of data (10 million Blu-ray disks) daily. A few years ago, EMC estimated that we would create 44 zettabytes, which is 44 trillion gigabytes, by 2020. IDC recently upped that prediction by forecasting that the amount of data created in the digital universe would reach 163 zettabytes by 2025. Beyond the zettabyte, we can count in yottabytes (1,000 zettabytes), brontobytes (a one followed by 27 zeros) and even gegobytes (a one followed by 30 zeros).
While this is more data than we can even wrap our minds around, the trouble is that most of it goes unused. This week, everyone wants to see a friend’s wedding photos on social media, but a month later those pictures draw far fewer views (cold data, in IT terms), and after that we might only revisit them on their five- and 10-year anniversaries.
Business data behaves in a very similar way. When created, much of our data is active, or hot, but it cools over time. It can become warm again, often in regular cycles, such as when reports need to be prepared at the end of a month, quarter or year.
All this data needs to be stored somewhere, and that means big business. In fact, a Gartner report reveals that companies will spend $175 billion on data center systems in 2017. Inefficiency is a big reason for this. Since it has historically been difficult to move data, IT has to overspend on storage that will deliver performance for data’s one shining moment. For the rest of its lifetime, most of our data unnecessarily eats up expensive storage capacity.
See What You’ve Been Missing
Not only is data hard to move because moving it interrupts applications, but it’s also hard for enterprises to see what’s going on — especially the activity level of the data files stored on various storage systems. When there are performance or cost problems, it’s difficult to determine which data is hot and which is not, so that steps can be taken to fix them. As many enterprises now seek to migrate to the cloud to save costs and increase agility, it’s also becoming critical to identify which data can be safely archived. Generally, cloud storage is slower than most on-premises options today (a key reason it costs less), so IT wants to make sure the data going to the cloud won’t interrupt any key business cycles.
Knowing anything about data starts with metadata — the data about your data. This includes a file’s size, its location, when it was last opened and by whom, when it was last written, and so on. One approach we are working on is a metadata engine: software that collects this information to give IT visibility into its data’s activity, revealing which data is active and needs to stay on fast storage, and which is inactive and can be moved to a colder resource like the cloud.
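To make the idea concrete, here is a minimal sketch (not any vendor's actual engine) of collecting this kind of metadata with Python's standard library. The `stat` call exposes size and access/modification timestamps; the 30-day hot/cold threshold is a made-up example of the sort of activity cutoff an administrator might choose:

```python
import os
import time
from pathlib import Path

HOT_THRESHOLD_DAYS = 30  # hypothetical cutoff: untouched longer than this = "cold"

def classify(path: str) -> dict:
    """Collect basic filesystem metadata for one file and label it hot or cold
    based on how recently it was accessed."""
    st = os.stat(path)
    age_days = (time.time() - st.st_atime) / 86400  # days since last access
    return {
        "path": path,
        "size_bytes": st.st_size,
        "last_accessed_days_ago": round(age_days, 1),
        "tier": "hot" if age_days <= HOT_THRESHOLD_DAYS else "cold",
    }

def scan(root: str):
    """Walk a directory tree and yield one metadata record per file."""
    for p in Path(root).rglob("*"):
        if p.is_file():
            yield classify(str(p))
```

Note that `st_atime` depends on the filesystem recording access times (some mount options disable it); a production metadata engine would draw on richer sources, including who opened each file, which plain `os.stat` does not provide.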
In addition to a metadata engine, metadata is increasingly being used for insights in the big data and BI software segments. In those cases, big data and BI applications gain access to metadata for data mining. There are also companies out there that offer the ability to tier data on different types of storage media (flash drive or disk) within their own storage products. So, as an alternative to using metadata to manage data across different storage systems, enterprises could look into adopting storage tiering products from a single vendor, as some large storage vendors provide the ability to move data across their products.
Automatically Tier The Right Data To The Right Resource
Beyond making data visible, putting this knowledge to work is the second half of solving the challenge of getting the right data to the right place at the right time. In addition to providing granular visibility into data activity, a metadata engine can also give IT the ability to automatically move data without application interruption. This helps IT overcome the pain of manual data migrations and transition to automatically tiering data across different resources in the data center throughout the data’s lifetime. Data virtualization makes this possible by separating the data path from the control path so that IT can manage data according to objectives for performance, protection and cost. With the intelligence and insights gained by metadata analysis, cool data that is needed regularly for reports can automatically move back to a faster storage resource to be ready and waiting when it’s time to crunch numbers.
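The mechanical core of tiering, stripped of the hard part (doing it without interrupting applications), is just relocating a file between storage roots while preserving its relative path. A rough sketch under those assumptions, with hypothetical "hot" and "cold" root directories standing in for fast and archival storage:

```python
import shutil
from pathlib import Path

def move_to_tier(src: Path, src_root: Path, dst_root: Path) -> Path:
    """Relocate a file from one storage tier to another, preserving its
    path relative to the source root. A sketch only: a real metadata
    engine virtualizes the data path so applications never notice the move."""
    rel = src.relative_to(src_root)      # e.g. reports/q1.csv
    dst = dst_root / rel
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dst))
    return dst
```

The same function works in both directions — demoting cold files to the archive tier, or promoting month-end report data back to fast storage ahead of a known reporting cycle — by swapping which root is source and which is destination.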
As an executive, I know that the one problem you can’t solve is the one you don’t know about. Managing by metadata gives unprecedented clarity and visibility about data activity. This intelligence enables IT to automatically align storage capabilities with evolving business needs while keeping costs in check, even as data sprawls into the stratosphere of brontobytes.