Katrina Biscay, MS-DFS, CISSP, GMON, Director, Office of Information Security, University of Cincinnati
It’s a hot day and the sun shining through the window is warming up your office. You walk over and lower the thermostat temperature, and unwittingly add a drop to the vast pool of “dark data” your university collects and stores.
“Dark data” sounds like a concept taken from a popular spy novel, but it’s much simpler. Gartner defines dark data as:
“The information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes.”
These data assets are generated by the normal operation of systems in a technology environment, and they fit the 4Us: unknown, undigested, unstructured, and therefore unused. The ongoing trend of “Big Data” analytics is only touching the tip of this data iceberg; it is estimated that over 90% still remains untouched. This underwater colossus is the extent of your “dark data”.
Traditional “Dark Data”
Over the years, organizations have become digital hoarders, keeping emails, documents, presentations, meeting minutes, financial reports, and old personnel records, often with multiple copies both on premises and in the cloud. These files accumulate unchecked and forgotten, long after any primary business or educational value has ended. They hide in backup tapes, old file shares, legacy systems and databases, log aggregators, and archival storage. Learning management and email systems hold a wealth of “dark data” and are for the most part untapped.
Non-Traditional “Dark Data”
The majority of “dark data” is actually generated by Internet of Things (IoT) devices. These range from the small, internet-enabled helpers that make our lives easier at home to the huge industrial SCADA systems that manage plants and power grids. Most of this data is either unstructured or stored in formats that traditional analysis tools ignore, such as raw, encoded, video, and audio.
Another source is social media posts by students and the surrounding community about the university and their experience with it. The complexity of language structures and communication styles presents a challenge in processing this source for useful insights, but it is a reflection of your performance on an institutional scale.
Universities are competing with each other for enrollment numbers and research dollars. Deeper understanding of the student and faculty experience on campus can give a competitive advantage to those willing to invest in “dark analysis”.
"Not everything you find will yield value, but that small needle in the haystack may turn out to be made from gold"
Wireless network logs may, for example, show which communal areas students congregate in, and for how long, throughout the academic year. Drawing on that knowledge, building automation controls, such as HVAC and lighting, can be adjusted to provide comfort and to realize cost efficiencies. Network bandwidth can be tailored to match the need, and predictively allocated for informal events not typically tracked by the university.
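As a rough illustration of the wireless example, the sketch below totals connection time per campus area. The record layout (device, area, connect time, disconnect time), the area names, and the sample values are all hypothetical; real wireless controller logs will look different and would need to be parsed and anonymized first.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical, simplified session records: (device, area, connect, disconnect).
# Real controller logs use vendor-specific formats and must be parsed first.
SESSIONS = [
    ("dev-a", "Library Atrium", "2024-03-04T09:00", "2024-03-04T10:30"),
    ("dev-b", "Library Atrium", "2024-03-04T09:15", "2024-03-04T09:45"),
    ("dev-a", "Student Union", "2024-03-04T12:00", "2024-03-04T12:40"),
]


def dwell_minutes_by_area(sessions):
    """Sum the connected minutes per area across all sessions."""
    totals = defaultdict(float)
    for _device, area, start, end in sessions:
        delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
        totals[area] += delta.total_seconds() / 60
    return dict(totals)


print(dwell_minutes_by_area(SESSIONS))
# {'Library Atrium': 120.0, 'Student Union': 40.0}
```

Totals like these could then be compared across weeks or buildings before any change is made to HVAC schedules or bandwidth allocation.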
Analyzing patterns in LMS usage can improve the student academic experience by predicting which students are more likely to drop classes based on homework submission patterns. Gaps in assignment completion and grade averages can shed light on deficiencies in study skills and on the impact of campus events on academics.
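One very simple way to look for gaps in assignment completion is sketched below. It is a heuristic, not a predictive model: it flags any student whose longest run of consecutive missed assignments reaches a threshold. The data layout, the student identifiers, and the threshold of two are assumptions made for illustration, not features of any particular LMS.

```python
# Hypothetical export: student id -> one flag per assignment, in due-date order
# (True = submitted, False = missed).
RECORDS = {
    "student-1": [True, True, False, True, True],
    "student-2": [True, False, False, False, True],
    "student-3": [True, True, True, True, True],
}


def longest_missed_streak(submitted):
    """Return the length of the longest run of consecutive missed assignments."""
    longest = current = 0
    for done in submitted:
        current = 0 if done else current + 1
        longest = max(longest, current)
    return longest


def flag_at_risk(records, threshold=2):
    """Return the sorted ids of students with a missed streak of at least `threshold`."""
    return sorted(
        student
        for student, flags in records.items()
        if longest_missed_streak(flags) >= threshold
    )


print(flag_at_risk(RECORDS))
# ['student-2']
```

A flag of this kind is only a prompt for an advisor to follow up; whether it actually correlates with dropped classes would have to be checked against historical enrollment data.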
The embedded advantage is that this data is already possessed and stored by the organization. The only investments necessary would be targeted analysis and business correlation.
The primary, and unintended, consequence of keeping so much data is the risk it creates. Frequently, organizations that do not currently have the means to analyze the data retain it for future use, in hopes of creating a business case. But not knowing what you are storing also means not knowing what needs protecting.
1. As always, there is a cost, and not just in lost opportunity for knowledge. All this data must be stored either on premises or at a subscription cost in the cloud. Although significantly cheaper now, storage is not free. Deciding what is valuable and required to be kept is a challenge many organizations face.
2. The more you collect, the more you have to protect from both external attackers and insider threats. What appears “old” to the business may hold tremendous value on the black market or to countries with weaker economies. And the organization will still carry the liability for the breach. In case of litigation, the university may also be compelled to produce data it has long forgotten.
3. The additional cost of compliance is also a concern. Any files containing regulated information, such as PII, ePHI, and payment card data covered by PCI DSS, have to be protected to the level of ever-changing compliance frameworks. If you fail to adequately secure this data, you open up your organization to data breaches, reputational loss, and regulatory fines. And how can you secure it if you do not know what you have? In the case of GDPR, the risk can be significant because the regulation assumes that, upon an individual's request, their data can be quickly produced in a readable format or completely erased. The costs associated with fulfilling such a request can range from data processing, to physical access, to penalties for failure.
Taking the ostrich head-in-the-sand approach to managing “dark data” will not serve your organization well in the long term. To keep your organization safe:
1. Assess what you have – The first step is to determine how big your data iceberg really is. Identify the primary and secondary data stores on premises and in the cloud, including off-line backups. From there, categorize all the data types and sources with a focus on potential business usability. The purpose is to separate the useful data from the digital noise.
2. Enforce retention policies – Create and enforce data retention policies in compliance with applicable regulatory requirements. Add a consistent and unified master set of identifiers to the data through automated “Big Data” analysis tools. Align the two to provide uniformity, accuracy, stewardship, semantic consistency, and accountability of the data assets.
3. Encrypt everything – The initial assessment and ongoing management will take time. Consider encrypting as much of the data as possible, until you are ready to deal with it. The potential for an unintended disclosure or a breach will be significantly reduced.
4. Make a plan – Once the initial assessment is complete, make a plan for managing new incoming “dark data” before it overwhelms you again.
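The first two steps can start small. The sketch below is a minimal illustration, assuming that a file's last-modified time is an acceptable stand-in for its age and that one retention window applies to a whole file share. Neither assumption will hold everywhere, and the function name and parameters are hypothetical. It only lists candidates for review and deletes nothing.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def stale_files(root, max_age_days, now=None):
    """List (path, size_in_bytes) for files under `root` whose last-modified
    time is older than `max_age_days`. Read-only: nothing is changed."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    stale = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            info = path.stat()
            if info.st_mtime < cutoff:
                stale.append((str(path), info.st_size))
    return sorted(stale)
```

Calling `stale_files("/path/to/share", 365 * 7)`, for example, would list files untouched for roughly seven years under that placeholder path. The resulting list is a starting point for the data owner's review against the actual retention policy, not a deletion list.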
There is little chance that the volume of “dark data” organizations have to deal with will decrease. Don’t be afraid to dig deep. Apathy or fear of information security may lead to short-term cost savings, but will leave your data treasures up for grabs. The goal is to limit potential risks and costs, while gaining valuable business insight through targeted analysis. Not everything you find will yield value, but that small needle in the haystack may turn out to be made from gold.