Covering Scientific & Technical AI

Untamed Data Is Undermining the AI Revolution

by Krishna Subramanian
December 17, 2025

Across industries, organizations are drowning in unstructured data: files, videos, images, chat logs, design documents, and other digital debris that defy easy categorization. Analysts estimate that unstructured data accounts for up to 80 percent of enterprise information, yet most organizations have little idea what’s in it, who owns it, or how sensitive it may be. That ignorance is not benign; it’s costly, risky, and holding back progress in AI and analytics.

Recent research from Komprise underscores this gap. Nearly 60 percent of enterprise IT leaders cite unstructured data classification as a major technical barrier to scaling AI. On the business side, 62 percent say their top unstructured data challenge is reducing data risk from AI. Both concerns point to the same root issue: without effective data classification, organizations can’t safely or efficiently use what they already have.

Classification, the process of tagging, categorizing, and labeling data based on content, organizational context, sensitivity, or purpose, sounds like a simple administrative task. In practice, it’s a foundational capability that determines how well an organization can leverage its most valuable digital asset. It is inherently harder with unstructured data, which is neither as well understood nor as well organized as structured data and lacks its built-in context. On top of that, most organizations today manage more than 5PB of unstructured data, which can easily amount to more than five billion files, according to Komprise research. That makes manual approaches untenable at scale.
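
To make that scale concrete, here is a rough back-of-the-envelope sketch in Python. The 5PB and five billion file figures come from the Komprise research cited above; the one-second-per-file review rate is purely an assumption chosen for illustration, and decimal petabytes (10^15 bytes) are assumed as well.

```python
# Back-of-the-envelope arithmetic for the scale described above.
PETABYTE = 10**15                      # decimal petabyte (assumption)
total_bytes = 5 * PETABYTE             # 5PB, from the cited research
file_count = 5_000_000_000             # five billion files, from the cited research

# Average file size implied by those two figures.
avg_file_bytes = total_bytes // file_count

# Hypothetical manual review rate: one second per file (illustrative assumption).
SECONDS_PER_FILE = 1
SECONDS_PER_YEAR = 365 * 24 * 3600

# Years a single reviewer would need working nonstop at that rate.
years_single_reviewer = file_count * SECONDS_PER_FILE / SECONDS_PER_YEAR

print(f"average file size: {avg_file_bytes:,} bytes")
print(f"one reviewer, nonstop: about {years_single_reviewer:.0f} years")
```

Even under this generous assumption, a single person tagging files around the clock would need well over a century, which is why automation is the only realistic option.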

Why Classification Matters

At its core, classification bridges the divide between IT control and business value. For IT teams, it’s about curation, optimization, and protection. For business leaders, it’s about trust, speed, AI ROI, and insight. Here’s what I mean:

Curation for AI and analytics: AI models are only as good as the data that feeds them. If organizations can’t distinguish relevant, high-quality data from noise, model accuracy suffers. Unstructured data quality is not just about what’s in a file. It is also significantly affected by “noise”: the redundant, irrelevant, duplicate, and often conflicting versions of the same artifacts. Classification helps curate the “right” data, tagging content that’s useful for specific AI use cases while filtering out outdated, non-authoritative, or irrelevant material. This not only improves AI performance but also accelerates deployment timelines.
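
One simple form of noise reduction is spotting exact duplicates. The sketch below is a minimal illustration, not a description of any particular product: it groups files by the SHA-256 hash of their content. The function name `find_duplicates` and the sample file names are hypothetical, and real curation would also need to handle near-duplicates and conflicting versions, which a content hash cannot detect.

```python
import hashlib
from collections import defaultdict

def find_duplicates(files):
    """Group file names by the SHA-256 digest of their content.

    `files` maps a (hypothetical) file name to its raw bytes.
    Returns only the groups that contain more than one file.
    """
    groups = defaultdict(list)
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        groups[digest].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]

# Illustrative sample data, not taken from the article.
sample = {
    "spec_v1.docx": b"design spec",
    "spec_v1_copy.docx": b"design spec",
    "notes.txt": b"meeting notes",
}
duplicates = find_duplicates(sample)
print(duplicates)
```

Files that land in the same group are byte-for-byte identical, so all but one copy can be tagged as redundant and excluded from an AI data set.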

Storage optimization and cost control: Understanding the difference between “hot” data (frequently accessed, high business value) and “cold” data (rarely accessed, archival) is critical for managing storage efficiently. Classification enables intelligent tiering across storage platforms, moving infrequently used data to cheaper storage tiers while keeping mission-critical data instantly accessible. For global enterprises managing petabytes across on-premises and cloud systems, this can translate to millions in annual savings. Given that most enterprises (74 percent, according to the Komprise survey) hold more than 5PB of unstructured data, this is now a must-have strategy.
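
As a rough illustration of the hot/cold distinction, the following sketch assigns a tier based only on how long ago a file was last accessed. The one-year threshold, the function name `assign_tier`, and the sample files are assumptions made for this example; a real tiering policy would typically also weigh business value, owner, and file type.

```python
from datetime import datetime, timedelta

# Assumed policy: anything untouched for more than a year counts as cold.
COLD_AFTER = timedelta(days=365)

def assign_tier(last_accessed, now, cold_after=COLD_AFTER):
    """Return "cold" if the file has not been accessed within `cold_after`."""
    return "cold" if now - last_accessed > cold_after else "hot"

# Illustrative data: file name -> last access time.
now = datetime(2025, 12, 17)
files = {
    "q3_report.pdf": datetime(2025, 11, 30),
    "2019_archive.zip": datetime(2020, 1, 5),
}
tiers = {name: assign_tier(ts, now) for name, ts in files.items()}
print(tiers)
```

The resulting tier label is itself a classification tag that a data mover could act on, for example by relocating cold files to cheaper storage.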

Protecting misplaced sensitive data: Sensitive data, such as PII, PHI and intellectual property, often lurks in unexpected places. Without classification, these files remain hidden, unmonitored, and vulnerable to exposure. Classification is necessary for automated detection and confinement of sensitive data, ensuring compliance with privacy laws and reducing the blast radius of potential breaches.
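
A minimal sketch of automated detection might scan text for patterns that look like personal data. The two regular expressions below (an email address and a US Social Security number format) and the function name `sensitivity_tags` are illustrative assumptions; production scanners rely on far broader pattern libraries, validation, and context to limit false positives.

```python
import re

# Illustrative patterns only; real PII detection needs many more.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def sensitivity_tags(text):
    """Return the sorted labels of every pattern found in `text`."""
    return sorted(
        label for label, pattern in PATTERNS.items() if pattern.search(text)
    )

print(sensitivity_tags("Contact jane@example.com, SSN 123-45-6789"))
print(sensitivity_tags("Quarterly roadmap"))
```

Files that return one or more tags could then be routed to a policy action such as quarantine or encryption.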

Why Unstructured Data Classification is Difficult

Despite the clear benefits, unstructured data classification remains a stubborn problem. The culprit is architectural fragmentation.

Most enterprises rely on two or more storage platforms in their data centers (network-attached storage, object stores, backup systems) plus one or several cloud services. Each platform can only “see” the data it stores. Metadata indexing, enrichment, and tagging happen in isolated silos, and search or policy-based actions (like encrypting or quarantining sensitive files) rarely extend across environments.

The result is a patchwork of visibility, incomplete metadata, and inconsistent policy enforcement. These fragmented processes don’t scale with the pace of data growth or the velocity of business change. As data volumes double every few years, manual tagging and siloed tools simply can’t keep up.

IT organizations need unified visibility and a cross-platform metadata layer that indexes and enriches information regardless of where it lives. Only then can they apply consistent classification logic, automate tagging, and enforce policies at scale.
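
Conceptually, such a layer can be pictured as one index keyed by platform and path, with tags applied through a single rule instead of once per silo. The sketch below is a toy in-memory version under that assumption; the names `FileRecord` and `MetadataIndex`, the platform labels, and the sample paths are all hypothetical and do not describe any specific product.

```python
from dataclasses import dataclass, field

@dataclass
class FileRecord:
    platform: str      # e.g. "nas" or "s3" (illustrative labels)
    path: str
    size_bytes: int
    tags: set = field(default_factory=set)

class MetadataIndex:
    """Toy cross-platform index: one place to tag and search all records."""

    def __init__(self):
        self._records = {}

    def add(self, record):
        # Key on (platform, path) so the same path on two platforms stays distinct.
        self._records[(record.platform, record.path)] = record

    def tag_where(self, predicate, tag):
        """Apply `tag` to every record matching `predicate`; return the count."""
        count = 0
        for record in self._records.values():
            if predicate(record):
                record.tags.add(tag)
                count += 1
        return count

    def find(self, tag):
        """Return sorted (platform, path) pairs for records carrying `tag`."""
        return sorted(
            (r.platform, r.path) for r in self._records.values() if tag in r.tags
        )

index = MetadataIndex()
index.add(FileRecord("nas", "/projects/plan.docx", 120_000))
index.add(FileRecord("s3", "exports/customers.csv", 5_000_000))
index.add(FileRecord("nas", "/hr/payroll.csv", 80_000))

tagged = index.tag_where(lambda r: r.path.endswith(".csv"), "tabular")
matches = index.find("tabular")
print(tagged, matches)
```

The point of the design is that one rule reaches files on every platform at once, which is what makes consistent policy enforcement possible.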

Unstructured Data Management: From Chaos to Control

Effective unstructured data management isn’t about more storage; it’s about more intelligence. Classification turns raw data into governed, actionable assets. But achieving this requires both technical and cultural change. Here’s how to do it:

Invest in unified visibility tools: A single metadata index across all storage platforms is the first step toward breaking down silos.
Automate wherever possible: Machine learning models can classify content at scale based on file type, content patterns, and sensitivity.
Align IT and business goals: Classification shouldn’t just satisfy compliance; it should bring faster insights, better AI outcomes, and data-driven decision-making.
Continuously refine: Data evolves and so must the classification schema. Regular audits and feedback loops keep categories accurate and relevant.

The Bottom Line

Unstructured data is growing faster than organizations can store or understand it. Without classification, enterprises are flying blind, wasting resources, exposing themselves to risk, and missing opportunities to innovate with AI.

The path forward is clear: make classification a first-class discipline. It’s not just a technical exercise but a business imperative that determines how well an organization can protect, optimize, and extract value from its information.

In the data-driven economy, the companies that master unstructured data classification at scale will be the ones that turn unstructured chaos into competitive advantage.

About the Author

Krishna Subramanian is the co-founder, president and COO of Komprise. She has spent more than 21 years as a senior software executive, founding, building, merging, and acquiring businesses that generated more than $500M in new revenue, both as founder/CEO of a start-up backed by tier-one VCs like NEA and as a corporate development leader at Sun. She has a proven ability to spot emerging market opportunities before they become major trends, identify and source opportunities, and formulate and grow new businesses in areas such as cloud computing, SaaS, and enterprise collaboration.
