When Data Moves, Risk Moves with It: The Hidden Challenges of Warehousing Data

For many organizations, the move to a modern data architecture brings with it a host of appealing possibilities. It promises centralized data, dismantled silos, advanced analytics, powerful dashboards, forecasting models, and, of course, everyone’s favorite topic: artificial intelligence. The data warehouse or data lake—often cloud-based, scalable, and fast—becomes the foundation for meeting the ultimate goal of more informed, data-driven decision-making.

Yet for internal auditors, the story rarely ends there. Behind every sleek analytics platform is a long, complex, and often painful process of moving data from where it lives today into where leaders want it to be tomorrow. That process—data ingestion, transformation, and integration—is where many of the most significant risks quietly emerge.

Companies tend to underestimate how difficult it is to move data well. The technical lift is substantial, but the organizational, governance, and control challenges are often even greater. As data volumes grow and expectations rise, internal audit functions are increasingly being asked to assess whether the journey to a data warehouse or data lake has introduced new risks, weakened controls, or created blind spots that management did not anticipate.

The result is a familiar pattern: the platform goes live, dashboards light up, and executives celebrate progress—while unresolved data quality issues, security gaps, and ownership ambiguities linger beneath the surface.

“As data streaming adoption grows, organizations need to address the importance of governance. Using many different data lakes and tools—with various governance models, schema formats and latency profiles—can be difficult to manage,” says Nicolas Orban, CEO of Conduktor, an intelligent data hub for streaming data and AI. “Fragmented data creates chaos, including missed signals, duplicated work, and poor decisions.”

The Illusion of ‘Just Moving Data’

One of the most common misconceptions is that moving data into a warehouse or data lake is primarily a technical exercise. Extract the data from source systems, load it into a centralized environment, and let analytics teams take it from there. In reality, data movement is less like relocating boxes and more like translating languages while rebuilding a house.

Most enterprise data originates in transactional systems designed for operations, not analytics. ERP platforms, claims systems, loan origination tools, HR applications, and bespoke legacy systems all store data differently, enforce different rules, and evolve on their own timelines. When companies attempt to consolidate this information, inconsistencies quickly surface.

Customer identifiers may not align across systems. Dates may be stored in different formats. Key fields may be optional in one system and mandatory in another. Over time, business users often adapt systems to meet local needs, creating undocumented workarounds that only become visible when data is pulled together for enterprise-wide reporting.
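To make those mismatches concrete, here is a minimal sketch in Python (using pandas) of the reconciliation checks that tend to expose them during consolidation. The two source extracts, field names, and formats are hypothetical, and the checks are illustrative rather than exhaustive.

```python
import pandas as pd

# Hypothetical extracts from two source systems being consolidated.
crm = pd.DataFrame({
    "customer_id": ["C-001", "c001", "C-002"],                  # inconsistent ID formats
    "signup_date": ["2023-01-15", "15/01/2023", "2023-02-01"],  # mixed date formats
})
billing = pd.DataFrame({
    "customer_id": ["C001", "C002", "C003"],
    "region": ["EU", None, "US"],  # optional here, mandatory in downstream reporting
})

def normalize_id(raw: str) -> str:
    # Normalize identifier formats before attempting to match across systems.
    return raw.upper().replace("-", "")

crm["customer_key"] = crm["customer_id"].map(normalize_id)
billing["customer_key"] = billing["customer_id"].map(normalize_id)

# Parse dates defensively: rows that do not match the inferred format
# become NaT and are flagged rather than silently mis-parsed.
crm["signup_parsed"] = pd.to_datetime(crm["signup_date"], errors="coerce")

# Surface the discrepancies instead of letting them vanish into the warehouse.
unmatched = set(crm["customer_key"]) ^ set(billing["customer_key"])
bad_dates = crm[crm["signup_parsed"].isna()]
missing_region = billing[billing["region"].isna()]

print("IDs present in only one system:", sorted(unmatched))
print("Rows with unparseable dates:", len(bad_dates))
print("Rows missing a mandatory field:", len(missing_region))
```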

Internal auditors frequently encounter situations where the data warehouse is technically “accurate” but conceptually misleading. The numbers reconcile at a high level, yet subtle differences in definitions produce materially different interpretations. Revenue, risk exposure, headcount, or customer churn may mean slightly different things depending on the source. Once that ambiguity enters a centralized platform, it can propagate quickly, amplified by dashboards and automated reports.

From an audit perspective, this creates a new category of risk: management decisions made with confidence in data that appears authoritative but is not consistently defined or understood.

Data Quality: The Risk That Scales Fastest

Data quality challenges are not new, but data warehouses and data lakes magnify their impact. A single error in a source system can ripple through dozens of downstream reports. A flawed transformation rule can quietly distort results for months before anyone notices.

Organizations often assume that centralization will improve data quality by default. In practice, centralization simply concentrates existing problems unless deliberate controls are designed into the ingestion process. Missing values, duplicate records, outdated reference data, and invalid entries do not disappear when data is moved; they accumulate.
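What “deliberate controls designed into the ingestion process” can look like in practice is sketched below: gate checks for completeness, uniqueness, and validity that quarantine and log failing rows instead of passing them through or silently dropping them. The field names, reference values, and thresholds are illustrative assumptions, not a prescribed standard.

```python
import logging
import pandas as pd

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ingestion_checks")

VALID_STATUSES = {"ACTIVE", "CLOSED", "DELINQUENT"}  # hypothetical reference data

def gate_checks(batch: pd.DataFrame) -> pd.DataFrame:
    """Run completeness, uniqueness, and validity checks on an incoming batch.
    Failing rows are quarantined and logged, not silently dropped or loaded."""
    issues = pd.Series(False, index=batch.index)

    # Completeness: required fields must be populated.
    missing = batch["account_id"].isna() | batch["balance"].isna()
    issues |= missing

    # Uniqueness: duplicate business keys distort downstream aggregates.
    dupes = batch.duplicated(subset="account_id", keep=False)
    issues |= dupes

    # Validity: values must come from the agreed reference set.
    invalid = ~batch["status"].isin(VALID_STATUSES)
    issues |= invalid

    quarantined = batch[issues]
    if not quarantined.empty:
        # The exception trail is the control: it creates evidence auditors can test.
        log.warning("Quarantined %d of %d rows (missing=%d, dupes=%d, invalid=%d)",
                    len(quarantined), len(batch),
                    int(missing.sum()), int(dupes.sum()), int(invalid.sum()))
    return batch[~issues]

# Example batch with one of each defect type plus one clean row.
batch = pd.DataFrame({
    "account_id": ["A1", "A1", "A2", "A3", None],
    "balance":    [100.0, 100.0, None, 75.0, 50.0],
    "status":     ["ACTIVE", "ACTIVE", "CLOSED", "CLOSED", "FROZEN"],
})
clean = gate_checks(batch)
```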

Consider a financial services company that consolidates loan data from multiple regional systems into a cloud-based warehouse to support credit risk modeling. Each region has historically applied slightly different rules for classifying delinquency. When those datasets are merged, the analytics team builds models on top of inconsistent assumptions. The models perform well statistically, but they embed structural bias that auditors later struggle to untangle.
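One way to avoid embedding those inconsistent assumptions is to make the regional mapping itself an explicit, reviewable artifact rather than logic buried in model code. A minimal sketch, with hypothetical regional codes and canonical buckets:

```python
# Hypothetical region-specific delinquency codes mapped to one canonical scheme.
# The canonical buckets are the coarsest scheme both regions can support, and the
# mapping table becomes something governance can review and auditors can test.
CANONICAL_BUCKETS = {
    # region -> {source code -> canonical days-past-due bucket}
    "north": {"L1": "1-29", "L2": "30-89", "L3": "90+"},
    "south": {"A": "1-29", "B": "30-89", "C": "30-89", "D": "90+"},
}

def to_canonical(region: str, code: str) -> str:
    mapping = CANONICAL_BUCKETS.get(region)
    if mapping is None or code not in mapping:
        # Refuse to guess: an unmapped code is a governance decision,
        # not something a pipeline should resolve silently.
        raise KeyError(f"No canonical bucket for region={region!r}, code={code!r}")
    return mapping[code]

print(to_canonical("north", "L2"))  # 30-89
print(to_canonical("south", "C"))   # 30-89
```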

For internal audit, the challenge is not only identifying data quality issues but determining accountability. Who owns the accuracy of data once it leaves the source system? Is it the business unit that generated it, the IT team that moved it, or the analytics function that transformed it? In many organizations, the answer is “everyone and no one,” which makes remediation slow and politically difficult.

Transformation Logic: Where Controls Often Go Missing

Data rarely moves into warehouses or lakes in its raw form. It is filtered, enriched, aggregated, and restructured along the way. These transformations are essential, but they are also a common point of control failure.

Transformation logic is often developed quickly to meet reporting deadlines, using scripts or tools that are poorly documented and lightly tested. Over time, as new data sources are added and business requirements change, transformation rules become layered and complex. Small changes can have unintended consequences that are difficult to detect without rigorous controls.

Internal auditors reviewing these environments often find that change management practices have not kept pace with the criticality of the data. Code changes may be promoted without formal approval. Testing may focus on whether a pipeline runs, not whether the output is correct. Documentation may lag far behind reality, leaving key processes dependent on a handful of individuals who understand how the system really works.
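The gap between “the pipeline runs” and “the output is correct” is testable. Below is a minimal sketch using Python’s unittest with a hypothetical transformation rule; the point is the value-level assertion on the output, not the particular rule.

```python
import unittest
import pandas as pd

def net_revenue(orders: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical transformation rule: revenue net of refunds, by customer."""
    orders = orders.copy()
    # Sales count as positive amounts; refunds are subtracted.
    orders["net"] = orders["amount"].where(orders["type"] == "sale", -orders["amount"])
    return orders.groupby("customer", as_index=False)["net"].sum()

class NetRevenueTests(unittest.TestCase):
    def test_refunds_are_subtracted(self):
        # A run that completes is not evidence of correctness; assert on values.
        orders = pd.DataFrame({
            "customer": ["X", "X", "Y"],
            "type":     ["sale", "refund", "sale"],
            "amount":   [100.0, 30.0, 50.0],
        })
        result = net_revenue(orders).set_index("customer")["net"]
        self.assertEqual(result["X"], 70.0)
        self.assertEqual(result["Y"], 50.0)

if __name__ == "__main__":
    unittest.main()
```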

In regulated industries, this becomes more than an operational concern. When regulatory reporting relies on warehouse or lake data, weaknesses in transformation controls can translate directly into compliance risk. Even outside regulated reporting, management reporting errors can erode trust between executives, boards, and the functions responsible for delivering insights.

The Governance Gap in Data Lakes

Data lakes, in particular, introduce unique challenges. Designed to store large volumes of structured and unstructured data in their native formats, data lakes promise flexibility and speed. But that flexibility can easily slide into disorder.

Without strong governance, data lakes can become dumping grounds where data is ingested “just in case” it might be useful later. Metadata is incomplete or nonexistent. Naming conventions vary. Data lineage is unclear. Analysts pull datasets without fully understanding their origin or limitations.
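A lightweight control against the dumping-ground pattern is to refuse to catalog any dataset whose metadata is incomplete. A minimal sketch, with hypothetical required fields:

```python
from dataclasses import dataclass, field

REQUIRED_FIELDS = ("owner", "source_system", "classification", "description")

@dataclass
class DatasetRecord:
    name: str
    metadata: dict = field(default_factory=dict)

def register(catalog: dict, record: DatasetRecord) -> None:
    """Admit a dataset to the lake catalog only if its metadata is complete."""
    missing = [f for f in REQUIRED_FIELDS if not record.metadata.get(f)]
    if missing:
        raise ValueError(f"Cannot register {record.name!r}: missing metadata {missing}")
    catalog[record.name] = record

catalog: dict = {}
register(catalog, DatasetRecord(
    name="loans_raw_2024",
    metadata={"owner": "credit-risk", "source_system": "regional-los",
              "classification": "confidential", "description": "Raw loan extracts"},
))
# register(catalog, DatasetRecord(name="mystery_dump"))  # would raise ValueError
```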

From an internal audit standpoint, this lack of structure complicates assurance efforts. Traditional audit techniques rely on clear process flows and ownership. In a poorly governed data lake, it may be impossible to determine who is responsible for validating data, approving its use, or retiring obsolete datasets.

Security risks also increase. Sensitive data may be ingested without proper classification, encryption, or access controls. Because data lakes often integrate with multiple analytics tools, access paths multiply. A user who would never have been granted direct access to a source system may be able to query the same data indirectly through the lake.
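The comparison an access review might perform can be sketched simply: users who can query a sensitive dataset through the lake versus users entitled to it in the source system. The entitlement lists below are hypothetical placeholders for what would normally be pulled from access-management systems.

```python
# Hypothetical entitlement lists from the source system and the lake's
# analytics-tool grants. The review question: who gained access via the
# lake that they were never granted at the source?
source_system_users = {"alice", "bob"}
lake_query_users = {"alice", "bob", "carol", "dave"}  # includes BI-tool grants

indirect_only = lake_query_users - source_system_users
if indirect_only:
    print("Users with lake access but no source entitlement:", sorted(indirect_only))
```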

Auditors are increasingly being asked to assess not just whether controls exist, but whether the organization understands what data it holds, where it came from, and who can see it. In many cases, the answer is uncomfortable.

Legacy Systems: The Uncooperative Participants

Legacy systems deserve special mention because they often present the most stubborn obstacles. Many organizations rely on platforms that were never designed to feed modern analytics environments. Data extraction can be slow, incomplete, or disruptive to operations.

To work around these limitations, teams may rely on batch extracts, flat files, or manual processes. These stopgap solutions introduce additional risks, including data latency, incomplete captures, and opportunities for error or manipulation. Over time, they can become entrenched, even as the rest of the architecture modernizes around them.
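One compensating control for fragile extracts is an independent reconciliation between what the source emitted and what the platform received, such as a row count and content hash computed at both ends. A minimal sketch, assuming a flat-file extract with a single header row:

```python
import hashlib
from pathlib import Path

def extract_manifest(path: Path) -> dict:
    """Compute a row count and content hash for a flat-file extract so that
    sender and receiver can independently verify the same figures."""
    digest = hashlib.sha256()
    rows = 0
    with path.open("rb") as f:
        for line in f:
            digest.update(line)
            rows += 1
    return {"rows": rows - 1, "sha256": digest.hexdigest()}  # minus header row

# Sender computes a manifest at extract time; receiver recomputes on landing.
# A mismatch signals truncation, a partial capture, or tampering, and blocks the load.
sample = Path("extract.csv")
sample.write_text("id,balance\n1,100.0\n2,50.0\n")
print(extract_manifest(sample))  # e.g. {'rows': 2, 'sha256': '...'}
```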

Internal auditors often find that management underestimates the ongoing risk posed by these legacy dependencies. While the data warehouse or lake may be cloud-native and well-controlled, the weakest link remains the upstream system and the fragile processes used to extract its data.

This creates a false sense of security. Leaders see modern dashboards and assume the underlying processes are equally modern. Auditors, by contrast, are tasked with looking beneath the surface—and explaining why the risks are not where management expects them to be.

Speed Versus Control: A Tension That Never Disappears

Perhaps the most persistent challenge in data migration initiatives is the tension between speed and control. Business leaders want insights quickly. Analytics teams want flexibility. IT wants stability. Audit wants assurance. These priorities are not inherently incompatible, but they are often treated as such.

In fast-moving data programs, controls are frequently deferred with the intention of “adding them later.” In practice, later rarely comes. Once stakeholders are using reports and dashboards to make decisions, retrofitting controls becomes far more difficult. Any change that might disrupt output—even to improve accuracy—faces resistance.

Internal audit functions that engage too late in the process may find themselves in a reactive position, documenting issues that are already embedded in business-as-usual operations. Those that engage earlier have an opportunity to influence design decisions, advocating for controls that scale alongside the data platform rather than lag behind it.

This is not about slowing innovation. It is about recognizing that data movement is not a neutral act. Every design choice—what data to include, how to transform it, who can access it—has risk implications that deserve thoughtful consideration.

What This Means for Internal Audit

As data warehouses and data lakes become core infrastructure, internal audit’s role is evolving. Internal auditors are no longer just consumers of data produced by others; they are increasingly reviewers of the systems that produce that data.

This requires new skills and new conversations. Internal auditors need to understand data architecture well enough to ask the right questions, without trying to become engineers. They need to assess governance frameworks that cut across business and technology. And they need to help boards and executives understand that data risk is not confined to cybersecurity or privacy—it extends to accuracy, consistency, and decision integrity.

Perhaps most importantly, auditors must help organizations confront an uncomfortable truth: centralizing data does not automatically create clarity. In many cases, it simply concentrates complexity. The real work lies in managing that complexity with discipline, transparency, and accountability.

When data moves, risk moves with it. The organizations that recognize this early—and design their data platforms with both insight and assurance in mind—will be far better positioned to realize the promise of analytics without being blindsided by its unintended consequences.

Sidebar: The Questions Internal Auditors Should Be Asking About Data Warehouses and Data Lakes

As data warehouses and data lakes become critical infrastructure for reporting, analytics, and decision-making, internal audit’s value often lies less in technical detail and more in asking the right questions—early and persistently. These questions are not meant to turn auditors into data engineers. They are meant to surface ownership gaps, control weaknesses, and unexamined assumptions that can undermine confidence in enterprise data.

One of the first areas auditors should explore is accountability. When data moves from operational systems into a centralized environment, ownership often becomes blurred. Internal auditors should be asking who is ultimately responsible for the accuracy of data once it enters the warehouse or lake—and whether that responsibility is clearly understood and formally assigned. If data is wrong, incomplete, or misleading, who is expected to fix it? In many organizations, that answer changes depending on who is asked, which is itself a red flag.

Closely related is the question of data definition. Auditors should probe whether key metrics and fields are defined consistently across source systems and downstream reporting. It is worth asking how the organization ensures that terms like “revenue,” “customer,” “loss,” or “delinquency” mean the same thing everywhere they appear. If differences exist—and they almost always do—auditors should understand whether those differences are intentional, documented, and communicated to users of the data.

Data quality controls deserve particular attention, especially at the point of ingestion. Rather than focusing only on whether pipelines run successfully, auditors should ask what happens when data does not meet expectations. Are there automated checks for completeness, validity, and reasonableness? When those checks fail, are issues logged, investigated, and resolved, or are they quietly bypassed to keep reports on schedule? The way an organization handles exceptions often reveals more about its control environment than its formal policies.

Transformation logic is another critical area for inquiry. Internal auditors should seek to understand how data is being altered as it moves into analytical structures and whether those transformations are subject to appropriate governance. Questions about documentation, testing, and change management are especially important when transformed data is used for management reporting, external disclosures, or regulatory submissions. Auditors should also be alert to situations where a small group of individuals holds deep, undocumented knowledge of how data “really” works.

For organizations using data lakes, governance questions become even more pressing. Auditors should ask how the organization prevents the lake from becoming an unmanaged repository of poorly understood data. Is there a process for approving new data sources, classifying sensitive information, and retiring data that is no longer reliable or relevant? Equally important is understanding how users discover and select datasets, and whether there is sufficient context provided to prevent misuse or misinterpretation.

Security and access are perennial concerns, but they take on new dimensions in centralized data environments. Internal auditors should explore whether access to warehouse and lake data is aligned with business need, and whether sensitive data is protected consistently as it moves away from source systems. It is also worth asking whether indirect access through analytics tools is reviewed with the same rigor as direct system access.

Auditors should not overlook upstream dependencies. Even the most sophisticated data platform is only as reliable as the systems feeding it. Questions about legacy systems, manual extracts, and workarounds can reveal hidden risks that persist long after a new warehouse or lake goes live. Understanding where data is most fragile upstream can help auditors focus their efforts where they matter most.

Finally, internal auditors should ask how confident leadership really is in the data being used to make decisions—and what that confidence is based on. Are executives relying on trust in the technology, or on evidence that data is governed, controlled, and understood? When dashboards conflict or numbers change unexpectedly, is there a clear path to explanation and resolution?

By framing their work around these questions, internal auditors can move beyond compliance-oriented reviews and into a more strategic role—helping organizations ensure that their investment in data delivers insight without sacrificing integrity.


Joseph McCafferty is editor & publisher of Internal Audit 360°
