Many DevOps leaders could be sleepwalking into a regulatory breach and security nightmare when it comes to AI data privacy. Recent research shows that while most IT leaders focus on locking down production systems, very real dangers lurk in non-production environments, such as AI training. These areas often use real, sensitive data, such as personally identifiable information (PII), including customer health records, financial information, and Social Security numbers. The consequences can include data breaches, security incidents, regulatory fines, and loss of market reputation and customer trust.
Nor are the risks just hypothetical. According to the Perforce 2025 State of Data Compliance & Privacy report, 60% of the survey’s respondents have experienced data breaches or data theft in software development, testing, AI, and analytics environments, an 11% increase on the 2024 results. Meanwhile, 22% report regulatory non-compliance or fines, and a further 32% have faced audit issues. These results are all the more concerning given that 100% of the survey’s 280 global respondents are subject to regulatory requirements, including CCPA, GDPR, and HIPAA.
Confusion, Contradictions, and Complacency
While DevOps professionals’ general awareness of the risks of exposing sensitive data may be high, the research shows that this awareness is not translating into safer practices in non-production environments like AI, and that confusion is prevalent:
91% of organizations believe that sensitive data should be allowed in AI training and testing, and 90% use sensitive data in AI.
82% are of the opinion that using sensitive data in AI model training and fine-tuning is safe.
Furthermore, 84% confess to compliance exceptions in non-production environments, including AI, despite the high rate of data breaches, as well as audit and regulatory compliance issues.
Yet, DevOps leaders are not oblivious to the risks being taken:
78% of the survey’s respondents admit to being highly concerned about theft or breaches in AI model training.
68% worry about compliance and audit issues.
So why are organizations taking such significant risks? It turns out that the leading reason, for 76% of the survey’s respondents, is data-driven decision-making. Teams working in AI training and other non-production environments, such as software development, testing, and analytics, are data-hungry, depending on information that is as realistic as possible. The further the data drifts from production-like values, the less accurate the results will be.
Shortcuts Win Over Safeguards
DevOps teams also want access to production-like data quickly, which increases the temptation to use real customer data despite the risk of inadvertently exposing sensitive information. The situation is exacerbated when this information is fed to AI: as most people are increasingly aware, AI is a blabbermouth that shares secrets and has a long memory.
The reality is that while DevOps leaders know data privacy is essential, they must balance that necessity against pressure to innovate as fast as possible, especially in the era of AI. There is also a cultural challenge: protecting data is viewed as complex, time-consuming, and a roadblock to innovation.
Using Real Data in AI Is Not Necessary
It is time to put these misconceptions in their place and remove the confusion. The fact is that there is no need to use sensitive data in non-production environments: techniques and tools exist that deliver realistic, fit-for-purpose data. There are also strong indications that, while many organizations may still be mired in confusion today, many have already taken steps towards better protection of data in AI training environments, with further plans emerging.
For example, it is good to hear that 95% have some form of masking policy or mandate, and that 95% are already using static data masking. This method hides or replaces sensitive data while keeping it usable for environments like AI training. Furthermore, modern static data masking tools can provide realistic, production-like data in a fraction of the time it previously took. Where it once required database administrators (DBAs) taking days to deliver this data, that timeframe can be brought down to just a couple of hours, self-served by users rather than relying on DBAs, whose time is then freed for other priorities.
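To make the concept concrete, here is a minimal Python sketch of what static data masking does, independent of any particular vendor’s tool; the record layout and helper names are hypothetical. The key property it illustrates is deterministic, format-preserving replacement: the same real value always maps to the same fake value, so the masked data stays realistic and relationships between records stay intact.

```python
import hashlib
import random

# Hypothetical example records; a real masking tool would read these
# from a database snapshot instead.
records = [
    {"name": "Ada Lovelace", "ssn": "123-45-6789", "email": "ada@example.com"},
    {"name": "Alan Turing",  "ssn": "987-65-4321", "email": "alan@example.com"},
]

def mask_ssn(ssn: str) -> str:
    """Map a real SSN to a fake, format-preserving one. Hashing makes the
    mapping deterministic: the same input always yields the same output,
    so joins and test assertions keep working across masked datasets."""
    digest = hashlib.sha256(ssn.encode()).hexdigest()
    digits = "".join(c for c in digest if c.isdigit())[:9].ljust(9, "0")
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"

def mask_record(rec: dict) -> dict:
    """Replace every sensitive field with a realistic stand-in."""
    rng = random.Random(rec["name"])  # seeded per record, so re-runs match
    first = rng.choice(["Jamie", "Riley", "Morgan", "Casey"])
    last = rng.choice(["Smith", "Jones", "Nguyen", "Patel"])
    return {
        "name": f"{first} {last}",
        "ssn": mask_ssn(rec["ssn"]),
        "email": f"{first.lower()}.{last.lower()}@test.invalid",
    }

for row in (mask_record(r) for r in records):
    print(row)  # production-shaped, but no real PII survives
```

Because the masking is repeatable, the same snapshot can be re-masked on demand, which is what makes the self-service, hours-not-days workflow possible.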
Another option is synthetic data, which is artificially generated using production-like values, but without any contact whatsoever with real data. Nearly half of the organizations surveyed say they are already using synthetic data in AI development, although adoption is in its early stages: a third report using it on a small scale or experimentally, and a further quarter have tried it but experienced problems with speed, scale, and quality.
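Again purely as an illustration, and not any specific product’s approach, a synthetic data generator fabricates production-shaped records from scratch; every field name, value list, and distribution below is invented for the example.

```python
import random

rng = random.Random(42)  # fixed seed so the generated set is reproducible

FIRST_NAMES = ["Jamie", "Riley", "Morgan", "Casey", "Sam"]
LAST_NAMES = ["Smith", "Jones", "Nguyen", "Patel", "Garcia"]

def synthetic_customer() -> dict:
    """Fabricate one production-shaped record with no link to real data."""
    first, last = rng.choice(FIRST_NAMES), rng.choice(LAST_NAMES)
    return {
        "name": f"{first} {last}",
        # Plausible-looking but entirely fabricated identifier:
        "ssn": f"{rng.randint(100, 899):03d}-{rng.randint(1, 99):02d}-{rng.randint(1, 9999):04d}",
        "email": f"{first.lower()}.{last.lower()}@test.invalid",
    }

training_rows = [synthetic_customer() for _ in range(1_000)]
print(training_rows[0])
```

Because nothing here derives from real records, there is no sensitive value to leak in the first place; the trade-off, as the survey’s respondents note, is making the fabricated distributions realistic enough at speed and scale.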
Teams Will Combine Synthetic Data and Data Masking
However, synthetic data is evolving at a rapid pace, including the use of AI technology itself to automatically generate realistic artificial data at speed, customized to the situation. Looking ahead, the most likely scenario is that DevOps teams will use a combination of data masking and synthetic data techniques depending on the use case: for compliance, for new applications, or to address a specific requirement, for example. A sketch of what that combination might look like follows below.
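A hybrid pipeline along those lines might look like the hypothetical sketch below, which reuses mask_record, synthetic_customer, and records from the two earlier sketches: mask the real rows that exist for fidelity, then top up with synthetic rows to reach the volume AI training needs.

```python
def build_training_data(real_rows: list[dict], target_size: int) -> list[dict]:
    """Hypothetical hybrid: mask the rows we have, synthesize the rest."""
    masked = [mask_record(r) for r in real_rows]          # realistic, PII-free versions of real rows
    shortfall = max(0, target_size - len(masked))
    synthetic = [synthetic_customer() for _ in range(shortfall)]  # extra volume, zero exposure risk
    return masked + synthetic

dataset = build_training_data(records, target_size=10_000)
print(f"{len(dataset)} privacy-safe rows ready for AI training")
```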
Looking more broadly, 86% plan to invest in AI-specific data privacy solutions over the next year or so, but it is important to note that tools are not the only answer to better data privacy in AI and other non-production environments. Creating a culture where governance is prioritized and policies are consistently enforced is just as important, with awareness and governance part of every DevOps team’s DNA.
After all, sacrificing data privacy to keep up with innovation and business demands is a hazardous strategy for DevOps leaders, one that could end in disaster and come with a high price to pay. Conversely, putting in place a plan that gives teams high-quality, production-like data without risking exposure of sensitive information will help them maintain, and even improve, innovation at speed and at scale.