They generate plausible code or queries, but often without validating against actual structures or edge cases. This is where prompt engineering comes in, and it doesn’t just mean better phrasing: it means translating context into constraints the model can work with. Otherwise, you're just as likely to get broken logic as usable code.
Providing that structure up front significantly improves the accuracy of the output, and that kind of specificity is critical in data projects.
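For example, a prompt that spells out the schema, types, and edge cases gives the model something concrete to respect. Here's a minimal sketch of what that might look like; the table, columns, and rules are hypothetical:

```python
# A hypothetical example of translating context into explicit constraints.
# The table name, columns, and rules below are illustrative, not from a real project.
prompt = """
You are helping write a PySpark transformation.

Schema (table: orders):
- order_id: string, never null
- order_ts: timestamp, stored in UTC
- amount: decimal(10,2), may be negative for refunds

Constraints:
- Do not drop rows with negative amounts; flag them instead.
- Output column names must be snake_case.
- Handle days with zero orders explicitly.

Task: aggregate net revenue per day, counting refunds separately.
"""
```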
Then there’s precision. You could be using GenAI to craft transformation pipelines or write code. The problem is that generative AI often hallucinates: it confidently suggests syntax, libraries, or functions that don’t behave as described, or sometimes don’t exist at all. This is especially risky when you're deploying to production or relying on subtle transformations that affect business-critical logic.
That’s why you still need to vet the output carefully. Check the generated code against official documentation, test it in a sandbox, and validate the assumptions it's making. Even better, turn the AI into a research assistant. Ask it to cite its sources, link to relevant docs, or summarize the best practices from trusted repositories. Perhaps even ask the LLM to explain the rationale behind the code it generates. This not only helps you understand what it's trying to do, but also gives you a chance to spot gaps in its logic or mismatches with your data context before integrating anything into your pipeline.
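In practice, that sandbox check can be as simple as running the suggested snippet on a tiny, hand-built sample where you already know the right answer. A rough sketch, assuming the AI proposed a pandas deduplication step (the column names are made up):

```python
import pandas as pd

# Tiny hand-built sample with a known expected result, used to
# sanity-check an AI-suggested transformation before it touches real data.
sample = pd.DataFrame({
    "order_id": ["a1", "a1", "b2"],
    "amount": [10.0, 10.0, 5.0],
})

# AI-suggested step: drop exact duplicate orders.
deduped = sample.drop_duplicates(subset=["order_id", "amount"])

# Make the assumption explicit instead of eyeballing the output.
assert len(deduped) == 2, "Expected exactly one duplicate row to be removed"
```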
They’re also stateless. Most models can’t track your session context or versioned data logic across interactions, and unless you prompt carefully, they’ll forget key constraints or project-specific naming conventions. A workaround for statelessness is maintaining a session summary: a running list of decisions, assumptions, and outputs that you paste into each new prompt to keep the model aligned.
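A session summary doesn’t need to be elaborate. Something like the sketch below, pasted at the top of each new prompt, is usually enough; the project details are invented for illustration:

```python
# A hypothetical running session summary, reused across prompts.
# Every decision and naming convention below is invented for illustration.
session_summary = """
Project context (carry this forward in every prompt):
- Warehouse: BigQuery; all timestamps stored in UTC.
- Naming: staging tables prefixed stg_, marts prefixed fct_ / dim_.
- Decision: refunds are kept as negative amounts, never filtered out.
- Open assumption: order_id is unique per source system, not globally.
"""
```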
Until LLMs gain persistent memory or better long-context performance, the burden of context management is on you. Being explicit pays off.
Finally, there’s trust. In data engineering, pipelines break when assumptions are wrong. You can’t just eyeball AI output; you need test coverage, validation, and deployment-aware thinking that these tools can’t yet offer. To work around this, treat any AI-generated code or config as a first draft, not production-ready logic, and always assume it's incomplete. Build unit tests to check how the generated code actually behaves. In addition, consider working in a virtual environment when testing AI-suggested code: it lets you safely install and trial new dependencies without affecting your core environment or other projects.
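As a minimal sketch, a pytest file like the one below can pin down the assumptions you care about before any AI-generated transformation goes near production; the `normalize_amounts` function and its module path are hypothetical stand-ins for whatever the model produced:

```python
# test_ai_suggested.py: a minimal sketch of unit-testing AI-generated logic.
import pandas as pd
import pytest

# Hypothetical module containing the AI-generated function under test.
from pipeline.transforms import normalize_amounts


def test_refunds_are_preserved():
    df = pd.DataFrame({"amount": [10.0, -2.5, 0.0]})
    result = normalize_amounts(df)
    # Business-critical assumption: refunds (negative amounts) must not be dropped.
    assert (result["amount"] < 0).any()


def test_missing_amount_column_raises():
    # The generated code should fail loudly, not silently, on bad input.
    with pytest.raises(KeyError):
        normalize_amounts(pd.DataFrame({"order_id": ["a1"]}))
```

Running these inside a fresh environment (for example, one created with `python -m venv`) keeps any dependencies the AI suggests isolated from the rest of your projects.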
Used well, generative AI can accelerate boilerplate, improve documentation, and even suggest alternatives. But it’s not a drop-in replacement for domain knowledge, testing discipline, or production-readiness.
Where Does AI Fail You?
Generative AI is everywhere, and it’s not perfect. If you’ve ever been frustrated by code hallucinations or vague answers, or found that AI has a knowledge gap when it comes to your industry, we want to know. Help us map the real-world gaps in AI adoption by sharing your experience. We’ll publish the results (anonymized) in an upcoming issue on prompt engineering for data professionals.