LLM-based AI assistants are becoming increasingly capable, but they remain prone to hallucination, sycophancy, overconfidence, and laziness. How can these flawed and non-deterministic tools ever be useful for conducting rigorous data analysis?
I'm very glad you asked!
Enter DAAF: the Data Analyst Augmentation Framework. DAAF is a free and open-source instruction framework for Claude Code that helps skilled researchers rapidly scale their expertise and accelerate data analysis across any domain with AI assistance -- without sacrificing the transparency, rigor, or reproducibility that good science demands. DAAF sits between you and Claude Code, automatically and consistently nudging the standard Claude AI to think and work more like a responsible researcher.
Think of DAAF like your personal lab manager for an AI-powered research lab, informed and guided by a richly detailed library of bespoke reference material to ground everything it does in real scientific best practices.
Just like a highly skilled colleague, DAAF can help in many different ways depending on your own workflows and your comfort level with AI: everything from serving as a data documentation oracle for your hyper-specific data definition questions (e.g., "Which collection years do we have in common across these eight datasets, again?"), to handling one-off vibe coding requests (e.g., "Can you help me review this diff-in-diff regression specification I wrote?"), to producing entire data analytic pipelines and reports from a starting research question, to verifying the empirical reproducibility of entire past analyses, and much more.
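To make that second kind of request concrete, here is a minimal sketch of the sort of difference-in-differences specification a researcher might ask DAAF to review. Everything in it -- the simulated panel, column names, and the true effect size of 2.0 -- is hypothetical and purely illustrative:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical panel: units observed before/after a policy change,
# split into treated and control groups.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "treated": rng.integers(0, 2, n),  # 1 = treatment group
    "post": rng.integers(0, 2, n),     # 1 = observed after the policy change
})
# Simulate an outcome with a true treatment effect of 2.0 in the treated-post cell.
df["y"] = (
    1.0
    + 0.5 * df["treated"]
    + 0.3 * df["post"]
    + 2.0 * df["treated"] * df["post"]
    + rng.normal(0, 1, n)
)

# Canonical two-group, two-period diff-in-diff: the interaction coefficient
# estimates the treatment effect.
model = smf.ols("y ~ treated * post", data=df).fit()
print(model.params["treated:post"])  # estimate should land near the true effect of 2.0
```

A real review would go further -- parallel-trends checks, clustered standard errors, staggered adoption -- but this is exactly the kind of specification-level sanity check you could hand off in a single prompt.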
...Okay, that all sounds pretty useful to me. But what does that all mean in practice? What does that actually look like?
Great question! Let's take a deep dive together into a real project with DAAF to see how it all really works from start to finish.
Because DAAF logs and traces everything it does on your behalf, I've developed this page as a transparent walk-through of the key steps and processes involved in working through a full end-to-end analysis with DAAF: from a single natural-language prompt to a fully reproducible data analytic pipeline complete with a consolidated and cleaned analytic dataset, several thoughtfully-constructed data visualizations, supplementary regression analyses, and an in-depth data analysis report pulling it all together for your review and ready to extend in any direction you can imagine. Everything you'll see here is pulled from raw log files generated by an actual run with DAAF: no cherry-picking or hiding here.
To start, you can inspect the initial data analysis report (right-hand panel on desktop, or the "View Analytic Report" button at the bottom of your screen on mobile) that DAAF produces by default in its "Full Pipeline Mode": the full end-to-end analytic workflow. The goal of this document is to walk the human researcher through the key findings of DAAF's analysis, after which you can make revisions, pursue extensions, or translate it into publication-level products for venues like journals and policymaker briefs.
As you scroll down this page, you'll see exactly how DAAF takes that initial prompt and methodically steps through an extremely deliberate research and data analysis pipeline. For each step of that workflow, I explain what the step is for and show what it actually looks like in conversation with DAAF. If you want to dig deeper into any step, you can expand it to see (a) what each specialized assistant is doing at that point in the workflow, (b) which reference files each assistant reads to guide its work, and (c) what each assistant produces in terms of analytic code, data interpretations, or research artifacts for downstream use. Every single artifact can be viewed in the right-hand file viewer panel, as well as in the full GitHub sample project folder.
Altogether, DAAF allows researchers to massively kickstart an analytic project like this one -- bringing together eight datasets from two different data providers to answer a high-level research question with in-depth data visualizations, regression analyses, and interpretation of limitations -- in all of ~30 minutes of raw human time. And from there, the researcher can use DAAF to produce additional analyses, data visualizations, policymaker briefs, interactive dashboards, press releases, academic paper drafts, and more -- all just another prompt or two away. Nothing DAAF produces should be treated uncritically, and everything absolutely needs review by a human expert, but it nonetheless represents an enormous value-add for rapidly accelerating research in alignment with our core scientific principles.
The goal of DAAF is ultimately to be a force-multiplying exoskeleton for human researchers: a way to extend and expand their expertise to produce more rigorous and impactful research for the betterment of our society. Made by researchers, for researchers. And perhaps most importantly: DAAF and all accompanying educational materials are open-source and will forever be free to all.
Full Pipeline Mode is just one of the many ways researchers can use DAAF to extend, enhance, and support various research workflows and tasks. Learn more about DAAF at the GitHub repos and tutorial videos linked below, or begin the walkthrough to see how complex AI-empowered research workflows actually look in practice.