EDA Reports

Complete exploratory analysis in one command

Type /eda and AiR generates a complete R Markdown report - distributions, correlations, statistical tests, and narrative insights. The AI plans the analysis, generates the code, validates it, and audits the insights.

Pipeline

Four AI passes for reliable results

AiR doesn't just dump code. It plans, generates, validates, and audits - four distinct AI passes that ensure your report is correct, complete, and insightful.

1. Plan

AI analyzes your dataset structure and plans the analysis strategy: which visualizations to draw, which tests to run, and which relationships to explore.

2. Generate

A complete R Markdown file is generated with code chunks, ggplot2 visualizations, statistical summaries, and narrative text with inline R values.

3. Validate

A second AI pass reviews the code for bugs: type errors, missing packages, deprecated functions, API misuse, and truncated code.

4. Audit

A final pass checks that narrative claims are backed by actual data. Generic filler is replaced with dataset-specific inline R values.

Depth Levels

Quick overview or deep statistical analysis

Choose the depth that matches your need: Quick for a fast scan, Standard for a thorough overview, or Deep for publication-grade analysis with formal statistical tests.

Quick

~15 seconds · Free

A fast overview of your dataset. Summary statistics, distribution histograms for key columns, a correlation matrix, and basic data quality checks.

  • Summary statistics table
  • Key column distributions
  • Top correlations
  • Missing data overview

Standard

~30 seconds · Free

A thorough analysis covering all columns. Grouped comparisons, bivariate relationships, and detailed distribution analysis with outlier detection.

  • All Quick features
  • Per-column distribution plots
  • Bivariate scatter plots
  • Grouped comparisons
  • Outlier analysis

Deep

~60 seconds · Pro

Publication-grade analysis with formal statistical tests, normality checks, effect size calculations, and nuanced narrative insights backed by real numbers.

  • All Standard features
  • Formal hypothesis tests
  • Normality testing (Shapiro-Wilk)
  • Effect sizes (Cohen's d, eta²)
  • Regression previews
  • Detailed narrative insights
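
As a rough illustration of what the Deep-level checks involve, here is a minimal base R sketch of a Shapiro-Wilk test, Cohen's d, and eta². It is not AiR's generated code: the built-in `mtcars` dataset stands in for your data, and `cohens_d()` is a hypothetical helper written for this example.

```r
# Assumption: mtcars (built into R) stands in for a user dataset.
mpg_auto   <- mtcars$mpg[mtcars$am == 0]  # automatic transmission
mpg_manual <- mtcars$mpg[mtcars$am == 1]  # manual transmission

# Normality check (Shapiro-Wilk) on one numeric column
sw <- shapiro.test(mtcars$mpg)

# Hypothetical helper: Cohen's d using a pooled standard deviation
cohens_d <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  pooled_sd <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / pooled_sd
}
d <- cohens_d(mpg_manual, mpg_auto)

# Eta squared from a one-way ANOVA: SS_between / SS_total
fit    <- aov(mpg ~ factor(cyl), data = mtcars)
ss     <- summary(fit)[[1]][["Sum Sq"]]
eta_sq <- ss[1] / sum(ss)

cat("Shapiro-Wilk p-value:", round(sw$p.value, 3), "\n")
cat("Cohen's d (manual vs automatic):", round(d, 2), "\n")
cat("Eta squared (mpg by cyl):", round(eta_sq, 2), "\n")
```

In a report, values like these would typically appear in the narrative as inline R expressions rather than printed output.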

Contents

What you get in every report

Each EDA report is a self-contained R Markdown file that knits to a beautiful HTML document. Here's what's inside.

Data Overview

Dimensions, column types, first rows, memory usage, and a structural summary of the dataset.

Summary Statistics

Mean, median, SD, quartiles, range, skewness, and kurtosis for every numeric column.

Distributions

Histograms for numeric variables and bar charts for categorical ones, with density overlays and rug plots.

Missing Data Analysis

Missing value counts and percentages per column. Visualization of missing data patterns.
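
A per-column missing-data summary of this kind can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the built-in `airquality` dataset stands in for your data, and `missing_summary` is a name chosen for this example.

```r
# Assumption: airquality (built into R) stands in for your dataset.
df <- airquality

# Count and percentage of missing values per column
missing_summary <- data.frame(
  column      = names(df),
  n_missing   = colSums(is.na(df)),
  pct_missing = round(100 * colMeans(is.na(df)), 1),
  row.names   = NULL
)

# Most incomplete columns first
missing_summary <- missing_summary[order(-missing_summary$n_missing), ]
print(missing_summary)
```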

Correlation Analysis

Correlation matrix heatmap. Top variable pairs ranked by strength. Scatter plots for key relationships.

Statistical Tests

Normality tests, t-tests, ANOVA, and chi-squared tests, chosen automatically based on your data types (Deep depth only).

Grouped Comparisons

If your data has natural grouping variables, AiR generates box plots and group-wise summaries.

Narrative Insights

Plain-English observations that cite actual values from your data. Not generic - specific to your dataset.

Visualizations

Publication-quality ggplot2 charts with consistent theming, proper labels, and color-blind friendly palettes.

Validation

12-point validation catches bugs before you see them

After generating the report, a second AI pass runs through a detailed checklist to catch and fix issues. Reports that reach you have been through two rounds of quality control.

Type safety: statistical functions only applied to correct column types
Dynamic safety: list/vector indices verified against data-dependent lengths
API currency: no deprecated or superseded functions (gather, spread, aes_string, size= for lines)
Dependency safety: every function has its package loaded, no unused imports
Runtime safety: na.rm=TRUE on aggregations, use="complete.obs" on cor()
Boxplot aesthetics: correct axis mapping for vertical/horizontal/grouped
ID column exclusion: ID columns excluded from statistical analysis
High-cardinality safety: 20+ level categoricals filtered to top-N for charts
Unique chunk labels: no duplicate R Markdown chunk names
Truncated code detection: incomplete functions or unclosed chunks caught
Content accuracy: narrative references match actual data being analyzed
Insight quality: generic claims replaced with specific inline R values
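
To make a few of these checks concrete, the sketch below shows the kind of defensive R code that the type safety, runtime safety, and high-cardinality items point toward. It is an illustration only, not AiR's validator: `airquality` stands in for your dataset, and `top_n_levels()` is a hypothetical helper.

```r
# Assumption: airquality (built into R) stands in for a dataset with missing values.
df <- airquality

# Type safety: restrict statistical functions to numeric columns
num_cols <- names(df)[vapply(df, is.numeric, logical(1))]

# Runtime safety: na.rm = TRUE keeps missing values from turning the result into NA
ozone_mean <- mean(df$Ozone, na.rm = TRUE)

# Runtime safety: use = "complete.obs" drops rows that contain missing values
cor_mat <- cor(df[num_cols], use = "complete.obs")

# High-cardinality safety (hypothetical helper): keep the n most frequent levels
top_n_levels <- function(x, n = 10) {
  counts <- sort(table(x), decreasing = TRUE)
  names(counts)[seq_len(min(n, length(counts)))]
}
frequent_months <- top_n_levels(as.character(df$Month), n = 3)
```

Without `na.rm = TRUE`, `mean(df$Ozone)` returns `NA` because the column contains missing values, which is the failure the runtime safety check guards against.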

Output

Real .Rmd files that knit to beautiful HTML

The output is a standard R Markdown file. Open it in RStudio, knit it, and get a polished HTML report - or modify the code to suit your needs.

diamonds_eda.Rmd
Validated · Audited
---
title: "Exploratory Data Analysis: diamonds"
output: html_document
---

```{r setup, include=FALSE}
library(ggplot2)
library(dplyr)
library(tidyr)
library(knitr)
knitr::opts_chunk$set(echo = FALSE, warning = FALSE,
                      message = FALSE, fig.width = 8)
```

## Data Overview

The `diamonds` dataset contains **`r nrow(diamonds)`** observations
across **`r ncol(diamonds)`** variables. The numeric variables
include carat (`r round(min(diamonds$carat), 2)` to
`r round(max(diamonds$carat), 2)`) and price
($`r min(diamonds$price)` to $`r max(diamonds$price)`).

```{r distributions}
ggplot(diamonds, aes(x = price)) +
  geom_histogram(bins = 50, fill = "#8b5cf6", alpha = 0.7) +
  scale_x_continuous(labels = scales::comma) +
  labs(title = "Distribution of Price",
       x = "Price ($)", y = "Count") +
  theme_minimal()
```

This is a snippet from a real AiR-generated report. The full output typically includes 200-500 lines of R Markdown covering all major aspects of your dataset.
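
If you prefer the R console to the Knit button in RStudio, a report like this can also be rendered with `rmarkdown::render()`. The sketch below assumes the rmarkdown package is installed and that the report was saved as `diamonds_eda.Rmd` in the working directory; adjust the path to wherever your file lives.

```r
# Assumptions: the rmarkdown package is installed and the report was saved
# as "diamonds_eda.Rmd" in the working directory (file name from the sample).
rmd_path <- "diamonds_eda.Rmd"

if (requireNamespace("rmarkdown", quietly = TRUE) && file.exists(rmd_path)) {
  # render() returns the path of the HTML file it wrote
  html_path <- rmarkdown::render(rmd_path, output_format = "html_document")
  message("Report written to: ", html_path)
} else {
  message("Install rmarkdown and save the report as ", rmd_path, " first.")
}
```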

Understand your data in seconds, not hours

One command. Full analysis. Every dataset.