Build, evaluate, and interpret statistical models - without writing a line of code
A guided wizard that fits 15 model types live in your R session - supervised and unsupervised. Validate with cross-validation, generate diagnostic plots, compare models side-by-side, and get AI interpretation. All from the AiR panel inside RStudio.
Get Started Free
Six steps from dataset to interpretation
The Model Builder walks you through every stage. Pick a dataset, select variables, configure your model, and get results - all in a structured flow that keeps you in control.
Select Dataset
Choose any data frame loaded in your R environment. AiR shows dimensions, column types, and a preview.
Variables & Interactions
Pick target and predictors (supervised) or just variables (unsupervised). Add interaction terms between any pair of predictors.
Choose Model
12 supervised models or 3 unsupervised methods. AiR filters by your task type and shows package availability.
Configure
Validation method, metrics, stepwise selection, class imbalance handling, diagnostic plots, and model-specific parameters.
Results & Compare
Metrics, coefficients, plots, confusion matrices. Compare multiple models side-by-side and highlight the best performer.
AI Interpretation
A plain-English summary that cites actual values from your model output. Key findings, significance, and next steps.
15 models, from linear to ensemble to unsupervised
Cover the full spectrum of statistical and machine learning models. Each generates clean R code and structured JSON results.
Regression
Linear Regression
base R
The workhorse. Interpretable coefficients with p-values and R². Supports stepwise AIC/BIC.
Polynomial Regression
base R
Fits curved relationships using polynomial terms of your predictors.
Ridge Regression
glmnet
L2 regularization. Handles multicollinearity and prevents overfitting.
Lasso Regression
glmnet
L1 regularization. Built-in feature selection by shrinking coefficients to zero.
Elastic Net
glmnet
Combines L1 and L2 penalties. Configurable alpha for the mixing ratio.
Classification
Logistic Regression
base R
Binary classification with interpretable odds ratios and log-likelihood.
GLM (General)
base R
Gaussian, Poisson, Gamma, and inverse Gaussian families with configurable link functions. Supports stepwise.
Naive Bayes
naivebayes
Fast probabilistic classifier. Works well with many features and small datasets.
K-Nearest Neighbors
class
Instance-based learning. Classifies by majority vote of the K nearest data points.
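As a minimal sketch of the K-nearest-neighbors approach, here is the `class::knn` call on the built-in iris data; the dataset, split, and `k = 5` are illustrative choices, not AiR's defaults.

```r
library(class)

set.seed(42)
idx   <- sample(nrow(iris), 100)       # 100 random training rows
train <- iris[idx,  1:4]
test  <- iris[-idx, 1:4]

# Classify each held-out row by majority vote of its 5 nearest neighbors
pred <- knn(train, test, cl = iris$Species[idx], k = 5)

# Accuracy on the 50 held-out rows
acc <- mean(pred == iris$Species[-idx])
```

Because KNN has no fitted coefficients, the entire "model" is the training data plus the distance rule, which is why it trains instantly but predicts more slowly on large datasets.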
Regression & Classification
Decision Tree
rpart
Recursive partitioning. Visual tree structure with pruning via complexity parameter.
Random Forest
randomForest
Ensemble of decision trees with OOB error, variable importance, and configurable ntree.
Support Vector Machine
e1071
Finds the optimal separating hyperplane. Radial, linear, and polynomial kernels with tunable cost.
XGBoost
xgboost
Gradient boosted trees. Configurable rounds, depth, and learning rate. Top-tier predictive performance.
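For a feel of what the tree-based options do, here is a minimal `rpart` sketch on iris; the `cp` value shown is illustrative, not AiR's default.

```r
library(rpart)

# Recursive partitioning; cp controls pruning (larger cp = smaller tree)
fit <- rpart(Species ~ ., data = iris,
             control = rpart.control(cp = 0.01))

# Predicted class per row; training accuracy as a quick sanity check
pred <- predict(fit, iris, type = "class")
acc  <- mean(pred == iris$Species)
```

Random Forest and XGBoost build on this same idea with ensembles of many trees, trading the single tree's interpretability for accuracy.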
Unsupervised
PCA
base R
Principal component analysis with scree plot and biplot. Explore variance structure and reduce dimensions.
K-Means Clustering
base R
Partition data into K clusters. Elbow plot for optimal K and cluster visualization.
Hierarchical Clustering
base R
Agglomerative clustering with dendrogram. Multiple linkage methods and distance metrics.
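All three unsupervised methods are thin wrappers over base R. A minimal sketch on the numeric columns of iris (K = 3 and complete linkage are illustrative choices):

```r
x <- scale(iris[, 1:4])          # standardize features first

# PCA: variance explained per component (the scree-plot data)
pca <- prcomp(x)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# K-means with K = 3 clusters; nstart restarts avoid poor local optima
set.seed(42)
km <- kmeans(x, centers = 3, nstart = 25)

# Hierarchical clustering with complete linkage, cut into 3 groups
hc     <- hclust(dist(x), method = "complete")
groups <- cutree(hc, k = 3)
```

Scaling first matters for all three: otherwise variables measured on larger scales dominate the distances and the principal components.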
Four ways to validate your model
Go beyond a single train/test split. Cross-validation gives you robust, reliable performance estimates with confidence intervals.
Train/Test Split
Default
Classic holdout. Split your data into training and test sets with a configurable ratio (50-90%). Fast and simple.
Best for: Quick iteration, large datasets, exploratory modeling.
K-Fold Cross-Validation
Recommended
Divide data into K folds, train on K-1, test on the held-out fold. Rotate K times. Reports mean ± SD across all folds.
Best for: Reliable estimates, moderate datasets. Default: 10 folds.
Repeated K-Fold CV
Robust
Run K-Fold CV multiple times with different random splits. Reduces variance in your performance estimates.
Best for: Final model evaluation, publication-grade results. Default: 3 repeats × 10 folds.
Leave-One-Out (LOOCV)
Exhaustive
Each observation is used as a test set exactly once. Maximum use of your data - ideal when every data point counts.
Best for: Small datasets (<200 rows). Warning shown for large datasets.
How K-Fold Cross-Validation Works
Each fold takes a turn as the test set. Performance is averaged across all folds, giving you a mean ± SD estimate.
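The rotation above can be sketched in plain base R - a hand-rolled 10-fold CV of a linear model on mtcars (AiR's generated code uses caret to automate this):

```r
set.seed(42)
k     <- 10
folds <- sample(rep(1:k, length.out = nrow(mtcars)))  # random fold labels

# For each fold i: train on the other k-1 folds, score on fold i
rmse <- sapply(1:k, function(i) {
  fit  <- lm(mpg ~ cyl + hp + wt, data = mtcars[folds != i, ])
  pred <- predict(fit, newdata = mtcars[folds == i, ])
  sqrt(mean((mtcars$mpg[folds == i] - pred)^2))
})

# Reported as mean ± SD across folds
c(mean = mean(rmse), sd = sd(rmse))
```

Every row is scored exactly once by a model that never saw it, which is why the averaged estimate is more reliable than a single train/test split.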
Beyond fitting a model
Tools for the full modeling workflow - from feature engineering to model selection to diagnostics.
Model Comparison
Run multiple models on the same data and compare them in a side-by-side table. Best metric values are highlighted automatically.
Diagnostic Plots
Residuals vs Fitted, Q-Q plots, Predicted vs Actual, and ROC curves - rendered as inline images directly in the results panel.
Interaction Terms
Add interaction terms between any pair of predictors with a click. The formula builder handles backtick-escaping automatically.
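In R's formula syntax the builder emits terms like `wt:cyl` (and `wt*cyl` expands to `wt + cyl + wt:cyl`); a minimal sketch on mtcars:

```r
# Main effects for hp, wt, cyl plus the wt:cyl interaction
fit <- lm(mpg ~ hp + wt * cyl, data = mtcars)

# The coefficient table now includes a wt:cyl row
names(coef(fit))
```

Column names with spaces or special characters must be backtick-quoted in the formula (e.g. `` `body weight`:cyl ``) - that escaping is what AiR handles for you.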
Stepwise Selection
Forward, backward, or both - using AIC or BIC. Available for LM and GLM models. See which variables were kept or dropped.
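Under the hood this is base R's `step()`, where the penalty `k` chooses the criterion: `k = 2` is AIC (the default), `k = log(n)` is BIC. A minimal sketch:

```r
full <- lm(mpg ~ cyl + disp + hp + wt, data = mtcars)

# Bidirectional search by AIC, then by BIC (trace = 0 silences the log)
both_aic <- step(full, direction = "both", trace = 0)
both_bic <- step(full, direction = "both", trace = 0,
                 k = log(nrow(mtcars)))

# Variables kept after AIC selection
names(coef(both_aic))
```

Comparing `names(coef(...))` against the full model shows exactly which predictors were dropped.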
Class Imbalance
Undersample majority or oversample minority class before training. Applied to training data only to prevent data leakage.
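A base-R sketch of the undersampling idea - sample every class down to the size of the smallest one, applied to training data only (the toy 90/10 data frame here is illustrative):

```r
set.seed(42)
train <- data.frame(y = factor(rep(c("no", "yes"), c(90, 10))),
                    x = rnorm(100))

n_min <- min(table(train$y))            # minority class size (10)

# Sample n_min rows from each class, then recombine
balanced <- do.call(rbind, lapply(split(train, train$y),
                                  function(d) d[sample(nrow(d), n_min), ]))

table(balanced$y)                       # both classes now have 10 rows
```

Resampling before the train/test split would leak duplicated or discarded rows into the test set, which is why the balancing happens after splitting.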
Unsupervised Mode
Toggle to unsupervised mode for PCA, K-Means, and hierarchical clustering. No target variable needed - just select numeric features.
Choose exactly what to measure
Select the evaluation metrics that matter for your analysis. AiR calculates and displays only what you ask for.
Regression
R²: Proportion of variance explained. 1.0 = perfect fit, 0 = no better than the mean.
RMSE: Root mean squared error. In the same units as your target variable. Lower is better.
MAE: Mean absolute error. Less sensitive to outliers than RMSE. Lower is better.
Adjusted R²: R² penalized for the number of predictors. Prevents overfitting from adding useless variables.
MAPE: Mean absolute percentage error. Intuitive percentage scale, but unstable near zero.
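Each regression metric is a one-liner in base R once you have predictions; a minimal sketch from a fitted model on mtcars:

```r
fit  <- lm(mpg ~ wt, data = mtcars)
y    <- mtcars$mpg
pred <- fitted(fit)

rmse <- sqrt(mean((y - pred)^2))                     # same units as y
mae  <- mean(abs(y - pred))                          # robust to outliers
r2   <- 1 - sum((y - pred)^2) / sum((y - mean(y))^2) # variance explained
mape <- mean(abs((y - pred) / y)) * 100              # unstable near y = 0
```

Note that MAE can never exceed RMSE, and the gap between them grows with the influence of outliers.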
Classification
Accuracy: Percentage of correct predictions. Simple but can be misleading on imbalanced data.
Precision: Of all positive predictions, how many were actually positive. Matters when false positives are costly.
Recall: Of all actual positives, how many did the model catch. Matters when false negatives are costly.
F1 Score: Harmonic mean of precision and recall. A balanced single metric for classification.
ROC AUC: Area under the ROC curve. Measures discrimination ability across all thresholds. Binary only.
Log Loss: Penalizes confident wrong predictions. Lower is better. Evaluates predicted probabilities.
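The threshold-based metrics all fall out of a 2×2 confusion matrix; a minimal base-R sketch with toy predictions:

```r
actual <- factor(c("pos","pos","pos","neg","neg","neg","neg","pos"))
pred   <- factor(c("pos","pos","neg","neg","neg","pos","neg","pos"))

cm <- table(Predicted = pred, Actual = actual)
tp <- cm["pos","pos"]; fp <- cm["pos","neg"]; fn <- cm["neg","pos"]

accuracy  <- mean(pred == actual)
precision <- tp / (tp + fp)        # of predicted positives, how many right
recall    <- tp / (tp + fn)        # of actual positives, how many caught
f1        <- 2 * precision * recall / (precision + recall)
```

With 3 true positives, 1 false positive, and 1 false negative, all four metrics come out to 0.75 here - in real data precision and recall usually pull in opposite directions, which is what F1 balances.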
Structured output, not raw R console
Every model produces a clean results dashboard: metrics grid, coefficient tables, confusion matrices, and variable importance - formatted and readable.
Example: Regression Results (10-Fold CV)
Example: Multi-Class Confusion Matrix
Multi-class confusion matrices render automatically for models with 3+ classes. Binary models show TP/FP/FN/TN.
AI Interpretation
Model Summary
Linear regression predicting mpg using cyl, hp, and wt on 32 observations with 10-fold cross-validation.
Key Findings
The model explains 82.6% ± 4.8% of variance across folds. Weight has the strongest effect (-3.17 per 1000lbs, p<0.001). Horsepower is not significant (p=0.142) and may be removed.
Next Steps
Try removing hp (non-significant), compare with XGBoost or Random Forest using the comparison table, or add a wt:cyl interaction term.
The AI reads your actual model output - coefficients, p-values, metrics, variable importance - and produces a contextual interpretation. No generic summaries.
When code fails, AiR fixes it automatically
Model code execution can fail for many reasons - missing packages, data type issues, convergence problems. AiR detects the failure, sends the code and error to AI for diagnosis, and retries with corrected code. You see results, not errors.
The auto-fix pass sends the full R code and complete console output to AI for diagnosis. It understands that datasets are pre-loaded in your R session and focuses on actual code bugs. If the fix succeeds, you see results with an “Auto-fixed” badge. If the issue is truly unfixable (e.g., missing data), you get a clear plain-English explanation instead of a cryptic R error.
Every model produces reproducible R code
The exact code that trained your model is available as a clean, self-contained R script. Copy it, save it, share it - it runs identically every time.
# Model: Linear Regression on mtcars
# Target: mpg | Predictors: cyl, hp, wt
# Validation: 10-Fold Cross-Validation
# Generated by AiR
set.seed(42)
library(caret)
.air_data <- mtcars[, c("mpg", "cyl", "hp", "wt")]
.air_data <- na.omit(.air_data)
ctrl <- trainControl(method = "cv", number = 10,
                     savePredictions = "final")
model <- train(mpg ~ cyl + hp + wt,
               data = .air_data,
               method = "lm",
               trControl = ctrl)
# Results
print(model)
summary(model$finalModel)
Ready to build models in RStudio?
Install AiR, load a dataset, and start modeling in under a minute.