# R Package
Extract structured study data from scientific papers directly into R data frames ready for meta-analysis.
## Installation
Install the package from GitHub:
```r
# install.packages("remotes")
remotes::install_github("Dawes-Institute/evidence.guide.R")
```

## Setup
Get your API key from your Dashboard and set it in R:
```r
library(evidenceguide)

# Set for this session
eg_set_api_key("ck_your_api_key_here")

# Or persist to ~/.Renviron so it loads automatically
eg_set_api_key("ck_your_api_key_here", persist = TRUE)
```

## Quick Start
Process PDFs with a single function call:
```r
result <- eg_process(
  files = c("study1.pdf", "study2.pdf", "study3.pdf"),
  mode = "text",
  progress = TRUE
)

# View extracted studies
result$studies
```

## Output Format
The `studies` data frame contains one row per extracted study, with these columns:
| Column | Description |
|---|---|
| `doi` | Paper DOI |
| `title` | Paper title |
| `year` | Publication year |
| `journal` | Journal name |
| `study_number` | Study index within the paper |
| `design` | Experimental design |
| `hypothesis` | Study hypothesis |
| `test` | Test statistic type (t, F, χ², etc.) |
| `stat_value` | Test statistic value |
| `df` | Degrees of freedom |
| `n` | Sample size |
| `p` | P-value |
| `snippet` | Source text excerpt |
| `confidence` | Extraction confidence score |
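As a quick sanity check, you can inspect the statistical columns before running any analysis. A minimal sketch, reusing the `result` object from the Quick Start above:

```r
library(dplyr)

# Peek at the key test statistics for each extracted study
result$studies %>%
  select(doi, study_number, test, stat_value, df, n, p) %>%
  head()
```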
## Step-by-Step Workflow
For more control, run each step separately:
```r
# 1. Upload PDFs
jobs <- eg_upload(c("paper1.pdf", "paper2.pdf"))

# 2. Wait for processing
results <- eg_wait(jobs$job_id, interval = 2, timeout = 600)

# 3. Convert to tidy data frame
studies <- as_studies_df(
  purrr::set_names(results$result, results$job_id)
)
```

## Interactive File Selection
Call `eg_process()` without arguments to open a file picker:
```r
result <- eg_process()  # Opens a dialog to select PDFs
```

## Processing Large Batches
For many papers, process in batches to manage memory and avoid timeouts:
```r
batch_process <- function(files, batch_size = 5) {
  all_studies <- list()
  for (i in seq(1, length(files), by = batch_size)) {
    batch <- files[i:min(i + batch_size - 1, length(files))]
    cat("Processing batch", ceiling(i / batch_size), "of",
        ceiling(length(files) / batch_size), "\n")
    result <- eg_process(files = batch, progress = TRUE)
    all_studies[[length(all_studies) + 1]] <- result$studies
  }
  dplyr::bind_rows(all_studies)
}

# Usage
all_data <- batch_process(list.files("papers/", pattern = "\\.pdf$", full.names = TRUE))
```

## Example: Quick Meta-Analysis
```r
library(evidenceguide)
library(dplyr)

# Extract data from papers
result <- eg_process(files = list.files("my_papers/", pattern = "\\.pdf$", full.names = TRUE))

# Filter to significant findings
significant <- result$studies %>%
  filter(p < 0.05)

# Summary by test type
result$studies %>%
  count(test, sort = TRUE)

# Export for further analysis
write.csv(result$studies, "extracted_studies.csv", row.names = FALSE)
```
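From here, the extracted statistics can feed a formal meta-analysis. Below is a minimal sketch using the metafor package (not part of evidenceguide), assuming the extracted tests labeled `"t"` are between-subjects t-tests with roughly equal group sizes; the conversion `d = 2t / sqrt(df)` and its variance approximation only hold under those assumptions, so adapt them to your designs:

```r
library(metafor)

# Keep only t-tests with the fields needed for conversion
t_tests <- subset(result$studies, test == "t" & !is.na(df) & !is.na(n))

# Approximate Cohen's d from t and df (equal-n two-sample case)
t_tests$d  <- 2 * t_tests$stat_value / sqrt(t_tests$df)
# Approximate sampling variance of d for equal group sizes
t_tests$vi <- 4 / t_tests$n + t_tests$d^2 / (2 * t_tests$n)

# Random-effects meta-analysis
rma(yi = d, vi = vi, data = t_tests)
```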
## Options

### Processing modes

- `mode = "text"` (default): faster; works well for most papers
- `mode = "vision"`: uses image-based extraction for complex layouts
### Reprocessing

- `recompute = TRUE`: force reprocessing even if the paper was analyzed before
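For instance, assuming `recompute` is passed to `eg_process()` like the other options:

```r
# Force a fresh extraction of a previously analyzed paper
result <- eg_process(files = "study1.pdf", recompute = TRUE)
```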
### Timeouts

- `interval`: seconds between status checks (default: 1.5)
- `timeout`: maximum wait time in seconds (default: 1800)
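These are the polling arguments used by `eg_wait()` in the step-by-step workflow. For a large batch, you might poll less often and allow more time:

```r
results <- eg_wait(jobs$job_id, interval = 5, timeout = 3600)
```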
## Troubleshooting

### Authentication errors

Verify your API key is set correctly with `eg_get_api_key()`.
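A quick check (this assumes `eg_get_api_key()` returns the key as a string):

```r
# TRUE if a non-empty key is configured
nzchar(eg_get_api_key())
```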
### Insufficient credits
Check your balance in your Dashboard. Each paper costs one credit.
### Timeout errors

Increase the `timeout` parameter for large papers, or process fewer files at once.