# R Package
Extract structured study data from scientific papers directly into R data frames ready for meta-analysis.
## Installation
Install the package from GitHub:
```r
# install.packages("remotes")
remotes::install_github("Dawes-Institute/evidence.guide.R")
```

## Setup
Get your API key from your Dashboard and set it in R:
```r
library(evidenceguide)

# Set for this session
eg_set_api_key("ck_your_api_key_here")

# Or persist to ~/.Renviron so it loads automatically
eg_set_api_key("ck_your_api_key_here", persist = TRUE)
```

## Quick Start
Process PDFs with a single function call:
```r
result <- eg_process(
  files = c("study1.pdf", "study2.pdf", "study3.pdf"),
  mode = "text",
  progress = TRUE
)

# View extracted studies
result$studies
```

## Output Format
The `studies` data frame contains one row per extracted study, with these columns:
| Column | Description |
|---|---|
| `doi` | Paper DOI |
| `title` | Paper title |
| `year` | Publication year |
| `journal` | Journal name |
| `study_number` | Study index within the paper |
| `design` | Experimental design |
| `hypothesis` | Study hypothesis |
| `test` | Test statistic type (t, F, χ², etc.) |
| `stat_value` | Test statistic value |
| `df` | Degrees of freedom |
| `n` | Sample size |
| `p` | P-value |
| `snippet` | Source text excerpt |
| `confidence` | Extraction confidence score |
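As a quick sanity check, you can inspect the statistical columns before running any analysis. A minimal sketch, reusing the `result` object from the Quick Start above:

```r
library(dplyr)

# Peek at the key test statistics for each extracted study
result$studies %>%
  select(doi, study_number, test, stat_value, df, n, p) %>%
  head()
```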
## Step-by-Step Workflow
For more control, run each step separately:
```r
# 1. Upload PDFs
jobs <- eg_upload(c("paper1.pdf", "paper2.pdf"))

# 2. Wait for processing
results <- eg_wait(jobs$job_id, interval = 2, timeout = 600)

# 3. Convert to tidy data frame
studies <- as_studies_df(
  purrr::set_names(results$result, results$job_id)
)
```

## Interactive File Selection
Call `eg_process()` without arguments to open a file picker:
```r
result <- eg_process()  # Opens a dialog to select PDFs
```

## Processing Large Batches
For many papers, process in batches to manage memory and avoid timeouts:
```r
batch_process <- function(files, batch_size = 5) {
  all_studies <- list()
  for (i in seq(1, length(files), by = batch_size)) {
    batch <- files[i:min(i + batch_size - 1, length(files))]
    cat("Processing batch", ceiling(i / batch_size), "of",
        ceiling(length(files) / batch_size), "\n")
    result <- eg_process(files = batch, progress = TRUE)
    all_studies[[length(all_studies) + 1]] <- result$studies
  }
  dplyr::bind_rows(all_studies)
}

# Usage
all_data <- batch_process(list.files("papers/", pattern = "\\.pdf$", full.names = TRUE))
```

## Example: Quick Meta-Analysis
```r
library(evidenceguide)
library(dplyr)

# Extract data from papers
result <- eg_process(files = list.files("my_papers/", pattern = "\\.pdf$", full.names = TRUE))

# Filter to significant findings
significant <- result$studies %>%
  filter(p < 0.05)

# Summary by test type
result$studies %>%
  count(test, sort = TRUE)

# Export for further analysis
write.csv(result$studies, "extracted_studies.csv", row.names = FALSE)
```
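From here, the extracted statistics can feed a formal meta-analysis. Below is a minimal sketch using the metafor package (not part of evidenceguide), assuming the extracted tests labeled `"t"` are between-subjects t-tests with roughly equal group sizes; the conversion `d = 2t / sqrt(df)` and its variance approximation only hold under those assumptions, so adapt them to your designs:

```r
library(metafor)

# Keep only t-tests with the fields needed for conversion
t_tests <- subset(result$studies, test == "t" & !is.na(df) & !is.na(n))

# Approximate Cohen's d from t and df (equal-n two-sample case)
t_tests$d  <- 2 * t_tests$stat_value / sqrt(t_tests$df)
# Approximate sampling variance of d for equal group sizes
t_tests$vi <- 4 / t_tests$n + t_tests$d^2 / (2 * t_tests$n)

# Random-effects meta-analysis
rma(yi = d, vi = vi, data = t_tests)
```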
## Options

### Processing modes

- `mode = "text"` (default): faster; works well for most papers
- `mode = "vision"`: uses image-based extraction for complex layouts
### Reprocessing

- `recompute = TRUE`: force reprocessing even if the paper was analyzed before
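For instance, assuming `recompute` is passed to `eg_process()` like the other options:

```r
# Force a fresh extraction of a previously analyzed paper
result <- eg_process(files = "study1.pdf", recompute = TRUE)
```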
### Timeouts

- `interval`: seconds between status checks (default: 1.5)
- `timeout`: maximum wait time in seconds (default: 1800)
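These are the polling arguments used by `eg_wait()` in the step-by-step workflow. For a large batch, you might poll less often and allow more time:

```r
results <- eg_wait(jobs$job_id, interval = 5, timeout = 3600)
```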
## Troubleshooting

### Authentication errors

Verify your API key is set correctly with `eg_get_api_key()`.
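A quick check (this assumes `eg_get_api_key()` returns the key as a string):

```r
# TRUE if a non-empty key is configured
nzchar(eg_get_api_key())
```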
### Insufficient credits
Check your balance in your Dashboard. Each paper costs one credit.
### Timeout errors

Increase the `timeout` parameter for large papers, or process fewer files at once.