R Package

Extract structured study data from scientific papers directly into R data frames ready for meta-analysis.

Installation

Install the package from GitHub:

# install.packages("remotes")
remotes::install_github("Dawes-Institute/evidence.guide.R")

Setup

Get your API key from your Dashboard and set it in R:

library(evidenceguide)

# Set for this session
eg_set_api_key("ck_your_api_key_here")

# Or persist to ~/.Renviron so it loads automatically
eg_set_api_key("ck_your_api_key_here", persist = TRUE)

Quick Start

Process PDFs with a single function call:

result <- eg_process(
  files = c("study1.pdf", "study2.pdf", "study3.pdf"),
  mode = "text",
  progress = TRUE
)

# View extracted studies
result$studies

Output Format

The studies data frame contains one row per study with these columns:

Column        Description
doi           Paper DOI
title         Paper title
year          Publication year
journal       Journal name
study_number  Study index within the paper
design        Experimental design
hypothesis    Study hypothesis
test          Test statistic type (t, F, χ², etc.)
stat_value    Test statistic value
df            Degrees of freedom
n             Sample size
p             P-value
snippet       Source text excerpt
confidence    Extraction confidence score
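
Once extracted, the studies data frame can be filtered like any other. For instance, to keep only high-confidence rows (a minimal sketch; it assumes the confidence score runs from 0 to 1):

# Keep rows whose extraction confidence clears a chosen threshold
# (assumes confidence is reported on a 0-1 scale)
reliable <- subset(result$studies, confidence >= 0.8)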

Step-by-Step Workflow

For more control, run each step separately:

# 1. Upload PDFs
jobs <- eg_upload(c("paper1.pdf", "paper2.pdf"))

# 2. Wait for processing
results <- eg_wait(jobs$job_id, interval = 2, timeout = 600)

# 3. Convert to tidy data frame
studies <- as_studies_df(
  purrr::set_names(results$result, results$job_id)
)

Interactive File Selection

Call eg_process() without arguments to open a file picker:

result <- eg_process()  # Opens dialog to select PDFs

Processing Large Batches

For many papers, process in batches to manage memory and avoid timeouts:

batch_process <- function(files, batch_size = 5) {
  all_studies <- list()

  for (i in seq(1, length(files), by = batch_size)) {
    batch <- files[i:min(i + batch_size - 1, length(files))]
    cat("Processing batch", ceiling(i / batch_size), "of", ceiling(length(files) / batch_size), "\n")

    result <- eg_process(files = batch, progress = TRUE)
    all_studies[[length(all_studies) + 1]] <- result$studies
  }

  dplyr::bind_rows(all_studies)
}

# Usage
all_data <- batch_process(list.files("papers/", pattern = "\\.pdf$", full.names = TRUE))

Example: Quick Meta-Analysis

library(evidenceguide)
library(dplyr)

# Extract data from papers
result <- eg_process(files = list.files("my_papers/", pattern = "\\.pdf$", full.names = TRUE))

# Filter to significant findings
significant <- result$studies %>%
  filter(p < 0.05)

# Summary by test type
result$studies %>%
  count(test, sort = TRUE)

# Export for further analysis
write.csv(result$studies, "extracted_studies.csv", row.names = FALSE)
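
To go one step further, the extracted statistics can feed a simple random-effects model. The sketch below uses the metafor package and assumes the extracted t tests are independent-samples tests with roughly equal group sizes, so Cohen's d can be approximated as 2t/sqrt(df). Treat it as an illustration, not a substitute for a proper effect-size workup:

library(metafor)

# Convert t statistics to approximate Cohen's d and its sampling variance
# (assumes independent-samples t tests with roughly equal group sizes)
t_studies <- result$studies %>%
  filter(test == "t", !is.na(stat_value), !is.na(df), !is.na(n)) %>%
  mutate(
    d  = 2 * stat_value / sqrt(df),   # d from t, equal-n approximation
    vd = 4 / n + d^2 / (2 * n)        # approximate sampling variance of d
  )

# Random-effects meta-analysis
rma(yi = d, vi = vd, data = t_studies)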

Options

Processing modes

  • mode = "text" (default): Faster, works well for most papers
  • mode = "vision": Uses image-based extraction for complex layouts, as shown below
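
For example, to force image-based extraction on a scanned or heavily formatted paper:

result <- eg_process(files = "scanned_paper.pdf", mode = "vision")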

Reprocessing

recompute = TRUE: Force reprocessing even if the paper was analyzed before
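
A minimal sketch, assuming recompute is passed through eg_process():

# Force a fresh extraction even if this paper was analyzed before
result <- eg_process(files = "paper1.pdf", recompute = TRUE)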

Timeouts

  • interval: Seconds between status checks (default: 1.5)
  • timeout: Maximum wait time in seconds (default: 1800); both are illustrated below
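
These map onto the polling call from the step-by-step workflow. For a large batch you might poll less often and allow more time:

# Poll every 5 seconds and allow up to an hour before giving up
results <- eg_wait(jobs$job_id, interval = 5, timeout = 3600)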

Troubleshooting

Authentication errors

Verify your API key is set correctly with eg_get_api_key().
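
For example:

# Confirm a key is set for this session; reset it if nothing is returned
eg_get_api_key()
eg_set_api_key("ck_your_api_key_here", persist = TRUE)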

Insufficient credits

Check your balance in your Dashboard. Each paper costs one credit.

Timeout errors

Increase the timeout parameter for large papers, or process fewer files at once.

Resources