SRAmetadb Export

Export the SRAKE database to the classic SRAmetadb.sqlite format for use with existing bioinformatics tools.

Basic usage

# Export with FTS5 (recommended)
srake db export -o SRAmetadb.sqlite

# Export with FTS3 for legacy tool compatibility
srake db export -o SRAmetadb.sqlite --fts-version 3

Options

Flag           Default           Description
-o, --output   SRAmetadb.sqlite  Output file path
--db           auto-detected     Source database path
--fts-version  5                 FTS version (3 or 5)
--batch-size   10000             Records per batch
--compress     false             Gzip compress output
-f, --force    false             Overwrite existing file

Output schema

The exported database contains:

Table       Description
study       Research studies
experiment  Sequencing experiments
sample      Biological samples
run         Sequencing runs
submission  Data submissions
sra         Denormalized join of all tables
sra_ft      Full-text search virtual table (FTS3 or FTS5)
metaInfo    Version and creation metadata
col_desc    Column descriptions
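
After an export, it can be useful to confirm that the file contains the tables listed above. The following is a minimal sketch using Python's standard sqlite3 module. The helper name missing_tables is hypothetical and not part of SRAKE, and the sketch assumes every object (including sra and sra_ft) is registered in sqlite_master as a table or view.

```python
import sqlite3

# Table names taken from the list above.
EXPECTED_TABLES = {
    "study", "experiment", "sample", "run", "submission",
    "sra", "sra_ft", "metaInfo", "col_desc",
}

def missing_tables(conn):
    """Return the expected table names that are absent from the database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type IN ('table', 'view')"
    ).fetchall()
    # SQLite identifiers are case-insensitive, so compare in lower case.
    present = {name.lower() for (name,) in rows}
    return {t for t in EXPECTED_TABLES if t.lower() not in present}
```

Calling missing_tables(sqlite3.connect("SRAmetadb.sqlite")) on a complete export should return an empty set.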

FTS version choice

FTS5 (default): Faster queries, smaller index, better Unicode support. Use for new projects.

FTS3: Use when a downstream tool specifically requires FTS3 (e.g., older versions of the R/Bioconductor SRAdb package).

FTS5 support requires building SRAKE with the sqlite_fts5 build tag.
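
If a downstream script needs to know which FTS version a given export uses, one option is to inspect the CREATE VIRTUAL TABLE statement that SQLite stores for sra_ft. This is a sketch, not an SRAKE feature: the helper name fts_version is hypothetical, and it assumes the virtual table was declared with a clause of the form USING fts3(...) or USING fts5(...).

```python
import re
import sqlite3

def fts_version(conn, table="sra_ft"):
    """Return the FTS version of a virtual table as an int, or None if unknown."""
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    if row is None or row[0] is None:
        return None  # table not present in this database
    # The stored DDL looks like: CREATE VIRTUAL TABLE sra_ft USING fts5(...)
    match = re.search(r"USING\s+fts(\d)", row[0], flags=re.IGNORECASE)
    return int(match.group(1)) if match else None
```

Reading sqlite_master does not touch the virtual table itself, so this check should work even when the local SQLite build lacks the corresponding FTS module.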

Usage with R

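library(DBI)
con <- dbConnect(RSQLite::SQLite(), "SRAmetadb.sqlite")
# Filter studies by organism
dbGetQuery(con, "SELECT * FROM study WHERE organism = 'Homo sapiens' LIMIT 10")
# Full-text search; the hyphenated term is double-quoted so FTS5 does not read "-" as query syntax
dbGetQuery(con, "SELECT * FROM sra_ft WHERE sra_ft MATCH 'cancer AND \"RNA-Seq\"'")
dbDisconnect(con)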

Usage with Python

import sqlite3
import pandas as pd

# Load matching studies into a DataFrame, then release the connection
conn = sqlite3.connect("SRAmetadb.sqlite")
df = pd.read_sql("SELECT * FROM study WHERE organism = 'Homo sapiens'", conn)
conn.close()
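
The full-text table can be queried from Python as well. The sketch below follows the R MATCH query shown earlier, but passes the search expression as a bound parameter. The names fts_query and search_sra are hypothetical helpers. Double-quoting each term, so that hyphenated terms such as RNA-Seq are treated as phrases rather than query syntax, is an assumption intended to work for both FTS3 and FTS5 exports.

```python
import sqlite3

def fts_query(terms):
    """Build a MATCH expression that requires every term (implicit AND)."""
    # Double-quote each term so punctuation such as "-" is not read as an operator.
    # Embedded double quotes are replaced with spaces (a simplification).
    return " ".join('"{}"'.format(t.replace('"', " ")) for t in terms)

def search_sra(conn, terms, limit=10):
    """Return up to `limit` rows of sra_ft that match all of the given terms."""
    if not terms:
        return []  # an empty MATCH expression is not a valid query
    sql = "SELECT * FROM sra_ft WHERE sra_ft MATCH ? LIMIT ?"
    return conn.execute(sql, (fts_query(terms), limit)).fetchall()
```

For instance, search_sra(conn, ["cancer", "RNA-Seq"]) sends the expression "cancer" "RNA-Seq" to SQLite, which both FTS versions interpret as requiring both phrases.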

Schema mapping

The export handles these transformations:

  • JSON arrays to pipe-delimited strings (["A","B"] to A|B)
  • Nested metadata flattened to columns
  • Missing legacy fields populated with defaults
  • SRA/Entrez URLs generated automatically
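
The first transformation in the list above can be illustrated with a few lines of Python. This is only a sketch of the idea, not SRAKE's actual implementation: the function names json_array_to_pipe and pipe_to_list are hypothetical, and the handling of empty, missing, or non-array values is an assumption.

```python
import json

def json_array_to_pipe(value):
    """Convert a JSON array string such as '["A","B"]' to 'A|B'."""
    if not value:
        return ""  # assumption: missing values become empty strings
    items = json.loads(value)
    if not isinstance(items, list):
        return str(items)  # assumption: non-array values pass through as text
    return "|".join(str(item) for item in items)

def pipe_to_list(value):
    """Split a pipe-delimited field back into a list, for downstream consumers."""
    return value.split("|") if value else []
```

Note that splitting on the pipe character cannot recover the original array if an element itself contained a pipe.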