Tool Compatibility

Compatibility with Other Tools

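SRAKE (SRA Knowledge Engine) covers the core operations of popular SRA metadata tools (accession conversion, metadata retrieval, search, and download) and adds features of its own, such as streaming ingestion and resumable processing. This guide maps srake commands to their equivalents in SRAdb, ffq, and pysradb.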

Feature Comparison Matrix

Feature	SRAKE	SRAdb	ffq	pysradb	MetaSRA
Local Database	✅	✅	❌	✅	❌
Streaming Processing	✅	❌	❌	❌	❌
Accession Conversion	✅	✅	✅	✅	✅
Multi-source Download	✅	✅	✅	✅	❌
Relationship Queries	✅	✅	❌	✅	❌
Batch Operations	✅	✅	✅	✅	❌
Resume Capability	✅	❌	❌	✅	❌
Filtering on Ingest	✅	❌	❌	❌	❌
REST API	✅	❌	❌	❌	✅
Aspera Support	✅	✅	❌	✅	❌

Command Equivalents

SRAdb (R package) → srake

SRAdb:

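# Connect to the SRAmetadb SQLite file (used by all SRAdb examples below)
library(SRAdb)
sra_con <- dbConnect(SQLite(), getSRAdbFile())

# Convert SRP to GSE
sraConvert(in_acc = "SRP123456",
           out_type = "gse",
           sra_con = sra_con)

# Convert GSM to SRX
sraConvert(in_acc = "GSM123456",
           out_type = "srx",
           sra_con = sra_con)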

srake:

# Convert SRP to GSE
srake convert SRP123456 --to GSE

# Convert GSM to SRX
srake convert GSM123456 --to SRX

SRAdb:

# Download SRA files
getSRAfile(in_acc = "SRR123456",
           sra_con = sra_con,
           method = "curl")

# Download FASTQ files
getFASTQfile(in_acc = "SRR123456",
             sra_con = sra_con,
             srcType = "ftp")

srake:

# Download SRA files
srake download SRR123456

# Download FASTQ files
srake download SRR123456 --type fastq

SRAdb:

# Search metadata
getSRA(search_terms = "breast cancer",
       out_types = c("study", "sample", "experiment"),
       sra_con = sra_con)

srake:

# Search metadata
srake search "breast cancer" --format json

SRAdb:

# Get SRA info
getSRAinfo(in_acc = "SRP123456",
          sra_con = sra_con)

srake:

# Get metadata
srake metadata SRP123456 --detailed

ffq → srake

ffq:

# Get metadata for an accession
ffq SRR123456

# Get metadata with specific depth
ffq -l 2 GSE123456

# Save to JSON
ffq -o metadata.json SRR123456

srake:

# Get metadata for an accession
srake metadata SRR123456

# Get related metadata
srake studies SRR123456 --detailed

# Save to JSON
srake metadata SRR123456 --format json --output metadata.json

ffq:

# Get FTP links
ffq --ftp SRR123456

# Get AWS links
ffq --aws SRR123456

# Get GCP links
ffq --gcp SRR123456

srake:

# Download from FTP
srake download SRR123456 --source ftp

# Download from AWS
srake download SRR123456 --source aws

# Download from GCP
srake download SRR123456 --source gcp
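Choosing a source means resolving an accession to a mirror-specific URL. As an illustration, the Python sketch below builds FASTQ URLs following ENA's documented FTP layout (a first directory from the first six characters of the run accession, plus a zero-padded second directory for accessions with more than six digits). This is not srake's implementation; whether srake's ftp source resolves through ENA this way is an assumption, and ena_fastq_dir and ena_fastq_urls are hypothetical helpers.

```python
import re
from typing import List

# ENA's public FASTQ mirror root, per ENA's documented FTP layout.
ENA_FASTQ_ROOT = "ftp.sra.ebi.ac.uk/vol1/fastq"


def ena_fastq_dir(run: str) -> str:
    """Return the ENA FTP directory for a run accession such as SRR123456."""
    match = re.fullmatch(r"[SED]RR(\d{6,9})", run)
    if match is None:
        raise ValueError(f"not a run accession: {run!r}")
    digits = match.group(1)
    dir1 = run[:6]  # e.g. "SRR123"
    if len(digits) == 6:
        return f"{ENA_FASTQ_ROOT}/{dir1}/{run}"
    # 7-9 digits: a second level holds the trailing digits, zero-padded to 3.
    dir2 = digits[6:].zfill(3)
    return f"{ENA_FASTQ_ROOT}/{dir1}/{dir2}/{run}"


def ena_fastq_urls(run: str, paired: bool) -> List[str]:
    """Build FTP URLs using ENA's usual naming (_1/_2 suffixes for paired-end runs)."""
    base = ena_fastq_dir(run)
    if paired:
        names = [f"{run}_1.fastq.gz", f"{run}_2.fastq.gz"]
    else:
        names = [f"{run}.fastq.gz"]
    return [f"ftp://{base}/{name}" for name in names]
```

For example, SRR1234567 resolves to the directory SRR123/007/SRR1234567 under the FASTQ root.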

ffq:

# Query from multiple databases
ffq GSE123456  # Queries GEO
ffq SRR123456  # Queries SRA
ffq ENCSR000EYA # Queries ENCODE

srake:

# Convert between databases
srake convert GSE123456 --to SRP  # GEO to SRA
srake convert SRR123456 --to GSM  # SRA to GEO

# Direct metadata query
srake metadata GSE123456  # Handles any accession type
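Handling any accession type starts with recognizing what an accession refers to. The sketch below classifies accessions by their standard prefixes: S, E, or D for records from NCBI SRA, ENA, or DDBJ; GSE and GSM for GEO; ENCSR for ENCODE datasets. It illustrates the idea only and is not srake's resolver; classify_accession and the type labels are hypothetical.

```python
import re

# Standard accession prefixes. [SED] = NCBI SRA / ENA / DDBJ.
# Illustrative lookup table, not srake's actual resolver.
_ACCESSION_TYPES = [
    (re.compile(r"[SED]RA\d+"), "submission"),
    (re.compile(r"[SED]RP\d+"), "study"),
    (re.compile(r"[SED]RS\d+"), "sample"),
    (re.compile(r"[SED]RX\d+"), "experiment"),
    (re.compile(r"[SED]RR\d+"), "run"),
    (re.compile(r"GSE\d+"), "geo_series"),
    (re.compile(r"GSM\d+"), "geo_sample"),
    (re.compile(r"ENCSR\d{3}[A-Z]{3}"), "encode_dataset"),
]


def classify_accession(accession: str) -> str:
    """Return a coarse type label for an accession, or "unknown"."""
    acc = accession.strip().upper()
    for pattern, label in _ACCESSION_TYPES:
        if pattern.fullmatch(acc):
            return label
    return "unknown"
```

A resolver could then route each label to the right backend (SRA, GEO, or ENCODE) before querying relationships.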

pysradb → srake

pysradb:

# Get metadata
from pysradb import SRAweb
db = SRAweb()
df = db.sra_metadata('SRP123456')

# Detailed metadata
df = db.sra_metadata('SRP123456', detailed=True)

srake:

# Get metadata
srake metadata SRP123456

# Detailed metadata
srake metadata SRP123456 --detailed --format json

pysradb:

# Download SRA files
db.download(df, protocol='fasp')

# Download with filters
db.download(df, filter_by_library_strategy='RNA-Seq')

srake:

# Download with Aspera
srake download SRP123456 --aspera

# Download with filters (filter during ingest)
srake ingest --auto --strategies RNA-Seq
srake download SRP123456

pysradb:

# GSM to SRP
srp = db.gsm_to_srp(['GSM123456'])

# SRP to GSE
gse = db.srp_to_gse(['SRP123456'])

# SRX to SRR
srr = db.srx_to_srr(['SRX123456'])

srake:

# GSM to SRP
srake convert GSM123456 --to SRP

# SRP to GSE
srake convert SRP123456 --to GSE

# SRX to SRR
srake runs SRX123456

pysradb:

# Search by study
results = db.search_by_study_title('cancer')

# Search experiments
results = db.search_sra_studies('breast cancer', max_results=100)

srake:

# Search studies
srake search "cancer" --limit 100

# Search with filters
srake search "breast cancer" --organism "homo sapiens" --limit 100

Advanced Feature Mapping

Batch Operations

Other tools often require scripting:

# pysradb
from pysradb import SRAweb
db = SRAweb()
accession_list = ["SRP123456", "SRP654321"]
frames = [db.sra_metadata(acc) for acc in accession_list]

srake provides native batch support:

# Batch conversion
srake convert --batch accessions.txt --to GSE

# Batch download
srake download --list runs.txt --parallel 4

# Batch metadata
srake metadata SRX001 SRX002 SRX003 --format json
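When srake is driven from a script, one batched call can replace a per-accession loop. The sketch below reads an accession list and builds a single srake metadata call using only the flags shown above. The list format (one accession per line, blank lines and # comments ignored) is an assumption, the helper names are hypothetical, and the raw output is returned because its JSON schema is not documented here.

```python
import subprocess
from typing import List


def parse_accessions(text: str) -> List[str]:
    """One accession per line; blank lines and '#' comments are skipped (assumed format)."""
    accessions = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            accessions.append(line)
    return accessions


def metadata_command(accessions: List[str], fmt: str = "json") -> List[str]:
    """Build argv for one batched call: srake metadata ACC1 ACC2 ... --format FMT."""
    return ["srake", "metadata", *accessions, "--format", fmt]


def fetch_metadata(list_path: str) -> str:
    """Read an accession list file and run a single srake call for all of it.

    Requires srake on PATH.
    """
    with open(list_path, encoding="utf-8") as handle:
        accessions = parse_accessions(handle.read())
    result = subprocess.run(
        metadata_command(accessions), capture_output=True, text=True, check=True
    )
    return result.stdout  # raw output; schema not documented here
```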

Filtering Capabilities

Most tools require post-processing:

# SRAdb - filter after retrieval
data <- getSRA(search_terms = "*", sra_con = sra_con)
filtered <- subset(data, organism == "Homo sapiens")

srake filters during ingestion:

# Filter at source - more efficient
srake ingest --auto \
  --organisms "homo sapiens" \
  --platforms ILLUMINA \
  --strategies RNA-Seq \
  --min-reads 10000000
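The gain comes from discarding records as they stream past instead of storing everything and filtering afterward. The generator below sketches that pattern in Python under stated assumptions: records are dicts with hypothetical organism, platform, strategy, and reads fields, and organism matching is case-insensitive. It is not srake's code.

```python
from typing import Dict, Iterable, Iterator, Set

Record = Dict[str, object]


def filter_on_ingest(
    records: Iterable[Record],
    organisms: Set[str],
    platforms: Set[str],
    strategies: Set[str],
    min_reads: int,
) -> Iterator[Record]:
    """Yield only records that pass every filter; the rest are never stored."""
    wanted_organisms = {o.lower() for o in organisms}
    for rec in records:  # only one record is held at a time
        if (
            str(rec["organism"]).lower() in wanted_organisms
            and rec["platform"] in platforms
            and rec["strategy"] in strategies
            and int(rec["reads"]) >= min_reads
        ):
            yield rec
```

The four arguments mirror --organisms, --platforms, --strategies, and --min-reads in the command above. Because the function is a generator, the caller can write matching records straight to the database without ever materializing the unfiltered set.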

Resume and Recovery

Other tools typically lack resume:

# pysradb - no built-in resume
# If interrupted, must restart from beginning

srake resumes interrupted ingestion automatically:

# Automatic resume from interruption
srake ingest --file large_archive.tar.gz
# If interrupted, rerun same command to resume
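One common way to get this behavior is a checkpoint that records how many records have been committed, written atomically and updated together with the stored data. The Python sketch below shows that pattern with an in-memory list standing in for the database. The checkpoint format, batch size, and function names are assumptions for illustration, not srake's actual checkpoint mechanism.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import List, Optional, Sequence


def load_checkpoint(path: Path) -> int:
    """Number of records already committed, or 0 when there is no checkpoint yet."""
    if not path.exists():
        return 0
    return int(json.loads(path.read_text())["records_done"])


def save_checkpoint(path: Path, records_done: int) -> None:
    """Write to a temp file, then rename, so a crash never leaves a half-written checkpoint."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps({"records_done": records_done}))
    os.replace(tmp, path)


def ingest(records: Sequence[str], sink: List[str], ckpt: Path,
           batch_size: int = 2, stop_after: Optional[int] = None) -> None:
    """Store records in batches and checkpoint after each batch.

    stop_after simulates an interruption at that record index.
    """
    done = load_checkpoint(ckpt)
    position = done
    batch: List[str] = []
    for i, rec in enumerate(records):
        if i < done:
            continue  # committed by an earlier run: skip, never reprocess
        if stop_after is not None and i == stop_after:
            raise KeyboardInterrupt("simulated interruption")
        batch.append(rec)
        position = i + 1
        if len(batch) == batch_size:
            # In a real database, these two steps would share one transaction.
            sink.extend(batch)
            save_checkpoint(ckpt, position)
            batch.clear()
    if batch:  # commit the final partial batch
        sink.extend(batch)
        save_checkpoint(ckpt, position)


def demo() -> List[str]:
    """Interrupt a first run, then rerun the same call to resume."""
    records = [f"SRR{n}" for n in range(5)]
    sink: List[str] = []
    with tempfile.TemporaryDirectory() as tmpdir:
        ckpt = Path(tmpdir) / "ingest.checkpoint.json"
        try:
            ingest(records, sink, ckpt, stop_after=3)
        except KeyboardInterrupt:
            pass  # the first run stops with two records committed
        ingest(records, sink, ckpt)  # same call again: resumes at record 2
    return sink
```

Calling demo() returns ["SRR0", "SRR1", "SRR2", "SRR3", "SRR4"]: the uncommitted record from the interrupted batch is processed again on resume, and committed records are not duplicated.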

Migration Guide

From SRAdb

Database setup:

# SRAdb: Download SQLite file
# srake: Ingest directly
srake ingest --auto

Query syntax:

# SRAdb: SQL queries
# srake: Simple CLI commands
srake search "your query"

Output formats:

# Both support multiple formats
srake search "query" --format json

From ffq

Metadata retrieval:

# ffq returns JSON metadata; download links need --ftp, --aws, or --gcp
# srake returns metadata in several output formats
srake metadata SRR123456 --detailed

Download URLs:

# ffq shows URLs
# srake downloads directly
srake download SRR123456 --dry-run  # To see URLs
srake download SRR123456             # To download

From pysradb

Python to CLI:

# pysradb requires Python scripting
# srake works from command line
srake convert GSM123456 --to SRX

DataFrame to formats:

# pysradb returns DataFrames
# srake supports multiple formats
srake search "query" --format csv
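If you are used to pysradb DataFrames, you can parse srake's CSV output with the standard library, or load the same text with pandas.read_csv. The sketch below runs the search command shown above and turns its output into one dict per row. It assumes the CSV has a header row. Column names are not documented here, so the columns in any sample data are hypothetical.

```python
import csv
import io
import subprocess
from typing import Dict, List


def parse_csv(text: str) -> List[Dict[str, str]]:
    """Parse CSV text with a header row into one dict per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def srake_search_rows(query: str) -> List[Dict[str, str]]:
    """Run srake search with CSV output and parse it (requires srake on PATH)."""
    out = subprocess.run(
        ["srake", "search", query, "--format", "csv"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_csv(out)
```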

Unique srake Advantages

1. Streaming Architecture

Process 14GB+ archives in roughly 200MB of RAM (see Performance Comparison)
No need to extract archives to disk
Zero-copy data transfer
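The general technique behind processing archives without extracting them can be shown with Python's tarfile module in stream mode ("r|gz"). Stream mode reads members sequentially from a compressed stream and writes nothing to disk. This sketch is not srake's implementation and does not demonstrate the zero-copy claim; count_lines_streaming and build_demo_archive are hypothetical helpers.

```python
import io
import tarfile
from typing import IO, Dict


def count_lines_streaming(fileobj: IO[bytes]) -> Dict[str, int]:
    """Count lines in each regular file of a .tar.gz, reading strictly front to back."""
    counts: Dict[str, int] = {}
    # "r|gz" is tarfile's stream mode: no backward seeks, no temporary files,
    # and members must be consumed in archive order.
    with tarfile.open(fileobj=fileobj, mode="r|gz") as archive:
        for member in archive:
            if not member.isfile():
                continue
            stream = archive.extractfile(member)
            if stream is None:
                continue
            # Iterating line by line bounds memory by buffer and line size,
            # not by member or archive size.
            counts[member.name] = sum(1 for _ in stream)
    return counts


def build_demo_archive(files: Dict[str, bytes]) -> io.BytesIO:
    """Create a small in-memory .tar.gz for the example."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer
```

The same loop works on a network or pipe stream (for example, an HTTP response body) because stream mode never seeks backward.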

2. Checkpoint System

Resume from exact interruption point
Track progress across sessions
No duplicate processing

3. Integrated Filtering

Filter during ingestion, not after
Reduce database size
Faster subsequent queries

4. Unified CLI

Single tool for all operations
Consistent command structure
No language-specific setup

Performance Comparison

Metric	srake	SRAdb	pysradb
14 GB archive ingestion	15 min	45 min*	35 min*
Memory usage	200 MB	8 GB+	4 GB+
Resume Support	✅	❌	❌
Concurrent Processing	✅	❌	Limited

*Requires full extraction to disk first

API Endpoints

For programmatic access from scripts or other tools, srake can also serve its functionality over a REST API:

# Start API server
srake server --port 8080

# Query endpoints
curl "http://localhost:8080/api/search?q=cancer"
curl "http://localhost:8080/api/metadata/SRP123456"
curl "http://localhost:8080/api/convert?from=SRP123456&to=GSE"
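A client for these endpoints mainly needs correct URL construction. The Python sketch below builds the three URLs shown above with proper query encoding and fetches them using only the standard library. The endpoint paths and parameters come from the curl examples. That responses are JSON is an assumption, and the function names are hypothetical.

```python
import json
from typing import Any
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8080"  # matches the srake server --port 8080 example


def search_url(query: str, base: str = BASE_URL) -> str:
    return f"{base}/api/search?{urlencode({'q': query})}"


def metadata_url(accession: str, base: str = BASE_URL) -> str:
    return f"{base}/api/metadata/{quote(accession, safe='')}"


def convert_url(accession: str, target: str, base: str = BASE_URL) -> str:
    return f"{base}/api/convert?{urlencode({'from': accession, 'to': target})}"


def get_json(url: str, timeout: float = 10.0) -> Any:
    """Fetch a URL and decode the body as JSON (assumes the server returns JSON)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With a server running, get_json(convert_url("SRP123456", "GSE")) issues the same request as the third curl command.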

Conclusion

srake covers the core operations of SRAdb, ffq, and pysradb, and adds streaming ingestion, checkpoint-based resume, and filtering at ingest time. The result is a single command-line tool for SRA metadata work, from one-off accession lookups to ingesting multi-gigabyte archives.
