Compatibility with Other Tools

SRAKE (SRA Knowledge Engine) matches and extends the functionality of popular SRA metadata tools. This guide maps srake commands to their equivalents in those tools.

Feature Comparison

srake is measured against SRAdb, ffq, pysradb, and MetaSRA on the following capabilities, all of which srake supports:

  • Local Database
  • Streaming Processing
  • Accession Conversion
  • Multi-source Download
  • Relationship Queries
  • Batch Operations
  • Resume Capability
  • Filtering on Ingest
  • REST API
  • Aspera Support

Command Equivalents

SRAdb (R Package) → srake

SRAdb:

# Connect to the downloaded SRAmetadb.sqlite first
library(SRAdb)
sra_con <- dbConnect(SQLite(), "SRAmetadb.sqlite")

# Convert SRP to GSE
sraConvert(in_acc = "SRP123456",
           out_type = "gse",
           sra_con = sra_con)

# Convert GSM to SRX
sraConvert(in_acc = "GSM123456",
           out_type = "srx",
           sra_con = sra_con)

srake:

# Convert SRP to GSE
srake convert SRP123456 --to GSE

# Convert GSM to SRX
srake convert GSM123456 --to SRX

SRAdb:

# Download SRA files
getSRAfile(in_acc = "SRR123456",
           sra_con = sra_con,
           method = "curl")

# Download FASTQ files
getFASTQfile(in_acc = "SRR123456",
             sra_con = sra_con,
             srcType = "ftp")

srake:

# Download SRA files
srake download SRR123456

# Download FASTQ files
srake download SRR123456 --type fastq

SRAdb:

# Search metadata
getSRA(search_terms = "breast cancer",
       out_types = c("study", "sample", "experiment"),
       sra_con = sra_con)

srake:

# Search metadata
srake search "breast cancer" --format json

SRAdb:

# Get SRA info
getSRAinfo(in_acc = "SRP123456",
           sra_con = sra_con)

srake:

# Get metadata
srake metadata SRP123456 --detailed

ffq → srake

ffq:

# Get metadata for an accession
ffq SRR123456

# Get metadata with specific depth
ffq -l 2 GSE123456

# Save to JSON
ffq -o metadata.json SRR123456

srake:

# Get metadata for an accession
srake metadata SRR123456

# Get related metadata
srake studies SRR123456 --detailed

# Save to JSON
srake metadata SRR123456 --format json --output metadata.json

ffq:

# Get FTP links
ffq --ftp SRR123456

# Get AWS links
ffq --aws SRR123456

# Get GCP links
ffq --gcp SRR123456

srake:

# Download from FTP
srake download SRR123456 --source ftp

# Download from AWS
srake download SRR123456 --source aws

# Download from GCP
srake download SRR123456 --source gcp

ffq:

# Query from multiple databases
ffq GSE123456  # Queries GEO
ffq SRR123456  # Queries SRA
ffq ENCSR000EYA  # Queries ENCODE

srake:

# Convert between databases
srake convert GSE123456 --to SRP  # GEO to SRA
srake convert SRR123456 --to GSM  # SRA to GEO

# Direct metadata query
srake metadata GSE123456  # Handles any accession type

pysradb → srake

pysradb:

# Get metadata
from pysradb import SRAweb
db = SRAweb()
df = db.sra_metadata('SRP123456')

# Detailed metadata
df = db.sra_metadata('SRP123456', detailed=True)

srake:

# Get metadata
srake metadata SRP123456

# Detailed metadata
srake metadata SRP123456 --detailed --format json

pysradb:

# Download SRA files
db.download(df, protocol='fasp')

# Download with filters
db.download(df, filter_by_library_strategy='RNA-Seq')

srake:

# Download with Aspera
srake download SRP123456 --aspera

# Download with filters (filter during ingest)
srake ingest --auto --strategies RNA-Seq
srake download SRP123456

pysradb:

# GSM to SRP
srp = db.gsm_to_srp(['GSM123456'])

# SRP to GSE
gse = db.srp_to_gse(['SRP123456'])

# SRX to SRR
srr = db.srx_to_srr(['SRX123456'])

srake:

# GSM to SRP
srake convert GSM123456 --to SRP

# SRP to GSE
srake convert SRP123456 --to GSE

# SRX to SRR
srake runs SRX123456

pysradb:

# Search by study
results = db.search_by_study_title('cancer')

# Search experiments
results = db.search_sra_studies('breast cancer', max_results=100)

srake:

# Search studies
srake search "cancer" --limit 100

# Search with filters
srake search "breast cancer" --organism "homo sapiens" --limit 100

Advanced Feature Mapping

Batch Operations

Other tools often require scripting:

# pysradb: loop over accessions in a Python script
from pysradb import SRAweb

db = SRAweb()
for acc in ["SRP123456", "SRP234567"]:
    df = db.sra_metadata(acc)

srake provides native batch support:

# Batch conversion
srake convert --batch accessions.txt --to GSE

# Batch download
srake download --list runs.txt --parallel 4

# Batch metadata
srake metadata SRX001 SRX002 SRX003 --format json

Filtering Capabilities

Most tools require post-processing:

# SRAdb - filter after retrieval
data <- getSRA(search_terms = "*", sra_con = sra_con)
filtered <- subset(data, organism == "Homo sapiens")

srake filters during ingestion:

# Filter at source - more efficient
srake ingest --auto \
  --organisms "homo sapiens" \
  --platforms ILLUMINA \
  --strategies RNA-Seq \
  --min-reads 10000000

Resume and Recovery

Other tools typically lack resume:

# pysradb - no built-in resume
# If interrupted, must restart from beginning

srake has intelligent resume:

# Automatic resume from interruption
srake ingest --file large_archive.tar.gz
# If interrupted, rerun same command to resume

Migration Guide

From SRAdb

  1. Database setup:

    # SRAdb: Download SQLite file
    # srake: Ingest directly
    srake ingest --auto

  2. Query syntax:

    # SRAdb: SQL queries
    # srake: Simple CLI commands
    srake search "your query"

  3. Output formats:

    # Both support multiple formats
    srake search "query" --format json

From ffq

  1. Metadata retrieval:

    # ffq focuses on links
    # srake provides full metadata
    srake metadata SRR123456 --detailed

  2. Download URLs:

    # ffq shows URLs
    # srake downloads directly
    srake download SRR123456 --dry-run  # To see URLs
    srake download SRR123456            # To download

From pysradb

  1. Python to CLI:

    # pysradb requires Python scripting
    # srake works from the command line
    srake convert GSM123456 --to SRX

  2. DataFrame to formats:

    # pysradb returns DataFrames
    # srake supports multiple formats
    srake search "query" --format csv
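
Existing pysradb pipelines that depend on DataFrames can keep them by loading srake's CSV output. A minimal sketch, assuming srake is on PATH; the --format and --output flags are the ones shown earlier in this guide:

# Recreate the pysradb DataFrame workflow on top of the srake CLI.
import subprocess

import pandas as pd

subprocess.run(
    ["srake", "search", "breast cancer", "--format", "csv",
     "--output", "results.csv"],
    check=True,
)
df = pd.read_csv("results.csv")
print(df.head())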

Unique srake Advantages

1. Streaming Architecture

  • Process 14GB+ files with minimal RAM
  • No need to extract archives to disk
  • Zero-copy data transfer
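
To make the streaming model concrete, the sketch below shows the general pattern in plain Python (a conceptual illustration of the technique, not srake's actual implementation): archive members are parsed one at a time from the compressed stream, so memory stays bounded and nothing is extracted to disk.

# Conceptual sketch of streaming ingestion: "r|gz" opens the archive
# as a forward-only stream, so members are read sequentially and only
# one member's data is in memory at a time.
import tarfile

record_count = 0
with tarfile.open("large_archive.tar.gz", mode="r|gz") as archive:
    for member in archive:
        if not member.isfile():
            continue
        handle = archive.extractfile(member)
        if handle is None:
            continue
        for line in handle:
            record_count += 1  # stand-in for real metadata parsing
print(record_count)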

2. Checkpoint System

  • Resume from exact interruption point
  • Track progress across sessions
  • No duplicate processing
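
The pattern behind this is easy to illustrate (a conceptual sketch, not srake's actual on-disk format): progress is persisted after each unit of work, so a rerun starts exactly where the last run stopped.

# Conceptual checkpoint pattern: persist the index of the next
# unprocessed item so an interrupted run resumes without redoing work.
import json
import os

STATE_FILE = "ingest.checkpoint.json"

def next_index() -> int:
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return json.load(f)["next"]
    return 0

items = [f"record-{i}" for i in range(1000)]
for i in range(next_index(), len(items)):
    _ = items[i].upper()  # stand-in for real processing
    with open(STATE_FILE, "w") as f:
        json.dump({"next": i + 1}, f)  # checkpoint after each item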

3. Integrated Filtering

  • Filter during ingestion, not after
  • Reduce database size
  • Faster subsequent queries

4. Unified CLI

  • Single tool for all operations
  • Consistent command structure
  • No language-specific setup

Performance Comparison

Operation                 srake    SRAdb     pysradb
14GB Archive Ingestion    15 min   45 min*   35 min*
Memory Usage              200MB    8GB+      4GB+
Resume Support            ✓        ✗         ✗
Concurrent Processing     ✓        ✗         Limited

*Requires full extraction to disk first

API Endpoints

For tools that need programmatic access, srake provides REST API equivalents:

# Start API server
srake server --port 8080

# Query endpoints
curl "http://localhost:8080/api/search?q=cancer"
curl "http://localhost:8080/api/metadata/SRP123456"
curl "http://localhost:8080/api/convert?from=SRP123456&to=GSE"

Conclusion

srake combines the best features of existing tools while adding unique capabilities like streaming processing, checkpoint recovery, and integrated filtering. It provides a unified, efficient solution for SRA metadata management that scales from small queries to massive dataset processing.