CLI Reference

SRAKE CLI Reference

Complete reference for all SRAKE (SRA Knowledge Engine) commands and options.

Global Flags

These flags are available for all commands:

--no-color - Disable colored output
-v, --verbose - Enable verbose output
-q, --quiet - Suppress non-error output
-y, --yes - Assume yes to all prompts (non-interactive mode)
--debug - Enable debug output for troubleshooting
--help - Show help for any command

Commands

`srake ingest`

Ingest SRA metadata from NCBI or local archives.

srake ingest [flags]

Flags

--auto - Auto-select the best file from NCBI
--daily - Ingest the latest daily update
--monthly - Ingest the latest monthly dataset
--file <path> - Ingest specific file (local or NCBI)
--list - List available files without ingesting
--db <path> - Database path (default: “~/.local/share/srake/srake.db”)
--force - Force ingestion even if data exists
--no-progress - Disable progress bar

Filtering Flags

--taxon-ids <ids> - Filter by taxonomy IDs (comma-separated)
--exclude-taxon-ids <ids> - Exclude taxonomy IDs
--date-from <YYYY-MM-DD> - Start date for filtering
--date-to <YYYY-MM-DD> - End date for filtering
--organisms <names> - Filter by organism names
--platforms <names> - Filter by platforms (ILLUMINA, OXFORD_NANOPORE, etc.)
--strategies <names> - Filter by library strategies (RNA-Seq, WGS, etc.)
--min-reads <n> - Minimum read count filter
--max-reads <n> - Maximum read count filter
--stats-only - Only show statistics without inserting data

Examples

# Auto-ingest best file
srake ingest --auto

# Non-interactive ingest (no prompts)
srake ingest --auto --yes

# Ingest with filters
srake ingest --auto --taxon-ids 9606 --platforms ILLUMINA --strategies RNA-Seq

# List available files
srake ingest --list

# Debug mode to see detailed processing
srake ingest --auto --debug

`srake search`

Search SRA metadata with quality control and multiple search modes.

srake search <query> [flags]

Search Flags

-o, --organism <name> - Filter by organism
--platform <name> - Filter by platform
--library-strategy <name> - Filter by library strategy
-l, --limit <n> - Maximum results (default: 100)
--offset <n> - Pagination offset
--search-mode <mode> - Search mode: database|fts|hybrid|vector (default: hybrid)

Quality Control Flags

--similarity-threshold <float> - Minimum similarity score (0-1)
--min-score <float> - Minimum absolute score
--top-percentile <int> - Return only top N% of results
--show-confidence - Include confidence level in results

Output Flags

-f, --format <type> - Output format (table|json|csv|tsv|xml)
--output <file> - Save results to file
--no-header - Omit header in output
--fields <list> - Comma-separated list of fields to include

Examples

# Basic search
srake search "breast cancer"

# Search with quality control
srake search "RNA-Seq" --similarity-threshold 0.7 --show-confidence

# Vector semantic search
srake search "tumor gene expression" --search-mode vector

# Advanced filtering
srake search "transcriptome" \
  --organism "homo sapiens" \
  --library-strategy RNA-Seq \
  --platform ILLUMINA \
  --top-percentile 10

# Export filtered results
srake search "cancer" --format csv --output results.csv

`srake convert`

Convert between different accession types (SRA, GEO, BioProject, BioSample).

srake convert [<accession> ...] [flags]

Flags

--to <type> - Target accession type (required)
- Options: GSE, SRP, SRX, GSM, SRR, SRS, PRJNA, BIOSAMPLE
-f, --format <type> - Output format (table|json|yaml|csv|tsv)
-o, --output <file> - Save results to file
--batch <file> - Read accessions from file
--dry-run - Preview conversions without executing

Examples

# Convert SRA Project to GEO Series
srake convert SRP123456 --to GSE

# Convert multiple accessions
srake convert SRP001 SRP002 SRP003 --to GSE

# Batch conversion from file
srake convert --batch accessions.txt --to SRX --output results.json

# Convert from stdin (pipe-friendly)
echo "SRP123456" | srake convert --to GSE
cat accession_list.txt | srake convert --to GSM --format json

# Preview conversion without executing
srake convert SRP123456 --to GSE --dry-run

# Debug mode to see conversion details
srake convert SRP123456 --to GSE --debug

Supported Conversions

From	To	Description
SRP	GSE, SRX, SRR, SRS, PRJNA	Study to related accessions
SRX	GSM, SRP, SRR, SRS	Experiment to related accessions
SRR	SRX, SRP, GSM	Run to parent accessions
SRS	SRX, GSM, BIOSAMPLE	Sample to related accessions
GSE	SRP, GSM	GEO Series to SRA/samples
GSM	SRX, SRR, GSE	GEO Sample to SRA/series
PRJNA	SRP	BioProject to SRA Project
SAMN	SRS	BioSample to SRA Sample

`srake runs`

Get all runs for a study, experiment, or sample.

srake runs <accession> [flags]

Flags

-d, --detailed - Include detailed information
-f, --format <type> - Output format (table|json|yaml|csv|tsv)
-o, --output <file> - Save results to file
-l, --limit <n> - Limit number of results
--fields <list> - Comma-separated list of fields

Examples

# Get runs for a study
srake runs SRP123456

# Get detailed run information
srake runs SRX123456 --detailed

# Export as JSON
srake runs SRP123456 --format json --output runs.json

`srake samples`

Get all samples for a study or experiment.

srake samples <accession> [flags]

Flags

-d, --detailed - Include organism and taxonomy information
-f, --format <type> - Output format (table|json|yaml|csv|tsv)
-o, --output <file> - Save results to file
-l, --limit <n> - Limit number of results

Examples

# Get samples for a study
srake samples SRP123456

# Get detailed sample information
srake samples SRP123456 --detailed

# Export as CSV
srake samples SRX123456 --format csv --output samples.csv

`srake experiments`

Get all experiments for a study or sample.

srake experiments <accession> [flags]

Flags

-d, --detailed - Include platform and library information
-f, --format <type> - Output format (table|json|yaml|csv|tsv)
-o, --output <file> - Save results to file
-l, --limit <n> - Limit number of results

Examples

# Get experiments for a study
srake experiments SRP123456

# Get experiments for a sample
srake experiments SRS123456 --detailed

`srake studies`

Get study information for any SRA accession.

srake studies <accession> [flags]

Flags

-d, --detailed - Include abstract and full metadata
-f, --format <type> - Output format (table|json|yaml|csv|tsv)
-o, --output <file> - Save results to file

Examples

# Get study from an experiment
srake studies SRX123456

# Get study from a run with details
srake studies SRR123456 --detailed

`srake download`

Download SRA data files from multiple sources.

srake download [<accession> ...] [flags]

Flags

-s, --source <type> - Download source (auto|ftp|aws|gcp|ncbi)
-t, --type <type> - File type (sra|fastq|fasta)
-o, --output <dir> - Output directory (default: “./”)
--threads <n> - Download threads per file (default: 1)
-p, --parallel <n> - Parallel downloads (default: 1)
--aspera - Use Aspera for high-speed transfer
-l, --list <file> - File containing accessions
--retry <n> - Number of retry attempts (default: 3)
--validate - Validate downloaded files (default: true)
--dry-run - Show what would be downloaded

Examples

# Basic download
srake download SRR123456

# Download from AWS with parallel transfers
srake download SRR123456 --source aws --threads 4

# Download all runs for a study
srake download SRP123456 --type fastq --output ./data/

# Batch download from file
srake download --list runs.txt --parallel 4

# Download from stdin (pipe-friendly)
echo "SRR123456" | srake download --type fastq
srake runs SRP123456 | srake download --parallel 4

# High-speed Aspera transfer
srake download SRR123456 --aspera

# Dry run to preview downloads
srake download SRP123456 --dry-run

# Non-interactive download (no prompts)
srake download SRP123456 --yes

# Debug mode for troubleshooting
srake download SRR123456 --debug

Automatic Expansion

The download command automatically expands:

SRP → all runs in the study
SRX → all runs in the experiment
SRS → all runs for the sample

`srake metadata`

Get detailed metadata for specific accessions.

srake metadata <accession> [accessions...] [flags]

Flags

-f, --format <type> - Output format (table|json|yaml)
--fields <list> - Comma-separated list of fields
--expand - Expand nested structures

Examples

# Get metadata for an experiment
srake metadata SRX123456

# Get multiple accessions as JSON
srake metadata SRX123456 SRX123457 --format json

# Select specific fields
srake metadata SRR999999 --fields title,platform,strategy

`srake index`

Manage search index for fast full-text and vector search.

srake index [flags]

Index Operations

--build - Build search index from database
--rebuild - Rebuild index from scratch (removes existing)
--verify - Verify index integrity
--stats - Show index statistics
--resume - Resume interrupted index building

Index Options

--batch-size <n> - Documents per batch (default: 1000)
--workers <n> - Number of parallel workers
--path <dir> - Index directory path
--with-embeddings - Build vector embeddings for semantic search
--embedding-model <name> - Model for embeddings (default: SapBERT)
--progress - Show progress bar
--progress-file <file> - Save progress to file
--checkpoint-dir <dir> - Directory for checkpoints

Examples

# Build search index with progress
srake index --build --progress

# Build with vector embeddings for semantic search
srake index --build --with-embeddings

# Build with custom batch size and path
srake index --build --batch-size 5000 --path /custom/index

# Build embeddings with quantized model (faster, less memory)
SRAKE_MODEL_VARIANT=quantized srake index --build --with-embeddings

# Resume interrupted build
srake index --resume

# Rebuild from scratch
srake index --rebuild

# Verify index integrity
srake index --verify

# Show index statistics
srake index --stats

`srake server`

Start the API server for programmatic access and AI integration.

srake server [flags]

Flags

-p, --port <n> - Port to listen on (default: 8080)
--host <addr> - Host to bind to (default: localhost)
--enable-cors - Enable CORS for web access
--enable-mcp - Enable Model Context Protocol for AI assistants
--db <path> - Database path
--index-path <path> - Search index path
--log-level <level> - Log level (debug|info|warn|error)

Examples

# Start server with all features
srake server --port 8082 --enable-cors --enable-mcp

# Custom database and index
srake server --db /path/to/db --index-path /path/to/index

# Production deployment
srake server --host 0.0.0.0 --port 80 --enable-cors

# With environment variables
SRAKE_DB_PATH=test.db SRAKE_INDEX_PATH=/tmp/index srake server

API Endpoints

/api/v1/search - Search with quality control
/api/v1/stats - Database statistics
/api/v1/studies/{id} - Study metadata
/api/v1/export - Export search results
/api/v1/health - Service health check
/mcp - MCP JSON-RPC endpoint
/mcp/capabilities - MCP server capabilities

`srake db`

Database management commands.

srake db <subcommand> [flags]

Subcommands

info - Show database statistics and information
export - Export database to SRAmetadb format

Examples

# Show database statistics
srake db info

# Export to SRAmetadb format
srake db export -o SRAmetadb.sqlite

`srake db export`

Export the srake database to SRAmetadb.sqlite format for compatibility with tools expecting the original SRAmetadb schema.

srake db export [flags]

Flags

-o, --output <file> - Output database file path (default: “SRAmetadb.sqlite”)
--db <path> - Source database path (defaults to ~/.local/share/srake/srake.db)
--fts-version <n> - FTS version: 3 for compatibility, 5 for modern (default: 5)
--batch-size <n> - Batch size for data transfer (default: 10000)
--progress - Show progress bar (default: true)
--compress - Compress output with gzip
-f, --force - Overwrite existing output file

Examples

# Basic export with FTS5 (recommended)
srake db export -o SRAmetadb.sqlite

# Export with FTS3 for 100% compatibility
srake db export -o SRAmetadb.sqlite --fts-version 3

# Export from specific database
srake db export --db /path/to/srake.db -o SRAmetadb.sqlite

# Export with compression
srake db export -o SRAmetadb.sqlite.gz --compress

# Large dataset with custom batch size
srake db export -o SRAmetadb.sqlite --batch-size 50000

Output Schema

The exported database contains:

Standard tables: study, experiment, sample, run, submission
Denormalized table: sra (joins all tables for easy querying)
Full-text search: sra_ft (FTS3 or FTS5 virtual table)
Metadata: metaInfo (version and creation info)
Column descriptions: col_desc (field documentation)

Compatibility Notes

FTS5 (default): Modern, faster, smaller index size, better Unicode support
FTS3: Use for compatibility with older tools that require FTS3
The export maps srake’s modern schema to the classic SRAmetadb format
JSON fields are converted to pipe-delimited strings
Missing legacy fields are populated with appropriate defaults

`srake config`

Configuration and path management commands.

srake config <subcommand> [flags]

Subcommands

paths - Show all active paths and environment variables
show - Display current configuration
init - Initialize default configuration file
edit - Open configuration in editor

Flags (init)

--force - Overwrite existing configuration

Examples

# View all paths
srake config paths

# Initialize configuration
srake config init

# Edit configuration
srake config edit

# Show current config
srake config show

`srake cache`

Cache management commands for controlling disk usage.

srake cache <subcommand> [flags]

Subcommands

info - Show cache information and sizes
clean - Remove cache files

Flags (clean)

--all - Remove all cache including indices
--older <duration> - Remove files older than duration (e.g., 30d, 24h)
--search - Remove search result cache
--downloads - Remove downloaded files
--index - Remove search index (requires rebuild)

Examples

# View cache usage
srake cache info

# Clean downloads older than 30 days
srake cache clean --older 30d

# Remove all downloads
srake cache clean --downloads

# Clean everything (with confirmation)
srake cache clean --all

Output Formats

Most commands support multiple output formats:

table (default) - Human-readable table with colors
json - JSON format for programmatic use
yaml - YAML format
csv - Comma-separated values
tsv - Tab-separated values
xml - XML format (convert command only)

Environment Variables

Path Configuration

SRAKE_CONFIG_HOME - Override config directory (default: ~/.config/srake)
SRAKE_DATA_HOME - Override data directory (default: ~/.local/share/srake)
SRAKE_CACHE_HOME - Override cache directory (default: ~/.cache/srake)
SRAKE_STATE_HOME - Override state directory (default: ~/.local/state/srake)
SRAKE_DB_PATH - Override database path (default: ~/.local/share/srake/srake.db)
SRAKE_INDEX_PATH - Override search index path (default: ~/.cache/srake/index)
SRAKE_MODELS_PATH - Override models directory for embeddings

Search Configuration

SRAKE_MODEL_VARIANT - Model variant for embeddings: full|quantized (default: full)
SRAKE_DEFAULT_LIMIT - Default search result limit
SRAKE_SEARCH_MODE - Default search mode: database|fts|hybrid|vector

Output Control

NO_COLOR - Disable colored output globally
SRAKE_NO_COLOR - Disable colored output for srake
SRAKE_DEBUG - Enable debug output
SRAKE_VERBOSE - Enable verbose output

Cloud Configuration

AWS_REGION - Affects download source auto-selection
GCP_PROJECT - Affects download source auto-selection

Exit Codes

0 - Success
1 - General error
2 - Command line usage error
130 - Interrupted (Ctrl+C)

Getting Started with SRAKE Configuration & Paths