SRAKE - SRA Knowledge Engine

Quality-Controlled Search

Multiple search modes with similarity thresholds, confidence scoring, and vector embeddings

Comprehensive Filtering

Filter by organism, platform, library details, date ranges, and sequencing metrics

Aggregation & Analytics

Group results by field, get counts, and analyze metadata distributions

RESTful API & MCP

HTTP API with an OpenAPI spec, plus Model Context Protocol (MCP) support for AI assistant integration

Streaming Architecture

Process 14 GB+ archives with minimal memory using zero-copy streaming

Resume & Recovery

Intelligent checkpoint system for resuming interrupted operations

Quick Start with SRAKE

# Using Go
go install github.com/nishad/srake/cmd/srake@latest

# Using Homebrew
brew tap nishad/srake
brew install srake

# Using Docker
docker pull ghcr.io/nishad/srake:latest

# Auto-select and ingest
srake ingest --auto

# With filters
srake ingest --file archive.tar.gz \
  --taxon-ids 9606 \
  --platforms ILLUMINA \
  --strategies RNA-Seq

# Build search index
srake index --build --progress

# Build with vector embeddings
srake index --build --with-embeddings

# Verify index
srake index --stats
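
The ingest, index, and verify steps can be chained so that a failure stops the run early. The following is a minimal sketch that uses only the commands shown above; the `run_pipeline` function and the `SRAKE` variable are hypothetical names introduced here for illustration, not part of SRAKE itself.

```shell
#!/bin/sh
# SRAKE is a hypothetical override for the binary name (assumption, not a
# SRAKE feature). Setting it to "echo" gives a dry run that prints each step.
SRAKE=${SRAKE:-srake}

# Run ingest -> index build -> index stats, stopping at the first failure.
run_pipeline() {
  $SRAKE ingest --auto &&
  $SRAKE index --build --progress &&
  $SRAKE index --stats
}

# Usage:
#   run_pipeline                 # real run
#   SRAKE=echo; run_pipeline     # dry run, prints the three commands' arguments
```

Chaining with `&&` means the index is only built after a successful ingest, and the stats check only runs against an index that finished building.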

# Quality-controlled search
srake search "breast cancer" \
  --similarity-threshold 0.7 \
  --show-confidence

# Vector semantic search
srake search "tumor gene expression" \
  --search-mode vector

# Export results
srake search "RNA-Seq" --format json

# Start API server
srake server --port 8082 \
  --enable-cors \
  --enable-mcp

# Test API
curl "http://localhost:8082/api/v1/search?query=cancer&similarity_threshold=0.7"
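
Multi-word queries such as "breast cancer" need percent-encoding before they go into the query string. Below is a small sketch, assuming the `/api/v1/search` endpoint and its `query` and `similarity_threshold` parameters behave as in the curl call above; `urlencode` and `search_url` are hypothetical helper names, and the encoder handles ASCII input only.

```shell
#!/bin/sh
# Percent-encode a string for use as a URL query value (ASCII input assumed).
urlencode() {
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character itself
    s=$rest
    case $c in
      [a-zA-Z0-9.~_-]) out="$out$c" ;;                    # unreserved: keep
      *) out="$out$(printf '%%%02X' "'$c")" ;;            # others: %HH
    esac
  done
  printf '%s' "$out"
}

# Build a search URL. Host and port match the server example above;
# the threshold defaults to 0.7 when the second argument is omitted.
search_url() {
  printf 'http://localhost:8082/api/v1/search?query=%s&similarity_threshold=%s' \
    "$(urlencode "$1")" "${2:-0.7}"
}

# Usage (needs a running server):
#   curl -s "$(search_url "breast cancer" 0.7)"
# search_url "breast cancer" 0.7 prints:
#   http://localhost:8082/api/v1/search?query=breast%20cancer&similarity_threshold=0.7
```

When curl is the only client, `curl -G --data-urlencode "query=breast cancer" --data-urlencode "similarity_threshold=0.7" http://localhost:8082/api/v1/search` lets curl do the encoding instead.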
