REST API Reference
SRAKE provides a RESTful API, described by an OpenAPI 3.0 specification, for programmatic access to its search and metadata functionality.
Quick Start
# Start the API server
srake server --port 8082 --enable-cors --enable-mcp
# Test the API
curl "http://localhost:8082/api/v1/search?query=RNA-Seq&limit=5"
# Check API health
curl "http://localhost:8082/api/v1/health"
OpenAPI Specification
The complete API specification is available in openapi.yaml, which can be imported into:
- Swagger UI for interactive documentation
- Postman for testing
- OpenAPI Generator for client SDK generation
Base URL
http://localhost:8082/api/v1
Server Configuration
# Basic server start
srake server
# With custom configuration
srake server \
--port 8082 \
--host 0.0.0.0 \
--enable-cors \
--enable-mcp
# Using environment variables
SRAKE_DB_PATH=/path/to/db.sqlite \
SRAKE_INDEX_PATH=/path/to/index \
srake server
Authentication
The API does not currently require authentication. For production deployments, add authentication at a reverse proxy placed in front of the server.
Core Endpoints
Search Operations
GET /api/v1/search
Perform advanced searches with quality control and multiple search modes.
Query Parameters:
Parameter | Type | Description | Default | Example |
---|---|---|---|---|
query | string | Search query text | - | RNA-Seq |
limit | integer | Maximum results (1-1000) | 20 | 10 |
offset | integer | Pagination offset | 0 | 0 |
organism | string | Filter by organism | - | homo sapiens |
library_strategy | string | Filter by strategy | - | RNA-Seq |
platform | string | Filter by platform | - | ILLUMINA |
search_mode | string | Search mode: database, fts, hybrid, vector | hybrid | hybrid |
similarity_threshold | float | Min similarity (0-1) | - | 0.7 |
min_score | float | Minimum score | - | 2.0 |
top_percentile | integer | Top N% of results | - | 10 |
show_confidence | boolean | Include confidence level | false | true |
Example Requests:
# Simple search
curl "http://localhost:8082/api/v1/search?query=breast%20cancer"
# Advanced search with quality control
curl "http://localhost:8082/api/v1/search?query=RNA-Seq&\
similarity_threshold=0.7&\
show_confidence=true&\
organism=homo%20sapiens"
# Vector semantic search
curl "http://localhost:8082/api/v1/search?query=tumor%20gene%20expression&\
search_mode=vector&\
similarity_threshold=0.8"
Response Example:
{
"results": [
{
"id": "SRX22037872",
"type": "experiment",
"title": "RNA-Seq of breast cancer cells",
"organism": "Homo sapiens",
"platform": "ILLUMINA",
"library_strategy": "RNA-Seq",
"score": 8.5,
"similarity": 0.92,
"confidence": "high"
}
],
"total_results": 150,
"query": "breast cancer",
"time_taken_ms": 125,
"search_mode": "hybrid"
}
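As a rough sketch of how a client might assemble the query string described above and read the response, the following uses only Python's standard library. The helper names `build_search_url` and `summarize` are illustrative and not part of SRAKE, and the sample payload is a trimmed copy of the response example.

```python
import json
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8082/api/v1"  # base URL from this reference


def build_search_url(query, **params):
    """Return a GET /search URL; parameters set to None are dropped."""
    fields = {"query": query}
    for key, value in params.items():
        if value is None:
            continue
        # Booleans are sent as lowercase true/false, as in the curl examples
        fields[key] = str(value).lower() if isinstance(value, bool) else value
    # quote_via=quote encodes spaces as %20, matching the curl examples
    return f"{BASE_URL}/search?{urlencode(fields, quote_via=quote)}"


def summarize(payload):
    """Return (accession, score, confidence) tuples from a search response."""
    return [(r["id"], r.get("score"), r.get("confidence")) for r in payload["results"]]


sample = json.loads("""
{"results": [{"id": "SRX22037872", "score": 8.5, "similarity": 0.92, "confidence": "high"}],
 "total_results": 150, "search_mode": "hybrid"}
""")

url = build_search_url("breast cancer", organism="homo sapiens", show_confidence=True, limit=5)
print(url)
print(summarize(sample))
```

Against a running server, the URL can be passed to `urllib.request.urlopen` and the body decoded with `json.load`.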
POST /api/v1/search
Run an advanced search with complex filters supplied in the request body.
Request Body:
{
"query": "breast cancer transcriptome",
"filters": {
"organism": "homo sapiens",
"library_strategy": "RNA-Seq",
"platform": "ILLUMINA"
},
"limit": 10,
"search_mode": "hybrid",
"similarity_threshold": 0.8,
"show_confidence": true
}
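The request body above could be built from Python roughly as follows. This is a sketch using the standard library only; `make_search_request` is a hypothetical helper, and the request is constructed but not sent, so it runs without a server.

```python
import json
import urllib.request

SEARCH_URL = "http://localhost:8082/api/v1/search"


def make_search_request(query, filters=None, **options):
    """Build (but do not send) a POST /search request with a JSON body."""
    body = {"query": query, **options}
    if filters:
        body["filters"] = filters
    return urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = make_search_request(
    "breast cancer transcriptome",
    filters={"organism": "homo sapiens", "library_strategy": "RNA-Seq"},
    limit=10,
    search_mode="hybrid",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

To execute the search, pass `request` to `urllib.request.urlopen` once the server is running.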
Metadata Endpoints
GET /api/v1/studies
List all studies with pagination.
curl "http://localhost:8082/api/v1/studies?limit=10&offset=0"
GET /api/v1/studies/{accession}
Get detailed study information.
curl "http://localhost:8082/api/v1/studies/SRP259537"
GET /api/v1/studies/{accession}/experiments
Get all experiments for a study.
curl "http://localhost:8082/api/v1/studies/SRP259537/experiments"
GET /api/v1/studies/{accession}/samples
Get all samples for a study.
curl "http://localhost:8082/api/v1/studies/SRP259537/samples"
GET /api/v1/studies/{accession}/runs
Get all runs for a study.
curl "http://localhost:8082/api/v1/studies/SRP259537/runs?limit=100"
GET /api/v1/experiments/{accession}
Get experiment details.
curl "http://localhost:8082/api/v1/experiments/SRX22037872"
GET /api/v1/samples/{accession}
Get sample details.
curl "http://localhost:8082/api/v1/samples/SRS19840123"
GET /api/v1/runs/{accession}
Get run details.
curl "http://localhost:8082/api/v1/runs/SRR25889421"
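The metadata routes above follow a regular pattern, so a client can derive the URL from the accession itself. The sketch below assumes the accession prefixes seen in the examples (SRP, SRX, SRS, SRR) map to studies, experiments, samples, and runs; the mapping and the `metadata_url` helper are illustrative, and other prefixes may exist in real data.

```python
from urllib.parse import quote

BASE_URL = "http://localhost:8082/api/v1"

# Assumption: prefixes taken from the example accessions in this reference
RESOURCE_BY_PREFIX = {
    "SRP": "studies",
    "SRX": "experiments",
    "SRS": "samples",
    "SRR": "runs",
}
STUDY_CHILDREN = ("experiments", "samples", "runs")


def metadata_url(accession, child=None):
    """Return the metadata URL for an accession, optionally a study sub-resource."""
    resource = RESOURCE_BY_PREFIX.get(accession[:3].upper())
    if resource is None:
        raise ValueError(f"unrecognized accession prefix: {accession!r}")
    url = f"{BASE_URL}/{resource}/{quote(accession, safe='')}"
    if child is not None:
        # Only studies expose /experiments, /samples and /runs in this reference
        if resource != "studies" or child not in STUDY_CHILDREN:
            raise ValueError(f"{child!r} is not a documented sub-resource of {resource}")
        url += f"/{child}"
    return url


print(metadata_url("SRX22037872"))
print(metadata_url("SRP259537", "runs"))
```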
Statistics
GET /api/v1/stats
Get comprehensive database statistics.
curl "http://localhost:8082/api/v1/stats"
Response Example:
{
"total_documents": 1234567,
"indexed_documents": 1234000,
"index_size": 2147483648,
"last_indexed": "2024-12-19T10:30:00Z",
"top_organisms": [
{"name": "Homo sapiens", "count": 450000, "percentage": 36.5},
{"name": "Mus musculus", "count": 280000, "percentage": 22.7}
],
"top_platforms": [
{"name": "ILLUMINA", "count": 950000, "percentage": 77.0}
],
"top_strategies": [
{"name": "RNA-Seq", "count": 400000, "percentage": 32.4}
]
}
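The raw counters in this response are easier to read after a little post-processing. The following sketch assumes `index_size` is reported in bytes (the reference does not state the unit); the helper names are illustrative.

```python
def format_bytes(size):
    """Render a byte count using binary units (KiB, MiB, GiB, TiB)."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if size < 1024 or unit == "TiB":
            return f"{size:.1f} {unit}"
        size /= 1024


def index_coverage(stats):
    """Percentage of documents that have been indexed."""
    return 100.0 * stats["indexed_documents"] / stats["total_documents"]


# Values copied from the response example
stats = {
    "total_documents": 1234567,
    "indexed_documents": 1234000,
    "index_size": 2147483648,
}

print(format_bytes(stats["index_size"]))
print(f"{index_coverage(stats):.2f}% indexed")
```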
Export
POST /api/v1/export
Export search results in various formats.
Supported Formats:
- json: Standard JSON
- csv: Comma-separated values
- tsv: Tab-separated values
- xml: XML format
- jsonl: Newline-delimited JSON
Request Body:
{
"query": "RNA-Seq human",
"format": "csv",
"limit": 100,
"filters": {
"organism": "homo sapiens"
}
}
Example:
# Export as CSV
curl -X POST http://localhost:8082/api/v1/export \
-H "Content-Type: application/json" \
-d '{"query":"RNA-Seq","format":"csv","limit":100}' \
-o results.csv
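A client may want to validate the export format before sending the request. The sketch below builds the JSON body and rejects formats outside the supported list; `export_body` is a hypothetical helper, and the optional `fields` key follows the Field-Specific Export section of this reference.

```python
import json

SUPPORTED_FORMATS = ("json", "csv", "tsv", "xml", "jsonl")


def export_body(query, fmt="csv", limit=100, filters=None, fields=None):
    """Return the JSON body for POST /api/v1/export as a string."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format {fmt!r}; expected one of {SUPPORTED_FORMATS}")
    body = {"query": query, "format": fmt, "limit": limit}
    if filters:
        body["filters"] = dict(filters)
    if fields:
        body["fields"] = list(fields)
    return json.dumps(body)


payload = export_body("RNA-Seq human", fmt="csv", filters={"organism": "homo sapiens"})
print(payload)
```

The resulting string can be used as the `-d` argument of the curl command above or as the body of any HTTP client request.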
Health Monitoring
GET /api/v1/health
Check service health status.
curl "http://localhost:8082/api/v1/health"
Response:
{
"status": "healthy",
"timestamp": "2024-12-19T10:30:00Z",
"search_service": "healthy",
"metadata_service": "healthy"
}
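A monitoring script might reduce this payload to the list of components that are not healthy. In the sketch below, the `unhealthy_components` helper is illustrative, and the failing payload is hypothetical: the reference only shows the healthy case, so the value `"unhealthy"` is an assumption.

```python
import json


def unhealthy_components(health):
    """Return the names of all fields (except timestamp) that are not 'healthy'."""
    return sorted(
        name for name, value in health.items()
        if name != "timestamp" and value != "healthy"
    )


ok = json.loads("""
{"status": "healthy", "timestamp": "2024-12-19T10:30:00Z",
 "search_service": "healthy", "metadata_service": "healthy"}
""")
# Hypothetical failure payload, for illustration only
degraded = dict(ok, status="unhealthy", search_service="unhealthy")

print(unhealthy_components(ok))
print(unhealthy_components(degraded))
```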
MCP (Model Context Protocol)
SRAKE supports MCP for AI assistant integration using JSON-RPC 2.0.
MCP Capabilities
curl "http://localhost:8082/mcp/capabilities"
MCP Tools
Available tools for AI assistants:
- search_sra: Search with quality control
- get_metadata: Get detailed metadata
- find_similar: Vector similarity search
- export_results: Export in various formats
Example MCP Request:
curl -X POST http://localhost:8082/mcp \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"method": "tools/call",
"params": {
"name": "search_sra",
"arguments": {
"query": "breast cancer RNA-Seq",
"limit": 5,
"similarity_threshold": 0.7
}
},
"id": 1
}'
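The same JSON-RPC 2.0 envelope can be generated programmatically. This sketch builds a `tools/call` request like the one above and unwraps a response according to the JSON-RPC 2.0 rule that a response carries either `result` or `error`. The helper names are illustrative, and the shape of SRAKE's `result` object is not documented here, so the sample responses are made up.

```python
import itertools
import json

_request_ids = itertools.count(1)  # each request gets a fresh id


def mcp_tool_call(name, arguments):
    """Build a JSON-RPC 2.0 'tools/call' request for an MCP tool."""
    return {
        "jsonrpc": "2.0",
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
        "id": next(_request_ids),
    }


def unwrap(response):
    """Return the result of a JSON-RPC response, or raise on an error object."""
    if "error" in response:
        err = response["error"]
        raise RuntimeError(f"MCP error {err.get('code')}: {err.get('message')}")
    return response["result"]


first = mcp_tool_call("search_sra", {"query": "breast cancer RNA-Seq", "limit": 5})
second = mcp_tool_call("get_metadata", {"accession": "SRX22037872"})  # argument name is an assumption
print(json.dumps(first, indent=2))
```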
Search Modes
SRAKE supports multiple search modes for different use cases:
Mode | Description | Use Case |
---|---|---|
database | Direct SQL queries | Exact matching, filters only |
fts | Full-text search with Bleve | Text search with highlights |
hybrid | Combines database and FTS | Filtered queries that also need text relevance (default) |
vector | Semantic search with embeddings | Conceptual similarity |
Quality Control
Control search quality with these parameters:
- Similarity Threshold (0-1): Minimum similarity score for results
- Min Score: Minimum absolute score requirement
- Top Percentile: Return only top N% of results
- Confidence Levels: high (>0.8), medium (0.5-0.8), low (<0.5)
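The thresholds above can be mirrored on the client, for example to re-label or trim results after download. This is a sketch only: it assumes the confidence level is derived from the similarity score and that a top-percentile cut rounds up, neither of which is confirmed by this reference. The function names are illustrative.

```python
def confidence_level(similarity):
    """Map a similarity score to high (>0.8), medium (0.5-0.8) or low (<0.5)."""
    if similarity > 0.8:
        return "high"
    if similarity >= 0.5:
        return "medium"
    return "low"


def top_percentile(results, percent):
    """Keep the best-scoring `percent`% of results, rounding the count up."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    keep = -(-len(ranked) * percent // 100)  # ceiling division on integers
    return ranked[:keep]


hits = [{"id": f"R{i}", "score": float(i)} for i in range(10)]
print(confidence_level(0.92), confidence_level(0.8), confidence_level(0.3))
print(top_percentile(hits, 10))
```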
Client Libraries
Python
import requests
import pandas as pd
# Search API
response = requests.get('http://localhost:8082/api/v1/search', params={
    'query': 'breast cancer',
    'organism': 'homo sapiens',
    'library_strategy': 'RNA-Seq',
    'similarity_threshold': 0.7,
    'show_confidence': True,
    'limit': 100
}, timeout=30)
response.raise_for_status()  # Fail early on 4xx/5xx responses
data = response.json()
df = pd.DataFrame(data['results'])
print(f"Found {data['total_results']} results")
# Guard against an empty result set before indexing the first row
if not df.empty:
    print(f"Top result confidence: {df.iloc[0]['confidence']}")
JavaScript/TypeScript
// Using fetch API
const params = new URLSearchParams({
query: 'cancer',
organism: 'homo sapiens',
search_mode: 'hybrid',
similarity_threshold: '0.8',
limit: '50'
});
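fetch(`http://localhost:8082/api/v1/search?${params}`)
  .then(res => {
    // Surface HTTP errors instead of treating an error body as results
    if (!res.ok) throw new Error(`Search failed with status ${res.status}`);
    return res.json();
  })
  .then(data => {
    console.log(`Found ${data.total_results} results`);
    console.log(`Search took ${data.time_taken_ms}ms`);
    data.results.forEach(result => {
      console.log(`${result.id}: ${result.title} (${result.confidence})`);
    });
  })
  .catch(err => console.error(err));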
R
library(httr)
library(jsonlite)
# Search with quality control
response <- GET("http://localhost:8082/api/v1/search",
query = list(
query = "RNA-Seq",
organism = "homo sapiens",
similarity_threshold = 0.7,
show_confidence = TRUE,
limit = 100
))
data <- fromJSON(content(response, "text", encoding = "UTF-8"))
print(paste("Total results:", data$total_results))
print(paste("Search mode:", data$search_mode))
# Convert to dataframe
df <- as.data.frame(data$results)
curl/bash
#!/bin/bash
# Search and process results
curl -s "http://localhost:8082/api/v1/search?\
query=cancer&\
similarity_threshold=0.8&\
show_confidence=true&\
format=json" | \
jq -r '.results[] |
select(.confidence == "high") |
"\(.id)\t\(.organism)\t\(.library_strategy)\t\(.confidence)"' | \
while IFS=$'\t' read -r acc org strat conf; do
echo "Processing $acc ($org, $strat) with $conf confidence"
# Further processing...
done
Error Handling
Endpoints signal success or failure with standard HTTP status codes:
Status Code | Description |
---|---|
200 OK | Successful request |
400 Bad Request | Invalid parameters |
404 Not Found | Resource not found |
500 Internal Server Error | Server error |
503 Service Unavailable | Service unhealthy |
Error responses carry a JSON body with an error flag, a human-readable message, and the HTTP status:
{
"error": true,
"message": "Invalid search parameters",
"status": 400
}
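One way a client could turn these error bodies into exceptions is sketched below. `SrakeAPIError` and `check_response` are hypothetical names, not part of SRAKE, and the function works on a status code plus body text so it can be used with any HTTP library.

```python
import json


class SrakeAPIError(Exception):
    """Raised when the API answers with a non-2xx status."""

    def __init__(self, status, message):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message


def check_response(status_code, body_text):
    """Return the parsed JSON body on 2xx; raise SrakeAPIError otherwise."""
    if 200 <= status_code < 300:
        return json.loads(body_text)
    try:
        message = json.loads(body_text).get("message", "unknown error")
    except (ValueError, AttributeError):
        # Body was not JSON, or not a JSON object (e.g. a proxy error page)
        message = body_text.strip() or "unknown error"
    raise SrakeAPIError(status_code, message)


error_body = '{"error": true, "message": "Invalid search parameters", "status": 400}'
try:
    check_response(400, error_body)
except SrakeAPIError as exc:
    print(f"Request failed: {exc}")
```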
Performance Tips
- Use specific filters to reduce result set size
- Enable pagination with limit and offset for large results
- Set quality thresholds to get only high-confidence results
- Use appropriate search mode for your use case
- Cache frequently accessed data on the client side
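The pagination tip can be sketched as a generator that advances `offset` until `total_results` is reached. The `fetch_page` callable is an assumption: it stands in for whatever function performs the actual HTTP request and returns the decoded search response. A fake implementation is used here so the example runs without a server.

```python
def iter_results(fetch_page, page_size=100, max_results=None):
    """Yield search results page by page using limit/offset pagination."""
    offset = 0
    yielded = 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        results = page["results"]
        if not results:
            return
        for item in results:
            yield item
            yielded += 1
            if max_results is not None and yielded >= max_results:
                return
        offset += len(results)
        if offset >= page["total_results"]:
            return


def fake_fetch(limit, offset):
    """Stand-in for a real HTTP call; serves 250 made-up accessions."""
    ids = [f"SRX{n}" for n in range(250)]
    return {
        "results": [{"id": i} for i in ids[offset:offset + limit]],
        "total_results": len(ids),
    }


print(sum(1 for _ in iter_results(fake_fetch, page_size=100)))
```

Because results are yielded lazily, a caller that stops early (or sets `max_results`) avoids requesting pages it does not need.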
CORS Configuration
CORS is enabled with the --enable-cors flag:
srake server --enable-cors
For production, configure specific allowed origins via reverse proxy.
Rate Limiting
Rate limiting is not implemented in the core API. For production:
- Use a reverse proxy (nginx, Caddy)
- Implement API key authentication
- Add per-client rate limits
Migration Guide
From pysradb
# pysradb (old)
from pysradb import SRAdb
db = SRAdb()
df = db.search("cancer", detailed=True)
# SRAKE API (new)
import requests
import pandas as pd
response = requests.get('http://localhost:8082/api/v1/search',
params={'query': 'cancer', 'limit': 1000})
df = pd.DataFrame(response.json()['results'])
From SRAdb (R)
# SRAdb (old)
library(SRAdb)
sqlfile <- getSRAdbFile()
con <- dbConnect(SQLite(), sqlfile)
rs <- dbGetQuery(con, "SELECT * FROM sra WHERE organism = 'Homo sapiens'")
# SRAKE API (new)
library(httr)
library(jsonlite)
response <- GET("http://localhost:8082/api/v1/search",
query = list(organism = "homo sapiens"))
rs <- fromJSON(content(response, "text", encoding = "UTF-8"))$results
Advanced Features
Hybrid Search Weighting
Control the balance between database and FTS results:
curl "http://localhost:8082/api/v1/search?\
query=cancer&\
search_mode=hybrid&\
hybrid_weight=0.7" # 70% FTS, 30% database
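To make the "70% FTS, 30% database" comment concrete, the sketch below shows a plain linear blend of two scores. This is only an illustration of what such a weight usually means; SRAKE's actual scoring formula is not described in this reference, and the function assumes both scores are already on a comparable scale.

```python
def blend_scores(fts_score, db_score, hybrid_weight=0.7):
    """Linear blend: hybrid_weight goes to FTS, the remainder to the database score."""
    if not 0.0 <= hybrid_weight <= 1.0:
        raise ValueError("hybrid_weight must be between 0 and 1")
    return hybrid_weight * fts_score + (1.0 - hybrid_weight) * db_score


# A document that matches the text well but the filters only partly
print(round(blend_scores(fts_score=0.9, db_score=0.4), 3))
```

Under this assumption, a weight of 1.0 would rank purely by FTS score and a weight of 0.0 purely by the database score.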
Field-Specific Export
Export only specific fields:
{
"query": "RNA-Seq",
"format": "csv",
"fields": ["id", "title", "organism", "platform"],
"limit": 100
}
Batch Processing
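Run several queries in sequence and save each result set to its own file:
#!/bin/bash
queries=("cancer" "diabetes" "alzheimer")
for q in "${queries[@]}"; do
# -G appends the URL-encoded values to the query string, so multi-word queries are safe
curl -s -G "http://localhost:8082/api/v1/search" \
--data-urlencode "query=$q" \
--data-urlencode "limit=10" \
> "${q}_results.json"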
done