Automation & Scripting Guide
srake is designed to work seamlessly in automated workflows and scripts, following clig.dev best practices for command-line interfaces.
Non-Interactive Mode
When running srake in scripts or CI/CD pipelines, use the --yes flag to automatically accept all prompts:
#!/bin/bash
# Automated daily ingest script
srake ingest --auto --yes --quiet
Pipeline Composition with stdin
srake commands accept input on stdin, so they compose with other tools in Unix pipelines:
Chaining Commands
# Find all RNA-Seq experiments and download them
srake search "RNA-Seq" --format tsv | \
cut -f1 | \
srake download --type fastq --parallel 4
# Convert a list of accessions
cat accessions.txt | srake convert --to GSE --format json > converted.json
# Process search results through multiple tools
srake search "homo sapiens" --limit 1000 | \
grep "ILLUMINA" | \
cut -f1 | \
srake metadata --format json
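Hand-assembled accession lists often carry blank lines, comments, Windows line endings, or duplicates. A small filter placed in front of the pipeline can keep those out of srake's stdin. The sketch below is one way to do it; the function name `clean_accessions` and the `#` comment convention are assumptions made for illustration and are not part of srake.

```shell
#!/bin/bash
# clean_accessions: hypothetical helper, not part of srake.
# Reads accessions on stdin, strips carriage returns and surrounding
# whitespace, drops blank lines and lines starting with '#', and removes
# duplicates while preserving first-seen order.
clean_accessions() {
    tr -d '\r' |
        sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' |
        awk 'NF && $0 !~ /^#/ && !seen[$0]++'
}

# Usage (flags taken from the examples above):
#   clean_accessions < accessions.txt | srake convert --to GSE --format json
```

Deduplicating with awk rather than `sort -u` keeps the accessions in their original order, which makes the output easier to line up with the input file.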
Batch Processing
# Process accessions from a file
while IFS= read -r accession; do
    srake convert "$accession" --to GSE --quiet
done < accessions.txt
# Or use stdin directly
cat accessions.txt | srake convert --to GSE --output results.json
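For long lists it can help to keep going after a failure and report which items failed at the end. The following is a minimal sketch of that pattern; `for_each_line` and `convert_one` are hypothetical names, not srake features. The command's stdin is redirected from /dev/null as a precaution, because a command that reads stdin inside a `while read` loop would otherwise consume the rest of the list.

```shell
#!/bin/bash
# for_each_line: hypothetical helper, not part of srake.
# Usage: for_each_line command [args...] < items.txt
# Runs the command once per non-empty stdin line, passing the line as the
# last argument. Failures are reported on stderr and counted; the function
# returns non-zero if any item failed.
for_each_line() {
    local line failed=0
    # The "|| [ -n ... ]" clause handles a final line with no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        [ -z "$line" ] && continue
        # </dev/null keeps the command from reading the remaining items.
        if ! "$@" "$line" </dev/null; then
            printf 'failed: %s\n' "$line" >&2
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}

# Usage, wrapping the convert call from the loop above:
#   convert_one() { srake convert "$1" --to GSE --quiet; }
#   for_each_line convert_one < accessions.txt
```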
Dry Run Mode
Test your commands without making changes using --dry-run:
# Preview what would be downloaded
srake download SRP123456 --dry-run
# Check conversions before executing
echo -e "SRP001\nSRP002\nSRP003" | srake convert --to GSE --dry-run
Debugging Scripts
Use the --debug flag to troubleshoot issues:
# Enable debug output for detailed logging
srake download SRR123456 --debug 2> debug.log
# Combine with verbose for maximum information
srake convert SRP123456 --to GSE --debug --verbose
Error Handling
srake follows Unix conventions for exit codes:
- 0: Success
- 1: General error
- 2: Command line usage error
#!/bin/bash
set -e # Exit on any error
# Check if download succeeded
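# (set -e does not fire for a command tested by "if", so the else branch runs)
if srake download SRR123456 --yes --quiet; then
    echo "Download successful"
else
    status=$?  # capture immediately, before another command overwrites $?
    echo "Download failed with exit code $status" >&2
    exit "$status"
fi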
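Because the exit codes distinguish a general error from a usage error, a script can retry the former and give up at once on the latter. The sketch below shows one possible retry wrapper; `retry` is a hypothetical helper, and treating exit code 1 as possibly transient is an assumption, since the source does not say which failures are retryable.

```shell
#!/bin/bash
# retry: hypothetical helper, not part of srake.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS command [args...]
# Re-runs the command until it succeeds or the attempts run out. Exit code 2
# (command line usage error in the list above) is returned immediately,
# since repeating a malformed invocation cannot help.
retry() {
    local max="$1" delay="$2" attempt=1 status=0
    shift 2
    while true; do
        "$@" && return 0
        status=$?
        if [ "$status" -eq 2 ] || [ "$attempt" -ge "$max" ]; then
            return "$status"
        fi
        printf 'attempt %d/%d failed (exit %d), retrying in %ss\n' \
            "$attempt" "$max" "$status" "$delay" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}

# Usage:
#   retry 3 30 srake download SRR123456 --yes --quiet
```

The wrapper returns the last exit status of the command, so callers can still branch on the codes listed above.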
Output Formats for Scripts
Use structured output formats for easier parsing:
# JSON for complex processing
srake search "mouse" --format json | jq '.[] | .accession'
# TSV for simple column extraction
srake search "human" --format tsv --no-header | awk -F'\t' '{print $1}'
# CSV for spreadsheet tools
srake convert SRP123456 --to GSE --format csv > results.csv
Parallel Processing
Leverage GNU parallel for large-scale processing:
# Download multiple accessions in parallel
cat accessions.txt | parallel -j 4 srake download {} --yes --quiet
# Convert accessions in parallel
parallel -j 8 srake convert {} --to GSE ::: SRP001 SRP002 SRP003
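On machines where GNU parallel is not installed, `xargs -P` offers a similar fan-out. The option is available in GNU and BSD xargs but is not required by POSIX. The wrapper below is a sketch under that assumption; `fan_out` is a hypothetical name, not an srake feature.

```shell
#!/bin/bash
# fan_out: hypothetical helper, not part of srake.
# Usage: fan_out JOBS command [args...] < items.txt
# Runs the command once per input line with at most JOBS processes at a
# time. Each occurrence of {} in the arguments is replaced by the line.
fan_out() {
    local jobs="$1"
    shift
    xargs -P "$jobs" -I {} "$@"
}

# Usage, mirroring the GNU parallel example above:
#   fan_out 4 srake download {} --yes --quiet < accessions.txt
```

xargs reports a non-zero status when any invocation fails, so the result can still be checked with `if` or `set -e`, although the individual exit codes are not preserved.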
Cron Jobs
Example cron job for automated daily ingestion:
# Daily SRA metadata update at 2 AM
0 2 * * * /usr/local/bin/srake ingest --daily --yes --quiet >> /var/log/srake.log 2>&1
# Weekly full ingest on Sundays
0 3 * * 0 /usr/local/bin/srake ingest --auto --yes --force >> /var/log/srake.log 2>&1
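If an ingest ever runs longer than expected, the daily and weekly jobs could overlap. One common guard is a lock that lets only one run proceed at a time. The sketch below uses an atomic `mkdir` for that purpose; `with_lock`, the lock path, and the return code 75 are assumptions for illustration, not srake behavior.

```shell
#!/bin/bash
# with_lock: hypothetical helper, not part of srake.
# Usage: with_lock LOCK_DIR command [args...]
# mkdir either creates the directory or fails, atomically, so only one
# caller can hold the lock. Returns 75 when the lock is already held;
# otherwise returns the command's own exit status.
with_lock() {
    local lock_dir="$1" status=0
    shift
    if ! mkdir "$lock_dir" 2>/dev/null; then
        printf 'lock %s is held, skipping this run\n' "$lock_dir" >&2
        return 75
    fi
    "$@" || status=$?
    rmdir "$lock_dir"
    return "$status"
}

# Usage inside a script called from cron:
#   with_lock /tmp/srake-ingest.lock srake ingest --daily --yes --quiet
```

A run that is killed before `rmdir` leaves a stale lock directory behind, which then has to be removed by hand. On Linux, `flock` from util-linux avoids this because the kernel releases the lock when the process exits.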
Docker Integration
Run srake in containerized environments:
# Non-interactive Docker execution
docker run -v "$(pwd)/data:/data" \
    srake-image \
    srake ingest --auto --yes --db /data/metadata.db
# Pipe data into containerized srake
cat accessions.txt | docker run -i srake-image \
srake convert --to GSE --format json
CI/CD Integration
GitHub Actions Example
name: Update SRA Metadata
on:
  schedule:
    - cron: '0 0 * * *' # Daily at midnight
  workflow_dispatch:
jobs:
  update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Go
        uses: actions/setup-go@v4
        with:
          go-version: '1.25'
      - name: Install srake
        run: go install github.com/nishad/srake/cmd/srake@latest
      - name: Update metadata
        run: |
          srake ingest --auto --yes --quiet
          srake db info
      - name: Upload database
        uses: actions/upload-artifact@v3
        with:
          name: sra-metadata
          path: ./data/metadata.db
GitLab CI Example
update-sra-metadata:
  stage: data
  script:
    - srake ingest --auto --yes --quiet
    - srake search "homo sapiens" --limit 100 --format json > latest_human.json
  artifacts:
    paths:
      - ./data/metadata.db
      - latest_human.json
  only:
    - schedules
Shell Functions
Create helpful shell functions for common tasks:
# Add to ~/.bashrc or ~/.zshrc
# Quick SRA to GEO conversion
sra2geo() {
    echo "$1" | srake convert --to GSE --quiet | tail -1
}
# Download helper with defaults
sra_download() {
    srake download "$@" --type fastq --parallel 4 --yes
}
# Search and count results
sra_count() {
    srake search "$1" --format tsv --no-header | wc -l
}
# Usage
$ sra2geo SRP123456
GSE98765
$ sra_download SRR123456 SRR123457
# Downloads with optimized settings
$ sra_count "homo sapiens RNA-Seq"
1234
Best Practices
- Always use --yes in scripts to avoid hanging on prompts
- Use --quiet to suppress non-essential output in production scripts
- Enable --debug when developing to understand command behavior
- Test with --dry-run first before running destructive operations
- Check exit codes for proper error handling
- Use structured output formats (JSON/TSV) for reliable parsing
- Leverage stdin for composability with other Unix tools
- Set appropriate timeouts for network operations in CI/CD
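The timeout advice can be applied from the shell with `timeout` from GNU coreutils, which exits with status 124 when the time limit is reached. The wrapper below is a sketch under the assumption that coreutils is available (it is not installed by default on macOS); `run_with_timeout` is a hypothetical name.

```shell
#!/bin/bash
# run_with_timeout: hypothetical helper, not part of srake.
# Usage: run_with_timeout DURATION command [args...]
# Wraps GNU coreutils timeout(1). DURATION uses timeout's syntax, for
# example 30s, 10m, or 2h. Returns the command's exit status, or 124 if
# the limit was hit.
run_with_timeout() {
    local limit="$1" status=0
    shift
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        printf 'timed out after %s: %s\n' "$limit" "$*" >&2
    fi
    return "$status"
}

# Usage in a CI step:
#   run_with_timeout 2h srake ingest --auto --yes --quiet
```

`timeout` only runs external programs, so it works with the srake binary but not with shell functions defined in the same script.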
Environment Variables
srake respects the following environment variables:
# Disable colored output
export NO_COLOR=1
# Custom database location
export SRAKE_DB=/custom/path/metadata.db
# Run with environment overrides
NO_COLOR=1 srake search "mouse" --format table
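A script that relies on SRAKE_DB can check the variable before starting a long job. The preflight below is a sketch; `check_db_path` is a hypothetical helper, and requiring the variable to be set is a policy chosen for this example, not something srake itself enforces.

```shell
#!/bin/bash
# check_db_path: hypothetical preflight helper, not part of srake.
# Prints the database path from SRAKE_DB when the variable is set and its
# parent directory exists; otherwise explains the problem on stderr and
# returns 1.
check_db_path() {
    local db="${SRAKE_DB:-}"
    if [ -z "$db" ]; then
        echo "SRAKE_DB is not set" >&2
        return 1
    fi
    if [ ! -d "$(dirname -- "$db")" ]; then
        printf 'directory for SRAKE_DB does not exist: %s\n' "$db" >&2
        return 1
    fi
    printf '%s\n' "$db"
}

# Usage:
#   check_db_path >/dev/null && srake ingest --auto --yes --quiet
```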
Logging
Redirect output streams for logging:
# Log errors only
srake ingest --auto 2> errors.log
# Log everything
srake ingest --auto --verbose > output.log 2>&1
# Separate stdout and stderr
srake search "human" > results.txt 2> errors.log
# Tee for both console and file
srake ingest --auto 2>&1 | tee -a srake.log
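Log files that accumulate across runs are easier to read when each line carries a timestamp. The filter below is a minimal sketch of that idea; `log_stamp` is a hypothetical helper, and the UTC ISO 8601 format is a choice made for this example.

```shell
#!/bin/bash
# log_stamp: hypothetical helper, not part of srake.
# Prefixes every stdin line with a UTC timestamp and writes it to stdout.
log_stamp() {
    local line
    # The "|| [ -n ... ]" clause handles a final line with no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s %s\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" "$line"
    done
}

# Usage:
#   srake ingest --auto 2>&1 | log_stamp >> srake.log
```

In bash, a pipeline returns the exit status of its last command, so a failure in srake is hidden by `log_stamp` or `tee` unless the script sets `set -o pipefail` first.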
Together, these patterns help keep srake automation robust, maintainable, and interoperable with other Unix tools.