Filtering
Filters are applied during ingestion to control which records are imported into the database. This reduces database size and processing time.
Taxonomy filters
# Human only (NCBI taxonomy ID 9606)
srake ingest --auto --taxon-ids 9606
# Multiple species
srake ingest --auto --taxon-ids 9606,10090,7955
# By organism name
srake ingest --auto --organisms "Homo sapiens"
# Exclude specific taxa
srake ingest --auto --exclude-taxon-ids 32630
Platform and library filters
# Illumina RNA-Seq
srake ingest --auto --platforms ILLUMINA --strategies RNA-Seq
# Multiple platforms
srake ingest --auto --platforms ILLUMINA,PACBIO_SMRT
# Multiple strategies
srake ingest --auto --strategies RNA-Seq,WGS,ChIP-Seq
Supported platforms: ILLUMINA, PACBIO_SMRT, OXFORD_NANOPORE, ION_TORRENT, ABI_SOLID, LS454, BGISEQ, COMPLETE_GENOMICS, HELICOS, CAPILLARY.
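A misspelled platform name would presumably match no records, so it can help to check a value against the list above before starting a long ingestion. The following is a minimal shell sketch under stated assumptions: is_supported_platform is a hypothetical helper, not part of srake, and it assumes names must match the list exactly, including case.

```shell
# Hypothetical helper (not part of srake): succeed only if the argument is
# one of the platform names listed above. Assumes exact, case-sensitive names.
is_supported_platform() {
  case "$1" in
    ILLUMINA|PACBIO_SMRT|OXFORD_NANOPORE|ION_TORRENT|ABI_SOLID|LS454|BGISEQ|COMPLETE_GENOMICS|HELICOS|CAPILLARY)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Example: report on a few candidate values without running srake.
for p in ILLUMINA illumina NANOPORE; do
  if is_supported_platform "$p"; then
    echo "$p: ok"
  else
    echo "$p: not in the supported list"
  fi
done
```

Only the first value passes here; the other two are rejected because they do not appear in the list verbatim.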
Date range filters
# Records from 2024 only
srake ingest --auto --date-from 2024-01-01 --date-to 2024-12-31
# Everything after a date
srake ingest --auto --date-from 2023-06-01
Sequencing metrics
# Minimum 10 million reads
srake ingest --auto --min-reads 10000000
# Read count range
srake ingest --auto --min-reads 1000000 --max-reads 100000000
# Minimum 1 billion bases
srake ingest --auto --min-bases 1000000000
Combining filters
Filters are combined with AND logic, so a record is imported only if it passes every filter given:
srake ingest --auto \
--taxon-ids 9606 \
--platforms ILLUMINA \
--strategies RNA-Seq \
--date-from 2024-01-01 \
--min-reads 10000000
Preview mode
Use --stats-only to see what would be imported without actually inserting data:
srake ingest --auto --taxon-ids 9606 --stats-only
Use --filter-verbose to see per-record filter decisions during ingestion.
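One way to use preview mode is to run the same filter set twice: first with --stats-only, then without it once the numbers look right. The sketch below wraps both steps in shell functions so the two runs cannot drift apart. The names preview, run_ingest, and SRAKE are hypothetical and introduced only for illustration; the sketch uses only flags shown in this section.

```shell
# Hypothetical wrappers around the commands shown above (not part of srake).
# SRAKE can be overridden, e.g. SRAKE=echo, to print the command line
# instead of running it.
SRAKE="${SRAKE:-srake}"

# Preview: the given filters plus --stats-only, so nothing is inserted.
preview() {
  "$SRAKE" ingest --auto "$@" --stats-only
}

# Real run: the same filters, without --stats-only.
run_ingest() {
  "$SRAKE" ingest --auto "$@"
}
```

For example, calling preview --taxon-ids 9606 --platforms ILLUMINA and then run_ingest --taxon-ids 9606 --platforms ILLUMINA applies identical filters in both runs.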