Cluster mode

*The interpretation of the Advanced part is only available in EN.

Overview

SAW now supports submitting analysis tasks in cluster mode on Linux systems through the Sun Grid Engine (SGE) job scheduler. In cluster mode, tasks run in parallel in a high-performance computing environment, making full use of cluster resources and improving analysis efficiency.

Cluster mode support: define default job submission rules and resource requirements in a configuration file.
Resource allocation optimization: dynamically adjust thread count, memory size, and memory retry factors to optimize resource utilization.
Task management: supports job submission, logging, and task interruption.
Parallel computing: supports parallelism both within and across rules, suitable for large-scale data analysis.

Enabling Cluster Mode

To enable cluster mode, pass --job-mode=sge when running a task. If --job-mode is not specified, tasks run in local mode by default.
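If the same wrapper script runs both on workstations and on a cluster, the flag can be chosen automatically. The helper below is a minimal sketch, not part of SAW: the function name saw_job_mode_args is hypothetical, and it assumes that having SGE's qsub on PATH means an SGE cluster is available. When qsub is missing it prints nothing, so saw falls back to its default local mode.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SAW): print the --job-mode flag to pass to saw.
# Assumption: an SGE cluster is usable whenever qsub is found on PATH.
saw_job_mode_args() {
    if command -v qsub >/dev/null 2>&1; then
        printf '%s\n' '--job-mode=sge'
    fi
    # No output otherwise: --job-mode is omitted and SAW runs in local mode.
}

# Example use (remaining saw count options omitted):
#   saw count --id=my_run ... $(saw_job_mode_args)
```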

Resource configuration

A resource configuration file defines the default job submission rules and resource requirements. The resource file shipped with SAW, /path/to/software/package/saw/config/resources.yaml, is in YAML format and includes the following sections:

Default resource configuration: global defaults for rules without an explicit configuration.
Rule-specific resource configuration, set for specific rules:
- thread count,
- memory size (GB),
- maximum number of retries,
- memory retry factor (the multiple by which the allocated memory is increased for the next retry).
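To see how the memory retry factor plays out over several attempts, the sketch below prints the memory requested on each attempt. It assumes that every retry multiplies the previous attempt's memory by retry_factor (so attempt k requests mem_gb × retry_factor^k); the function name retry_memory_schedule is hypothetical and only illustrates the arithmetic, it does not query SAW.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: memory (GB) requested on each attempt of one rule,
# assuming each retry multiplies the previous attempt's memory by retry_factor.
# Usage: retry_memory_schedule MEM_GB RETRY_FACTOR MAX_RETRIES
retry_memory_schedule() {
    awk -v mem="$1" -v factor="$2" -v retries="$3" 'BEGIN {
        # Attempt 0 is the first submission; attempts 1..retries are retries.
        for (attempt = 0; attempt <= retries; attempt++) {
            printf "attempt %d: %g GB\n", attempt, mem
            mem *= factor
        }
    }'
}

# annotation rule from the example file (mem_gb: 70, retry_factor: 1.5), 2 retries:
retry_memory_schedule 70 1.5 2
# prints:
# attempt 0: 70 GB
# attempt 1: 105 GB
# attempt 2: 157.5 GB
```

With the example's global default of max_retries: 0, only attempt 0 would be submitted.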

If a field is set to null, SAW automatically calculates the resources required for that component at run time. Typically, these components do not consume excessive computing resources.

Example resource configuration file (resources.yaml):

version: "1.0"

global:
  scheduler: "local"  #choice: local, sge
  default_setting:
    max_retries: 0

schedulers:  #optional
  sge:
    queue: " "  #if none, default queue will be used

#-------------------------------------------------------
#optional rule settings, pick needed ones
rules:
  generate_mask:
    threads: 1
    mem_gb: 1.5
    retry_factor: 2
  generate_cid_count:
    threads: 16
    mem_gb: 1 
    retry_factor: 2
  alignment:  #read alignment
    threads: null 
    mem_gb: null  #from CID counts
    retry_factor: 1.5
  annotation:  #gene annotation
    threads: null  #minimum 8
    mem_gb: 70 
    retry_factor: 1.5
  merge_cid_info:
    threads: 10
    mem_gb: 3
    retry_factor: 2
  image_registration:  #image-related processing
    threads: null
    mem_gb: 20
    retry_factor: 1.5
  tissue_cut:
    threads: 10
    mem_gb: 12
    retry_factor: 2
  microbe_analysis:
    threads: 10
    mem_gb: 66
    retry_factor: 2
  microbe_tissue_cut:
    threads: 10
    mem_gb: 11
    retry_factor: 2
  clustering:  #clustering analysis based on matrices
    threads: 1
    mem_gb: 8
    retry_factor: 2
  cell_cut:
    threads: 1
    mem_gb: 15
    retry_factor: 2
  saturation: 
    threads: 1
    mem_gb: 7
    retry_factor: 2
  protein_mapping:
    threads: null
    mem_gb: 45
    retry_factor: 2
  protein_tissue_cut:
    threads: 10
    mem_gb: 8
    retry_factor: 2
  protein_clustering:
    threads: 11
    mem_gb: 3
    retry_factor: 2
  protein_cell_cut:
    threads: 1
    mem_gb: 15
    retry_factor: 2
  protein_saturation:
    threads: 1
    mem_gb: 10
    retry_factor: 2
  protein_remove_background:
    threads: 10
    mem_gb: 20
    retry_factor: 2
  protein_calculate_statistics:
    threads: 1
    mem_gb: 45
    retry_factor: 2
  generate_gef:
    threads: 1
    mem_gb: 15
    retry_factor: 2
  generate_report:
    threads: 1
    mem_gb: 15
    retry_factor: 2
  package_visualization:
    threads: 1
    mem_gb: 1
    retry_factor: 2

The default resource configuration suits most Stereo-seq datasets (chip size ≤ 2*3). If your Stereo-seq sequencing data volume is large, or you are working with a large Stereo-seq chip (chip size > 2*3), you need to adjust the resource configuration in the resources.yaml file.
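As one illustration of such an adjustment, the sketch below writes a copy of a resources file in which every numeric mem_gb value is multiplied by a chosen factor, leaving null values (computed by SAW at run time) and all other lines untouched. The function name scale_mem_gb, the output file name, and the factor are hypothetical; the right values for a large chip depend on your data, and this is not a SAW-provided tool or recommendation.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: copy a resources YAML, multiplying every numeric mem_gb
# value by FACTOR. null values and all other lines are copied unchanged.
# Note: a trailing comment on a scaled mem_gb line is not preserved.
# Usage: scale_mem_gb INPUT.yaml OUTPUT.yaml FACTOR
scale_mem_gb() {
    awk -v factor="$3" '
        # Match lines such as "    mem_gb: 70" or "    mem_gb: 1.5".
        $1 == "mem_gb:" && $2 ~ /^[0-9]+(\.[0-9]+)?$/ {
            indent = substr($0, 1, index($0, "mem_gb:") - 1)
            printf "%smem_gb: %g\n", indent, $2 * factor
            next
        }
        { print }
    ' "$1" > "$2"
}

# Example (factor 2 is purely illustrative):
#   scale_mem_gb /path/to/software/package/saw/config/resources.yaml ./large_chip_resources.yaml 2
```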

Submit analysis

To submit a SAW count analysis task in SGE mode:

cd /saw/runs

saw count \
    --id=SGE_test \
    --sn=C04144D5 \
    --omics=transcriptomics \
    --kit-version='Stereo-seq N FFPE V1.0' \
    --sequencing-type='PE75_25+59' \
    --organism=mouse \
    --tissue=brain \
    --chip-mask=./C04144D5.barcodeToPos.h5 \
    --fastqs=./reads \
    --reference=./mouse_transcriptome \
    --threads-num=96 \
    --memory=100 \
    --job-mode=sge

Alternatively, pass your own resource configuration YAML to --job-mode to start an analysis task.

If you use your own YAML file in SGE cluster mode, remember to set the scheduler to "sge" and specify a queue.
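The two changes above can be scripted. This is a minimal sketch that assumes the scheduler and queue lines are formatted as in the example resources.yaml; the function name make_sge_config and the queue name my_queue.q are hypothetical placeholders for your own values.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: derive an SGE-ready copy of a resources YAML by
# switching scheduler from "local" to "sge" and filling in the SGE queue.
# Assumes lines shaped like the example file; trailing comments are kept.
# QUEUE_NAME must not contain /, & or \ (they are special to sed).
# Usage: make_sge_config INPUT.yaml OUTPUT.yaml QUEUE_NAME
make_sge_config() {
    sed -e 's/^\([[:space:]]*scheduler:[[:space:]]*\)"local"/\1"sge"/' \
        -e "s/^\([[:space:]]*queue:[[:space:]]*\)\"[^\"]*\"/\1\"$3\"/" \
        "$1" > "$2"
}

# Example:
#   make_sge_config /path/to/software/package/saw/config/resources.yaml \
#       ./specific_configuration_for_my_stereoseq_chip.yaml my_queue.q
```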

cd /saw/runs

saw count \
    --id=SGE_with_a_specific_yaml_test \
    --sn=C04144D5 \
    --omics=transcriptomics \
    --kit-version='Stereo-seq N FFPE V1.0' \
    --sequencing-type='PE75_25+59' \
    --organism=mouse \
    --tissue=brain \
    --chip-mask=./C04144D5.barcodeToPos.h5 \
    --fastqs=./reads \
    --reference=./mouse_transcriptome \
    --threads-num=96 \
    --memory=100 \
    --job-mode=./specific_configuration_for_my_stereoseq_chip.yaml
