Demultiplexing & Analysis

After sequencing your pooled, barcoded library, the uSort-M CLI assigns reads to specific wells, generates per-well consensus sequences, and performs variant calling. The pipeline uses Dorado for barcode demultiplexing with the LevSeq barcode system, minimap2 for reference alignment, and samtools for consensus generation.

Overview

The analysis workflow consists of three main steps:

  1. Demultiplexing (usortm demux): Assign reads to wells via LevSeq barcodes, align to references, generate per-well consensus
  2. Hit picking (usortm pick): Generate cherry-picking lists for Integra ASSIST PLUS robots
  3. Reporting (usortm report): Generate QC reports, plate maps, and coverage statistics

Pipeline Stages

The usortm demux command runs a multi-stage pipeline:

  1. Reference alignment: Align reads to your variant library with minimap2 to determine read direction
  2. Strand splitting: Separate forward and reverse reads (required because LevSeq barcodes NB13–NB96 and RB13–RB96 are reverse complements of each other)
  3. Forward barcode (FBC) demux: Dorado identifies the forward barcode (96 barcodes = plate columns)
  4. Reverse barcode (RBC) demux: Dorado identifies the reverse barcode (4 per plate = plate quadrants)
  5. Well mapping: Combine FBC + RBC assignments to map each read to a 384-well plate position
  6. Consensus & variant calling: Generate per-well consensus sequences via samtools and identify variants
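As a rough illustration of the strand-splitting stage, the sketch below separates alignments by orientation using the SAM FLAG field, where bit 0x10 marks a read aligned to the reverse strand and bit 0x4 marks an unmapped read. It is not the usortm implementation: the function name `split_by_strand` and the toy SAM records are hypothetical.

```python
def split_by_strand(sam_lines):
    """Split SAM alignment records into forward- and reverse-strand read names.

    Assumption: this only mirrors the idea of the strand-splitting stage;
    the actual usortm logic may differ.
    """
    forward, reverse = [], []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blanks and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & 0x4:    # unmapped read: orientation unknown
            continue
        if flag & 0x900:  # secondary (0x100) or supplementary (0x800) alignment
            continue
        (reverse if flag & 0x10 else forward).append(name)
    return forward, reverse


# Toy records (hypothetical read names; only the first two fields matter here)
toy_sam = [
    "@SQ\tSN:WT\tLN:900",
    "read1\t0\tWT\t1\t60\t900M\t*\t0\t0\t*\t*",
    "read2\t16\tWT\t1\t60\t900M\t*\t0\t0\t*\t*",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "read4\t2064\tWT\t1\t60\t900M\t*\t0\t0\t*\t*",
]
forward_reads, reverse_reads = split_by_strand(toy_sam)
print(forward_reads, reverse_reads)
```

Only primary alignments are kept here so that each read lands in exactly one group. That is a design choice of this sketch, not a documented usortm behavior.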

Prerequisites

The demux pipeline requires three external tools (see Getting Started for installation):

Tool      Min Version  Purpose
dorado    1.3+         Barcode demultiplexing
minimap2  2.20+        Reference alignment
samtools  1.16+        BAM processing & consensus

usortm auto-discovers dorado in common locations (~/Downloads/dorado-*/bin/, ~/.dorado/bin/). You can also set DORADO_PATH, MINIMAP2_PATH, or SAMTOOLS_PATH environment variables.
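To check ahead of time which binaries are visible on your system, a small pre-flight script along these lines may help. It is a sketch under stated assumptions: the lookup order (environment variable first, then the system PATH) is a guess rather than documented usortm behavior, the function name `resolve_tool` is hypothetical, and this page does not say whether each variable should hold the binary itself or its directory.

```python
import os
import shutil


def resolve_tool(name, env_var, environ=None):
    """Return a location for an external tool, or None if nothing is found.

    Assumed lookup order (not confirmed for usortm):
    1. the environment variable, e.g. DORADO_PATH
    2. the system PATH
    """
    environ = os.environ if environ is None else environ
    override = environ.get(env_var)
    if override:
        return override
    return shutil.which(name)


# Tool name -> environment variable named in the text above
TOOLS = {
    "dorado": "DORADO_PATH",
    "minimap2": "MINIMAP2_PATH",
    "samtools": "SAMTOOLS_PATH",
}

for tool, env_var in TOOLS.items():
    location = resolve_tool(tool, env_var)
    print(f"{tool}: {location or 'NOT FOUND'}")
```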

Input Files

Before running the CLI, ensure you have:

1. Sequencing Data (FASTQ)

Raw sequencing reads in FASTQ format from a long-read source such as Plasmidsaurus or an ONT instrument:

# Plain or gzipped FASTQ
reads.fastq
reads.fastq.gz

2. Library CSV or Reference FASTA

Provide your variant library for reference alignment and variant calling. You can use either format:

Option A: Library CSV (recommended — auto-converted to reference FASTA):

Name,Sequence
K44A,ATGGCTAAAGGTGCAGAACTGTTTACCGGT...
G45A,ATGGCTAAAGGTGAAGCACTGTTTACCGGT...
WT,ATGGCTAAAGGTGAAGAACTGTTTACCGGT...

Option B: Multi-entry reference FASTA (one entry per variant):

>K44A
ATGGCTAAAGGTGCAGAACTGTTTACCGGT...
>G45A
ATGGCTAAAGGTGAAGCACTGTTTACCGGT...
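The two formats carry the same information, so converting Option A into Option B is mechanical. The following is a minimal sketch of such a conversion, not the converter usortm uses internally. The function name `library_csv_to_fasta`, the duplicate-name check, and the shortened toy sequences are all assumptions made for illustration.

```python
import csv
import io


def library_csv_to_fasta(csv_text):
    """Convert a Name,Sequence library CSV into multi-entry FASTA text."""
    records = []
    seen = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row["Name"].strip()
        seq = row["Sequence"].strip().upper()
        if name in seen:  # duplicate names would make alignments ambiguous
            raise ValueError(f"duplicate variant name: {name}")
        seen.add(name)
        records.append(f">{name}\n{seq}")
    return "\n".join(records) + "\n"


# Toy sequences (hypothetical, shortened for illustration)
example_csv = "Name,Sequence\nK44A,ATGGCTGCA\nWT,ATGGCTGAA\n"
fasta_text = library_csv_to_fasta(example_csv)
print(fasta_text)
```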

3. Project Directory

If you created a project with usortm plan, the project directory contains metadata and barcode config:

my_project/
├── usortm_project.json   # Project state and parameters
├── variants.csv          # Variant list
├── mask_config.toml      # Barcode flanking sequences (editable)
├── barcodes/             # Barcode assignments
└── sorting_instructions.md

Running Demultiplexing

Basic Usage

# Using a library CSV (auto-generates reference FASTA)
usortm demux my_project/ --fastq reads.fastq --library-csv variants.csv

Or with a pre-built reference FASTA:

usortm demux my_project/ --fastq reads.fastq --reference reference.fasta

Options

# Custom mask sequences for a different plasmid backbone
usortm demux my_project/ \
  --fastq reads.fastq \
  --library-csv variants.csv \
  --mask-config custom_masks.toml

# Adjust quality thresholds
usortm demux my_project/ \
  --fastq reads.fastq \
  --library-csv variants.csv \
  --min-reads 50 \
  --min-fraction 0.7

# Use multiple CPU cores for faster alignment
usortm demux my_project/ \
  --fastq reads.fastq \
  --library-csv variants.csv \
  --threads 8

See CLI Reference for all available options.

Mask Configuration

The barcode mask (flanking) sequences tell Dorado where to find barcodes within each read. The defaults are set for the cutinase expression vector. If you use a different plasmid backbone, edit mask_config.toml in your project directory (generated by usortm plan) or pass a custom file with --mask-config.

# mask_config.toml
[fbc]
mask1_front = "AATATAAATT"
mask1_rear  = "CTGAGATACCTACAGCGTGAGC"
mask2_front = "CAAGTGAGAAATCACCATGAGTGACG"
mask2_rear  = "ATAATTTATA"

[rbc]
mask1_front = "TATAAATTAT"
mask1_rear  = "CGTCACTCATGGTGATTTCTCACTTG"
mask2_front = "GCTCACGCTGTAGGTATCTCAG"
mask2_rear  = "AATTTATATT"
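In the default values above, each [rbc] mask is the reverse complement of an [fbc] mask (for example, rbc mask2_rear is the reverse complement of fbc mask1_front). Assuming the same relationship should hold for a custom backbone, which this page does not state explicitly, a quick consistency check might look like the sketch below. The function names are hypothetical and the pairing is inferred from the defaults.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def check_mask_symmetry(cfg):
    """Return the (fbc_key, rbc_key) pairs that are NOT reverse complements.

    Pairing is an assumption inferred from the default config.
    """
    pairs = [
        ("mask1_front", "mask2_rear"),
        ("mask1_rear", "mask2_front"),
        ("mask2_front", "mask1_rear"),
        ("mask2_rear", "mask1_front"),
    ]
    return [
        (fbc_key, rbc_key)
        for fbc_key, rbc_key in pairs
        if reverse_complement(cfg["fbc"][fbc_key]) != cfg["rbc"][rbc_key]
    ]


# Default values copied from the mask_config.toml shown above
default_masks = {
    "fbc": {
        "mask1_front": "AATATAAATT",
        "mask1_rear": "CTGAGATACCTACAGCGTGAGC",
        "mask2_front": "CAAGTGAGAAATCACCATGAGTGACG",
        "mask2_rear": "ATAATTTATA",
    },
    "rbc": {
        "mask1_front": "TATAAATTAT",
        "mask1_rear": "CGTCACTCATGGTGATTTCTCACTTG",
        "mask2_front": "GCTCACGCTGTAGGTATCTCAG",
        "mask2_rear": "AATTTATATT",
    },
}
mismatches = check_mask_symmetry(default_masks)
print(mismatches)  # an empty list means every pair is consistent
```

To run the check on a real file, the dictionary can be loaded with the standard-library `tomllib` module (Python 3.11 or later) instead of being typed in by hand.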

Output Files

The demux command writes its output files to the demux_output/ directory inside your project directory:

1. Well Assignments (well_assignments.csv)

plate,well,reads,variant,consensus_fraction
1,A1,487,K44A,0.95
1,A2,312,G45A,0.88
1,B1,156,WT,0.92
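For downstream filtering, the file can be read with any CSV parser. The sketch below keeps wells that meet a minimum read depth and consensus fraction. It assumes that the thresholds are inclusive and that they correspond to the `reads` and `consensus_fraction` columns; the function name `passing_wells` is hypothetical, and the filtering that usortm applies for --min-reads and --min-fraction may differ.

```python
import csv
import io


def passing_wells(csv_text, min_reads=50, min_fraction=0.7):
    """Return rows whose depth and consensus fraction meet both thresholds.

    Assumption: thresholds are inclusive; usortm's own rule may differ.
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        row
        for row in rows
        if int(row["reads"]) >= min_reads
        and float(row["consensus_fraction"]) >= min_fraction
    ]


# Sample rows copied from the example above
example = """plate,well,reads,variant,consensus_fraction
1,A1,487,K44A,0.95
1,A2,312,G45A,0.88
1,B1,156,WT,0.92
"""
strict = passing_wells(example, min_reads=200, min_fraction=0.9)
print([(r["plate"], r["well"], r["variant"]) for r in strict])
```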

2. Demux Summary (demux_summary.json)

Per-stage read counts showing the demultiplexing funnel:

{
  "input_reads": 2400000,
  "aligned_reads": 2100000,
  "demuxed_reads": 1850000,
  "assigned_reads": 1650000,
  "wells_with_data": 725,
  "wells_passing": 680
}
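The funnel is easier to read as retention percentages. This sketch computes, for each stage, the fraction of input reads that remain and the fraction retained from the previous stage. The stage order is taken from the example above; the function name `funnel` is hypothetical.

```python
import json

# Example values copied from the demux_summary.json shown above
summary_json = """
{
  "input_reads": 2400000,
  "aligned_reads": 2100000,
  "demuxed_reads": 1850000,
  "assigned_reads": 1650000,
  "wells_with_data": 725,
  "wells_passing": 680
}
"""


def funnel(summary):
    """Return (stage, count, fraction_of_input, fraction_of_previous) tuples."""
    stages = ["input_reads", "aligned_reads", "demuxed_reads", "assigned_reads"]
    total = summary[stages[0]]
    previous = total
    rows = []
    for stage in stages:
        count = summary[stage]
        rows.append((stage, count, count / total, count / previous))
        previous = count
    return rows


summary = json.loads(summary_json)
for stage, count, of_input, of_previous in funnel(summary):
    print(f"{stage:15s} {count:>9,d}  {of_input:6.1%} of input  {of_previous:6.1%} of previous")
```

With the example numbers, 87.5% of input reads align and 68.75% end up assigned to a well.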

3. Per-Read Table (read_df.csv)

Detailed per-read assignments with barcode and reference information.

4. Per-Well Summary (well_df.csv)

Aggregated well-level statistics including read depth and dominant variant.

Generating Reports

Create summary reports from demux results:

usortm report my_project/

Output formats (use --format to select):

  • HTML — Interactive summary with coverage statistics and read depth
  • CSV — Plate maps, final variant-to-well mapping, missing variants list
  • JSON — Machine-readable report with all metrics

Generating Pick Lists

Create a cherry-picking list for sequence-verified clones:

# Pick one well per unique variant (default)
usortm pick my_project/

# Pick all wells (including duplicates)
usortm pick my_project/ --all-hits

# Custom transfer volume
usortm pick my_project/ --volume 3.0

# Target 96-well plates instead of 384
usortm pick my_project/ --target-format 96

The pick list is ordered to match your input library CSV, so variants appear in the same order you defined them. For each variant, the well with the highest read count is selected.

Output format (semicolon-delimited CSV for Integra ASSIST PLUS):

SampleID;SourcePlateID;SourceWell;TargetPlateID;TargetWell;TransferVolume
K44A;1;K23;0;A1;5.0
G45A;1;A11;0;B1;5.0
T203Y;2;C5;0;C1;5.0
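The selection rule described above (library order, deepest well per variant) can be sketched as follows. This is an illustration rather than the usortm implementation. The column-wise target fill (A1, B1, C1, ...), the target plate ID starting at 0, and the 5.0 volume are inferred from the sample output, and the function name `build_pick_list` is hypothetical.

```python
def build_pick_list(library_order, wells, volume=5.0):
    """Pick the well with the most reads per variant, in library order.

    wells: iterable of dicts with keys plate, well, reads, variant.
    Assumptions: 96-well targets filled column-wise, target plate IDs from 0.
    """
    best = {}
    for w in wells:
        current = best.get(w["variant"])
        if current is None or w["reads"] > current["reads"]:
            best[w["variant"]] = w

    lines = ["SampleID;SourcePlateID;SourceWell;TargetPlateID;TargetWell;TransferVolume"]
    picked = [v for v in library_order if v in best]  # skip variants never seen
    for i, variant in enumerate(picked):
        target_plate, pos = divmod(i, 96)             # start a new plate every 96 picks
        target_well = f"{'ABCDEFGH'[pos % 8]}{pos // 8 + 1}"
        src = best[variant]
        lines.append(
            f"{variant};{src['plate']};{src['well']};{target_plate};{target_well};{volume}"
        )
    return lines


# Toy input (hypothetical wells and read counts)
wells = [
    {"plate": 1, "well": "K23", "reads": 487, "variant": "K44A"},
    {"plate": 1, "well": "A2", "reads": 120, "variant": "K44A"},
    {"plate": 1, "well": "A11", "reads": 312, "variant": "G45A"},
]
pick_lines = build_pick_list(["K44A", "G45A", "WT"], wells)
print("\n".join(pick_lines))
```

Variants with no sequenced well (WT in this toy input) are left out of the list instead of producing an empty row.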

Quality Control

Expected Metrics

Metric                Good  Acceptable  Poor
Well assignment rate  >70%  50-70%      <50%
Wells with data       >70%  50-70%      <50%
Mean reads per well   >50   20-50       <20
Variant recovery      >90%  75-90%      <75%
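If you track these metrics across runs, the thresholds can be encoded once and applied automatically. The sketch below grades a value against the bands listed above. The metric keys and the function name `grade` are hypothetical, and values that fall exactly on a boundary are treated as Acceptable, which is an assumption.

```python
# (good_above, poor_below) per metric, taken from the Expected Metrics table.
# Percent metrics are expressed on a 0-100 scale.
THRESHOLDS = {
    "well_assignment_rate": (70.0, 50.0),
    "wells_with_data": (70.0, 50.0),
    "mean_reads_per_well": (50.0, 20.0),
    "variant_recovery": (90.0, 75.0),
}


def grade(metric, value):
    """Return 'Good', 'Acceptable', or 'Poor' for a metric value."""
    good_above, poor_below = THRESHOLDS[metric]
    if value > good_above:
        return "Good"
    if value < poor_below:
        return "Poor"
    return "Acceptable"  # includes values exactly on a boundary (assumption)


# Hypothetical run metrics
run_metrics = {
    "well_assignment_rate": 68.75,
    "mean_reads_per_well": 12,
    "variant_recovery": 95,
}
for metric, value in run_metrics.items():
    print(f"{metric}: {value} -> {grade(metric, value)}")
```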

Troubleshooting

Problem                     Possible Cause                              Solution
Low assignment rate         Incorrect mask sequences for your backbone  Edit mask_config.toml to match your plasmid flanking regions
Few wells with data         Low sequencing depth, PCR failures          Check sequencing QC, re-run with more cycles
Uneven read distribution    Pooling bias, amplification artifacts       Normalize inputs before pooling, reduce PCR cycles
Low alignment rate          Reference mismatch, wrong library CSV       Verify reference FASTA matches your actual library sequences
Multiple variants per well  Contamination, poor FACS gating             Check single-cell mode, use purity gating
Low variant recovery        Library skew, insufficient oversampling     Increase fold sampling, check transformation scale

Complete Workflow Example

# 1. Plan experiment
usortm plan variants.csv --output my_experiment/

# 2. (Optional) Edit mask_config.toml if using a non-cutinase backbone
# vi my_experiment/mask_config.toml

# 3. Perform wet lab steps (assembly, sorting, barcoding, sequencing)
# ...

# 4. Demultiplex sequencing data
usortm demux my_experiment/ \
  --fastq sequencing_data.fastq.gz \
  --library-csv variants.csv \
  --threads 8

# 5. Generate hit-picking list
usortm pick my_experiment/

# 6. Generate comprehensive report
usortm report my_experiment/

# 7. Review report (open is the macOS command; on Linux use xdg-open)
open my_experiment/report/summary.html

Python API

For programmatic access and custom analysis pipelines, see the Python API documentation.