bcbio-nextgen v1.1.1 Release Notes

Release Date: 2018-11-06
    • single-cell RNA-seq: add built-in support for 10x_v2.
    • Fix UMI support for small RNA. Compatible with Qiagen UMI small RNA protocol.
    • Ignore .Renviron when running Rscript to head off PATH conflicts.
    • Support SRR ids to download samples with bcbio_prepare_samples script.
    • tumor-only prioritization: do not apply LowPriority filter by default, instead annotate with external databases. Use tumoronly_germline_filter to re-enable previous behavior.
    • UMIs: apply default filtering based on de-duplicated read depth. Uses --min-reads 2 with raw de-duplicated coverage of 800 or more or --min-reads 1 otherwise. Allows error correction with UMIs for higher depth samples.
    • gemini: databases no longer created by default. Use tools_on: [gemini] or tools_on: [gemini_orig] to create a database. We now use a reduced database for build 37 to match build 38 and make this forward compatible with CWL.
    • vcfanno: run gemini and somatic annotations by default, producing annotated VCFs with external information.
    • alignment preparation: support a list of split files from multiple sequencing lanes, merging into a single fastq
    • variant: support octopus variant caller for germline and somatic samples.
    • peddy: fix bug where not all files were uploaded on the first pipeline run
    • peddy: For somatic analyses use separate germline calls for tumor/normal, if available, or extracted germline calls from supported callers, instead of somatic variants.
    • GATK: support ploidy specification during joint calling.
    • GATK BQSR: bin qualities into static groups (10, 20, 30) to match GATK4 recommendations. Thanks to Severine Catreux.
    • GATK: support 4.0.10.0 which does not use UCSC 2bit references for Spark tools
    • variant calling: support bcftools 1.9 which is more strict about duplicated key names in INFO and FORMAT.
    • seq2c: Upload global calls, coverage and read_mapping files to project directory.
    • RNA-seq variant calling: Apply annotations after joint calling for GATK to avoid import errors with GenomicsDB. Thanks to Komal Rathi.
    • CWL: add --cwl target to bcbio_nextgen.py upgrade to add and maintain bcbio-vm.
    • CWL: use standard null instead of string "null" for representing None values.
    • CWL: support for heterogeneity and structural variant callers that make use of variant inputs.
    • CWL: support ensemble calling for combining multiple variant callers.
    • ensemble: remove no-ALT ref calls that contribute to incorrect ensemble outputs
    • RNA-seq: output a matrix of un-deduped UMI counts when doing single-cell/DGE for quality control purposes. This is called tagcounts-dupes.mtx in the final directory.
    • single-cell RNA-seq: allow pre-transformed FASTQ files as input to DGE/single-cell pipeline.
    • single-cell RNA-seq: only create one index per specified genome instead of per sample
    • fgbio: backward compatibility for older quality setting --min-consensus-base-quality
    • RNA-seq: fix for fusion_caller getting interpreted as a path, leading to memoization/upload issues.
    • RNA-seq: memoize rRNA quality calculations, speeding up reruns.
    • RNA-seq: prefix description with an X if it starts with a number, for R compatibility. Thanks to Avinash Reddy and Dan Stetson at AstraZeneca.
    • single-cell RNA-seq: respect --positional flag with the new tag counting. Thanks to Babak Alaei at AstraZeneca.
    • RNA-seq: turn on --seqBias flag by default for Salmon as early-version overfitting issues have been fixed.
    • RNA-seq: report insert size from Salmon fragment distribution, not samtools stats.
    • RNA-seq: when processing explant samples, produce a combined tx2gene.csv file from all organisms processed.
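
Two of the rules above are simple enough to sketch in code: the default UMI --min-reads choice (2 at a raw de-duplicated coverage of 800 or more, 1 otherwise) and the X prefix added to RNA-seq descriptions that start with a number. The Python below is an illustrative sketch only. The function names, signatures and the threshold parameter are hypothetical and are not taken from the bcbio-nextgen code base; only the 800 threshold, the --min-reads values and the X prefix come from these notes.

```python
def default_min_reads(dedup_coverage: float, threshold: int = 800) -> int:
    """Return the --min-reads value for a given raw de-duplicated coverage.

    Hypothetical helper: 2 when coverage is at or above the threshold
    (800 in the release notes), otherwise 1.
    """
    return 2 if dedup_coverage >= threshold else 1


def r_safe_description(description: str) -> str:
    """Prefix a sample description with X if it starts with a digit.

    Hypothetical helper mirroring the R-compatibility rule in the notes;
    descriptions that do not start with 0-9 are returned unchanged.
    """
    if description and description[0] in "0123456789":
        return "X" + description
    return description


if __name__ == "__main__":
    for coverage in (250, 800, 1500):
        print(coverage, "-> --min-reads", default_min_reads(coverage))
    for name in ("10x_sample", "sample1"):
        print(name, "->", r_safe_description(name))
```

The coverage comparison is inclusive because the notes say "800 or more".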