bcbio-nextgen/CHANGELOG and bcbio-nextgen Releases

Latest version: 1.2.9 (average release cycle: 65 days)

Changelog History

v1.1.9 Changes
December 06, 2019
- Fix retrieval of the VEP cache.
- Support Picard's new syntax for ReorderSam (REFERENCE -> SEQUENCE_DICTIONARY).
- Remove mitochondrial reads from ChIP/ATAC-seq calling.
- Add documentation describing ATAC-seq outputs.
- Add ENCODE library complexity metrics for ATAC/ChIP-seq to the MultiQC report
  (see https://www.encodeproject.org/data-standards/terms/#library for a description of the metrics).
- Add STAR sample-specific 2-pass. This helps assign a moderate number of reads per gene. Thanks
  to @naumenko-sa for the initial implementation and push to get this going.
- Index transcriptomes only once for pseudo/quasi aligner tools. This fixes possible race
  conditions.
- Add --buildversion option, for tracking which version of a gene build was used. This is used
  during bcbio_setup_genome.py. The suggested format is source_version, e.g. Ensembl_94,
  EnsemblMetazoa_25, FlyBase_26.
- Sort MACS2 bedgraph files before compressing. Thanks to @LMannarino for the suggestion.
- Check for the reserved field 'sample' in RNA-seq metadata and quit with a useful error message.
  Thanks to @marypiper for suggesting this.
- Split ATAC-seq BAM files into nucleosome-free and mono/di/tri nucleosome files, so we can call
  peaks on them separately.
- Call peaks on NF/MN/DN/TN regions separately for each caller during ATAC-seq.
- Allow viral contamination to be assayed on non-tumor/normal samples.
- Ensure EBV coverage is calculated when run on genomes that include it as a contig.
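The ENCODE library complexity metrics have standard definitions:

- NRF (non-redundant fraction) is distinct read positions divided by total reads.
- PBC1 is M1 divided by the number of distinct positions.
- PBC2 is M1/M2, where M1 and M2 are the numbers of genomic positions with exactly one and exactly two reads.

The sketch below computes these ratios. It assumes read start positions have already been extracted from uniquely mapped reads. The function name `library_complexity` and the position tuple layout are hypothetical; this is not bcbio's own implementation.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical position key: (chromosome, 5' start, strand) of a uniquely mapped read.
Position = Tuple[str, int, str]


def library_complexity(positions: Iterable[Position]) -> Dict[str, float]:
    """Compute ENCODE-style NRF, PBC1 and PBC2 from read start positions."""
    counts = Counter(positions)
    total = sum(counts.values())                         # all reads
    distinct = len(counts)                               # distinct positions
    m1 = sum(1 for c in counts.values() if c == 1)       # positions with exactly one read
    m2 = sum(1 for c in counts.values() if c == 2)       # positions with exactly two reads
    return {
        "NRF": distinct / total if total else float("nan"),
        "PBC1": m1 / distinct if distinct else float("nan"),
        "PBC2": m1 / m2 if m2 else float("inf"),
    }
```

For example, five reads where two share a start position give NRF 0.8, PBC1 0.75 and PBC2 3.0.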
v1.1.8 Changes
October 29, 2019
- Add antibody configuration option. Setting a specific antibody for ChIP-seq will use appropriate
  settings for that antibody. See the documentation for supported antibodies.
- Add use_lowfreq_filter for forcing VarDict to report variants with low allelic frequency,
  useful for calling somatic variants in panels with high coverage.
- Fix the check for pre-existing inputs under Python 3.
- Add keep_duplicates option for ChIP/ATAC-seq, which does not remove duplicates before peak calling.
  Defaults to False.
- Add keep_multimappers option for ChIP/ATAC-seq, which does not remove multimappers before peak calling.
  Defaults to False.
- Remove ethnicity as a required column in PED files.
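PED files have six standard whitespace-separated columns: family ID, individual ID, paternal ID, maternal ID, sex and phenotype. The sketch below reads such a file and treats a seventh ethnicity column as optional. That column position is an assumption, and `PedRecord` and `parse_ped` are illustrative names, not bcbio's parser.

```python
from typing import List, NamedTuple, Optional


class PedRecord(NamedTuple):
    family_id: str
    individual_id: str
    paternal_id: str
    maternal_id: str
    sex: str
    phenotype: str
    ethnicity: Optional[str] = None  # assumed optional 7th column


def parse_ped(text: str) -> List[PedRecord]:
    """Parse PED lines; the six standard columns are required, ethnicity is not."""
    records = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split()
        if len(fields) < 6:
            raise ValueError(f"PED line needs at least 6 columns: {line!r}")
        ethnicity = fields[6] if len(fields) > 6 else None
        records.append(PedRecord(*fields[:6], ethnicity))
    return records
```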
v1.1.7 Changes
October 11, 2019
- Hot fix for dataclasses not being supported in Python 3.6. Use namedtuple instead.
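The `dataclasses` module was added to the standard library in Python 3.7, so importing it fails on 3.6. The sketch below shows the swap using a hypothetical `Region` record. A namedtuple gives the same attribute access, but it is immutable, so updates go through `_replace`.

```python
from collections import namedtuple

# Python 3.7+ version (ImportError on 3.6):
#   from dataclasses import dataclass
#   @dataclass
#   class Region:
#       chrom: str
#       start: int
#       end: int

# Python 3.6-compatible replacement with the same field names.
Region = namedtuple("Region", ["chrom", "start", "end"])

r = Region("chr1", 100, 200)
extended = r._replace(end=300)  # returns a new tuple; r itself is unchanged
```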
v1.1.6 Changes
October 10, 2019
- GATK ApplyBQSRSpark: avoid StreamClosed issue with GATK 4.1+
- RNA-seq: fixes for cufflinks preparation due to the Python 3 transition.
- RNA-seq: output count tables from tximport for genes and transcripts. These
  are in bcbioRNASeq/results/date/genes/counts and
  bcbioRNASeq/results/data/transcripts/counts.
- qualimap (RNA-seq): disable stranded mode for qualimap, as it gives incorrect
  results with the hisat2 aligner; for RNA-seq, qualimap now runs unstranded.
- Add quantify_genome_alignments option to use genome alignments to quantify
  with Salmon.
- Add --validateMappings flag to Salmon read quantification mode.
- The VEP cache is no longer installed during a bcbio run.
- Add support for the Salmon SA method when STAR alignments are not available
  (for hg38).
- Add support for the new read model for filtering in Mutect2. This is
  experimental and a little flaky, so it is opt-in via
  tools_on: mutect2_readmodel. Thanks to @lbeltrame for implementing this
  feature and doing a ton of work debugging.
- Swap pandas from_csv call to read_csv.
- Make STAR respect the transcriptome_gtf option.
- Prefix regular expressions with r (raw strings). Thanks to @smoe for finding all of these.
- Add informative logging messages at the beginning of a bcbio run, including the version
  and the configuration files being used.
- Switch from samtools mpileup to bcftools mpileup, as samtools mpileup is being
  deprecated (https://github.com/samtools/samtools/releases/tag/1.9).
- Ensure the locale is set to one supporting UTF-8 bcbio-wide. This may need to be
  reverted if it introduces issues.
- Added hg38 support for STAR. We did this by taking hg38 and removing the alts,
  decoys and HLA sequences.
- Added support for the arriba fusion caller.
- Added back missing programs to the version provenance file. Fixed formatting
  problems introduced by the switch to Python 3.
- Added initial support for whole genome bisulfite sequencing using bismark. Thanks to
  @hackdna for implementing this and @jnhutchinson for drafting the initial
  pipeline. This is a work in progress in collaboration with @gcampanella, who
  has a similar implementation with some extra features that we will be merging
  in soon.
- qualimap for RNA-seq runs on the downsampled BAM files by default. Set
  tools_on: [qualimap_full] to run on the full BAM files.
- Add STAR junction files to the files captured at the end of a run.
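The raw-string change matters because sequences like `\d` or `\.` are not valid string escapes. Python 3.6 and later emit a DeprecationWarning for them (a SyntaxWarning from 3.12), and they only work because the backslash is kept. The sketch below illustrates the fix with a hypothetical version-string pattern.

```python
import re

# r"..." passes the backslashes to the regex engine untouched, so "\d" and "\."
# are never interpreted (or warned about) as string escape sequences.
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def parse_version(text: str) -> tuple:
    """Split a MAJOR.MINOR.PATCH string into a tuple of ints."""
    match = VERSION_RE.match(text)
    if match is None:
        raise ValueError(f"not a version string: {text!r}")
    return tuple(int(part) for part in match.groups())
```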
v1.1.5 Changes
April 12, 2019
- Fixes for Python 3 incompatibilities on distributed IPython runs.
- Fix numerous smaller Python 3 incompatibilities with strings/unicode and types. Thanks to the community for reporting these.
- GATK HaplotypeCaller: correctly apply skipping of marked duplicates only for amplicon runs. Thanks to Ben Liesfeld.
- Fix format detection for bzip2 fastq inputs.
- Support latest GATK4 MuTect2 (4.1.1.0) with changes to ploidy and reference parameters.
- Support changes to GATK4 for VQSR --resource specification in 4.1.1.0. Thanks to Timothee Cezard.
- Support latest bedtools (2.28.0), which expects SAM headers for bgzipped BED inputs.
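Format detection for compressed FASTQ can rely on the leading magic bytes. gzip (and bgzip) files start with `1f 8b`, and bzip2 files start with the ASCII bytes `BZh`. The sketch below is built on that assumption and uses hypothetical helper names; bcbio's own detection logic may differ.

```python
import bz2
import gzip


def compression_from_magic(magic: bytes) -> str:
    """Classify a file from its first bytes."""
    if magic[:2] == b"\x1f\x8b":
        return "gzip"   # also covers bgzip, which is gzip-compatible
    if magic[:3] == b"BZh":
        return "bzip2"
    return "plain"


def detect_compression(path: str) -> str:
    with open(path, "rb") as handle:
        return compression_from_magic(handle.read(3))


def open_fastq(path: str):
    """Open a FASTQ file for text reading regardless of its compression."""
    kind = detect_compression(path)
    if kind == "gzip":
        return gzip.open(path, "rt")
    if kind == "bzip2":
        return bz2.open(path, "rt")
    return open(path, "rt")
```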
v1.1.4 Changes
April 03, 2019
- Move to Python 3.6. A Python 2 environment in the install runs programs that are not Python 3 compatible. The codebase is still compatible with Python 2.7 but will only be run and tested on Python 3 for future releases.
- RNA-seq: fix for race condition when creating the pizzly cache
- RNA-seq: Add Salmon to multiqc report.
- RNA-seq single-cell/DGE: Properly strip transcript versions from GENCODE GTFs.
- RNA-seq: Faster and more flexible rRNA biotype lookup.
- Move to R 3.5.1, including updates to all CRAN and Bioconductor packages.
- tumor-only germline prioritization: provide more useful germline filtering based on prioritization INFO tag (EPR) rather than filter field.
- Install: do not require fabric for tool and data installs, making full codebase compatible with python 3.
- variant: Filter out variants with missing ALT alleles output by GATK4.
- GATK: enable specification of spark specific parameters with gatk-spark resources.
- RNA-seq single-cell/DGE: added demultiplexed option. If set to True, treat the data as if it has already been demultiplexed into cells/wells.
- Multiple orders of magnitude faster templating with thousands of input files.
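GENCODE appends a version suffix to its IDs (for example ENST00000456328.2), which breaks joins against unversioned annotations. The sketch below uses a regex to strip that suffix from the gene_id and transcript_id attributes of a GTF line. It is an illustration, not the code bcbio uses. IDs without a trailing `.N` are left untouched.

```python
import re

# Matches gene_id "ID.N" / transcript_id "ID.N" and captures the ID without ".N".
_VERSIONED_ID = re.compile(r'((?:gene|transcript)_id ")([^"]+?)\.\d+(")')


def strip_gtf_versions(line: str) -> str:
    """Remove trailing .N version suffixes from gene_id and transcript_id values."""
    return _VERSIONED_ID.sub(r"\1\2\3", line)
```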
v1.1.3 Changes
January 29, 2019
- CNV: support background inputs for CNVkit, GATK4 CNV and seq2c. Allows a pre-computed panel of normals for tumor-only or single-sample CNV calling.
- variant: avoid race condition on processing input BED files for variant calling when no pre-specified variant_regions are available.
- structural variation upload: avoid uploading multiple batched calls into sample directories. For lumpy, there is now a single output per batch in a sample folder.
- install: respect pre-specified bioconda and conda-forge channels in condarc configuration. Allows use of custom package mirrors.
- seq2c: move specialized pre-call calculation upstream to coverage estimation. Allows use of seq2c in CWL runs.
- MultiQC upload: fix bug where results from a parallel run were not moved to the final directory.
- GATK4 CNV: fix for standardized VCF output, correcting the number of columns.
- RNA-seq variation: fix for over-filtering variants near splice junctions with STAR.
- Structural variant gene annotation: simplify and handle issues with multidirectional comparisons. Handle issues with out of order start/end from CNVkit.
- Catch and report unicode characters in templating or YAML descriptions.
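Catching unicode in templating or YAML inputs comes down to locating each non-ASCII character so it can be reported with its position. The sketch below does that; `find_non_ascii` is a hypothetical helper, not bcbio's own check.

```python
from typing import List, Tuple


def find_non_ascii(text: str) -> List[Tuple[int, int, str]]:
    """Return (line, column, character) for every non-ASCII character, 1-based."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, char in enumerate(line, start=1):
            if ord(char) > 127:
                hits.append((line_no, col, char))
    return hits
```

A caller could stop with an error that lists these positions, rather than failing later with an encoding traceback.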
v1.1.2 Changes
December 12, 2018
- VarDict low frequency somatic filters: generalize strand and mismatch based filter based on cross-validation to avoid over filtering on high depth panels.
- strelka2 joint calling: switch to improved gvcfgenotyper approach for calling from gVCFs.
- Heterogeneity: initial support for PureCN and TitanCNA heterogeneity analysis, including reporting on LOH in HLA for human samples. Work-in-progress validations: https://github.com/bcbio/bcbio_validations/tree/master/TCGA-heterogeneity
- CNV: initial support for GATK4 CNV calling as an alternative to CNVkit for tumor/normal analyses.
- VarDict RNA-seq variant calling: avoid structural variants with recent vardict-java.
- RNA-seq variation: filter RNA-seq variants close to splice junctions, supporting STAR and hisat2.
- RNA-seq variation: add snpEff effects to output variant calls. Thanks to Manasa Surakala.
- RNA-seq: gzip/bgzip FASTQ files in work/fastq instead of the original directory.
- Use biobambam2 BAM to FASTQ conversion instead of Picard in all cases.
- Trimming: add built-in support for adapters from the SMARTer Universal Low Input RNA Kit (truseq2) and the Illumina NEXTera DNA prep kit from NEB (nextera2).
- ChIP/ATAC-seq: allow skipping duplicate marking.
- joint calling: ensure correct upload to the final directory when no annotations are present.
- Logging: fix logging in parallel runs with the new joblib loky backend. Thanks to Ben Liesfeld and Roland Ewald.
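Filtering variants near splice junctions is an interval lookup: a variant is kept only if no junction boundary lies within a window around it. The sketch below uses binary search over sorted boundary positions per chromosome. The 10 bp window and the input shapes are hypothetical. Junction boundaries could come from, for example, the intron start/end columns of STAR's SJ.out.tab.

```python
import bisect
from typing import Dict, List, Tuple

Variant = Tuple[str, int]  # (chromosome, 1-based position)


def filter_near_junctions(
    variants: List[Variant],
    junctions: Dict[str, List[int]],
    window: int = 10,  # hypothetical distance threshold in bp
) -> List[Variant]:
    """Drop variants that have a junction boundary within `window` bp on the same chromosome."""
    boundaries = {chrom: sorted(sites) for chrom, sites in junctions.items()}
    kept = []
    for chrom, pos in variants:
        sites = boundaries.get(chrom, [])
        # First boundary >= pos - window; it is near the variant if also <= pos + window.
        i = bisect.bisect_left(sites, pos - window)
        if i < len(sites) and sites[i] <= pos + window:
            continue
        kept.append((chrom, pos))
    return kept
```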
v1.1.1 Changes
November 06, 2018
- single-cell RNA-seq: add built-in support for 10x_v2.
- Fix UMI support for small RNA. Compatible with the Qiagen UMI small RNA protocol.
- Ignore .Renviron when running Rscript to head off PATH conflicts.
- Support SRR IDs to download samples with the bcbio_prepare_samples script.
- tumor-only prioritization: do not apply the LowPriority filter by default; instead, annotate with external databases. Use tumoronly_germline_filter to re-enable the previous behavior.
- UMIs: apply default filtering based on de-duplicated read depth. Uses --min-reads 2 with raw de-duplicated coverage of 800 or more, or --min-reads 1 otherwise. Allows error correction with UMIs for higher depth samples.
- gemini: databases are no longer created by default. Use tools_on: [gemini] or tools_on: [gemini_orig] to create a database. We now use a reduced database for build 37 to match build 38 and make this forward compatible with CWL.
- vcfanno: run gemini and somatic annotations by default, producing annotated VCFs with external information.
- alignment preparation: support a list of split files from multiple sequencing lanes, merging them into a single fastq.
- variant: support the octopus variant caller for germline and somatic samples.
- peddy: fix bug where not all files were uploaded on the first pipeline run.
- peddy: for somatic analyses, use separate germline calls for tumor/normal, if available, or extracted germline calls from supported callers, instead of somatic variants.
- GATK: support ploidy specification during joint calling.
- GATK BQSR: bin qualities into static groups (10, 20, 30) to match GATK4 recommendations. Thanks to Severine Catreux.
- GATK: support 4.0.10.0, which does not use UCSC 2bit references for Spark tools.
- variant calling: support bcftools 1.9, which is more strict about duplicated key names in INFO and FORMAT.
- seq2c: upload global calls, coverage and read_mapping files to the project directory.
- RNA-seq variant calling: apply annotations after joint calling for GATK to avoid import errors with GenomicsDB. Thanks to Komal Rathi.
- CWL: add --cwl target to bcbio_nextgen.py upgrade to add and maintain bcbio-vm.
- CWL: use standard null instead of the string "null" for representing None values.
- CWL: support for heterogeneity and structural variant callers that make use of variant inputs.
- CWL: support ensemble calling for combining multiple variant callers.
- ensemble: remove no-ALT ref calls that contribute to incorrect ensemble outputs.
- RNA-seq: output a matrix of un-deduped UMI counts when doing single-cell/DGE for quality control purposes. This is called tagcounts-dupes.mtx in the final directory.
- single-cell RNA-seq: allow pre-transformed FASTQ files as input to DGE/single-cell pipeline.
- single-cell RNA-seq: only create one index per specified genome instead of per sample
- fgbio: back compatibility for older quality setting --min-consensus-base-quality
- RNA-seq: fix for fusion_caller getting interpreted as a path, leading to memoization/upload issues.
- RNA-seq: memoize rRNA quality calculations, speeding up reruns.
- RNA-seq: prefix description with an X if it starts with a number, for R compatibility. Thanks to Avinash Reddy and Dan Stetson at AstraZeneca.
- single-cell RNA-seq: respect --positional flag with the new tag counting. Thanks to Babak Alaei at AstraZeneca.
- RNA-seq: turn on the --seqBias flag by default for Salmon, as early-version overfitting issues have been fixed.
- RNA-seq: report insert size from Salmon fragment distribution, not samtools stats.
- RNA-seq: when processing explant samples, produce a combined tx2gene.csv file from all organisms processed.
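Two rules from this release are simple enough to state as code. The first is the UMI --min-reads choice based on de-duplicated depth. The second is the X prefix for descriptions that start with a number, which mirrors what R's make.names does. The function names below are hypothetical, and how bcbio measures de-duplicated depth is not shown here.

```python
from typing import List


def umi_min_reads_args(dedup_depth: float) -> List[str]:
    """--min-reads 2 at de-duplicated depth >= 800, otherwise --min-reads 1."""
    min_reads = 2 if dedup_depth >= 800 else 1
    return ["--min-reads", str(min_reads)]


def r_safe_description(name: str) -> str:
    """Prefix a name that starts with a digit with X, so R accepts it as an identifier."""
    return "X" + name if name[:1].isdigit() else name
```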
v1.1.0 Changes
July 11, 2018
- Germline calls: rename outputs to samplename-germline to provide easier to understand outputs in final directory.
- Add bcbioRNASeq object creation and automatic quality report generation with tools_on: [bcbiornaseq].
- CWL: support germline/somatic calling for tumor samples.
- CNVkit: improve whole genome runs. Better speed in normalize_sv_coverage through parallelization and avoiding logging. Avoid memory errors in segmentation.
- UMI: upload prepared UMI BAM file (pre-consensus) to the final output directory.
- Add support for bbmap as an aligner.
- RNA-seq variant calling: parallelize GATK HaplotypeCaller over regions to avoid memory and timeout issues.
- Support joint calling with GATK using pre-prepared gVCF inputs.
- RNA-seq variant calling: allow annotation of output variants with vcfanno.
- Support hg38 builds with peddy QC.
- QC: support VerifyBamID2 for contamination detection.
- CWL: adjust defaults for align_split_size and nomap_split_targets to match different parallelization and overhead for these runs.
- CWL: support for the Cromwell runner.
- custom genomes: unzip the GTF file prior to installation.
- Avoid making variant_regions required during processing (by filling with coverage) to differentiate targeted and non-targeted analyses downstream.
- Avoid attempts to download pre-installed S3 genomes, providing better errors with missing genome installs.
- Trimming: add explicit polyg option for removing 3' G stretches in NovaSeq and NextSeq data. Now defaults to no polyG trimming unless turned on.
- ChIP-seq: add RiP calculation for ChIP-seq data.
- DeepVariant and Strelka2 support for customized targeted/genome calling models per region to handle heterogeneous inputs.
- STAR: enable passing custom options for alignment.
- Add tools_off: [coverage_qc] option to skip calculating coverage stats (samtools-stats and picard).
- Add a BAM file for each sample in the small RNA-seq pipeline, and add samtools and qualimap QC metrics to the MultiQC report.
- Allow arbitrary genomes for ChIP-seq. Thanks to @evchambers for pointing out the issue.
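NovaSeq and NextSeq use two-color chemistry, where G is called from the absence of signal. As a result, reads whose signal fades can end in artificial G stretches. The sketch below shows simplified 3' polyG trimming: it removes an exact trailing run of G bases once the run reaches a minimum length. The 10-base threshold is a hypothetical default, and production trimmers also tolerate mismatches within the run.

```python
from typing import Tuple


def trim_polyg(seq: str, qual: str, min_len: int = 10) -> Tuple[str, str]:
    """Trim a trailing run of G bases of at least min_len from a read and its qualities."""
    run = len(seq) - len(seq.rstrip("G"))  # length of the exact 3' G run
    if run >= min_len:
        keep = len(seq) - run
        return seq[:keep], qual[:keep]
    return seq, qual
```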

bcbio-nextgen changelog

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
