Changelog History

  • v1.1.9 Changes

    December 06, 2019
    • Fix for getting the VEP cache.
    • Support Picard's new syntax for ReorderSam (REFERENCE -> SEQUENCE_DICTIONARY).
    • Remove mitochondrial reads from ChIP/ATAC-seq calling.
    • Add documentation describing ATAC-seq outputs.
    • Add ENCODE library complexity metrics for ATAC/ChIP-seq to the MultiQC report
      (see https://www.encodeproject.org/data-standards/terms/#library for a description of the metrics).
    • Add sample-specific 2-pass alignment for STAR. This moderately increases the number of reads
      assigned to genes. Thanks to @naumenko-sa for the initial implementation and push to get this going.
    • Index transcriptomes only once for pseudo/quasi aligner tools. This fixes race conditions that
      could occur.
    • Add --buildversion option for tracking which version of a gene build was used. This is used
      by bcbio_setup_genome.py. Suggested formats are source_version, e.g. Ensembl_94,
      EnsemblMetazoa_25, FlyBase_26.
    • Sort MACS2 bedgraph files before compressing. Thanks to @LMannarino for the suggestion.
    • Check for the reserved field "sample" in RNA-seq metadata and quit with a useful error message.
      Thanks to @marypiper for suggesting this.
    • Split ATAC-seq BAM files into nucleosome-free and mono/di/tri-nucleosome files, so peaks can be
      called on them separately.
    • Call peaks on NF/MN/DN/TN regions separately for each caller during ATAC-seq.
    • Allow viral contamination to be assayed on non-tumor/normal samples.
    • Ensure EBV coverage is calculated when run on genomes with it included as a contig.
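The nucleosome-based split can be pictured as binning each read pair by its fragment (template) length. The sketch below is a minimal Python version. The `classify_fragment` helper is hypothetical. The fragment-length ranges follow common ATAC-seq conventions and are not necessarily the cutoffs bcbio uses.

```python
from typing import Optional

# Hypothetical fragment-length ranges in bp (inclusive). These follow conventions
# common in ATAC-seq tooling and are assumptions, not bcbio's exact cutoffs.
FRAGMENT_CLASSES = {
    "NF": (1, 100),    # nucleosome-free
    "MN": (180, 247),  # mononucleosome
    "DN": (315, 473),  # dinucleosome
    "TN": (558, 615),  # trinucleosome
}


def classify_fragment(template_length: int) -> Optional[str]:
    """Return the nucleosome class for a read pair's template length, or None."""
    size = abs(template_length)  # SAM TLEN is negative for the rightmost mate
    for label, (low, high) in FRAGMENT_CLASSES.items():
        if low <= size <= high:
            return label
    return None  # TLEN 0 (unknown) or a length between classes: not assigned
```

Pairs that fall between the ranges are left unassigned in this sketch. This keeps each split file cleaner at the cost of discarding some reads.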
  • v1.1.8 Changes

    October 29, 2019
    • Add antibody configuration option. Setting a specific antibody for ChIP-seq will use appropriate
      settings for that antibody. See the documentation for supported antibodies.
    • Add use_lowfreq_filter option to force VarDict to report variants with low allelic frequency,
      useful for calling somatic variants in panels with high coverage.
    • Fix for checking for pre-existing inputs with python3.
    • Add keep_duplicates option for ChIP/ATAC-seq, which skips duplicate removal before peak calling.
      Defaults to False.
    • Add keep_multimappers option for ChIP/ATAC-seq, which skips multimapper removal before peak calling.
      Defaults to False.
    • Remove ethnicity as a required column in PED files.
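Now that ethnicity is no longer required, a PED reader only needs the six standard columns: family, individual, father, mother, sex and phenotype. The sketch below is a minimal example under two assumptions: records are whitespace-separated, and a seventh column, if present, holds ethnicity. `parse_ped_line` is a hypothetical helper, not bcbio's parser.

```python
from typing import Dict, Optional

# The six standard PED columns; anything beyond them is optional here.
PED_COLUMNS = ["family_id", "individual_id", "paternal_id",
               "maternal_id", "sex", "phenotype"]


def parse_ped_line(line: str) -> Dict[str, Optional[str]]:
    """Parse one PED record; ethnicity (assumed 7th column) is optional."""
    fields = line.split()
    if len(fields) < len(PED_COLUMNS):
        raise ValueError(f"PED line needs at least {len(PED_COLUMNS)} columns: {line!r}")
    # zip stops at the shorter sequence, so only the first six fields are mapped.
    record: Dict[str, Optional[str]] = dict(zip(PED_COLUMNS, fields))
    # A missing seventh column means "unknown" rather than an error.
    record["ethnicity"] = fields[6] if len(fields) > 6 else None
    return record
```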
  • v1.1.7 Changes

    October 11, 2019
    • Hotfix: dataclasses are not supported in Python 3.6, so use namedtuple instead.
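`dataclasses` entered the standard library in Python 3.7, which is why it fails on 3.6. The sketch below shows the namedtuple swap for a hypothetical `Sample` record. It assigns `__defaults__` because that is the 3.6-compatible way to give a field a default; the `defaults=` argument to `namedtuple` only arrived in 3.7.

```python
from collections import namedtuple

# Python 3.7+ only:
# from dataclasses import dataclass
#
# @dataclass
# class Sample:
#     name: str
#     batch: str = None

# Python 3.6-compatible replacement: an immutable record with the same fields.
Sample = namedtuple("Sample", ["name", "batch"])
# Give the last field (`batch`) a default of None, mirroring the dataclass default.
Sample.__new__.__defaults__ = (None,)
```

Unlike a dataclass, the namedtuple is immutable. Use `_replace` to derive a modified copy.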
  • v1.1.6 Changes

    October 10, 2019
    • GATK ApplyBQSRSpark: avoid StreamClosed issue with GATK 4.1+
    • RNA-seq: fixes for cufflinks preparation following the python3 transition.
    • RNA-seq: output count tables from tximport for genes and transcripts. These
      are in bcbioRNASeq/results/date/genes/counts and
      bcbioRNASeq/results/date/transcripts/counts.
    • qualimap (RNA-seq): disable stranded mode, as it gives incorrect results with
      the hisat2 aligner; qualimap now treats RNA-seq data as unstranded.
    • Add quantify_genome_alignments option to use genome alignments to quantify
      with Salmon.
    • Add --validateMappings flag to Salmon read quantification mode.
    • The VEP cache is no longer installed during a bcbio run.
    • Add support for the Salmon SA method when STAR alignments are not available
      (for hg38).
    • Add support for the new read model for filtering in Mutect2. This is
      experimental and a little flaky, so it can optionally be turned on via
      tools_on: mutect2_readmodel. Thanks to @lbeltrame for implementing this
      feature and doing a ton of work debugging.
    • Replace the deprecated pandas from_csv call with read_csv.
    • Make STAR respect the transcriptome_gtf option.
    • Prefix regular expressions with r (raw strings). Thanks to @smoe for finding all of these.
    • Add informative logging messages at the beginning of a bcbio run, including the version
      and the configuration files being used.
    • Swap samtools mpileup for bcftools mpileup, as samtools mpileup is being
      deprecated (https://github.com/samtools/samtools/releases/tag/1.9).
    • Ensure the locale is set to one supporting UTF-8 bcbio-wide. This may need to be
      reverted if it introduces issues.
    • Added hg38 support for STAR. We did this by taking hg38 and removing the alts,
      decoys and HLA sequences.
    • Added support for the arriba fusion caller.
    • Added back missing programs to the version provenance file. Fixed formatting
      problems introduced by the switch to python3.
    • Added initial support for whole genome bisulfite sequencing using bismark. Thanks to
      @hackdna for implementing this and @jnhutchinson for drafting the initial
      pipeline. This is a work in progress in collaboration with @gcampanella, who
      has a similar implementation with some extra features that we will be merging
      in soon.
    • qualimap for RNA-seq runs on the downsampled BAM files by default. Set
      tools_on: [qualimap_full] to run on the full BAM files.
    • Add STAR junction files to the files captured at the end of a run.
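Prefixing patterns with r matters because sequences such as `\d` are not valid string escapes. Python still passes them through but emits a DeprecationWarning (a SyntaxWarning from 3.12). The sketch below is a minimal raw-string pattern. The lane-tag pattern and the `lane_number` helper are hypothetical illustrations.

```python
import re

# A raw string hands the backslash straight to the regex engine, so "\d"
# means "digit" without relying on Python's lenient handling of bad escapes.
LANE_PATTERN = re.compile(r"_L(\d{3})_")  # hypothetical lane tag, e.g. "_L001_"


def lane_number(filename: str):
    """Return the lane number embedded in a FASTQ filename, or None."""
    match = LANE_PATTERN.search(filename)
    return int(match.group(1)) if match else None
```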
  • v1.1.5 Changes

    April 12, 2019
    • Fixes for Python3 incompatibilities on distributed IPython runs.
    • Numerous smaller Python3 incompatibilities with strings/unicode and types. Thanks to the community for reporting these.
    • GATK HaplotypeCaller: correctly apply skipping of marked duplicates only for amplicon runs. Thanks to Ben Liesfeld.
    • Fix format detection for bzip2 fastq inputs.
    • Support latest GATK4 MuTect2 (4.1.1.0) with changes to ploidy and reference parameters.
    • Support changes to GATK4 for VQSR --resource specification in 4.1.1.0. Thanks to Timothee Cezard.
    • Support latest bedtools (2.28.0), which expects SAM headers for bgzipped BED inputs.
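Robust format detection usually inspects a file's leading magic bytes rather than trusting its extension. Gzip streams start with `1f 8b` and bzip2 streams start with `BZh`. The sketch below uses hypothetical helper names and is not bcbio's implementation.

```python
import bz2
import gzip


def compression_from_magic(magic: bytes) -> str:
    """Classify compression from the first bytes of a file."""
    if magic.startswith(b"\x1f\x8b"):
        return "gzip"
    if magic.startswith(b"BZh"):
        return "bzip2"
    return "plain"


def open_fastq(path: str):
    """Open a FASTQ file for text reading regardless of its compression."""
    with open(path, "rb") as handle:
        kind = compression_from_magic(handle.read(3))
    if kind == "gzip":
        return gzip.open(path, "rt")
    if kind == "bzip2":
        return bz2.open(path, "rt")
    return open(path, "rt")
```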
  • v1.1.4 Changes

    April 03, 2019
    • Move to Python 3.6. A python2 environment in the install runs programs that are not python3 compatible. The codebase is still compatible with python 2.7 but will only be run and tested on python 3 for future releases.
    • RNA-seq: fix for race condition when creating the pizzly cache
    • RNA-seq: Add Salmon to multiqc report.
    • RNA-seq single-cell/DGE: Properly strip transcript versions from GENCODE GTFs.
    • RNA-seq: Faster and more flexible rRNA biotype lookup.
    • Move to R 3.5.1, including updates to all CRAN and Bioconductor packages.
    • tumor-only germline prioritization: provide more useful germline filtering based on the prioritization INFO tag (EPR) rather than the FILTER field.
    • Install: do not require fabric for tool and data installs, making full codebase compatible with python 3.
    • variant: Filter out variants with missing ALT alleles output by GATK4.
    • GATK: enable specification of spark specific parameters with gatk-spark resources.
    • RNA-seq single-cell/DGE: added demultiplexed option. If set to True, treat the data as if it has already been demultiplexed into cells/wells.
    • Multiple orders of magnitude faster templating with thousands of input files.
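GENCODE GTFs append a version to Ensembl identifiers (for example `ENST00000456328.2`), which breaks joins against unversioned IDs. The sketch below shows a minimal stripping step under two assumptions. First, a regular expression is enough for the job. Second, GENCODE's `_PAR_Y` tag on chrY pseudoautosomal copies should be kept so those IDs stay distinct; this is an assumption about the desired behavior.

```python
import re

# Matches a trailing ".<digits>" version, optionally followed by "_PAR_Y".
_VERSION = re.compile(r"\.\d+(?=(?:_PAR_Y)?$)")


def strip_version(identifier: str) -> str:
    """Drop the version suffix from an Ensembl/GENCODE ID (hypothetical helper)."""
    return _VERSION.sub("", identifier)
```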
  • v1.1.3 Changes

    January 29, 2019
    • CNV: support background inputs for CNVkit, GATK4 CNV and seq2c. Allows a pre-computed panel of normals for tumor-only or single sample CNV calling.
    • variant: avoid race condition on processing input BED files for variant calling when no pre-specified variant_regions are available.
    • structural variation upload: avoid uploading multiple batched calls into sample directories. Lumpy now has a single output per batch in a sample folder.
    • install: respect pre-specified bioconda and conda-forge channels in condarc configuration. Allows use of custom package mirrors.
    • seq2c: move specialized pre-call calculation upstream to coverage estimation. Allows use of seq2c in CWL runs.
    • MultiQC upload: fix bug where results from parallel runs were not moved to the final directory.
    • GATK4 CNV: fix standardized VCF output, correcting the number of columns.
    • RNA-seq variation: fix for over-filtering variants near splice junctions with STAR.
    • Structural variant gene annotation: simplify and handle issues with multidirectional comparisons. Handle issues with out of order start/end from CNVkit.
    • Catch and report unicode characters in templating or YAML descriptions.
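Reporting unicode problems up front can be as simple as scanning each user-supplied string for characters outside ASCII and naming their code points. The sketch below is a minimal version that assumes descriptions are validated as plain strings before templating. Both helpers are hypothetical.

```python
from typing import List, Tuple


def find_non_ascii(text: str) -> List[Tuple[int, str]]:
    """Return (position, character) for every non-ASCII character in text."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 127]


def check_description(description: str) -> None:
    """Raise a readable error if a sample description contains non-ASCII text."""
    problems = find_non_ascii(description)
    if problems:
        details = ", ".join(
            f"{ch!r} (U+{ord(ch):04X}) at position {i}" for i, ch in problems
        )
        raise ValueError(f"Non-ASCII characters in description {description!r}: {details}")
```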
  • v1.1.2 Changes

    December 12, 2018
    • VarDict low frequency somatic filters: generalize strand and mismatch based filter based on cross-validation to avoid over filtering on high depth panels.
    • strelka2 joint calling: switch to improved gvcfgenotyper approach for calling from gVCFs.
    • Heterogeneity: initial support for PureCN and TitanCNA heterogeneity analysis including reporting on LOH in HLA for human samples. Work in progress validations: https://github.com/bcbio/bcbio_validations/tree/master/TCGA-heterogeneity
    • CNV: initial support for GATK4 CNV calling as alternative to CNVkit for tumor/normal analyses
    • VarDict RNA-seq variant calling: avoid structural variants with recent vardict-java.
    • RNA-seq variation: filter RNA-seq variants close to splice junctions, supporting STAR and hisat2.
    • RNA-seq variation: add snpEff effects to output variant calls. Thanks to Manasa Surakala.
    • RNA-seq: gzip/bgzip FASTQ files in work/fastq instead of the original directory.
    • Use biobambam2 BAM to FASTQ conversion instead of Picard in all cases.
    • Trimming: add built-in support for adapters from the SMARTer Universal Low Input RNA Kit (truseq2) and the Illumina Nextera DNA prep kit from NEB (nextera2).
    • ChIP/ATAC-seq: allow skipping duplicate marking.
    • joint calling: ensure correct upload to final directory when no annotations present
    • Logging: fix logging in parallel runs with new joblib loky backend. Thanks to Ben Liesfeld and Roland Ewald.
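The splice-junction filter can be thought of as dropping calls that fall within a short window of a junction boundary reported by the aligner (for example STAR's `SJ.out.tab`). The sketch below is a minimal version that assumes boundaries are already loaded as sorted positions per chromosome. The 10 bp window and the helper names are hypothetical, not bcbio's settings.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple


def near_junction(chrom: str, pos: int, junctions: Dict[str, List[int]],
                  window: int = 10) -> bool:
    """True if pos lies within `window` bp of any junction boundary on chrom.

    `junctions` maps chromosome -> sorted list of junction boundary positions.
    """
    boundaries = junctions.get(chrom, [])
    # First boundary at or after pos - window; close enough if it is <= pos + window.
    i = bisect_left(boundaries, pos - window)
    return i < len(boundaries) and boundaries[i] <= pos + window


def filter_variants(variants: List[Tuple[str, int]],
                    junctions: Dict[str, List[int]],
                    window: int = 10) -> List[Tuple[str, int]]:
    """Keep only (chrom, pos) variants that are not near a splice junction."""
    return [(c, p) for c, p in variants if not near_junction(c, p, junctions, window)]
```

Binary search keeps each lookup logarithmic in the number of junctions, which matters when a sample has hundreds of thousands of them.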
  • v1.1.1 Changes

    November 06, 2018
    • single-cell RNA-seq: add built-in support for 10x_v2.
    • Fix UMI support for small RNA. Compatible with Qiagen UMI small RNA protocol.
    • Ignore .Renviron when running Rscript to head off PATH conflicts.
    • Support SRR ids to download samples with bcbio_prepare_samples script.
    • tumor-only prioritization: do not apply LowPriority filter by default, instead annotate with external databases. Use tumoronly_germline_filter to re-enable previous behavior.
    • UMIs: apply default filtering based on de-duplicated read depth. Uses --min-reads 2 with raw de-duplicated coverage of 800 or more or --min-reads 1 otherwise. Allows error correction with UMIs for higher depth samples.
    • gemini: databases no longer created by default. Use tools_on: [gemini] or tools_on: [gemini_orig] to create a database. We now use a reduced database for build 37 to match build 38 and make this forward compatible with CWL.
    • vcfanno: run gemini and somatic annotations by default, producing annotated VCFs with external information.
    • alignment preparation: support a list of split files from multiple sequencing lanes, merging into a single fastq
    • variant: support octopus variant caller for germline and somatic samples.
    • peddy: fix bug where not all files uploaded on first pipeline run
    • peddy: For somatic analyses use separate germline calls for tumor/normal, if available, or extracted germline calls from supported callers, instead of somatic variants.
    • GATK: support ploidy specification during joint calling.
    • GATK BQSR: bin qualities into static groups (10, 20, 30) to match GATK4 recommendations. Thanks to Severine Catreux.
    • GATK: support 4.0.10.0 which does not use UCSC 2bit references for Spark tools
    • variant calling: support bcftools 1.9 which is more strict about duplicated key names in INFO and FORMAT.
    • seq2c: Upload global calls, coverage and read_mapping files to project directory.
    • RNA-seq variant calling: Apply annotations after joint calling for GATK to avoid import errors with GenomicsDB. Thanks to Komal Rathi.
    • CWL: add --cwl target to bcbio_nextgen.py upgrade to add and maintain bcbio-vm.
    • CWL: use standard null instead of string "null" for representing None values.
    • CWL: support for heterogeneity and structural variant callers that make use of variant inputs.
    • CWL: support ensemble calling for combining multiple variant callers.
    • ensemble: remove no-ALT ref calls that contribute to incorrect ensemble outputs
    • RNA-seq: output a matrix of un-deduped UMI counts when doing single-cell/DGE for quality control purposes. This is called tagcounts-dupes.mtx in the final directory.
    • single-cell RNA-seq: allow pre-transformed FASTQ files as input to DGE/single-cell pipeline.
    • single-cell RNA-seq: only create one index per specified genome instead of per sample
    • fgbio: back compatibility for older quality setting --min-consensus-base-quality
    • RNA-seq: fix for fusion_caller getting interpreted as a path, leading to memoization/upload issues.
    • RNA-seq: memoize rRNA quality calculations, speeding up reruns.
    • RNA-seq: prefix description with an X if it starts with a number, for R compatibility. Thanks to Avinash Reddy and Dan Stetson at AstraZeneca.
    • single-cell RNA-seq: respect --positional flag with the new tag counting. Thanks to Babak Alaei at AstraZeneca.
    • RNA-seq: turn on --seqBias flag by default for Salmon as early-version overfitting issues have been fixed.
    • RNA-seq: report insert size from Salmon fragment distribution, not samtools stats.
    • RNA-seq: when processing explant samples, produce a combined tx2gene.csv file from all organisms processed.
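The new UMI default comes down to one depth threshold. The sketch below is a minimal version that assumes de-duplicated depth is available as a single number per sample. The helper names are hypothetical; the 800x cutoff and the `--min-reads` values come from the UMI entry in these v1.1.1 notes.

```python
from typing import List


def umi_min_reads(dedup_depth: float, threshold: float = 800) -> int:
    """Require 2 supporting reads at high de-duplicated depth, else 1."""
    return 2 if dedup_depth >= threshold else 1


def consensus_args(dedup_depth: float) -> List[str]:
    """Build the --min-reads argument fragment for consensus calling."""
    return ["--min-reads", str(umi_min_reads(dedup_depth))]
```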
  • v1.1.0 Changes

    July 11, 2018
    • Germline calls: rename outputs to samplename-germline to provide easier to understand outputs in final directory.
    • Add bcbioRNASeq object creation and automatic quality report generation with tools_on: [bcbiornaseq]
    • CWL: Support germline/somatic calling for tumor samples.
    • CNVkit: improve whole genome runs. Better speed in normalize_sv_coverage through parallelization and avoiding logging. Avoid memory errors in segmentation.
    • UMI: upload prepared UMI bam file (pre-consensus) to final output directory
    • Add support for bbmap as an aligner
    • RNA-seq variant calling: parallelize GATK HaplotypeCaller over regions to avoid memory and timeout issues.
    • Support joint calling with GATK using pre-prepared gVCF inputs.
    • RNA-seq variant calling: allow annotation of output variants with vcfanno
    • Support hg38 builds with peddy QC
    • QC: support VerifyBamID2 for contamination detection
    • CWL: adjust defaults for align_split_size and nomap_split_targets to match different parallelization and overhead for these runs
    • CWL: support for Cromwell runner
    • custom genomes: Unzip GTF file prior to installation.
    • Avoid making variant_regions required during processing (by filling with coverage) to differentiate targeted and non-targeted analyses downstream.
    • Avoid attempts to download pre-installed S3 genomes, providing better errors with missing genome installs.
    • Trimming: add explicit polyg option for removing 3' G stretches in NovaSeq and NextSeq data. Now defaults to no polyG trimming unless turned on.
    • ChIP-seq: add RiP (reads in peaks) calculation for ChIP-seq data.
    • DeepVariant and Strelka2 support for customized targeted/genome calling models per region to handle heterogeneous inputs.
    • STAR: enable passing custom options for alignment.
    • Add tools_off: [coverage_qc] option to skip calculating coverage stats (samtools-stats and picard).
    • Add a BAM file for each sample in the small RNA-seq pipeline, and add samtools and qualimap QC metrics to the MultiQC report.
    • Allow arbitrary genomes for ChIP-seq. Thanks to @evchambers for pointing out the issue.
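On two-color instruments such as NovaSeq and NextSeq, no signal is called as G, so reads can end in artifactual G runs. The sketch below shows minimal 3' polyG trimming under a simple minimum-run rule. The 10-base threshold and the `trim_poly_g` name are hypothetical.

```python
from typing import Tuple


def trim_poly_g(seq: str, qual: str, min_run: int = 10) -> Tuple[str, str]:
    """Trim a trailing run of G bases if it is at least `min_run` long.

    Sequence and quality strings are trimmed together so they stay the same length.
    """
    stripped = seq.rstrip("G")
    run = len(seq) - len(stripped)
    if run >= min_run:
        return stripped, qual[:len(stripped)]
    return seq, qual  # short G runs are likely real sequence; keep the read intact
```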