Changelog History

v1.1.9 Changes
December 06, 2019

- Fix for getting the VEP cache.
- Support Picard's new syntax for ReorderSam (REFERENCE -> SEQUENCE_DICTIONARY).
- Remove mitochondrial reads from ChIP/ATAC-seq calling.
- Add documentation describing ATAC-seq outputs.
- Add ENCODE library complexity metrics for ATAC/ChIP-seq to the MultiQC report (see https://www.encodeproject.org/data-standards/terms/#library for a description of the metrics).
- Add STAR sample-specific 2-pass. This helps assign a moderate number of reads per gene. Thanks to @naumenko-sa for the initial implementation and the push to get this going.
- Index transcriptomes only once for pseudo/quasi aligner tools. This fixes race conditions that could occur.
- Add --buildversion option for tracking which version of a gene build was used. This is used during bcbio_setup_genome.py. Suggested formats are source_version, so Ensembl_94, EnsemblMetazoa_25, FlyBase_26, etc.
- Sort MACS2 bedgraph files before compressing. Thanks to @LMannarino for the suggestion.
- Check for the reserved field "sample" in RNA-seq metadata and quit with a useful error message. Thanks to @marypiper for suggesting this.
- Split ATAC-seq BAM files into nucleosome-free and mono/di/tri-nucleosome files, so we can call peaks on them separately.
- Call peaks on NF/MN/DN/TN regions separately for each caller during ATAC-seq.
- Allow viral contamination to be assayed on non-tumor/normal samples.
- Ensure EBV coverage is calculated when run on genomes with it included as a contig.
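
The nucleosome split described above sorts read pairs by fragment length. Below is a minimal Python sketch of that classification step. It assumes the absolute SAM TLEN value is used as the fragment length. The FRACTION_RANGES cut-offs are illustrative assumptions, loosely based on commonly cited ATAC-seq fragment-size ranges, and are not necessarily the values bcbio uses. classify_fragment is a hypothetical helper.

```python
from typing import Optional

# Hypothetical fragment-length ranges in bp, half-open [low, high).
# Illustrative assumptions only, not bcbio's actual cut-offs.
FRACTION_RANGES = {
    "NF": (1, 100),    # nucleosome-free
    "MN": (180, 247),  # mono-nucleosome
    "DN": (315, 473),  # di-nucleosome
    "TN": (558, 615),  # tri-nucleosome
}


def classify_fragment(template_length: int) -> Optional[str]:
    """Return the nucleosome fraction for a read pair, or None if unassigned.

    SAM TLEN is negative for the rightmost mate, so the absolute value is used.
    A TLEN of 0 means no fragment length is available, so it is never assigned.
    """
    length = abs(template_length)
    for fraction, (low, high) in FRACTION_RANGES.items():
        if low <= length < high:
            return fraction
    # Lengths between ranges (e.g. 150 bp) are left out of every fraction.
    return None


print([classify_fragment(t) for t in [50, -210, 400, 150, 600, 0]])
# ['NF', 'MN', 'DN', None, 'TN', None]
```

In this sketch, fragments that fall between the ranges are dropped from every fraction rather than forced into the nearest one. This keeps each peak-calling input cleaner at the cost of discarding some reads.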

v1.1.8 Changes
October 29, 2019

- Add antibody configuration option. Setting a specific antibody for ChIP-seq will use appropriate settings for that antibody. See the documentation for supported antibodies.
- Add use_lowfreq_filter for forcing vardict to report variants with low allelic frequency, useful for calling somatic variants in panels with high coverage.
- Fix for checking for pre-existing inputs with Python 3.
- Add keep_duplicates option for ChIP/ATAC-seq, which does not remove duplicates before peak calling. Defaults to False.
- Add keep_multimappers option for ChIP/ATAC-seq, which does not remove multimappers before peak calling. Defaults to False.
- Remove ethnicity as a required column in PED files.
- Add
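
Because ethnicity is no longer required, a PED reader only needs the six standard columns: family, individual, father, mother, sex and phenotype. Below is a minimal sketch that assumes whitespace-delimited PED lines and treats an optional seventh column as ethnicity. parse_ped and its field names are hypothetical; this is not bcbio's parser.

```python
from typing import Dict, List, Optional

# The six standard PED columns; anything after them is optional.
REQUIRED_COLUMNS = ["family_id", "individual_id", "paternal_id",
                    "maternal_id", "sex", "phenotype"]


def parse_ped(lines: List[str]) -> List[Dict[str, Optional[str]]]:
    """Parse PED lines, requiring only the six standard columns.

    An optional seventh column is read as ethnicity; when absent it is None.
    Blank lines and lines starting with '#' (comments/headers) are skipped.
    """
    records: List[Dict[str, Optional[str]]] = []
    for line_number, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < len(REQUIRED_COLUMNS):
            raise ValueError(
                f"PED line {line_number}: expected at least "
                f"{len(REQUIRED_COLUMNS)} columns, found {len(fields)}")
        record: Dict[str, Optional[str]] = dict(zip(REQUIRED_COLUMNS, fields))
        record["ethnicity"] = fields[6] if len(fields) > 6 else None
        records.append(record)
    return records
```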

v1.1.7 Changes
October 11, 2019

- Hot fix for dataclasses not being supported in Python 3.6. Use namedtuple instead.
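
The dataclasses module only joined the standard library in Python 3.7, so importing it fails on 3.6. Below is a minimal sketch of the namedtuple swap, using a hypothetical SampleInfo record; the type and its fields are illustrative and are not bcbio's code. The sketch uses typing.NamedTuple because its class syntax accepts field defaults on Python 3.6.1+, whereas the defaults argument to collections.namedtuple only arrived in 3.7.

```python
from typing import NamedTuple, Optional

# Python 3.7+ version that fails to import on 3.6:
#
#   from dataclasses import dataclass
#
#   @dataclass
#   class SampleInfo:
#       name: str
#       bam_file: str
#       batch: Optional[str] = None


class SampleInfo(NamedTuple):
    """Python 3.6-compatible stand-in with the same fields and default."""
    name: str
    bam_file: str
    batch: Optional[str] = None


sample = SampleInfo("tumor1", "tumor1.bam")
# Unlike a (non-frozen) dataclass, namedtuples are immutable, so updates
# create a new instance with _replace instead of assigning attributes.
batched = sample._replace(batch="batch1")
print(sample.batch, batched.batch)  # None batch1
```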

v1.1.6 Changes
October 10, 2019

- GATK ApplyBQSRSpark: avoid StreamClosed issue with GATK 4.1+.
- RNA-seq: fixes for cufflinks preparation due to the Python 3 transition.
- RNA-seq: output count tables from tximport for genes and transcripts. These are in bcbioRNASeq/results/date/genes/counts and bcbioRNASeq/results/data/transcripts/counts.
- qualimap (RNA-seq): disable stranded mode for qualimap, as it gives incorrect results with the hisat2 aligner; for RNA-seq, qualimap is now simply run as unstranded.
- Add quantify_genome_alignments option to use genome alignments to quantify with Salmon.
- Add --validateMappings flag to Salmon read quantification mode.
- The VEP cache is no longer installed during a bcbio run.
- Add support for the Salmon SA method when STAR alignments are not available (for hg38).
- Add support for the new read model for filtering in Mutect2. This is experimental and a little flaky, so it can optionally be turned on via tools_on: mutect2_readmodel. Thanks to @lbeltrame for implementing this feature and doing a ton of work debugging.
- Swap the pandas from_csv call to read_csv.
- Make STAR respect the transcriptome_gtf option.
- Prefix regular expressions with r. Thanks to @smoe for finding all of these.
- Add informative logging messages at the beginning of a bcbio run, including the version and the configuration files being used.
- Swap samtools mpileup for bcftools mpileup, as samtools mpileup is being deprecated (https://github.com/samtools/samtools/releases/tag/1.9).
- Ensure the locale is set to one supporting UTF-8 bcbio-wide. This may need to be reverted if it introduces issues.
- Added hg38 support for STAR. We did this by taking hg38 and removing the alts, decoys and HLA sequences.
- Added support for the arriba fusion caller.
- Added back missing programs to the version provenance file. Fixed formatting problems introduced by the switch to Python 3.
- Added initial support for whole genome bisulfite sequencing using bismark. Thanks to @hackdna for implementing this and @jnhutchinson for drafting the initial pipeline. This is a work in progress in collaboration with @gcampanella, who has a similar implementation with some extra features that we will be merging in soon.
- qualimap for RNA-seq runs on the downsampled BAM files by default. Set tools_on: [qualimap_full] to run on the full BAM files.
- Add STAR junction files to the files captured at the end of a run.
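
Prefixing regular expressions with r passes backslashes through to the re module unchanged. Without the prefix, escapes such as "\d" only work because Python passes unknown escapes through (with a DeprecationWarning since 3.6), and known escapes such as "\b" are silently converted into other characters. The minimal illustration below uses a hypothetical pattern that is not taken from bcbio.

```python
import re

# r"\b" reaches the regex engine as a word-boundary assertion.
SAMPLE_WORD = re.compile(r"\bsample\b")
# Without the r prefix, "\b" is a backspace character (\x08), so this
# same-looking pattern silently never matches ordinary text.
BROKEN_SAMPLE_WORD = re.compile("\bsample\b")


def has_word(text: str, pattern=SAMPLE_WORD) -> bool:
    """Return True if the compiled pattern matches anywhere in text."""
    return pattern.search(text) is not None


print(has_word("tumor sample 1"))                      # True
print(has_word("tumor sample 1", BROKEN_SAMPLE_WORD))  # False
```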

v1.1.5 Changes
April 12, 2019

- Fixes for Python 3 incompatibilities on distributed IPython runs.
- Fixes for numerous smaller Python 3 incompatibilities with strings/unicode and types. Thanks to the community for reporting these.
- GATK HaplotypeCaller: correctly apply skipping of marked duplicates only for amplicon runs. Thanks to Ben Liesfeld.
- Fix format detection for bzip2 FASTQ inputs.
- Support latest GATK4 MuTect2 (4.1.1.0) with changes to ploidy and reference parameters.
- Support changes to GATK4 for VQSR --resource specification in 4.1.1.0. Thanks to Timothee Cezard.
- Support latest bedtools (2.28.0), which expects SAM headers for bgzipped BED inputs.
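
Compression can be detected from a file's contents rather than its extension. Gzip files, including bgzip output, start with the bytes 1f 8b, and bzip2 files start with the ASCII bytes "BZh". Below is a minimal sketch that assumes detection by these leading magic bytes. sniff_compression and open_fastq are hypothetical helpers and are not bcbio's implementation.

```python
import bz2
import gzip

# Leading "magic" bytes; bgzip output is also valid gzip and shares its magic.
GZIP_MAGIC = b"\x1f\x8b"
BZIP2_MAGIC = b"BZh"


def sniff_compression(header: bytes) -> str:
    """Classify the first bytes of a file as 'gzip', 'bzip2' or 'plain'."""
    if header.startswith(GZIP_MAGIC):
        return "gzip"
    if header.startswith(BZIP2_MAGIC):
        return "bzip2"
    return "plain"


def open_fastq(path: str):
    """Open a possibly compressed FASTQ file for text reading, judged by content."""
    with open(path, "rb") as handle:
        kind = sniff_compression(handle.read(3))
    if kind == "gzip":
        return gzip.open(path, "rt")
    if kind == "bzip2":
        return bz2.open(path, "rt")
    return open(path, "rt")
```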

v1.1.4 Changes
April 03, 2019

- Move to Python 3.6. A Python 2 environment in the install runs programs that are not Python 3 compatible. The codebase is still compatible with Python 2.7 but will only be run and tested on Python 3 for future releases.
- RNA-seq: fix for race condition when creating the pizzly cache.
- RNA-seq: add Salmon to the MultiQC report.
- RNA-seq single-cell/DGE: properly strip transcript versions from GENCODE GTFs.
- RNA-seq: faster and more flexible rRNA biotype lookup.
- Move to R 3.5.1, including updates to all CRAN and Bioconductor packages.
- Tumor-only germline prioritization: provide more useful germline filtering based on the prioritization INFO tag (EPR) rather than the FILTER field.
- Install: do not require fabric for tool and data installs, making the full codebase compatible with Python 3.
- variant: filter out variants with missing ALT alleles output by GATK4.
- GATK: enable specification of Spark-specific parameters with gatk-spark resources.
- RNA-seq single-cell/DGE: added demultiplexed option. If set to True, treat the data as if it has already been demultiplexed into cells/wells.
- Templating is multiple orders of magnitude faster with thousands of input files.
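
Transcript and gene IDs in GENCODE GTFs carry a version suffix (for example ENST00000456328.2), and stripping it lets them match unversioned IDs. Below is a minimal sketch that uses a regular expression. strip_version is a hypothetical helper. The sketch keeps the _PAR_Y tag that GENCODE appends to chrY pseudoautosomal copies so those duplicates stay distinct; this is an assumption and not necessarily what bcbio does.

```python
import re

# ".<digits>" at the end of the ID, optionally followed by a "_PAR_Y" tag.
_VERSION_SUFFIX = re.compile(r"\.\d+(_PAR_Y)?$")


def strip_version(feature_id: str) -> str:
    """Drop a trailing version number, keeping any _PAR_Y tag."""
    # An unmatched group 1 is replaced by an empty string (Python 3.5+).
    return _VERSION_SUFFIX.sub(r"\1", feature_id)


print(strip_version("ENST00000456328.2"))        # ENST00000456328
print(strip_version("ENST00000381567.8_PAR_Y"))  # ENST00000381567_PAR_Y
```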

v1.1.3 Changes
January 29, 2019

- CNV: support background inputs for CNVkit, GATK4 CNV and seq2c. Allows a pre-computed panel of normals for tumor-only or single sample CNV calling.
- variant: avoid race condition on processing input BED files for variant calling when no pre-specified variant_regions are available.
- Structural variation upload: avoid uploading multiple batched calls into sample directories. For lumpy, there will now be a single output per batch in a sample folder.
- install: respect pre-specified bioconda and conda-forge channels in condarc configuration. Allows use of custom package mirrors.
- seq2c: move specialized pre-call calculation upstream to coverage estimation. Allows use of seq2c in CWL runs.
- MultiQC upload: fix bug where results from parallel runs were not moved to the final directory.
- GATK4 CNV: fix for standardized VCF output, correcting the number of columns.
- RNA-seq variation: fix for over-filtering variants near splice junctions with STAR.
- Structural variant gene annotation: simplify and handle issues with multidirectional comparisons. Handle issues with out of order start/end from CNVkit.
- Catch and report unicode characters in templating or YAML descriptions.
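
Non-ASCII characters in a sample CSV or YAML description, such as an accented letter or a curly quote pasted from a word processor, are easier to fix when the error points at them. Below is a minimal sketch of such a check that reports the line and column of each character. find_non_ascii and check_ascii are hypothetical helpers and are not bcbio functions.

```python
from typing import List, Tuple


def find_non_ascii(text: str) -> List[Tuple[int, int, str]]:
    """Return (line, column, character) for each non-ASCII character, 1-based."""
    problems = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for column, char in enumerate(line, start=1):
            if ord(char) > 127:
                problems.append((line_number, column, char))
    return problems


def check_ascii(text: str, source: str) -> None:
    """Raise a ValueError naming every non-ASCII character in text."""
    problems = find_non_ascii(text)
    if problems:
        details = "; ".join(
            f"line {line}, column {col}: {char!r} (U+{ord(char):04X})"
            for line, col, char in problems)
        raise ValueError(f"Non-ASCII characters in {source}: {details}")
```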

v1.1.2 Changes
December 12, 2018

- VarDict low frequency somatic filters: generalize the strand- and mismatch-based filter, based on cross-validation, to avoid over-filtering on high depth panels.
- strelka2 joint calling: switch to improved gvcfgenotyper approach for calling from gVCFs.
- Heterogeneity: initial support for PureCN and TitanCNA heterogeneity analysis, including reporting on LOH in HLA for human samples. Work in progress validations: https://github.com/bcbio/bcbio_validations/tree/master/TCGA-heterogeneity
- CNV: initial support for GATK4 CNV calling as an alternative to CNVkit for tumor/normal analyses.
- VarDict RNA-seq variant calling: avoid structural variants with recent vardict-java.
- RNA-seq variation: filter RNA-seq variants close to splice junctions, supporting STAR and hisat2.
- RNA-seq variation: add snpEff effects to output variant calls. Thanks to Manasa Surakala.
- RNA-seq: gzip/bgzip FASTQ files in work/fastq instead of the original directory.
- Use biobambam2 BAM to FASTQ conversion instead of Picard in all cases.
- Trimming: add built-in support for adapters from the SMARTer Universal Low Input RNA Kit (truseq2) and the Illumina NEXTera DNA prep kit from NEB (nextera2).
- ChIP/ATAC-seq: allow skipping duplicate marking.
- Joint calling: ensure correct upload to the final directory when no annotations are present.
- Logging: fix logging in parallel runs with the new joblib loky backend. Thanks to Ben Liesfeld and Roland Ewald.
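
Filtering variants near splice junctions comes down to a distance test against junction coordinates. Below is a minimal sketch that assumes intron start/end positions have already been parsed per chromosome (for instance from STAR's SJ.out.tab) and uses a hypothetical 10 bp window. bcbio's actual cut-off and parsing are not shown, and filter_variants is an illustrative name.

```python
import bisect
from typing import Dict, List, Tuple

Variant = Tuple[str, int]  # (chromosome, 1-based position)


def near_junction(position: int, sites: List[int], window: int) -> bool:
    """True if position is within `window` bp of any site in the sorted list."""
    # First site >= position - window; it is close if also <= position + window.
    index = bisect.bisect_left(sites, position - window)
    return index < len(sites) and sites[index] <= position + window


def filter_variants(variants: List[Variant],
                    junction_sites: Dict[str, List[int]],
                    window: int = 10) -> List[Variant]:
    """Drop variants within `window` bp of a junction site on the same chromosome."""
    sorted_sites = {chrom: sorted(sites) for chrom, sites in junction_sites.items()}
    return [(chrom, pos) for chrom, pos in variants
            if not near_junction(pos, sorted_sites.get(chrom, []), window)]
```

Sorting the sites once and using bisect makes each lookup logarithmic in the number of junctions, which helps with transcriptome-wide junction lists.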

v1.1.1 Changes
November 06, 2018

- single-cell RNA-seq: add built-in support for 10x_v2.
- Fix UMI support for small RNA. Compatible with the Qiagen UMI small RNA protocol.
- Ignore .Renviron when running Rscript to head off PATH conflicts.
- Support SRR ids to download samples with the bcbio_prepare_samples script.
- tumor-only prioritization: do not apply LowPriority filter by default; instead, annotate with external databases. Use tumoronly_germline_filter to re-enable the previous behavior.
- UMIs: apply default filtering based on de-duplicated read depth. Uses --min-reads 2 with raw de-duplicated coverage of 800 or more, or --min-reads 1 otherwise. Allows error correction with UMIs for higher depth samples.
- gemini: databases are no longer created by default. Use tools_on: [gemini] or tools_on: [gemini_orig] to create a database. We now use a reduced database for build 37 to match build 38 and make this forward compatible with CWL.
- vcfanno: run gemini and somatic annotations by default, producing annotated VCFs with external information.
- alignment preparation: support a list of split files from multiple sequencing lanes, merging into a single fastq.
- variant: support octopus variant caller for germline and somatic samples.
- peddy: fix bug where not all files were uploaded on the first pipeline run.
- peddy: for somatic analyses, use separate germline calls for tumor/normal, if available, or extracted germline calls from supported callers, instead of somatic variants.
- GATK: support ploidy specification during joint calling.
- GATK BQSR: bin qualities into static groups (10, 20, 30) to match GATK4 recommendations. Thanks to Severine Catreux.
- GATK: support 4.0.10.0, which does not use UCSC 2bit references for Spark tools.
- variant calling: support bcftools 1.9, which is more strict about duplicated key names in INFO and FORMAT.
- seq2c: upload global calls, coverage and read_mapping files to the project directory.
- RNA-seq variant calling: apply annotations after joint calling for GATK to avoid import errors with GenomicsDB. Thanks to Komal Rathi.
- CWL: add --cwl target to bcbio_nextgen.py upgrade to add and maintain bcbio-vm.
- CWL: use standard null instead of string "null" for representing None values.
- CWL: support for heterogeneity and structural variant callers that make use of variant inputs.
- CWL: support ensemble calling for combining multiple variant callers.
- ensemble: remove no-ALT ref calls that contribute to incorrect ensemble outputs.
- RNA-seq: output a matrix of un-deduped UMI counts when doing single-cell/DGE, for quality control purposes. This is called tagcounts-dupes.mtx in the final directory.
- single-cell RNA-seq: allow pre-transformed FASTQ files as input to the DGE/single-cell pipeline.
- single-cell RNA-seq: only create one index per specified genome instead of per sample.
- fgbio: back compatibility for the older quality setting --min-consensus-base-quality.
- RNA-seq: fix for fusion_caller getting interpreted as a path, leading to memoization/upload issues.
- RNA-seq: memoize rRNA quality calculations, speeding up reruns.
- RNA-seq: prefix the "description" field with an X if it starts with a number, for R compatibility. Thanks to Avinash Reddy and Dan Stetson at AstraZeneca.
- single-cell RNA-seq: respect the --positional flag with the new tag counting. Thanks to Babak Alaei at AstraZeneca.
- RNA-seq: turn on the --seqBias flag by default for Salmon, as early-version overfitting issues have been fixed.
- RNA-seq: report insert size from the Salmon fragment distribution, not samtools stats.
- RNA-seq: when processing explant samples, produce a combined tx2gene.csv file from all organisms processed.
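
The default UMI rule above is a simple threshold on de-duplicated depth. Below is a minimal sketch that encodes it. It assumes the raw de-duplicated coverage has already been estimated elsewhere; how bcbio computes that coverage is not shown. umi_min_reads and min_reads_args are hypothetical helper names.

```python
from typing import List

# Raw de-duplicated coverage at or above which stricter filtering applies.
HIGH_DEPTH_THRESHOLD = 800


def umi_min_reads(dedup_coverage: float) -> int:
    """Return the --min-reads value for a sample's raw de-duplicated coverage."""
    return 2 if dedup_coverage >= HIGH_DEPTH_THRESHOLD else 1


def min_reads_args(dedup_coverage: float) -> List[str]:
    """Build the --min-reads command-line fragment for the UMI consensus step."""
    return ["--min-reads", str(umi_min_reads(dedup_coverage))]


print(min_reads_args(1200))  # ['--min-reads', '2']
print(min_reads_args(350))   # ['--min-reads', '1']
```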

v1.1.0 Changes
July 11, 2018

- Germline calls: rename outputs to samplename-germline to provide easier to understand outputs in the final directory.
- Add bcbioRNASeq object creation and automatic quality report generation with tools_on: [bcbiornaseq].
- CWL: support germline/somatic calling for tumor samples.
- CNVkit: improve whole genome runs. Better speed in normalize_sv_coverage through parallelization and avoiding logging. Avoid memory errors in segmentation.
- UMI: upload the prepared UMI BAM file (pre-consensus) to the final output directory.
- Add support for bbmap as an aligner.
- RNA-seq variant calling: parallelize GATK HaplotypeCaller over regions to avoid memory and timeout issues.
- Support joint calling with GATK using pre-prepared gVCF inputs.
- RNA-seq variant calling: allow annotation of output variants with vcfanno.
- Support hg38 builds with peddy QC.
- QC: support VerifyBamID2 for contamination detection.
- CWL: adjust defaults for align_split_size and nomap_split_targets to match the different parallelization and overhead for these runs.
- CWL: support for the Cromwell runner.
- custom genomes: unzip the GTF file prior to installation.
- Avoid making variant_regions required during processing (by filling with coverage) to differentiate targeted and non-targeted analyses downstream.
- Avoid attempts to download pre-installed S3 genomes, providing better errors with missing genome installs.
- Trimming: add explicit polyg option for removing 3' G stretches in NovaSeq and NextSeq data. Now defaults to no polyG trimming unless turned on.
- ChIP-seq: add RiP calculation for ChIP-seq data.
- DeepVariant and Strelka2 support for customized targeted/genome calling models per region to handle heterogeneous inputs.
- STAR: enable passing custom options for alignment.
- Add tools_off: [coverage_qc] option to skip calculating coverage stats (samtools-stats and picard).
- Add a BAM file for each sample in the small RNA-seq pipeline, and add samtools and qualimap QC metrics to the MultiQC report.
- Allow arbitrary genomes for ChIP-seq. Thanks to @evchambers for pointing out the issue.
- Germline calls: rename outputs to