bcbio-nextgen v0.7.6 Release Notes

Release Date: 2014-01-15
    • Expand template functionality to allow adding sample metadata from an input CSV file. Includes customization of the algorithm section and better matching of samples to input file names. Improves the ability to distinguish fastq pairs.
    • Generalize snpEff database preparation to use individual databases located with each genome. Enables better multi-organism support.
    • Enable tumor/normal paired calling with FreeBayes. Contributed by Luca Beltrame.
    • Provide additional parallelization of bgzip preparation, performing grabix indexing in parallel for paired ends.
    • Fix downsampling with GATK-lite 2.3.9 releases by moving to sambamba-based downsampling. Thanks to Przemek Lyszkiewicz.
    • Handle Illumina format input files for bwa-mem alignment, and cleanly convert these when preparing bgzipped inputs for parallel alignment. Thanks to Miika Ahdesmaki.
    • ๐Ÿ‘ Provide better algorithm for distinguishing bwa-mem and bwa-aln usage. Now does random sampling of first 2 million reads instead of taking the first set of reads which may be non-generalizable. Also lowers requirement to use bwa-mem to 75% of reads being smaller than 70bp. Thanks to Paul Tang.
    • Enable specification of a GATK key file with the keyfile parameter in the bcbio_system resources section, which disables callbacks to GATK tracking. Thanks to Severine Catreux for providing a keyfile to debug with.
    • Correctly handle preparation of pre-aligned BAM files when sorting and coordinate specification are needed. Thanks to Severine Catreux.
    • Fix incorrect quality flag being passed to Tophat. Thanks to Miika Ahdesmaki.
    • Fix Tophat not respecting an existing --transcriptome-index. Thanks to Miika Ahdesmaki.
    • Keep original gzipped fastq files. Thanks again to Miika Ahdesmaki.
    • Fixed an incompatibility between the complexity calculation and IPython.
    • Added strand-specific RNA-seq support via the strandedness option.
    • Added Cufflinks support.
    • Set stranded flag properly in htseq-count. Thanks to Miika Ahdesmaki.
    • Fix to ensure Tophat receives a minimum of 8 GB of memory, regardless of the number of cores.
    • Remove hybrid_bait and hybrid_target, which are no longer used with the new lightweight QC framework. The improved coverage framework is preferred moving forward.
    • Added extra summary information to the project-summary.yaml file so downstream tools can locate which genome resources were used.
    • Added a test_run option to the sample configuration file. Set it to True to run a small subset of your data through the pipeline and confirm that everything is working.
    • ๐Ÿ‘ Fusion support added by setting fusion_mode: True in the algorithim section. Not officially documented for now until we can come up with best practices for it.
    • ๐Ÿ‘ STAR support re-enabled.
    • Fixed an issue with the complexity calculation throwing an exception when there were not enough reads.
    • Add disambiguation stats to the final project-summary.yaml file. Thanks to Miika Ahdesmaki.
    • Remove Estimated Library Size and Complexity from the RNA-seq QC summary information, as they were confusing and unnecessarily alarming, respectively. Thanks to Miika Ahdesmaki and Sara Dempster.
    • Fixed several memory allocation errors that caused jobs to be killed in cluster environments for exceeding their memory limits.
    • Added JVM options to Picard by default so it allocates enough memory for large BAM to FastQ conversions.
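The move to sambamba-based downsampling can be pictured as choosing a subsampling fraction from a target read count and passing it to `sambamba view`. The following is a minimal sketch, not bcbio's code: the helper names, the target count, and the fixed seed are illustrative assumptions, and the `--subsample`/`--subsampling-seed` options should be checked against your installed sambamba version.

```python
def downsample_fraction(total_reads, target_reads):
    """Fraction of reads to keep so that roughly target_reads remain (capped at 1.0)."""
    if total_reads <= 0 or target_reads <= 0:
        raise ValueError("read counts must be positive")
    return min(1.0, target_reads / total_reads)


def sambamba_downsample_cmd(in_bam, out_bam, total_reads, target_reads, seed=42):
    """Build a hypothetical `sambamba view` subsampling command.

    Returns None when the input already has no more than target_reads reads.
    """
    fraction = downsample_fraction(total_reads, target_reads)
    if fraction >= 1.0:
        return None  # nothing to downsample
    return ["sambamba", "view", "-f", "bam",
            "--subsample={:.6f}".format(fraction),
            "--subsampling-seed={}".format(seed),  # fixed seed for reproducible output
            "-o", out_bam, in_bam]
```

Subsampling by fraction keeps approximately, not exactly, `target_reads` reads, because each read (or pair) is kept independently with that probability.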
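Handling Illumina format input for bwa-mem typically means converting the Illumina 1.3+ Phred+64 quality encoding to the Sanger Phred+33 encoding that bwa-mem assumes. The sketch below shows that conversion under the assumption that "Illumina format" refers to Phred+64 qualities (older Solexa-scaled qualities need a different, non-linear mapping). The function names are hypothetical and this is not bcbio's implementation.

```python
def illumina_to_sanger_quality(qual):
    """Convert a Phred+64 quality string to Phred+33 by shifting each score down by 31."""
    out = []
    for ch in qual:
        score = ord(ch) - 64  # decode Phred+64
        if score < 0:
            raise ValueError(f"character {ch!r} is below the Phred+64 range")
        out.append(chr(score + 33))  # re-encode as Phred+33
    return "".join(out)


def convert_fastq_lines(lines):
    """Yield FASTQ lines, converting every fourth (quality) line to Phred+33."""
    for i, line in enumerate(lines):
        if i % 4 == 3:
            yield illumina_to_sanger_quality(line.rstrip("\n")) + "\n"
        else:
            yield line
```

For example, the Phred+64 character `h` (score 40) becomes `I` in Phred+33. A character below `@` raises an error, which is a cheap signal that the input was already Phred+33.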
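The improved choice between bwa-mem and bwa-aln samples reads at random from the first 2 million instead of trusting the first block of the file. A minimal sketch using reservoir sampling (Algorithm R) over read lengths, assuming a sample size of 5,000, a fixed seed, and the reading "use bwa-mem unless more than 75% of sampled reads are shorter than 70bp"; these details and the function names are assumptions, not bcbio's code.

```python
import itertools
import random


def reservoir_sample(items, k, rng):
    """Uniformly sample up to k items from an iterable in one pass (Algorithm R)."""
    sample = []
    for i, item in enumerate(items):
        if i < k:
            sample.append(item)
        else:
            j = rng.randint(0, i)  # inclusive bounds
            if j < k:
                sample[j] = item
    return sample


def can_use_bwa_mem(read_lengths, max_reads=2_000_000, sample_size=5000,
                    min_length=70, short_fraction=0.75, seed=42):
    """Return True unless more than short_fraction of sampled reads are shorter than min_length."""
    rng = random.Random(seed)
    first_reads = itertools.islice(read_lengths, max_reads)  # only look at the first max_reads
    sample = reservoir_sample(first_reads, sample_size, rng)
    if not sample:
        return False  # no reads: fall back to bwa-aln
    short = sum(1 for length in sample if length < min_length)
    return short / len(sample) <= short_fraction
```

Reservoir sampling keeps memory bounded by the sample size while still giving every read in the first 2 million an equal chance of selection, which avoids bias from files whose first reads are unusually short or long.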
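Setting the stranded flag properly in htseq-count amounts to translating the library's strandedness into htseq-count's `--stranded` argument, which accepts `yes`, `no`, or `reverse`. A hedged sketch: the strandedness values `unstranded`, `firststrand`, and `secondstrand` are assumed here for the strandedness option, and the mapping table and function name are hypothetical.

```python
# Hypothetical mapping from strandedness values to htseq-count --stranded values.
HTSEQ_STRANDED = {
    "unstranded": "no",        # reads can come from either strand
    "firststrand": "reverse",  # e.g. dUTP libraries: read 1 is antisense to the transcript
    "secondstrand": "yes",     # read 1 is on the transcript's own strand
}


def htseq_stranded_arg(strandedness):
    """Return the htseq-count --stranded option for a strandedness value."""
    try:
        return "--stranded=" + HTSEQ_STRANDED[strandedness]
    except KeyError:
        valid = ", ".join(sorted(HTSEQ_STRANDED))
        raise ValueError(
            f"unknown strandedness {strandedness!r}; expected one of {valid}") from None
```

Passing the flag explicitly matters because htseq-count defaults to `--stranded=yes`, which silently drops roughly half the counts for unstranded libraries.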
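Guaranteeing Tophat at least 8 GB "regardless of number of cores" matters because cluster schedulers commonly take memory requests per core, so a job with few cores would otherwise receive only a small share. A minimal sketch of that rule, assuming a per-core memory model; the helper name and parameters are hypothetical.

```python
def tophat_memory_per_core(cores, per_core_gb, minimum_total_gb=8.0):
    """Per-core memory request (GB) whose total across all cores is at least minimum_total_gb."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    total_gb = max(per_core_gb * cores, minimum_total_gb)  # enforce the floor on the total
    return total_gb / cores
```

With 1 core and 2 GB per core, the request is raised to 8 GB for that core. With 8 cores at 2 GB each (16 GB total), the floor has no effect.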