My centre is working on an analysis pipeline for whole genome sequencing data. The sequencing and alignment are being performed off-site and my centre will be receiving VCF files for annotation and curation of variants.
There is little to no literature on validating pipelines for WGS. If anyone has any, would you kindly share?
Does anyone have a proposed pipeline for annotation? We will be using Alissa Interpret 5.3 and were thinking of first filtering variants by read depth and then sorting them by variant type (SNV, CNV, and SV). Or would it be better to have two separate annotation pipelines, one for CNVs and one for SNVs?
After the variant type filter, would it then make sense to sort the CNVs by size (above or below 5 kb)?
I was just hoping to bounce some ideas back and forth as this is a first for our centre and we currently do not have access to a bioinformatician.
Thank you for any and all help!
https://github.com/imgag/megSAP - this is how we do it at our clinics.
Have you looked at GATK?
Sarek, a Nextflow pipeline, is quite nice for WGS analysis.
It looks like you are planning to use a commercial tool to annotate the VCF files. If you have no command-line expertise or access to Unix servers, then this may be the way to go. All the tools mentioned in this thread require access to, and some expertise with, the command line.
Since you are not going to do the primary analysis of the data, you do not need to validate that part of the pipeline. There is literature available on pipeline validation (paper1, paper2, etc.).
GDC has a defined DNAseq analysis pipeline. GATK best practices workflows are a good place to start as well.
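If you do get scripting support later, the triage you describe (depth filter, split by variant type, then CNVs by size at 5 kb) can be prototyped outside Alissa in a few lines of Python. This is only a rough sketch under several assumptions that may not hold for your vendor's files: depth is read from the INFO `DP` tag, structural calls carry `SVTYPE` with `SVLEN` or `END`, a CNV is any record whose `SVTYPE` is `DEL`, `DUP` or `CNV`, and the depth cutoff of 10 is a made-up placeholder. The extra `INDEL` bin is also my addition, since small insertions and deletions need to go somewhere.

```python
from collections import defaultdict

MIN_DEPTH = 10                     # hypothetical cutoff, tune for your data
CNV_SIZE_SPLIT = 5_000             # the 5 kb boundary from the question
CNV_TYPES = {"DEL", "DUP", "CNV"}  # assumption: how the caller labels CNVs


def parse_info(info_field):
    """Turn 'DP=30;SVTYPE=DEL;IMPRECISE' into a dict (flags map to '')."""
    info = {}
    for item in info_field.split(";"):
        if not item or item == ".":
            continue
        key, _, value = item.partition("=")
        info[key] = value
    return info


def classify(line, min_depth=MIN_DEPTH):
    """Return a bin label for one tab-separated VCF data line."""
    fields = line.rstrip("\n").split("\t")
    pos, ref, alt, info_field = fields[1], fields[3], fields[4], fields[7]
    info = parse_info(info_field)

    # Depth filter: records without a numeric DP are passed through.
    depth = info.get("DP")
    if depth is not None and depth.isdigit() and int(depth) < min_depth:
        return "LOW_DEPTH"

    svtype = info.get("SVTYPE")
    if svtype in CNV_TYPES:
        if "SVLEN" in info:
            size = abs(int(info["SVLEN"].split(",")[0]))
        else:
            # Fall back to END - POS; a missing END gives size 0.
            size = int(info.get("END", pos)) - int(pos)
        return "CNV_large" if size >= CNV_SIZE_SPLIT else "CNV_small"
    if svtype is not None:
        return "SV"

    # Small variants: SNV only if REF and every ALT allele are one base long.
    if len(ref) == 1 and all(len(a) == 1 for a in alt.split(",")):
        return "SNV"
    return "INDEL"


def triage(lines, min_depth=MIN_DEPTH):
    """Group VCF data lines by bin label, skipping header lines."""
    bins = defaultdict(list)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        bins[classify(line, min_depth)].append(line.rstrip("\n"))
    return bins
```

For an uncompressed file this could be run as `triage(open("sample.vcf"))`, where `sample.vcf` is a placeholder name; bgzipped files would need `gzip.open(path, "rt")` instead. Whether one depth cutoff makes sense for both small variants and CNVs is something to check against the caller's documentation, since many CNV callers do not report `DP` at all.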
You've given me a bit more direction and I'm feeling less lost now. I definitely do not have command-line expertise or access to Unix servers. The purpose of this exercise is to first implement variant annotation in a research setting and then transition it into a diagnostic workflow for rapid WGS for acute care patients.
Really appreciate the feedback!
I can also advertise the tool I wrote during my PhD studies - https://github.com/imgag/ClinCNV . It detects CNVs in clinical settings (resolution of about 1 kb at 30x; one can go to higher resolution with higher coverage, but not finer than 500 bp I'd say, as the files become huge). It is currently in use at maybe 4 hospitals. It may not be the best tool in terms of precision/recall (though it is surely decent), but I got a lot of feedback from clinicians and implemented everything they asked for. Several hundred patients have been diagnosed with it in our clinics alone.
Here is the presentation: https://github.com/imgag/ClinCNV/blob/master/doc/ClinCNV_thesis_presentation.pdf