Question: Whole Genome Analysis Pipeline (Illumina)
7.8 years ago by
Nandini840
Nandini840 wrote:

Hi,

I would like to know which algorithm is feasible for mapping human whole-genome sequences (Illumina), and what general pipeline is followed for variant calling in whole-genome analysis.

Thank you, Nandini

illumina pipeline • 8.9k views
ADD COMMENTlink written 7.8 years ago by Nandini840
7.8 years ago by
Sean Davis26k
National Institutes of Health, Bethesda, MD
Sean Davis26k wrote:

You might take a look here for some ideas:

http://www.broadinstitute.org/gsa/wiki/index.php/Best_Practice_Variant_Detection_with_the_GATK_v3

If you have not done this yourself before, I highly suggest getting a collaborator to work with you on these data.

ADD COMMENTlink written 7.8 years ago by Sean Davis26k

Yes, is there a bioinformatics group where you work? There are many things to consider and it would help a lot if you can discuss this with people who have experience working with next generation sequence data.

ADD REPLYlink written 7.8 years ago by Rubal7770

Thank you for your reply. I have worked with SOLiD whole-genome data before using BioScope, and have now switched to Illumina (each sample was sequenced on a flow cell with 7 or 8 lanes, 2 reads each).
So I was wondering whether BWA -> base quality score recalibration -> local realignment -> MarkDuplicates -> variant calling is a good option.

ADD REPLYlink written 7.8 years ago by Nandini840

Local realignment should probably come after BWA and before marking duplicates and recalibration.
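
A minimal sketch of that ordering constraint, assuming the five stages named in this thread. The stage labels and the `follows_suggestion` helper are hypothetical names used only for illustration; they are not tool invocations. Placing duplicate marking ahead of recalibration in `SUGGESTED_ORDER` is also an assumption, since only the position of local realignment is pinned down above.

```python
# Stage names are illustrative labels, not real command names.
# Assumption: duplicate marking is listed before recalibration; the thread
# only fixes where local realignment goes relative to the other steps.
SUGGESTED_ORDER = [
    "bwa_alignment",
    "local_realignment",
    "mark_duplicates",
    "base_quality_recalibration",
    "variant_calling",
]


def follows_suggestion(stages):
    """Return True if local realignment comes after alignment and
    before both duplicate marking and base quality recalibration."""
    pos = {name: i for i, name in enumerate(stages)}
    return (
        pos["bwa_alignment"] < pos["local_realignment"]
        and pos["local_realignment"] < pos["mark_duplicates"]
        and pos["local_realignment"] < pos["base_quality_recalibration"]
    )


# The order proposed in the question: recalibration precedes realignment.
proposed = [
    "bwa_alignment",
    "base_quality_recalibration",
    "local_realignment",
    "mark_duplicates",
    "variant_calling",
]

print(follows_suggestion(proposed))
print(follows_suggestion(SUGGESTED_ORDER))
```

The check fails for the order proposed in the question because recalibration is placed ahead of local realignment.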

ADD REPLYlink written 7.8 years ago by Sean Davis26k
7.8 years ago by
eonsim100
Belgium
eonsim100 wrote:

The GATK pipeline in the previous post is pretty good, but implementing the whole thing can be a bit painful (and it is CPU/IO intensive). I've been using http://www.realtimegenomics.com a lot for our sequencing project (1200x coverage of the bovine genome), and their pipeline is a lot cleaner, more ergonomic (4 commands: format, map, coverage, snp or cnv) and faster (5-10x on our cluster) than the BWA/GATK pipeline while giving comparable results (both gave 99.6% concordance with SNP chip calls). Their documentation is pretty good. Note that while they are commercial, there is a free license that's suitable for most research and commercial use on a small to medium scale, and they support their software very well.

The output from the RTG pipeline can be fed into GATK as well if you want; you just need to filter the BAMs slightly.
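
For anyone wanting to reproduce a concordance figure like the one quoted above, here is a rough sketch of one way to compute it: the fraction of sites genotyped by both the pipeline and the chip where the unordered genotypes agree. The function name, the dictionary layout, and the toy data are all assumptions made for illustration; this is not how either pipeline reports concordance.

```python
def genotype_concordance(calls, chip):
    """Fraction of sites present in both inputs whose genotypes agree.

    Both arguments map a (chromosome, position) tuple to a pair of
    alleles. Allele order is ignored, so ("A", "G") matches ("G", "A").
    """
    shared = set(calls) & set(chip)
    if not shared:
        raise ValueError("no overlapping sites to compare")
    matches = sum(
        1 for site in shared if sorted(calls[site]) == sorted(chip[site])
    )
    return matches / len(shared)


# Toy data (made up): two sites overlap, one of them agrees.
pipeline_calls = {
    ("chr1", 1000): ("A", "G"),
    ("chr1", 2000): ("C", "C"),
    ("chr2", 500): ("T", "T"),
}
chip_calls = {
    ("chr1", 1000): ("G", "A"),
    ("chr1", 2000): ("C", "T"),
    ("chr3", 42): ("A", "A"),
}

print(genotype_concordance(pipeline_calls, chip_calls))
```

Sites seen by only one of the two sources are excluded from the denominator, which is a design choice worth stating whenever a concordance number is reported.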

ADD COMMENTlink modified 7.8 years ago • written 7.8 years ago by eonsim100
7.7 years ago by
Nashville, TN
eric.torstenson0 wrote:

We just finished up our own automated pipeline which uses BWA, GATK, ANNOVAR and samtools to process FASTQ through to annotated VCF. It was designed for our Illumina human whole-genome data, so it assumes paired-end data ATM, but it might be of use. It can download and compile/install each of the components (except ANNOVAR, for which you'll need to give them your email address to get access) and allows a very high level of control over each of the programs via a single configuration file (which makes it easier to add data later on). It should run on PBS and SGE clusters as well as in serial, and helps ease the hassle of managing all of those jobs.

It's open source, and pretty extendable, but we haven't really put much effort into documenting how to do that just yet:) But, if you have another program that you prefer for variant calls or alignment, you probably can reuse one of the templates to have it use the alternate program. There are instructions on doing just that in the user's guide.

Anyway, if you are interested, have a look at ASAP. If anyone has ideas or questions relating to ASAP, I'd be happy to answer them.

ADD COMMENTlink written 7.7 years ago by eric.torstenson0
7.7 years ago by
reshetovdenis0 wrote:

We've created a pipeline that calls SNPs and SVs. The results are presented to users in Excel tables with effect annotation for each variant. Data about protein function, pathways and diseases is also presented. The pipeline integrates GATK best practice, Pindel, Ensembl variant effect prediction, PolyPhen and SIFT: http://code.google.com/p/ngs-pipeline/

ADD COMMENTlink written 7.7 years ago by reshetovdenis0