Exome-seq: post-mapping question
3.4 years ago
geneart$$ ▴ 50

Hi all,

I have posted the mapping stats for one of my samples below for your information. I am at the post-mapping stage of an exome-seq analysis. I have watched the Broad video on variant calling, gone through the Galaxy tutorial, etc. This is the first time I am doing an exome-seq analysis, and I am stuck on how to proceed!

As you can see, a good number of reads are properly paired against the reference (98.26%), and no duplicates are reported.

My questions are:

1. Should I keep only the properly paired reads and take those forward into indel realignment, BQSR and variant calling? (This way I can be very confident in my variant calls.)


2. Or should I filter my BAMs to also include mapped single mates (where just one mate of the pair is mapped and the other is not)? If so, should I also include reads whose mate maps to a different chromosome, and then proceed to variant calling?

Any pointers on the next steps of the analysis would help. Thanks in advance for your time :)

151293750 + 0 in total (QC-passed reads + QC-failed reads)  
0 + 0 secondary  
201784 + 0 supplementary  
0 + 0 duplicates  
149562179 + 0 mapped (98.86% : N/A)  
151091966 + 0 paired in sequencing  
75545983 + 0 read1  
75545983 + 0 read2  
148458680 + 0 properly paired (98.26% : N/A)  
149256604 + 0 with itself and mate mapped  
103791 + 0 singletons (0.07% : N/A)  
539982 + 0 with mate mapped to a different chr  
423655 + 0 with mate mapped to a different chr (mapQ>=5)
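
For anyone who wants to sanity-check numbers like these, here is a minimal Python sketch that parses the flagstat text above and recomputes the percentages. The helper names (`parse_flagstat`, `percent`) are made up for illustration and are not part of samtools. It assumes the mapped percentage is taken over the total and the paired percentages over "paired in sequencing", which is consistent with the figures shown.

```python
import re

# Counts copied from the flagstat output above (QC-passed column only is used).
FLAGSTAT_TEXT = """\
151293750 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
201784 + 0 supplementary
0 + 0 duplicates
149562179 + 0 mapped (98.86% : N/A)
151091966 + 0 paired in sequencing
75545983 + 0 read1
75545983 + 0 read2
148458680 + 0 properly paired (98.26% : N/A)
149256604 + 0 with itself and mate mapped
103791 + 0 singletons (0.07% : N/A)
539982 + 0 with mate mapped to a different chr
423655 + 0 with mate mapped to a different chr (mapQ>=5)
"""

# "<passed> + <failed> <label>", tolerant of missing spaces around "+".
LINE_RE = re.compile(r"^\s*(\d+)\s*\+\s*(\d+)\s+(.*?)\s*$")
# Trailing "(... N/A)" percentages and the "(QC-passed ...)" note are not part of the label.
SUFFIX_RE = re.compile(r"\s*\((?:[^()]*N/A|QC-passed[^()]*)\)$")


def parse_flagstat(text):
    """Return {label: QC-passed count} for every flagstat line in text."""
    counts = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue
        label = SUFFIX_RE.sub("", match.group(3))
        counts[label] = int(match.group(1))
    return counts


def percent(part, whole):
    """Percentage of part in whole, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


counts = parse_flagstat(FLAGSTAT_TEXT)
mapped_pct = percent(counts["mapped"], counts["in total"])
proper_pct = percent(counts["properly paired"], counts["paired in sequencing"])
singleton_pct = percent(counts["singletons"], counts["paired in sequencing"])
# Reads whose mate is also mapped, but not in the expected orientation/distance.
both_mapped_not_proper = counts["with itself and mate mapped"] - counts["properly paired"]

print(mapped_pct, proper_pct, singleton_pct, both_mapped_not_proper)
```

The last value is the number of reads that would be dropped by a "properly paired only" filter even though both mates are mapped; it includes the reads whose mate maps to a different chromosome.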
variant-calling exome-seq • 563 views

It is better to use the properly paired reads, unless you are looking for a rare (low-frequency) variant.
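
To make "properly paired" concrete: it is bit 0x2 of the SAM FLAG field, which is the bit that a filter such as `samtools view -f 2` requires. The sketch below, in plain Python, decodes FLAG values into the categories discussed in the question. The function names and the toy SAM records are made up for illustration, and secondary/supplementary alignments are not treated specially.

```python
# SAM FLAG bits as defined in the SAM specification.
PAIRED = 0x1          # read is paired in sequencing
PROPER_PAIR = 0x2     # aligner considers the pair properly aligned
UNMAPPED = 0x4        # this read is unmapped
MATE_UNMAPPED = 0x8   # the mate is unmapped


def classify(flag):
    """Put one read into a coarse pairing category based on its FLAG."""
    if flag & UNMAPPED:
        return "unmapped"
    if not flag & PAIRED:
        return "unpaired"
    if flag & MATE_UNMAPPED:
        return "singleton"            # mapped, but its mate is not
    if flag & PROPER_PAIR:
        return "properly_paired"
    return "paired_not_proper"        # e.g. mate on a different chromosome


def keep_properly_paired(sam_lines):
    """Yield header lines and only those records classified as properly paired."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if classify(flag) == "properly_paired":
            yield line


# Toy records (invented, not from the poster's data).
toy_sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t99\tchr1\t100\t60\t10M\t=\t200\t110\tACGTACGTAC\tIIIIIIIIII",
    "r2\t73\tchr1\t300\t60\t10M\t=\t300\t0\tACGTACGTAC\tIIIIIIIIII",
    "r3\t65\tchr1\t500\t60\t10M\tchr2\t900\t0\tACGTACGTAC\tIIIIIIIIII",
]
kept = list(keep_properly_paired(toy_sam))
print(len(kept))
```

Here only the header and `r1` survive; `r2` is a singleton and `r3` has its mate on another chromosome, which are exactly the reads the second option in the question would bring back in.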


Thanks, JC. I will do a first pass using properly paired reads and then come back for a second pass to screen for rare variants.


I would still run a mark-duplicates step; I am not convinced there are absolutely no duplicates. If anyone has thoughts on this, please feel free to comment.


OK, a couple of updates on this since my last post:

  1. The duplicates line only shows a non-zero count once duplicates have been marked, for example with Picard tools or the GATK pipeline (which uses Picard to mark duplicates anyway). After marking, the "0 + 0 duplicates" line changes to show the number of marked duplicates. What I posted here is the bwa mapping output as is, not carried through any mark-duplicates workflow.
  2. GATK has a nice pipeline that takes mapped reads through variant analysis; alternatively, one can start from unmapped reads and go through their whole pipeline.

Just an FYI for anyone who is looking for a follow-up on this post.
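
To illustrate the first point: the duplicates count comes from bit 0x400 of the SAM FLAG, which an aligner's raw output normally leaves unset. A small Python sketch with invented FLAG values (not the poster's data) shows the count going from zero to non-zero once a marking step has set that bit.

```python
DUPLICATE = 0x400  # SAM FLAG bit: "PCR or optical duplicate"


def count_duplicates(flags):
    """Count reads whose FLAG has the duplicate bit set."""
    return sum(1 for flag in flags if flag & DUPLICATE)


# Toy FLAG values for two read pairs straight out of an aligner:
# nothing has set the duplicate bit yet, so the count is zero.
raw_flags = [99, 147, 99, 147]

# The same reads after a hypothetical duplicate-marking step flagged the second pair.
marked_flags = [99, 147, 99 | DUPLICATE, 147 | DUPLICATE]

print(count_duplicates(raw_flags), count_duplicates(marked_flags))
```

So a "0 + 0 duplicates" line on unmarked alignments says nothing about whether duplicates are present; it only says that no tool has flagged them yet.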

