Question

How To Set Filter For Frequency Of Reads And Hapmap Exome Sample Results:

0

Entering edit mode

12.1 years ago

Angel ▴ 220

Hey All,

I only used so far three filters for my whole exome pipeline (aligning to hg19) for a HapMap sample. I tried it on the NA19240 Hapmap sample from paper below (Table 3) which shows ~196 variants (SNPs and INDELs). http://www.nature.com/nmeth/journal/v9/n2/extref/nmeth.1810-S1.pdf

However, using my filters as below I get = ~15000 (just NONSYNONYMOUSCODING alterations). If you add INDELS, it's going to be much higher number. What am I doing wrong?

My list of filters are:

1) vcfutils varFilter -D1000 2) snpEff -minQ 20 -minCoverage 30

Could they have different filters like frequency of variants etc.? If so, how do I set these up? Any help? What are the default parameters for # of reads (minimum) and frequency in bwa,samtools?

Below is my pipeline:

bwa aln hg19.fa S375R1.fastq > S3751.sai
bwa aln hg19.fa S375R2.fastq > S3752.sai
bwa sampe hg19.fa S3751.sai S3752.sai S375R1.fastq S375R2.fastq > S375NoIndexL007.sam
samtools view -bS S375NoIndexL007.sam > S375NoIndexL007.bam
samtools sort S375NoIndexL007.bam S375NoIndexL007.sorted
Marked duplicates using picard
samtools index S375NoIndexL007.marked.bam
samtools mpileup -uf hg19.fa S375NoIndexL007.marked.bam | bcftools view -bvcg - > S375NoIndexL007.raw.bcf
bcftools view S375NoIndexL007.raw.bcf | vcfutils.pl varFilter -D1000 > S375NoIndexL007vard200.flt.vcf

exome filter hapmap snp indel • 3.7k views

ADD COMMENT • link updated 12.1 years ago by Zev.Kronenberg 12k • written 12.1 years ago by Angel ▴ 220

score 1 · Answer 1 · 2012-02-29

1

Entering edit mode

12.1 years ago

Zev.Kronenberg 12k

Filter your bam on mapping quality -q 20. Filter your variant quality using varfilter via samtools. Filter low depth sites using varfilter via samtools.

Realigning INDELS is very important! You should consider doing it if you care about calling INDELS correctly.

ADD COMMENT • link 12.1 years ago by Zev.Kronenberg 12k

0

Entering edit mode

HI Zev, thanks very much for your reply.

Q1 Does it matter if I re-align after marking duplicates. Just to avoid repeating the duplication?

Q2 How do I filter on maping quality. I found this link and I didn't understand th filter in the example. What filter on mapping quality do you recommend?

ADD REPLY • link 12.1 years ago by Angel ▴ 220

0

Entering edit mode

I would remove the duplicates and then do the realignment; it will go faster. Samtools view: '-q INT Skip alignments with MAPQ smaller than INT [0]'. 20 is what we use. This mean those reads that map with a quality below twenty are eliminated.

ADD REPLY • link 12.1 years ago by Zev.Kronenberg 12k

0

Entering edit mode

Thanks very much Zevm will try it. Currently, I am getting a couple of errors with re-alignment but will keep trying.