20% of SNPs/indels lie within 20 bp of another variant
9.2 years ago
dadoudou ▴ 10

I am working on a resequencing project covering 150 populations. After calling SNPs and indels with GATK and filtering them, I found that 20% of the SNPs/indels lie within 20 bp of another variant. I don't know whether this is reasonable. If it is not, what should I do next?
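
For anyone who wants to reproduce this kind of count, here is a minimal Python sketch of how the fraction could be measured. The function names `fraction_clustered` and `read_vcf_positions` are made up for illustration, and the reader assumes an uncompressed, tab-separated VCF; it is not how GATK or any other tool computes it.

```python
from collections import defaultdict


def fraction_clustered(variants, max_dist=20):
    """Fraction of variants with another variant on the same chromosome
    less than max_dist bp away.

    variants: iterable of (chrom, pos) tuples.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in variants:
        by_chrom[chrom].append(pos)

    total = 0
    clustered = 0
    for positions in by_chrom.values():
        positions.sort()
        for i, pos in enumerate(positions):
            total += 1
            # After sorting, only the immediate neighbours need checking.
            near_prev = i > 0 and pos - positions[i - 1] < max_dist
            near_next = (i + 1 < len(positions)
                         and positions[i + 1] - pos < max_dist)
            if near_prev or near_next:
                clustered += 1
    return clustered / total if total else 0.0


def read_vcf_positions(path):
    """Yield (chrom, pos) from an uncompressed VCF, skipping header lines."""
    with open(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            chrom, pos = line.split("\t", 2)[:2]
            yield chrom, int(pos)
```

Something like `fraction_clustered(read_vcf_positions("calls.vcf"))` would then give the overall fraction, where `calls.vcf` is a placeholder file name.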

I also considered using "vcftools --thin" to thin the SNPs/indels, but that seems too simple and crude.
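
To show why distance-based thinning feels crude, here is a rough Python sketch of a greedy thinning pass. It only illustrates the general idea of keeping sites that are at least a minimum distance apart; it is an assumption about the behaviour, not vcftools' actual code, and `thin_sites` is a hypothetical name.

```python
def thin_sites(sites, min_dist=20):
    """Greedy thinning: walk the sorted sites and keep a site only if it is
    at least min_dist bp past the last kept site on the same chromosome.

    sites: iterable of (chrom, pos) tuples.
    """
    kept = []
    last_kept = {}  # chrom -> position of the last kept site
    for chrom, pos in sorted(sites):
        prev = last_kept.get(chrom)
        if prev is None or pos - prev >= min_dist:
            kept.append((chrom, pos))
            last_kept[chrom] = pos
    return kept
```

Note that which site survives depends only on position order. Quality, depth and variant type play no role, so a well-supported variant can be dropped in favour of a weaker one that happens to come first.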

SNP sequencing genome • 2.8k views

Did you try GATK IndelRealigner? It realigns reads around indels to minimize false-positive mismatches that a variant caller could otherwise call as SNPs. Check this link: https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_indels_IndelRealigner.php. A cruder alternative would be to remove SNPs within 10 or 20 bp of indels from the VCF file. I would prefer realigning around the indels first and then calling variants.
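
The cruder alternative, removing SNPs within a fixed distance of indels, could be sketched in Python roughly as follows. This is only an illustration under simplifying assumptions: each record has a single ALT allele, an indel is any record whose REF and ALT differ in length, and distance is measured from the indel's VCF POS without accounting for its length. The function name `drop_snps_near_indels` is made up here.

```python
from bisect import bisect_left
from collections import defaultdict


def drop_snps_near_indels(variants, gap=10):
    """Return variants with SNPs within `gap` bp of an indel removed.

    variants: list of (chrom, pos, ref, alt), one ALT allele per record.
    Indels themselves are always kept.
    """
    def is_indel(ref, alt):
        return len(ref) != len(alt)

    # Collect sorted indel positions per chromosome for binary search.
    indel_pos = defaultdict(list)
    for chrom, pos, ref, alt in variants:
        if is_indel(ref, alt):
            indel_pos[chrom].append(pos)
    for positions in indel_pos.values():
        positions.sort()

    kept = []
    for chrom, pos, ref, alt in variants:
        if not is_indel(ref, alt):
            positions = indel_pos.get(chrom, [])
            # First indel at or after pos - gap; drop if it is <= pos + gap.
            i = bisect_left(positions, pos - gap)
            if i < len(positions) and positions[i] <= pos + gap:
                continue
        kept.append((chrom, pos, ref, alt))
    return kept
```

A dedicated tool will handle multi-allelic records and indel length more carefully, so treat this as a way to see the logic rather than a replacement for one.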


Thank you for your comments. Yes, I had already realigned with GATK IndelRealigner before calling. Your suggestion inspired me, though: I found that bcftools has two parameters, --SnpGap and --IndelGap. What values are recommended for them?


I answered a similar post before. These thresholds are subjective. You can find them here: "what is the properties of filtering the vcf files"


150 populations, but how many samples? With a few thousand samples from diverse populations, an average distance of ~50 bp between variants is expected.
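
As a back-of-the-envelope check, one can ask what fraction of variants would have a neighbour within 20 bp if variants were scattered uniformly at random along the genome. That is a simplifying Poisson assumption of mine, not a claim about real data, where variants tend to cluster more. With mean spacing s and window d, the expected fraction is 1 - exp(-2d/s). The function names below are hypothetical.

```python
import math


def expected_clustered_fraction(mean_spacing, max_dist=20):
    """Expected fraction of variants with a neighbour within max_dist bp,
    assuming variants fall uniformly at random (Poisson process) with the
    given mean spacing in bp."""
    rate = 1.0 / mean_spacing
    # A neighbour can fall on either side, hence the factor of 2.
    return 1.0 - math.exp(-2.0 * rate * max_dist)


def spacing_for_fraction(fraction, max_dist=20):
    """Inverse: the mean spacing (bp) at which the given fraction of
    variants would be expected to have a neighbour within max_dist bp."""
    return -2.0 * max_dist / math.log(1.0 - fraction)
```

Under this assumption, a mean spacing of 50 bp gives roughly 0.55, and an observed fraction of 20% corresponds to a mean spacing of about 180 bp. So 20% is not obviously alarming on its own; comparing it with the actual variant density of the call set would be more informative.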
