Low quality reads causing false heterozygosity in SNP calling?
9.4 years ago
j.a.yearsley ▴ 10

Hello,

I am using a consensus sequence as the reference to detect SNPs in individuals from a population and then build haplotypes. The SNP caller I am using is FreeBayes. It calls multiple SNPs, and I have noticed that at one of them all the individuals are heterozygous. The consensus base at that position is C, and when I look at the mapping in IGV it is clear that a large fraction of the bases are A. All the A bases are of very low quality and fall at the ends of reads. The average ratio is roughly 25:75, A:C, and this is seen in every individual at this locus. The organism is diploid, which makes this very strange. Is this just an artifact, or could it be a real SNP?

The data are paired-end Illumina reads.

Thanks!

SNP Illumina paired-end • 1.9k views
9.4 years ago

You might check whether all the reads supporting one allele map to the plus strand and all the reads supporting the other allele map to the minus strand. Most likely the assembly is wrong, and there's a collapsed repeat or other error at that location.
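
As a rough way to run that strand check without opening IGV, the per-strand counts that FreeBayes writes to the INFO column (`SRF`, `SRR` for the reference allele; `SAF`, `SAR` for the alternate allele) can be compared directly. This is only a sketch: the example VCF line is made up, and the 5% cutoff (`max_minor_fraction`) is an arbitrary assumption, not a recommended value.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'SRF=30;SRR=32' into a dict of strings."""
    pairs = (item.split("=", 1) for item in info_field.split(";") if "=" in item)
    return {key: value for key, value in pairs}


def strand_counts(vcf_line):
    """Return forward/reverse read counts for the ref and first alt allele."""
    fields = vcf_line.rstrip("\n").split("\t")
    info = parse_info(fields[7])  # column 8 of a VCF record is INFO

    def first(key):
        # SAF/SAR hold one value per alt allele; take the first one.
        return int(info[key].split(",")[0])

    return {
        "ref_fwd": first("SRF"), "ref_rev": first("SRR"),
        "alt_fwd": first("SAF"), "alt_rev": first("SAR"),
    }


def alt_is_single_stranded(counts, max_minor_fraction=0.05):
    """True if nearly all alt-supporting reads come from one strand.

    max_minor_fraction is a hypothetical cutoff chosen for illustration.
    """
    alt_total = counts["alt_fwd"] + counts["alt_rev"]
    if alt_total == 0:
        return False
    minor = min(counts["alt_fwd"], counts["alt_rev"])
    return minor / alt_total <= max_minor_fraction


# Made-up record for illustration only.
example_line = "chr1\t1000\t.\tC\tA\t50.0\t.\tSRF=34;SRR=33;SAF=23;SAR=0\tGT\t0/1"
print(alt_is_single_stranded(strand_counts(example_line)))
```

A site where the alternate allele is seen on only one strand while the reference allele is seen on both is a candidate artifact, though a flag from a sketch like this is a prompt to look at the locus, not proof.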


Would it be possible to just ignore this SNP and use the others with the expected base ratio? Some of them have a frequency of around 30:70, and a number of SNPs further down a single read fit the heterozygous 50:50 pattern nicely. The coverage is around 90x.
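
To put a number on how far a 30:70 split at about 90x sits from an ideal 50:50 heterozygote, an exact two-sided binomial test can be sketched as below. It assumes reads are sampled independently with no mapping bias, which real data rarely satisfies, so a small p-value says the site departs from that idealized model, not that the SNP is false. The counts (27 of 90) are just the figures from this post turned into integers.

```python
from math import comb


def binom_two_sided(alt_reads, depth):
    """Exact two-sided binomial p-value against an expected 50:50 allele ratio.

    With p = 0.5 the distribution is symmetric, so the two-sided p-value is
    twice the smaller tail, capped at 1.
    """
    k = min(alt_reads, depth - alt_reads)
    tail = sum(comb(depth, i) for i in range(k + 1)) / 2 ** depth
    return min(1.0, 2 * tail)


# Roughly 30:70 at 90x coverage, as described in the post.
print(binom_two_sided(27, 90))
```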


Mapping typically has some degree of reference bias; the degree depends on the mapper, read length, read quality, genomic repeat content (including pseudogenes), presence of other nearby mutations or misassemblies, and various other factors. As a result, it's not uncommon to see het variants that differ from 50:50. If all individuals in a population appear het at the same locus, that's clearly not real and you should treat it as invalid. Note, though, that because a misassembly is the probable cause, other variant calls nearby or in homologous regions might be off as well.

For other highly biased variants, you could consider trying a different aligner to see if that improves things. Also, it could be worthwhile to try to generate a better assembly or polish the existing one, if that particular locus is important.
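
The "every individual is het" pattern can be screened for across a multi-sample VCF with a few lines of plain Python. This is a minimal sketch that assumes diploid genotypes in the `GT` field and ignores uncalled samples; the example record is invented.

```python
def is_het(genotype):
    """True for a diploid genotype with two different called alleles (0/1, 1|0, ...)."""
    alleles = genotype.replace("|", "/").split("/")
    return len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]


def all_samples_het(vcf_line):
    """True if every sample with a called genotype at this site is heterozygous."""
    fields = vcf_line.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")  # column 9 is FORMAT
    genotypes = [sample.split(":")[gt_index] for sample in fields[9:]]
    called = [gt for gt in genotypes if "." not in gt]  # drop ./. and .
    return len(called) > 0 and all(is_het(gt) for gt in called)


# Made-up record with four samples, all called 0/1.
example_line = "chr1\t1000\t.\tC\tA\t50.0\t.\t.\tGT:DP\t0/1:90\t0/1:88\t0/1:93\t0/1:91"
print(all_samples_het(example_line))
```

For a biallelic site under Hardy-Weinberg proportions the expected heterozygote frequency is 2pq, which is at most 0.5, so the chance that n unrelated individuals are all het is at most 0.5 to the power n. That is why such sites are worth flagging once the sample is more than a handful of individuals.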


It'd be rather crude to filter your SNPs on an arbitrary allele fraction. You could, however, try raising the minimum base and mapping qualities that reads must meet to be used for SNP calling in your variant caller.
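
With FreeBayes, those two thresholds are set with `--min-mapping-quality` and `--min-base-quality`. The sketch below only builds and prints the command line; the file names and the values 30 and 20 are placeholders chosen for illustration, not tuned recommendations.

```python
import shlex


def freebayes_command(reference, bam, min_mapq=30, min_baseq=20):
    """Build a FreeBayes argument list with stricter quality thresholds.

    The default values here are hypothetical examples.
    """
    return [
        "freebayes",
        "--fasta-reference", reference,
        "--min-mapping-quality", str(min_mapq),  # ignore poorly mapped reads
        "--min-base-quality", str(min_baseq),    # ignore low-quality bases
        bam,
    ]


# Hypothetical file names; redirect stdout to a .vcf file when running for real.
cmd = freebayes_command("consensus.fasta", "sample.bam")
print(shlex.join(cmd))
```

Since the A bases in question are described as very low quality, a base-quality threshold alone may already remove most of their support at that site.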
