SNPs and Indels in Homozygous Genomes
12.0 years ago

I have the genomic sequence of an inbred mouse strain, i.e. the genome is homozygous at any given position (the same allele is present on both strands). For variant calling, I have to compare it with the reference genome and select only homozygous SNPs and indels (SNPs and indels that are present on both strands). If I apply this criterion, i.e. require SNPs and indels to be present on both strands, I am afraid that I might lose some real SNPs and indels because of the low coverage of my data (unequal representation of the two strands in the sequencing data). On the other hand, using only one strand may call a lot of false-positive variants because of sequencing errors.

Questions: Is there any variant caller designed for homozygous diploid genomes that identifies homozygous SNPs and indels?
I have the option of using Freebayes, GATK or Samtools. Which one would be better to use in my case?


snp indel • 4.1k views
12.0 years ago
lh3 33k

I keep seeing this question, which I used to answer on maq-help and samtools-help.

The answer is that for a single genome, you should call in diploid mode and then suppress all the heterozygotes afterwards. Most of these heterozygotes are caused by structural variations. If you force the caller to treat such regions as haploid, you will get the calls wrong. The maq paper showed an example: Maq called three clustered heterozygotes for a bacterial strain, and they turned out to come from an extra copy of a tRNA that is absent from the reference genome.
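The "call as diploid, then suppress heterozygotes" step can be sketched as a small post-filter on the caller's VCF output. This is only a minimal illustration, assuming a single-sample VCF whose FORMAT column contains a GT field; the function names `is_homozygous_alt` and `keep_homozygous` are made up for this example and do not come from any caller.

```python
def is_homozygous_alt(gt: str) -> bool:
    """True if every called allele is the same non-reference allele (e.g. 1/1)."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:  # missing call
        return False
    return len(set(alleles)) == 1 and alleles[0] != "0"


def keep_homozygous(vcf_lines):
    """Yield header lines and records whose first sample is homozygous non-reference.

    Assumption: one sample per record, in column 10, with GT listed in FORMAT.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        fmt_keys = fields[8].split(":")
        if "GT" not in fmt_keys:
            continue  # no genotype to judge, drop the record
        sample_values = fields[9].split(":")
        gt = sample_values[fmt_keys.index("GT")]
        if is_homozygous_alt(gt):
            yield line
```

Heterozygous records (0/1, 1/2) and homozygous-reference records are dropped, so the clustered heterozygotes mentioned above are discarded instead of being forced into wrong haploid calls.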

For multi-sample low-coverage calling, it may be preferable to tell the caller the ploidy, since the caller will then gain power on low-coverage data.

11.4 years ago
Erik Garrison ★ 2.4k

As lh3 suggests, you can call as diploid and suppress heterozygotes afterwards.

However, I think that you will generally get better performance if you run a caller (like freebayes) that can model the sample as haploid (or, equivalently, homozygous diploid). To improve your specificity for homozygotes, you can also set the input filters to require a relatively high fraction of alternate allele observations before an allele is considered (see -F --min-alternate-fraction).
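To make the idea of a minimum alternate fraction concrete, here is a rough sketch of the arithmetic behind such a filter. It is not freebayes's implementation; the function names and the 0.8 threshold are assumptions chosen purely for illustration.

```python
def alt_fraction(ref_obs: int, alt_obs: int) -> float:
    """Fraction of observations at a site that support the alternate allele."""
    total = ref_obs + alt_obs
    return alt_obs / total if total else 0.0


def passes_min_alt_fraction(ref_obs: int, alt_obs: int,
                            min_fraction: float = 0.8) -> bool:
    """True if the alternate allele is supported by at least min_fraction of reads.

    min_fraction=0.8 is a hypothetical value for this sketch, not a recommended
    or default setting of any caller.
    """
    return alt_fraction(ref_obs, alt_obs) >= min_fraction
```

With a high threshold, a site with 1 reference and 9 alternate observations would be considered, while a site split 5 to 5 (more typical of a heterozygote or a collapsed duplication) would not.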

