Question: SNPs and Indels in Homozygous Genomes
Asked by Ashutosh Pandey (Philadelphia), 7.4 years ago:

I have the genomic sequence of an inbred mouse strain, i.e. the genome is homozygous at any given position: the same allele is present on both copies of each chromosome. For variant calling, I have to compare it with the reference genome and select only homozygous SNPs and indels (variants present on both chromosome copies). If I apply this criterion, I am afraid that I might lose some real SNPs and indels because of the low coverage of my data (uneven sampling of the alleles in the sequencing reads). On the other hand, accepting variants supported by only one copy may produce a lot of false-positive calls caused by sequencing errors.

Questions: Is there a variant caller designed for homozygous diploid genomes that identifies homozygous SNPs and indels?
I have the option of using Freebayes, GATK or Samtools. Which one would be better in my case?

Thanks.

Answer by lh3 (United States), 7.4 years ago:

I keep seeing this question, which I used to answer on the maq-help and samtools-help mailing lists.

The answer is: for a single genome, you should call in diploid mode and then suppress all heterozygotes afterwards. In such a genome, most heterozygous calls are caused by structural variation, for example an extra copy of a sequence whose reads all map to one locus in the reference. If you force the caller to treat such regions as haploid, it has to pick a single allele from that mixture of reads, and you will get the calls wrong. The maq paper showed an example: maq called three clustered heterozygotes in a bacterial strain, which turned out to be an extra copy of a tRNA absent from the reference genome.
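
A minimal sketch of the "call diploid, then suppress heterozygotes" step, assuming a single-sample VCF with a GT key in FORMAT. Instead of silently discarding the heterozygotes, it also groups them into runs, since clustered heterozygotes (as in the maq example) may point to an unplaced duplication. The function names and the window/cluster-size thresholds are hypothetical illustrations, not part of maq, samtools or any other tool.

```python
# Sketch: post-process a single-sample VCF that was called in diploid mode.
# Assumptions (illustrative only): one sample column, a GT key in FORMAT,
# and hypothetical clustering thresholds (window, min_cluster).
from typing import Iterable, List, Tuple

Site = Tuple[str, int]


def genotype(line: str) -> Tuple[str, int, List[str]]:
    """Return (chrom, pos, alleles) from the first sample column of a VCF data line."""
    fields = line.rstrip("\n").split("\t")
    fmt = dict(zip(fields[8].split(":"), fields[9].split(":")))
    alleles = fmt.get("GT", ".").replace("|", "/").split("/")
    return fields[0], int(fields[1]), alleles


def split_calls(lines: Iterable[str], window: int = 1000,
                min_cluster: int = 3) -> Tuple[List[str], List[List[Site]]]:
    """Keep homozygous non-reference calls; group heterozygous calls into clusters.

    A cluster is a run of heterozygous sites on one chromosome where each site
    lies within `window` bp of the previous one; runs shorter than
    `min_cluster` sites are not reported.
    """
    kept: List[str] = []
    hets: List[Site] = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, alleles = genotype(line)
        if "." in alleles:
            continue  # missing genotype
        if len(set(alleles)) > 1:
            hets.append((chrom, pos))  # suppressed, but remembered
        elif alleles[0] != "0":
            kept.append(line)  # homozygous for a non-reference allele
    clusters: List[List[Site]] = []
    run: List[Site] = []
    for chrom, pos in sorted(hets):
        if run and (chrom != run[-1][0] or pos - run[-1][1] > window):
            if len(run) >= min_cluster:
                clusters.append(run)
            run = []
        run.append((chrom, pos))
    if len(run) >= min_cluster:
        clusters.append(run)
    return kept, clusters
```

For example, a file with one 1/1 call and three 0/1 calls within a few hundred bp of each other yields one kept line and one reported cluster. Listing the clusters rather than just dropping them leaves room to inspect those regions for copy-number differences from the reference.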

For multi-sample, low-coverage calling, it may be preferable to inform the caller of the ploidy; the caller then gains power on low-coverage data.
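
To see where that extra power comes from, here is a toy genotype-likelihood calculation for one sample at a site covered by only two reads, both showing the alternate allele. It assumes a deliberately simplified model (flat prior over genotypes, independent reads, a single per-read error rate); real callers use richer priors and error models, so the numbers only illustrate the direction of the effect.

```python
# Sketch: how knowing the ploidy sharpens a low-coverage genotype call.
# Simplified model (an assumption, not any caller's exact model): flat prior
# over genotypes, independent reads, a single per-read error rate `err`.
from typing import List


def genotype_posteriors(n_ref: int, n_alt: int, ploidy: int,
                        err: float = 0.01) -> List[float]:
    """Posterior for genotypes carrying 0..ploidy copies of the alt allele."""
    likelihoods = []
    for k in range(ploidy + 1):
        frac = k / ploidy  # fraction of chromosome copies carrying alt
        p_alt = frac * (1 - err) + (1 - frac) * err  # chance a read shows alt
        likelihoods.append(p_alt ** n_alt * (1 - p_alt) ** n_ref)
    total = sum(likelihoods)
    return [lk / total for lk in likelihoods]


if __name__ == "__main__":
    # Two reads, both showing the alternate allele.
    diploid = genotype_posteriors(0, 2, ploidy=2)
    haploid = genotype_posteriors(0, 2, ploidy=1)
    print(round(diploid[-1], 3))  # P(hom-alt | diploid model) -> 0.797
    print(round(haploid[-1], 4))  # P(alt | haploid model)     -> 0.9999
```

Under the diploid model roughly a fifth of the posterior mass stays on the heterozygous genotype, because two reads cannot rule it out; once the caller is told the sample has no heterozygotes, the same two reads support the alternate allele almost unambiguously.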

Answer by Erik Garrison (Napoli, IT / UCSC), 6.8 years ago:

As lh3 suggests, you can call as diploid and suppress heterozygotes afterwards.

However, I think that you will generally get better performance if you run a caller (like freebayes) that can model the sample as haploid (or, equivalently, as a homozygous diploid); in freebayes this is -p 1 / --ploidy 1. To improve your specificity for homozygotes, you can also set the input filters to require a relatively high fraction of alternate-allele observations before an allele is considered (see -F / --min-alternate-fraction).
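
On the command line this might look like freebayes -f ref.fa -p 1 -F 0.8 sample.bam > calls.vcf, where the file names are placeholders and 0.8 is an arbitrary illustrative threshold. If you already have calls, a rough post-hoc analogue is to check the RO/AO observation counts that freebayes writes to the INFO column. The sketch below assumes those fields are present and numeric; it is not how -F works internally (that filter acts on observations before alleles are considered), only an approximation on finished calls.

```python
# Sketch: post-hoc approximation of freebayes' -F/--min-alternate-fraction,
# computed from the RO (reference observations) and AO (alternate
# observations, comma-separated per allele) INFO fields.
# Function names and the default threshold are hypothetical.
from typing import Dict, Iterable, Iterator


def info_dict(info: str) -> Dict[str, str]:
    """Parse a VCF INFO column (KEY=VALUE;FLAG;...) into a dict, ignoring flags."""
    return dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)


def max_alt_fraction(record: str) -> float:
    """Share of RO + AO observations supporting the best alternate allele."""
    info = info_dict(record.rstrip("\n").split("\t")[7])
    ref_obs = int(info.get("RO", "0"))
    alt_obs = [int(x) for x in info.get("AO", "0").split(",")]
    total = ref_obs + sum(alt_obs)
    return max(alt_obs) / total if total else 0.0


def keep_high_fraction(records: Iterable[str],
                       min_fraction: float = 0.8) -> Iterator[str]:
    """Yield VCF data lines whose best alternate allele reaches `min_fraction`."""
    for rec in records:
        if not rec.startswith("#") and max_alt_fraction(rec) >= min_fraction:
            yield rec
```

Using the largest single alternate allele, rather than the sum over all alternates, means a multiallelic site split between two alternate alleles is rejected, which matches the expectation of one allele per site in an inbred genome.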

Powered by Biostar version 2.3.0