number of heterozygous and homozygous SNPs across lineages
0
0
Entering edit mode
3.6 years ago

Hello,

I have mapped several Illumina paired-end libraries on the reference genome of a protist (unicellular eukaryote; 100Mb genome with few repeats but very low AT, high ploidie) using Bwa-mem. All libraries correspond to different lineages of the species of the reference genome.

After calling heterozygous and homozygous SNPs using samtools (only primary alignments: -F 260) and bcftools (only good quality alignments: -q 30), I obtained the following results:

  • lineages nearly identical to the ref genome: 1K homo SNPs + 1K hetero SNPs
  • lineages a bit more distant from the ref genome: 0.5M homo SNPs + 10K hetero SNPs
  • lineages very distant from the ref genome: 1M homo SNPs + 50K hetero SNPs

nb: a SNP was called if the number of alternate reads (DP4) was at least 5 reads, and was called 'heterozygous' when the number of reference reads was below 5.

The fact that I observe more homozygous SNPs in distant lineages than in lineages close to the reference genome makes perfectly sense to me. But why the number of heterozygous SNPs also increases with the phylogenetic/evolutionary distance ? I would expect it to be constant across lineages. Does it suggest that there is a problem with my analyses ?

many thanks if you could help me making sense of these results.

SNP bwa samtools bcftools • 644 views
ADD COMMENT
0
Entering edit mode

I am not sure if your system to detect homozygous and heterozygous SNPs is common (using a five read threshold) but perhaps you want to try using GATK to call variants instead?

ADD REPLY
0
Entering edit mode

thanks Samuel. I'll give GATK or Freibades a try. But my problem will likely persist with these programs.

ADD REPLY

Login before adding your answer.

Traffic: 2751 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6