Calculating within-population nucleotide diversity for a virus population
8.9 years ago
vjmorley ▴ 30

I'm interested in using my next-gen sequencing data to calculate within-population nucleotide diversity in RNA virus populations. My samples were sequenced on an Illumina HiSeq, and each sample represents a diverse virus population. I have come across several packages for calculating nucleotide diversity, but they all seem to assume that each sample represents one individual rather than a population of individuals. Can anyone recommend software to use? It would be great if I could input a .bam or .mpileup file and get back diversity statistics for the population.

population genetics next-gen metagenomics • 4.4k views
8.9 years ago
skbrimer ▴ 740

I'm actually trying to do the same thing. There are a couple of quasispecies reconstruction programs out there, for example ShoRAH (here), ViSpA (here), and an older one, VICUNA (here). I also believe freebayes will call haplotypes, although I'm having some trouble getting from my VCF file to a haplotype list.


Did you ever get freebayes to work for calling haplotypes? From the documentation and posts like this, it sure seems like it should be possible, but I have yet to discover quite how, and all my requests for help, though highly viewed, have gone unanswered.


Sadly, no. I tried playing around with the allele frequency threshold, since different populations would show up as low quality SNPs. You can set that with the -F flag; I believe the default is 0.1, but I was trying values as low as 0.005. Maybe I didn't go far enough? I'm sorry I'm not much help.

Also this project got pushed back for us so I haven't been working on it very much. I was going to try emailing Erik and ask for some guidance but never did.

Sorry again and good luck.

8.9 years ago
apelin20 ▴ 480

I would recommend PoPoolation. It lets you calculate theta, pi, and Tajima's D from NGS data per population (.mpileup input), and it even allows you to calculate differentiation between populations and see which genes are undergoing adaptation.

8.5 years ago
vjmorley ▴ 30

Update: In the end I found SNPGenie to be most useful for calculating Pi from RNA virus data. I tried PoPoolation, but found that it had trouble with the high coverage and large population sizes associated with RNA virus data.
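For anyone wondering what a Pi calculation on pooled data boils down to, here is a minimal Python sketch. It is not how SNPGenie or PoPoolation implement it; it only assumes you already have per-site allele counts (for example, parsed from a pileup), treats each read as one sampled genome, and does no correction for sequencing error or coverage. The function names `site_pi` and `mean_pi` and the example counts are made up for illustration.

```python
from typing import Dict, List


def site_pi(counts: Dict[str, int]) -> float:
    """Per-site nucleotide diversity from allele counts at one position.

    This is the probability that two reads drawn without replacement
    carry different alleles: 1 - sum(c_i * (c_i - 1)) / (n * (n - 1)).
    """
    n = sum(counts.values())
    if n < 2:
        # Fewer than two reads: no pairwise comparison is possible.
        return 0.0
    same_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_pairs / (n * (n - 1))


def mean_pi(sites: List[Dict[str, int]]) -> float:
    """Average per-site diversity over all sites, including invariant ones."""
    if not sites:
        return 0.0
    return sum(site_pi(s) for s in sites) / len(sites)


# Hypothetical counts: one invariant site and one site with a 10% minor allele.
example_sites = [
    {"A": 100},
    {"A": 90, "G": 10},
]
print(site_pi(example_sites[1]))
print(mean_pi(example_sites))
```

With real data you would want to filter low-quality bases and set a minimum minor-allele count first, since at high coverage sequencing errors alone will inflate the per-site values.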
