Question

Calculating within-population nucleotide diversity for a virus population

1

8.5 years ago

vjmorley ▴ 30

I'm interested in using my next-gen sequencing data to calculate within-population nucleotide diversity in RNA virus populations. My samples were sequenced on an Illumina HiSeq, and each sample represents a diverse virus population. I have come across several packages for calculating nucleotide diversity, but they all seem to assume that each sample represents one individual rather than a population of individuals. Can anyone recommend software to use? It would be great if I could input a .bam or .mpileup file and get back diversity statistics for the population.

population genetics next-gen metagenomics • 4.2k views

ADD COMMENT • link 8.1 years ago by vjmorley ▴ 30

Ram · Answer 1 · 2015-11-09

0

8.5 years ago

skbrimer ▴ 740

I'm actually trying to do the same thing. There are a couple of quasispecies reconstruction programs out there, for example ShoRAH, ViSpA, and an older one, VICUNA. I also believe freebayes will call haplotypes, although I'm having some trouble getting from my VCF file to a haplotype list.

ADD COMMENT • link updated 4.4 years ago by Ram 43k • written 8.5 years ago by skbrimer ▴ 740

0

Did you ever get freebayes to work for calling haplotypes? Going by the documentation and posts like this one, it sure seems like it should be possible, but I have yet to discover quite how, and all my requests for help, though highly viewed, have gone unanswered.

ADD REPLY • link 8.1 years ago by mark.rose ▴ 50

0

Sadly, no. I tried playing around with the allele frequency threshold, since different populations would show up as low quality SNPs. You can set that with the -F flag; I believe the default is 0.1, but I was trying values as low as 0.005. Maybe I didn't go far enough? I'm sorry I'm not much help.
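For anyone who wants to repeat the experiment, here is a rough sketch of the kind of call I mean, wrapped in Python so the parameters are easy to vary. The file names, the helper name `build_freebayes_cmd`, and the threshold values are placeholders and not a tested recipe; adding `--pooled-continuous` and `-C` is my assumption about what suits a mixed virus population, so check `freebayes --help` for your version before relying on it.

```python
import shlex


def build_freebayes_cmd(reference, bam, min_alt_fraction=0.005, min_alt_count=2):
    """Assemble a freebayes command line for low-frequency variant calling.

    All argument values are illustrative assumptions, not recommendations.
    """
    return [
        "freebayes",
        "-f", reference,               # reference FASTA
        "-F", str(min_alt_fraction),   # minimum alternate allele fraction
        "-C", str(min_alt_count),      # minimum number of supporting reads
        "--pooled-continuous",         # treat the sample as a pool of unknown size
        bam,
    ]


if __name__ == "__main__":
    # Hypothetical file names; the command is only printed, not executed.
    cmd = build_freebayes_cmd("reference.fa", "sample.bam")
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The printed command can be pasted into a shell with its output redirected to a VCF file, or passed to `subprocess.run` once the paths point at real data.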

Also, this project got pushed back for us, so I haven't been working on it very much. I was going to try emailing Erik to ask for some guidance, but never did.

Sorry again and good luck.

ADD REPLY • link 8.1 years ago by skbrimer ▴ 740

score 0 · Answer 2 · 2015-11-09

0

8.5 years ago

apelin20 ▴ 480

I would recommend PoPoolation. It lets you calculate Theta, Pi and Tajima's D per population from NGS data (.mpileup), and it even lets you calculate differentiation between populations and see which genes are undergoing adaptation.

ADD COMMENT • link 8.5 years ago by apelin20 ▴ 480

score 0 · Answer 3 · 2016-03-28

0

8.1 years ago

vjmorley ▴ 30

Update: In the end I found SNPGenie to be most useful for calculating Pi from RNA virus data. I tried PoPoolation, but found that it had trouble with the high coverage and large population sizes associated with RNA virus data.
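In case it helps to see what the number means, here is a minimal sketch of per-site Pi computed from allele read counts, taken as the fraction of read pairs at a position that differ. This is the textbook pairwise-difference estimator and not necessarily exactly what SNPGenie or PoPoolation compute internally; the function names and the example counts are made up for illustration, and it assumes reads have already been filtered and tallied per position.

```python
def site_pi(counts):
    """Nucleotide diversity at one site from allele read counts.

    counts maps each allele (e.g. "A", "G") to the number of reads supporting it.
    Returns the probability that two reads drawn without replacement differ.
    """
    n = sum(counts.values())
    if n < 2:
        return 0.0  # no pairs to compare
    # Ordered pairs of reads carrying the same allele
    same = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same / (n * (n - 1))


def mean_pi(sites):
    """Average per-site diversity over an iterable of per-site count dicts."""
    sites = list(sites)
    if not sites:
        return 0.0
    return sum(site_pi(s) for s in sites) / len(sites)


if __name__ == "__main__":
    # Hypothetical counts: one polymorphic site and one invariant site.
    example = [{"A": 90, "G": 10}, {"C": 100}]
    print(mean_pi(example))
```

Invariant sites contribute zero, so the average has to be taken over every covered position in the region, not only the variable ones.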

ADD COMMENT • link 8.1 years ago by vjmorley ▴ 30