Here is my situation: I have at my disposal sequencing reads from a RAD-seq protocol performed on several species. The protocol was designed to develop genetic resources for later, classical genotyping (not for a GBS study). Consequently, I have only one library per species, each library comprising sequence information from ~20 individuals sampled from various geographical areas and pooled in equimolar amounts.
To develop species-specific molecular resources, assembly and alignment were performed at the species level, and no cross-species alignment was performed.
I have neither individual-based information (as I would with classical RAD-seq or ddRAD-seq) nor population-level information (as I would with classical pool-seq), only information at the species level.
I am not very familiar with sequence-based genetic diversity analyses: once we had developed classical, amplification-based markers from the sequencing protocol, I performed very classical population genetics analyses using a few SNP markers and computed metrics such as heterozygosity or polymorphism rate. However, I would like to see whether, and how, I could make the most of my species-level RAD sequences before I write them off as disposable.
My objective would be to compare levels of molecular diversity between species to know which species are the most diverse. However, I am still not sure this would actually be possible with my kind of data.
More precisely, I was wondering:
- Let us imagine I manage to find a "sufficient" number of RAD-tags shared among all of my species of interest (what threshold would be enough?), and that these RAD-tags show some within-species variability.
- I know, from separate analyses, that the minor allele read counts of a variant within my RAD-tags for a given species are usually quite congruent with the actual minor allele frequency of that variant at the scale of the study area, as computed when the same variant is genotyped individually with a classical amplification-based technique. Therefore, I may be able to use the depth of coverage of each allele/haplotype as a proxy for its actual frequency within the sampled geographical area.
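To make the idea of using read depth as a frequency proxy concrete, here is a minimal sketch of the kind of calculation I have in mind. The function names, the read counts, and the tag length are all hypothetical. It treats every read as an independent draw from the pool, so it ignores the extra sampling variance introduced by pooling ~20 individuals as well as sequencing errors; it is only meant to illustrate the principle, not to be a proper pool-seq estimator.

```python
from typing import List


def site_diversity(allele_depths: List[int]) -> float:
    """Per-site diversity from allele read depths.

    Probability that two reads drawn without replacement carry
    different alleles, i.e. n/(n-1) * (1 - sum(p_i^2)).
    ASSUMPTION: reads are independent samples of the pool.
    """
    n = sum(allele_depths)
    if n < 2:
        raise ValueError("need at least two reads at the site")
    same = sum(c * (c - 1) for c in allele_depths) / (n * (n - 1))
    return 1.0 - same


def tag_nucleotide_diversity(variant_sites: List[List[int]],
                             tag_length: int) -> float:
    """Average per-site diversity over a RAD-tag.

    Invariant sites contribute 0, so only the variant sites are
    summed, then divided by the full tag length.
    """
    if tag_length <= 0:
        raise ValueError("tag_length must be positive")
    return sum(site_diversity(d) for d in variant_sites) / tag_length


if __name__ == "__main__":
    # Hypothetical RAD-tag of 100 bp with two biallelic SNPs,
    # given as [major-allele reads, minor-allele reads].
    snps = [[30, 10], [20, 20]]
    print(tag_nucleotide_diversity(snps, tag_length=100))
```

The per-tag values could then be averaged over the RAD-tags shared among species, which is where the question of how many shared tags are "sufficient" comes in.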
Do you think I could estimate nucleotide and/or haplotype diversity from these data in order to compare species? Would you have any suggestions on how to go about it, and which precautions I should take while doing so?
I look forward to reading your replies. Thank you very much in advance for any comments!
All the best!