Question: SNP detection in transcriptomics to define genetic variation between individuals
roncalli wrote (19 months ago):

Hi,

I am new to SNP calling analysis, and my question is about "expectations". I am trying to compare three de novo assemblies of a calanoid copepod (a non-model organism), and I would like to use SNP calling as a way to assess the genetic variation between individuals. I know from other papers that high genetic diversity is expected in my species, and I am trying to link the SNP calling results to this. I have read that in humans a SNP diversity of 0.1% is enough to claim that two individuals come from different populations, but I am not sure whether I can use this "expected" value for copepods too.

So my question is: based on the percentage of "shared" and unique SNPs, is there a way to support my hypothesis of high genetic diversity? Is there any "magic number" that would indicate a difference between individuals?
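Whatever threshold you end up comparing against, a percentage only means something once SNP counts are divided by the number of bases actually examined. The commonly quoted human figure of about 0.1% is usually a per-site difference rate (roughly one differing nucleotide per 1,000 bp between two genomes). A minimal sketch of that normalization, assuming you already know the SNP count and the number of callable (adequately covered) bases; the function names and numbers are hypothetical:

```python
def snp_density_percent(n_snps: int, callable_bp: int) -> float:
    """Percentage of examined (callable) sites that carry a SNP."""
    if callable_bp <= 0:
        raise ValueError("callable_bp must be positive")
    return 100.0 * n_snps / callable_bp


def snps_per_kb(n_snps: int, callable_bp: int) -> float:
    """The same quantity expressed as SNPs per 1,000 callable bases."""
    return snp_density_percent(n_snps, callable_bp) * 10.0


if __name__ == "__main__":
    # Hypothetical numbers: 25,000 SNPs across 10 Mb of well-covered transcript sequence.
    n, bp = 25_000, 10_000_000
    print(f"{snp_density_percent(n, bp):.3f}% of sites")  # 0.250% of sites
    print(f"{snps_per_kb(n, bp):.1f} SNPs per kb")        # 2.5 SNPs per kb
```

In a transcriptome, "callable bases" depends heavily on which genes are expressed, so the denominator matters as much as the SNP count.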

Thanks for the help,

Vittoria

colindaven (Hannover Medical School) wrote (19 months ago):

Interesting. I don't know of any magic numbers.

Rather than comparing the assemblies directly, which would always miss heterozygous variation (assemblies are likely to be haploid), I would suggest a different approach:

  • a) Merge the assemblies. Software: cd-hit (cd-hit-est for nucleotide sequences), SuperTranscripts
  • b) Rename the contigs in the merged transcriptome.
  • c) Map the reads of each individual to the merged transcriptome with bwa.
  • d) Call variants, e.g. with FreeBayes or BBMap's callvariants.sh.

This has the advantage of following standard practice.
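Steps b) to d) above can be sketched roughly as follows. This is a minimal sketch under several assumptions: the merged FASTA from step a) already exists, reads are paired-end, and the file and sample names are hypothetical. It only builds the shell commands (it does not run them), so check every flag against your installed versions of bwa, samtools and FreeBayes.

```python
from typing import Dict, Iterable, List, Tuple


def rename_fasta(lines: Iterable[str], prefix: str = "contig") -> Tuple[List[str], Dict[str, str]]:
    """Step b): give every sequence a short, unique name.

    Returns the renamed FASTA lines and a new-name -> original-header map, so that
    identical IDs coming from different input assemblies can no longer collide.
    """
    renamed: List[str] = []
    new_to_old: Dict[str, str] = {}
    count = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            count += 1
            new_name = f"{prefix}_{count:06d}"
            new_to_old[new_name] = line[1:].strip()  # keep the full original header
            renamed.append(">" + new_name)
        else:
            renamed.append(line)
    return renamed, new_to_old


def pipeline_commands(ref: str, samples: List[Tuple[str, str, str]]) -> List[str]:
    """Steps c) and d): build (but do not run) mapping and joint-calling commands.

    `samples` holds hypothetical (sample_name, reads_1.fq, reads_2.fq) tuples.
    """
    cmds = [f"bwa index {ref}", f"samtools faidx {ref}"]
    bams = []
    for name, r1, r2 in samples:
        bam = f"{name}.sorted.bam"
        # The read group (SM tag) is what lets FreeBayes tell the individuals apart.
        cmds.append(
            f"bwa mem -R '@RG\\tID:{name}\\tSM:{name}' {ref} {r1} {r2} "
            f"| samtools sort -o {bam} -"
        )
        cmds.append(f"samtools index {bam}")
        bams.append(bam)
    cmds.append(f"freebayes -f {ref} {' '.join(bams)} > calls.vcf")
    return cmds


if __name__ == "__main__":
    # Two input assemblies that happen to reuse the ID "c1".
    fasta = [">c1 len=500", "ACGT", ">c1 len=480", "ACGA"]
    new_lines, mapping = rename_fasta(fasta)
    print("\n".join(new_lines))
    for cmd in pipeline_commands("merged.fa", [("ind1", "ind1_1.fq", "ind1_2.fq")]):
        print(cmd)
```

Keeping the new-name to old-name map lets you trace variants back to the original assemblies later.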

You could also annotate the merged assembly, for example with InterProScan, to look at gene function.

roncalli wrote (19 months ago):

Hi,

Thanks for the answer. Interestingly, I had already followed your suggestion, and this is what I did:

1) Generated a merged assembly to use as the reference.
2) Mapped the raw reads of each sample (each from a single individual) back to it.
3) Identified SNPs using samtools for variant calling.

Now, how do I interpret the results? I am planning to draw a Venn diagram to see which SNPs are shared and which are "unique". My question now is: what percentage of unique SNPs would indicate that individuals come from different populations?
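The counts behind such a two-sample Venn diagram can be computed directly from the VCFs. A minimal sketch, assuming one single-sample VCF per individual, SNPs identified by (contig, position, ref, alt), and a hypothetical QUAL cutoff of 20. One caveat: a SNP missing from one individual's VCF may be homozygous reference there or simply not covered, so "unique" counts can overstate real differences.

```python
from typing import Dict, Iterable, Set, Tuple

SnpKey = Tuple[str, int, str, str]  # (contig, position, ref base, alt base)


def snp_keys(vcf_lines: Iterable[str], min_qual: float = 20.0) -> Set[SnpKey]:
    """Collect single-base substitutions from a VCF, skipping low-QUAL records."""
    keys: Set[SnpKey] = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts, qual = fields[0], int(fields[1]), fields[3], fields[4], fields[5]
        if qual != "." and float(qual) < min_qual:
            continue
        for alt in alts.split(","):
            # Keep single-nucleotide substitutions only (no indels, no '*' or '.' placeholders).
            if len(ref) == 1 and len(alt) == 1 and alt not in {".", "*"}:
                keys.add((chrom, pos, ref, alt))
    return keys


def venn_summary(a: Set[SnpKey], b: Set[SnpKey]) -> Dict[str, float]:
    """Counts and percentages (of the union) for a two-sample Venn diagram."""
    total = len(a | b)
    counts = {"shared": len(a & b), "only_a": len(a - b), "only_b": len(b - a)}
    summary: Dict[str, float] = dict(counts)
    for name, n in counts.items():
        summary[f"pct_{name}"] = 100.0 * n / total if total else 0.0
    return summary


if __name__ == "__main__":
    vcf_a = [
        "##fileformat=VCFv4.2",
        "c1\t10\t.\tA\tG\t50\t.\t.",
        "c1\t20\t.\tC\tT\t60\t.\t.",
        "c2\t5\t.\tG\tA\t10\t.\t.",  # dropped: QUAL below the cutoff
    ]
    vcf_b = ["c1\t10\t.\tA\tG\t40\t.\t.", "c2\t8\t.\tT\tC\t99\t.\t."]
    print(venn_summary(snp_keys(vcf_a), snp_keys(vcf_b)))
```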

I am very confused by the human figure.

Thanks for the help,

Vittoria


This is a very specific biological question. I would guess that the ability to detect SNPs is strongly affected by the number of transcripts expressed in each individual. I don't think anyone here can give you a definitive answer; your best bet would be to look at studies on other non-model organisms to find more expected values. Research on molecular breeding might also help.
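One way to reduce the expression effect described above is to compare individuals only at positions that are well covered in all of them. A minimal sketch, assuming per-individual coverage as `samtools depth`-style lines (contig, position, depth) and SNPs as (contig, position) sites; the minimum depth of 10 is a hypothetical choice:

```python
from typing import Dict, Iterable, Set, Tuple

Site = Tuple[str, int]  # (contig, position)


def covered_sites(depth_lines: Iterable[str], min_depth: int = 10) -> Set[Site]:
    """Parse (contig, position, depth) lines into the set of well-covered sites."""
    sites: Set[Site] = set()
    for line in depth_lines:
        if not line.strip():
            continue
        contig, pos, depth = line.split()[:3]
        if int(depth) >= min_depth:
            sites.add((contig, int(pos)))
    return sites


def restrict_to_common_coverage(
    snps: Dict[str, Set[Site]], coverage: Dict[str, Set[Site]]
) -> Tuple[Set[Site], Dict[str, Set[Site]]]:
    """Keep only SNP sites that are adequately covered in every individual.

    `coverage` must contain at least one individual.
    """
    common = set.intersection(*coverage.values())
    return common, {name: sites & common for name, sites in snps.items()}


if __name__ == "__main__":
    coverage = {
        "ind1": covered_sites(["c1\t10\t25", "c1\t20\t30", "c1\t30\t3"]),
        "ind2": covered_sites(["c1\t10\t40", "c1\t20\t5", "c1\t30\t50"]),
    }
    snps = {"ind1": {("c1", 10), ("c1", 20)}, "ind2": {("c1", 30)}}
    common, restricted = restrict_to_common_coverage(snps, coverage)
    for name, sites in restricted.items():
        print(name, len(sites), "SNPs on", len(common), "jointly covered sites")
```

Dividing each individual's restricted SNP count by the number of jointly covered sites gives densities that are at least measured on the same set of positions.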

I would also show haplotypes, e.g. through visualization, of well-known conserved genes (carrying SNPs) in the three samples.

Multisample SNP calling (e.g. with FreeBayes) would at least give you genotypes for all three samples together.
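A joint VCF from multisample calling can be turned into a samples-by-sites genotype table. A minimal sketch, assuming diploid calls (FreeBayes' default ploidy is 2) and a FORMAT column containing GT; each cell is the number of non-reference alleles, or None when the genotype is missing. The sample names in the example are hypothetical.

```python
import re
from typing import Iterable, List, Optional, Tuple


def genotype_matrix(
    vcf_lines: Iterable[str],
) -> Tuple[List[str], List[Tuple[str, int]], List[List[Optional[int]]]]:
    """Return (sample names, (contig, position) per site, rows of allele counts)."""
    samples: List[str] = []
    sites: List[Tuple[str, int]] = []
    matrix: List[List[Optional[int]]] = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("##"):
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            continue
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue  # no genotype information for this record
        gt_index = fmt.index("GT")
        row: List[Optional[int]] = []
        for sample_field in fields[9:]:
            parts = sample_field.split(":")
            gt = parts[gt_index] if gt_index < len(parts) else "."
            alleles = re.split(r"[/|]", gt)  # handles unphased (/) and phased (|) calls
            if "." in alleles:
                row.append(None)
            else:
                # Count alleles that are not the reference (allele index 0).
                row.append(sum(1 for allele in alleles if allele != "0"))
        sites.append((fields[0], int(fields[1])))
        matrix.append(row)
    return samples, sites, matrix


if __name__ == "__main__":
    vcf = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind1\tind2\tind3",
        "c1\t10\t.\tA\tG\t50\t.\t.\tGT:DP\t0/0:12\t0/1:9\t1/1:15",
        "c1\t20\t.\tC\tT\t60\t.\t.\tGT:DP\t0/1:20\t./.:0\t1|1:11",
    ]
    samples, sites, matrix = genotype_matrix(vcf)
    print(samples)
    for site, row in zip(sites, matrix):
        print(site, row)
```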

Lastly, this leads you very definitely towards molecular phylogenetics: trees and various relationship matrices.
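As one example of a relationship matrix, a simple allele-sharing distance can be computed from a genotype table (rows are sites, columns are samples, cells are non-reference allele counts 0/1/2 or None for missing). This is a sketch of one common choice, not the only one: for biallelic sites the distance equals 1 minus the mean identity-by-state (IBS) proportion. The resulting matrix could then be passed to a tree-building method such as neighbour-joining or UPGMA in whichever phylogenetics package you use.

```python
import math
from typing import List, Optional


def allele_sharing_distance(matrix: List[List[Optional[int]]], n_samples: int) -> List[List[float]]:
    """Pairwise distance = mean |g_i - g_j| / 2 over sites genotyped in both samples.

    0 means identical genotypes at every shared site; 1 means opposite homozygotes
    everywhere. Pairs with no jointly genotyped site get NaN.
    """
    dist = [[0.0] * n_samples for _ in range(n_samples)]
    for i in range(n_samples):
        for j in range(i + 1, n_samples):
            diff, used = 0, 0
            for row in matrix:
                gi, gj = row[i], row[j]
                if gi is None or gj is None:
                    continue  # skip sites missing in either sample
                diff += abs(gi - gj)
                used += 1
            d = diff / (2 * used) if used else math.nan
            dist[i][j] = dist[j][i] = d
    return dist


if __name__ == "__main__":
    # Hypothetical genotypes for three individuals at three sites.
    genotypes = [[0, 0, 2], [0, 1, 2], [1, 1, 0]]
    for row in allele_sharing_distance(genotypes, 3):
        print(" ".join(f"{d:.3f}" for d in row))
```

With only three individuals, any tree will be small, so the distances themselves may be the more useful summary.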

(reply written 19 months ago by colindaven)
Powered by Biostar version 2.3.0