Question: Calling de novo SNPs between two species with KisSplice
0
gravatar for MGlenn
10 days ago by
MGlenn0
MGlenn0 wrote:

Hello,

I am thinking about using the KisSplice workflow to call SNPs between 2 closely related species. I have a Trinity assembly, and a transdecoder ORF file.

I have 4 paired-end samples total for 2 species - 2 samples per species. I would like to call SNPs between species 1 and species 2, to highlight the functional differences between these species.

According to this: kissplice -s 1 -k 41 --experimental -r cond1_replicat1_R1.fastq -r cond1_replicat1_R2.fastq -r cond1_replicat2_R1.fastq -r cond1_replicat2_R2.fastq -r cond2_replicat1_R1.fastq -r cond2_replicat1_R2.fastq -r cond2_replicat2_R1.fastq -r cond2_replicat2_R2.fastq

Would the SNPs that KisSplice calls be between cond1 and cond2 only? Or would it call SNPs between all combinations of cond1, replicate 1, replicate 2, cond2, replicate 1, replicate 2 etc?

My goal is to observe the number of SNPs that makes species 1 different from species 2, and which of those SNPs would cause an amino acid change.

Please let me know if this workflow is appropriate for my objective. Thank you.

ADD COMMENTlink modified 1 day ago by vincent.lacroix50 • written 10 days ago by MGlenn0
0
gravatar for leandro.ishi.lima
8 days ago by
leandro.ishi.lima20 wrote:

Dear MGlenn,

The command you described would call SNPs between all combinations of cond1, replicate 1, replicate 2, cond2, replicate 1, replicate 2 etc. In KisSplice, firstly the bubbles corresponding to SNPs are found, and only after they are quantified according to conditions, replicates, etc. Thus, you will have to filter out the SNPs that you are interested on.

These links might interest you:

http://nar.oxfordjournals.org/content/early/2016/07/25/nar.gkw655.abstract

http://kissplice.prabi.fr/TWAS/

Kind regards.

ADD COMMENTlink written 8 days ago by leandro.ishi.lima20
0
gravatar for MGlenn
4 days ago by
MGlenn0
MGlenn0 wrote:

Thanks for getting back to me Leandro, I appreciate it. I should have mentioned that I also ran the kissplice2reftranscriptome step as well and obtained the mainOutput.tsv file of SNPs that were called between all combinations of cond1, rep1, rep2, cond2, rep1, rep2 etc.

So, it is possible to filter the variants in such a way that I can see how two samples from one species differ from the samples of the other species? For example: calling SNPs between condition 1 (or species 1) (rep1,rep2) and condition 2 (or species 2) (rep1,rep2), which would highlight SNPs between my species.

I'm just trying to figure out some post-processing steps that I can take to highlight SNPs between species 1 and species 2. Is there a column that describes the samples that the SNP was called between?

Thanks for you time and your quick reply, very helpful.

M

ADD COMMENTlink modified 4 days ago • written 4 days ago by MGlenn0
0
gravatar for leandro.ishi.lima
4 days ago by
leandro.ishi.lima20 wrote:

Dear MGlenn,

It seems to me that you might be wishing to find condition-specific (or species-specific) SNPs. You could use perhaps kissDE (http://kissplice.prabi.fr/tools/kissDE/): kissDE is a R package, similar to DEseq, but which works on pairs of variants, and tests if a variant is enriched in one condition. It has been developped to work easily with KisSplice output. It can also work with a simple table of counts obtained by any other means. It requires at least two replicates per condition and at least two conditions.

In http://kissplice.prabi.fr/TWAS/ we show how you can intersect KisSplice2RefTranscriptome results with condition-specific SNPs by giving to K2rt the output of kissDE (look for the section "Running KissDE (optional)").

It might happen also that I misunderstood your message and my answer does not help you. In this case, please point this out to me, and I am going to further redirect your message to people that are more used to work with SNPs and the KisSplice pipeline...

Kind regards.

ADD COMMENTlink written 4 days ago by leandro.ishi.lima20
0
gravatar for TrentGenomics
4 days ago by
TrentGenomics20 wrote:

Thanks again for getting back to me so quickly.

I'm concerned with finding SNPs between conditions, rather than within a condition. For example, if I consider each of my two species a separate condition (species 1 = cond 1, species 2 = cond 2) , then I would like to find SNPs between those conditions, rather than within the condition like kissDE does.

kissDE would tell me which SNPs are enriched in one condition (one species), but instead I would like to find SNPs that occur between my two separate species (SNPs between species 1 and species 2), and see which of those SNPs would cause an amino acid change. Basically, I want the SNPs to highlight functional differences between my two species. I would like to see which SNPs would result in an amino acid sequence if a SNP allele from species 1 was introgressed into species 2.

Does that make sense? Sorry if I'm being confusing. Again, thanks for your help.

ADD COMMENTlink modified 2 days ago • written 4 days ago by TrentGenomics20
0
gravatar for vincent.lacroix
1 day ago by
vincent.lacroix50 wrote:

Dear user, Finding divergent sites between two species is what we did for D. mojavensis and D. arizonae in our paper. Like you, we had two paired-end samples for each species. I think that KissDE fits your needs.

Below is an example of the KisSplice output we obtain: >bcc_9|Cycle_0|Type_0a|upper_path_Length_83|C1_0|C2_0|C3_0|C4_0|C5_86|C6_187|C7_102|C8_161|Q1_0|Q2_0|Q3_0|Q4_0|Q5_60|Q6_59|Q7_58|Q8_59|rank_1.00000 CCTCAAGCAATGTCCATGCGACTTATTGAAAAATTTACACTGGAATTAGTCCAAGCGTCCATTGAGAACGACAAAAGACTGCA >bcc_9|Cycle_0|Type_0a|lower_path_Length_83|C1_80|C2_165|C3_100|C4_168|C5_0|C6_0|C7_0|C8_0|Q1_59|Q2_59|Q3_61|Q4_58|Q5_0|Q6_0|Q7_0|Q8_0|rank_1.00000 CCTCAAGCAATGTCCATGCGACTTATTGAAAAATTTACACTAGAATTAGTCCAAGCGTCCATTGAGAACGACAAAAGACTGCA

The "SNP" is G/A. The upper path corresponds to G. The lower path to A. C1 to C4 correspond to read counts in mojavensis. C5 to C8 correspond to read counts in arizonae. G is supported by 0 reads in mojavensis, and by (86+187, 102+161) reads in arizonae. A is supported by ~(80+165, 100+168) reads in mojavensis and 0 reads in arizonae. This "SNP" is clearly species specific. One allele is present in one species. The other allele is present in the other species. If you test it in KissDE, you will find a p-value close to 0, and a Deltaf (difference of allele frequency) of 1.

I would recommend you to run kissDE and then re-run the latest version of kissplice2reftranscriptome (v1.3.2) on your dataset. We added two columns in the output: the p-value and the DeltaF. Selecting the SNPs with a DeltaF of 1 or -1 seems to be what you need.

Note that if your two species diverged very recently, then letting the default value k=41 may be a good starting point. This will however lead you to miss the SNPs located less than 41nt apart. I would then recommend trying to decrease k to 25. This will give you more SNPs, in particular in regions which evolved faster since the speciation. Calibrating k using one gene for which you have some expectation may be a good pragmatic approach.

Best,

Vincent

ADD COMMENTlink written 1 day ago by vincent.lacroix50
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 599 users visited in the last hour