KisSplice/kissDE/kissplice2reftranscriptome: filtering advice and interpretation of the results
Asked by aaduu, 2.3 years ago

Hello, everyone!

I couldn't find an explanation online or in the manuals for KisSplice and kissplice2reftranscriptome. The FAQ explains some things, but a few points are still unclear to me.

First, here is an example of the SNPs I got after running KisSplice, kissplice2reftranscriptome and kissDE:

TRINITY_DN12819_c0_g1_i1        bcc_346265|Cycle_0|Type_0a      True    100     188     GTG     GGG     V       G       True    False   False   False   100.0|100.0|100.0|100.0|100.0|100.0|100.0|100.0|100.0|100.0|100.0|100.0|100.0|100.0|100.0|100.0|100.0|100.0|100.0|100.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0 C1_11|C2_9|C3_11|C4_5|C5_8|C6_4|C7_10|C8_8|C9_20|C10_11|C11_5|C12_6|C13_11|C14_5|C15_5|C16_4|C17_10|C18_4|C19_4|C20_6|C21_0|C22_0|C23_0|C24_0|C25_0|C26_0|C27_0|C28_0|C29_0|C30_0|C31_0|C32_0|C33_0|C34_0|C35_0|C36_0|C37_0|C38_0|C39_0|C40_0|C41_0|C42_0|C43_0|C44_0|C45_0|C46_0|C47_0|C48_0|C49_0|C50_0|C51_0|C52_0|C53_0|C54_0|C55_0|C56_0|C57_0|C58_0|C59_0|C60_0   C1_0|C2_0|C3_0|C4_0|C5_0|C6_0|C7_0|C8_0|C9_0|C10_0|C11_0|C12_0|C13_0|C14_0|C15_0|C16_0|C17_0|C18_0|C19_0|C20_0|C21_10|C22_6|C23_0|C24_0|C25_0|C26_0|C27_25|C28_11|C29_0|C30_0|C31_0|C32_0|C33_35|C34_12|C35_14|C36_11|C37_21|C38_6|C39_16|C40_3|C41_22|C42_12|C43_0|C44_0|C45_0|C46_0|C47_36|C48_15|C49_0|C50_0|C51_0|C52_0|C53_23|C54_11|C55_7|C56_1|C57_9|C58_6|C59_12|C60_2  True    1.73635024831182e-13    -1

TRINITY_DN12819_c0_g1_i1        bcc_346265|Cycle_1|Type_0a      True    100     188     GTG     GCG     V       A       True    False   False   False   100.0|100.0|100.0|100.0|57.14|50.0|62.5|61.54|100.0|100.0|100.0|100.0|100.0|100.0|31.25|100.0|66.67|50.0|100.0|100.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0    C1_11|C2_9|C3_11|C4_5|C5_8|C6_4|C7_10|C8_8|C9_20|C10_11|C11_5|C12_6|C13_11|C14_5|C15_5|C16_4|C17_10|C18_4|C19_4|C20_6|C21_0|C22_0|C23_0|C24_0|C25_0|C26_0|C27_0|C28_0|C29_0|C30_0|C31_0|C32_0|C33_0|C34_0|C35_0|C36_0|C37_0|C38_0|C39_0|C40_0|C41_0|C42_0|C43_0|C44_0|C45_0|C46_0|C47_0|C48_0|C49_0|C50_0|C51_0|C52_0|C53_0|C54_0|C55_0|C56_0|C57_0|C58_0|C59_0|C60_0   C1_0|C2_0|C3_0|C4_0|C5_6|C6_4|C7_6|C8_5|C9_0|C10_0|C11_0|C12_0|C13_0|C14_0|C15_11|C16_0|C17_5|C18_4|C19_0|C20_0|C21_8|C22_8|C23_2|C24_1|C25_8|C26_5|C27_0|C28_0|C29_7|C30_5|C31_0|C32_0|C33_0|C34_0|C35_0|C36_0|C37_6|C38_5|C39_13|C40_9|C41_11|C42_8|C43_6|C44_2|C45_8|C46_5|C47_0|C48_0|C49_7|C50_6|C51_0|C52_0|C53_0|C54_0|C55_0|C56_0|C57_9|C58_5|C59_8|C60_3       True    0       -0.8202

TRINITY_DN12819_c0_g1_i1        bcc_346265|Cycle_3|Type_0a      True    100     188     GGG     GCG     G       A       True    False   False   False   0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|55.56|42.86|0.0|0.0|0.0|0.0|100.0|100.0|0.0|0.0|0.0|0.0|100.0|100.0|100.0|100.0|77.78|54.55|55.17|25.0|66.67|60.0|0.0|0.0|0.0|0.0|100.0|100.0|0.0|0.0|0.0|0.0|100.0|100.0|100.0|100.0|50.0|54.55|60.0|40.0      C1_0|C2_0|C3_0|C4_0|C5_0|C6_0|C7_0|C8_0|C9_0|C10_0|C11_0|C12_0|C13_0|C14_0|C15_0|C16_0|C17_0|C18_0|C19_0|C20_0|C21_10|C22_6|C23_0|C24_0|C25_0|C26_0|C27_25|C28_11|C29_0|C30_0|C31_0|C32_0|C33_35|C34_12|C35_14|C36_11|C37_21|C38_6|C39_16|C40_3|C41_22|C42_12|C43_0|C44_0|C45_0|C46_0|C47_36|C48_15|C49_0|C50_0|C51_0|C52_0|C53_23|C54_11|C55_7|C56_1|C57_9|C58_6|C59_12|C60_2  C1_0|C2_0|C3_0|C4_0|C5_6|C6_4|C7_6|C8_5|C9_0|C10_0|C11_0|C12_0|C13_0|C14_0|C15_11|C16_0|C17_5|C18_4|C19_0|C20_0|C21_8|C22_8|C23_2|C24_1|C25_8|C26_5|C27_0|C28_0|C29_7|C30_5|C31_0|C32_0|C33_0|C34_0|C35_0|C36_0|C37_6|C38_5|C39_13|C40_9|C41_11|C42_8|C43_6|C44_2|C45_8|C46_5|C47_0|C48_0|C49_7|C50_6|C51_0|C52_0|C53_0|C54_0|C55_0|C56_0|C57_9|C58_5|C59_8|C60_3       True    0.000872411136611906    0.6704

I have several questions about this output:

1) Should I consider these as 3 different SNPs at the same location, or are they 3 versions of the same SNP?

I am confused by the identifiers bcc_346265|Cycle_0|Type_0a, bcc_346265|Cycle_1|Type_0a and bcc_346265|Cycle_3|Type_0a. The BCC part is the same, which makes sense since, according to the FAQ, a BCC is a set of overlapping variations. The Cycle part (the bubble identifier) differs. How should I interpret and filter such events, and how do I decide which ones to keep?

2) The first two events, bcc_346265|Cycle_0 and bcc_346265|Cycle_1, have the same counts in column 15 but different frequencies. Why is that? Also, do these counts represent raw numbers of reads or normalised values? I assume they are normalised with the DESeq2 method, but I'm not sure.

3) The first two events also have different p-values and different DeltaF values. Strangely, the event with the smallest DeltaF (-1, i.e. the biggest difference between conditions) has a worse p-value (1.7e-13) than the event with the higher DeltaF (-0.8202, p-value 0). How should I interpret this, and how do I choose between such events?

4) I would like to compare KisSplice and GATK. The results posted here are based on the placement of KisSplice events on a Trinity assembly. However, I have an annotated genome assembly, and I would like to know whether it makes sense to use kissplice2reftranscriptome with CDS sequences from genome annotation software.

Answered 2.2 years ago

Dear user,

You found a SNP with 3 alleles. KisSplice works pairwise, so it reports the SNP once for each pair of alleles (3 times in total). I agree this is not very convenient, but sites with 3 alleles are quite rare and we have not handled them specifically so far. However, the current output contains all the information you need to understand what is going on.

bcc_346265|Cycle_0 is GTG vs GGG

bcc_346265|Cycle_1 is GTG vs GCG

bcc_346265|Cycle_3 is GGG vs GCG

All 3 are at the same location on the transcript (position 188).
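To spot such sites systematically, one option is to group the pairwise events by transcript, BCC and SNP position, and collect the alleles seen across their cycles; any group with more than two alleles is a multi-allelic site like this one. The Python below is a minimal sketch, assuming the output is tab-separated with the column layout of the rows in the question (column 2 is the event ID, column 5 the position on the transcript, columns 6 and 7 the two codons). `group_multiallelic` is a hypothetical helper, not part of KisSplice or kissDE.

```python
from collections import defaultdict


def group_multiallelic(lines):
    """Group pairwise events by (transcript, BCC, position) and keep the
    sites that involve more than two alleles.

    Assumed tab-separated layout (1-based, as in the question):
    col 1 = transcript, col 2 = event ID (bcc|Cycle|Type),
    col 5 = SNP position on the transcript, cols 6-7 = the two codons.
    """
    sites = defaultdict(lambda: {"cycles": [], "alleles": set()})
    for line in lines:
        if not line.strip():
            continue  # skip blank lines between events
        cols = line.rstrip("\n").split("\t")
        transcript, event_id, position = cols[0], cols[1], int(cols[4])
        bcc, cycle, _event_type = event_id.split("|")
        site = sites[(transcript, bcc, position)]
        site["cycles"].append(cycle)
        site["alleles"].update((cols[5], cols[6]))
    # One pairwise event per pair of alleles: a bi-allelic SNP gives a single
    # event with 2 alleles, so only sites with more than 2 alleles are kept.
    return {key: site for key, site in sites.items() if len(site["alleles"]) > 2}


# First 7 columns of the three rows from the question (the rest is not needed).
rows = [
    "TRINITY_DN12819_c0_g1_i1\tbcc_346265|Cycle_0|Type_0a\tTrue\t100\t188\tGTG\tGGG",
    "TRINITY_DN12819_c0_g1_i1\tbcc_346265|Cycle_1|Type_0a\tTrue\t100\t188\tGTG\tGCG",
    "TRINITY_DN12819_c0_g1_i1\tbcc_346265|Cycle_3|Type_0a\tTrue\t100\t188\tGGG\tGCG",
]
for key, site in group_multiallelic(rows).items():
    print(key, sorted(site["alleles"]), site["cycles"])
# ('TRINITY_DN12819_c0_g1_i1', 'bcc_346265', 188) ['GCG', 'GGG', 'GTG'] ['Cycle_0', 'Cycle_1', 'Cycle_3']
```

The codons are used as allele labels here, as in the list above; for a single SNP they differ only at the variable base.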

The values in columns 15 and 16 are indeed read counts (for the first and second allele, respectively), and they let you derive allele frequencies. The allele frequencies in column 14 are pairwise: each one is computed only from the two alleles of that event, which is why Cycle_0 and Cycle_1 share the same GTG counts but show different frequencies. In your case, they should not be used; instead, recalculate the allele frequencies from the counts of all 3 alleles. You have 60 samples in total. For your first sample (C1), the counts are the following:

GTG: 11 (column 15, cycle 0)

GGG: 0 (column 16, cycle 0)

GCG: 0 (column 16, cycle 1)

If we focus on sample 5 (C5), the counts are the following:

GTG: 8

GGG: 0

GCG: 6

The allele frequency seems to change between sample 1 and sample 5. Samples 1-4 are homozygous (100% GTG), while samples 5-8 seem to be heterozygous (roughly 50% GTG, 50% GCG). I do not know which biological conditions your samples correspond to, but my guess is that the first 20 samples differ from the last 40: samples 21-60 have no GTG reads at all.
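A minimal Python sketch of that recalculation, assuming the count fields are copied verbatim from the three rows in the question: the GTG counts come from column 15 of Cycle_0, the GGG counts from column 16 of Cycle_0 and the GCG counts from column 16 of Cycle_1 (Cycle_3 adds nothing new, since its two count columns repeat the GGG and GCG counts). `parse_counts`, `pairwise_percent` and `allele_freqs` are hypothetical helpers, and the split into samples 1-20 and 21-60 only reflects the guess above, not known biological conditions.

```python
def parse_counts(field):
    """Turn a count field 'C1_11|C2_9|...' into {'C1': 11, 'C2': 9, ...}."""
    counts = {}
    for item in field.split("|"):
        sample, value = item.rsplit("_", 1)
        counts[sample] = int(value)
    return counts


# Count fields copied from the question (assumed complete: 60 samples each).
ALLELE_COUNTS = {
    "GTG": parse_counts(  # column 15 of Cycle_0 (identical in Cycle_1)
        "C1_11|C2_9|C3_11|C4_5|C5_8|C6_4|C7_10|C8_8|C9_20|C10_11|C11_5|C12_6|C13_11|C14_5|C15_5|C16_4|C17_10|C18_4|C19_4|C20_6|"
        "C21_0|C22_0|C23_0|C24_0|C25_0|C26_0|C27_0|C28_0|C29_0|C30_0|C31_0|C32_0|C33_0|C34_0|C35_0|C36_0|C37_0|C38_0|C39_0|C40_0|"
        "C41_0|C42_0|C43_0|C44_0|C45_0|C46_0|C47_0|C48_0|C49_0|C50_0|C51_0|C52_0|C53_0|C54_0|C55_0|C56_0|C57_0|C58_0|C59_0|C60_0"
    ),
    "GGG": parse_counts(  # column 16 of Cycle_0 (= column 15 of Cycle_3)
        "C1_0|C2_0|C3_0|C4_0|C5_0|C6_0|C7_0|C8_0|C9_0|C10_0|C11_0|C12_0|C13_0|C14_0|C15_0|C16_0|C17_0|C18_0|C19_0|C20_0|"
        "C21_10|C22_6|C23_0|C24_0|C25_0|C26_0|C27_25|C28_11|C29_0|C30_0|C31_0|C32_0|C33_35|C34_12|C35_14|C36_11|C37_21|C38_6|C39_16|C40_3|"
        "C41_22|C42_12|C43_0|C44_0|C45_0|C46_0|C47_36|C48_15|C49_0|C50_0|C51_0|C52_0|C53_23|C54_11|C55_7|C56_1|C57_9|C58_6|C59_12|C60_2"
    ),
    "GCG": parse_counts(  # column 16 of Cycle_1 (= column 16 of Cycle_3)
        "C1_0|C2_0|C3_0|C4_0|C5_6|C6_4|C7_6|C8_5|C9_0|C10_0|C11_0|C12_0|C13_0|C14_0|C15_11|C16_0|C17_5|C18_4|C19_0|C20_0|"
        "C21_8|C22_8|C23_2|C24_1|C25_8|C26_5|C27_0|C28_0|C29_7|C30_5|C31_0|C32_0|C33_0|C34_0|C35_0|C36_0|C37_6|C38_5|C39_13|C40_9|"
        "C41_11|C42_8|C43_6|C44_2|C45_8|C46_5|C47_0|C48_0|C49_7|C50_6|C51_0|C52_0|C53_0|C54_0|C55_0|C56_0|C57_9|C58_5|C59_8|C60_3"
    ),
}


def pairwise_percent(first, second, sample):
    """Frequency (%) of the first allele of one pairwise event, as in column 14.
    A sample with no reads for either allele is reported as 0.0."""
    n1, n2 = ALLELE_COUNTS[first][sample], ALLELE_COUNTS[second][sample]
    return round(100 * n1 / (n1 + n2), 2) if n1 + n2 else 0.0


def allele_freqs(samples):
    """Three-allele frequencies, pooling raw reads over the given samples."""
    totals = {a: sum(c[s] for s in samples) for a, c in ALLELE_COUNTS.items()}
    depth = sum(totals.values())
    if depth == 0:
        return None  # no read covers the SNP in these samples
    return {a: n / depth for a, n in totals.items()}


# Same GTG reads, different partner allele -> different pairwise frequency.
print(pairwise_percent("GTG", "GGG", "C5"))  # 100.0 (Cycle_0)
print(pairwise_percent("GTG", "GCG", "C5"))  # 57.14 (Cycle_1)

# Per-sample three-allele frequencies.
for s in ("C1", "C5", "C21"):
    print(s, allele_freqs([s]))

# Hypothetical grouping based on the guess above: samples 1-20 vs 21-60.
group_a = [f"C{i}" for i in range(1, 21)]
group_b = [f"C{i}" for i in range(21, 61)]
for name, group in (("C1-C20", group_a), ("C21-C60", group_b)):
    print(name, {a: round(f, 3) for a, f in allele_freqs(group).items()})
```

The first two prints show why Cycle_0 and Cycle_1 report different frequencies for the same GTG reads: column 14 only ever compares two alleles. Pooling raw counts per group, as done here, weights samples by sequencing depth; averaging per-sample frequencies instead is an equally reasonable choice.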

Hopefully, these explanations will let you derive allele frequencies for all your samples and understand what is going on. The kissplice2counts function in kissDE should help you manipulate the counts and plot what you want.

Concerning your last question: yes, I think it is possible to use k2rt (kissplice2reftranscriptome) with CDS sequences obtained by other means. Running TransDecoder on these sequences should give you the correct format for k2rt.

Best,

Vincent
