Question: kissplice2reftranscriptome interpreting output
17 days ago by
rares_lucaciu


Regarding the main output of kissplice2reftranscriptome, for example I have: TRINITY_DN921_c0_g2_i1 bcc_10004|Cycle_0|Type_0a True 100.0 202 TAT CAT ... As I understand it, TAT is the reference codon and CAT is the alternative codon, so "TAT" should be positions 202-204 of the TRINITY_DN921_c0_g2_i1 sequence.

So I did this: samtools faidx 02_Trinity.fasta TRINITY_DN9185_c0_g2_i1:202-204

TRINITY_DN921_c0_g2_i1:202-204 TAA

In another case, for example: TRINITY_DN921_c0_g2_i1 bcc_10003|Cycle_0|Type_0a True 100.0 641 GGG GGC

TRINITY_DN9185_c0_g2_i1:639-641 GGC

Can somebody explain what the pattern is here, or how to find the exact position of the codon in the transcript? Thank you,

14 days ago by
vincent.lacroix

Hi Lucaciu,

All the formats we use in our pipeline (.bed, .psl) are 0-based, so the SNP position we output in the final table is also 0-based. If you want to use samtools faidx (which is 1-based), you should type:

samtools faidx 02_Trinity.fasta TRINITY_DN9185_c0_g2_i1:201-205

You will obtain 5 nucleotides, the central position being the SNP (202 in 0-based is 203 in 1-based).

For your specific example, since the SNP is at the first position of the codon, the codon should correspond to the last 3 nt of these 5 nt, unless your ORF is on the minus strand, in which case the codon should correspond to the reverse complement of the first 3 nt.
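The coordinate conversion and codon extraction described above can be sketched in Python. This is only an illustration, not part of kissplice2reftranscriptome: the function names `faidx_region`, `reverse_complement` and `codon_from_window` are hypothetical, and the sketch assumes you already know which codon position (1, 2 or 3) the SNP occupies and which strand the ORF is on.

```python
# Hypothetical helpers, not part of KisSplice or samtools.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def faidx_region(transcript, pos0, flank=2):
    """Build a 1-based, inclusive samtools faidx region centred on a 0-based SNP position."""
    center = pos0 + 1          # 0-based -> 1-based
    start = center - flank
    if start < 1:
        # Too close to the transcript start: the SNP would no longer be central.
        raise ValueError("SNP is closer than `flank` nt to the transcript start")
    return f"{transcript}:{start}-{center + flank}"


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def codon_from_window(window, snp_codon_pos, minus_strand=False):
    """Extract the codon from a 5-nt window whose central base is the SNP.

    snp_codon_pos is the position of the SNP within the codon (1, 2 or 3),
    counted in the direction of the ORF.
    """
    if len(window) != 5:
        raise ValueError("expected a 5-nt window centred on the SNP")
    if snp_codon_pos not in (1, 2, 3):
        raise ValueError("snp_codon_pos must be 1, 2 or 3")
    if minus_strand:
        # After reverse-complementing, the SNP is still the central base.
        window = reverse_complement(window)
    start = 3 - snp_codon_pos  # SNP sits at index 2 of the window
    return window[start:start + 3]


# Region string for the first example in the question (position 202, 0-based).
print(faidx_region("TRINITY_DN921_c0_g2_i1", 202))

# Made-up 5-nt windows, for illustration only.
print(codon_from_window("GGTAT", 1))                     # plus strand: last 3 nt
print(codon_from_window("ATAGG", 1, minus_strand=True))  # minus strand: revcomp of first 3 nt
```

Paste the 5 nt returned by samtools faidx as `window`. For a SNP at codon position 1 on the plus strand, the function returns the last 3 nt. On the minus strand it returns the reverse complement of the first 3 nt.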



