Question

Creation of Transcript expression file

0

6.7 years ago

mail2steff ▴ 70

Dear team

I am analysing AS events in Arabidopsis thaliana using SUPPA. I predicted the AS events using the generateEvents option. The next step, calculating PSI, requires a transcript expression file, but I do not know where I can get one for my sample. Can anyone help me with this issue? Thank you in advance.

The SUPPA documentation gives the following explanation:

The transcript expression file is a tab separated file where each line provides the estimated abundance of each transcript (ideally in TPM units). This file might contain multiple columns with the expression values in different samples. The expression file must have a header with the naming of the different expression fields, i.e., the sample name of each expression value.

An example of a transcript expression file for one single sample:

sample1
transcript1 <expression>
transcript2 <expression>
transcript3 <expression>

A transcript expression file with multiple samples:

sample1 sample2 sample3 sample4
transcript1 <expression>    <expression>    <expression>    <expression>
transcript2 <expression>    <expression>    <expression>    <expression>
transcript3 <expression>    <expression>    <expression>    <expression>

Transcript expression file SUPPA rna-seq • 2.2k views

ADD COMMENT • link updated 6.7 years ago by Satyajeet Khare ★ 1.6k • written 6.7 years ago by mail2steff ▴ 70

0

What does PSI stand for?

ADD REPLY • link 6.7 years ago by Matteo Schiavinato ★ 3.6k

0

PSI stands for "percent spliced in", the relative inclusion of a splicing event. In SUPPA, the magnitude of splicing change between conditions is reported as ΔPSI.

ADD REPLY • link 6.7 years ago by mail2steff ▴ 70

0

If you don't have RNAseq reads to map, you don't have an expression profile in TPM. You might get it from microarrays. How is your experiment set up?

ADD REPLY • link 6.7 years ago by Matteo Schiavinato ★ 3.6k

0

I have my files in BAM format. I am looking for AS events in different organs of A. thaliana, and I am using SUPPA for this analysis.

ADD REPLY • link 6.7 years ago by mail2steff ▴ 70

0

featureCounts, htseq, cufflinks just to name a few!

ADD REPLY • link 6.7 years ago by Matteo Schiavinato ★ 3.6k

0

The output of featureCounts is:

    # Program:featureCounts v1.5.3; Command:"./featureCounts" "-a" "Arabidopsis_thaliana.TAIR10.36.gtf" "-o" "counts.txt" "accepted_hits_Bur-0.bam" 
Geneid  Chr Start   End Strand  Length  accepted_hits_Bur-0.bam
AT1G01010   1;1;1;1;1;1 3631;3996;4486;4706;5174;5439   3913;4276;4605;5095;5326;5899   +;+;+;+;+;+ 1688    1061
AT1G01020   1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1 6788;6788;6788;6788;6788;6788;7157;7157;7157;7157;7157;7157;7384;7384;7384;7384;7564;7564;7564;7564;7564;7564;7762;7762;7762;7762;7762;7942;7942;7942;7942;7942;8236;8236;8236;8236;8236;8236;8417;8417;8417;8417;8571;8571;8571;8594;8594;8594 7069;7069;7069;7069;7069;7069;7232;7232;7232;7450;7232;7450;7450;7450;7450;7450;7649;7649;7649;7649;7649;7649;7835;7835;7835;7835;7835;7987;7987;7987;7987;7987;8325;8464;8325;8325;8464;8325;8464;8464;8464;8464;9130;8737;9130;9130;9130;8737 -;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;- 1571    235

But how can I get a transcript expression file from this?

ADD REPLY • link 6.7 years ago by mail2steff ▴ 70

0

I am not here to suggest commands for you to copy-paste into your terminal: there are manuals, literature, file formats and specifications that you have to read to understand what you need.

Quoting you:

An example of a transcript expression file for one single sample:

sample1
transcript1 <expression>
transcript2 <expression>
transcript3 <expression>

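The output you pasted here has most of what you need: featureCounts reports a Length column next to the counts, and with those two values you can convert the counts to TPM or FPKM (preferably TPM). Note that SUPPA needs one row per transcript, while the table above is summarized per gene.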

ADD REPLY • link 6.7 years ago by Matteo Schiavinato ★ 3.6k

0

Hi mail2steff,

I am doing a similar kind of analysis in rice but am getting an error after running the following command:

    python suppa.py generateEvents -i ../../../Splicing/Alternate_Acceptor_and_Donor/all.gff3 -o all.events -e SE SS MX RI FL -f ioe

The error is:

    Traceback (most recent call last):
      File "suppa.py", line 14, in <module>
        import significanceCalculator as diffSplice
      File "/Backup/Splicing/Suppa/SUPPA-master/significanceCalculator.py", line 15, in <module>
        from lib.diff_tools import multiple_conditions_analysis
      File "/Backup/Splicing/Suppa/SUPPA-master/lib/diff_tools.py", line 30
        print(prefix, " ", "%d / %d. " % (i+1, lst_len), "%.2f%% completed." % ((i/lst_len)*100), end="\r", flush=True)
        ^
    SyntaxError: invalid syntax

Can you please help me with the same?

ADD REPLY • link 5.3 years ago by rachitasrivastava7 • 0

score 1 · Answer 1 · 2017-08-31

1

6.7 years ago

Satyajeet Khare ★ 1.6k

Assuming that you have either alignment output (SAM/BAM) or .fastq files, you can generate a count matrix for transcript expression using prepDE.py (the helper script distributed with StringTie) or featureCounts. From there you can either normalize the raw counts or calculate TPMs. featureCounts in R (the Rsubread package) also returns the feature lengths alongside the counts, which you can use to calculate TPM. Just make sure that you are summarizing by transcript and not by gene.
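
As a rough illustration of the last step, here is a minimal Python sketch that converts per-transcript counts and lengths to TPM and writes them in the layout the SUPPA documentation describes (a header holding only the sample names, then one tab-separated row per transcript). The function names, transcript IDs, counts and lengths are all made up for the example, and it uses the annotated transcript length rather than an effective length, so treat it as a starting point rather than a reference implementation.

```python
import io


def counts_to_tpm(counts, lengths):
    """Convert raw counts to TPM using transcript lengths given in bases."""
    # Reads per kilobase: normalize each count by transcript length.
    rpk = {t: counts[t] / (lengths[t] / 1000.0) for t in counts}
    # Per-million scaling factor so that the TPM values sum to 1e6.
    scale = sum(rpk.values()) / 1e6
    if scale == 0:
        return {t: 0.0 for t in counts}
    return {t: value / scale for t, value in rpk.items()}


def write_suppa_expression(tpm_by_sample, handle):
    """Write a header of sample names, then 'transcript<TAB>values' rows."""
    samples = list(tpm_by_sample)
    transcripts = sorted(tpm_by_sample[samples[0]])
    # The header lists sample names only; there is no label for the ID column.
    handle.write("\t".join(samples) + "\n")
    for t in transcripts:
        values = [f"{tpm_by_sample[s][t]:.4f}" for s in samples]
        handle.write("\t".join([t] + values) + "\n")


# Hypothetical toy data: transcript-level counts and lengths for one sample.
counts = {"AT1G01010.1": 1000, "AT1G01020.1": 200, "AT1G01020.2": 50}
lengths = {"AT1G01010.1": 2000, "AT1G01020.1": 1000, "AT1G01020.2": 500}

tpm = counts_to_tpm(counts, lengths)
buf = io.StringIO()
write_suppa_expression({"sample1": tpm}, buf)
print(buf.getvalue())
```

For several samples, pass one TPM dictionary per sample name to `write_suppa_expression`; in real use you would replace the `io.StringIO` buffer with an opened output file.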