Question: FPKM matrix after stringtie using ballgown
Vasu wrote (2.2 years ago):

Hi,

After the StringTie step, I estimated transcript abundances and created table counts for Ballgown.

All the sample subdirectories are inside a folder named "ballgown". Each sample subdirectory has the following files:

e2t.ctab  
e_data.ctab  
i2t.ctab  
i_data.ctab  
TCGA-XX-XXX-XX_GRCh38.gtf  
t_data.ctab

To get the FPKM matrix, I ran the following:

library(ballgown)
my.data <- ballgown(dataDir = "ballgown", samplePattern = "TCGA", meas="all")

Sun Aug 25 14:12:19 2018
Sun Aug 25 14:12:19 2018: Reading linking tables
Sun Aug 25 14:12:29 2018: Reading intron data files
Sun Aug 25 15:00:38 2018: Merging intron data
Killed !

I tried a couple of times but got the same error. What could be wrong here? Did I do something wrong?

Thank you

Tags: rna-seq, fpkm, ballgown, stringtie

Hi, see Adrian's answer below. You don't specifically need to run ballgown if you are just interested in FPKM values.

By the way, with a large number of samples, ballgown may have run out of memory; a bare "Killed" message usually means the operating system terminated the process for that reason. I recently attempted a ballgown run on more than 80 samples on a desktop with 8 GB of RAM, where it died, but it ran as expected on a server.

written 2.2 years ago by Amitm

Yes, I understand that. I have almost 1200 samples. I didn't use the -A argument with stringtie, but I do have the ballgown outputs. Should I run stringtie again with -A?

written 2.2 years ago by Vasu

That would mean re-running StringTie for all 1200 of your samples. It would probably be easier to run ballgown on a system with more RAM available. If that is not possible, and time is not a concern, you could re-run StringTie with the -A parameter.

There could be other solutions depending on what your question is. Are you looking specifically for transcript isoform-level quantification? If not, you could generate gene-level count data from the BAMs using HTSeq-count or featureCounts. These would demand far fewer compute resources.

There is another way to get isoform-level quantification without using StringTie (or any other transcriptome assembly tool): you could use a tool like kallisto.

written 2.2 years ago by Amitm

Or can I use t_data.ctab, which also contains the transcripts and their FPKM values? I could cut those two columns from each sample's file and then join them all into a single file.

As mentioned here: FPKM matrix (check the last comment).

written 2.2 years ago by Vasu

Brilliant! That should work. I was somehow under the impression that they were binary files.
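For anyone following along, the cut-and-join idea could be sketched in Python roughly as below. This is only a sketch under some assumptions, not a tested pipeline: it assumes each t_data.ctab is tab-separated with a header row containing `t_name` and `FPKM` columns, and that every sample was quantified against the same annotation so the transcript names line up across samples. The function names, the output file name, and the simple substring match on "TCGA" are all made up for illustration. Each sample is held as one compact float array rather than a full table, which should keep memory use modest even with many samples.

```python
import csv
import math
from array import array
from pathlib import Path


def read_fpkm(ctab_path):
    """Yield (t_name, FPKM) pairs from one sample's t_data.ctab."""
    with open(ctab_path, newline="") as handle:
        # Assumption: tab-separated file with "t_name" and "FPKM" header columns.
        for row in csv.DictReader(handle, delimiter="\t"):
            yield row["t_name"], float(row["FPKM"])


def build_fpkm_matrix(data_dir, out_path, sample_pattern="TCGA"):
    """Write a transcripts-by-samples FPKM table as TSV.

    Returns (number of transcripts, number of samples).
    """
    # Keep subdirectories whose name contains the pattern (plain substring,
    # not a regular expression) and that hold a t_data.ctab file.
    sample_dirs = sorted(
        d for d in Path(data_dir).iterdir()
        if d.is_dir() and sample_pattern in d.name and (d / "t_data.ctab").is_file()
    )
    if not sample_dirs:
        raise FileNotFoundError(f"no sample directories with t_data.ctab in {data_dir}")

    # Row order comes from the first sample; other samples are matched by t_name.
    transcripts = [name for name, _ in read_fpkm(sample_dirs[0] / "t_data.ctab")]
    row_of = {name: i for i, name in enumerate(transcripts)}

    columns = []
    for sample_dir in sample_dirs:
        # One float array per sample; NaN marks a transcript absent from this sample.
        column = array("d", [math.nan]) * len(transcripts)
        for name, fpkm in read_fpkm(sample_dir / "t_data.ctab"):
            if name in row_of:
                column[row_of[name]] = fpkm
        columns.append(column)

    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        writer.writerow(["t_name"] + [d.name for d in sample_dirs])
        for i, name in enumerate(transcripts):
            writer.writerow([name] + [column[i] for column in columns])
    return len(transcripts), len(sample_dirs)


# Hypothetical usage, with the folder layout described in the question:
# build_fpkm_matrix("ballgown", "fpkm_matrix.tsv")
```

Transcripts that appear in a later sample but not in the first one are silently skipped here, so it is worth checking that all samples report the same number of rows before trusting the result.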

written 2.2 years ago by Amitm

Thank you, will follow that.

written 2.2 years ago by Vasu

Adrian Pelin wrote (2.2 years ago):

I'm not sure you need ballgown to get FPKMs. My understanding is that ballgown is needed for differential analysis, whereas stringtie itself can output TPM and FPKM.

When I run stringtie, I pass the argument "-A output_dir/gene_abundances.tsv", and the resulting gene_abundances.tsv contains FPKM and TPM.
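As a rough illustration of pulling those values out of the file, something like the following Python could work. It is a sketch that assumes the table is tab-separated with a header row containing columns named "Gene ID", "FPKM", and "TPM"; check the header of your own file first, since the column names here are an assumption and the function name is just for illustration.

```python
import csv


def read_gene_abundances(path):
    """Return {gene_id: (FPKM, TPM)} from a StringTie gene abundance table."""
    abundances = {}
    with open(path, newline="") as handle:
        # Assumption: header columns are named "Gene ID", "FPKM" and "TPM".
        for row in csv.DictReader(handle, delimiter="\t"):
            abundances[row["Gene ID"]] = (float(row["FPKM"]), float(row["TPM"]))
    return abundances


# Hypothetical usage:
# abundances = read_gene_abundances("output_dir/gene_abundances.tsv")
```

If a gene ID appears on more than one row, the last row wins in this sketch.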

Hope this helps, A

written 2.2 years ago by Adrian Pelin

But I didn't pass that argument to stringtie. I have the files mentioned above and am trying to use ballgown to get the FPKM matrix, as described in this post: FPKM matrix

written 2.2 years ago by Vasu