Question: RNA-seq: how to find novel transcripts under different treatments using StringTie and gffcompare?
jingjin2203 wrote (5 weeks ago):

Hi all,

I was wondering how to find novel transcripts under different treatment conditions using StringTie.

I have 4 different treatments and 3 replicates for each treatment in my RNA-seq data. I have merged the 12 GTF files generated by StringTie and compared the merged GTF to the reference GTF file using gffcompare. However, I am not sure what to do if I want to find novel transcripts in the different treatments. Should I combine the GTF files from the 3 replicates of each treatment and compare the combined GTF to the reference GTF file? How can I make a comparison across the 4 treatments? Does that turn the question into finding DEGs between treatments?

Hope my silly questions make sense.

Thank you for your attention and help!
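For the gffcompare step described in the question, candidate novel transcripts are usually picked out of the annotated GTF that gffcompare writes (here merged.annotated.gtf) by looking at the class_code attribute on each transcript line. The Python sketch below is only an illustration and not part of any answer in this thread. It assumes the gffcompare *.annotated.gtf layout and, as a choice you should revisit, treats class codes "j" (potential novel isoform sharing at least one splice junction with a reference transcript) and "u" (unknown, intergenic) as novel. The function and script names are hypothetical.

```python
import re
import sys
from typing import AbstractSet, Dict, Iterable, List

# Matches GTF attribute pairs such as: transcript_id "MSTRG.1.1";
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')

# Assumption: which gffcompare class codes count as "novel" is a choice.
# "j" = potential novel isoform sharing a splice junction with a reference,
# "u" = unknown/intergenic. Adjust to your own criteria.
DEFAULT_NOVEL_CODES = frozenset({"j", "u"})


def parse_attributes(field: str) -> Dict[str, str]:
    """Turn the 9th GTF column into a dict of attribute name -> value."""
    return dict(ATTR_RE.findall(field))


def novel_transcripts(gtf_lines: Iterable[str],
                      novel_codes: AbstractSet[str] = DEFAULT_NOVEL_CODES) -> List[str]:
    """Return transcript_ids whose gffcompare class_code is in novel_codes.

    Expects the *.annotated.gtf written by gffcompare, where class_code is
    attached to 'transcript' features (exon lines are skipped).
    """
    hits = []
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue
        attrs = parse_attributes(cols[8])
        if attrs.get("class_code") in novel_codes:
            hits.append(attrs["transcript_id"])
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python novel_from_gffcompare.py merged.annotated.gtf
    with open(sys.argv[1]) as fh:
        for tid in novel_transcripts(fh):
            print(tid)
```

This only tells you a transcript differs from the reference annotation; whether it is actually expressed in a given treatment still has to come from the per-sample quantification discussed in the answers.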

rmash wrote (5 weeks ago):

The StringTie pipeline generates a merged GTF, which you then use as a reference to get transcript count tables for all your samples.


Thank you for your kind help! I really appreciate it! Do I have to use DESeq2 to get transcript count tables?

Reply written 5 weeks ago by jingjin2203

You have to run StringTie again on the merged GTF file. You can see the suggested DE pipeline for StringTie here along with more instructions on how to do the analysis.

Reply written 5 weeks ago by kristoffer.vittingseerup

Thank you, Kristoffer! So basically the process would be the same for identifying novel transcripts and DEGs, is that correct?

Reply written 5 weeks ago by jingjin2203

Yes, except that when you run on the merged GTF, you run StringTie with the -e and -B options so that it only quantifies the isoforms in the GTF file (i.e. it does not look for new features this time around). Also note that instead of using the Python script they provide for extracting the quantification, you can use tximport or IsoformSwitchAnalyzeR's importIsoformExpression() within R.

Reply written 5 weeks ago by kristoffer.vittingseerup
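The -B option makes StringTie write Ballgown-style tables (including t_data.ctab) next to each sample's output GTF. If you want a quick look outside R, the Python sketch below collects per-transcript FPKM from those files into one sample-by-transcript table. It assumes one sub-directory per sample, each holding a t_data.ctab with t_name and FPKM columns (check the header of your own files), and the function names are hypothetical. It is not a replacement for tximport or importIsoformExpression(), which do more than this.

```python
import csv
from pathlib import Path
from typing import Dict, List, Tuple


def read_ctab_fpkm(ctab: Path) -> Dict[str, float]:
    """Map transcript name (t_name column) -> FPKM for one t_data.ctab file."""
    with open(ctab, newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        return {row["t_name"]: float(row["FPKM"]) for row in reader}


def collect_fpkm(parent_dir: Path) -> Dict[str, Dict[str, float]]:
    """Return {sample: {transcript: FPKM}} for each sub-directory with a t_data.ctab.

    Assumes the layout <parent_dir>/<sample>/t_data.ctab, i.e. each sample was
    re-quantified with `stringtie -e -B -o <parent_dir>/<sample>/<sample>.gtf ...`.
    """
    table: Dict[str, Dict[str, float]] = {}
    for ctab in sorted(Path(parent_dir).glob("*/t_data.ctab")):
        table[ctab.parent.name] = read_ctab_fpkm(ctab)
    return table


def to_matrix(table: Dict[str, Dict[str, float]]) -> Tuple[List[str], List[str], List[List[float]]]:
    """Turn the nested dict into (samples, transcripts, rows); missing values become 0.0."""
    samples = sorted(table)
    transcripts = sorted({t for per_sample in table.values() for t in per_sample})
    rows = [[table[s].get(t, 0.0) for s in samples] for t in transcripts]
    return samples, transcripts, rows
```

Because every sample was quantified against the same merged GTF, the transcript sets should line up across samples; the 0.0 fill in to_matrix is only a safety net.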

Thanks a lot for your help! I really appreciate it! Just to make sure I understand everything correctly: since my goal is to detect novel transcripts, -e (only estimate the abundance of given reference transcripts) should not be used, is that correct? Sorry about the silly questions, and thanks again for your time and help!

Reply written 5 weeks ago by jingjin2203

You are right that the first time you run StringTie you do not want to use the -e option. Without -e, StringTie will predict novel transcripts. Afterwards you use StringTie --merge to combine the individual StringTie predictions into one set representing the known and novel transcripts from all samples. Then, to ensure you have quantified the same transcripts in all samples (otherwise they are not comparable), you run StringTie again on each sample using the GTF from the --merge run and with the -e option. The -e option ensures you only quantify the transcripts in the GTF file, but since that GTF file contains both the known and the novel transcripts, this is exactly what you want. You can also read more about it here, under option "B".

Reply written 5 weeks ago by kristoffer.vittingseerup
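The three StringTie stages described above (assemble without -e, --merge, re-quantify with -e -B) can be sketched as command lines. The Python helper below only builds the argument lists and prints them in dry-run mode. The BAM names, ref.gtf, mergelist.txt and the ballgown/ output layout are placeholders, and options you may also need (threads, strandedness, and so on) are left out.

```python
import subprocess
from pathlib import Path
from typing import Dict, List


def stringtie_commands(bams: Dict[str, str], ref_gtf: str,
                       merged_gtf: str = "merged.gtf",
                       mergelist: str = "mergelist.txt",
                       out_dir: str = "ballgown") -> List[List[str]]:
    """Build the three StringTie stages as argument lists (nothing is executed).

    bams maps a sample name to its sorted BAM file (placeholder names).
    """
    cmds = []
    # 1) Assemble each sample WITHOUT -e so novel transcripts can be predicted.
    for sample, bam in bams.items():
        cmds.append(["stringtie", bam, "-G", ref_gtf, "-o", f"{sample}.gtf"])
    # 2) Merge all per-sample assemblies (known + novel) into one GTF.
    #    mergelist.txt lists the per-sample GTFs, one path per line.
    cmds.append(["stringtie", "--merge", "-G", ref_gtf, "-o", merged_gtf, mergelist])
    # 3) Re-quantify every sample against the SAME merged GTF.
    #    -e: only estimate abundance of transcripts given with -G; -B: write Ballgown tables.
    for sample, bam in bams.items():
        out = str(Path(out_dir) / sample / f"{sample}.gtf")
        cmds.append(["stringtie", "-e", "-B", "-G", merged_gtf, "-o", out, bam])
    return cmds


def run_pipeline(bams: Dict[str, str], ref_gtf: str, dry_run: bool = True) -> None:
    """Print the commands; if dry_run is False, write the merge list and run them."""
    if not dry_run:
        Path("mergelist.txt").write_text("".join(f"{s}.gtf\n" for s in bams))
    for cmd in stringtie_commands(bams, ref_gtf):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing command
```

For example, `run_pipeline({"ctrl_1": "ctrl_1.bam", "trtA_1": "trtA_1.bam"}, "ref.gtf")` prints five commands for two samples without running anything. Keeping the build step separate from execution makes it easy to inspect the exact flags before committing compute time.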

This is really helpful!! Thank you so much for your help!!

Reply written 5 weeks ago by jingjin2203

Hi Kristoffer, sorry for bugging you again. I have successfully generated ctab files for each of my samples following your advice. I tried to analyze the data using IsoformSwitchAnalyzeR, but encountered an issue. I was wondering if you could help me fix it. I also have a question about which GTF file should be used as isoformExonAnnoation: the original one I used for the very first StringTie run, or the merged GTF file from StringTie --merge? Thanks!!

```r
aSwitchList <- importRdata(
    isoformCountMatrix   = stringTieQuant$counts,
    isoformRepExpression = stringTieQuant$abundance,
    designMatrix         = myDesign,
    isoformExonAnnoation = "merged.annotated.gtf",
    isoformNtFasta       = "scaffolds.fasta",
    showProgress         = FALSE
)
```

```
Step 1 of 6: Checking data...
Step 2 of 6: Obtaining annotation...
    importing GTF (this may take a while)
Step 3 of 6: Calculating gene expression and isoform fraction...
    9520 ( 19.63%) isoforms were removed since they were not expressed in any samples.
Error in sample.int(length(x), size, replace, prob) : invalid first argument
In addition: Warning message:
In importRdata(isoformCountMatrix = stringTieQuant$counts, isoformRepExpression = stringTieQuant$abundance,  :
  No CDS annotation was found in the GTF files meaning ORFs could not be annotated.
  (But ORFs can still be predicted with the analyzeORF() function)
```

Reply written 5 weeks ago by jingjin2203

I read ?importRdata in R; it says isoformExonAnnoation can be "A string indicating the full path to the (gziped or unpacked) GTF file which have been quantified". What is a quantified GTF file?

Reply written 5 weeks ago by jingjin2203
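One rough way to see which GTF "has been quantified" is to check that every transcript ID in the quantification also appears as a transcript_id in the candidate GTF. The Python sketch below is a hypothetical sanity check, not how importRdata() validates its input; it assumes the quantified IDs are taken from the t_name column of t_data.ctab.

```python
import re
from typing import Iterable, Set

# Matches the transcript_id attribute in a GTF line.
TID_RE = re.compile(r'transcript_id "([^"]+)"')


def gtf_transcript_ids(gtf_lines: Iterable[str]) -> Set[str]:
    """Collect every transcript_id that appears in a GTF."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        m = TID_RE.search(line)
        if m:
            ids.add(m.group(1))
    return ids


def missing_from_gtf(quantified_ids: Iterable[str], gtf_lines: Iterable[str]) -> Set[str]:
    """Return quantified transcript IDs that the GTF does not contain."""
    return set(quantified_ids) - gtf_transcript_ids(gtf_lines)
```

An empty result is consistent with the GTF being the one the samples were quantified against (the GTF given to stringtie -e); many missing IDs suggest a mismatch between annotation and quantification.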

Hi Jingjin. Since it's not good practice to ask new questions as a comment (it makes it impossible for other people to find the answers), could you ask this as a new question? Alternatively, you can use the IsoformSwitchAnalyzeR Google group. Before you post, however, you need to make sure you have IsoformSwitchAnalyzeR >1.5.11.

Reply written 4 weeks ago by kristoffer.vittingseerup

Thanks, Kristoffer! Will do. And yes, I have IsoformSwitchAnalyzeR 1.6.0 installed.

Reply written 4 weeks ago by jingjin2203