Question: Different number of introns, exons, transcripts across RNA-Seq samples
5 weeks ago, lakshmi9c10 wrote:

Hello everyone,

I am working on an RNA-Seq dataset that consists of two conditions, Control and Sample. My goal is to find differentially expressed genes between the two conditions, for which I am using the ballgown package in R. My concern is that when I try to load the controls and samples into a single ballgown object for DE analysis, I get this error:

Error: intron ids were either not the same or not in the same order across samples. double check i_data.ctab for each sample.

The object cannot be loaded because the IDs are different. The same is true for the exon and transcript IDs. Looking into the .ctab files, the exons, introns, and transcripts differ between Control and Sample, but they are the same across the 3 controls and the same across the 5 samples. The counts are as follows. Controls: exons 442247, introns 366654, transcripts 181466. Samples: exons 686514, introns 416565, transcripts 247538. So the exons, introns, and transcripts differ only between the two conditions, and I am not able to progress from here. Can anyone please help me understand why this is occurring? Why would the exon, intron, and transcript IDs be different for control and sample, since both control and sample RNA were collected from human patients? Please also add a note on how I can solve this issue.
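One way to confirm which samples disagree is to compare the ID column of each sample's .ctab file directly, outside of ballgown. The following is a minimal sketch in Python, assuming the files are tab-separated with a header row and the feature ID in the first column. The helper names `read_ids` and `group_by_ids` and the toy file contents are made up for illustration and are not part of ballgown.

```python
import csv
import io
from collections import defaultdict


def read_ids(handle, id_column=0):
    """Return the ordered tuple of IDs from one tab-separated .ctab-style file.

    Assumes a single header row and the feature ID in `id_column`.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header row
    return tuple(row[id_column] for row in reader if row)


def group_by_ids(id_lists):
    """Group sample names whose ID tuples are identical (same IDs, same order)."""
    groups = defaultdict(list)
    for sample, ids in id_lists.items():
        groups[ids].append(sample)
    return sorted(groups.values())


# Toy stand-ins for four i_data.ctab files (hypothetical contents).
toy_files = {
    "control_1": "i_id\tchr\tstart\tend\n1\tchr1\t100\t200\n2\tchr1\t300\t400\n",
    "control_2": "i_id\tchr\tstart\tend\n1\tchr1\t100\t200\n2\tchr1\t300\t400\n",
    "sample_1": "i_id\tchr\tstart\tend\n1\tchr1\t100\t200\n2\tchr1\t250\t400\n3\tchr1\t500\t600\n",
    "sample_2": "i_id\tchr\tstart\tend\n1\tchr1\t100\t200\n2\tchr1\t250\t400\n3\tchr1\t500\t600\n",
}

id_lists = {name: read_ids(io.StringIO(text)) for name, text in toy_files.items()}
groups = group_by_ids(id_lists)
print(groups)
```

On real data, each `io.StringIO(text)` would be replaced by `open(path)` for that sample's file. If more than one group comes back, the samples were not quantified against the same set of features, which is what the ballgown error reports.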

Any help is much appreciated.

sequencing rna-seq next-gen R • 166 views
modified 5 weeks ago • written 5 weeks ago by lakshmi9c10

Let me ask the same as here before answering: C: "fold change" question when using ballgown

In short, ballgown is for differential transcript expression. For standard gene-level DE analysis, better use dedicated, better-documented, and established tools with awesome manuals and example code, such as DESeq2, edgeR, or limma.

modified 5 weeks ago • written 5 weeks ago by ATpoint40k

Thanks ATpoint! Correct me if I am wrong, but doesn't isoform-level differential expression analysis apply at the gene level too? That is the reason I'm using ballgown in my pipeline for differential gene expression analysis. Will I not get accurate results with ballgown if my aim is to look for differentially expressed genes between two conditions?

written 5 weeks ago by lakshmi9c10

Ballgown still applies the inferior FPKM normalization afaik. I would use any of the above mentioned tools.

written 5 weeks ago by ATpoint40k

I know ballgown uses FPKM normalization, and DESeq2 and edgeR use raw read counts. But what if I apply other normalization methods to these values after creating the ballgown object? I am trying to use edgeR's TMM normalization method to get TMM-normalized FPKM values. Can I please get your insight on this?

modified 5 weeks ago • written 5 weeks ago by lakshmi9c10

If you're already using edgeR, why switch to Ballgown? Stick with edgeR or follow any of the tutorials on the Bioconductor website. I also highly recommend perusing the Hitchhiker's Guide to RNA-seq analysis.

written 5 weeks ago by Friederike6.4k

Hi Friederike, I am only using the edgeR package for TMM normalisation, to normalise the FPKM values present in the ballgown object. I read that once the Ballgown object is created, I can use different packages for the downstream analysis steps. Isn't that correct? These FPKM values have a huge range of variation, so I need to normalise them. Am I following the correct steps?

written 4 weeks ago by lakshmi9c10

You cannot "TMM-normalize FPKM values". That makes zero sense since they are already scaled for library size and gene length. Please either stick 100% to the ballgown manual or 100% to the edgeR manual; no custom approaches unless you are certain of and confident in what you are doing. I strongly recommend simply using edgeR on the raw gene-level counts, following the manual, and you can be almost certain that the results will be solid and in line with current state-of-the-art recommendations. If you prefer ballgown I cannot comment on the results, but afaik it is for transcript-level analysis, so be sure to check the literature on whether people have used it for gene-level analysis.
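To illustrate the scaling point: FPKM divides the raw fragment count by the feature length (in kilobases) and by the sequencing depth (in millions of mapped fragments). Below is a minimal sketch with made-up numbers. The function name `fpkm` and all values are illustrative and are not taken from ballgown or edgeR.

```python
def fpkm(count, length_bp, total_mapped):
    """Fragments per kilobase of feature per million mapped fragments.

    count        -- raw fragments assigned to the feature
    length_bp    -- feature length in base pairs
    total_mapped -- total mapped fragments in the library
    """
    return count * 1e9 / (length_bp * total_mapped)


# Same 2 kb gene in two hypothetical libraries; library B is sequenced twice
# as deep, so it yields twice the raw count for the same expression level.
fpkm_a = fpkm(count=100, length_bp=2000, total_mapped=10_000_000)
fpkm_b = fpkm(count=200, length_bp=2000, total_mapped=20_000_000)

print(fpkm_a, fpkm_b)  # the depth difference has already been divided out
```

The raw counts differ by a factor of two, but the FPKM values are identical because library size is already divided out. TMM in edgeR is meant to be estimated from raw counts, so it belongs before any such scaling, not after it.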

modified 4 weeks ago • written 4 weeks ago by ATpoint40k