Question: Ambiguous results of RNA-Seq data using stringtie with edgeR
gravatar for ilovesuperheroes1993
4 weeks ago by
ilovesuperheroes19930 wrote:

Hi, I have human RNA-Seq (Total RNA) data of 24 paired samples. My samples are labelled as S1,S2,S3 and so on. S1-S2 form a pair and S3-S4 for another (odd represents Normal samples and even diseased samples). I have used STAR to align the reads and have obtained an overall alignment of > 90% for most cases. I have then used Stringtie for transcript assembly using the following command:

stringtie -p 3 -e -B -G reference_annotation_file -l S1 --rf -o ..S1.transcripts.gtf S1Aligned.sortedByCoord.out.quality.filtered.reads.bam

This command has been used individually for all the samples.

Then I have used the online script to convert the fpkm values of reads to raw counts. (

Thereafter I have used edgeR to perform paired differential analysis for the data.

I am facing several ambiguities with the data:

  1. The annotation file was downloaded from Gencode (hg19 version). I had used the complete annotation including contigs, scaffolds, patch. Now, I have seen that there are several genes which have names but different gene IDs, let's say gene1_ID1 and gene1_ID2. In case of some of these genes, I find that both gene1_ID1 and gene1_ID2 are differential, but their logFC and FDR values are differing significantly.

  2. My set of 24 samples have been classified into 2 groups, based on an additional factor being present and absent in the patients. Differential analysis was performed separately in the two groups and the results compared. In these I am finding that for some genes, gene_ID1 is differential in the total set (24 samples), while the same gene with different ID (gene_ID2) is found to be uniquely differential in one of the two groups (of 12 samples). How can this be?

  3. In case of a gene with 2 unique IDs, I find that one is upregulated and the other is downregulated, though both are significant. I shall give the example here and provide the links for the gene with 2 IDs.



  4. One important gene that is known to be upregulated in my disease, as supported by numerous publications, has been upregulated in my sample set. However the logFC of this gene is only 0.5.

These results have left me confused and wondering whether I have made some mistakes in my workflow. I am relatively new to RNA-Seq data. In case I have not been able to clarify my doubts properly, please feel free to probe me further.

edger rna-seq stringtie • 97 views
ADD COMMENTlink modified 4 weeks ago by h.mon30k • written 4 weeks ago by ilovesuperheroes19930

Could you clarify if you did any transcript assembly with StringTie? Also how did you take into account the paired nature of your samples in your DE analysis? Could you post a genome browser screenshot of the HOX4 gene problem?

ADD REPLYlink written 4 weeks ago by kristoffer.vittingseerup3.4k

Yes I did transcript assembly with StringTie using the command mentioned above. This generated the gtf files for each sample which were used as inputs for converting the fpkms to raw counts using Was I required to do anything else in this step?

The paired nature of the samples was known prior to designing the experiment. So there is no issue regarding that.

As for the Hoxc4 genes, right now I do not have access to my bam files as I am stuck at home due to lockdown. However I can state that the two HOXC4 genes (links given above) had logFC (-1.23 and 2.49) with FDRs (0.0038 and 0.048) respectively.

ADD REPLYlink written 4 weeks ago by ilovesuperheroes19930
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 738 users visited in the last hour