Question: Why are my feature counts different between featureCounts and Cuffnorm?
obizx002 wrote (11 months ago):

So I'm very new to this whole deal, and very new to computer science stuff in general. I am trying to do an RNA-seq analysis and seem to be running into an unusual problem (I think). I am running Slurm jobs in the terminal and my end results are weird. The job is a whole pipeline using Bowtie2, then TopHat, then Cufflinks, Cuffquant, and then featureCounts and Cuffnorm. The idea is to take the raw counts from featureCounts and use them in edgeR. I run Cuffnorm at the end to get FPKM values, just to get an idea before starting edgeR. I noticed that featureCounts is outputting counts for about 25,000 genes or features, yet Cuffnorm is outputting 57,000 genes or features. The whole pipeline is using the same .gff3 and .fa files from Ensembl (mouse). Does anyone know why this is happening?

Tags: rna-seq, alignment, genome, gene
igor wrote (11 months ago):

In general, some of the tools you are using, such as TopHat and Cufflinks, have been replaced by newer alternatives. I would suggest you look into some of the previous discussions here.

Additionally:

"The job is a whole pipeline using Bowtie2, then TopHat, then Cufflinks, Cuffquant, and then featureCounts and Cuffnorm."

Some of those steps are actually redundant. TopHat already runs Bowtie2 internally, and edgeR only needs raw counts, so you only need TopHat and featureCounts to get the necessary results.
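As a rough illustration of that reduced pipeline, here is a minimal sketch that only builds the two command lines and does not run them. The function name `build_commands`, every path, and the thread count are hypothetical placeholders. The flags used (`-p`, `-G`, `-o` for TopHat; `-T`, `-a`, `-o` for featureCounts) should be checked against the versions you have installed, and options such as strandedness or paired-end handling are left out on purpose.

```python
from pathlib import Path


def build_commands(bowtie2_index, reads, annotation, out_dir, threads=4):
    """Return (tophat_cmd, featurecounts_cmd) as argument lists.

    Nothing is executed here; all paths are placeholders supplied by the caller.
    """
    out_dir = Path(out_dir)
    # TopHat writes its alignments to accepted_hits.bam in the output directory.
    bam = out_dir / "accepted_hits.bam"

    tophat_cmd = [
        "tophat2",
        "-p", str(threads),     # number of threads
        "-G", str(annotation),  # known annotation file
        "-o", str(out_dir),     # output directory
        str(bowtie2_index),     # Bowtie2 index prefix (TopHat calls Bowtie2 itself)
        *[str(r) for r in reads],
    ]
    featurecounts_cmd = [
        "featureCounts",
        "-T", str(threads),                 # number of threads
        "-a", str(annotation),              # same annotation file as above
        "-o", str(out_dir / "counts.txt"),  # raw count table to load into edgeR
        str(bam),
    ]
    return tophat_cmd, featurecounts_cmd


# Hypothetical file names, for illustration only.
tophat_cmd, featurecounts_cmd = build_commands(
    "mouse_index", ["sample_R1.fastq", "sample_R2.fastq"],
    "mouse.gff3", "tophat_out",
)
print(" ".join(tophat_cmd))
print(" ".join(featurecounts_cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` inside a Slurm job script once the paths point at real files.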

To answer your actual question:

"I noticed that featureCounts is outputting counts for about 25,000 genes or features, yet Cuffnorm is outputting 57,000 genes or features. The whole pipeline is using the same .gff3 and .fa files from Ensembl (mouse). Does anyone know why this is happening?"

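What is probably happening is that Cufflinks adds unannotated (novel) transcripts on top of the ones in your GFF file, so Cuffnorm reports more features than featureCounts, which only counts against the annotation.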
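One way to check this is to compare the identifiers in the two output tables and see which ones appear only on the Cuffnorm side. The sketch below assumes the featureCounts table has a `Geneid` column preceded by a `#` comment line, and that the Cuffnorm table (for example `genes.fpkm_table`) has a `tracking_id` column. Verify both against your own files. The function names and the toy rows are made up for illustration.

```python
import csv
import io


def read_ids(handle, id_column):
    """Collect the values of `id_column` from a tab-separated table.

    Lines starting with '#' are skipped before the header is parsed.
    """
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    return {row[id_column] for row in reader}


def compare_ids(featurecounts_handle, cuffnorm_handle):
    """Split the identifiers into shared and tool-specific sets."""
    fc_ids = read_ids(featurecounts_handle, "Geneid")
    cn_ids = read_ids(cuffnorm_handle, "tracking_id")
    return {
        "shared": fc_ids & cn_ids,
        "only_featurecounts": fc_ids - cn_ids,
        "only_cuffnorm": cn_ids - fc_ids,
    }


# Toy tables with invented identifiers and values, standing in for real output.
fc_table = io.StringIO(
    "# Program:featureCounts\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tsample.bam\n"
    "geneA\t1\t1\t100\t+\t100\t12\n"
    "geneB\t1\t200\t300\t+\t101\t0\n"
)
cn_table = io.StringIO(
    "tracking_id\tq1_0\n"
    "geneA\t3.2\n"
    "geneB\t0\n"
    "novel_1\t1.5\n"
)
result = compare_ids(fc_table, cn_table)
for name, ids in result.items():
    print(name, len(ids))
```

With real data, replace the `io.StringIO` objects with `open(path)` handles. If most of the Cuffnorm-only identifiers do not look like annotation IDs, that would be consistent with novel transcripts being added.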


Predicting novel transcripts is the whole point of using Cufflinks.

(reply by kristoffer.vittingseerup, 11 months ago)

Yes, but many people run it even if they are not interested in them. For example, the original poster expects the output to match the original GFF file (known genes only).
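To see what "known genes only" amounts to for a given annotation, you can count the records in the GFF3 file by feature type (column 3) and compare those numbers with the row counts of each tool's output. This is a minimal sketch. The function name and the toy records are made up, and which feature types count as genes depends on the annotation source.

```python
from collections import Counter
import io


def count_feature_types(handle):
    """Count GFF3 records by the value in column 3 (the feature type)."""
    counts = Counter()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


# Toy GFF3 content with invented records, standing in for the real file.
toy_gff3 = io.StringIO(
    "##gff-version 3\n"
    "1\tsrc\tgene\t1\t500\t.\t+\t.\tID=gene:g1\n"
    "1\tsrc\tmRNA\t1\t500\t.\t+\t.\tID=transcript:t1;Parent=gene:g1\n"
    "1\tsrc\texon\t1\t200\t.\t+\t.\tParent=transcript:t1\n"
    "1\tsrc\texon\t300\t500\t.\t+\t.\tParent=transcript:t1\n"
    "1\tsrc\tgene\t900\t1200\t.\t-\t.\tID=gene:g2\n"
)
type_counts = count_feature_types(toy_gff3)
for feature_type, n in type_counts.most_common():
    print(feature_type, n)
```

For a real annotation, pass `open("annotation.gff3")` (a placeholder path) instead of the toy handle.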

(reply by igor, 11 months ago)

If that is the case, I strongly recommend using Salmon or Kallisto. Kallisto can be downloaded from here and the manual for running Kallisto can be found here. Salmon can be downloaded from here and a manual for running Salmon can be found here. I actually wrote an entire section about the considerations for using different quantification tools recently.

(reply by kristoffer.vittingseerup, 11 months ago)