Question: How to get read counts on transcript level using featurecounts?
0
gravatar for newbie
8 months ago by
newbie90
newbie90 wrote:

I have used featureCounts to get the read counts one a gene level with the following command:

featureCounts -a gencode.gtf -F GTF -p -s 2 -T 8 -o sample.txt sample.sorted.bam

I'm interested in extracting the counts on a transcript level. My data is paired end and reverse stranded. I'm a bit confused with command I have to use to get the counts on transcript level featureCounts manual

Can anyone please tell me how to get the counts on transcript level? thanq

ADD COMMENTlink modified 8 months ago by Gordon Smyth2.0k • written 8 months ago by newbie90
10
gravatar for Gordon Smyth
8 months ago by
Gordon Smyth2.0k
Australia
Gordon Smyth2.0k wrote:

I am one of the featureCounts authors.

Do you want do conduct analyses for genes, or do you need individual results for all the isoform variations of each gene? The "isoforms" tend to be called "transcripts" in the RNA-seq literature. The featureCounts help page you link to is designed to generate gene-level counts. We have never recommended the use of featureCounts to produce transcript-level counts.

A gene level analysis is easiest to interpret and is appropriate for most analyses where you want to interpret the results in terms of pathways, GO terms, KEGG and so on. featureCounts will give you gene level counts very quickly and easily. Alternatively, featureCounts can give you counts for each exon in each gene.

If you do really want to quantify all the different transcripts (isoforms) for each gene, then this cannot be done by featureCounts or by any other read counting software. While featureCounts can be run with transcript annotation, the results won't be particularly useful. Since the transcripts are heavily overlapping, with many transcripts sharing the same exons, featureCounts cannot tell which transcript to assign each read to. It therefore has to either discard all the multi-assigned reads or alternatively assign them multiple times to all the overlapping transcripts -- and either choice is bad.

If you really do want the (considerable) extra complexity of transcript-level analysis, then I would run Salmon and read the results into edgeR using catchSalmon. Salmon can assign reads to transcripts on a probabilistic basis. Or alternatively, do an exon-level analysis. That allows you see what is going on, although the results are not assembled into transcripts. Exon-level may be a safer approach if your data has expressed transcripts that have not been previously annotated.

ADD COMMENTlink modified 8 months ago • written 8 months ago by Gordon Smyth2.0k
1

The estimation of transcript-level counts is necessary to obtain accurate gene-level abundance estimates, so it should be done even if only interested in genes and not transcripts per se. See, e.g.

ADD REPLYlink written 8 months ago by Lior Pachter540
3
gravatar for Asaf
8 months ago by
Asaf8.4k
Israel
Asaf8.4k wrote:

You'll probably want to use RSEM or stringTie

ADD COMMENTlink written 8 months ago by Asaf8.4k

you mean I can't get transcript level read counts with featureCounts?

ADD REPLYlink written 8 months ago by newbie90
1

If I read their paper right then no, you can't. You'll have to use methods that makes the best guess for a transcript abundance. Take a look at RSEM paper

ADD REPLYlink written 8 months ago by Asaf8.4k

small question. I have the sorted.bam files from Hisat2 alignment. Can I use those bams for RSEM to calculate transcript expression?

ADD REPLYlink modified 8 months ago • written 8 months ago by newbie90
1

You will have needed to align to the transcriptome, which one can certainly do with hisat2 but is rather non-standard. You will also have needed to heavily tweak the settings to prevent splicing and aligning with an InDel, since RSEM can't handle these. Also, you'd need to have Hisat2 produce more alignments than you'll get by default. In other words, while it might be theoretically possible to use RSEM have Hisat2, it'd require a fair bit of work and it'd be easier to just start over from the fastq files.

If you'd like to use RSEM, follow the instructions there for using STAR upstream.

ADD REPLYlink written 8 months ago by Devon Ryan97k

Yes you can. See http://deweylab.biostat.wisc.edu/rsem/rsem-calculate-expression.html

ADD REPLYlink written 8 months ago by Lior Pachter540
2
gravatar for ATpoint
8 months ago by
ATpoint40k
Germany
ATpoint40k wrote:

I vote for quantification of reads against a transcriptome with salmon, followed by analysis e.g. with swish. Salmon is part of the extended "ecosystem" that is maintained by the labs involving Rob Patro (salmon developer) and Mike Love (tximport, DESeq2, FishPond/swish developer/PI), so I guess it is most streamlined to use this one. In any case tools like salmon that use pseudoalignment generally outperform traditional alignment when it comres to RNA-seq quantification accuracy, here is another very recent atricle benchmarking this. https://academic.oup.com/gigascience/article/8/12/giz145/5663671

Here is the salmon paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5600148/

ADD COMMENTlink modified 8 months ago • written 8 months ago by ATpoint40k

The paper linked to concludes that:

"Because both Kallisto and Salmon performed highly concordantly with the ground truth, they were also highly concordant with each other, as previously reported [34,35]. They cluster together in the method similarity matrix before clustering with the ground truth (Fig. 4). However, Kallisto was faster than Salmon, used less memory (Fig. 7), and performed better at sample- and gene-level comparison when examining Spearman’s correlation, Euclidean distance, mean squared error, and adjusted R2 values as compared with the ground truth, especially for the 2 reverse-stranded datasets (Additional File 7)."

ADD REPLYlink written 8 months ago by Lior Pachter540

Hey Lior, what is the 'ground truth'?

ADD REPLYlink written 8 months ago by Kevin Blighe66k

Your question is answered in detail in the paper (by the way, to be clear, it is not my paper and I had nothing to do with it).

ADD REPLYlink modified 8 months ago • written 8 months ago by Lior Pachter540

Thanks - I will read it on my next travel itinerary

ADD REPLYlink written 8 months ago by Kevin Blighe66k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1019 users visited in the last hour