Why do most papers with bioinformatic results present genes as though there is only 1 isoform in the results, unless the paper specifically addresses alternate isoforms? (RNA-seq question)
6.3 years ago
baunruh ▴ 10

Hello!

I was recently working on a ribosome profiling project, which is essentially RNA-seq performed only on the RNA bound to ribosomes at a given time. The first thing I did was align the reads to mitochondrial DNA and ribosomal DNA to filter those 2 sets out. I then aligned the reads that didn't map to either of those to mm9, and from there I used HOMER to quantify the repeats and get my data. For both alignment and quantification I used the mm9 UCSC gene annotation GTF file. However, this annotation includes a number of alternative splicing events, so for Gene A I actually have 3 sets of data, all called Gene A. These 3 sets are very similar and rarely differ in total count. The PROBLEM is when I look at bioinformatics papers that report this kind of data: I see Gene A listed only 1 time, with the graph from their results and no mention of isoforms or alternatively spliced exons. What am I missing? Do they average these 3 entries, are they using some sort of GTF that doesn't include them, or am I losing my mind?

I am also curious about "scoring" and whether it is related to my problem. I received data for about 55,000 genes (most of which had multiple splicing events) at varying levels, but I have heard that the data for some of these may not be reliable if they are not scored correctly. How do I fix this, or am I just wrong here?

RNA-Seq Assembly alignment next-gen software error • 1.6k views

Please consider using a more concise title.

Most of the time read counting is done at the exon level but the counts are summarized at gene_id level. That is why you see only one count per gene.
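
To make "counted at the exon level, summarized at gene_id level" concrete, here is a minimal sketch of the idea. It is not what any particular tool does internally: it assumes a single chromosome, unspliced reads, inclusive coordinates, and made-up gene names and positions. The key point is that exons from all isoforms of a gene share one gene_id, so a read that hits any of them is counted once for that gene rather than once per isoform.

```python
from collections import defaultdict

def count_reads_per_gene(exons, reads):
    """Count each read once per gene, using exons from all isoforms.

    exons: list of (gene_id, start, end), one entry per exon per transcript,
           so shared exons may appear several times (as in a GTF).
    reads: list of (start, end) alignments on the same chromosome.
    Reads overlapping exons of more than one gene are skipped as ambiguous
    (a simplifying assumption for this sketch).
    """
    counts = defaultdict(int)
    for gene_id, _, _ in exons:
        counts[gene_id] += 0  # make sure every gene appears, even with 0 reads

    for r_start, r_end in reads:
        # Collect the set of genes whose exons overlap this read.
        hit_genes = {
            gene_id
            for gene_id, e_start, e_end in exons
            if r_start <= e_end and e_start <= r_end
        }
        if len(hit_genes) == 1:
            counts[hit_genes.pop()] += 1
    return dict(counts)

# Hypothetical data: "GeneA" has overlapping exons from several isoforms.
exons = [
    ("GeneA", 100, 200),    # exon 1, isoform 1
    ("GeneA", 100, 200),    # exon 1, isoform 2 (same exon listed again)
    ("GeneA", 300, 400),    # exon 2, isoform 1
    ("GeneA", 300, 450),    # exon 2, isoform 2 (longer variant)
    ("GeneB", 1000, 1100),
]
reads = [(120, 150), (320, 350), (420, 440), (1050, 1080), (500, 520)]

gene_counts = count_reads_per_gene(exons, reads)
print(gene_counts)
```

Note that this is different from adding up three separate per-isoform counts, which would count a read in a shared exon three times.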


How can I summarize the counts at the gene ID level?


With featureCounts there is an option (-g gene_id) to summarize counts at gene level.
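
As a rough sketch of how that call could look when driven from a script, the snippet below builds a featureCounts command line with -t exon (count reads over exon features) and -g gene_id (group those exons by the gene_id attribute). The file names and the helper function names are placeholders, and it assumes featureCounts is installed and that the gene_id attribute in your GTF really identifies genes rather than individual transcripts.

```python
import subprocess

def build_featurecounts_cmd(bam_files, gtf_path, out_path, threads=1):
    """Build a featureCounts command that summarizes exon counts per gene."""
    return [
        "featureCounts",
        "-a", gtf_path,       # annotation file (GTF)
        "-o", out_path,       # output count table
        "-t", "exon",         # feature type to count reads over
        "-g", "gene_id",      # attribute used to group exons into genes
        "-T", str(threads),   # number of threads
        *bam_files,           # one or more alignment files
    ]

def run_featurecounts(bam_files, gtf_path, out_path, threads=1):
    """Run featureCounts; raises CalledProcessError if it exits non-zero."""
    cmd = build_featurecounts_cmd(bam_files, gtf_path, out_path, threads)
    subprocess.run(cmd, check=True)

# Placeholder file names, for illustration only.
cmd = build_featurecounts_cmd(["sample1.bam"], "genes.gtf", "gene_counts.txt", threads=4)
print(" ".join(cmd))
```

Calling run_featurecounts with your own BAM and GTF paths would then produce one row per gene_id in the output table.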


Thanks for your help friend!


Never mind, I think I figured it out. They just report all three isoforms on one graph and call it a gene_set.


The PROBLEM is when I look at bioinformatics papers

https://xkcd.com/285/



https://www.ncbi.nlm.nih.gov/pubmed/29346549 look at the supplemental data table!!!

BELIEVE ME
