Question

RNA-Seq analysis for individual genes

7.8 years ago

sumithra.das ▴ 10

Hello all, I want to look at the differential expression of a single gene (an RNA-binding protein) between cancer and normal samples in several cancer types, and then look at how its expression correlates with that of the alternatively spliced transcripts that are targets of this gene.

I want to work with level 3 data. I'm a biologist and new to RNA-Seq, but I understand that tools like DESeq2 and edgeR are recommended. Is it possible to do the DEG analysis without R/Bioconductor packages? And how should I proceed?

Thank you!

RNA-Seq sequence • 2.5k views

updated 7.7 years ago by james.lloyd ▴ 100 • written 7.8 years ago by sumithra.das ▴ 10

If you are only interested in the expression of specific genes you could check those out in one of the TCGA data portals (e.g. http://www.cbioportal.org/ ).

7.8 years ago by GenoMax 141k

Thank you for the quick reply! Yes, I have looked at both cBioPortal and the UCSC Cancer Browser (https://genome-cancer.ucsc.edu/proj/site/hgHeatmap/). I see mixed expression patterns across different cancers. Apart from expression, I also want to do a correlation analysis with its target splice-variant transcripts.

7.8 years ago by sumithra.das ▴ 10
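
For the correlation part, a minimal sketch in plain Python (so no R is needed) might look like the following. It assumes you have already extracted per-sample TPM values for the RBP and for each target transcript into lists that share the same sample order; the transcript IDs and values below are hypothetical placeholders. It uses Spearman (rank) correlation, which is one reasonable choice for skewed expression data but not the only one, and it does no multiple-testing correction.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; NaN if either input has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def correlate_with_targets(rbp_tpm, target_tpms):
    """Correlate the RBP's per-sample TPM with each target transcript's TPM.

    rbp_tpm: list of TPM values, one per sample.
    target_tpms: dict of transcript ID -> list of TPM values, same sample order.
    """
    return {tx: spearman(rbp_tpm, tpms) for tx, tpms in target_tpms.items()}


# Hypothetical example: 5 samples, made-up transcript IDs and values.
rbp = [1.0, 2.0, 3.0, 4.0, 5.0]
targets = {
    "TARGET1-201": [2.0, 4.0, 6.0, 8.0, 10.0],
    "TARGET1-202": [10.0, 8.0, 6.0, 4.0, 2.0],
}
print(correlate_with_targets(rbp, targets))
# prints {'TARGET1-201': 1.0, 'TARGET1-202': -1.0}
```

Because Spearman only uses ranks, it gives the same answer on raw TPM or log-transformed TPM. With real TCGA data you would have many more samples, and you would probably want to compute it separately for tumour and normal samples.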

score 0 · Answer 1 · 2016-08-02

7.7 years ago

james.lloyd ▴ 100

So, first off, do you want to look at changes in expression at the gene level or at the transcript isoform level? If you only care about the gene level, you could use DESeq2/edgeR, although I never use them myself. As far as I know, they need the reads to be counted per gene first and then do the differential testing on those counts. Be aware, though, that gene-level quantification should take changes at the isoform level into account (as difficult as those are to estimate), so a tool like Cufflinks might be useful.

Cuffdiff will quantify at both the gene level and the transcript isoform level. Running kallisto/Sailfish/Salmon followed by Sleuth (an R package) might also be the way to go at the transcript isoform level, and you can use those estimates to get gene-level estimates as well. With Cufflinks you can avoid R, but you might need some programming to get at your answer.
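
To illustrate the point about gene versus isoform level, here is a small Python sketch (made-up numbers, hypothetical transcript IDs) that sums transcript-level TPM into a gene-level value and also computes the fraction of the gene's TPM contributed by each isoform. The transcript-to-gene mapping is assumed to come from your annotation (GTF). In the example the gene total is identical in both conditions while isoform usage flips, which is the kind of change a gene-level-only analysis would miss.

```python
from collections import defaultdict


def gene_level_tpm(transcript_tpm, tx2gene):
    """Sum transcript-level TPM into gene-level TPM using a transcript -> gene map."""
    genes = defaultdict(float)
    for tx, tpm in transcript_tpm.items():
        genes[tx2gene[tx]] += tpm
    return dict(genes)


def isoform_fractions(transcript_tpm, tx2gene):
    """Fraction of each gene's TPM contributed by each of its transcripts."""
    totals = gene_level_tpm(transcript_tpm, tx2gene)
    fractions = {}
    for tx, tpm in transcript_tpm.items():
        total = totals[tx2gene[tx]]
        fractions[tx] = tpm / total if total > 0 else 0.0
    return fractions


# Hypothetical gene with two isoforms; the values are made up.
tx2gene = {"GENE1-201": "GENE1", "GENE1-202": "GENE1"}
normal = {"GENE1-201": 30.0, "GENE1-202": 10.0}
tumour = {"GENE1-201": 10.0, "GENE1-202": 30.0}

print(gene_level_tpm(normal, tx2gene), gene_level_tpm(tumour, tx2gene))
# prints {'GENE1': 40.0} {'GENE1': 40.0}
print(isoform_fractions(normal, tx2gene), isoform_fractions(tumour, tx2gene))
# prints {'GENE1-201': 0.75, 'GENE1-202': 0.25} {'GENE1-201': 0.25, 'GENE1-202': 0.75}
```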

I am not sure if that answers your question, but hopefully it points you in the right direction.

7.7 years ago by james.lloyd ▴ 100

Thank you, James. I am looking at transcript-level expression. Can I use the level 3 RSEM isoform values directly, or is it better to take the raw counts and run Cufflinks and the other packages you mentioned to study expression and correlation? Which would be a better input for studying isoform expression: RSEM values, RPKM or TPM?

7.7 years ago by sumithra.das ▴ 10

I have not personally used RSEM, so I cannot comment on what is best to do with its output. I am not sure it can be used as input to Cufflinks. It can probably be used as input to something like DESeq2 or edgeR, but I have never done that analysis.

This raises another way in which RNA-seq tools differ (even at the isoform level): whether reads are mapped to the genome or to the transcriptome. Many tools align (or pseudo-align) reads to the transcriptome (a FASTA file of transcripts) and quantify from that; these include RSEM, Salmon and kallisto (the latter two do their own alignment, but I think RSEM relies on Bowtie or a similar aligner to map the reads for it). Other tools work from reads mapped to the genome (a genome FASTA file plus GTF/GFF annotation); Cufflinks takes reads mapped to the genome by TopHat or STAR and then quantifies them. The Cufflinks suite also does the differential testing itself (via Cuffdiff). Some of these tools can be mixed and matched, but that gets complex.

Another point is that RSEM is quite old and does not provide bootstrapping the way kallisto and Salmon do. If you load kallisto or Salmon output into the R package Sleuth, Sleuth can use the bootstraps to estimate technical variation, which gives a better idea of the real biological differences in expression.

Also, wherever you can use TPM, use TPM. Some tools need counts as input, but for reporting expression it should be TPM. This blog post explains what the different values (counts, FPKM and TPM) mean and is well worth reading to get a general idea of each:

https://haroldpimentel.wordpress.com/2014/05/08/what-the-fpkm-a-review-rna-seq-expression-units/
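
For quick reference, the FPKM and TPM definitions can be written as a short Python sketch. It assumes counts are fragment counts per transcript and that every mapped fragment appears in counts (so their sum is the library size). Real quantifiers such as kallisto and RSEM use an effective transcript length rather than the raw length used here.

```python
def fpkm(counts, lengths):
    """Fragments per kilobase of transcript per million mapped fragments."""
    total_millions = sum(counts.values()) / 1e6
    return {tx: counts[tx] / (lengths[tx] / 1e3) / total_millions for tx in counts}


def tpm(counts, lengths):
    """Transcripts per million: length-normalise first, then scale to sum to 1e6."""
    rates = {tx: counts[tx] / lengths[tx] for tx in counts}
    total_rate = sum(rates.values())
    return {tx: rate / total_rate * 1e6 for tx, rate in rates.items()}


# Hypothetical sample with three transcripts (made-up counts and lengths in bp).
counts = {"A": 100, "B": 200, "C": 700}
lengths = {"A": 1000, "B": 2000, "C": 3500}

print({tx: round(v) for tx, v in fpkm(counts, lengths).items()})
# prints {'A': 100000, 'B': 100000, 'C': 200000}
print({tx: round(v) for tx, v in tpm(counts, lengths).items()})
# prints {'A': 250000, 'B': 250000, 'C': 500000}
```

TPM always sums to one million within a sample, so a transcript's TPM reads as its share of the transcripts in that sample. FPKM values do not have a fixed per-sample total, which is one reason TPM is easier to compare across samples; within one sample, TPM equals each FPKM divided by the sum of FPKMs, times one million.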

7.7 years ago by james.lloyd ▴ 100

Thank you for the link.

7.7 years ago by sumithra.das ▴ 10

Sorry, but can you explain what you mean by bootstrapping in kallisto/Salmon?

7.7 years ago by sumithra.das ▴ 10

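Sorry for not responding sooner. As I understand it, bootstrapping means repeatedly resampling the data (the reads, with replacement) and re-running the quantification on each resample; the spread of the resulting estimates gives an estimate of the technical variation in the quantification. You can read more in Lior Pachter's blog here: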

https://liorpachter.wordpress.com/2015/05/10/near-optimal-rna-seq-quantification-with-kallisto/
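
A toy illustration of the idea as a Python sketch. It assumes every read has already been assigned unambiguously to one transcript (the read list below is hypothetical). It resamples those assignments with replacement, recomputes TPM each time, and reports the mean and standard deviation across bootstraps. kallisto resamples from its equivalence-class counts and re-runs its EM algorithm on each resample to handle multi-mapping reads; this sketch skips that step.

```python
import random
import statistics
from collections import Counter


def tpm_from_counts(counts, lengths):
    """TPM from per-transcript counts and lengths (raw lengths, for simplicity)."""
    rates = {tx: counts.get(tx, 0) / lengths[tx] for tx in lengths}
    total = sum(rates.values())
    return {tx: rate / total * 1e6 for tx, rate in rates.items()}


def bootstrap_tpm(read_assignments, lengths, n_boot=100, seed=0):
    """Mean and standard deviation of TPM over bootstrap resamples of the reads."""
    rng = random.Random(seed)
    n_reads = len(read_assignments)
    estimates = {tx: [] for tx in lengths}
    for _ in range(n_boot):
        # Draw n_reads reads with replacement, then re-quantify.
        resample = rng.choices(read_assignments, k=n_reads)
        for tx, value in tpm_from_counts(Counter(resample), lengths).items():
            estimates[tx].append(value)
    return {tx: (statistics.mean(v), statistics.stdev(v)) for tx, v in estimates.items()}


# Hypothetical sample: 100 reads, 60 from tx1 and 40 from tx2, equal lengths.
reads = ["tx1"] * 60 + ["tx2"] * 40
lengths = {"tx1": 1500, "tx2": 1500}
for tx, (mean_tpm, sd_tpm) in bootstrap_tpm(reads, lengths).items():
    print(f"{tx}: mean TPM {mean_tpm:.0f}, bootstrap SD {sd_tpm:.0f}")
```

The bootstrap SD shows how much a transcript's estimate could move from sampling noise alone; with only 100 reads it is large, and it shrinks as sequencing depth grows.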

7.5 years ago by james.lloyd ▴ 100