Question: Differential Expression based on Sequences from lncRNA
Joel TM (Canada) wrote, 3.9 years ago:

Hi, first of all, thank you for existing! I am somewhat new to bioinformatics, but I've learned a lot in the last year. I am familiar with running RNA-seq pipelines to get differentially expressed genes using DESeq, HTSeq, Cufflinks, edgeR, etc. But I am facing something new: I have total RNA-seq data and would like to see whether some long non-coding RNAs are differentially expressed in my patients. I have their positions and their sequences, but they are not annotated in databases, so they are not part of the gene reference file.

My question is: is it possible at all to get differential expression based on sequences?

Would I have to manually edit my gene.gtf? Any help would be welcome.

Thank you very much

J.


You can create a BED file with the lncRNA coordinates, count how many reads map to each lncRNA using bedtools multicov, and then run a DESeq/edgeR analysis on the counts.

(comment by geek_y, 3.9 years ago)
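
For illustration, the BED-plus-multicov suggestion above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the poster's actual commands: the lncRNA names, coordinates, and BAM file names are hypothetical, the input coordinates are assumed to be 1-based inclusive (BED is 0-based, half-open), and the script only prints the bedtools call instead of running it.

```python
import shlex

# Hypothetical lncRNA coordinates, 1-based inclusive (as in a GTF or a genome browser).
LNCRNAS = [
    ("chr1", 100, 200, "lncRNA_1", "+"),
    ("chr1", 500, 600, "lncRNA_2", "-"),
]


def to_bed_line(chrom, start, end, name, strand):
    """Convert a 1-based inclusive interval to a 6-column BED line (0-based, half-open)."""
    return "\t".join([chrom, str(start - 1), str(end), name, "0", strand])


def multicov_command(bam_paths, bed_path):
    """Build a bedtools multicov call; the BAMs should be coordinate-sorted and indexed."""
    return ["bedtools", "multicov", "-bams", *bam_paths, "-bed", bed_path]


# Text that would be written to the BED file (file names below are assumptions).
bed_text = "\n".join(to_bed_line(*record) for record in LNCRNAS) + "\n"
cmd = multicov_command(["patient1.bam", "control1.bam"], "lncRNA.bed")

print(bed_text, end="")
print(shlex.join(cmd))
```

bedtools multicov echoes each BED line followed by one count column per BAM file, so the output can be loaded as a count matrix for DESeq or edgeR.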

Ok thank you, I'll be trying that asap.

[EDIT] Works like a charm. Thank you very much

(comment by Joel TM, 3.9 years ago)

geek_y (Barcelona/CRG/London/Imperial) wrote, 3.9 years ago:

You can create a BED file with the lncRNA coordinates, count how many reads map to each lncRNA using bedtools multicov, and then run a DESeq/edgeR analysis on the counts.

A small note: keep in mind that bedtools multicov does not handle paired-end data specially, i.e. it counts reads per region rather than fragments per region.

 

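
To make the reads-versus-fragments distinction concrete, here is a toy sketch. The alignments and the region are invented for illustration, and the overlap rule is a deliberate simplification: it is not how bedtools or htseq-count are implemented, it only shows why the two kinds of count can differ for paired-end data.

```python
# Toy paired-end alignments: (fragment_name, mate, chrom, start, end), 1-based inclusive.
ALIGNMENTS = [
    ("frag1", 1, "chr1", 90, 139),
    ("frag1", 2, "chr1", 150, 199),  # both mates overlap the region
    ("frag2", 1, "chr1", 180, 229),
    ("frag2", 2, "chr1", 300, 349),  # only mate 1 overlaps the region
    ("frag3", 1, "chr1", 400, 449),
    ("frag3", 2, "chr1", 460, 509),  # neither mate overlaps the region
]
REGION = ("chr1", 100, 200)  # hypothetical lncRNA locus


def overlaps(alignment, region):
    """True if the alignment shares at least one base with the region."""
    _, _, chrom, start, end = alignment
    r_chrom, r_start, r_end = region
    return chrom == r_chrom and start <= r_end and end >= r_start


def count_reads(alignments, region):
    """Read-level count: every overlapping mate is counted separately."""
    return sum(1 for a in alignments if overlaps(a, region))


def count_fragments(alignments, region):
    """Fragment-level count: each read pair is counted at most once."""
    return len({a[0] for a in alignments if overlaps(a, region)})


print("reads:", count_reads(ALIGNMENTS, REGION))          # reads: 3
print("fragments:", count_fragments(ALIGNMENTS, REGION))  # fragments: 2
```

In this toy case the read-level count is 3 and the fragment-level count is 2, because frag1 contributes two reads but only one fragment.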

Thanks for the detail! I read something about that too. I was too impatient, so I tried it with the TopHat output (a single .bam file). Our data is indeed paired-end, though. I understand the results could differ from reality, but I don't know to what extent. Would using only the forward strand be better?

(comment by Joel TM, 3.9 years ago)

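There would not be much difference; it is just something to keep in mind. You can still create a dummy GTF file with your coordinates and use htseq-count. A GTF line needs nine tab-separated columns (seqname, source, feature, start, end, score, strand, frame, attributes), so something like:

chr1	source	lncRNA	100	200	.	+	.	lncRNA_id "chr1:100-200";
chr1	source	lncRNA	500	600	.	+	.	lncRNA_id "chr1:500-600";

Now try htseq-count with -t lncRNA -i lncRNA_id (the default feature type is exon, so -t is needed here). Put the real strand in column 7, or use "." together with --stranded=no.

As long as you understand how htseq-count assigns reads to features, a dummy GFF/GTF like this gives you fragment-level counts, because htseq-count treats a read pair as a single fragment.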

 

(comment by geek_y, 3.9 years ago)
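
The dummy-GTF idea above can also be scripted. The following is a minimal sketch under stated assumptions: the coordinates, the "custom" source label, and the file names are hypothetical, the strand is left as "." (which is why the command uses -s no), and -r pos assumes a coordinate-sorted BAM. The script only prints the htseq-count call rather than running it.

```python
import shlex

# Hypothetical lncRNA loci: (chrom, start, end), 1-based inclusive as GTF expects.
LOCI = [("chr1", 100, 200), ("chr1", 500, 600)]


def gtf_line(chrom, start, end, strand=".", source="custom", feature="lncRNA"):
    """Format one 9-column GTF record with an lncRNA_id attribute."""
    attributes = f'lncRNA_id "{chrom}:{start}-{end}";'
    fields = [chrom, source, feature, str(start), str(end), ".", strand, ".", attributes]
    return "\t".join(fields)


def htseq_count_command(bam_path, gtf_path):
    """Build an htseq-count call for unstranded counting over the dummy features."""
    return [
        "htseq-count",
        "-f", "bam",        # input is BAM
        "-r", "pos",        # assumes the BAM is sorted by coordinate
        "-s", "no",         # features have no strand information
        "-t", "lncRNA",     # feature type used in column 3 of the dummy GTF
        "-i", "lncRNA_id",  # attribute to use as the feature ID
        bam_path,
        gtf_path,
    ]


gtf_text = "\n".join(gtf_line(*locus) for locus in LOCI) + "\n"
print(gtf_text, end="")
print(shlex.join(htseq_count_command("accepted_hits.bam", "lncRNA.gtf")))
```

If the strand of each lncRNA is known, passing it to gtf_line and choosing the matching -s setting for the library type would be the more careful option.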

That is good info, thank you for all your help! :)

(comment by Joel TM, 3.9 years ago)