Question: Get 3'UTR sequence using RNAseq read coverage
gravatar for joangibert14
8 weeks ago by
joangibert140 wrote:


I have some bam files aligned to mouse reference genome. I want to analyze some motif enrichment in the 3'UTR of this samples but I realize that some mRNA have longer 3'UTR than the reference genome. I double checked this an is something reported in platelets that is what I'm working.

Is there any way to get the full 3'UTR using, for example coverage and expand the preexistent 3'UTR bed file? I was thinking on a kind of loop that check coverage every 100bp and, if coverage is > threshold, append the sequence... Any other ideas? Actually I am not quite sure that the previous idea would work...


rna-seq • 258 views
ADD COMMENTlink modified 7 weeks ago by kristoffer.vittingseerup1.7k • written 8 weeks ago by joangibert140

I think what you mean is that the actual 3'UTR is longer than the annotated in the reference file (GTF). Better use a dedicated tool, such as stringtie to perform reference-guided transcriptome assembly and then check if you indeed find transcripts that extend beyond the annotated 3'UTRs. Still, keep in mind that for motif enrichment your query sequences should not be excessively long to avoid motif matches by chance. Homer for example uses 200bp as a default window.

ADD REPLYlink modified 8 weeks ago • written 8 weeks ago by ATpoint15k

Or use a tool that does comparative analysis: DREAM will allow you to put in a set of negative sequences and find things that are enriched in your postive set vs the negative set, free of length confounding.

ADD REPLYlink written 8 weeks ago by i.sudbery4.3k

I tried stringtie with gencode gtf for mouse. The file I got does not have any line with UTR feature. The gtf contains this information and also when I check the bam files in IGV I see reads in the UTRs. Any idea what is going on?

ADD REPLYlink written 8 weeks ago by joangibert140

stringtie has no way of distinguishing between coding and non-coding exons. Thus is can identify where the exons are, but not which bit of them are coding/non-coding. One solution would be to select transcripts from your new assembly that are identical to reference transcripts, except for the extended first/last exon, and copy across the CDS annotation from the reference transcript.

ADD REPLYlink written 8 weeks ago by i.sudbery4.3k

Uhm, the feature column in the output should. At least is what it claims in the documentation, right?

Anyway, I am not sure what you are referring to. Is something like comparing the two gtf files (the mouse reference and the stringtie output) and get the lines from the second that are not in the first?


ADD REPLYlink written 8 weeks ago by joangibert140

I think you are miss-reading the manual. To me the bit I think you are reffering to is just a cut and paste job of the definition of the GTF format. I've never seen anything come out of stringtie other than transcript and exon lines. How would it know where the CDS/UTRs were?

So, yes I was talking about comparing the GTF files, but you can't do it on a line by line format, you'd need to consider a whole transcript at once, which is multiple lines and compare transcripts by checking their intron chains. You might take a look at cuffcompare which will classify novel transcripts according to how they match known transcripts.

ADD REPLYlink written 8 weeks ago by i.sudbery4.3k
gravatar for kristoffer.vittingseerup
7 weeks ago by
European Union
kristoffer.vittingseerup1.7k wrote:

If you have StringTie output IsoformSwitchAnalyzeR can help you out. It can predict ORF (with high accuracy) of the transcripts whereby you also can get the 3'UTRs. From this it is quite easy to extract both the genomic cooridnates - and even easier to extract the corresponding genomic sequence.

Furthermore if you have conditions IsoformSwitchAnalyzeR cab analyse your data for systematic differences in 3'UTR length between conditions - both in individual genes and on a genome wide level.

ADD COMMENTlink written 7 weeks ago by kristoffer.vittingseerup1.7k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 919 users visited in the last hour