Question: Aligning Rna-Seq To Repetitive Line-1 Elements
6.7 years ago by
rd20 wrote:


I would like to check whether L1 repetitive elements are modulated between my treatment and control using RNA-Seq. I have read several papers that have done so, but their methods are not clear enough for me, as a biologist, to reproduce. I have analyzed my data using the Tuxedo suite and have analyzed the "unique" genes. I am wondering what modifications are needed to accommodate the repetitive nature of LINEs.

1. I understand that some aligners filter out reads that map to several places in the genome. Are my LINE reads being filtered out by tophat?
2. If so, how do I align them?
3. When using cufflinks, instead of using RefSeq, I assume I would have to use a repetitive element model?

Thank you!

rna-seq • 5.7k views
modified 3.2 years ago by ghv80 • written 6.7 years ago by rd20
6.7 years ago by
Ryan Dale4.8k
Bethesda, MD
Ryan Dale4.8k wrote:

You might get some ideas from a solution described in a paper from Peter Park's lab, "Estimating enrichment of repetitive elements from high-throughput sequence data", which has an online tool available, with source code (Repeat Enrichment Estimator). It appears to be designed for ChIP-seq, though; I am not sure how adaptable it would be for RNA-seq.


Edit on Apr 27 2015:

I recently had to revisit this problem and found a useful tool that didn't exist at the time of the original answer:

RepEnrich (paper, github)


modified 4.4 years ago • written 6.7 years ago by Ryan Dale4.8k
6.7 years ago by
biorepine1.4k wrote:

If I understand your question correctly, you want to measure the expression levels of LINE-1 repeats in your RNA-Seq samples? If that is the case, follow these steps.

1. Make a GTF file of your repeat elements, or download one from UCSC/Galaxy.
2. tophat -G LINE1-repeats.gtf -o treat-rnaseq yourgenome_ebwt_base treat-rnaseq.fastq
3. tophat -G LINE1-repeats.gtf -o control-rnaseq yourgenome_ebwt_base control-rnaseq.fastq
4. cuffdiff LINE1-repeats.gtf treat-rnaseq/accepted_hits.bam control-rnaseq/accepted_hits.bam

Steps 2 and 3 map the RNA-Seq reads to the repeat elements in your genome. Step 4 calculates the differential expression of your repeat elements between your treatment and control.
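
For step 1, here is a minimal Python sketch of one way to build the GTF yourself. It is not something from the thread: it assumes you have downloaded the UCSC RepeatMasker table (rmsk) as a tab-delimited text dump, and that its columns follow the usual rmsk layout (genoName, genoStart, genoEnd, strand, repName, repClass, repFamily at the positions used below). The function name `rmsk_to_l1_gtf` and the `_dup` suffix are hypothetical choices made for illustration.

```python
# Assumed column positions (0-based) in a UCSC rmsk table dump.
# These are assumptions about the input file, not something stated in the thread.
GENO_NAME, GENO_START, GENO_END, STRAND = 5, 6, 7, 9
REP_NAME, REP_FAMILY = 10, 12


def rmsk_to_l1_gtf(rmsk_lines):
    """Yield one GTF 'exon' line for each L1-family row in rmsk-style input."""
    copies_seen = {}
    for line in rmsk_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip short rows, header rows, and anything that is not L1.
        if len(fields) <= REP_FAMILY or fields[REP_FAMILY] != "L1":
            continue
        chrom = fields[GENO_NAME]
        strand = fields[STRAND]
        name = fields[REP_NAME]
        # UCSC tables use 0-based, half-open coordinates; GTF is 1-based, inclusive.
        start = int(fields[GENO_START]) + 1
        end = int(fields[GENO_END])
        # gene_id groups copies by subfamily; transcript_id is unique per copy.
        copies_seen[name] = copies_seen.get(name, 0) + 1
        transcript_id = f"{name}_dup{copies_seen[name]}"
        attributes = f'gene_id "{name}"; transcript_id "{transcript_id}";'
        yield "\t".join(
            [chrom, "rmsk", "exon", str(start), str(end), ".", strand, ".", attributes]
        )
```

You could write the yielded lines to LINE1-repeats.gtf and pass that file to the commands above. Using the subfamily name as gene_id is a design choice: it should let gene-level results be read per L1 subfamily, while each genomic copy stays a separate transcript.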

6.7 years ago by
Matt Shirley9.1k
Cambridge, MA
Matt Shirley9.1k wrote:

You would probably want to restrict your analysis to regions of LINE elements that contain sites that vary from the consensus LINE sequence. That way you would only consider reads that map uniquely to your region of interest.
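
As a rough illustration of the "uniquely mapping reads" idea, here is a minimal Python sketch that separates SAM records into uniquely and multiply mapped ones. It assumes the aligner writes the optional NH:i tag (number of reported alignments for the read) on each record, which you should verify for your own output; the function names are hypothetical.

```python
def nh_value(sam_line):
    """Return the integer value of the NH:i tag, or None if the tag is absent."""
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        if field.startswith("NH:i:"):
            return int(field[5:])
    return None


def split_by_uniqueness(sam_lines):
    """Split SAM alignment records into (unique, multi) lists using NH:i."""
    unique, multi = [], []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        if int(fields[1]) & 4:
            continue  # FLAG bit 0x4 set: read is unmapped
        nh = nh_value(line)
        if nh == 1:
            unique.append(line)
        elif nh is not None:
            multi.append(line)
        # Records without an NH tag are left out of both lists.
    return unique, multi
```

One way to feed it is to stream the text output of `samtools view -h accepted_hits.bam` into `split_by_uniqueness`. Comparing the sizes of the two lists over your L1 regions gives a quick sense of how much signal would be lost by keeping unique reads only.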

4.4 years ago by
Manvendra Singh2.1k
Berlin, Germany
Manvendra Singh2.1k wrote:

I would not suggest using tophat, because there are hardly any splice variants of L1 elements, so tophat would also map reads to chimeras and exonized L1 elements.

I would go for bowtie.

I would allow many mismatches but only one alignment per read, with the --best option.

It always works for me

3.2 years ago by
ghv80 wrote:

Hi all, it is my understanding that mapping repetitive sequences is error-prone. Is this something that tophat2 can take care of by simply tweaking the parameters? For example, I could set -N/--read-mismatches to 0. Or is there something more 'fancy' that needs to be done?

Also, is it worth it to pay more and do paired-end sequencing to be more accurate in mapping to repetitive regions? Thanks for the advice -G


Please make a new post to ask this question (and consider deleting this post). That will give you a much better chance of getting a response.

written 3.2 years ago by SES8.2k

