Question

Aligning RNA-Seq to Repetitive LINE-1 Elements

11.3 years ago

rd ▴ 20

Hello,

I would like to check whether L1 repetitive elements are modulated between my treatment and control via RNA-Seq. I have read several papers that have done so, but their methods are not clear enough for me, as a biologist, to reproduce. I have analyzed my data using the Tuxedo suite and have analyzed the "unique" genes. I am wondering what modifications are needed to accommodate the repetitive nature of LINEs.

1. I understand that some aligners filter out reads that map to several places in the genome. Are my LINE reads being filtered out by tophat?
2. If so, how do I align them?
3. When using cufflinks, instead of using RefSeq, I assume I would have to use a repetitive element model?

Thank you!

rna-seq • 8.3k views

Updated 7.8 years ago by ghv8 • written 11.3 years ago by rd ▴ 20

11.3 years ago

Matt Shirley 10k

You would probably want to restrict your analysis to regions of LINE elements containing sites that differ from the consensus LINE sequence. That way you would only consider reads that map uniquely to your region of interest.
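
One way to apply this after alignment is to keep only the reads that the aligner reported at a single location. The following is a minimal sketch, not a definitive recipe. It assumes the alignments are available as SAM text and carry the optional NH:i tag (tophat writes it; other aligners may not). The read names and coordinates are made up for illustration, and keep_unique is a hypothetical helper name.

```shell
# Toy SAM file (hypothetical reads). NH:i:<n> is the number of
# reported alignments for the read; NH:i:1 means it mapped once.
sam=$(mktemp)
printf '@HD\tVN:1.0\tSO:unsorted\n' > "$sam"
printf 'read1\t0\tchr1\t100\t50\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1\n' >> "$sam"
printf 'read2\t0\tchr1\t500\t3\t4M\t*\t0\t0\tTTGA\tIIII\tNH:i:2\n' >> "$sam"
printf 'read2\t256\tchr5\t900\t3\t4M\t*\t0\t0\tTTGA\tIIII\tNH:i:2\n' >> "$sam"
printf 'read3\t16\tchr2\t300\t50\t4M\t*\t0\t0\tGGCA\tIIII\tNH:i:1\n' >> "$sam"

# Keep header lines and any record whose optional fields include NH:i:1.
keep_unique() {
  awk -F'\t' '
    /^@/ { print; next }             # pass header lines through
    {
      for (i = 12; i <= NF; i++)     # optional tags start at column 12
        if ($i == "NH:i:1") { print; next }
    }' "$1"
}

unique_sam=$(mktemp)
keep_unique "$sam" > "$unique_sam"

# Count the alignment records that survived the filter.
grep -vc '^@' "$unique_sam"
```

For BAM input, the same filter could be fed from `samtools view -h`. Note that this only removes multi-mapped reads; restricting to variant sites within LINE elements would still need a region file of your own.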

Updated 4.3 years ago by Ram 43k • written 11.3 years ago by Matt Shirley 10k

9.0 years ago

Manvendra Singh ★ 2.2k

I would not recommend tophat, because there are hardly any splice variants for L1 elements, so spliced alignment gains little. tophat would also map reads to chimeric transcripts and exonized L1 elements.

I would go for bowtie.

I would allow many mismatches but only one alignment per read, using the --best option.

It always works for me
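
A command along these lines might look like the sketch below. It assumes bowtie version 1, and the file names are hypothetical. The choice of -v 3 is my reading of "many mismatches" (3 is the most that -v allows), not something stated in the answer. The command is printed rather than executed so it can be inspected first.

```shell
# Hypothetical file names; replace with your own index and reads.
index=genome_index          # basename of a bowtie (version 1) index
reads=treat-rnaseq.fastq    # single-end reads
out=treat-rnaseq.sam        # alignments in SAM format

# -v 3   : allow up to 3 mismatches per read (assumed value)
# -k 1   : report one alignment per read
# --best : make the reported alignment the best one found
# -S     : write SAM output
cmd="bowtie -v 3 -k 1 --best -S $index $reads $out"

# Print the command; run it yourself once bowtie and the index exist.
echo "$cmd"
```

With -k 1 --best, a read that fits several L1 copies equally well is still assigned to only one of them, so per-copy counts should be interpreted with care.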


7.8 years ago

ghv8 • 0

Hi all, it is my understanding that mapping repetitive sequences is error-prone. Is this something that tophat2 can take care of by simply tweaking the parameters? For example, I could set -N/--read-mismatches to 0. Or is there something more 'fancy' that needs to be done?

Also, is it worth it to pay more and do paired-end sequencing to be more accurate in mapping to repetitive regions? Thanks for the advice -G


Please make a new post to ask this question (and consider deleting this post). That will give you a much better chance of getting a response.

Reply • 7.8 years ago by SES 8.6k

Ram · Accepted Answer · 2013-01-07

You might get some ideas from a solution described in a paper from Peter Park's lab, "Estimating enrichment of repetitive elements from high-throughput sequence data", which has an online tool available with source code (Repeat Enrichment Estimator). It appears to be for ChIP-seq, though; I am not sure how adaptable it would be for RNA-seq.

Edit on Apr 27 2015:

I recently had to revisit this problem and found a useful tool that didn't exist at the time of the original answer:

RepEnrich (paper, github)

Ram · Accepted Answer · 2013-01-04

If I understand your question correctly, you want to identify the expression levels of LINE-1 repeats in your RNA-Seq samples? If that is the case, follow these instructions.

1. Make a GTF format file of your repeat elements, or download one from UCSC/Galaxy.
2. tophat -G LINE1-repeats.gtf -o treat-rnaseq yourgenome_ebwt_base treat-rnaseq.fastq
3. tophat -G LINE1-repeats.gtf -o control-rnaseq yourgenome_ebwt_base control-rnaseq.fastq
4. cuffdiff LINE1-repeats.gtf treat-rnaseq/accepted_hits.bam control-rnaseq/accepted_hits.bam

Steps 2 and 3 map the RNA-Seq reads to your repeat elements in the genome.

Step 4 calculates the differential expression of your repeat elements in your treatment and control.
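
For the first step, building the GTF file, here is a minimal sketch. It assumes the L1 annotations are already in a six-column BED-like table (chrom, 0-based start, end, repeat name, score, strand), such as one exported from a RepeatMasker track. The coordinates below are made up, and bed_to_gtf and the "repeats" source label are hypothetical names chosen for illustration.

```shell
# Toy BED-like input (hypothetical coordinates).
bed=$(mktemp)
printf 'chr1\t999\t7000\tL1HS\t0\t+\n'   > "$bed"
printf 'chr2\t4999\t5600\tL1PA2\t0\t-\n' >> "$bed"

# Convert each row to a GTF "exon" record.
# BED starts are 0-based, GTF starts are 1-based, hence $2 + 1.
bed_to_gtf() {
  awk -F'\t' -v OFS='\t' '{
    id   = $4 "_" $1 "_" ($2 + 1)    # unique ID per genomic copy
    attr = "gene_id \"" $4 "\"; transcript_id \"" id "\";"
    print $1, "repeats", "exon", $2 + 1, $3, ".", $6, ".", attr
  }' "$1"
}

gtf=$(mktemp)
bed_to_gtf "$bed" > "$gtf"
cat "$gtf"
```

Using the repeat name as gene_id groups all copies of a subfamily together, while the per-copy transcript_id keeps individual loci distinguishable. Whether that grouping suits your analysis is a design choice, not something the answer prescribes.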