Question: Quantifying repetitive elements from RNA-seq (hisat2 or Salmon)
9 months ago, ywchen wrote:

Hi everyone: I am interested in quantifying changes in the transcription of repetitive elements (LTRs here) after treatment, and I have come up with the following ideas:

  1. Map the RNA-seq data directly to the genome with hisat2 and quantify against the repetitive element annotation from RepeatMasker, then pool elements of the same class to compare them. However, I am not sure how to set the maximum number of alignments allowed per read. (Most RNA-seq analyses require uniquely mapped reads, but the value would need to be much higher here because repetitive elements occur many times in the genome.)
  2. I got consensus repetitive element sequences (FASTA) from Repbase. Is it possible to treat these repeat elements as a "transcriptome" and use salmon (or a similar transcriptome-based tool) to map reads to it?

I am not familiar with this area and would appreciate any suggestions. Thanks for your help!

Update: Since I am only interested in LTRs, I have modified the question. It looks possible to extract uniquely mapped reads and combine them with the RepeatMasker annotation. Direct quantification will probably fail, since repetitive elements are abundant within mRNAs.
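The idea in the update (keep only uniquely mapped reads, then intersect them with the RepeatMasker annotation) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a tested pipeline: the tuple layouts, the function name `count_unique_reads_by_family` and the toy coordinates are all hypothetical, and `n_hits` stands for the number of reported alignments per read (for example the `NH` tag that aligners such as hisat2 and STAR write to their SAM/BAM output).

```python
from collections import Counter


def count_unique_reads_by_family(alignments, repeats, wanted_class="LTR"):
    """Count uniquely mapped reads that overlap repeats of one class.

    alignments: iterable of (read_id, chrom, start, end, n_hits)
    repeats:    iterable of (chrom, start, end, family, rep_class)
    Coordinates are assumed to be 0-based and half-open.
    """
    # Keep only repeats of the requested class, grouped by chromosome.
    by_chrom = {}
    for chrom, start, end, family, rep_class in repeats:
        if rep_class == wanted_class:
            by_chrom.setdefault(chrom, []).append((start, end, family))

    counts = Counter()
    for read_id, chrom, start, end, n_hits in alignments:
        if n_hits != 1:
            continue  # discard multi-mapped reads
        for r_start, r_end, family in by_chrom.get(chrom, []):
            if start < r_end and r_start < end:  # intervals overlap
                counts[family] += 1
                break  # count each read at most once
    return counts


# Toy data (hypothetical), only to show the expected input shapes.
repeats = [
    ("chr1", 100, 500, "ERVK", "LTR"),
    ("chr1", 900, 1200, "L1", "LINE"),
    ("chr2", 50, 400, "ERVK", "LTR"),
    ("chr2", 1000, 1300, "ERV1", "LTR"),
]
alignments = [
    ("r1", "chr1", 120, 220, 1),    # unique, inside an LTR
    ("r2", "chr1", 950, 1050, 1),   # unique, but in a LINE
    ("r3", "chr2", 60, 160, 4),     # multi-mapped, dropped
    ("r4", "chr2", 1100, 1200, 1),  # unique, inside an LTR
    ("r5", "chr1", 450, 550, 1),    # unique, partially overlaps an LTR
]
ltr_counts = count_unique_reads_by_family(alignments, repeats)
print(dict(ltr_counts))
```

The linear scan over repeats is fine for a toy example; on a real genome-wide annotation an interval index or a tool built for this would be the more sensible choice. Note that this sketch does nothing about the concern raised in the update: reads from LTR copies embedded in mRNAs are still counted.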

9 months ago, Devon Ryan (Freiburg, Germany) wrote:

You can use STAR and then put the output through TEtranscripts. STAR lets you allow multiple alignments per read, and it generally produces better alignments than hisat2 (in my experience at least). Our group that works on repeat elements uses this method.

While you can use the consensus repeat sequences, you end up biasing the results by how close the expressed repeats are to the consensus. Consensus sequences are mostly useful for showing a profile over a single instance, where you can label the structure easily.
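As a rough sketch of the STAR step described above, the snippet below only assembles a command line that lets STAR report many alignments per read; it does not run anything. The helper name `star_multimap_command`, the file names and the limit of 100 are assumptions for illustration, not settings taken from this thread. `--outFilterMultimapNmax` and `--winAnchorMultimapNmax` are the STAR options that control how many loci a read may map to; check the STAR and TEtranscripts documentation for the values recommended for your data.

```python
import shlex


def star_multimap_command(genome_dir, fastq_files, out_prefix,
                          max_multimap=100, threads=8):
    """Build (but do not run) a STAR command that keeps multi-mapped reads.

    max_multimap=100 is an illustrative assumption, not a verified default.
    """
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", *fastq_files,
        # Allow a read to align to up to max_multimap loci.
        "--outFilterMultimapNmax", str(max_multimap),
        # Allow anchors (seeds) that map to up to max_multimap loci.
        "--winAnchorMultimapNmax", str(max_multimap),
        "--outSAMtype", "BAM", "Unsorted",
        "--outFileNamePrefix", out_prefix,
    ]


# Hypothetical paths; replace with your own index and reads.
cmd = star_multimap_command(
    "star_index", ["sample_R1.fastq", "sample_R2.fastq"], "sample_"
)
print(shlex.join(cmd))
```

The resulting BAM file would then be passed to TEtranscripts together with gene and repeat annotations. To actually run the alignment, the list can be handed to `subprocess.run(cmd, check=True)`.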


Thanks for your answer. I'm concerned about STAR's memory usage, so I may start with hisat2 and -k 100 to see whether its output can be used by TEtranscripts.

Reply written 9 months ago by ywchen
11 weeks ago, pdeinin wrote:

There is a very important distinction here: are you interested in transcripts generated by the repetitive elements, or in transcripts that merely include them? Most repetitive elements are simply passengers in longer RNAs that express genes and so on; only a small portion actually come from the promoters of the repetitive elements themselves. Many approaches work generically on repetitive elements, but understanding the transcripts relevant to the life cycle of the repetitive elements takes a very careful approach that maps to specific loci and eliminates the background. This is best described in our paper:

Prescott Deininger, Maria E. Morales, Travis B. White, Melody Baddoo, Dale J. Hedges, Geraldine Servant, Sudesh Srivastav, Madison E. Smither, Monica Concha, Dawn L. DeHaro, Erik K. Flemington, Victoria P. Belancio. A comprehensive approach to expression of L1 loci. Nucleic Acids Research, Volume 45, Issue 5, 17 March 2017, e31.

Because of the high background from repeats within genes, our approach focuses only on reads that map to one genomic locus better than to any other and eliminates multi-mapped reads. We have also found that it is important to have stranded RNA-Seq data, and it is better if the RNA is cytoplasmic, to eliminate as much unspliced material as possible.
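The two filtering criteria above (a read must map to one locus better than to any other, and the data are stranded) can be illustrated with a small Python sketch. This is not the pipeline from the paper, only a toy version of the stated rules: the function name `assign_reads_to_best_locus`, the `min_margin` parameter, the tuple layout and the locus names are all hypothetical.

```python
def assign_reads_to_best_locus(alignments, min_margin=1):
    """Assign each read to a single locus, or drop it.

    alignments: iterable of (read_id, locus, score, same_strand), where
    score is an alignment score (higher is better) and same_strand says
    whether the read is in the sense orientation of the element.
    A read is kept only if its best hit beats the runner-up by at least
    min_margin and lies on the element's own strand.
    """
    by_read = {}
    for read_id, locus, score, same_strand in alignments:
        by_read.setdefault(read_id, []).append((score, locus, same_strand))

    assigned = {}
    for read_id, hits in by_read.items():
        hits.sort(key=lambda h: h[0], reverse=True)
        best_score, best_locus, best_same_strand = hits[0]
        # Ambiguous: the second-best locus is (nearly) as good.
        if len(hits) > 1 and best_score - hits[1][0] < min_margin:
            continue
        # Antisense reads are more likely to come from a surrounding transcript.
        if not best_same_strand:
            continue
        assigned[read_id] = best_locus
    return assigned


# Toy alignments (hypothetical loci and scores).
alignments = [
    ("r1", "locus_chr1_a", 98, True),
    ("r1", "locus_chr5_b", 90, True),   # clearly worse second hit
    ("r2", "locus_chr1_a", 95, True),
    ("r2", "locus_chr7_c", 95, True),   # tie, so r2 is dropped
    ("r3", "locus_chr5_b", 99, False),  # antisense, dropped
    ("r4", "locus_chr7_c", 97, True),   # single sense hit
]
best = assign_reads_to_best_locus(alignments)
print(best)
```

Raising `min_margin` makes the assignment stricter, at the cost of discarding more reads from young, nearly identical copies.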
