Question: How to get gene names from tophat results
3.3 years ago by
United Kingdom
sebastianzeki0110 wrote:

I am new to RNASeq data.

Currently I am looking for repeats in RNASeq data. I am very simply looking for the presence of repeats from an individual sample (not caring where they come from). I do this using the method here:

Aligning Rna-Seq To Repetitive Line-1 Elements

This basically tells tophat to align to the reference I've given it (which it builds from the GTF file of repeats) and, failing that, to align to the human genome.

I would like to get the names of the repeats it aligns to, but the output is a BAM file. I then convert this to a BED file (bedtools bamtobed) and run bedtools closest against a BED file of repeats, keeping hits with distance = 0, to get the names.
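To make the overlap step concrete, here is a minimal Python sketch of what the distance = 0 lookup amounts to: report the repeat names that share at least one base with each read. It assumes both inputs are tab-separated BED lines with the name in column 4. The function names (`parse_bed`, `overlapping_names`) are made up for illustration, and the linear scan is only sensible for small files.

```python
from collections import defaultdict

def parse_bed(lines):
    """Yield (chrom, start, end, name) from tab-separated BED lines."""
    for line in lines:
        line = line.strip()
        # Skip blank lines and the optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        name = fields[3] if len(fields) > 3 else "."
        yield fields[0], int(fields[1]), int(fields[2]), name

def overlapping_names(read_lines, repeat_lines):
    """Map each read name to the sorted repeat names it overlaps by >= 1 bp."""
    repeats = defaultdict(list)
    for chrom, start, end, name in parse_bed(repeat_lines):
        repeats[chrom].append((start, end, name))

    hits = {}
    for chrom, start, end, read in parse_bed(read_lines):
        # BED intervals are 0-based and half-open, so two intervals
        # overlap when each one starts before the other one ends.
        names = {n for s, e, n in repeats.get(chrom, []) if start < e and s < end}
        if names:
            hits.setdefault(read, set()).update(names)
    return {read: sorted(names) for read, names in hits.items()}
```

Reads that only touch the edge of a repeat without sharing a base are not reported by this sketch.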

This all seems a bit long-winded. Is there an easier way to get the names of the repeats (or of any genes, for the benefit of others) without the samtools-bedtools bit?

rna-seq rna next-gen tophat • 1.0k views
3.3 years ago by
Australia/Melbourne/Monash University
mark.ziemann1.1k wrote:

You can run RepeatMasker on the reads directly. You will find that this is pretty slow, so you might need to limit your analysis to 1 million reads.
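One way to cut the input down is to draw a random subset of reads before running RepeatMasker. The sketch below uses reservoir sampling so the whole file never has to sit in memory. It assumes plain 4-line FASTQ records (no wrapped sequences), and the function names `sample_fastq` and `fastq_to_fasta` are made up for illustration. The FASTA conversion is there because RepeatMasker takes FASTA input.

```python
import random

def sample_fastq(lines, n, seed=0):
    """Reservoir-sample up to n records from an iterable of FASTQ lines.

    Assumes each record is exactly 4 lines: header, sequence, '+', qualities.
    """
    rng = random.Random(seed)
    reservoir = []
    record = []
    seen = 0
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            seen += 1
            if len(reservoir) < n:
                reservoir.append(record)
            else:
                # Keep the new record with probability n / seen.
                j = rng.randrange(seen)
                if j < n:
                    reservoir[j] = record
            record = []
    return reservoir

def fastq_to_fasta(records):
    """Yield one FASTA entry (header plus sequence) per sampled record."""
    for header, sequence, _plus, _qualities in records:
        yield ">" + header[1:] + "\n" + sequence
```

For the 1 million read limit mentioned above, that would be something like `sample_fastq(open("reads.fastq"), 1_000_000)`, where `reads.fastq` is a placeholder file name.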

An alternative method is to take the repeat library from RepeatMasker and then use BWA or Bowtie2 to map the reads to the "repeatome". This can be done in a few minutes once you have the repeat library (info here).
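Once the reads are mapped against the repeat library, the reference name of each alignment is the repeat name, so no BED conversion is needed. As a rough sketch, assuming the alignments are available as SAM text and that counting only primary alignments is what you want, something like this would tally reads per repeat. The function name `count_repeat_hits` and the `min_mapq` parameter are made up for illustration.

```python
from collections import Counter

def count_repeat_hits(sam_lines, min_mapq=0):
    """Count primary mapped alignments per reference name (SAM column 3)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        rname = fields[2]
        mapq = int(fields[4])
        # 0x4 = unmapped, 0x100 = secondary, 0x800 = supplementary
        if flag & 0x4 or flag & 0x100 or flag & 0x800 or rname == "*":
            continue
        if mapq < min_mapq:
            continue
        counts[rname] += 1
    return counts
```

The input can be any iterable of SAM lines, for example the text output of `samtools view -h` on the BAM file. Reads that map equally well to several repeat copies typically get a low mapping quality, so raising `min_mapq` will drop many of them; whether that is desirable depends on the question being asked.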

I did a blog post on this a while back.
