I am new to RNA-Seq data.
I am currently looking for repeats in RNA-Seq data. I simply want to know which repeats are present in an individual sample (I don't care where in the genome they come from). I do this using the method here:
This basically tells TopHat to align first to the reference I've given it (which it builds from the GTF file of repeats) and, failing that, to align to the human genome.
I would like to get the names of the repeats the reads align to, but the output is a BAM file, which does not contain them. So I convert the BAM to a BED file (bamtobed, which is part of bedtools rather than samtools) and then run bedtools closest against a BED file of repeats, keeping only the hits with distance = 0 (i.e. reads that overlap a repeat), to get the names.
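To make clear what I am after, the overlap step boils down to something like the Python sketch below. This is only an illustration, not what I actually run: the function names `parse_bed` and `count_repeat_hits` are made up, it assumes both inputs are plain tab-separated BED lines (0-based, half-open) with the name in column 4, and it uses a naive scan that would be far too slow for a real BAM-sized dataset.

```python
from collections import Counter, defaultdict


def parse_bed(lines):
    """Yield (chrom, start, end, name) tuples from BED-formatted lines."""
    for line in lines:
        # Skip blank lines and common BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        name = fields[3] if len(fields) > 3 else "."
        yield fields[0], int(fields[1]), int(fields[2]), name


def count_repeat_hits(read_lines, repeat_lines):
    """Count, per repeat name, how many reads overlap a repeat of that name.

    Hypothetical helper: overlap here means sharing at least one base,
    which is what filtering for distance 0 is meant to select.
    """
    # Group repeat intervals by chromosome so each read is only
    # compared against repeats on the same chromosome.
    repeats_by_chrom = defaultdict(list)
    for chrom, start, end, name in parse_bed(repeat_lines):
        repeats_by_chrom[chrom].append((start, end, name))

    counts = Counter()
    for chrom, read_start, read_end, _ in parse_bed(read_lines):
        for start, end, name in repeats_by_chrom.get(chrom, ()):
            # Half-open intervals overlap when each starts before the other ends.
            if start < read_end and read_start < end:
                counts[name] += 1
    return counts
```

Feeding it the BED lines from the converted alignments and the repeat annotation gives a name-to-read-count table, which is all I really need from this step.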
This all seems a bit long-winded. Is there an easier way to get the names of the repeats (or of any genes, for the benefit of others) without the BAM-to-BED conversion and the bedtools closest step?