I recently ran a relatively obscure genome through RepeatMasker and RepeatModeler and produced a BED file in which every TE instance is annotated across all scaffolds.
This is what it looks like:
scaffold1 4 209 LINE/Penelope 0 +
scaffold1 429 470 Unknown 0 +
scaffold1 1027 1083 Low_complexity 0 +
scaffold1 1444 1484 Low_complexity 0 +
scaffold1 1713 1803 Low_complexity 0 +
scaffold1 1900 1920 Simple_repeat 0 +
scaffold1 3464 3538 Simple_repeat 0 +
scaffold1 4994 5045 Simple_repeat 0 +
scaffold1 7516 7541 Simple_repeat 0 +
scaffold1 10543 10827 DNA/hAT-Charlie 0 -
scaffold1 10823 11007 DNA/hAT-Charlie 0 -
scaffold1 11430 11506 Low_complexity 0 +
scaffold1 11562 11701 LINE/Penelope 0 -
scaffold1 12150 12181 Low_complexity 0 +
I also have a transcriptome, which has the TSS, TES, and exon coordinates annotated. It looks like this:
SCAFFOLD TSS TES
scaffold1 1206 7550 transcript_2.1 . + transcript . gene_id "transcript2.1"; transcript_id "transcript2.1";
scaffold1 1206 1257 transcript_2.1 . + exon . gene_id "transcript2.1"; transcript_id "transcript2.1";
scaffold1 6257 7550 transcript_2.1 . + exon . gene_id "transcript2.1"; transcript_id "transcript2.1";
If I wanted to annotate the transcripts with these particular TEs, would it be okay to do a simple bedtools intersect between the genome repeat annotation file and the transcriptome file?
Is there a biological situation in which a transcript from a protein-coding gene also carries a TE annotation, for example an LTR, within one of its exons?
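To make the question concrete, here is a minimal Python sketch of the overlap I have in mind, run on a few of the rows shown above. It is not bedtools itself, only an illustration of the interval logic. The names `Interval`, `parse_repeats`, `parse_exons` and `exon_te_overlaps` are made up for this example, and I am assuming that both files use the same coordinate convention (treated here as 0-based, half-open, as in BED), which may not hold for the transcript file.

```python
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    chrom: str
    start: int
    end: int
    label: str
    strand: str


# A few repeat rows copied from the BED file above.
REPEATS = """
scaffold1 1027 1083 Low_complexity 0 +
scaffold1 3464 3538 Simple_repeat 0 +
scaffold1 7516 7541 Simple_repeat 0 +
scaffold1 10543 10827 DNA/hAT-Charlie 0 -
"""

# Transcript rows copied from the transcriptome file above.
TRANSCRIPTS = """
scaffold1 1206 7550 transcript_2.1 . + transcript . gene_id "transcript2.1"; transcript_id "transcript2.1";
scaffold1 1206 1257 transcript_2.1 . + exon . gene_id "transcript2.1"; transcript_id "transcript2.1";
scaffold1 6257 7550 transcript_2.1 . + exon . gene_id "transcript2.1"; transcript_id "transcript2.1";
"""


def parse_repeats(text: str) -> List[Interval]:
    """Parse rows of: scaffold, start, end, repeat class, score, strand."""
    out = []
    for line in text.strip().splitlines():
        chrom, start, end, name, _score, strand = line.split()[:6]
        out.append(Interval(chrom, int(start), int(end), name, strand))
    return out


def parse_exons(text: str) -> List[Interval]:
    """Keep only rows whose 7th column is 'exon' (layout assumed from the sample)."""
    out = []
    for line in text.strip().splitlines():
        fields = line.split()
        if fields[6] != "exon":
            continue
        out.append(Interval(fields[0], int(fields[1]), int(fields[2]), fields[3], fields[5]))
    return out


def exon_te_overlaps(
    exons: List[Interval], repeats: List[Interval]
) -> List[Tuple[Interval, Interval, int]]:
    """Return (exon, repeat, overlap length in bp) for every overlapping pair."""
    hits = []
    for exon in exons:
        for te in repeats:
            if exon.chrom != te.chrom:
                continue
            # Half-open intervals overlap when this difference is positive.
            overlap = min(exon.end, te.end) - max(exon.start, te.start)
            if overlap > 0:
                hits.append((exon, te, overlap))
    return hits


hits = exon_te_overlaps(parse_exons(TRANSCRIPTS), parse_repeats(REPEATS))
for exon, te, bp in hits:
    print(exon.chrom, exon.start, exon.end, exon.label, te.label, te.start, te.end, bp)
```

On these rows the only exonic hit is the Simple_repeat at 7516-7541, which falls inside the second exon. The other repeats are intronic or outside the transcript. The comparison is quadratic and ignores strand, which is fine for a toy example but not for a whole genome.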