I am trying to run jitterbug (https://github.com/elzbth/jitterbug) on matched tumor/normal WES data aligned to the hg38 reference genome, and I have encountered two problems.
The first problem is that jitterbug requires a gff indicating MEl locations in the human genome. Unfortunately, jitterbug only ships with an hg19 version of this gff. I have considered and tried several solutions:
A) I could lift the jitterbug gff over from hg19 to hg38. However, a liftover only converts the coordinates of the elements already present in the hg19 file, so the resulting gff is unlikely to cover all MEl locations identified in hg38.
B) According to the post "Transposable Elements positions in genome (GTF file)", I could simply download an hg38 MEl annotation from http://labshare.cshl.edu/shares/mhammelllab/www-data/TEtranscripts/TE_GTF/ . However, the Hammell lab also has a separate download section ( http://hammelllab.labsites.cshl.edu/software/#TEtranscripts ) where I can download a MEl gff as well, and that file differs in size from the one at http://labshare.cshl.edu/shares/mhammelllab/www-data/TEtranscripts/TE_GTF/ . This is confusing, because I do not know which of the two files to use.
C) I could download the annotation from UCSC, but I see no way to directly obtain a MEl gff there. It seems that only known genes can be downloaded as gff/gtf.
D) So I wondered whether I can generate such a MEl gff file myself with RepeatMasker. According to the post "Transform repeatmasker output into gff", RepeatMasker can write gff output directly. So do I simply run RepeatMasker on the reference genome fasta I used for the alignment and thereby get the MEl_hg38.gff?
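To make option D more concrete, here is a minimal sketch of what I have in mind if I end up converting RepeatMasker's tabular `.out` file myself instead of relying on its built-in gff output. It assumes the usual whitespace-separated `.out` layout (score, three percentages, sequence name, begin, end, left, strand, repeat name, class/family, ...). The function name `rmsk_out_to_gff`, the `keep_classes` filter, the feature type, and the `Name`/`Class` attribute keys are my own placeholders. I do not know which attributes jitterbug actually expects.

```python
def rmsk_out_to_gff(lines, keep_classes=("LINE", "SINE", "LTR", "DNA")):
    """Yield gff lines for transposable-element hits in RepeatMasker .out text.

    keep_classes is a hypothetical filter so that simple repeats,
    satellites and low-complexity regions are left out.
    """
    for line in lines:
        fields = line.split()
        # Header and blank lines are skipped: data rows have at least
        # 15 columns and start with an integer Smith-Waterman score.
        if len(fields) < 15 or not fields[0].isdigit():
            continue
        score = fields[0]
        chrom, start, end = fields[4], fields[5], fields[6]
        # RepeatMasker marks the reverse strand with "C".
        strand = "-" if fields[8] == "C" else "+"
        rep_name, rep_class = fields[9], fields[10]
        # Class/family looks like "LINE/L1"; filter on the class part.
        if rep_class.split("/")[0] not in keep_classes:
            continue
        # Attribute keys are assumptions, not a documented jitterbug format.
        attrs = f"Name={rep_name};Class={rep_class}"
        yield "\t".join([chrom, "RepeatMasker", "transposable_element",
                         start, end, score, strand, ".", attrs])
```

Usage would be something like `with open("hg38.fa.out") as fh: print("\n".join(rmsk_out_to_gff(fh)))`, where `hg38.fa.out` is an assumed file name. The `.out` coordinates are 1-based and inclusive, so they are copied into the gff unchanged.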
The second problem is that jitterbug requires an N_annot.gff3 file indicating the locations of Ns in the reference genome, which it uses to filter MEl calls. Can I obtain such a file via RepeatMasker as well?
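As a fallback that does not involve RepeatMasker, I assume the N runs could also be read straight off the reference fasta. Below is a rough sketch of that idea. The function names, the `min_len` parameter, the source/type columns (`N_scan`, `gap`), and the `ID` attribute are all my own choices, and I have not checked them against the N_annot.gff3 that ships with jitterbug.

```python
import re


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of fasta lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the sequence name.
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def n_runs_to_gff3(lines, min_len=1):
    """Yield one gff3 line per run of N/n that is at least min_len long."""
    for chrom, seq in read_fasta(lines):
        for match in re.finditer(r"[Nn]+", seq):
            if match.end() - match.start() < min_len:
                continue
            # Convert 0-based half-open match to 1-based inclusive gff coords.
            start, end = match.start() + 1, match.end()
            yield "\t".join([chrom, "N_scan", "gap", str(start), str(end),
                             ".", ".", ".", f"ID={chrom}_N_{start}"])
```

This holds one chromosome in memory at a time, which should be acceptable for hg38. It would be run as, for example, `with open("hg38.fa") as fh: print("\n".join(n_runs_to_gff3(fh)))`, where `hg38.fa` stands for whatever fasta was used for the alignment.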
Thanks a lot for helping me out!