HISAT2 splice site extraction produces blank files
2.8 years ago

Hi everybody,

I was trying to extract the splice sites from my GTF file, but had no success with either "hisat2_extract_splice_sites.py" or an "awk" command. Both options just gave me a blank file as output. My final goal is to use the splice sites for hisat2-build --ss --exon. For some reason, I could do this with human GTF files but not with HIV. I tried downloading files from several data sites, such as NCBI, the UCSC Genome Browser, Ensembl and ENA. Could someone suggest another data site for retrieving GFF/GTF files, or even explain how to make a new one myself? Note: I downloaded GFF files and then converted them with the gffread command.

Could you provide one example of a gtf that doesn't work? And the commands you used?

Hi! Sure! I used:

gffread -E hiv.gff -T  -o hiv.gtf #Conversion.

hisat2_extract_splice_sites.py hiv.gtf > splice_sites.txt     #Extract


These same commands worked perfectly with human GFF/GTF files. With the HIV file, however, I got "splice_sites.txt" as a blank file (zero bytes).

Awk Commands:

awk '{if ($3=="exon") {print $1"\t"$4-1"\t"$5-1}}' hiv.gtf > exonsFile.txt
awk '{if ($3=="intron") {print $1"\t"$4-2"\t"$5}}' hiv.gtf > ssFile.txt


About gtf, I tested tons of them. Sending some of them below:

https://www.ncbi.nlm.nih.gov/nuccore/KY112585.1

https://www.ncbi.nlm.nih.gov/genome/genomes/10319? (this one contains a list of possible assemblies)

2.8 years ago
h.mon 34k

The annotation you are using is from a virus, which has an extremely packed genome, and contains no introns. Thus, the ssFile.txt will be empty.
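
To illustrate why the output can be empty, here is a minimal Python sketch of how splice sites can be derived from a GTF: a junction only exists between two consecutive exons of the same transcript, so an annotation in which every transcript has a single exon yields nothing. This is a simplified approximation written for this thread, not the code of hisat2_extract_splice_sites.py itself. The function name `splice_sites_from_gtf` and the example records (sequence names, source column, transcript IDs, coordinates) are hypothetical, and the sketch assumes a tab-separated 9-column GTF whose attribute column carries a `transcript_id "..."` entry.

```python
import re
from collections import defaultdict


def splice_sites_from_gtf(lines):
    """Return sorted (chrom, left, right, strand) tuples.

    left/right are the 0-based positions of the last base of one exon
    and the first base of the next exon in the same transcript.
    """
    # (chrom, strand, transcript_id) -> list of (start, end), 1-based as in GTF
    exons = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        key = (fields[0], fields[6], match.group(1))
        exons[key].append((int(fields[3]), int(fields[4])))

    junctions = set()
    for (chrom, strand, _tid), spans in exons.items():
        spans.sort()
        # A transcript with one exon has no consecutive pair, so no junction.
        for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
            junctions.add((chrom, prev_end - 1, next_start - 1, strand))
    return sorted(junctions)


# Hypothetical records: two single-exon transcripts (virus-like annotation).
single_exon = [
    'HIV1\tsrc\texon\t100\t900\t.\t+\t.\ttranscript_id "t1";',
    'HIV1\tsrc\texon\t1000\t2000\t.\t+\t.\ttranscript_id "t2";',
]
# Hypothetical records: one transcript with two exons.
two_exons = [
    'chrX\tsrc\texon\t100\t200\t.\t+\t.\ttranscript_id "t3";',
    'chrX\tsrc\texon\t300\t400\t.\t+\t.\ttranscript_id "t3";',
]

print(splice_sites_from_gtf(single_exon))  # []
print(splice_sites_from_gtf(two_exons))    # [('chrX', 199, 299, '+')]
```

If every transcript in hiv.gtf has only one exon line, a procedure like this has nothing to report, which would be consistent with a zero-byte file.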

That's fair. So what is the solution? Should I use a text file I make myself with --known-splicesite-infile <path>? Because I think I am not supposed to get novel splice junctions or new junction annotations if I do not provide a file like those.