I was wondering if there is a way I can generate junctions.bed file from hisat2 just like the one in tophat.
Hisat2 documentation says use extract_splice_site.py python script but that gets splice site from gtf file, is there a way to know splice sites and no of reads associated with those sites using hisat2?
Easiest solution will be to use featureCounts and counts reads from exon-exon junctions by providing your aligned BAM files. Check manual page 35 for detailed information.
**-J (juncCounts)** Count the number of reads supporting each exon-exon junction.
Junctions are identified from those exon-spanning reads
(containing ‘N’ in CIGAR string)in input data. The output
result includes names of primary and secondary genes that
overlap at least one of the two splice sites of a junction. Only
one primary gene is reported, but there might be more than
one secondary gene reported. Secondary genes do not overlap
more splice sites than the primary gene. When the primary
and secondary genes overlap same number of splice sites, the
gene with the smallest leftmost base position is selected as
the primary gene. Also included in the output result are the
position information for the left splice site (‘Site1’) and the
right splice site (‘Site2’) of a junction. These include chromosome
name, coordinate and strand of the splice site. In
the last columns of the output, number of supporting reads is
provided for each junction for each library
Thanks, does it also help with reads spanning introns?
My aim to finally use DEXSeq...
I have no idea about reads spanning introns. But I guess you can find explanation in the manual.