I am analyzing paired-end stranded RNAseq data obtained from parasite cells. I would be interested in using HISAT2 for the alignment, and HTseq-count to count the features of the aligned reads.
Except for the intron min/max size, I used the default parameters for HISAT2. After getting the sam file from the alignment (therefore containing both mapped and unmapped reads), I converted sam file to bam and sorted the reads by name using Picard tool:
java -jar ~/Documents/Softwares/picard.jar AddOrReplaceReadGroups INPUT=$FILE/TopHat_vs_Hisat/Hisat/Plasmodium/accepted_hits.sam OUTPUT=$FILE/TopHat_vs_Hisat/Hisat/Plasmodium/accepted_hits_name.bam SORT_ORDER=queryname RGID=1 RGLB=Pberghei RGPL=Illumina RGPU=NextSeq RGSM=Pber-GFP RGCN=IPP
Next, I just used HTseq-count to count the reads:
htseq-count -f bam -r name -s reverse -a 10 -t exon -i Parent -m union $FILE/TopHat_vs_Hisat/Hisat/Plasmodium/accepted_hits_name.bam $FILE/Genome/PlasmoDB-33_PbergheiANKA.gff > $FILE/TopHat_vs_Hisat/Hisat/Plasmodium/counts_hisat.txt
However, I got this error:
38235 GFF lines processed. Error occured when processing SAM input (record #194 in file /Volumes/@home/Plasmodium_strains_RNAseq/Raw_WEHI/Plasmo-SPZ-RNAseq/tests/Aligned/TopHat/Pber1/TopHat_vs_Hisat/Hisat/Plasmodium/accepted_hits_name.bam): my_showwarning() takes from 3 to 5 positional arguments but 6 were given [Exception type: TypeError, raised in warnings.py:99]
I tried to sort the reads from sam file with samtools sort and also got this error. I also tried to remove the unmapped reads from the HISAT2 output (reads with mapq = 0) and it didn't work better. Finally, I used TopHat to align the reads and used the same pipeline (with Picard tools) and here, it works and counted correctly my features.
Does anyone has an idea of the problem with the reads aligned with HISAT2? Are they compatible with HTseq-count?
Thank you very much for any help.