Question: Error HISAT2 output for HTseq-count
0
gravatar for nicolas.merienne
2.2 years ago by
nicolas.merienne0 wrote:

Dear all,

I am analyzing paired-end stranded RNAseq data obtained from parasite cells. I would be interested in using HISAT2 for the alignment, and HTseq-count to count the features of the aligned reads.

Except for the intron min/max size, I used the default parameters for HISAT2. After getting the sam file from the alignment (therefore containing both mapped and unmapped reads), I converted sam file to bam and sorted the reads by name using Picard tool:

java -jar ~/Documents/Softwares/picard.jar AddOrReplaceReadGroups INPUT=$FILE/TopHat_vs_Hisat/Hisat/Plasmodium/accepted_hits.sam OUTPUT=$FILE/TopHat_vs_Hisat/Hisat/Plasmodium/accepted_hits_name.bam SORT_ORDER=queryname RGID=1 RGLB=Pberghei RGPL=Illumina RGPU=NextSeq RGSM=Pber-GFP RGCN=IPP

Next, I just used HTseq-count to count the reads:

htseq-count -f bam -r name -s reverse -a 10 -t exon -i Parent -m union $FILE/TopHat_vs_Hisat/Hisat/Plasmodium/accepted_hits_name.bam $FILE/Genome/PlasmoDB-33_PbergheiANKA.gff > $FILE/TopHat_vs_Hisat/Hisat/Plasmodium/counts_hisat.txt

However, I got this error:

38235 GFF lines processed.
Error occured when processing SAM input (record #194 in file /Volumes/@home/Plasmodium_strains_RNAseq/Raw_WEHI/Plasmo-SPZ-RNAseq/tests/Aligned/TopHat/Pber1/TopHat_vs_Hisat/Hisat/Plasmodium/accepted_hits_name.bam):
  my_showwarning() takes from 3 to 5 positional arguments but 6 were given
  [Exception type: TypeError, raised in warnings.py:99]

I tried to sort the reads from sam file with samtools sort and also got this error. I also tried to remove the unmapped reads from the HISAT2 output (reads with mapq = 0) and it didn't work better. Finally, I used TopHat to align the reads and used the same pipeline (with Picard tools) and here, it works and counted correctly my features.

Does anyone has an idea of the problem with the reads aligned with HISAT2? Are they compatible with HTseq-count?

Thank you very much for any help.

Nicolas

hisat2 rna-seq htseq-count • 1.6k views
ADD COMMENTlink written 2.2 years ago by nicolas.merienne0

I converted sam file to bam and sorted the reads by name using Picard tool

If that was sorted on read names then you would need to co-ordinate sort the file instead. Use featureCounts instead of HTSeq-count. It can sort the BAM files as needed during counting.

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by genomax71k
1

I'd also recommend STAR for mapping and read summarization. In my experience it is faster and more accurate than HISAT, and produces the same count table as htseq in union mode.

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by grant.hovhannisyan1.7k

Thank you for your replies. I tested by coordinate sorting but also got an error. I found in others topics that paired-end reads would rather require a name sorting for HTseq-count.

I am currently reading the manual for featureCounts (and, based on what I read before, really consider to use it rather than HTseq). But I was also interested in understanding the origin of this problem with HTseq-count, if someone also encountered it before.

I will also have a look to STAR. Thanks

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by nicolas.merienne0

Did you look at the record marked to see what, if anything, was wrong with it?

194 in file /Volumes/@home/Plasmodium_strains_RNAseq/Raw_WEHI/Plasmo-SPZ-RNAseq/tests/Aligned/TopHat/Pber1/TopHat_vs_Hisat/Hisat/Plasmodium/accepted_hits_name.bam

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by genomax71k

Yes, I checked directly on the bam file, including or excluding the header lines (I don't know if it was counted as rows by HTseq-count). In both cases, it is just a read, without anything special. In both cases, it has a mapq = 1 (multiple alignments), so it should not be included in the count (because -a 10 for HTseq-count). But it was not the first read with multiple alignments in the file, so I don't know why it stopped at this one.

ADD REPLYlink written 2.2 years ago by nicolas.merienne0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2090 users visited in the last hour