Question: HT-seq - what is ideal?
Hi guys,

I ran an HT-seq command and would like to cross-check it with you guys. I wonder what output is ideal? Below is the command:

htseq-count -f bam --idattr=gene -r pos /home/user/scratch60/STARresults/SRR7059136Aligned.sortedByCoord.out.bam /home/user/scratch60/NCBI_files/GCF_000001405.26_GRCh38_genomic.gff >/home/user/scratch60/HTseq_annotation/annotated_SRR7059136.txt

and the output is this

12600000 SAM alignment record pairs processed.
Warning: Mate pairing was ambiguous for 22805 records; mate key for first such record: ('SRR7059136.1152992', 'first', 'NC_000001.11', 135867, 'NC_000001.11', 493007, 357290).
12621898 SAM alignment pairs processed.

My questions are:

  1. Should I be concerned about missing mate encountered warnings? Is there an ideal number one should be aiming for?
  2. Am I right to run -r pos because my STAR command included --outSAMtype BAM SortedByCoordinate? I'm trying to understand the logic of this, if someone can explain it, it would be much appreciated!

Thanks guys!

  1. In an ideal world it'd be 0, but losing <1% won't affect anything.
  2. Sure, -r pos is the matching setting for a coordinate-sorted BAM, though you can just have STAR quantify things for you and not have to wait as long.
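
Both points can be sanity-checked from the command line. The sketch below is only an illustration: it takes the two numbers from the htseq-count log above and treats their ratio as a rough estimate of the affected share (the log counts records in one line and pairs in the other, so this is approximate). The helper `htseq_order_flag` is a hypothetical name introduced here, not part of HTSeq or samtools; it reads SAM header text and reports which `-r` value fits the sort order recorded in the `@HD` line.

```shell
#!/usr/bin/env bash

# Point 1: rough share of record pairs with ambiguous mate pairing,
# using the two numbers htseq-count printed in the log above.
ambiguous=22805
total=12621898
pct=$(awk -v a="$ambiguous" -v t="$total" 'BEGIN { printf "%.2f", 100 * a / t }')
echo "ambiguous mate pairing: ${pct}% of pairs"

# Point 2: map the sort order in the BAM header (SO: tag on the @HD line)
# to the matching htseq-count -r value. Reads SAM header text on stdin.
# htseq_order_flag is a hypothetical helper name used only in this sketch.
htseq_order_flag() {
    local so
    so=$(grep '^@HD' | sed -n 's/.*SO:\([a-z]*\).*/\1/p')
    case "$so" in
        coordinate) echo "pos" ;;      # sorted by position, e.g. STAR SortedByCoordinate
        queryname)  echo "name" ;;     # sorted by read name
        *)          echo "unknown" ;;  # unsorted or no @HD line
    esac
}

# Usage with a real file (requires samtools to be installed):
#   samtools view -H SRR7059136Aligned.sortedByCoord.out.bam | htseq_order_flag
```

With the numbers from this run the affected share comes out well under the 1% mentioned above, and a BAM written with --outSAMtype BAM SortedByCoordinate should carry SO:coordinate in its header, which is why -r pos is the right pairing mode here.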

In general HTSeq-count isn't used much these days because it's quite slow. Either have STAR do the counting for you or use featureCounts and you'll get the results quicker.
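
For reference, the two alternatives might look roughly like the sketch below. All paths and file names (the STAR index directory, the GTF annotation, the FASTQ names, the output prefix) are hypothetical placeholders, and the annotation is assumed to be in GTF format rather than the GFF used above. The commands are only assembled and printed, not executed, so they can be reviewed first.

```shell
#!/usr/bin/env bash

# Hypothetical placeholders: replace with your own files.
genome_dir=/path/to/STAR_index
gtf=/path/to/annotation.gtf          # assumed GTF, not the GFF from the question
r1=SRR7059136_1.fastq.gz             # assumed paired-end FASTQ names
r2=SRR7059136_2.fastq.gz
bam=SRR7059136Aligned.sortedByCoord.out.bam

# Option A: let STAR count reads per gene while it aligns.
# --quantMode GeneCounts writes a ReadsPerGene.out.tab file next to the BAM.
star_cmd=(STAR --runThreadN 4
          --genomeDir "$genome_dir"
          --readFilesIn "$r1" "$r2"
          --readFilesCommand zcat
          --outSAMtype BAM SortedByCoordinate
          --sjdbGTFfile "$gtf"
          --quantMode GeneCounts
          --outFileNamePrefix SRR7059136)

# Option B: run featureCounts on the BAM you already have.
# -p marks the data as paired-end; recent Subread releases may also need
# --countReadPairs to count fragments instead of reads (check your version).
fc_cmd=(featureCounts -p -T 4 -a "$gtf" -o SRR7059136_counts.txt "$bam")

# Print the assembled commands for review.
printf '%s\n' "${star_cmd[*]}" "${fc_cmd[*]}"

# To actually run one of them:
#   "${star_cmd[@]}"
#   "${fc_cmd[@]}"
```

Option B is the smaller change for this thread, since the alignment is already done and only the counting step is swapped out.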

Thanks Devon! That makes sense.

