HTSeq error: local variable 'NH' referenced before assignment
1
0
Entering edit mode
5.2 years ago
senowinski ▴ 30

I have some BAM files, which were trimmed using BBDUK:

java -ea -Xmx7174m -Xms7174m -cp jgi.BBDuk in=test.fastq.gz out=test.fastq.gz.trimmed_clean ref=polyA.fa.gz,truseq_rna.fa.gz k=13 ktrim=r useshortkmers=t mink=5 qtrim=r trimq=10 minlength=20

And aligned with STAR using the following example command:

STAR --genomeDir refgenome --readFilesIn test.fastq.gz.trimmed_clean --outFileNamePrefix test.fastq.gz.trimmed_clean_align --sjdbOverhang 73 --runThreadN 11 --outSAMtype BAM SortedByCoordinate  --twopassMode Basic --twopass1readsN -1 --outFilterScoreMinOverLread 0.5 --outFilterMatchNminOverLread 0.5

I sort and add on read groups (WTCHG_524616_7039_1) and after running ValidateSamFile I got a warning

java -jar /data/BCI-EvoCa2/jacob/Central/software/picard.jar ValidateSamFile I=output_test.sorted.bam MODE=SUMMARY

warning:

WARNING:MISSING_TAG_NM  7774993

which I fixed with:

samtools calmd -bAr test.sorted.bam ucsc.hg19.fasta > output_test.sorted.bam

However after when I run HTSeq to call counts using the following command:

python /data/BCI-EvoCa2/salpie/HTSeq-count-master/HTSeq/scripts/count.py -s no --format=sam output_test.sorted.bam /data/BCI-IBD/analysis/RNA/gencode.v19.annotation.gtf > test.sorted.sam.output

After the GTF file is read in I get the following error:

Error occured when processing SAM input (line 7461 of file output_test.sorted.sam):
  local variable 'NH' referenced before assignment
  [Exception type: UnboundLocalError, raised in count.py:201]

When I look convert the file to a sam and then rerun to find the line that's wrong in the sam file, it looks like this. Can anyone tell me what's wrong with it?

K00150:332:HVM5TBBXX:7:2124:9303:4532   16  chrM    221 255 43M10439N16M    *   0   0   GCTTGTAGGACATAATAATAACAATTGAATGTCTGCACAGCCCTAGTCTCAATCTCCAA FJJJFA7JJAJJJFFJJJJFJFFFJJFFFJJJFFJJJFFFFJJJJJJFF<-AJFF7<<A RG:Z:WTCHG_524616_7039_1    NH:i:1  HI:i:1  nM:i:1  AS:i:56 NM:i:1  MD:Z:42G16

Other lines look like this (before and after)

Line above

K00150:332:HVM5TBBXX:7:2124:19573:1930  16  chrM    221 255 52M *   0   0   GCTTGTAGGACATAATAATAACAATTGAATGTCTGCACAGCCGCTTTCCACA    EJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJFFFAA    RG:Z:WTCHG_524616_7039_1    NH:i:1  HI:i:1  nM:i:0  AS:i:51 ZQ:Z:E@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   NM:i:0  MD:Z:52

Line below

K00150:332:HVM5TBBXX:7:2124:11353:4532  16  chrM    221 255 56M *   0   0   GCTTGTAGGACATAATAATAACAATTGAATGTCTGCACAACCGCTTCCCACACAGA    EJJJJJFFJJJJJJJJJJJJJFJJJJJJFJJJJJJFFJ::JJJJ2/-/2FFAF<AA    RG:Z:WTCHG_524616_7039_1    NH:i:1  HI:i:1  nM:i:2  AS:i:51 ZQ:Z:E@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@PL@@@@X[][X@@@@@@@   NM:i:2  MD:Z:39G6T9

The 4th column in the line at fault reads "43M10439N16M" while the other okay lines read "52M " and "56M " is this causing the error and does anyone know how to fix this?

Thank you!

HTSeq RNA-Seq • 1.1k views
ADD COMMENT
1
Entering edit mode
5.2 years ago

This appears to be a bug in the version of HTSeq you're using. Your options are one of the following:

  1. See if there's a newer version and upgrade (if not, please report this bug to the author).
  2. Use a different tool, such as featureCounts from the subread package (this is much faster than htseq-count).
  3. Rerun STAR using quant mode, which can produce BAM files and counts at the same time.

I personally prefer option (2), but to each their own.

ADD COMMENT

Login before adding your answer.

Traffic: 2005 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6