Hi everyone,
I am working on mouse heart RNAseq data (paired-end) and using GSNAP aligner. Here is my commandline:
output of GSNAP: sample.bam
On the output of gsnap I use samtools fixmate to fix mate information
samtools fixmate sample.bam sample_unsortfix.bam
run htseq-count on sample_unsortfix.bam
(using -f 0x0002
for fetching only alignments that are mapped in proper pair)
samtools view -f 0x0002 sample_unsortfix.bam | htseq-count -s reverse - merged.gtf > sample.counts
On running this command I get the following error:
Error occured when processing SAM input (line 1630237):
("Malformed SAM line: MRNM == '*' although flag bit &0x0008 cleared", 'line 1630237')
[Exception type: ValueError, raised in _HTSeq.pyx:1323]
So I printed out the line 1630237 where error is caused:
samtools view -f 0x0002 sample_unsortfix.bam | awk "NR==1630237{print;exit}"
Line 1630237:
D5N1JJN1:213:C4HW1ACXX:6:1101:16916:99540 355 chr2 174330382 2 1S53M3777N10M35H * 0 128 GGCTGCAGAAGGACAAGCAGGTCTACCGGGCCACGCACCGCCTGCTGCTGCTGGGTGCTGGAGA CCFFFFFHHHHHGIIJJJIJJJJJJJJJJJIJJJJJJJJJIJJHHHHHFFFFFEDECBDDDBDC RG:Z:heart_adultmale MD:Z:63 NH:i:3 HI:i:2 NM:i:0 SM:i:2 XQ:i:40 X2:i:40 XO:Z:CM XS:A:-
To remove these lines, I changed my command line to this:
samtools view -f 0x0002 sample_unsortfix.bam | awk '!/\\t\\*\\t/' - | htseq-count -s reverse - merged.gtf > sample.counts
It worked before but now it is not working.
Does anybody know why am I getting this error and how can this be fixed?
UPDATE: I added an extra \
in the awk command before. I removed it and now it is working. This is the new command line that works and fixes the error:
samtools view -f 0x0002 sample_unsortfix.bam | awk '!/\t\*\t/' - | htseq-count -s reverse - merged.gtf > sample.counts
(I have added this as an answer too)
Line 1630237 from
samtools view sample_unsortfix.bam
won't be the same as that line fromsamtools view -f 0x2 sample_unsortfix.bam
.Oh sorry! Let me check and update the question. Thanks for pointing that out.