This is a question for users of htseq-count; a program for counting reads into gene annotation.
I am using version 0.6.1p1. There is an option '--samout' that looks like a simple way to identify actually which reads are used in the counting. The manual definition is:
"write out all SAM alignment records into an output SAM file called SAMOUT, annotating each line with its feature assignment (as an optional field with tag 'XF')"
However, I am only getting a file with the XF tag, e.g.:
XF:Z:rab-11.1 XF:Z:rab-11.1 XF:Z:col-81 XF:Z:col-81
Is anyone familiar with this option and able to comment? Perhaps the idea is to concatenate this to the existing SAM.
I have confirmed that the input SAM file is the same length as the --samout output. Therefore, is the rest of the SAM line missing intentionally?
EDIT 2 (command line used - file names changed for brevity):
htseq-count --mode=union --stranded=no --type=exon --idattr=gene_id --format=bam --order=name --samout=out.sam in.bam genes.gtf > out.txt
Just found that this has already been covered in seqanswers 'http://seqanswers.com/forums/showthread.php?t=45600'. It does appear to be a bug when trying to use a BAM file directly.