I am using samtools to sort my BAM files by position (the default), and then htseq-count to obtain read counts. Initially, I got a massive number of 'Mate records missing' warnings. I then realized that htseq-count assumes the files are sorted by name unless told otherwise, so I added the '-r pos' option and re-ran it. Now I get fewer 'Mate records missing' warnings, but they are still there. So my questions are:
1. Is there a way to eliminate the warnings completely?
2. Which of the following pipelines is better?
- samtools sort by name + htseq-count without '-r pos'
- samtools sort by position + htseq-count with '-r pos'
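
To make the two options concrete, here is a rough sketch of the commands I am comparing, written as a small Python helper that only builds the command lines and does not run them. The file names (sample.bam, genes.gtf), the output naming, and the helper name build_pipeline are placeholders made up for illustration; the '-o' and '-f bam' flags are how I assume the tools would be invoked, not something taken from my actual run.

```python
import os
import shlex


def build_pipeline(bam, gtf, order):
    """Return (sort_cmd, count_cmd) argv lists for one sort order.

    order is "name" or "pos"; the same value is used for both steps so the
    sort order of the BAM file matches what htseq-count is told to expect.
    Hypothetical helper: it only assembles commands, it does not execute them.
    """
    if order not in ("name", "pos"):
        raise ValueError("order must be 'name' or 'pos'")

    stem, _ext = os.path.splitext(bam)
    sorted_bam = f"{stem}.{order}_sorted.bam"  # placeholder output name

    sort_cmd = ["samtools", "sort"]
    if order == "name":
        sort_cmd.append("-n")  # -n sorts by read name; the default is by position
    sort_cmd += ["-o", sorted_bam, bam]

    # -r tells htseq-count how the input is sorted (its default is by name)
    count_cmd = ["htseq-count", "-f", "bam", "-r", order, sorted_bam, gtf]
    return sort_cmd, count_cmd


if __name__ == "__main__":
    for order in ("name", "pos"):
        for cmd in build_pipeline("sample.bam", "genes.gtf", order):
            print(shlex.join(cmd))
```

In the name-sorted case the '-r name' argument is redundant, since that is what htseq-count assumes by default; it is written out here only so the two pipelines are easy to compare side by side.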
I read the developer's comments at https://github.com/simon-anders/htseq/issues/37 but I still can't figure out how to fix my pipeline properly.