Hi everyone,
I want to use htseq-count to count the number of reads for the features I have, but htseq-count says:
If you have paired-end data, you have to sort the SAM file by read name first.
How can I do that if I don't want to use msort?
Regards, V
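Use samtools for sorting: samtools sort -n file.bam filesortedbyreadname (older samtools releases take an output prefix like this; samtools 1.x expects samtools sort -n -o filesortedbyreadname.bam file.bam)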
You can also use SortSam.jar from Picard tools:
java -jar /path_to_folder_picardtools/SortSam.jar INPUT=yourfile.sam OUTPUT=readSorted.sam SORT_ORDER=queryname VALIDATION_STRINGENCY=LENIENT
I have the same question. I sorted the .bam file and converted it to a .sam file, but that didn't work with htseq-count. Then I sorted the .sam file directly and gave that to htseq-count, which didn't work either. Do you have any other suggestions for sorting a SAM file (paired-end data) by read name?
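If neither samtools nor Picard is an option, one more thing to try is the standard Unix sort on the SAM text itself. This is only a rough sketch: sort_sam_by_name is a made-up helper name (it is not part of samtools, Picard, or HTSeq), and it assumes an uncompressed, tab-separated SAM file that fits comfortably on disk twice. The idea is to copy the header lines (those starting with @) unchanged and then sort the alignment lines on the first column, the read name, so that the mates of a pair end up next to each other.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: sort a SAM file by read name
# using grep and sort. The @HD header line is copied as-is, so any SO: tag
# in it will no longer describe the real order of the records.
sort_sam_by_name() {
    in_sam=$1
    out_sam=$2
    tab=$(printf '\t')

    # Header lines start with '@'; copy them first, in their original order.
    # grep exits non-zero when there is no header at all, hence "|| true".
    grep '^@' "$in_sam" > "$out_sam" || true

    # Sort the alignment records on column 1 (QNAME) only.
    # LC_ALL=C gives plain byte order; -s keeps records that share a name
    # in the order they had in the input.
    grep -v '^@' "$in_sam" | LC_ALL=C sort -t "$tab" -s -k1,1 >> "$out_sam"
}

# Example call (file names taken from the Picard command above):
# sort_sam_by_name yourfile.sam readSorted.sam
```

The -s (stable) option is supported by GNU and BSD sort but is not guaranteed everywhere, so check your system's sort before relying on it.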
Does 'samtools sort -n' properly handle paired-end data with multiple alignments per read (e.g. RNA-seq reads aligned by TopHat)?
Hi Malachi, I am having exactly the same problem. Did you sort out how to deal with this?
For htseq-count it won't actually matter as multimapping reads will be ignored in any case (htseq-count looks at the NH:i: auxiliary tag).
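To see how many records in your own file carry an NH value above 1, something like the following can help. It is a sketch, not part of htseq-count: count_nh is a made-up helper name, and it assumes an uncompressed, tab-separated SAM file in which the aligner writes the NH:i: tag among the optional fields (column 12 onwards). Records without the tag are reported separately, since this sketch makes no claim about how they are treated.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: tally alignment records by
# their NH:i: tag (number of reported alignments for the read).
count_nh() {
    awk -F '\t' '
        /^@/ { next }                       # skip header lines
        {
            nh = "missing"
            for (i = 12; i <= NF; i++)       # optional fields start at column 12
                if ($i ~ /^NH:i:/) nh = substr($i, 6)
            if (nh == "missing")   missing++
            else if (nh + 0 > 1)   multi++   # NH > 1: read aligned more than once
            else                   unique++
        }
        END { printf "unique=%d multi=%d missing=%d\n", unique, multi, missing }
    ' "$1"
}

# Example call (file name taken from the Picard command above):
# count_nh readSorted.sam
```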
You said that the SAM file has to be sorted, but the answer mentions a BAM file. Is that a typo?