I am new to RNA-seq analysis, and I am at the stage of sorting my BAM file after alignment. I need to use the sorted BAM file as input for HTSeq counting. I understand from the samtools documentation that I can sort either by name or by coordinate. Could someone please explain how to decide between the two? I read that the default is by coordinate, and that a coordinate-sorted BAM file will work as input for the "samtools index" command as well as for the HTSeq counting command. So it seems like a coordinate-sorted file is the best choice. However, is there any case where a name-sorted file is better? What does one need to consider? Thank you for your help.
However, is there any case where a name sorted file is better?
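Yes: when one needs to quickly retrieve a set of paired reads by their name. In a name-sorted file the two mates of a pair sit next to each other, so they can be read together without searching the rest of the file.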
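To make the difference concrete, here is a minimal sketch in plain Python using made-up toy records (the `Read` tuple, the read names and the positions are all hypothetical, and nothing here parses a real BAM file). It only illustrates the ordering idea: after a coordinate sort the mates of a pair can be separated by other reads, while after a name sort they are adjacent. In practice the sorting itself would be done with `samtools sort` (coordinate, the default) or `samtools sort -n` (name). The real tools also order chromosomes by the BAM header and compare names slightly differently, so treat this as a simplification.

```python
from typing import List, NamedTuple


class Read(NamedTuple):
    """Toy stand-in for an alignment record (hypothetical, not a real BAM record)."""
    name: str   # read name (QNAME); both mates of a pair share it
    chrom: str  # reference sequence name
    pos: int    # leftmost mapping position


# Two read pairs, readA and readB, in arbitrary order.
reads = [
    Read("readB", "chr1", 100),
    Read("readA", "chr1", 150),
    Read("readB", "chr1", 300),
    Read("readA", "chr2", 50),
]


def sort_by_coordinate(records: List[Read]) -> List[Read]:
    """Order by (chromosome, position). Simplified: chromosomes compared as strings."""
    return sorted(records, key=lambda r: (r.chrom, r.pos))


def sort_by_name(records: List[Read]) -> List[Read]:
    """Order by read name, so mates of a pair end up next to each other."""
    return sorted(records, key=lambda r: r.name)


def mates_adjacent(records: List[Read]) -> bool:
    """Return True if every read name occurs in one contiguous run."""
    seen = set()
    previous = None
    for record in records:
        if record.name != previous:
            if record.name in seen:
                return False  # this name appeared earlier, with other reads in between
            seen.add(record.name)
            previous = record.name
    return True


if __name__ == "__main__":
    print("coordinate:", [r.name for r in sort_by_coordinate(reads)])
    print("name:      ", [r.name for r in sort_by_name(reads)])
    print("mates adjacent after coordinate sort:", mates_adjacent(sort_by_coordinate(reads)))
    print("mates adjacent after name sort:      ", mates_adjacent(sort_by_name(reads)))
```

As far as I know, htseq-count lets you state which order the input is in through its `-r` / `--order` option (`name` or `pos`), so it is worth checking that this setting matches the way the file was actually sorted.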
Hi, suppose I have a BAM file and a VCF file containing variant-calling results. I want to extract only the reads, together with their mates, that support the variant alleles in the VCF. It would be nice to get those reads in BAM format.
I would like to retain the actual SEQ and QUAL fields in the SAM/BAM file after it has been filtered.