Question: Confusing result of samtools sort
younglin113 wrote (4 weeks ago):

Recently, I was working on an RNA-seq project that used spike-in control sequences, so I mapped the reads to both the human genome and the spike-in sequences. I got the BAM file without problems, but when I ran samtools sort -o aa_sorted.bam aa.bam to sort it by chromosome name, the chromosomes came out in a confusing order. Here is the sorted BAM file's header:

[Image: header of the sorted BAM file, with several chromosomes marked with a red frame]

As you can see, the chromosomes marked with a red frame are placed before chromosome 1. This result is really confusing. Can anyone help me here? Thanks in advance.

Devon Ryan (Freiburg, Germany) wrote (4 weeks ago):

samtools sort doesn't change the order of the chromosomes; it just sorts the alignments according to that order. Many tools require that the order of the chromosomes match between the FASTA and BAM files, so if you start playing around with that order you're likely to run into problems. Note that you CAN do this, but you'll need to do something like:

# make a new header with the chromosomes in the order you'd like
# name it "header"
cat header <(samtools view original_sorting.bam) | samtools sort -o desired_sorting.bam -
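
The "header" file in the command above has to be built by hand. One possible way to do that, sketched here under stated assumptions and not taken from the answer itself: list the contig names in the order you want in a plain text file (called order.txt below, a hypothetical name) and reorder the @SQ lines of the existing header to match. The function name reorder_sq_lines and the contig names in the comments are illustrative only. Contigs missing from the list are kept after the listed ones, so no alignment loses its reference.

```bash
# Reorder the @SQ lines of a SAM header read from stdin.
# $1: text file with one contig name per line, in the desired order
#     (assumed to be non-empty).
# Output: @HD line (if any), @SQ lines in the requested order, any @SQ
# lines not mentioned in the list (original order), then all other lines.
reorder_sq_lines() {
    awk -F'\t' '
        NR == FNR { if ($0 != "") want[++n] = $0; next }   # first file: desired order
        /^@SQ/ {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SN:/) {
                    name = substr($i, 4)                   # strip the "SN:" prefix
                    sq[name] = $0
                    seen[++k] = name                       # remember original order
                }
            next
        }
        /^@HD/ { hd = $0; next }
        { rest[++m] = $0 }                                 # @RG, @PG, @CO, ...
        END {
            if (hd != "") print hd
            for (i = 1; i <= n; i++)
                if (want[i] in sq) { print sq[want[i]]; emitted[want[i]] = 1 }
            for (i = 1; i <= k; i++)
                if (!(seen[i] in emitted)) print sq[seen[i]]
            for (i = 1; i <= m; i++) print rest[i]
        }
    ' "$1" -
}

# Possible usage together with the command above (requires samtools):
#   samtools view -H original_sorting.bam | reorder_sq_lines order.txt > header
```

The resulting header file can then be fed to the cat/samtools sort pipeline shown above.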

Oh, I see. Thanks a lot. So I just need to change the header and it will sort the way I want, is that what you mean?

(reply written 4 weeks ago by younglin113)

Yes, but don't reheader the BAM file (this will completely screw things up, since BAM records refer to chromosomes by their position in the header). Use something like what I outlined, where a SAM file is produced and then piped into samtools sort.

(reply written 4 weeks ago by Devon Ryan)
finswimmer (Germany) wrote (4 weeks ago):

Hello,

samtools sort sorts the reads by position within each contig. It doesn't sort the contigs themselves, because this is usually not necessary.
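
To make that concrete, here is a toy text-only imitation of the ordering, not samtools' actual implementation: each mapped record is ranked by where its contig appears among the header's @SQ lines, then by position. The function name coordinate_sort_demo is made up for illustration, and unmapped reads are simply skipped in this sketch.

```bash
# Toy illustration of coordinate ordering (NOT how samtools does it internally).
# $1: file holding a SAM header; stdin: SAM alignment records.
# Records are ordered by the rank of their contig in the header's @SQ lines,
# then by POS. Records whose RNAME is not in the header (e.g. "*") are dropped.
coordinate_sort_demo() {
    awk -F'\t' -v OFS='\t' '
        NR == FNR {                                   # first file: the header
            if (/^@SQ/)
                for (i = 2; i <= NF; i++)
                    if ($i ~ /^SN:/) rank[substr($i, 4)] = ++n
            next
        }
        !/^@/ && ($3 in rank) { print rank[$3], $0 }  # prefix each record with its contig rank
    ' "$1" - |
        sort -t "$(printf '\t')" -k1,1n -k5,5n |      # field 1 = rank, field 5 = POS
        cut -f2-                                      # drop the helper rank column
}
```

If a spike-in contig comes first in the header, its reads come first in the output, whatever its name is.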

The order of the contig names you see is given by the order of the contigs in the reference file you specified during alignment.
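
A quick way to check this for yourself is to compare the contig order in the BAM header with the order in the reference index. The sketch below assumes the reference is called ref.fa (a hypothetical name) and that samtools is installed; sq_names is just an illustrative helper that pulls the SN: values out of a header.

```bash
# Print the contig names from a SAM header on stdin, in header order.
sq_names() {
    awk -F'\t' '/^@SQ/ {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SN:/) print substr($i, 4)   # strip the "SN:" prefix
    }'
}

# Possible usage (requires samtools; ref.fa is a placeholder file name):
#   samtools view -H aa_sorted.bam | sq_names > bam_order.txt
#   samtools faidx ref.fa && cut -f1 ref.fa.fai > ref_order.txt
#   diff bam_order.txt ref_order.txt && echo "same contig order"
```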

fin swimmer


Thanks a lot. That really cleared up my doubts.

(reply written 4 weeks ago by younglin113)