Question: samtools sorting and indexing
ggman (United States) wrote, 15 months ago:

Hi friends,

I am trying to sort the BAM files I created from my bowtie SAM files. According to the error I receive after creating my BAM file, I am not indexing them appropriately:

random alignment retrieval only works for indexed BAM or CRAM files.

I understand I am supposed to index the files before sorting them.

    #creating the appropriate files
    samtools view -Sb sample.sam.pair > sample.pair
    samtools view -bt ~/bigdata/refgenome/genome.fa.fai - - | samtools sort sample.pair -o sample.pair.bam

    samtools view -Sb sample.sam.single > sample.single
    samtools view -bt ~/bigdata/refgenome/genome.fa.fai - - | samtools sort sample.single -o sample.single.bam

    #merge
    samtools merge sample.all.bam sample.pair.bam sample.single.bam -@ 2
    rm sample.pair sample.single

    #index the final bam
    samtools index sample.all.bam

Any help would be appreciated.

John (Germany) wrote, 15 months ago:

I think you're over-thinking things :)

You can only index BAM files by position, and only when the data is already sorted by position (don't ask...). So to sort by position, just do:

    samtools sort my.sam > my_sorted.bam

Then index with

    samtools index my_sorted.bam

It's as easy as that. If you want to merge the output files from bowtie, do that as the very first step, because I don't think samtools performs any optimisations for merging sorted BAMs/SAMs. Separately, I'd also recommend STAR or BWA-MEM over bowtie2, but that's just a personal preference at the end of the day.


With the latest samtools, that command should be: samtools sort -o sorted.bam initial.bam

(reply by genomax, 14 months ago)

Oh, they changed the syntax to be explicit!? Finally :D

(reply by John, 14 months ago)

Would this take my .fai file into account?

(reply by ggman, 15 months ago)

You are still over-thinking it: the FASTA and BAM indexes are two separate, independent things, and you don't need one to have the other.

Indexing allows for efficient data access and retrieval. The fasta index (.fai) is used to access and retrieve subsets of the fasta sequence, and the bam index (.bai) to access and retrieve subsets of the bam file.

(reply by h.mon, 15 months ago)

Oh my goodness... Thank you both for explaining this to me. I really appreciate it! I only kept bringing up my .fai file because my PI left me some code to base mine on, and it uses the .fai, but I couldn't understand how. Thank you.

(reply by ggman, 14 months ago)

You're very welcome - if you run into any more complications please don't hesitate to open another question :)

(reply by John, 14 months ago)