Question: BWA and Samtools indexing
Asked 4.6 years ago by lcc1844 (United Kingdom):

Is anyone able to explain the difference between using BWA to index a reference genome for aligning a FASTQ file and indexing the reference with the faidx command in Samtools after converting a SAM file to BAM format? I have used some Samtools commands successfully, but I am struggling to understand what the formatting steps actually do. Specifically, I don't understand what .fai files are, or what sorting and indexing the .bam file does.

If anyone is able to help me understand, I would be very grateful.

Many thanks 

 


Thank you for directing me to the previous post. 

I would still like to learn the purpose of this indexing. Why is it necessary to make a ref.fai file?

I have used BWA to make a SAM file, then used this samtools command to create a BAM file:

samtools view -bS aln.sam > aln.bam

I have since seen protocols which start from this point by making the ref.fai file, then converting .sam to .bam as follows:

samtools import hg19.fa.fai aln.sam aln.bam  

Is this normal? I thought the conversion had already been performed, so what are the steps with .fai files for?

The protocol then sorts and indexes the BAM file. Can the sorting and indexing be done directly after the first SAM-to-BAM conversion?

Thank you 

Comment written 4.6 years ago by lcc1844

You only need to index a FASTA file if you need random access to the sequence in it, for example to pull out a single region without reading the whole file. Otherwise, it serves little purpose.
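To make "random access" concrete, here is a minimal Python sketch of what a .fai index stores and how it is used. It is not samtools' own code: the function names (build_fai, fetch, demo) are made up for illustration, and it assumes every sequence line in a record except the last has the same length, which is also what samtools faidx requires. Each record gets four numbers: sequence length, byte offset of the first base, bases per line, and bytes per line including the newline.

```python
import os
import tempfile


def build_fai(fasta_path):
    """Return {name: (length, offset, linebases, linewidth)} for a FASTA file.

    Toy re-implementation of the idea behind `samtools faidx`; assumes
    uniform line lengths within each record (last line may be shorter).
    """
    index = {}
    name = None
    pos = 0  # byte position of the start of the current line
    with open(fasta_path, "rb") as handle:
        for raw in handle:
            if raw.startswith(b">"):
                name = raw[1:].split()[0].decode()
                # The first base sits right after the header line.
                index[name] = [0, pos + len(raw), 0, 0]
            else:
                bases = len(raw.rstrip(b"\r\n"))
                entry = index[name]
                if entry[2] == 0:  # first sequence line defines the layout
                    entry[2] = bases
                    entry[3] = len(raw)
                entry[0] += bases
            pos += len(raw)
    return {key: tuple(value) for key, value in index.items()}


def fetch(fasta_path, index, name, start, end):
    """Return bases [start, end) (0-based, half-open) by seeking, not scanning."""
    length, offset, linebases, linewidth = index[name]
    end = min(end, length)
    if start >= end:
        return ""

    def byte_pos(i):
        # Whole lines before base i, plus the position within its line.
        return offset + (i // linebases) * linewidth + i % linebases

    with open(fasta_path, "rb") as handle:
        handle.seek(byte_pos(start))
        chunk = handle.read(byte_pos(end) - byte_pos(start))
    return chunk.replace(b"\n", b"").replace(b"\r", b"").decode()


def demo():
    """Write a tiny two-record FASTA and pull regions out of it."""
    text = ">chr1 demo\nACGTACGT\nACGTAC\n>chr2\nGGGGCCCC\nTTTT\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "ref.fa")
        with open(path, "w", newline="\n") as out:
            out.write(text)
        index = build_fai(path)
        return (index,
                fetch(path, index, "chr1", 6, 10),
                fetch(path, index, "chr2", 4, 12))
```

The point is that fetch never reads the sequence before the requested region: the index turns a coordinate into a byte offset, and the file is read from there. On a multi-gigabase reference that is the difference between a seek and a full scan.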

The instances you've seen with samtools import are incredibly old and should not be used. Ignore them. The purpose of the .fai file in those cases was to act as a substitute for a possibly missing header in the SAM file. Unless your file is missing its header, there's absolutely no need to include the FASTA index (btw, the samtools view equivalent is the -t option).
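As a rough illustration of why a .fai file can stand in for a missing header: its first two columns are the sequence name and length, which is the information a SAM @SQ header line carries. The sketch below is an assumption-laden illustration in Python, not what samtools runs internally, and the function names are hypothetical.

```python
def sq_lines_from_fai(fai_text):
    """Build SAM @SQ header lines from the first two .fai columns.

    Illustrative only: column 1 is the sequence name, column 2 its length.
    """
    lines = []
    for row in fai_text.splitlines():
        if not row.strip():
            continue  # skip blank lines
        name, length = row.split("\t")[:2]
        lines.append(f"@SQ\tSN:{name}\tLN:{int(length)}")
    return lines


def has_sq_header(sam_text):
    """True if the SAM text already carries at least one @SQ line."""
    return any(line.startswith("@SQ\t") for line in sam_text.splitlines())
```

If has_sq_header is already true for your SAM file, as it normally is for BWA output, the .fai adds nothing to the conversion.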

Comment written 4.6 years ago by Devon Ryan

BTW, you can just pipe everything together (samtools 1.x syntax; older 0.1.x releases took an output prefix as the last argument rather than -o):

samtools view -u aln.sam | samtools sort -o sorted_file.bam -

You can also pipe the output of your aligner to that to avoid the useless SAM file altogether.
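On the original question of what sorting and indexing a BAM file is for, a toy Python analogy may help. It is emphatically not the real .bai format (which uses a binning scheme over compressed blocks), and the record layout and function names are invented for illustration. The idea it shows is that once records are in coordinate order, a small lookup structure lets you jump to a region instead of scanning every read.

```python
import bisect


def sort_by_coordinate(records):
    """Sort (chrom, start, read_name) tuples by chromosome, then position.

    Toy ordering: real samtools orders chromosomes as listed in the header,
    not alphabetically.
    """
    return sorted(records, key=lambda rec: (rec[0], rec[1]))


def build_index(sorted_records):
    """A stand-in 'index': just the sorted (chrom, start) keys."""
    return [(chrom, start) for chrom, start, _ in sorted_records]


def query(sorted_records, keys, chrom, start, end):
    """Names of reads on `chrom` whose start lies in [start, end).

    Binary search only works because the records are coordinate-sorted.
    """
    lo = bisect.bisect_left(keys, (chrom, start))
    hi = bisect.bisect_left(keys, (chrom, end))
    return [name for _, _, name in sorted_records[lo:hi]]
```

For example, with reads at chr1:10, chr1:120, chr1:300 and chr2:50, sorting and then calling query for chr1 from 100 to 301 touches only the two matching records. This is why tools that need region access expect a sorted, indexed BAM.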
 

Comment written 4.6 years ago by Devon Ryan
Answer written 4.6 years ago by Devon Ryan (Freiburg, Germany):

The two indices have absolutely nothing to do with each other. This was previously addressed in the thread "Is The Bwa Reference Indexing The Same Thing That Fasta Indexing With Samtools?"
