Using bwa index gives me 5 output files, but bwa mem asks for just 1
6.3 years ago
SaltedPork ▴ 170

I'm using 1000 Genomes pilot data and have downloaded 3 zipped files. I then pass all three to the same command, like this:

bwa index input_1.fastq input_2.fastq input_3.fastq

This produces 5 files: 2 are binary and 3 are fastq again.

But the usage for bwa mem asks for the index to be specified as 1 file. What's going on here? Thanks!

bwa BWA bwa-index • 8.4k views

Hi, I was building an index for bacterial genomes using this BWA command:

bwa index -a bwtsw bacterial_genomes.fna

This command had been running for the last 4 days, because the database is large, when a power outage suddenly interrupted it. I want to resume the command from the point it had reached before the outage. Please help me with this issue.


No, you can't do that. You have to start over. There is no way around this.

6.3 years ago
GenoMax 125k

One does not index fastq files. One would index the genome fasta that you want to search against using the fastq files.

Is that what you are trying to do?


I think so. When you say genome, are you referring to the human genome reference file? I have some 454 data and a file with LINE-1 transduction data, so, in simple terms, I'm trying to find all the transduction sites in the 454 data, which is why I thought I would need to index that.


A reference can be any fasta sequence. You would index that fasta file and then map your reads against that index. What format is the LINE-1 transduction data in? Is it a fasta file?


It's in fasta. When specifying the reference in the bwa mem command, do I use the original fasta or the 5 newly created files?


You will index first.

bwa index -p index_base my_fasta.fa


Then run bwa mem using the "basename" of the index files (in this case index_base; use any other name you like):

bwa mem [options] index_base file1.fastq
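
To illustrate why bwa mem takes one name while bwa index writes several files: the index files all share the prefix given with -p, and bwa mem is pointed at that prefix only. The sketch below assumes the five extensions written by bwa 0.7.x (.amb, .ann, .bwt, .pac, .sa); the helper name index_complete and the file names are hypothetical. It only checks that the files exist and are non-empty, so it cannot tell whether an interrupted run left a truncated file behind.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if all five index files exist
# (and are non-empty) for the given prefix.
# Assumes the file extensions written by bwa 0.7.x.
index_complete() {
    prefix=$1
    for ext in amb ann bwt pac sa; do
        [ -s "$prefix.$ext" ] || return 1
    done
    return 0
}

# Intended use (requires bwa on PATH; file names are placeholders):
#   bwa index -p index_base my_fasta.fa
#   index_complete index_base && bwa mem index_base file1.fastq > file1.sam
```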


Can the reference for bwa mem be one file containing several fasta sequences (e.g. if I wanted to map to 10 different protein-coding sequences at once), instead of running 10 separate bwa mem sessions, one per sequence?


It can be a multi-fasta file with all 10 sequences in one file. If these sequences are very similar then make sure you keep multi-mapping of reads in mind (so you can recover all hits).
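
A minimal sketch of building such a multi-fasta reference, assuming each sequence starts in its own fasta file that ends with a newline. The helper name count_fasta_records and all file names are hypothetical. Counting the header lines after concatenation is a quick sanity check that no record was fused with its neighbour. The -a option of bwa mem reports all found alignments for single-end reads (the extra ones are flagged as secondary), which is one way to keep multi-mapping hits visible.

```shell
#!/bin/sh
# Hypothetical helper: count the records in a FASTA file
# by counting its header lines (lines starting with '>').
count_fasta_records() {
    grep -c '^>' "$1"
}

# Intended use (file names are placeholders; bwa must be installed):
#   cat cds_01.fa cds_02.fa > all_cds.fa
#   count_fasta_records all_cds.fa    # should equal the number of sequences
#   bwa index -p all_cds all_cds.fa
#   bwa mem -a all_cds reads.fastq > reads.sam
```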


Thank you for your response! Some sequences are similar, but they differ in length. I have about 20 individual single-end fastq files for my 20 species. Normally, if I run bwa mem against 1 CDS, I obtain one mpileup file, so I would expect 10 different mpileup files, one for each CDS in my reference file. My test run hasn't finished yet and seems to be taking a long time, so I think I need to rework my code a bit. When I tested it earlier with a ref.fa file containing 2 CDS, I ended up with 1 mpileup file that looked messed up (as if the CDS had been merged together).

I am not sure whether bwa mem is reading each CDS as a separate reference sequence or is trying to map to all of them at once.
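
For context on the single mpileup file: fasta records are delimited by their ">" header lines rather than by line count, and the text output of samtools mpileup carries the reference sequence name in its first column, so one file covering every CDS is what a multi-fasta reference would be expected to give. If one file per CDS is wanted, a rough sketch is to split on that first column. The helper name split_pileup_by_ref and the file names are hypothetical, and the sketch assumes tab-separated pileup text and reference names that are safe to use as file names.

```shell
#!/bin/sh
# Hypothetical helper: split a pileup file into one file per reference sequence.
# Assumes tab-separated pileup text whose first column is the reference name,
# and that those names contain no characters that are unsafe in file names.
split_pileup_by_ref() {
    pileup=$1
    outdir=$2
    mkdir -p "$outdir"
    # awk truncates each output file when it is first opened, then appends to it.
    awk -F '\t' -v dir="$outdir" '{ print > (dir "/" $1 ".pileup") }' "$pileup"
}

# Intended use (file names are placeholders):
#   split_pileup_by_ref all_cds.pileup per_cds
#   ls per_cds/        # one .pileup file per reference name
```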