Question: bwa index gives me 5 output files, but bwa mem asks for just 1
SaltedPork110 wrote:

I'm using 1000 Genomes pilot data and have downloaded 3 zipped files. I then pass all three to the same command, so it looks like this:

bwa index input_1.fastq input_2.fastq input_3.fastq

This produces 5 files: 2 are binary and 3 are fastq again.

But the usage for bwa mem asks for the index to be specified as 1 file. What's going on here? Thanks!

bwa bwa-index • 5.3k views
modified 20 months ago by mirzaabid0 • written 3.5 years ago by SaltedPork110

Hi, I was building an index for bacterial genomes using this BWA command:

bwa index -a bwtsw bacterial_genomes.fna

The command had been running for the last 4 days, because the database is large, when a power outage suddenly interrupted it. I want to resume it from the point it had reached before the outage. Please help me with this issue.

written 20 months ago by mirzaabid0

No, you can't do that. You have to start over. There is no way around this.

written 20 months ago by genomax80k
genomax80k wrote:

One does not index fastq files. One would index the genome fasta that you want to search against using the fastq files.

Is that what you are trying to do?

written 3.5 years ago by genomax80k
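
A quick way to tell which kind of file you are about to hand to bwa index is to look at its first character: FASTA records start with ">" and FASTQ records start with "@". The sketch below assumes an uncompressed text file, and seq_format is a made-up helper name, not part of bwa.

```shell
# Guess whether a sequence file is FASTA or FASTQ from its first character.
# Assumes an uncompressed text file; seq_format is a hypothetical helper name.
seq_format() {
    first_char=$(head -c 1 "$1")
    case "$first_char" in
        '>') echo fasta ;;   # FASTA header line
        '@') echo fastq ;;   # FASTQ read name line
        *)   echo unknown ;; # empty, compressed, or some other format
    esac
}
```

For example, seq_format my_fasta.fa should print fasta for a reference that is suitable for indexing, while the zipped read files from the question would need to be decompressed first and would then report fastq.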

I think so. When you say genome, are you referring to the human genome reference file? I have some 454 data and a file with LINE-1 transduction data, so in simple terms I'm trying to find all the transduction sites in the 454 data, which is why I thought I would need to index that.

written 3.5 years ago by SaltedPork110

A reference can be any fasta sequence. You would index that fasta file and then map your reads against that index. What format is LINE-1 transduction in? Is it a fasta file?

written 3.5 years ago by genomax80k

It's in fasta. When specifying the reference file in the bwa mem command, do I use the original fasta or the 5 newly created files?

written 3.5 years ago by SaltedPork110

You will index first.

bwa index -p index_base my_fasta.fa

Then run bwa mem using the "basename" of the index files (in this case index_base; use any other name you like):

bwa mem [options] index_base file1.fastq
written 3.5 years ago by genomax80k
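
The five files are the index itself: bwa index writes PREFIX.amb, PREFIX.ann, PREFIX.bwt, PREFIX.pac and PREFIX.sa, and bwa mem only needs the shared prefix. Without -p the prefix is the FASTA file name, so passing the original FASTA path to bwa mem also works. As a rough sketch, the check below confirms that all five files exist for a prefix before mapping. index_complete is a hypothetical helper, and it only tests that the files are present, not that indexing finished cleanly.

```shell
# Check that all five files written by `bwa index -p PREFIX` are present.
# index_complete is a hypothetical helper name, not part of bwa.
index_complete() {
    prefix=$1
    for ext in amb ann bwt pac sa; do
        if [ ! -f "$prefix.$ext" ]; then
            echo "missing: $prefix.$ext" >&2
            return 1
        fi
    done
    return 0
}

# Typical use (commented out because it needs bwa and real input files):
# bwa index -p index_base my_fasta.fa
# index_complete index_base && bwa mem index_base file1.fastq > file1.sam
```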

Can the reference file for bwa mem be one file with a list of fasta sequences (i.e. if I wanted to map to 10 different protein-coding sequences at once), instead of running 10 different bwa mem sessions, one per sequence?

written 6 months ago by DNAngel40

It can be a multi-fasta file with all 10 sequences in one file. If these sequences are very similar then make sure you keep multi-mapping of reads in mind (so you can recover all hits).

written 6 months ago by genomax80k
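
A minimal sketch of building such a multi-fasta reference, assuming each sequence currently lives in its own FASTA file. build_multifasta and all file names here are placeholders. The function concatenates the inputs and prints the number of ">" header lines, which should match the number of sequences you expect.

```shell
# Concatenate FASTA files into one multi-FASTA and report how many records it holds.
# build_multifasta is a hypothetical helper; the file names below are placeholders.
build_multifasta() {
    out=$1
    shift
    # Each input is assumed to end with a newline, otherwise records would fuse.
    cat "$@" > "$out"
    grep -c '^>' "$out"
}

# Typical use (commented out because it needs bwa and real input files):
# build_multifasta all_cds.fa cds_01.fa cds_02.fa
# bwa index -p all_cds all_cds.fa
# bwa mem -a all_cds reads.fastq > reads.sam   # -a outputs all found alignments for single-end reads
```

The -a option is one way to keep multi-mapping hits visible, since the extra alignments are reported as secondary alignments. Whether you want them depends on how similar the sequences are.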

Thank you for your response! Some sequences are similar but they differ in length. I have about 20 individual single-end fastq files for my 20 species. Normally, if I run bwa mem for 1 CDS, I can obtain an mpileup file, so in a sense I should expect 10 different mpileup files, one for each line of CDS in my reference file. My test run hasn't ended yet and it seems to be taking a long time, so I think I need to rework my code a bit. When I tested it earlier with a ref.fa file containing 2 lines of CDS, I ended up with 1 mpileup file that looked messed up (like the CDS had been merged together).

I am not sure whether bwa mem is reading each line of CDS as a unique reference sequence or is trying to map to all of them at once.

written 6 months ago by DNAngel40
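
One plausible reading of the single "merged" file: each FASTA record (a ">" header plus its sequence lines) is a separate reference, and mpileup output puts the reference name in the first tab-separated column, so a single pileup can contain positions for every CDS in turn. If that is what happened here, the file can be split afterwards. The sketch below does so with awk. split_pileup_by_ref is a hypothetical helper, and it assumes reference names contain no "/" and that there are only a handful of references.

```shell
# Split a combined pileup into one file per reference sequence (column 1).
# split_pileup_by_ref is a hypothetical helper; it assumes reference names
# contain no "/" so they can be used as file names.
split_pileup_by_ref() {
    pileup=$1
    outdir=$2
    mkdir -p "$outdir"
    # Each distinct value in column 1 gets its own output file.
    awk -F '\t' -v dir="$outdir" '{
        print > (dir "/" $1 ".mpileup")
    }' "$pileup"
}
```

Calling split_pileup_by_ref all.mpileup per_cds would then leave one .mpileup file per CDS name in the per_cds directory.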