Question: Mapping Large FASTQ Files With BWA
Vikas Bansal (Berlin, Germany) wrote, 7.7 years ago:

I have FASTQ files for 10 samples. Each sample has 2 FASTQ files (paired-end), and on average each file is 4 GB compressed and 16 GB uncompressed, so the 20 uncompressed files total about 320 GB. I want to map them with BWA. I have 10 folders, each containing the 2 files for one sample.

Is it possible to give compressed FASTQ files to BWA as input?

What method would you use to map all these files quickly and easily?

Should I just split each file and then map it?

I have seen some posts like this and a tutorial, but did not find an efficient solution, and I think many people here do this often. I would really appreciate your help.
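
A minimal Python sketch of collecting the paired files from the 10 sample folders before mapping. The layout (one folder per sample, exactly two FASTQ files each) comes from the question; the function name `find_pairs` and the accepted file extensions are hypothetical, and the sketch assumes that sorting the two names puts read 1 first (true for names like sample_R1.fastq.gz / sample_R2.fastq.gz).

```python
from pathlib import Path

FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz", ".fastq", ".fq")


def find_pairs(root: str) -> dict[str, tuple[Path, Path]]:
    """Map each sample folder under `root` to its (read 1, read 2) files.

    Hypothetical layout: one sub-folder per sample with exactly two FASTQ
    files whose sorted names put read 1 first (e.g. *_R1 before *_R2).
    """
    pairs = {}
    for sample_dir in sorted(Path(root).iterdir()):
        if not sample_dir.is_dir():
            continue  # ignore stray files next to the sample folders
        fastqs = sorted(p for p in sample_dir.iterdir()
                        if p.name.endswith(FASTQ_SUFFIXES))
        if len(fastqs) != 2:
            raise ValueError(
                f"{sample_dir}: expected 2 FASTQ files, found {len(fastqs)}")
        pairs[sample_dir.name] = (fastqs[0], fastqs[1])
    return pairs
```

Calling find_pairs on the parent directory returns one entry per sample folder, which can then be fed to a mapping loop one sample at a time.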


What compute resources do you have available? A cluster or a single machine?

(reply by Sean Davis, 7.7 years ago)

I have a single machine with 32 GB RAM. I was thinking of mapping all samples at the same time in 10 "screen" sessions. Or should I do them one by one?

(reply by Vikas Bansal, 7.7 years ago)

You are probably better off running one-at-a-time and using multiple threads (approximately as many threads as you have cores), but you may need to experiment. The point, of course, is to have all the cores busy all the time.

(reply by Sean Davis, 7.7 years ago)
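
The one-sample-at-a-time approach could look roughly like the Python sketch below. It assumes bwa is on the PATH and that the reference has already been indexed with bwa index; bwa mem's -t option sets the thread count, and the gzipped FASTQ files are passed as-is because BWA reads gzip input directly. The function names and the one-SAM-per-sample output layout are hypothetical.

```python
import os
import subprocess
from pathlib import Path
from typing import Optional


def bwa_mem_command(ref: str, r1: str, r2: str, threads: int) -> list[str]:
    """Build a paired-end `bwa mem` command; r1/r2 may be .fastq.gz files."""
    return ["bwa", "mem", "-t", str(threads), ref, r1, r2]


def map_sequentially(ref: str, samples: dict[str, tuple[str, str]],
                     out_dir: str, threads: Optional[int] = None) -> None:
    """Map samples one at a time, each job using (by default) every core."""
    threads = threads or os.cpu_count() or 1
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, (r1, r2) in samples.items():
        # Only one bwa process runs at a time, so only one index copy is in RAM.
        with open(out / f"{name}.sam", "w") as sam:
            # bwa mem writes alignments to stdout; progress goes to stderr.
            subprocess.run(bwa_mem_command(ref, str(r1), str(r2), threads),
                           stdout=sam, check=True)
```

Writing plain SAM keeps the sketch short; with about 320 GB of input you would probably want to compress or sort the output as it is produced.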

Thanks for your reply. Could you please explain why running them one by one is better? I thought that with 10 screen sessions I could process all samples at the same time.

(reply by Vikas Bansal, 7.7 years ago)

You could run 10 samples at once, each using 1 core, or run the samples one at a time using 10 (or more) threads for each sample. The advantage of the second approach is that memory usage will be about 1/10 of the first, because only one copy of the BWA index is loaded instead of ten. The time to complete all 10 samples should be similar.

(reply by Sean Davis, 7.7 years ago)
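
A back-of-the-envelope Python sketch of the memory argument: each running bwa process loads its own copy of the index, so peak memory grows with the number of concurrent jobs. The ~5 GB per process figure is an assumption for a human-sized index, used only for illustration, and the sketch ignores the smaller per-thread working buffers; 32 GB is the RAM mentioned in the thread.

```python
def peak_index_memory_gb(concurrent_jobs: int, index_gb: float) -> float:
    """Rough peak RAM for the index alone: one copy per running bwa process."""
    return concurrent_jobs * index_gb


# ASSUMPTION: ~5 GB per bwa process for a human-sized index (illustrative only).
INDEX_GB = 5.0
RAM_GB = 32.0  # the single machine described in the thread

for jobs, threads_each in [(10, 1), (1, 10)]:
    need = peak_index_memory_gb(jobs, INDEX_GB)
    verdict = "fits in" if need <= RAM_GB else "exceeds"
    print(f"{jobs} job(s) x {threads_each} thread(s): ~{need:.0f} GB, "
          f"{verdict} {RAM_GB:.0f} GB RAM")
```

With these assumed numbers, 10 concurrent single-thread jobs come out at about 50 GB (more than the machine has), while one 10-thread job needs about 5 GB.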

Thanks a lot. I will try it.

(reply by Vikas Bansal, 7.7 years ago)
Leonor Palmeira (Liège, Belgium) wrote, 7.7 years ago:

[Edited]

Concerning compressed input in BWA, you should find your answer here: http://www.biostars.org/post/show/5474/bwa-index-on-all-human-grch37-sequences

Apart from that, files of this size are not that big, so you could process them separately (i.e., parallelization by data), which shouldn't take too long on a multi-core machine.
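
If you do decide to split the files (the parallelization by data suggested above, or the splitting asked about in the question), each split must respect FASTQ's 4-line records and keep the two mates in step. A minimal Python sketch, assuming standard 4-line records with mates listed in the same order in both files; split_fastq and the output naming are hypothetical. On a single machine, splitting probably gains little over bwa mem's own multithreading.

```python
import gzip
import itertools


def _open_text(path: str, mode: str):
    """Open plain or gzip-compressed FASTQ transparently in text mode."""
    return gzip.open(path, mode) if str(path).endswith(".gz") else open(path, mode)


def split_fastq(path: str, reads_per_chunk: int, out_prefix: str) -> list[str]:
    """Split a FASTQ file into gzipped chunks of `reads_per_chunk` records.

    Using the same `reads_per_chunk` on both mates keeps chunk N of read 1
    paired with chunk N of read 2, assuming both files share read order.
    """
    outputs = []
    with _open_text(path, "rt") as fh:
        for index in itertools.count():
            # A FASTQ record is exactly 4 lines: header, sequence, '+', quality.
            lines = list(itertools.islice(fh, 4 * reads_per_chunk))
            if not lines:
                break
            if len(lines) % 4 != 0:
                raise ValueError(f"{path}: truncated FASTQ record")
            out_path = f"{out_prefix}.{index:03d}.fastq.gz"
            with gzip.open(out_path, "wt") as out:
                out.writelines(lines)
            outputs.append(out_path)
    return outputs
```

Running it on both mates of a sample with the same chunk size gives matching chunk numbers, which can be mapped pair by pair and merged afterwards.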


Thanks. For compressed FASTQ files, it's clear now. Now I am looking for an efficient technique, as mentioned in my original post.

(reply by Vikas Bansal, 7.7 years ago)

Thanks a lot, Sean and Leonor.

(reply by Vikas Bansal, 7.7 years ago)
pinkiii1984v wrote, 7.7 years ago:

I too work with compressed files and it is possible to use them with BWA.
