Question: BWA alignment using paired end reads from different lanes
3.5 years ago, deepue (Finland) wrote:

Hi,

I need some help with how to proceed with a WES analysis.

From different posts on the forum, I gather that paired-end reads can be aligned separately and the resulting SAM files merged for further analysis. I would like to know how to handle data from different lanes when doing the alignment.

Should I merge the data from the different lanes into one file per read (L{1,2}_R1.fq into one file, L{1,2}_R2.fq into another), or should I align each read file of each lane separately (4 runs: L1_R1.fq, L1_R2.fq, L2_R1.fq, L2_R2.fq) and then merge the SAM files for further analysis?

Please advise.

Thanks

bwa lane alignment paired wes • 4.3k views
modified 2.8 years ago by Chris Cole • written 3.5 years ago by deepue

It's better to merge the FASTQ files and then perform the alignment.

modified 3.5 years ago • written 3.5 years ago by arno.guille

You mean merge the data from the different lanes L{1,2} of Read1.fq into a single file, and L{1,2} of Read2.fq into another file, right?

Thanks

modified 3.5 years ago • written 3.5 years ago by deepue
exactly   
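
A minimal sketch of that merge, written in Python for illustration. The important point is that the lanes are concatenated in the same order for R1 and R2, so that the n-th record of the merged R1 file still pairs with the n-th record of the merged R2 file. The helper names `concat_fastq` and `merge_lanes` and the `merged_R1.fq` / `merged_R2.fq` output names are hypothetical; the input names follow the question, and uncompressed FASTQ is assumed.

```python
import shutil
from pathlib import Path


def concat_fastq(inputs, output):
    """Concatenate FASTQ files byte-for-byte, in the order given."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return Path(output)


def merge_lanes(directory, lanes=("L1", "L2"), reads=("R1", "R2")):
    """Write merged_R1.fq and merged_R2.fq, using one lane order for both."""
    directory = Path(directory)
    merged = {}
    for read in reads:
        # Same lane order for every read file keeps mates in step.
        inputs = [directory / f"{lane}_{read}.fq" for lane in lanes]
        merged[read] = concat_fastq(inputs, directory / f"merged_{read}.fq")
    return merged
```

On the shell, `cat L1_R1.fq L2_R1.fq > merged_R1.fq` (and the same for R2) does the equivalent.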

written 3.5 years ago by arno.guille

Hello deepue!

It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?t=51448

This is typically not recommended as it runs the risk of annoying people in both communities.

written 3.5 years ago by Pierre Lindenbaum

Thanks, Pierre, for your advice. I am trying to delete the other post and will avoid cross-posting in future.

Could you please advise on this scenario? Thank you!

written 3.5 years ago by deepue
3.5 years ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

I don't agree with arno.guille: it's faster to map and sort (L1_R1.fq + L1_R2.fq) and (L2_R1.fq + L2_R2.fq) in *parallel* and then merge the sorted BAM files.

modified 3.5 years ago • written 3.5 years ago by Pierre Lindenbaum

Thank you Pierre.

I will run the BWA alignment 4 times, once for each Lane{1,2}_Read{1,2}.fq file, and then generate 2 .sam files, one per lane. I will then sort the 2 SAM files (Lane1, Lane2) separately and merge them into a single SAM file. Please let me know if this is correct.

Sorry, I couldn't understand the *parallel* part. Could you please explain it?

Thanks

written 3.5 years ago by deepue
Parallel = running two commands at the same time: see http://en.wikipedia.org/wiki/Parallel_computing, GNU `parallel` and/or the `make -j` option.

Yes, you can merge both SAM/BAM files with Picard MergeSamFiles.
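
As a rough illustration of "parallel" in Python (GNU `parallel` or `make -j` achieve the same from the shell), the sketch below runs one job per lane at the same time and collects the results in lane order. `run_lane` is a hypothetical placeholder for the real map + sort step; here it only returns the name of the sorted BAM it would produce, so the example runs without any aligner installed.

```python
from concurrent.futures import ThreadPoolExecutor


def run_lane(lane):
    """Placeholder for mapping and sorting one lane (hypothetical).

    A real version would launch the aligner and the sorter for
    {lane}_R1.fq / {lane}_R2.fq, e.g. via subprocess.run, and return
    the path of the sorted BAM it wrote.
    """
    return f"{lane}.sorted.bam"


def map_lanes_in_parallel(lanes, worker=run_lane, max_jobs=2):
    """Run `worker` on every lane, at most `max_jobs` at the same time."""
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        # pool.map yields results in the same order as `lanes`.
        return list(pool.map(worker, lanes))
```

The list returned for `["L1", "L2"]` is what would then be handed to the merge step.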

 

written 3.5 years ago by Pierre Lindenbaum

If you also need to run MarkDuplicates, you can skip the merge and give multiple BAM files to that tool.
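
A small sketch of what that looks like, assuming Picard's classic `KEY=value` argument style, in which `INPUT` can be repeated. The helper `mark_duplicates_cmd`, the `picard.jar` location and all file names are hypothetical; the function only builds the argument list and does not run anything.

```python
def mark_duplicates_cmd(bams, output, metrics, picard_jar="picard.jar"):
    """Build a Picard MarkDuplicates command with one INPUT per lane BAM."""
    if not bams:
        raise ValueError("at least one input BAM is required")
    cmd = ["java", "-jar", picard_jar, "MarkDuplicates"]
    cmd += [f"INPUT={bam}" for bam in bams]  # INPUT may be given several times
    cmd += [f"OUTPUT={output}", f"METRICS_FILE={metrics}"]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the per-lane BAMs are coordinate-sorted.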

written 3.5 years ago by Zaag

You're right that it's faster, but I'm not sure it is optimal in terms of memory usage. Of course, it depends on your hardware specs. In my case, if I run 4 alignments on the same node, I get a nice segmentation fault.

written 3.5 years ago by arno.guille

Hi

I ran the 4 alignments one by one and they all finished, but I am not sure whether they completed successfully; that question is still open in the thread "How to check whether 'bwa aln' succeeded ?". My second issue is that 'bwa sampe' completed on the data from Lane1 but aborted with a segmentation fault on the data from Lane2. Since you mentioned this error in the comment above, could you please explain how to handle it?

Thanks.  

modified 3.5 years ago • written 3.5 years ago by deepue

You probably get the segmentation fault because of a lack of memory. Let's say you have 16 GB of memory and 4 CPUs, and each alignment process takes 6 GB of memory. If you run 4 alignments at the same time, you will need 4 × 6 = 24 GB. Unfortunately, I have no solution except to increase the available memory or to run your alignments one by one. But in the latter case, you should probably merge the FASTQ files.
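
The same arithmetic can be turned into a quick check of how many alignments to launch at once. This is only a sketch: the 6 GB per alignment is the example figure used here, not a measured bwa requirement, and `max_concurrent_jobs` is a hypothetical helper.

```python
def max_concurrent_jobs(total_mem_gb, mem_per_job_gb, n_cpus):
    """How many alignments fit in memory at once, capped by the CPU count."""
    if mem_per_job_gb <= 0:
        raise ValueError("mem_per_job_gb must be positive")
    by_memory = int(total_mem_gb // mem_per_job_gb)
    return max(0, min(by_memory, n_cpus))
```

With the example numbers, `max_concurrent_jobs(16, 6, 4)` gives 2, so running the alignments two at a time would stay within memory where four at a time would not.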

written 3.5 years ago by arno.guille
2.8 years ago, abascalfederico (Spain) wrote:

I disagree with some of the answers: you should not merge the FASTQs from different lanes before aligning them. If you do, you will lose the "read group" (RG tag) information. I would align each lane separately, add a specific RG tag to each lane, and then merge the aligned BAMs (preserving the RG information). RG information is important for downstream analyses.
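
For illustration, a per-lane read group line could be built as below. This is a sketch under assumptions: the sample, library and platform values are made up, `read_group` is a hypothetical helper, and using the bare lane name as `PU` is a simplification (the platform unit is usually derived from flowcell and lane). The fields are joined with a literal backslash-t, which is the form bwa expects for a read group line passed on the command line (`bwa mem -R`, or `bwa sampe -r`).

```python
def read_group(sample, lane, library="lib1", platform="ILLUMINA"):
    """Build an @RG header line for one lane of one sample."""
    fields = [
        "@RG",
        f"ID:{sample}.{lane}",  # unique per lane
        f"SM:{sample}",         # same sample name for every lane
        f"LB:{library}",
        f"PL:{platform}",
        f"PU:{lane}",           # simplified platform unit (assumption)
    ]
    # Literal backslash + "t", not a real tab character.
    return "\\t".join(fields)
```

Each lane then gets its own `ID` while sharing the same `SM`, so the lanes remain distinguishable after the BAMs are merged.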

 

written 2.8 years ago by abascalfederico
2.8 years ago, Chris Cole (Scotland) wrote:

You don't have to use GATK for WES analysis, but you should at least read and understand the GATK best-practices workflow.

At the very least, you should align each lane separately, remove duplicates and then merge. As abascalfederico correctly says, you must use the RG (read group) tags to keep track of lane information.

written 2.8 years ago by Chris Cole