Different options for running BWA-MEM
18 months ago

Hi, I'm comparing several pipelines. Many of them use BWA-MEM, but with different options, and I don't fully understand what those options mean:

1. From CCDG: https://github.com/CCDG/Pipeline-Standardization/blob/master/PipelineStandard.md

bwa mem -Y

(followed by Picard FixMateInformation, then MarkDuplicates, and so on)

2. From the GATK Best Practices workshop poster: https://drive.google.com/drive/folders/1Nh73FzKde203gUoxyR9CmTd1EcVDMCI5

bwa mem -M

(followed by merging with an unmapped BAM file, then MarkDuplicates, and so on)

3. Parabricks: https://www.nvidia.com/en-us/docs/parabricks/germline/

bwa mem

(then MarkDuplicates, and so on)

From the BWA-MEM help output:

-Y            use soft clipping for supplementary alignments
-M            mark shorter split hits as secondary
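
To make the difference between the two options concrete: in the SAM FLAG field, a secondary alignment carries bit 0x100 and a supplementary alignment carries bit 0x800. By default bwa mem reports the extra pieces of a split read as supplementary, and with -M they are flagged as secondary instead. The snippet below is only a minimal sketch for decoding those two bits; `classify_flag` is a hypothetical helper name, not part of bwa or samtools.

```shell
# classify_flag is a hypothetical helper (not part of bwa or samtools).
# Bit values are taken from the SAM specification:
#   0x100 (256)  = secondary alignment
#   0x800 (2048) = supplementary alignment
classify_flag() {
    if [ $(( $1 & 0x800 )) -ne 0 ]; then
        echo "supplementary"
    elif [ $(( $1 & 0x100 )) -ne 0 ]; then
        echo "secondary"
    else
        echo "primary"
    fi
}

classify_flag 2064   # 2048 + 16: supplementary, reverse strand
classify_flag 272    # 256 + 16: secondary, reverse strand
classify_flag 99     # 1 + 2 + 32 + 64: an ordinary paired primary alignment
```

Running the helper over the FLAG column of the same reads aligned with and without -M should show the split hits moving from "supplementary" to "secondary".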

But I don't know why CCDG uses -Y together with FixMateInformation (while the others don't), or why GATK needs the unmapped BAM. Also, the bwa mem documentation says that -M is necessary for Picard MarkDuplicates, yet neither Parabricks nor CCDG uses that option.

Could someone please explain these differences? And which approach is the most reliable?

Thank you very much

Assembly alignment • 1.2k views

When you use -Y, the part of the read that belongs to the primary alignment is soft-clipped rather than hard-clipped in the supplementary alignment record. In my case I exploited that for split reads where a rearrangement occurred in the middle of the read. This means the clipped sequence is preserved in the BAM line of the supplementary alignment, and the CIGAR operation is a soft clip (S) instead of a hard clip (H).
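
A small sketch of what that means for the stored sequence, assuming a 100 bp read whose last 40 bases form the supplementary alignment (the CIGAR strings are made up for illustration). `stored_seq_length` is a hypothetical helper that sums the CIGAR operations whose bases are present in SEQ; hard-clipped bases are not stored, soft-clipped bases are.

```shell
# stored_seq_length is a hypothetical helper (not part of bwa or samtools).
# It sums the CIGAR operations whose bases appear in SEQ: M, I, S, = and X.
# Hard clips (H) are not stored in SEQ, so they are ignored.
stored_seq_length() {
    printf '%s\n' "$1" \
        | grep -oE '[0-9]+[MIS=X]' \
        | tr -d 'MIS=X' \
        | awk '{ total += $1 } END { print total + 0 }'
}

stored_seq_length 60H40M   # default: only the 40 aligned bases are kept
stored_seq_length 60S40M   # with -Y: all 100 bases are kept
```

With the soft-clipped record, a downstream tool can still see the 60 bases that aligned elsewhere without having to look up the primary record.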


Thank you for clarifying the -Y flag. I'm comparing the GATK Best Practices output with the Parabricks output; the former misses some variants (sadly, important ones), so I'm quite worried about the quality of the alignment.

It's quite confusing, because it seems like the GATK Best Practices pipeline should have more information.


You are good to go with the default settings unless the specific downstream pipeline you want to run explicitly advises you to change them. The -M option was used some years ago, when Picard tools could not fully handle the way bwa marked split reads, but AFAIK this has been solved for quite some time. From what I know, the flag is typically not used anymore. I have never used -Y myself; as I said, unless a tool advises you to do so, I would leave it at the default settings, since tools assume a certain output.


Thank you for your comment. What about the use of the unmapped BAM file? AFAIK, only the GATK Best Practices pipeline makes use of it.


Hello nguyenhy258!

It appears that your post has been cross-posted to another site: https://bioinformatics.stackexchange.com/questions/13328/different-options-in-running-bwa-mem

This is typically not recommended as it runs the risk of annoying people in both communities.
