Variant calling after mapping via bwa mem
4.8 years ago
reza ▴ 270

After mapping to the reference with "bwa mem", the downstream analyses in my project are variant calling with samtools and CNV detection with CNV-seq. In your opinion, are the default bwa settings appropriate for my goal? Must I use -M in "bwa mem" (I want to mark duplicates with Picard after mapping to the reference)?

This project is my first NGS analysis, and I would be grateful for your help.

next-gen snp alignment • 2.2k views
You need to learn how to use Google. This is not exactly rocket science, and you can find guidance on the internet, for example this htslib workflow page and this samtools page. If you have more specific questions, you can ask them here.

I can use Google and I know where to find the program manuals. I am learning bioinformatics, and I need the experience of bioinformaticians more than the manuals. There are people here who, regardless of the question or the questioner's level, simply try to answer and help. (Forgive my weak English; it is not my native language.)

So instead of going through a manual, you prefer that someone here spends time typing it out for you again. That's not how Biostars works.

This is a "blinded" question: you are not telling us anything about your experiment or goals. What is the organism? What sequencing platform did you use? What depth did you sequence to? Was it PCR-amplified? It's not really possible to help with no knowledge of the situation.

4.7 years ago
dyollluap ▴ 310

For most standard bwa alignments it is best to run with the default settings, unless you have a specific requirement and understand the parameters you're tinkering with.
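To make the defaults-versus-`-M` choice concrete, here is a minimal sketch that only builds the `bwa mem` command line and runs nothing. The helper name `bwa_mem_cmd`, the file names, the thread count, and the read-group values are all placeholders I made up for illustration. The only option toggled is `-M`, which the bwa manual describes as marking shorter split hits as secondary for Picard compatibility. Whether you need it depends on your Picard version and pipeline, so treat this as a starting point rather than a recommendation.

```python
import shlex


def bwa_mem_cmd(ref, reads1, reads2=None, sample="sample1",
                threads=4, mark_secondary=False):
    """Build a bwa mem command as an argument list (hypothetical helper; runs nothing)."""
    cmd = ["bwa", "mem", "-t", str(threads)]
    if mark_secondary:
        # -M: mark shorter split hits as secondary (for Picard compatibility)
        cmd.append("-M")
    # -R: read-group header line; bwa expects a literal backslash-t between fields.
    # ID and SM values here are placeholders, not taken from the original post.
    cmd += ["-R", f"@RG\\tID:{sample}\\tSM:{sample}"]
    cmd += [ref, reads1]
    if reads2:
        # Second FASTQ for paired-end data
        cmd.append(reads2)
    return cmd


# Default alignment settings (no -M)
default_cmd = bwa_mem_cmd("ref.fa", "reads_1.fq.gz", "reads_2.fq.gz")
# Same alignment, but with -M added for a Picard-based duplicate-marking step
picard_cmd = bwa_mem_cmd("ref.fa", "reads_1.fq.gz", "reads_2.fq.gz",
                         mark_secondary=True)

print(shlex.join(default_cmd))
print(shlex.join(picard_cmd))
```

Everything else stays at the bwa defaults in both commands. The output of either one would normally be piped into `samtools sort` before marking duplicates.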

4.7 years ago
kirannbishwa01 ★ 1.3k

Hi @reza

I think it would benefit you more if you could use the GATK pipeline.

Select the best practices and then go through the workflow documentation. Also, remember that GATK is an empirical method designed with the human genome in mind, so you may need to deviate from the pipeline in parameters and steps, depending on your goal. Read the discussions, Q&A, and comments as well to work out what you need to do.

Using an empirical pipeline makes things easier, but everyone has different goals, so the results you get should be scrutinized. I mean you should doubt the results you get from empirical pipelines: in biology and genetics anything can happen, and that is the main point of doing biology and using bioinformatics.

:)

