samtools variant calling workflow
1
2
Entering edit mode
5.5 years ago
mirza ▴ 180

Hi,

I am working on variant calling on fungal genomes. I have Illumina HiSeq reads. I am new to this and am following this workflow

Step 1: QC of raw reads performed using FastQC tool

Step 2: Preprocessing of raw reads performed using Trimmomatic-v0.36 tool

Step 3: QC of clean reads performed using FastQC tool

Step 4: Alignment to reference genome using bowtie2-v2.2.6 tool

Step 5: SAM to BAM (alignment files) conversion using samtools-v1.3.1

Step 6: Remove duplicates using sambamba-0.6.6 tool

Step 7: coordinate sorting of bam files with samtools

Step 8: Variant calling performed using samtools/bcftools

Step 9: variant filtering with bcftools

In this post Workflow Or Tutorial For Snp Calling? and other variant calling related resources I found 2 additional steps before variant calling i.e. local realignment and base quality recalibration.

1. Are these steps essential?
2. I found that these option are not available in samtools but in GATK. How can I perform these steps for my data?
variant calling samtools realignment • 3.0k views
2
Entering edit mode
5.5 years ago

Local realignment is not needed anymore with GATK latest pipeline using HaplotypeCaller. You may just forget it as it's utility is only in calling SNPs nearby Indels, and there is no definite guideline how much it helps when using with tools other than GATK.

Base quality re-calibration can still give you some advantages, as it claims to rectify the base call probabilities, which are used by almost all of the variant callers to give more confidence to the called variant. It is easy to do, provided you have GATK pipeline set-up at your end.

(TL;DR: See "Creating a recalibrated BAM" in above doc)

PS: Your reads must have RG-tags in bam-files to use most of the GATK protocols/tools. You may search my earlier post regarding that. PPS: If it looks too troublesome, you may ignore it too as this not an obligatory step.

0
Entering edit mode

Hi Santosh,

I have already completed upto Step7 of the work flow above using the tools mentioned and no I don't have GATK installed. What would you suggest in this case? I am supposed to use this work flow maybe coz they have been using this here.

0
Entering edit mode

You may proceed with your current pipeline, also because doing everything GATK way will need some time to learn. However, it is better to learn GATK in long-term as that is more or less standard now. You may start from here for GATK https://software.broadinstitute.org/gatk/best-practices/

0
Entering edit mode

I think I can add these 2 steps in my work flow (using gatk) before the 8th step i.e. actual variant calling. I read someone do it in some post here on Biostar. What would you suggest?

0
Entering edit mode

My opinion is that it will not harm you in any case. So you should try it if you could.

0
Entering edit mode

Thanks, I appreciate.

0
Entering edit mode

@Santosh Hi, GATK requires read group tag @RG. I have aligned my reads using bowie2 (without the use of RG tag, as I never required it earlier) and completed the 7 steps of my workflow. Now, I read that we can add RG using picard but the big question, what values I will give in the command? I have no clue. My bam files look like this,

\$ samtools view  brs11b_dr.bam | head -n 10

ST-E00251:108:HLFTVCCXX:4:1101:14722:2733   83  scaffold17  386238  42  150M    =   386080  -308    AGAGGTCCAAGGGGGTCATGTACATCGTCAAGCAACGCACACCAGGCGCGACATCCAAAAATCAAGACCAGAAATAGAAAAAGTAGATAAAAAAAATGAAAAGAACAGATCGTAACTATGACTGGCCTCCCCGCCTAGAAATAGACCCAA  KKKKKKFA7KFKFKKKKKKKKKKKKKKKKKKKKKKKK<KKFFF<KKKKKKKKKFKKKKKKKFFFKKKFAKKKKKFKKKKKKKKKKKFFKKKKKKKKKKFKKKKKK7FKKFFKKKKKKKKKKKKKKKFKKFKKKKKKKKKKKKKKKFFFAA  AS:i:-17    XN:i:0  XM:i:3  XO:i:0  XG:i:0  NM:i:3  MD:Z:12C13A73G49    YS:i:-5 YT:Z:CP
ST-E00251:108:HLFTVCCXX:4:1101:14722:2733   163 scaffold17  386080  42  150M    =   386238  308 ATGAAACAAGGTAATCAACTGAATTGTGTTGCTATTGTATTGAACAAAGATAGCTAAACCATTTACACTGAAAGCGCGTGAGTCATCCAAAAGACAAACCAAAGCCAAGACGTCAGTAGCGCATTATCAACCTAAAGCACAACATTTACA  AAFFFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFKKKKKKKKKKKKKKKKKFKKKKKKKKKAFFKKKFFFFKKKKKKKAFFKKKAFFKKKFFKKKFAAFKKKFKK7AFFKFFAFKKKKKKKKAFKKAFAAKAF<FFKKKKFF  AS:i:-5XN:i:0   XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:88G61  YS:i:-17    YT:Z:CP
ST-E00251:108:HLFTVCCXX:4:1101:15595:2733   99  scaffold1   2536280 40  95M =   2536545 327 ATCCAAGACTATCTTAAGCGTATCGTCAAGTATACAAATGTTGAGGTTGGTCGGAATAGGATTCCGCAGGTGTTCTAAGGAGCTGACAACCACTT AAFFFKKKKKKKKKKKKKKKAFKKKKKKKKFKKKKFKKKFFAKKA,FKKKKKKKKAKKKKKKKKKAKKFFFKAKK<FKFFF,FKKKKKKKKKKFA AS:i:-5 XN:i:0  XM:i:1XO:i:0    XG:i:0  NM:i:1  MD:Z:20C74  YS:i:-17    YT:Z:CP
ST-E00251:108:HLFTVCCXX:4:1101:15595:2733   147 scaffold1   2536545 40  62M =   2536280 -327    CCCGGGTTAGTAATCCCATCTCTATCTTGATATGGCGAACTCACCCACGTTCGTACCCCTAG  FAAKKKKKKKKKF<KKKKKFKKFAFFKFF7KKKFFKKKKKKKKKFAKKKKKKKKKKKFFFAA  AS:i:-17    XN:i:0  XM:i:3  XO:i:0  XG:i:0  NM:i:3  MD:Z:51T2G2A4   YS:i:-5 YT:Z:CP


and so on...