Variants calling by bcftools
1
0
Entering edit mode
5.4 years ago
Mbillah ▴ 140

By this following command I want to detect SNPs and indel. Can any one check my step please? Alignment with mitochondria:

bwa mem -t 8 \
                 -R '@RG\tID:1\tLB:1\tPL:ILLUMINA\tSM:1' \
                 mt_reference.fasta data1.fq.gz data2.fq.gz > aln-mt.sam

Unmapped read:

samtools view -@ 12 -b -S -f 12 aln-mt.sam > unmapped.bam

Note: By this command, I remove mitochondria reads. If I use samtools flag 4, picard gives a mate unpaired error. So that I select samtools flag 12.

Mapped read:

samtools view -@ 12 -b -S -F 12 aln-mt.sam > mapped.bam

Note: By this command, I collect mitochondria reads. If I use samtools flag 4, picard gives a mate unpaired error. So that I select samtools flag 12.

BAM to Fastq

java -jar ~/Softwares/picard.jar SamToFastq \
                           I=unmapped.bam \
                           F=read_without_mt_1.fq \
                           F2=read_without_mt_2.fq

Alignment with Reads(Without mitochondria):

bwa mem -t 8 -R '@RG\tID:1\tLB:1\tPL:ILLUMINA\tSM:1' \
                           ref.fna \
                           read_without_mt_1.fq \
                           read_without_mt_2.fq >aln-pe.sam

Sort:

java -jar ~/Softwares/picard.jar SortSam \
                           I=aln-pe.sam \
                           O=sorted.bam \
                           SORT_ORDER=coordinate

Delete duplication:

java -jar ~/Softwares/picard.jar MarkDuplicates \
                           INPUT=sorted.bam \
                           OUTPUT=dedup.bam \
                           METRICS_FILE=dedup \
                           REMOVE_DUPLICATES=true \
                           CREATE_INDEX=true \
                           ASSUME_SORTED=true \
                           VALIDATION_STRINGENCY=LENIENT \
                           MAX_FILE_HANDLES=2000

Variants calling

bcftools mpileup -O b -f ref.fna \
                           -Q 20 -q 20 \
                           --annotate FORMAT/AD,FORMAT/DP \
                           --threads 4 dedup.bam | \
                           bcftools call -m -v -O z -o 1.vcf.gz -

Note : Now I am looking for filtering my snps and indel. please suggest the filtering parameter.

Count SNPs:

awk '! /\#/' variants.vcf | awk '{if(length($4) == 1 && length($5) == 1) print}' | wc -l

Count Indels:

awk '! /\#/' variants.vcf | awk '{if(length($4) > 1 || length($5) > 1) print}' | wc -l
SNP indel alignment • 2.2k views
ADD COMMENT
1
Entering edit mode

Hello Mbillah ,

could you please describe why you think that the two alignments are necessary? Why not just align to a reference that include mitochondria genome and "normal" genome?

fin swimmer

ADD REPLY
0
Entering edit mode

I think sometimes mitochondrial genome increase the read depth, so that I deleted it. Thank you

ADD REPLY
2
Entering edit mode
5.4 years ago
ATpoint 82k

There is no guarantee that removing a chromosome from the reference will cause the reads that in reality originate from that chromosome to be unmapped. In fact, the mitochondrial genome has quiet some homologs in the nuclear genome, so you only provoke false-positives. The reason is that the aligner still tries to find acceptable matches and may therefore consider the second best match (given the best match would be chrM) as the final match, which would be a false-positive here. My opinion on that is that you should include into the reference all the contigs that are available (excluding ALT contins, see C: Inclusion of ALT contigs and decoys in bwa alignment ) in the reference assembly. You can also notably shorten your code by using pipelines, see example code here: A: Help with .sam files _1,_2 from bowtie2 output into svDetect. The file you get when using this code snipped can directly go into bcftools. Duplicates are ignores by default if flagged properly (which the command does using samblaster).

ADD COMMENT

Login before adding your answer.

Traffic: 3443 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6