Question

Samtools consensus for WES files

0

Entering edit mode

4.9 years ago

sidjain • 0

Hi all,

I am new to bioinformatics analysis. I have a sorted BAM file which is WES. In order to get chromosome 1 from the given BAM file in fasta format, I use the following set of commands

samtools view -b file_name.bam chr1>chr1.bam

samtools mpileup -E -uf ~/hg38.fa chr1.bam > chr1.mpileup

bcftools call -mv -Oz chr1.mpileup>chr1.vcf.gz

tabix -p vcf chr1.vcf.gz

samtools faidx ~/hg38.fa chr1|bcftools consensus chr1.vcf.gz -o chr1.fa

In the obtained chr1.fa file, it seems that the non-exome parts of the sequence are also there which doesn't make sense. Further, given the human genome is diploid, shouldn't I be getting 2 fasta files??

Assembly Genome sequence alignment • 937 views

ADD COMMENT • link updated 4.9 years ago by swbarnes2 14k • written 4.9 years ago by sidjain • 0

score 0 · Answer 1 · 2019-06-18

0

Entering edit mode

4.9 years ago

swbarnes2 14k

In the obtained chr1.fa file, it seems that the non-exome parts of the sequence are also there which doesn't make sense.

I don't see a place where you filtered anything by exomic position or coverage, so I'm not sure why you would expect non-exomic regions to be filtered away.

Further, given the human genome is diploid, shouldn't I be getting 2 fasta files??

Ahh, no. That's not how it works. For a lot of reasons. bcftools consensus might be putting ambiguous letters in where your SNPs are, you'll have to look.

ADD COMMENT • link 4.9 years ago by swbarnes2 14k

0

Entering edit mode

Thanks. I see. So if the bam file is WES, the vcf file obtained will only have variants from the exome regions, however for the consensus command, we need to put a filter to remove non-exome regions. Am I correct?

ADD REPLY • link 4.9 years ago by sidjain • 0

1

Entering edit mode

You never get 100% reads on target, you will always have some reads that align to non-exome loci, and your software will try to call variants there. You can filter your the bam to only have reads that cross exons, of you could filter your vcf to only have exomic variants. I don't know if you can filter in the consensus command.

ADD REPLY • link 4.9 years ago by swbarnes2 14k