Question

How to identify monomorphic snp position

0

Entering edit mode

7.5 years ago

jeccy.J ▴ 60

Does anyone Help me out: How one could identify the monomorphic snps position from a vcf file? I used Parsnp alignment tool to generate VCF file ? Thanks advance.

snp • 6.7k views

ADD COMMENT • link updated 7.5 years ago by Jorge Amigo 14k • written 7.5 years ago by jeccy.J ▴ 60

0

Entering edit mode

what exactly do you mean with "monomorphic snps" positions? are you working with a multisample vcf file, and in that case do you want to look for positions that have the same alleles in all samples? or are you working with single sample vcf file, and you mean positions where the reference allele is the only one found? would you probably be looking for homozygous positions?

ADD REPLY • link 7.5 years ago by Jorge Amigo 14k

0

Entering edit mode

From my understanding Monomorphic SNPs : "A SNP for which a single form or allele can be identified in the population of interest". I aligned 100 bacterial genome (from different host ) to a reference by using Parsnp and called the snp . Now i would like to know how many monomorphic positions are shared across the host.

ADD REPLY • link 7.5 years ago by jeccy.J ▴ 60

0

Entering edit mode

ok, so you need to know which snps have the same allele in all samples. first step clear.

now we would need to know wether you have a single multisample vcf, or multiple single sample vcfs.

ADD REPLY • link 7.5 years ago by Jorge Amigo 14k

0

Entering edit mode

single multisample vcf

ADD REPLY • link 7.5 years ago by jeccy.J ▴ 60

0

Entering edit mode

then see my answer below

ADD REPLY • link 7.5 years ago by Jorge Amigo 14k

Jorge Amigo · Answer 1 · 2016-11-04

1

Entering edit mode

7.5 years ago

Jorge Amigo 14k

if you have a single multisample vcf file, then you can use bcftools to create a vcf file with 1-allele-only variants:

bcftools view -C1 file.vcf

you can also loop through all samples' genotypes and check if they're all the same. you could then print, for instance, the list of chromosome positions of those monomorphic variants:

grep -v ^# file.vcf \
| perl -lane '%gts = ();
foreach (@F[9..$#F]) { s/:.+//; $gts{$_} = 1 }
print join "\t", @F[0,1]" if scalar(keys %gts) == 1'

ADD COMMENT • link 7.5 years ago by Jorge Amigo 14k

0

Entering edit mode

I tried with your 1st suggestion but got some error about my vcf file format : Do you have any idea what could be the problem?

##FORMAT=<ID=DP,Number=1,Type=Integer,Description="# high-quality bases">
##FORMAT=<ID=DV,Number=1,Type=Integer,Description="# high-quality non-reference bases">
##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand bias P-value">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT
[bcf_sync] incorrect number of fields (0 != 5) at 0:0

ADD REPLY • link updated 7.5 years ago by Jorge Amigo 14k • written 7.5 years ago by jeccy.J ▴ 60

0

Entering edit mode

if you google that error, you will find answers like this one that may give you some hints.

ADD REPLY • link 7.5 years ago by Jorge Amigo 14k

0

Entering edit mode

Here is the header of my vcf file

##FILTER=<ID=IND,Description="Column contains indel">
##FILTER=<ID=N,Description="Column contains N">
##FILTER=<ID=LCB,Description="LCB smaller than 200bp">
##FILTER=<ID=CID,Description="SNP in aligned 100bp window with < 50% column % ID">
##FILTER=<ID=ALN,Description="SNP in aligned 100b window with > 20 indels">
##FILTER=<ID=REC,Description="PhiPack">

ADD REPLY • link updated 7.5 years ago by Jorge Amigo 14k • written 7.5 years ago by jeccy.J ▴ 60

0

Entering edit mode

that is not a full vcf header. I would recommend you to go for the second suggestion, but if the first one gives you an error is probably due to a malformed input vcf. you need to make sure your input is correct before parsing it.

ADD REPLY • link 7.5 years ago by Jorge Amigo 14k