Question: Problem with allele number in vcf
BAGeno wrote (16 months ago):

Hi,

I have a VCF of 1000 samples, but I am facing the problem that the allele number (AN) differs from site to site. Some samples have dots instead of 0 and 1 in the genotype column. Can anyone please tell me how I should correct this problem of different allele numbers?

Tags: allele number, genotype, vcf

Hello BAGeno,

you can do this with bcftools:

$ bcftools +fixploidy input.vcf > fixed.vcf

More options are available that might be useful. Have a look at:

$ bcftools +fixploidy -h

fin swimmer


EDIT:

I moved my post to a comment, because I first thought that you had `.` as genotype and wanted `./.`. If so, the above solution should work (and I can move my post back to an answer). If you already have `./.` in your VCF, then see the comment by Ram on what this means.

Reply by finswimmer, 16 months ago
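If the original reading applies, i.e. a bare `.` in GT where a diploid `./.` is expected, the kind of rewrite meant here can also be illustrated with plain awk. This is a swapped-in sketch, not how `+fixploidy` works internally; it assumes all samples are diploid and that GT is the first FORMAT subfield, and the file names and data are hypothetical.

```shell
#!/bin/sh
# Hypothetical two-sample VCF; sample S2 has a bare "." as its GT.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n1\t100\t.\tA\tG\t.\tPASS\t.\tGT:DP\t0/1:12\t.:0\n' > example.vcf

# Rewrite a bare "." GT as "./." in every sample column (column 10 onwards).
# Assumption: all samples are diploid and GT is the first FORMAT subfield.
awk 'BEGIN { FS = OFS = "\t" }
     /^#/ { print; next }
     $9 != "GT" && $9 !~ /^GT:/ { print; next }   # no GT field: leave line alone
     {
       for (i = 10; i <= NF; i++) {
         n = split($i, f, ":")
         if (f[1] == ".") {
           s = "./."
           for (j = 2; j <= n; j++) s = s ":" f[j]   # keep the other subfields
           $i = s
         }
       }
       print
     }' example.vcf > example.fixed.vcf
```

For real data the plugin is the safer choice, since it also knows about sex and non-diploid regions.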

TIL! I did not know this. What does this do exactly?

Reply by RamRS, 16 months ago

I can't tell much more than what the help text says:

$ bcftools +fixploidy -h   

About: Fix ploidy
Usage: bcftools +fixploidy [General Options] -- [Plugin Options]
Options:
   run "bcftools plugin" for a list of common options

Plugin options:
   -d, --default-ploidy <int>  default ploidy for regions unlisted in -p [2]
   -f, --force-ploidy <int>    ignore -p, set the same ploidy for all genotypes
   -p, --ploidy <file>         space/tab-delimited list of CHROM,FROM,TO,SEX,PLOIDY
   -s, --sex <file>            list of samples, "NAME SEX"
   -t, --tags <list>           VCF tags to fix [GT]

Example:
   # Default ploidy, if -p not given. Unlisted regions have ploidy 2
   X 1 60000 M 1
   X 2699521 154931043 M 1
   Y 1 59373566 M 1
   Y 1 59373566 F 0
   MT 1 16569 M 1
   MT 1 16569 F 1

   # Example of -s file, sex of unlisted samples is "F"
   sampleName1 M

   bcftools +fixploidy in.vcf -- -s samples.txt

So one can use it, for example, to make male samples have haploid genotypes on the X chromosome (0 or 1) while female samples keep diploid ones (0/0, 0/1, 1/1).

Reply by finswimmer, 16 months ago
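A sketch of preparing the two input files described in the help text: the `-p` file reuses the default table printed above (its coordinates look like GRCh37 with unprefixed chromosome names, so adjust them for other builds), and the `-s` file lists the male samples, since unlisted samples are treated as `F`. The sample names and `input.vcf` are hypothetical, and the plugin call runs only if bcftools is on the PATH.

```shell
#!/bin/sh
# Ploidy file: CHROM FROM TO SEX PLOIDY, copied from the default table
# shown by "bcftools +fixploidy -h".
cat > ploidy.txt <<'EOF'
X 1 60000 M 1
X 2699521 154931043 M 1
Y 1 59373566 M 1
Y 1 59373566 F 0
MT 1 16569 M 1
MT 1 16569 F 1
EOF

# Sex file: "NAME SEX"; unlisted samples are treated as "F".
# sampleName1 and sampleName2 are hypothetical sample names.
cat > samples.txt <<'EOF'
sampleName1 M
sampleName2 M
EOF

# Run the plugin only if bcftools is installed and the (hypothetical) input exists.
if command -v bcftools >/dev/null 2>&1 && [ -f input.vcf ]; then
  bcftools +fixploidy input.vcf -- -p ploidy.txt -s samples.txt > fixed.vcf
fi
```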

I have ./. in my VCF. Should I remove these calls from it? I have to do different population analyses. I did not call the variants myself, so I cannot change anything at that step.

Reply by BAGeno, 16 months ago

Should I remove these calls from my vcf.

This mainly depends on what exactly your goal is and how many samples have no calls at the sites where the variant was found.

Without knowing this there is no general answer.

fin swimmer

Reply by finswimmer, 16 months ago
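One common option, if it fits the analysis, is to drop sites where too large a fraction of samples has no call; with bcftools this can be written as a filtering expression, e.g. `bcftools view -i 'F_MISSING<0.1' input.vcf`. The same idea as a plain awk sketch, where the 0.25 threshold, file names and data are all hypothetical and a GT with no digit in it (`.`, `./.`, `.|.`) is counted as missing:

```shell
#!/bin/sh
# Hypothetical 4-sample VCF: site 100 has 1 missing GT, site 200 has 3.
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\tS4\n' > sites.vcf
printf '1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t./.\t0/0\t1/1\n' >> sites.vcf
printf '1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t./.\t./.\t0/1\t./.\n' >> sites.vcf

# Keep header lines and sites whose fraction of missing genotypes is <= MAX.
# Assumes GT is the first FORMAT subfield.
awk -v MAX=0.25 'BEGIN { FS = OFS = "\t" }
     /^#/ { print; next }
     {
       miss = 0
       for (i = 10; i <= NF; i++) {
         split($i, f, ":")
         if (f[1] !~ /[0-9]/) miss++      # no allele index at all: no-call
       }
       if (miss / (NF - 9) <= MAX) print  # NF - 9 = number of samples
     }' sites.vcf > sites.filtered.vcf
```

Here site 100 (1 of 4 missing) is kept and site 200 (3 of 4 missing) is dropped. Whether dropping sites is appropriate at all still depends on the goal, as said above.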

I want to do a population analysis: whether certain disease variants are present in the population or not. Also, can you please tell me how I should check the following?

how many samples have no calls in regions where that variant was found

Reply by BAGeno, 16 months ago

how many samples have no calls in regions where that variant was found

One way is to use GATK's VariantsToTable.

Or with awk (inspired by Kevin):

awk -F"\t" 'BEGIN {print "CHR\tPOS\tID\tREF\tALT\tNoCall"} !/^#/ {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t" gsub(/\.\/\./,"")}' input.vcf

fin swimmer

Reply by finswimmer, 16 months ago
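The `gsub` in the awk one-liner above counts occurrences of the literal string `./.` anywhere in a data line, so phased `.|.` and haploid `.` no-calls are not counted. A variant of the same idea that inspects only the GT subfield of each sample column could look like this; it assumes GT is the first FORMAT subfield (the VCF spec requires this when GT is present) and treats a GT without any digit as a no-call, so partly called genotypes such as `./1` are not counted. File names and data are made up.

```shell
#!/bin/sh
# Hypothetical VCF with unphased, phased and haploid no-calls.
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\tS4\n' > calls.vcf
printf '1\t100\trs1\tA\tG\t.\tPASS\t.\tGT:DP\t./.:0\t.|.:0\t.:0\t0/1:15\n' >> calls.vcf
printf '1\t200\trs2\tC\tT\t.\tPASS\t.\tGT:DP\t0/0:20\t0|1:18\t1:9\t./.:0\n' >> calls.vcf

# Count, per site, the samples whose GT contains no called allele.
awk 'BEGIN { FS = OFS = "\t"; print "CHR", "POS", "ID", "REF", "ALT", "NoCall" }
     /^#/ { next }
     {
       nocall = 0
       for (i = 10; i <= NF; i++) {
         split($i, f, ":")               # f[1] is the GT subfield
         if (f[1] !~ /[0-9]/) nocall++
       }
       print $1, $2, $3, $4, $5, nocall
     }' calls.vcf > nocalls.tsv
```

For this input, `nocalls.tsv` reports 3 no-calls at position 100 and 1 at position 200.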

./. is where the caller could not confidently call a genotype. I don't think there is much you can do computationally to address that, unless you had stringent filters set in the current call.

Reply by RamRS, 16 months ago
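This is also why the allele number differs between sites: AN counts the called alleles across all samples, so each `./.` removes two alleles from it. bcftools can recompute these tags with `bcftools +fill-tags input.vcf -- -t AN,AC`. A small awk sketch of the calculation, assuming GT is the first FORMAT subfield; note that its AC sums all non-reference alleles rather than reporting one count per ALT allele, and the data are hypothetical:

```shell
#!/bin/sh
# Hypothetical 3-sample VCF: site 100 fully called, site 200 has one no-call.
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n' > an.vcf
printf '1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t1/1\t0/0\n' >> an.vcf
printf '1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t0/1\t./.\t0/0\n' >> an.vcf

# AN = number of called (non-".") alleles; AC = number of non-reference alleles.
awk 'BEGIN { FS = OFS = "\t"; print "CHROM", "POS", "AN", "AC" }
     /^#/ { next }
     {
       an = 0; ac = 0
       for (i = 10; i <= NF; i++) {
         split($i, f, ":")
         n = split(f[1], a, "[/|]")      # split GT on "/" or "|"
         for (j = 1; j <= n; j++) {
           if (a[j] == ".") continue     # missing allele: not counted in AN
           an++
           if (a[j] != "0") ac++
         }
       }
       print $1, $2, an, ac
     }' an.vcf > an_ac.tsv
```

Site 100 gets AN=6 and AC=3, while site 200 gets AN=4 and AC=1 because of its single `./.`.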