Question: filtering the SNPs from vcf file
24 months ago by
DL20
India
DL20 wrote:

Hi,

I want to filter SNPs from a VCF file, but I am not sure which parameters are appropriate for SNP filtering. Several things in my VCF file confuse me. Here are a few lines of the VCF output.

                            FORMAT                  INFO                                            
**#CHROM    POS ID  REF ALT QUAL    FILTER  GT  AD  DP  GQ  PL  AC  AF  AN  INFO**                              
Chr01   16434   .   T   A   32.77   .   0/1 53,6    59  61  61,0,2212   AC=1    0.500   2   BaseQRankSum=0.226  ClippingRankSum=0.000   DP=135  ExcessHet=3.0103    FS=1.850    MLEAC=1 MLEAF=0.500 MQ=32.25    MQRankSum=1.082
Chr01   103148  .   C   A   1017.77 .   0/1 25,3    55  99  1046,0,886  AC=1    0.500   2   BaseQRankSum=0.009  ClippingRankSum=0.000   DP=55   ExcessHet=3.0103    FS=0.000    MLEAC=1 MLEAF=0.500 MQ=60.20    MQRankSum=0.949
Chr01   15650   .   C   A   424.77  .   0/1 3,11    14  58  453,0,58    AC=1    0.500   2   BaseQRankSum=0.853  ClippingRankSum=0.000   DP=25   ExcessHet=3.0103    FS=0.000    MLEAC=1 MLEAF=0.500 MQ=49.38    MQRankSum=0.585 QD=30.34    ReadPosRankSum=1.479    SOR=0.760           
Chr01   15651   .   C   A   424.77  .   0/1 3,11    14  58  453,0,58    AC=1    0.500   2   BaseQRankSum=0.763  ClippingRankSum=0.000   DP=25   ExcessHet=3.0103    FS=0.000    MLEAC=1 MLEAF=0.500 MQ=49.38    MQRankSum=0.585 QD=30.34    ReadPosRankSum=1.481    SOR=0.760           

Looking at the first line of the result, AD=53,6. I take this to mean that 53 reads carry the reference allele and 6 reads carry the alternate allele. Is that correct? If not, please tell me what it means. If it is correct, is this a good SNP?

My second question: some SNPs have a different DP in the INFO and FORMAT columns. What should I do with those? From what I have read, DP in the INFO column is the total read depth and DP in the FORMAT column is the allelic depth, so it would be better to select SNPs on the basis of allelic depth. Please explain how I should select the SNPs.

Thanks in advance

snp sequence next-gen genome • 2.7k views
modified 24 months ago by RamRS22k • written 24 months ago by DL20
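The AD reading in the question (reference count first, then alternate count) can be checked with a little arithmetic. The sketch below is only an illustration: it hard-codes the values from the first record (Chr01:16434), and the helper name `allele_fractions` is made up for this example. It also assumes the usual convention that INFO DP is the site-level depth and FORMAT DP is the per-sample depth; check your caller's documentation for the exact definitions.

```python
def allele_fractions(ad_field):
    """Turn a FORMAT/AD string such as "53,6" into per-allele read fractions."""
    counts = [int(c) for c in ad_field.split(",")]
    total = sum(counts)
    if total == 0:
        # No informative reads: avoid dividing by zero.
        return [0.0] * len(counts)
    return [c / total for c in counts]


# Values copied from the first record in the question (Chr01:16434).
ad = "53,6"      # reference reads, alternate reads
format_dp = 59   # DP from the FORMAT (per-sample) column
info_dp = 135    # DP from the INFO (site-level) column

ref_frac, alt_frac = allele_fractions(ad)
print(f"reference read fraction: {ref_frac:.3f}")
print(f"alternate read fraction: {alt_frac:.3f}")
print(f"INFO DP minus FORMAT DP: {info_dp - format_dp}")
```

For a heterozygous (0/1) call in a diploid sample, the two alleles would usually be expected to be closer to balanced, so an alternate fraction near 0.10 is a reason to look more closely at the site rather than proof that it is a false positive.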

Why do you want to filter them? What is your ultimate goal? These are parameters you can use to filter the file, but not unless you're clear on what you need exactly.

written 24 months ago by RamRS22k

Thanks for the reply. I want to keep only the true SNPs, but before filtering I want to understand the results.

written 24 months ago by DL20

I assume you're looking for true variants and want to avoid false positives - if you're looking for polymorphisms, you might need to set some criteria based on population allele frequency and also look into phenotypic effects.

written 24 months ago by RamRS22k

As a filtering tool, you can use SnpSift.
That said, first things first: as @Ram asked, why do you want to filter, and what question are you trying to answer?

There is a nice worked example of filtering decisions here: http://userweb.eng.gla.ac.uk/cosmika.goswami/snp_calling/SNPCalling.html (see section 8).

modified 24 months ago • written 24 months ago by Medhat8.4k
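To make the idea of threshold-based filtering concrete, here is a minimal Python sketch of a hard filter over INFO annotations. It is not SnpSift syntax. The function names are hypothetical, and the cutoffs follow commonly cited GATK hard-filter starting points for SNPs, which are an assumption here and should be tuned for the dataset and the question being asked. The two INFO strings are abbreviated from the records posted in the question.

```python
# Starting-point thresholds. These are assumptions for illustration
# (commonly cited GATK hard-filter suggestions for SNPs), not fixed rules.
FAIL_IF = {
    "QD": lambda v: v < 2.0,
    "FS": lambda v: v > 60.0,
    "MQ": lambda v: v < 40.0,
    "SOR": lambda v: v > 3.0,
    "MQRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8.0,
}


def parse_info(info):
    """Parse KEY=VALUE pairs (separated by ';' or whitespace) into floats."""
    values = {}
    for item in info.replace(";", " ").split():
        if "=" not in item:
            continue  # flag-style annotations carry no value
        key, value = item.split("=", 1)
        try:
            values[key] = float(value)
        except ValueError:
            pass  # non-numeric annotations are ignored in this sketch
    return values


def failed_filters(info):
    """Return the names of annotations that trip a threshold."""
    values = parse_info(info)
    return [name for name, fails in FAIL_IF.items()
            if name in values and fails(values[name])]


# INFO annotations abbreviated from Chr01:16434 and Chr01:15650.
rec_16434 = "DP=135 FS=1.850 MQ=32.25 MQRankSum=1.082"
rec_15650 = ("DP=25 FS=0.000 MQ=49.38 MQRankSum=0.585 "
             "QD=30.34 ReadPosRankSum=1.479 SOR=0.760")

print(failed_filters(rec_16434))
print(failed_filters(rec_15650))
```

Annotations missing from a record are skipped rather than treated as failures, which is a design choice for this sketch; a stricter filter might reject records that lack an annotation.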

Thank you. I have tried most of the tools, but every time I am left with the same question: is it a true SNP or not? Can you please tell me why the DP value is different in the INFO and FORMAT columns?

Thanks

written 24 months ago by DL20

The difference between the DP field and the AD field is:

AD and DP: Allele depth and depth of coverage. These are complementary fields that represent two important ways of thinking about the depth of the data for this sample at this site.

AD is the unfiltered allele depth, i.e. the number of reads that support each of the reported alleles. All reads at the position (including reads that did not pass the variant caller’s filters) are included in this number, except reads that were considered uninformative. Reads are considered uninformative when they do not provide enough statistical evidence to support one allele over another.

DP is the filtered depth, at the sample level. This gives you the number of filtered reads that support each of the reported alleles. You can check the variant caller’s documentation to see which filters are applied by default. Only reads that passed the variant caller’s filters are included in this number. However, unlike the AD calculation, uninformative reads are included in DP. See the Tool Documentation for more details on AD (DepthPerAlleleBySample) and DP (Coverage).

written 24 months ago by Medhat8.4k
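Under the definitions quoted above, the gap between FORMAT DP and the sum of the AD counts gives a rough hint about which kind of read dominates at a site. The sketch below computes that gap for the four records posted in the question; `depth_gap` is a hypothetical helper, and reading a positive gap as "uninformative reads" is an interpretation of the quoted definitions, not a guarantee.

```python
# (position, AD, FORMAT DP) copied from the records in the question.
records = [
    (16434, "53,6", 59),
    (103148, "25,3", 55),
    (15650, "3,11", 14),
    (15651, "3,11", 14),
]


def depth_gap(ad_field, dp):
    """Return FORMAT DP minus the sum of the AD counts."""
    return dp - sum(int(c) for c in ad_field.split(","))


for pos, ad, dp in records:
    gap = depth_gap(ad, dp)
    print(f"Chr01:{pos}  AD={ad}  DP={dp}  DP-sum(AD)={gap}")
```

A positive gap means DP counted reads that AD left out, which would be the uninformative reads; a negative gap would mean AD counted reads that failed the caller's filters.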

Thank you for your informative response. I have read about this. Can you please tell me why the AD value is always smaller than the DP value in my result file? There is actually a huge difference between the AD and DP values. I read that the sum of AD may differ from the individual sample depth, especially when there are many non-informative reads. Does that mean that, when the reads were aligned to a particular position, most of them were non-informative or did not align properly in my data? Thanks

written 24 months ago by DL20

Reads that are not used for calling are not counted in the DP measure, but are included in AD.

written 24 months ago by Medhat8.4k

So does that mean AD >= DP? Am I right or not? I am sorry to bother you so much, but I want to get the concept clear because I am new to analysing this type of data.

written 24 months ago by DL20

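Not necessarily. AD leaves out uninformative reads, while DP leaves out reads that failed the caller's filters, so the sum of AD can fall on either side of DP. In the lines you posted, the sum of AD is never larger than DP (25+3 = 28 against DP=55 at Chr01:103148), which points to uninformative reads at that site :)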

written 24 months ago by Medhat8.4k