Question: How to Distinguish the Ref Allele from Null (Not Sequenced) in NGS SNP Calling?
H. Won wrote (7.5 years ago):

Hello all,

After sequencing tens of individuals, I would like to summarize the SNP calls in a matrix, with samples as columns and SNPs as rows.

As far as I can tell, most tools report only the variant sites for each individual.

My question is: when a site is missing for a particular individual, how can I tell whether that individual carries the reference allele, or whether the site is missing simply because no reads mapped to that region?

Looking at the coverage of mapped reads is one possible option, but I expect that others have already developed methods for this problem.

I would be grateful for any ideas, methods, or references.

Thank you very much.

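A rough illustration of the coverage idea: the sketch below computes read depth at chosen positions from aligned read intervals. It is a minimal sketch, assuming reads on a single chromosome given as hypothetical 0-based, half-open `(start, end)` tuples; it ignores base and mapping quality, deletions, and soft clipping. On real data, `samtools depth -a` reports per-position depth (including zero-coverage positions) directly from a BAM file.

```python
from bisect import bisect_right


def depth_at(positions, reads):
    """Return {position: read depth} for 0-based positions.

    reads: hypothetical list of (start, end) half-open intervals on one
    chromosome; real input would come from a BAM file.
    """
    starts = sorted(start for start, _ in reads)
    ends = sorted(end for _, end in reads)
    result = {}
    for pos in positions:
        # Reads that start at or before pos ...
        n_started = bisect_right(starts, pos)
        # ... minus reads that already ended (end <= pos, half-open).
        n_ended = bisect_right(ends, pos)
        result[pos] = n_started - n_ended
    return result
```

Counting interval starts and ends with binary search makes each query O(log n) after sorting, instead of scanning every read for every position.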

Are you asking how people represent null genotypes (no-calls) in output formats, or for a method to decide whether an allele is a no-call or the reference? The former is represented in VCF as '.' (no-call) vs. '0' (reference allele). The latter is a matter of debate; heuristics still seem to be the order of the day.

Reply by Greg Tyrelle, 7.5 years ago
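In VCF, the distinction described in this reply lives in the GT field: `.` marks an allele that was not called and `0` the reference allele, so `./.` is a no-call while `0/0` is homozygous reference. A minimal sketch follows; the category labels (`no-call`, `partial`, `hom-ref`, `variant`) are illustrative names, not part of the VCF specification.

```python
import re


def classify_gt(gt):
    """Classify a VCF GT string such as '0/0', '0|1', './.' or '.'."""
    # Alleles are separated by '/' (unphased) or '|' (phased).
    alleles = re.split(r"[/|]", gt)
    if all(a == "." for a in alleles):
        return "no-call"  # nothing was called at this site
    if any(a == "." for a in alleles):
        return "partial"  # e.g. './1': only one allele called
    if all(a == "0" for a in alleles):
        return "hom-ref"  # every allele is the reference
    return "variant"  # at least one non-reference allele
```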

My question is how to distinguish a null genotype (missing due to low or no coverage) from the reference allele. For example, a sample can be A/A, A/B, or B/B, where A is the reference and B is the variant allele. In NGS data, can A/A be distinguished from N/N, where N means missing?

Reply by H. Won, 7.5 years ago
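One way to see why coverage matters for A/A vs N/N: under a deliberately simple model that assumes error-free reads and that each read samples either allele of a heterozygote with probability 1/2, a true A/B site shows only reference reads at depth d with probability 0.5^d. The sketch below finds the smallest depth at which that chance drops below a chosen `alpha`. It ignores sequencing error and mapping bias, so treat it as a back-of-the-envelope bound rather than a calling method.

```python
def min_depth_to_rule_out_het(alpha=0.01):
    """Smallest depth d with 0.5**d < alpha.

    Under the simplified model, a heterozygote would show zero variant
    reads at depth d with probability 0.5**d.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must be in (0, 1)")
    d = 1
    while 0.5 ** d >= alpha:
        d += 1
    return d
```

For example, `min_depth_to_rule_out_het(0.01)` returns 7, so under these assumptions a site covered by fewer than about 7 reads, none carrying the variant, is weak evidence for A/A.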
Answer by Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087), 7.5 years ago:

I wrote something like that for my (new-old-beta-private-public-I-don't-know) package "variation toolkit". It contains a program called groupbysnp. Here is an example (the output has been 'verticalized', i.e. printed one field per line):

$ cat sample2vcf.tsv | scanvcf | grep -v "##" |  # concatenate all the VCFs (one per sample)
   sed 's/^#CHROM/#/' |  # hack: keep the header at the top after sorting
   LC_ALL=C sort -t $'\t' -k1,1 -k2,2n -k4,4 -k5,5 -k11,11 |  # sort on CHROM/POS/REF/ALT/SAMPLE
   sed 's/^#/#CHROM/' |  # hack: restore the header name after sorting
   groupbysnp -L 1,2,3,4,5 -T 6,7,8,9,10 --sample 11 -n Sample1,Sample2,Sample3,Sample4 |  # create the pivot table
   verticalize  # print one field per line, as the name says

>>> 2
$1  #CHROM          1
$2  POS             753405
$3  ID              rs61770173
$4  REF             C
$5  ALT             A
$6  Sample1             .
$7  Sample1:QUAL        .
$8  Sample1:FILTER      .
$9  Sample1:INFO        .
$10 Sample1:FORMAT      .
$11 Sample1:CALL        .
$12 Sample2             .
$13 Sample2:QUAL        .
$14 Sample2:FILTER      .
$15 Sample2:INFO        .
$16 Sample2:FORMAT      .
$17 Sample2:CALL        .
$18 Sample3             Sample3
$19 Sample3:QUAL        99
$20 Sample3:FILTER      0
$21 Sample3:INFO        AC=2;DB=3;ST=0:0,3:32;DP=35;NC=-0.76;UM=3;CQ=...
$22 Sample3:FORMAT      GT:GQ:DP:FLT
$23 Sample3:CALL        1/1:99:35:0
$24 Sample4             .
$25 Sample4:QUAL        .
$26 Sample4:FILTER      .
$27 Sample4:INFO        .
$28 Sample4:FORMAT      .
$29 Sample4:CALL        .
$30 count.samples   1
<<< 2

>>> 3
$1  #CHROM          1
$2  POS             876499
$3  ID              rs4372192
$4  REF             A
$5  ALT             G
$6  Sample1             .
$7  Sample1:QUAL        .
$8  Sample1:FILTER      .
$9  Sample1:INFO        .
$10 Sample1:FORMAT      .
$11 Sample1:CALL        .
$12 Sample2             .
$13 Sample2:QUAL        .
$14 Sample2:FILTER      .
$15 Sample2:INFO        .
$16 Sample2:FORMAT      .
$17 Sample2:CALL        .
$18 Sample3             .
$19 Sample3:QUAL        .
$20 Sample3:FILTER      .
$21 Sample3:INFO        .
$22 Sample3:FORMAT      .
$23 Sample3:CALL        .
$24 Sample4             Sample4
$25 Sample4:QUAL        45
$26 Sample4:FILTER      0
$27 Sample4:INFO        AC=2;DB=1;ST=0:0,6:0;DP=6;NC=-3.05;UM=3;CQ=...
$28 Sample4:FORMAT      GT:GQ:DP:FLT
$29 Sample4:CALL        1/1:45:6:0
$30 count.samples   1
<<< 3

>>> 4
$1  #CHROM          1
$2  POS             877831
$3  ID              rs6672356
$4  REF             T
$5  ALT             C
$6  Sample1             .
$7  Sample1:QUAL        .
$8  Sample1:FILTER      .
$9  Sample1:INFO        .
$10 Sample1:FORMAT      .
$11 Sample1:CALL        .
$12 Sample2             .
$13 Sample2:QUAL        .
$14 Sample2:FILTER      .
$15 Sample2:INFO        .
$16 Sample2:FORMAT      .
$17 Sample2:CALL        .
$18 Sample3             .
$19 Sample3:QUAL        .
$20 Sample3:FILTER      .
$21 Sample3:INFO        .
$22 Sample3:FORMAT      .
$23 Sample3:CALL        .
$24 Sample4             Sample4
$25 Sample4:QUAL        39
$26 Sample4:FILTER      0
$27 Sample4:INFO        AC=2;DB=1;ST=0:0,2:2;DP=4;NC=0.40;UM=3;CQ=...
$28 Sample4:FORMAT      GT:GQ:DP:FLT
$29 Sample4:CALL        1/1:39:4:0
$30 count.samples   1
<<< 4

>>> 5
$1  #CHROM          1
$2  POS             879317
$3  ID              rs7523549
$4  REF             C
$5  ALT             T
$6  Sample1             CALL
$7  Sample1:QUAL        71
$8  Sample1:FILTER      0
$9  Sample1:INFO        AC=1;DB=1;ST=2:1,3:2;DP=8;NC=2.16;UM=3;CQ=...
$10 Sample1:FORMAT      GT:GQ:DP:FLT
$11 Sample1:CALL        0/1:34:8:0
$12 Sample2             .
$13 Sample2:QUAL        .
$14 Sample2:FILTER      .
$15 Sample2:INFO        .
$16 Sample2:FORMAT      .
$17 Sample2:CALL        .
$18 Sample3             .
$19 Sample3:QUAL        .
$20 Sample3:FILTER      .
$21 Sample3:INFO        .
$22 Sample3:FORMAT      .
$23 Sample3:CALL        .
$24 Sample4             .
$25 Sample4:QUAL        .
$26 Sample4:FILTER      .
$27 Sample4:INFO        .
$28 Sample4:FORMAT      .
$29 Sample4:CALL        .
$30 count.samples   1
<<< 5
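For readers without the variation toolkit, the pivot step can be sketched in plain Python. This is not groupbysnp itself; it is a hypothetical `pivot_calls` helper that assumes per-sample calls arrive as `(chrom, pos, ref, alt, sample, gt)` tuples. It also shows why the resulting matrix cannot tell reference from missing: any sample without a call at a site simply gets `.`.

```python
from collections import defaultdict


def pivot_calls(calls, samples):
    """Build {(chrom, pos, ref, alt): {sample: gt}}, using '.' when a
    sample has no call at that site (hypothetical helper)."""
    table = defaultdict(dict)
    for chrom, pos, ref, alt, sample, gt in calls:
        table[(chrom, pos, ref, alt)][sample] = gt
    # Fill in every sample; '.' only means "no variant called here",
    # not "reference" and not "not sequenced".
    return {
        site: {s: by_sample.get(s, ".") for s in samples}
        for site, by_sample in sorted(table.items())
    }
```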

Thank you for the answer. In your output, does a dot (.) mean the reference allele, or that the site was not sequenced (null or missing)?

Reply by H. Won, 7.5 years ago

It means that no mutation was called for this variant in the given sample.

Reply by Pierre Lindenbaum, 7.5 years ago

OK. Then we still cannot tell whether a dot (.) is R/R or N/N, where R is the reference allele and N is missing due to low coverage.

Reply by H. Won, 7.5 years ago
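One common heuristic, in line with the coverage idea raised in the question: for each uncalled cell, look up the sample's read depth at that position and write `0/0` only if the depth reaches a threshold, otherwise `./.`. The sketch assumes a matrix shaped like `{(chrom, pos, ref, alt): {sample: gt}}` with `.` for "no variant called", a depth lookup `{(sample, chrom, pos): depth}` (for example built from `samtools depth -a` output), and a hypothetical `min_depth` of 8. It ignores genotype quality and local mapping problems.

```python
def fill_uncalled(matrix, depth, min_depth=8):
    """Replace '.' cells with '0/0' (enough coverage) or './.' (too little).

    matrix: {(chrom, pos, ref, alt): {sample: gt}}, '.' = no variant called
    depth: {(sample, chrom, pos): read depth}; missing keys count as 0
    min_depth: hypothetical threshold, to be tuned for your data
    """
    filled = {}
    for (chrom, pos, ref, alt), row in matrix.items():
        new_row = {}
        for sample, gt in row.items():
            if gt != ".":
                new_row[sample] = gt  # keep the caller's genotype
            elif depth.get((sample, chrom, pos), 0) >= min_depth:
                new_row[sample] = "0/0"  # covered, no variant: likely ref
            else:
                new_row[sample] = "./."  # too few reads: unknown
        filled[(chrom, pos, ref, alt)] = new_row
    return filled
```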