Question: Multiple reference alleles in vcf file
19 months ago by
waqaskhokhar999100 wrote:

I am interested in performing a splicing QTL (sQTL) analysis. In my VCF file, some positions have more than one allele in the reference. Should I keep these rows or remove the rows containing those SNPs? For example, positions 187 and 194 contain more than one allele, so should I remove these rows?

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  108 139
1   73  .   C   A   .   PASS    .   GT  0   0
1   83  .   T   C,A .   PASS    .   GT  1   1
1   187 .   TG  T   .   PASS    .   GT  1   1
1   188 .   G   T   .   PASS    .   GT  0   0
1   189 .   T   C,G .   PASS    .   GT  0   0
1   190 .   G   A   .   PASS    .   GT  0   0
1   194 .   ATT A   .   PASS    .   GT  1   1
1   209 .   C   T   .   PASS    .   GT  0   0
modified 18 months ago by Fabio Marroni2.6k • written 19 months ago by waqaskhokhar999100

I don't see anywhere that you have multiple REF alleles. There are multi-allelic sites (with multiple ALT alleles), sure, but no multiple REF alleles. Maybe you're looking at the wrong column header? Here's your data formatted for eyeballing:

#CHROM  POS  ID  REF  ALT  QUAL  FILTER  INFO  FORMAT  108  139
1       73   .   C    A    .     PASS    .     GT      0    0
1       83   .   T    C,A  .     PASS    .     GT      1    1
1       187  .   TG   T    .     PASS    .     GT      1    1
1       188  .   G    T    .     PASS    .     GT      0    0
1       189  .   T    C,G  .     PASS    .     GT      0    0
1       190  .   G    A    .     PASS    .     GT      0    0
1       194  .   ATT  A    .     PASS    .     GT      1    1
1       209  .   C    T    .     PASS    .     GT      0    0
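
The distinction between a multi-allelic site and a multi-base REF allele can be checked mechanically. The following is a minimal sketch, not part of the original reply: `classify_sites` is a hypothetical helper name, and it assumes a tab-separated VCF on standard input with REF in column 4 and ALT in column 5. A comma in ALT marks a multi-allelic site; a comma in REF would be malformed, because REF holds exactly one allele.

```shell
# Hypothetical helper: label each VCF record read from stdin.
# Assumes tab-separated input (REF = column 4, ALT = column 5).
classify_sites() {
  awk -F'\t' -v OFS='\t' '
    /^#/ { next }                                   # skip header lines
    {
      n_alt = split($5, alts, ",")                  # count ALT alleles
      if ($4 ~ /,/)        label = "malformed_REF"  # REF must be a single allele
      else if (n_alt > 1)  label = "multi-allelic"  # several ALT alleles
      else                 label = "bi-allelic"     # one REF, one ALT
      print $1, $2, $4, $5, label
    }'
}

# Two records taken from the question, written with explicit tabs.
printf '1\t83\t.\tT\tC,A\t.\tPASS\t.\tGT\t1\t1\n1\t187\t.\tTG\tT\t.\tPASS\t.\tGT\t1\t1\n' | classify_sites
```

On these two records, position 83 is labelled multi-allelic and position 187 is labelled bi-allelic, since TG is one allele that happens to be two bases long.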
modified 18 months ago • written 18 months ago by _r_am32k

Indeed, this is new to me. I checked the original file again and it does contain multiple reference alleles. I downloaded the VCF file from here.

written 18 months ago by waqaskhokhar999100

Can you please paste a few sample lines? Use this line of code to get the sample records:

awk -F"\t" -v OFS="\t" -v cntr=0 '$4 ~ /,/ { cntr=cntr+1; print; } cntr==10{ exit; }' vcf_file.vcf | column -ts $'\t'
modified 18 months ago • written 18 months ago by _r_am32k
18 months ago by
Fabio Marroni2.6k
Italy
Fabio Marroni2.6k wrote:

Positions 187 and 194 are deletions. In your reference you have TG, and the alternative allele is T (deletion of a G). The same is true for position 194, where ATT is the reference allele and the alternative allele is A (meaning that TT is deleted).
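
As a rough illustration of this reading of REF and ALT, the sketch below reports which bases are removed or added at simple indel records. `describe_indels` is a hypothetical helper name; it assumes tab-separated input, skips multi-allelic sites, and only handles records where the shorter allele is a prefix of the longer one, as in the two deletions above.

```shell
# Hypothetical helper: report the bases deleted or inserted at simple indels.
# Assumes tab-separated input (REF = column 4, ALT = column 5).
describe_indels() {
  awk -F'\t' -v OFS='\t' '
    /^#/ || $5 ~ /,/ { next }            # skip headers and multi-allelic sites
    {
      ref = $4; alt = $5
      if (length(ref) > length(alt) && index(ref, alt) == 1)
        # ALT is a prefix of REF: the remaining REF bases are deleted
        print $1, $2, "deletion", substr(ref, length(alt) + 1)
      else if (length(alt) > length(ref) && index(alt, ref) == 1)
        # REF is a prefix of ALT: the remaining ALT bases are inserted
        print $1, $2, "insertion", substr(alt, length(ref) + 1)
    }'
}

# Positions 187, 194 and 188 from the question (first five columns only).
printf '1\t187\t.\tTG\tT\n1\t194\t.\tATT\tA\n1\t188\t.\tG\tT\n' | describe_indels
```

This prints a deletion of G at position 187 and a deletion of TT at position 194. Position 188 is a plain substitution, so nothing is printed for it.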

written 18 months ago by Fabio Marroni2.6k

Neither of those positions fits the description of "multiple alleles". Each has a single multi-base allele, which is the standard way of denoting a deletion.

written 18 months ago by _r_am32k

But that notation might explain OP's confusion.

written 18 months ago by WouterDeCoster45k

That makes sense. I just noticed OP referring to these positions specifically, so they should probably read the VCF specification and make sure they understand the regular representation versus multi-allelic sites.

written 18 months ago by _r_am32k