Long-looking SNPs in VCF file
2.0 years ago

Hello guys,

I ran freebayes and filtered the output by removing indels and discarding multi-allelic sites. Before filtering, there were long-looking SNP records with multiple alleles at a site, for example ATTGC, ATGGC, TTGGC.

However, even after discarding multi-allelic sites, the VCF file still looks the same, as you can see in the picture below. How can I handle this? I am going to convert my VCF to PLINK format and run a PCA. Would these sites be a problem?

I also have another question about discarding multi-allelic sites. How is it decided which sites are discarded? Are there criteria I should follow when discarding them? Or can I keep multi-allelic sites for the PCA?

Thanks!

[Screenshot: long SNPs]

bcftools Variant-Calling SNP freebayes vcftools • 1.5k views

However, even after discarding multiallelic sites,

What was the command line?

How is it decided which site is discarded or not ?

It depends on the scientific question being asked.


Thank you for your answer, Pierre.

Not an answer to the question, but I have metagenomic data, so multi-allelic sites could make sense in my case. However, I get an error about multi-allelic sites when I merge two VCF files to create PLINK format, so I always discard these sites from the VCF as well.

In addition, can I run PCA directly on data that contains multi-allelic sites?

2.0 years ago

You are misunderstanding what "multiallelic" means.

"Multiallelic" does not mean "an allele has multiple bases", which occurs at positions 358 and 399 in your screenshot. It means "there are multiple ALT alleles", e.g. REF is "A" while ALT is "G,T". If there is no comma in the ALT column, the variant is not multiallelic.
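To make the distinction concrete, here is a minimal Python sketch of the comma check described above. The function names and the example records are made up for illustration; it only assumes the standard tab-separated VCF layout, where ALT is the fifth column.

```python
def is_multiallelic(alt_field: str) -> bool:
    """Return True when the ALT column lists more than one allele (contains a comma)."""
    return "," in alt_field


def count_multiallelic(vcf_lines) -> int:
    """Count records whose ALT column (5th tab-separated field) is multiallelic."""
    count = 0
    for line in vcf_lines:
        if line.startswith("#"):  # skip meta-information and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 5 and is_multiallelic(fields[4]):
            count += 1
    return count


# Hypothetical records, not taken from the screenshot.
example_records = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "contig1\t100\t.\tA\tG,T\t50\t.\t.",        # multiallelic: two ALT alleles
    "contig1\t200\t.\tATTGC\tATGGC\t50\t.\t.",  # long alleles, but only one ALT
]
print(count_multiallelic(example_records))  # 1
```

The second record has five-base alleles but a single ALT, so it is not counted as multiallelic.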

As for why the bcftools filters you applied did not remove the variants at positions 358 and 399: those variants are actually regular SNPs that are just represented weirdly; see https://genome.sph.umich.edu/wiki/Variant_Normalization for some discussion. You can use the "bcftools norm" command with the --fasta-ref flag to left-normalize these weirdly-represented SNPs.
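As a rough illustration of why such a record is "just a SNP", the Python sketch below trims the bases that REF and ALT share. It is not how bcftools is implemented and it is not a substitute for it: it only does the trimming part of normalization, it does not check REF against the reference, and it does not left-align indels, which is what needs the reference FASTA. The function name and position 100 are made up, and the alleles are borrowed from the question's example.

```python
def trim_alleles(pos: int, ref: str, alt: str):
    """Trim bases shared by REF and ALT, keeping at least one base in each."""
    # Drop the shared suffix first; POS is unaffected by this.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Then drop the shared prefix; each trimmed base moves POS right by one.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Hypothetical position; alleles taken from the question's example.
print(trim_alleles(100, "ATTGC", "ATGGC"))  # (102, 'T', 'G')
```

Here the five-base record reduces to a single T>G substitution two bases downstream. A pair like ATTGC/TTGGC differs at two positions, so trimming alone leaves it as a multi-base substitution (ATT>TTG) rather than one SNP.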


Thank you. bcftools norm --fasta-ref was exactly what I was looking for. It worked well!

Sorry, I do know what multi-allelic means, but I think I did not explain my issue very well in the question. The picture I shared was taken after the multi-allelic sites had been discarded.


I had the same problem. Very helpful!
