Question: Post-GWAS: Determining whether SNPs represent an indel, given only SNPs
1
gravatar for michael.nagle
6 weeks ago by
michael.nagle100
michael.nagle100 wrote:

A number of SNPs appear to be associated with a trait of interest. I wish to look into all of these SNPs and get an idea of how they affect genes and whether they correspond to insertions or deletion mutations (indels).

I do not have access to genome sequence data for the GWAS population, only SNP files (28M SNPs for ~900 individuals with ~350Mb genome), which I have in a number of formats including .bed, .ped, .tped, and .vcf. The available dataset only includes the SNP identity, which is A, G, C, T or missing. This is from high-coverage genome data and missing SNPs presumably correspond to indels.

Given the information in these files, is there a way for me to find out if SNPs correspond to indels or point mutations? I could just identify SNPs that are missing in some genotypes and assume there's an indel at each specific marker position, but I hope to actually be able to figure out how long the indel is and where it starts and stops, and to do this on a high-throughput scale automatically for hundreds of SNPs.

Is there an existing GUI, command line or R interface for determining if SNPs correspond to indels, or better yet, reconstructing genomes from SNP files so that I can align different alleles and see clearly how SNPs can correspond to indels?

Edit: This is for a non-model plant organism.
Edit 2: Please find below a sample of my .vcf enter image description here

gwas • 175 views
ADD COMMENTlink modified 6 weeks ago • written 6 weeks ago by michael.nagle100

Do you have rsIDs of the variants?

ADD REPLYlink written 6 weeks ago by Emily_Ensembl18k

No, I'm working with a non-model plant organism.

ADD REPLYlink written 6 weeks ago by michael.nagle100

Can you show us a sample of your VCF? The information should be in there. If not, it's not VCF and you need to send it back to whoever made it for you.

ADD REPLYlink written 6 weeks ago by Emily_Ensembl18k

I've edited my post to include a sample of the VCF file. Thanks.

ADD REPLYlink written 6 weeks ago by michael.nagle100

Is your VCF data now in Excel format, or in a R object? - not much good there. To view indels, and if you still have the original VCF, you simply need to do:

bcftools view --types indels my.vcf
ADD REPLYlink modified 6 weeks ago • written 6 weeks ago by Kevin Blighe41k

My VCF is in VCF format, but for the purpose of this post I opened it in a spreadsheet program and took a screenshot so that it would be human-readable.

This bcftools command would work if indels were called in this VCF. As you can see in the screenshot, only SNPs have been called. My question is about determining where indels are, given SNP data. It is common in GWAS for all variation to be represented one base at a time, as SNP calls, without indels being called separately.

My VCF shows gaps at many positions for certain genotypes, followed by series of SNPs, and from this there should be a way to call indels.

ADD REPLYlink written 6 weeks ago by michael.nagle100
1

In VCF indels should be represented like GA A for deletions or G GA for insertions. If you do not have this representation in your VCF then it is not correct VCF. You need to send this back to the source and ask for the correct version.

ADD REPLYlink written 6 weeks ago by Emily_Ensembl18k

I see - thanks. I am not aware of anything that takes a VCF and calls indels from it. Obviously, it would be better to go back to the BAM so that you can take a look at the reads and make indel calls from that stage.

Are you sure that these gaps are not due to other reasons, such as QC and / or probe failure?

ADD REPLYlink written 6 weeks ago by Kevin Blighe41k

I doubt all the gaps are indels. I'll try getting the sequence data. Thanks.

ADD REPLYlink written 6 weeks ago by naglemi0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 775 users visited in the last hour