Question: SnpEff: How to match variants with line ID in 1001 Genomes Project?
14 days ago, chase.lewis wrote:

Hi,

I'm trying to use data from the 1001 Genomes Project to solve a problem. Over 1,000 lines have already been sequenced, mapped, and analyzed with SnpEff. I want to use this data to see all the variants in a certain gene. I found the file 1001genomes_snp-short-indel_only_ACGTN_v3.1.vcf.snpeff.gz at https://1001genomes.org/data/GMI-MPI/releases/v3.1/1001genomes_snpeff_v3.1/

Basically, it is a very large file containing all the variant information. All the line IDs are listed at the beginning of the file, followed by the variants. However, looking at the variants, I can't tell which specific lines carry a given change and which do not. For those who are familiar with SnpEff (which I am new to), is there a way to obtain this information? Am I looking in the right place? Is there some alignment and analysis work I should do on my own?

Any help is appreciated.

sequencing snp snpeff • 84 views
modified 14 days ago by zx8754 • written 14 days ago by chase.lewis

See the databases section of the SnpEff manual: http://snpeff.sourceforge.net/SnpEff_manual.html#databases. Read through it and please let me know whether you can figure it out from there.

written 14 days ago by vaish01kv

I figured it out. I took the base-pair positions of the interesting mutations from the main file and used grep to search for each position in the individual VCF files (the non-SnpEff files), which I downloaded from the site. When you grep through multiple files at once (using an asterisk wildcard), grep prints the name of each file in which it finds the string.
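
For anyone who wants to repeat this, here is a rough Python sketch of the same search. It is not the exact command used above. It compares the CHROM and POS columns exactly rather than matching a bare number anywhere on the line, since a plain string search for a position such as 1234 would also hit 12345 or a number in another column. The function names `open_vcf` and `files_with_variant` are made up for illustration, and the sketch assumes each per-line file is a tab-delimited VCF (plain or gzip-compressed) whose first two columns are CHROM and POS.

```python
import gzip
from pathlib import Path


def open_vcf(path):
    """Open a plain-text or gzip-compressed VCF file for reading as text."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "r")


def files_with_variant(vcf_paths, chrom, pos):
    """Return the names of the VCF files that contain a record at chrom:pos.

    Hypothetical helper: it only reports that a record exists at that
    position. It does not inspect the genotype, so check whether your
    files list non-reference calls only.
    """
    hits = []
    target_pos = str(pos)
    for path in vcf_paths:
        with open_vcf(path) as handle:
            for line in handle:
                if line.startswith("#"):
                    continue  # skip meta-information and column header lines
                fields = line.split("\t", 2)  # only CHROM and POS are needed
                if len(fields) >= 2 and fields[0] == chrom and fields[1] == target_pos:
                    hits.append(Path(path).name)
                    break  # one match is enough for this file
    return hits
```

With the individual files in a directory (here a hypothetical `vcfs/`), a call such as `files_with_variant(sorted(Path("vcfs").glob("*.vcf")), "1", 1234)` would list the files that have a record at that position. The chromosome string has to match the CHROM column exactly as it is written in the files.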

written 14 days ago by chase.lewis