Question: 1000Genomes Mapping Onto Protein Sequences?
7.4 years ago by
Chris1.6k
Munich
Chris1.6k wrote:

Hi,

does the 1000 Genomes Project provide any mapping of its variants onto protein sequences? I couldn't find any information on their website or in the data files. Can that information be retrieved somewhere else? I know that dbSNP does this, but the 1kG data may not be entirely included in dbSNP.

Thanks Chris

Edit: To make this clearer: I'm interested in nsSNPs (non-synonymous SNPs).

genome protein mapping snp • 2.0k views
ADD COMMENTlink modified 7.4 years ago by Laura1.7k • written 7.4 years ago by Chris1.6k

It seems that you are interested in mapping variants discovered by 1000G onto proteins. If so, then please add a "SNP" or "genetic-variation" tag to your question. This is a good question and should have the best tags possible.

ADD REPLYlink written 7.4 years ago by Larry_Parnell16k

Larry: Good point. Done.

ADD REPLYlink written 7.4 years ago by Chris1.6k
7.4 years ago by
Sean Davis25k
National Institutes of Health, Bethesda, MD
Sean Davis25k wrote:

You can download the VCF files for 1000G from the latest release here. Then it is fairly straightforward to run snpEff, ANNOVAR, or VariantAnnotation (Bioconductor) to map the variants to transcripts and to see the effects they have on proteins.

ADD COMMENTlink modified 7.4 years ago • written 7.4 years ago by Sean Davis25k

And remember that an integrated set of variant calls and phased genotypes, including SNPs, short indels and deletions, based on low-coverage and exome sequencing data across 1092 individuals, was released recently. Link: http://www.1000genomes.org/announcements/october-2011-integrated-variant-set-release-ichg2011-2011-10-12

ADD REPLYlink written 7.4 years ago by Jorjial270

Thanks, jorjial. Edited link to reflect your comments.

ADD REPLYlink written 7.4 years ago by Sean Davis25k
7.4 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum118k wrote:

You could try the Ensembl Variant Effect Predictor or snpEff to map those data.

Note: I'm currently writing a set of C++ tools that do this kind of task on the fly:

$ curl -s "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/ALL.2of4intersection.20100804.sites.vcf.gz"  |\
  gunzip -c |\
  grep -v "##" | normalizechrom -c 1|\
  prediction -f hg19.fa |\
  egrep '(EXON|#CHROM)' | head | verticalize | cut -c 1-90

>>> 2
$1  #CHROM                      chr1
$2  POS                         69511
$3  ID                          rs75062661
$4  REF                         A
$5  ALT                         G
$6  QUAL                        .
$7  FILTER                      PASS
$8  INFO                        DP=607;AF=0.789;CB=UM,BI;EUR_R2=0.054;AFR_R2=0.247
$9  knownGene.name              uc001aal.1
$10 knownGene.strand            +
$11 knownGene.txStart           69090
$12 knownGene.txEnd             70008
$13 knownGene.cdsStart          69090
$14 knownGene.cdsEnd            70008
$15 prediction.type             EXON|EXON_CODING_NON_SYNONYMOUS
$16 prediction.pos_in_cdna      420
$17 prediction.pos_in_protein   141
$18 prediction.exon             Exon 1
$19 prediction.intron           .
$20 prediction.wild.codon       ACA
$21 prediction.mut.codon        GCA
$22 prediction.wild.aa          T
$23 prediction.mut.aa           A
$24 prediction.wild.prot        MVTEFIFLGLSDSQELQTFLFMLFFVFYGGIVFGNLLIVITVVSDSHLHSPMYFLLANLS
$25 prediction.mut.prot         MVTEFIFLGLSDSQELQTFLFMLFFVFYGGIVFGNLLIVITVVSDSHLHSPMYFLLANLS
$26 prediction.wild.rna         ATGGTGACTGAATTCATTTTTCTGGGTCTCTCTGATTCTCAGGAACTCCAGACCTTCCTA
$27 prediction.mut.rna          ATGGTGACTGAATTCATTTTTCTGGGTCTCTCTGATTCTCAGGAACTCCAGACCTTCCTA
$28 prediction.splicing         .
<<< 2

>>> 3
$1  #CHROM                      chr1
$2  POS                         324822
$3  ID                          .
$4  REF                         A
$5  ALT                         T
$6  QUAL                        .
$7  FILTER                      PASS
$8  INFO                        DP=1649;AF=0.005;CB=UM,BI;EUR_R2=0.141;AFR_R2=0.017
$9  knownGene.name              uc009vjk.2
$10 knownGene.strand            +
$11 knownGene.txStart           322036
$12 knownGene.txEnd             326938
$13 knownGene.cdsStart          324342
$14 knownGene.cdsEnd            325605
$15 prediction.type             EXON|EXON_CODING_SYNONYMOUS
$16 prediction.pos_in_cdna      386
$17 prediction.pos_in_protein   129
$18 prediction.exon             Exon 3
$19 prediction.intron           .
$20 prediction.wild.codon       GCA
$21 prediction.mut.codon        GCT
$22 prediction.wild.aa          A
$23 prediction.mut.aa           A
$24 prediction.wild.prot        MLLPPGSLSRPRTFSSQPLQTKLMTHNGLFRPIPYVTAASADEATASQQPPQAQLHRYNG
$25 prediction.mut.prot         MLLPPGSLSRPRTFSSQPLQTKLMTHNGLFRPIPYVTAASADEATASQQPPQAQLHRYNG
$26 prediction.wild.rna         ATGCTCCTACCTCCCGGCAGCCTCTCCAGGCCCAGAACTTTCTCCAGTCAGCCTCTACAG
$27 prediction.mut.rna          ATGCTCCTACCTCCCGGCAGCCTCTCCAGGCCCAGAACTTTCTCCAGTCAGCCTCTACAG
$28 prediction.splicing         .
<<< 3

>>> 4
$1  #CHROM                      chr1
$2  POS                         324822
$3  ID                          .
$4  REF                         A
$5  ALT                         T
$6  QUAL                        .
$7  FILTER                      PASS
$8  INFO                        DP=1649;AF=0.005;CB=UM,BI;EUR_R2=0.141;AFR_R2=0.017
$9  knownGene.name              uc001aau.2
$10 knownGene.strand            +
$11 knownGene.txStart           323891
$12 knownGene.txEnd             328580
$13 knownGene.cdsStart          324342
$14 knownGene.cdsEnd            325605
$15 prediction.type             EXON|EXON_CODING_SYNONYMOUS
$16 prediction.pos_in_cdna      386
$17 prediction.pos_in_protein   129
$18 prediction.exon             Exon 3
$19 prediction.intron           .
$20 prediction.wild.codon       GCA
$21 prediction.mut.codon        GCT
$22 prediction.wild.aa          A
$23 prediction.mut.aa           A
$24 prediction.wild.prot        MLLPPGSLSRPRTFSSQPLQTKLMTHNGLFRPIPYVTAASADEATASQQPPQAQLHRYNG
$25 prediction.mut.prot         MLLPPGSLSRPRTFSSQPLQTKLMTHNGLFRPIPYVTAASADEATASQQPPQAQLHRYNG
$26 prediction.wild.rna         ATGCTCCTACCTCCCGGCAGCCTCTCCAGGCCCAGAACTTTCTCCAGTCAGCCTCTACAG
$27 prediction.mut.rna          ATGCTCCTACCTCCCGGCAGCCTCTCCAGGCCCAGAACTTTCTCCAGTCAGCCTCTACAG
$28 prediction.splicing         .
<<< 4

>>> 5
$1  #CHROM                      chr1
$2  POS                         762085
$3  ID                          .
$4  REF                         G
$5  ALT                         A
$6  QUAL                        .
$7  FILTER                      PASS
$8  INFO                        DP=428;AF=0.028;CB=BC,NCBI
$9  knownGene.name              uc010nxx.1
$10 knownGene.strand            -
$11 knownGene.txStart           761586
$12 knownGene.txEnd             762902
$13 knownGene.cdsStart          762079
$14 knownGene.cdsEnd            762571
$15 prediction.type             EXON|EXON_STOP_GAINED
$16 prediction.pos_in_cdna      486
$17 prediction.pos_in_protein   163
$18 prediction.exon             Exon 1
$19 prediction.intron           .
$20 prediction.wild.codon       CAG
$21 prediction.mut.codon        TAG
$22 prediction.wild.aa          Q
$23 prediction.mut.aa           *
$24 prediction.wild.prot        MWLVFHRPHPRPSWPLRAALGFGRRQSSLRCFPVLPSARPYVSANPTLRGGRLRQDPESE
$25 prediction.mut.prot         MWLVFHRPHPRPSWPLRAALGFGRRQSSLRCFPVLPSARPYVSANPTLRGGRLRQDPESE
$26 prediction.wild.rna         ATGTGGCTTGTCTTCCATCGTCCCCACCCTCGCCCCTCTTGGCCCCTCAGGGCAGCCCTG
$27 prediction.mut.rna          ATGTGGCTTGTCTTCCATCGTCCCCACCCTCGCCCCTCTTGGCCCCTCAGGGCAGCCCTG
$28 prediction.splicing         .
<<< 5

>>> 6
$1  #CHROM                      chr1
$2  POS                         762109
$3  ID                          .
$4  REF                         C
$5  ALT                         T
$6  QUAL                        .
$7  FILTER                      PASS
$8  INFO                        DP=2991;AF=0.009;CB=UM,BI,BC,NCBI;EUR_R2=0.652;AFR_R2=0.744
$9  knownGene.name              uc010nxx.1
$10 knownGene.strand            -
$11 knownGene.txStart           761586
$12 knownGene.txEnd             762902
$13 knownGene.cdsStart          762079
$14 knownGene.cdsEnd            762571
$15 prediction.type             EXON|EXON_CODING_NON_SYNONYMOUS
$16 prediction.pos_in_cdna      462
$17 prediction.pos_in_protein   155
$18 prediction.exon             Exon 1
$19 prediction.intron           .
$20 prediction.wild.codon       GTG
$21 prediction.mut.codon        ATG
$22 prediction.wild.aa          V
$23 prediction.mut.aa           M
$24 prediction.wild.prot        MWLVFHRPHPRPSWPLRAALGFGRRQSSLRCFPVLPSARPYVSANPTLRGGRLRQDPESE
$25 prediction.mut.prot         MWLVFHRPHPRPSWPLRAALGFGRRQSSLRCFPVLPSARPYVSANPTLRGGRLRQDPESE
$26 prediction.wild.rna         ATGTGGCTTGTCTTCCATCGTCCCCACCCTCGCCCCTCTTGGCCCCTCAGGGCAGCCCTG
$27 prediction.mut.rna          ATGTGGCTTGTCTTCCATCGTCCCCACCCTCGCCCCTCTTGGCCCCTCAGGGCAGCCCTG
$28 prediction.splicing         .
<<< 6

>>> 7
$1  #CHROM                      chr1
$2  POS                         762187
$3  ID                          .
$4  REF                         C
$5  ALT                         T
$6  QUAL                        .
$7  FILTER                      PASS
$8  INFO                        DP=2788;AF=0.007;CB=UM,BI,BC,NCBI;AFR_R2=0.906
$9  knownGene.name              uc010nxx.1
$10 knownGene.strand            -
$11 knownGene.txStart           761586
$12 knownGene.txEnd             762902
$13 knownGene.cdsStart          762079
$14 knownGene.cdsEnd            762571
$15 prediction.type             EXON|EXON_CODING_NON_SYNONYMOUS
$16 prediction.pos_in_cdna      384
$17 prediction.pos_in_protein   129
$18 prediction.exon             Exon 1
$19 prediction.intron           .
$20 prediction.wild.codon       GAG
$21 prediction.mut.codon        AAG
$22 prediction.wild.aa          E
$23 prediction.mut.aa           K
$24 prediction.wild.prot        MWLVFHRPHPRPSWPLRAALGFGRRQSSLRCFPVLPSARPYVSANPTLRGGRLRQDPESE
$25 prediction.mut.prot         MWLVFHRPHPRPSWPLRAALGFGRRQSSLRCFPVLPSARPYVSANPTLRGGRLRQDPESE
$26 prediction.wild.rna         ATGTGGCTTGTCTTCCATCGTCCCCACCCTCGCCCCTCTTGGCCCCTCAGGGCAGCCCTG
$27 prediction.mut.rna          ATGTGGCTTGTCTTCCATCGTCCCCACCCTCGCCCCTCTTGGCCCCTCAGGGCAGCCCTG
$28 prediction.splicing         .
<<< 7

>>> 8
$1  #CHROM                      chr1
$2  POS                         762273
$3  ID                          rs3115849
$4  REF                         G
$5  ALT                         A
$6  QUAL                        .
$7  FILTER                      PASS
$8  INFO                        DP=1202;AF=0.555;CB=UM,BI,BC;EUR_R2=0.636;AFR_R2=0.629
$9  knownGene.name              uc010nxx.1
$10 knownGene.strand            -
$11 knownGene.txStart           761586
$12 knownGene.txEnd             762902
$13 knownGene.cdsStart          762079
$14 knownGene.cdsEnd            762571
$15 prediction.type             EXON|EXON_CODING_NON_SYNONYMOUS
$16 prediction.pos_in_cdna      298
$17 prediction.pos_in_protein   100
$18 prediction.exon             Exon 1
$19 prediction.intron           .
$20 prediction.wild.codon       CCT
$21 prediction.mut.codon        CTT
$22 prediction.wild.aa          P
$23 prediction.mut.aa           L
$24 prediction.wild.prot        MWLVFHRPHPRPSWPLRAALGFGRRQSSLRCFPVLPSARPYVSANPTLRGGRLRQDPESE
$25 prediction.mut.prot         MWLVFHRPHPRPSWPLRAALGFGRRQSSLRCFPVLPSARPYVSANPTLRGGRLRQDPESE
$26 prediction.wild.rna         ATGTGGCTTGTCTTCCATCGTCCCCACCCTCGCCCCTCTTGGCCCCTCAGGGCAGCCCTG
$27 prediction.mut.rna          ATGTGGCTTGTCTTCCATCGTCCCCACCCTCGCCCCTCTTGGCCCCTCAGGGCAGCCCTG
$28 prediction.splicing         .
<<< 8

>>> 9
$1  #CHROM                      chr1
$2  POS                         762320
$3  ID                          rs75333668
$4  REF                         C
$5  ALT                         T
$6  QUAL                        .
$7  FILTER                      PASS
$8  INFO                        DP=2030;AF=0.048;CB=UM,BI,BC,NCBI;EUR_R2=0.529;AFR_R2=0.709
$9  knownGene.name              uc010nxx.1
$10 knownGene.strand            -
$11 knownGene.txStart           761586
$12 knownGene.txEnd             762902
$13 knownGene.cdsStart          762079
$14 knownGene.cdsEnd            762571
$15 prediction.type             EXON|EXON_CODING_SYNONYMOUS
$16 prediction.pos_in_cdna      251
$17 prediction.pos_in_protein   84
$18 prediction.exon             Exon 1
$19 prediction.intron           .
$20 prediction.wild.codon       GTG
$21 prediction.mut.codon        GTA
$22 prediction.wild.aa          V
$23 prediction.mut.aa           V
$24 prediction.wild.prot        MWLVFHRPHPRPSWPLRAALGFGRRQSSLRCFPVLPSARPYVSANPTLRGGRLRQDPESE
$25 prediction.mut.prot         MWLVFHRPHPRPSWPLRAALGFGRRQSSLRCFPVLPSARPYVSANPTLRGGRLRQDPESE
$26 prediction.wild.rna         ATGTGGCTTGTCTTCCATCGTCCCCACCCTCGCCCCTCTTGGCCCCTCAGGGCAGCCCTG
$27 prediction.mut.rna          ATGTGGCTTGTCTTCCATCGTCCCCACCCTCGCCCCTCTTGGCCCCTCAGGGCAGCCCTG
$28 prediction.splicing         .
<<< 9

>>> 10
$1  #CHROM                      chr1
$2  POS                         762330
$3  ID                          rs74045217
$4  REF                         G
$5  ALT                         T
$6  QUAL                        .
$7  FILTER                      PASS
$8  INFO                        DP=2132;AF=0.038;CB=UM,BC
$9  knownGene.name              uc010nxx.1
$10 knownGene.strand            -
$11 knownGene.txStart           761586
$12 knownGene.txEnd             762902
$13 knownGene.cdsStart          762079
$14 knownGene.cdsEnd            762571
$15 prediction.type             EXON|EXON_CODING_NON_SYNONYMOUS
$16 prediction.pos_in_cdna      241
$17 prediction.pos_in_protein   81
$18 prediction.exon             Exon 1
$19 prediction.intron           .
$20 prediction.wild.codon       CCA
$21 prediction.mut.codon        CAA
$22 prediction.wild.aa          P
$23 prediction.mut.aa           Q
$24 prediction.wild.prot        MWLVFHRPHPRPSWPLRAALGFGRRQSSLRCFPVLPSARPYVSANPTLRGGRLRQDPESE
$25 prediction.mut.prot         MWLVFHRPHPRPSWPLRAALGFGRRQSSLRCFPVLPSARPYVSANPTLRGGRLRQDPESE
$26 prediction.wild.rna         ATGTGGCTTGTCTTCCATCGTCCCCACCCTCGCCCCTCTTGGCCCCTCAGGGCAGCCCTG
$27 prediction.mut.rna          ATGTGGCTTGTCTTCCATCGTCCCCACCCTCGCCCCTCTTGGCCCCTCAGGGCAGCCCTG
$28 prediction.splicing         .
<<< 10
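
For readers who want to see the logic behind the prediction.* columns, here is a minimal Python sketch. It is not the code of the C++ tools above; the function name classify_snv and its arguments are hypothetical. It assumes that pos_in_cdna is a 0-based index into the coding sequence and that pos_in_protein is 1-based, which is inferred only from the values printed above (for example 420 and 141 in the first record). For a transcript on the minus strand the ALT base is complemented before it is placed into the codon. Only the three effect labels that appear in the output are produced; a lost stop codon is not distinguished from an ordinary non-synonymous change.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def classify_snv(wild_codon, pos_in_cdna, alt, strand="+"):
    """Return (mut_codon, wild_aa, mut_aa, pos_in_protein, effect).

    wild_codon  : reference codon, written 5'->3' on the transcript strand
    pos_in_cdna : ASSUMED 0-based index of the variant in the coding sequence
    alt         : ALT base as written in the VCF (genomic forward strand)
    strand      : '+' or '-' (strand of the transcript)
    """
    if strand == "-":
        # The VCF reports bases on the forward strand; the codon is on the reverse.
        alt = COMPLEMENT[alt]
    offset = pos_in_cdna % 3  # position of the variant inside its codon
    mut_codon = wild_codon[:offset] + alt + wild_codon[offset + 1:]
    wild_aa = CODON_TABLE[wild_codon]
    mut_aa = CODON_TABLE[mut_codon]
    pos_in_protein = pos_in_cdna // 3 + 1  # 1-based residue number
    if mut_aa == wild_aa:
        effect = "EXON_CODING_SYNONYMOUS"
    elif mut_aa == "*":
        effect = "EXON_STOP_GAINED"
    else:
        effect = "EXON_CODING_NON_SYNONYMOUS"
    return mut_codon, wild_aa, mut_aa, pos_in_protein, effect


if __name__ == "__main__":
    # First record above: chr1:69511 A>G on a '+' strand transcript.
    print(classify_snv("ACA", 420, "G", "+"))
```

To keep only the nsSNPs asked about in the question, one would retain the records whose effect is EXON_CODING_NON_SYNONYMOUS (and possibly EXON_STOP_GAINED, depending on how strictly "non-synonymous" is defined).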
ADD COMMENTlink written 7.4 years ago by Pierre Lindenbaum118k
7.4 years ago by
Laura1.7k
Cambridge UK
Laura1.7k wrote:

The Ensembl Variant Effect Predictor will do this for you:

http://browser.1000genomes.org/Homo_sapiens/UserData/UploadVariations

ADD COMMENTlink written 7.4 years ago by Laura1.7k