1000 Genomes Mapping Onto Protein Sequences?
                        Hi,
Does the 1000 Genomes Project provide any mapping of its variants onto protein sequences? I couldn't find any information on their website or in the data files. Can that information be retrieved elsewhere? I know that dbSNP provides this, but the 1000 Genomes data may not be entirely included in dbSNP.
Thanks
Chris
Edit: to clarify, I'm interested in non-synonymous SNPs (nsSNPs).

Tags: genome, protein, mapping, snp

updated 14.0 years ago by Laura ★ 1.8k • written 14.0 years ago by Chris ★ 1.6k
You can download the 1000 Genomes VCF files from the latest release here. From there, it is fairly straightforward to run snpEff, ANNOVAR, or VariantAnnotation (Bioconductor) to map the variants onto transcripts and see their effects on the encoded proteins.
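If you go the snpEff route, the non-synonymous calls can then be pulled out of the annotated VCF. This is a minimal sketch, assuming a recent snpEff that writes the standardized `ANN` INFO field: subfields separated by `|` (Allele, Annotation, Impact, Gene_Name, ..., with HGVS.p as the 11th), several annotations separated by `,`, and the Sequence Ontology term `missense_variant` for amino-acid changes. Older snpEff releases wrote an `EFF` field instead, which this does not parse. The function name `missense_from_ann` is hypothetical.

```shell
# Print CHROM, POS, ID, gene and protein change for every missense annotation
# found in the ANN INFO field of a snpEff-annotated VCF read from stdin.
# ASSUMPTION: standard ANN layout (Allele|Annotation|Impact|Gene_Name|...|HGVS.p|...).
missense_from_ann() {
  awk -F'\t' '
    /^#/ { next }                                  # skip VCF header lines
    {
      n = split($8, info, ";")                     # INFO column, key=value pairs
      for (i = 1; i <= n; i++) {
        if (info[i] !~ /^ANN=/) continue
        m = split(substr(info[i], 5), anns, ",")   # one entry per allele/transcript
        for (j = 1; j <= m; j++) {
          split(anns[j], f, "|")
          # f[2] = Annotation (may combine terms with "&"), f[4] = gene, f[11] = HGVS.p
          if (f[2] ~ /missense_variant/)
            print $1 "\t" $2 "\t" $3 "\t" f[4] "\t" f[11]
        }
      }
    }'
}
```

For example, after annotating with something like `java -jar snpEff.jar GRCh37.75 input.vcf > annotated.vcf` (the database name is only an example; pick the one matching your reference build), `missense_from_ann < annotated.vcf` lists one line per missense annotation, so a variant hitting several transcripts appears several times.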
You could try the Ensembl effect predictor or snpEff to map those data.
Note: I'm currently writing a set of C++ tools that do this kind of task on the fly:
$ curl -s "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/ALL.2of4intersection.20100804.sites.vcf.gz" |\
  gunzip -c |\
  grep -v "##" | normalizechrom -c 1 |\
  prediction -f hg19.fa |\
  grep -E '(EXON|#CHROM)' | head | verticalize | cut -c 1-90
>>> 2
$1  #CHROM                      chr1
$2  POS                         69511
$3  ID                          rs75062661
$4  REF                         A
$5  ALT                         G
$6  QUAL                        .
$7  FILTER                      PASS
$8  INFO                        DP=607;AF=0.789;CB=UM,BI;EUR_R2=0.054;AFR_R2=0.247
$9  knownGene.name              uc001aal.1
$10 knownGene.strand            +
$11 knownGene.txStart           69090
$12 knownGene.txEnd             70008
$13 knownGene.cdsStart          69090
$14 knownGene.cdsEnd            70008
$15 prediction.type             EXON|EXON_CODING_NON_SYNONYMOUS
$16 prediction.pos_in_cdna      420
$17 prediction.pos_in_protein   141
$18 prediction.exon             Exon 1
$19 prediction.intron           .
$20 prediction.wild.codon       ACA
$21 prediction.mut.codon        GCA
$22 prediction.wild.aa          T
$23 prediction.mut.aa           A
$24 prediction.wild.prot        MVTEFIFLGLSDSQELQTFLFMLFFVFYGGIVFGNLLIVITVVSDSHLHSPMYFLLANLS
$25 prediction.mut.prot         MVTEFIFLGLSDSQELQTFLFMLFFVFYGGIVFGNLLIVITVVSDSHLHSPMYFLLANLS
$26 prediction.wild.rna         ATGGTGACTGAATTCATTTTTCTGGGTCTCTCTGATTCTCAGGAACTCCAGACCTTCCTA
$27 prediction.mut.rna          ATGGTGACTGAATTCATTTTTCTGGGTCTCTCTGATTCTCAGGAACTCCAGACCTTCCTA
$28 prediction.splicing         .
<<< 2
>>> 3
$1  #CHROM                      chr1
$2  POS                         324822
$3  ID                          .
$4  REF                         A
$5  ALT                         T
$6  QUAL                        .
$7  FILTER                      PASS
$8  INFO                        DP=1649;AF=0.005;CB=UM,BI;EUR_R2=0.141;AFR_R2=0.017
$9  knownGene.name              uc009vjk.2
$10 knownGene.strand            +
$11 knownGene.txStart           322036
$12 knownGene.txEnd             326938
$13 knownGene.cdsStart          324342
$14 knownGene.cdsEnd            325605
$15 prediction.type             EXON|EXON_CODING_SYNONYMOUS
$16 prediction.pos_in_cdna      386
$17 prediction.pos_in_protein   129
$18 prediction.exon             Exon 3
$19 prediction.intron           .
$20 prediction.wild.codon       GCA
$21 prediction.mut.codon        GCT
$22 prediction.wild.aa          A
$23 prediction.mut.aa           A
$24 prediction.wild.prot        MLLPPGSLSRPRTFSSQPLQTKLMTHNGLFRPIPYVTAASADEATASQQPPQAQLHRYNG
$25 prediction.mut.prot         MLLPPGSLSRPRTFSSQPLQTKLMTHNGLFRPIPYVTAASADEATASQQPPQAQLHRYNG
$26 prediction.wild.rna         ATGCTCCTACCTCCCGGCAGCCTCTCCAGGCCCAGAACTTTCTCCAGTCAGCCTCTACAG
$27 prediction.mut.rna          ATGCTCCTACCTCCCGGCAGCCTCTCCAGGCCCAGAACTTTCTCCAGTCAGCCTCTACAG
$28 prediction.splicing         .
<<< 3
>>> 4
$1  #CHROM                      chr1
$2  POS                         324822
$3  ID                          .
$4  REF                         A
$5  ALT                         T
$6  QUAL                        .
$7  FILTER                      PASS
$8  INFO                        DP=1649;AF=0.005;CB=UM,BI;EUR_R2=0.141;AFR_R2=0.017
$9  knownGene.name              uc001aau.2
$10 knownGene.strand            +
$11 knownGene.txStart           323891
$12 knownGene.txEnd             328580
$13 knownGene.cdsStart          324342
$14 knownGene.cdsEnd            325605
$15 prediction.type             EXON|EXON_CODING_SYNONYMOUS
$16 prediction.pos_in_cdna      386
$17 prediction.pos_in_protein   129
$18 prediction.exon             Exon 3
$19 prediction.intron           .
$20 prediction.wild.codon       GCA
$21 prediction.mut.codon        GCT
$22 prediction.wild.aa          A
$23 prediction.mut.aa           A
$24 prediction.wild.prot        MLLPPGSLSRPRTFSSQPLQTKLMTHNGLFRPIPYVTAASADEATASQQPPQAQLHRYNG
$25 prediction.mut.prot         MLLPPGSLSRPRTFSSQPLQTKLMTHNGLFRPIPYVTAASADEATASQQPPQAQLHRYNG
$26 prediction.wild.rna         ATGCTCCTACCTCCCGGCAGCCTCTCCAGGCCCAGAACTTTCTCCAGTCAGCCTCTACAG
$27 prediction.mut.rna          ATGCTCCTACCTCCCGGCAGCCTCTCCAGGCCCAGAACTTTCTCCAGTCAGCCTCTACAG
$28 prediction.splicing         .
<<< 4
>>> 5
$1  #CHROM                      chr1
$2  POS                         762085
$3  ID                          .
$4  REF                         G
$5  ALT                         A
$6  QUAL                        .
$7  FILTER                      PASS
$8  INFO                        DP=428;AF=0.028;CB=BC,NCBI
$9  knownGene.name              uc010nxx.1
$10 knownGene.strand            -
$11 knownGene.txStart           761586
$12 knownGene.txEnd             762902
$13 knownGene.cdsStart          762079
$14 knownGene.cdsEnd            762571
$15 prediction.type             EXON|EXON_STOP_GAINED
$16 prediction.pos_in_cdna      486
$17 prediction.pos_in_protein   163
$18 prediction.exon             Exon 1
$19 prediction.intron           .
$20 prediction.wild.codon       CAG
$21 prediction.mut.codon        TAG
$22 prediction.wild.aa          Q
$23 prediction.mut.aa           *
$24 prediction.wild.prot        MWLVFHRPHPRPSWPLRAALGFGRRQSSLRCFPVLPSARPYVSANPTLRGGRLRQDPESE
$25 prediction.mut.prot         MWLVFHRPHPRPSWPLRAALGFGRRQSSLRCFPVLPSARPYVSANPTLRGGRLRQDPESE
$26 prediction.wild.rna         ATGTGGCTTGTCTTCCATCGTCCCCACCCTCGCCCCTCTTGGCCCCTCAGGGCAGCCCTG
$27 prediction.mut.rna          ATGTGGCTTGTCTTCCATCGTCCCCACCCTCGCCCCTCTTGGCCCCTCAGGGCAGCCCTG
$28 prediction.splicing         .
<<< 5
>>> 6
$1  #CHROM                      chr1
$2  POS                         762109
$3  ID                          .
$4  REF                         C
$5  ALT                         T
$6  QUAL                        .
$7  FILTER                      PASS
$8  INFO                        DP=2991;AF=0.009;CB=UM,BI,BC,NCBI;EUR_R2=0.652;AFR_R2=0.744
$9  knownGene.name              uc010nxx.1
$10 knownGene.strand            -
$11 knownGene.txStart           761586
$12 knownGene.txEnd             762902
$13 knownGene.cdsStart          762079
$14 knownGene.cdsEnd            762571
$15 prediction.type             EXON|EXON_CODING_NON_SYNONYMOUS
$16 prediction.pos_in_cdna      462
$17 prediction.pos_in_protein   155
$18 prediction.exon             Exon 1
$19 prediction.intron           .
$20 prediction.wild.codon       GTG
$21 prediction.mut.codon        ATG
$22 prediction.wild.aa          V
$23 prediction.mut.aa           M
$24 prediction.wild.prot        MWLVFHRPHPRPSWPLRAALGFGRRQSSLRCFPVLPSARPYVSANPTLRGGRLRQDPESE
$25 prediction.mut.prot         MWLVFHRPHPRPSWPLRAALGFGRRQSSLRCFPVLPSARPYVSANPTLRGGRLRQDPESE
$26 prediction.wild.rna         ATGTGGCTTGTCTTCCATCGTCCCCACCCTCGCCCCTCTTGGCCCCTCAGGGCAGCCCTG
$27 prediction.mut.rna          ATGTGGCTTGTCTTCCATCGTCCCCACCCTCGCCCCTCTTGGCCCCTCAGGGCAGCCCTG
$28 prediction.splicing         .
<<< 6
>>> 7
$1  #CHROM                      chr1
$2  POS                         762187
$3  ID                          .
$4  REF                         C
$5  ALT                         T
$6  QUAL                        .
$7  FILTER                      PASS
$8  INFO                        DP=2788;AF=0.007;CB=UM,BI,BC,NCBI;AFR_R2=0.906
$9  knownGene.name              uc010nxx.1
$10 knownGene.strand            -
$11 knownGene.txStart           761586
$12 knownGene.txEnd             762902
$13 knownGene.cdsStart          762079
$14 knownGene.cdsEnd            762571
$15 prediction.type             EXON|EXON_CODING_NON_SYNONYMOUS
$16 prediction.pos_in_cdna      384
$17 prediction.pos_in_protein   129
$18 prediction.exon             Exon 1
$19 prediction.intron           .
$20 prediction.wild.codon       GAG
$21 prediction.mut.codon        AAG
$22 prediction.wild.aa          E
$23 prediction.mut.aa           K
$24 prediction.wild.prot        MWLVFHRPHPRPSWPLRAALGFGRRQSSLRCFPVLPSARPYVSANPTLRGGRLRQDPESE
$25 prediction.mut.prot         MWLVFHRPHPRPSWPLRAALGFGRRQSSLRCFPVLPSARPYVSANPTLRGGRLRQDPESE
$26 prediction.wild.rna         ATGTGGCTTGTCTTCCATCGTCCCCACCCTCGCCCCTCTTGGCCCCTCAGGGCAGCCCTG
$27 prediction.mut.rna          ATGTGGCTTGTCTTCCATCGTCCCCACCCTCGCCCCTCTTGGCCCCTCAGGGCAGCCCTG
$28 prediction.splicing         .
<<< 7
>>> 8
$1  #CHROM                      chr1
$2  POS                         762273
$3  ID                          rs3115849
$4  REF                         G
$5  ALT                         A
$6  QUAL                        .
$7  FILTER                      PASS
$8  INFO                        DP=1202;AF=0.555;CB=UM,BI,BC;EUR_R2=0.636;AFR_R2=0.629
$9  knownGene.name              uc010nxx.1
$10 knownGene.strand            -
$11 knownGene.txStart           761586
$12 knownGene.txEnd             762902
$13 knownGene.cdsStart          762079
$14 knownGene.cdsEnd            762571
$15 prediction.type             EXON|EXON_CODING_NON_SYNONYMOUS
$16 prediction.pos_in_cdna      298
$17 prediction.pos_in_protein   100
$18 prediction.exon             Exon 1
$19 prediction.intron           .
$20 prediction.wild.codon       CCT
$21 prediction.mut.codon        CTT
$22 prediction.wild.aa          P
$23 prediction.mut.aa           L
$24 prediction.wild.prot        MWLVFHRPHPRPSWPLRAALGFGRRQSSLRCFPVLPSARPYVSANPTLRGGRLRQDPESE
$25 prediction.mut.prot         MWLVFHRPHPRPSWPLRAALGFGRRQSSLRCFPVLPSARPYVSANPTLRGGRLRQDPESE
$26 prediction.wild.rna         ATGTGGCTTGTCTTCCATCGTCCCCACCCTCGCCCCTCTTGGCCCCTCAGGGCAGCCCTG
$27 prediction.mut.rna          ATGTGGCTTGTCTTCCATCGTCCCCACCCTCGCCCCTCTTGGCCCCTCAGGGCAGCCCTG
$28 prediction.splicing         .
<<< 8
>>> 9
$1  #CHROM                      chr1
$2  POS                         762320
$3  ID                          rs75333668
$4  REF                         C
$5  ALT                         T
$6  QUAL                        .
$7  FILTER                      PASS
$8  INFO                        DP=2030;AF=0.048;CB=UM,BI,BC,NCBI;EUR_R2=0.529;AFR_R2=0.709
$9  knownGene.name              uc010nxx.1
$10 knownGene.strand            -
$11 knownGene.txStart           761586
$12 knownGene.txEnd             762902
$13 knownGene.cdsStart          762079
$14 knownGene.cdsEnd            762571
$15 prediction.type             EXON|EXON_CODING_SYNONYMOUS
$16 prediction.pos_in_cdna      251
$17 prediction.pos_in_protein   84
$18 prediction.exon             Exon 1
$19 prediction.intron           .
$20 prediction.wild.codon       GTG
$21 prediction.mut.codon        GTA
$22 prediction.wild.aa          V
$23 prediction.mut.aa           V
$24 prediction.wild.prot        MWLVFHRPHPRPSWPLRAALGFGRRQSSLRCFPVLPSARPYVSANPTLRGGRLRQDPESE
$25 prediction.mut.prot         MWLVFHRPHPRPSWPLRAALGFGRRQSSLRCFPVLPSARPYVSANPTLRGGRLRQDPESE
$26 prediction.wild.rna         ATGTGGCTTGTCTTCCATCGTCCCCACCCTCGCCCCTCTTGGCCCCTCAGGGCAGCCCTG
$27 prediction.mut.rna          ATGTGGCTTGTCTTCCATCGTCCCCACCCTCGCCCCTCTTGGCCCCTCAGGGCAGCCCTG
$28 prediction.splicing         .
<<< 9
>>> 10
$1  #CHROM                      chr1
$2  POS                         762330
$3  ID                          rs74045217
$4  REF                         G
$5  ALT                         T
$6  QUAL                        .
$7  FILTER                      PASS
$8  INFO                        DP=2132;AF=0.038;CB=UM,BC
$9  knownGene.name              uc010nxx.1
$10 knownGene.strand            -
$11 knownGene.txStart           761586
$12 knownGene.txEnd             762902
$13 knownGene.cdsStart          762079
$14 knownGene.cdsEnd            762571
$15 prediction.type             EXON|EXON_CODING_NON_SYNONYMOUS
$16 prediction.pos_in_cdna      241
$17 prediction.pos_in_protein   81
$18 prediction.exon             Exon 1
$19 prediction.intron           .
$20 prediction.wild.codon       CCA
$21 prediction.mut.codon        CAA
$22 prediction.wild.aa          P
$23 prediction.mut.aa           Q
$24 prediction.wild.prot        MWLVFHRPHPRPSWPLRAALGFGRRQSSLRCFPVLPSARPYVSANPTLRGGRLRQDPESE
$25 prediction.mut.prot         MWLVFHRPHPRPSWPLRAALGFGRRQSSLRCFPVLPSARPYVSANPTLRGGRLRQDPESE
$26 prediction.wild.rna         ATGTGGCTTGTCTTCCATCGTCCCCACCCTCGCCCCTCTTGGCCCCTCAGGGCAGCCCTG
$27 prediction.mut.rna          ATGTGGCTTGTCTTCCATCGTCCCCACCCTCGCCCCTCTTGGCCCCTCAGGGCAGCCCTG
$28 prediction.splicing         .
<<< 10
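In the output above, `prediction.type` comes from comparing the translated wild-type and mutant codons. (The wild/mutant protein and RNA columns look identical only because `cut -c 1-90` truncates them well before the variant position.) The bash sketch below reproduces that codon comparison with the standard genetic code; it illustrates the idea and is not the author's C++ implementation. The function names and the `STOP_LOST` label, which does not appear above, are hypothetical. It also covers the strand step: VCF alleles are given on the + strand, so for a `-` strand transcript such as uc010nxx.1 they are complemented before being placed in the codon (REF G / ALT A at 762085 becomes C -> T, turning CAG into TAG).

```shell
# Standard genetic code (NCBI table 1): 64 amino acids for codons ordered
# TTT, TTC, TTA, TTG, TCT, ... with the bases T, C, A, G at each position.
CODE="FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map one nucleotide to its 0-3 index in the T, C, A, G ordering.
base_index() {
  case "$1" in
    T|U) echo 0 ;;
    C) echo 1 ;;
    A) echo 2 ;;
    G) echo 3 ;;
    *) return 1 ;;   # not a plain base (N, empty, ...)
  esac
}

# Translate one codon to a one-letter amino acid ('*' = stop).
translate_codon() {
  local codon i1 i2 i3
  codon=$(printf '%s' "$1" | tr 'acgtu' 'ACGTU')
  [ "${#codon}" -eq 3 ] || return 1
  i1=$(base_index "${codon:0:1}") || return 1
  i2=$(base_index "${codon:1:1}") || return 1
  i3=$(base_index "${codon:2:1}") || return 1
  echo "${CODE:$((16 * i1 + 4 * i2 + i3)):1}"
}

# Compare wild-type and mutant codons; labels mirror prediction.type above.
classify_snv() {
  local wt mt
  wt=$(translate_codon "$1") || return 1
  mt=$(translate_codon "$2") || return 1
  if [[ "$wt" == "$mt" ]]; then
    echo "EXON_CODING_SYNONYMOUS"
  elif [[ "$mt" == "*" ]]; then
    echo "EXON_STOP_GAINED"
  elif [[ "$wt" == "*" ]]; then
    echo "STOP_LOST"   # hypothetical label, not present in the output above
  else
    echo "EXON_CODING_NON_SYNONYMOUS"
  fi
}

# VCF REF/ALT are on the + strand; for a '-' strand transcript,
# complement them before placing them in the codon.
complement_base() {
  printf '%s\n' "$1" | tr 'ACGTacgt' 'TGCAtgca'
}
```

For example, `classify_snv ACA GCA` prints `EXON_CODING_NON_SYNONYMOUS` (rs75062661, T -> A), `classify_snv GCA GCT` prints `EXON_CODING_SYNONYMOUS`, `classify_snv CAG TAG` prints `EXON_STOP_GAINED`, and `complement_base G` prints `C`, matching the records above.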
It seems that you are interested in mapping variants discovered by 1000G onto proteins. If so, please add a "SNP" or "genetic-variation" tag to your question. This is a good question and deserves the best possible tags.
Larry: Good point. Done.