How to download the sequence of one gene for all individuals in the 1000 Genomes Project (phase 3)
6.1 years ago

Hi,

I'm fairly new to this. Could you advise me whether there is a tutorial, or explain how I can download the sequences of one gene for all individuals in the 1000 Genomes Project? I want to study SNPs a bit, so I need all sequences of one specific gene. Is there also an option to download the amino acid sequence directly? Is it possible to do this with BioMart, or do I need some programming in R? Thanks.

gene sequence SNP R 1000 genomes • 2.5k views

Data slicer may be one option (as there are bound to be others).

BioMart on the GRCh37 Ensembl site is also recommended by the 1000 Genomes Project.
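If you end up with a VCF file rather than a web export, the slicing idea itself is simple. Here is a minimal Python sketch, assuming plain-text, tab-separated VCF lines; the function name `slice_vcf` and the toy records (including their IDs and coordinates) are made up for illustration and are not real 1000 Genomes data. For real, compressed and indexed files a dedicated tool is the better choice.

```python
def slice_vcf(lines, chrom, start, end):
    """Yield the fields of VCF data lines on `chrom` with start <= POS <= end.

    Coordinates are 1-based and inclusive, as in the VCF POS column.
    Header lines (starting with '#') are skipped.
    """
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # ignore blank or malformed lines
        if fields[0] == chrom and start <= int(fields[1]) <= end:
            yield fields


# Toy records, invented for this example only.
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT",
    "13\t100\trs_a\tA\tG",
    "13\t250\trs_b\tC\tT",
    "14\t150\trs_c\tG\tA",
]

hits = list(slice_vcf(example, "13", 50, 200))
print([record[2] for record in hits])
```

With a real file you would pass an open file handle instead of the `example` list and use the gene's coordinates on the matching assembly (GRCh37 for the phase 3 release).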

6.1 years ago
caggtaagtat ★ 1.9k

Hi,

The 1000 Genomes Project data have been integrated into Ensembl, which provides a range of information about human genes, among other things. Once you are on the page of a particular gene (e.g. BRCA2), use the menu on the left to navigate between the different views.

You can then see all SNPs found in the 1000 Genomes Project by opening the "Variant table", which also gives their frequencies. It shows which nucleotide variants exist at each genomic position, including positions in non-coding sequence.

Another way would be to use dbSNP, which also includes variants from the 1000 Genomes Project.

From Ensembl you can then download the sequence of a particular transcript and compare it with the SNP tables. Since, as far as I know, SNPs often occur in groups, it is difficult to define which sequence you want to download for a single SNP. Maybe there is a way to download the sequence of a gene with specific alleles, but I'm not aware of one.
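To illustrate the "compare the sequence with the SNP tables" step, here is a minimal Python sketch that substitutes alternate alleles into a reference sequence. Everything in it is an assumption for illustration: the function name `apply_snps`, the 0-based positions relative to the start of the sequence, and the toy sequence. It handles single-nucleotide substitutions only, not insertions or deletions.

```python
def apply_snps(reference, snps):
    """Return `reference` with each SNP's alternate allele substituted.

    `snps` maps a 0-based position in `reference` to a (ref, alt) pair.
    The stated reference allele is checked first, so a coordinate or
    strand mix-up raises an error instead of passing silently.
    """
    seq = list(reference)
    for pos, (ref, alt) in snps.items():
        if seq[pos] != ref:
            raise ValueError(
                f"expected {ref!r} at position {pos}, found {seq[pos]!r}"
            )
        seq[pos] = alt
    return "".join(seq)


# Toy sequence, not a real transcript.
reference = "ATGGCCAAG"
variant = apply_snps(reference, {4: ("C", "T")})
print(variant)
```

Which SNPs belong together on one copy of the gene is exactly the grouping problem mentioned above, so the set of positions you pass in should come from phased data.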

6.1 years ago
Emily 23k

If you're just looking for protein or CDS sequences, these are available in Ensembl. Search for your gene of interest, pick a transcript, then go to Haplotypes in the menu on the left. Here you'll see a list of haplotypes, that is, the sets of variants that occur together on a single copy of the gene across all the 1000 Genomes individuals, along with their frequencies. You can expand a haplotype to see its protein or CDS sequence and the identifiers of the individuals who carry it.

Here's an example page.

You can also click the "Export data as JSON" button to call this REST API endpoint. Alternatively, you can query the REST API endpoint directly and add things like individual identifiers to your output.
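For anyone who wants to script this, here is a rough Python sketch of calling the endpoint directly. Treat the details as assumptions to check against the Ensembl REST documentation: the path `transcript_haplotypes/:species/:id` and the optional `samples` and `sequence` parameters are written from memory, the function names are my own, and `ENST00000000000` is a placeholder rather than a real transcript ID.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SERVER = "https://rest.ensembl.org"


def haplotype_url(transcript_id, species="homo_sapiens",
                  samples=False, sequence=False):
    """Build the request URL for a transcript's haplotypes.

    The endpoint path and the `samples` / `sequence` options are
    assumptions; verify them in the REST documentation before use.
    """
    params = {"content-type": "application/json"}
    if samples:
        params["samples"] = 1   # ask for individual identifiers
    if sequence:
        params["sequence"] = 1  # ask for haplotype sequences
    query = urlencode(params)
    return f"{SERVER}/transcript_haplotypes/{species}/{transcript_id}?{query}"


def fetch_haplotypes(transcript_id, **options):
    """Download and decode the JSON response (needs network access)."""
    with urlopen(haplotype_url(transcript_id, **options), timeout=60) as response:
        return json.load(response)


# Placeholder ID; replace it with a real Ensembl transcript ID.
url = haplotype_url("ENST00000000000", samples=True)
print(url)
```

Calling `fetch_haplotypes(...)` with a real transcript ID should return the same JSON that the export button gives you; inspect its keys before relying on any particular field name.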


This is a simpler approach to study the SNPs than just slicing the gene from the BAM files or assemblies, OP.


Is there a way to get the haplotype data, including the frequency data, programmatically by searching for a transcript ID instead of going through the website? Maybe the REST API?


Yes, as referred to in the second paragraph of my post.
