Question: Find a gene's location on a genome from its CDS
Posted 4.0 years ago by guillaume.rbt (France):

Hi everyone, I have the CDS of several genes, and I would like to find the location of those genes on a genome assembly (start, stop, and exon positions). Does anybody have an idea how to do that? Thanks

Tags: cds, genome

Did you try BLAST already?

Reply by Benn, 4.0 years ago

Yes, I have, but it doesn't give me the positions of the exons.

Reply by guillaume.rbt, 4.0 years ago
Answer posted 4.0 years ago by Santosh Anand:
  1. BLAT your sequence on the UCSC Genome Browser to find out which gene it belongs to.
  2. Go to the UCSC Table Browser and choose:

    group: Genes and Gene Prediction

    table: knownGene

  3. Click "paste list" next to "identifiers (names/accessions):" and paste the gene name from step 1.

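  4. Click "get output", and voilà, you are done! With the default output (all fields from the selected table), you get the transcript start/end (txStart/txEnd), the CDS start/end (cdsStart/cdsEnd) and the exon positions (exonStarts/exonEnds).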
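Once you have that output, the exon coordinates still need a little unpacking. Below is a minimal Python sketch, assuming each data row follows the knownGene/genePred column layout (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, then any extra columns). It parses one row into exon intervals and clips them to the coding region. The names `Transcript`, `parse_known_gene` and `coding_exons` are hypothetical; coordinates are kept in the UCSC convention (0-based starts, half-open intervals).

```python
from typing import List, NamedTuple, Tuple


class Transcript(NamedTuple):
    name: str
    chrom: str
    strand: str
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    exons: List[Tuple[int, int]]  # (start, end), 0-based, half-open


def parse_known_gene(line: str) -> Transcript:
    """Parse one tab-separated knownGene (genePred-style) row.

    Assumes the first ten columns are name, chrom, strand, txStart, txEnd,
    cdsStart, cdsEnd, exonCount, exonStarts, exonEnds; later columns are ignored.
    """
    f = line.rstrip("\n").split("\t")
    tx_start, tx_end, cds_start, cds_end, exon_count = map(int, f[3:8])
    # exonStarts/exonEnds are comma-separated and usually end with a trailing comma
    starts = [int(x) for x in f[8].split(",") if x]
    ends = [int(x) for x in f[9].split(",") if x]
    if not len(starts) == len(ends) == exon_count:
        raise ValueError(f"exon count mismatch in row {f[0]}")
    return Transcript(f[0], f[1], f[2], tx_start, tx_end, cds_start, cds_end,
                      list(zip(starts, ends)))


def coding_exons(t: Transcript) -> List[Tuple[int, int]]:
    """Clip each exon to [cdsStart, cdsEnd) and drop exons that are entirely UTR."""
    clipped = []
    for start, end in t.exons:
        s, e = max(start, t.cds_start), min(end, t.cds_end)
        if s < e:
            clipped.append((s, e))
    return clipped
```

For example, with a made-up tab-separated row `uc001abc.1 chr1 + 1000 5000 1200 4800 3 1000,2000,4000, 1500,2500,5000,`, `coding_exons` returns `[(1200, 1500), (2000, 2500), (4000, 4800)]`. Header lines starting with `#` in the Table Browser output should be skipped before calling `parse_known_gene`.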

Thank you very much for your help. Do you know if this can be done from a terminal? I have a lot of genes.

Reply by guillaume.rbt, 4.0 years ago

(because I have a lot of genes)

Hmm... then please edit your question accordingly (you have the CDS of many genes, not of one gene)! Here are some hints for doing it programmatically. The solution has two parts:

  1. Getting the gene names/locations from the CDS: although web BLAT accepts more than one sequence at a time, it still limits the maximum number of sequences. Instead, you can run megablast (from the BLAST suite), which is designed for aligning highly similar sequences. Use the option -m 8 or -m 9 (see the manual for details) to get the results in tabular format; these are the legacy megablast/blastall options, and in BLAST+ the equivalent is blastn -task megablast with -outfmt 6 or -outfmt 7. If a sequence has more than one hit, keep only the top hit, since hits are ordered from best to worst. From this tabular result you can get the chromosomal location of each CDS.

  2. Getting the gene names and exon locations: you can paste multiple coordinates into the query field (click "define regions" in the Table Browser; the list is pasted as in step 3 of my answer). Alternatively, if you don't select any region at all, you can download the whole gene table. Then intersect this table with the CDS-location table from step 1 using bedtools.
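Step 1 above can be sketched in Python as follows. This is a minimal sketch, assuming the standard 12-column tabular layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). It keeps one hit per query and writes it as a BED line. Rather than relying only on row order, it picks the highest bitscore and keeps the earlier row on ties, which matches "take the first hit" when the file is already sorted best to worst. The function names are hypothetical.

```python
import csv
from typing import Dict, Iterable, List, Tuple


def best_hits(rows: Iterable[List[str]]) -> Dict[str, List[str]]:
    """Keep the highest-bitscore hit per query from -m 8 / -m 9 tabular rows."""
    best: Dict[str, List[str]] = {}
    for row in rows:
        if not row or row[0].startswith("#"):
            continue  # -m 9 output interleaves '#' comment lines
        query = row[0]
        # strict '>' keeps the earlier row on ties, i.e. the one BLAST listed first
        if query not in best or float(row[11]) > float(best[query][11]):
            best[query] = row
    return best


def hit_to_bed(row: List[str]) -> Tuple[str, int, int, str, int, str]:
    """Convert one hit to a BED6 tuple (chrom, start, end, name, score, strand)."""
    sstart, send = int(row[8]), int(row[9])  # 1-based, inclusive
    strand = "+" if sstart <= send else "-"   # sstart > send on the minus strand
    # BED is 0-based, half-open: shift the start down by one, keep the end
    return (row[1], min(sstart, send) - 1, max(sstart, send), row[0], 0, strand)


def bed_from_m8(path: str) -> List[str]:
    """Read a tabular megablast file and return one BED line per query."""
    with open(path, newline="") as fh:
        hits = best_hits(csv.reader(fh, delimiter="\t"))
    return ["\t".join(map(str, hit_to_bed(row))) for row in hits.values()]
```

Because each exon of a spliced CDS usually aligns as a separate hit, the single top hit only places the CDS on the right chromosome and locus. The exon structure itself comes from the gene table in step 2.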

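For step 2, the intersection would normally be done with bedtools, for example by exporting the gene table from the Table Browser in BED format and running `bedtools intersect -a cds.bed -b genes.bed -wa -wb` (the file names are hypothetical). To show what that join does, here is a minimal pure-Python stand-in. It assumes the gene table was downloaded in the knownGene column layout rather than as BED, and it reports each overlapping transcript together with its exon coordinates. It uses a naive scan over all genes, so it is meant as an illustration, not a replacement for bedtools on genome-wide tables.

```python
from typing import Dict, List, Tuple

# (chrom, start, end, name), 0-based half-open, e.g. BED lines from step 1
CdsInterval = Tuple[str, int, int, str]
# (transcript name, chrom, exons)
GeneHit = Tuple[str, str, List[Tuple[int, int]]]


def intersect_with_genes(cds: List[CdsInterval],
                         gene_rows: List[str]) -> Dict[str, List[GeneHit]]:
    """For each CDS interval, list the knownGene transcripts whose span overlaps it.

    gene_rows are tab-separated knownGene rows (name, chrom, strand, txStart,
    txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, ...).
    """
    genes = []
    for line in gene_rows:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the Table Browser header
        f = line.rstrip("\n").split("\t")
        starts = [int(x) for x in f[8].split(",") if x]
        ends = [int(x) for x in f[9].split(",") if x]
        genes.append((f[0], f[1], int(f[3]), int(f[4]), list(zip(starts, ends))))

    result: Dict[str, List[GeneHit]] = {}
    for chrom, start, end, name in cds:
        result[name] = [
            (g_name, g_chrom, exons)
            for g_name, g_chrom, g_start, g_end, exons in genes
            # half-open intervals overlap iff each one starts before the other ends
            if g_chrom == chrom and start < g_end and g_start < end
        ]
    return result
```

CDS names are used as dictionary keys, so this sketch assumes they are unique.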
Reply by Santosh Anand, 4.0 years ago
Powered by Biostar version 2.3.0