How to retrieve nearest genes in Ciona genome?
9.8 years ago
zhenyisong ▴ 160

I have a batch of genomic positions for conserved DNA regions in the Ciona genome. Is there a program that can identify the gene nearest to each region? Thanks.

The positions come from VISTA-Point, where the regions are annotated as Conserved Non-coding Sequences (CNS), and look like the following:

>C. intestinalis v.2.0 chr_01p:14451-14569 (+)
AAGTTTTCAAAGTGGTGAAGAAATGAAGCACCCTCGTATTTATTAAGTTTAGCACAAGTC
TGTTCCACACAACTTGTCATAGGAGGTGGCCCGCAACTGAAGACGCCAAACTTTTCAAC
>Ciona savignyi Sep. 2005 reftig_173:198261-198379 (-)
AAGTTTTCAAAATGATGCAGGAAGCTTGGACCTTCATACTTGTTGAGTTTGCTGCATGTT
TTGTCCACCTCATTGGTCATTGGAGGAGGACCACAACTGAAGACTCCGAACTTGTCAAC
=  length = 119bp, identity = 71.4%, type = exon
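To feed regions like these to an interval tool, each header has to be turned into chromosome, start and end fields. The following is a minimal Python sketch of that step, not part of VISTA-Point or BEDOPS. The function name `vista_header_to_bed` is hypothetical, and the code assumes the coordinates are 1-based and inclusive, which is consistent with the reported length (14569 - 14451 + 1 = 119).

```python
import re

# Matches headers such as ">C. intestinalis v.2.0 chr_01p:14451-14569 (+)".
HEADER_RE = re.compile(
    r'^>(?P<assembly>.+?)\s+(?P<seq>\S+):(?P<start>\d+)-(?P<end>\d+)'
    r'\s+\((?P<strand>[+-])\)\s*$'
)


def vista_header_to_bed(header):
    """Return (chrom, bed_start, bed_end, strand) for one header line."""
    match = HEADER_RE.match(header)
    if match is None:
        raise ValueError('unrecognised header: %r' % header)
    start = int(match.group('start'))
    end = int(match.group('end'))
    # Assumption: input is 1-based inclusive; BED is 0-based half-open,
    # so only the start is shifted by one.
    return match.group('seq'), start - 1, end, match.group('strand')


print(vista_header_to_bed('>C. intestinalis v.2.0 chr_01p:14451-14569 (+)'))
```

The resulting interval spans 119 bases, the same as the length reported above. The sequence names (chr_01p, reftig_173) are kept exactly as VISTA-Point writes them, so they only match a gene annotation built on the same assembly.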
genome nearest-gene ciona • 2.5k views
9.8 years ago

If your regions and genes are in sorted BED format, or if they can be converted and sorted, you can use BEDOPS closest-features to find the nearest upstream and downstream genes to each region. BEDOPS also includes conversion and sorting utilities, if that functionality is needed to prepare your data.
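For readers who want to see what a nearest-gene lookup computes, here is a small Python sketch of the idea. It is not how BEDOPS is implemented, and it returns only the single nearest gene rather than the nearest upstream and downstream genes. The function names `gap` and `nearest_gene` and the gene records are made up for illustration; intervals are assumed to be 0-based and half-open, as in BED.

```python
def gap(region, gene):
    """Bases separating two half-open intervals; 0 if they overlap or abut."""
    _, r_start, r_end = region
    _, g_start, g_end, _ = gene
    return max(0, g_start - r_end, r_start - g_end)


def nearest_gene(region, genes):
    """Return the gene on the same chromosome with the smallest gap, or None."""
    same_chrom = [g for g in genes if g[0] == region[0]]
    if not same_chrom:
        return None
    return min(same_chrom, key=lambda g: gap(region, g))


# Hypothetical gene records: (chrom, start, end, name).
genes = [
    ('chr_01p', 1000, 5000, 'geneA'),
    ('chr_01p', 20000, 26000, 'geneB'),
    ('chr_02q', 14000, 15000, 'geneC'),
]
region = ('chr_01p', 14450, 14569)
print(nearest_gene(region, genes))
```

This linear scan is fine for a few thousand regions. A tool that works on sorted input can do the same job in a single pass over both files.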


BEDOPS is a beautiful program. It seems that I first have to prepare BED (or Starch format) files for the genes and the target DNA regions, and then I can use closest-features to locate the nearest gene. But I would like to know whether anyone has already done this for the Ciona genome, so that I could use the result immediately in one command line. Anyway, thanks for the advice.

9.8 years ago
Hugues ▴ 250

The gene list for Ciona intestinalis is available at http://www.ncbi.nlm.nih.gov/gene/?term=ciona%5BOrganism%5D

Click Send to: > File > Format: > Tabular (text) to save it to disk (gene_result.txt).

This file is not in BED format so you'll need to convert it first.

I wrote a Python script to help you. Copy-paste it and save it as genelist_to_bed.py.

$ python genelist_to_bed.py

Usage: genelist_to_bed.py input_file.txt output_file.bed

Here is the code:

import csv
import os
import sys

if len(sys.argv) < 3:
    sys.exit('Usage: %s input_file.txt output_file.bed' % sys.argv[0])

if not os.path.exists(sys.argv[1]):
    sys.exit('ERROR: Gene list %s was not found!' % sys.argv[1])

fieldnames = ('chromosome',
              'start_position_on_the_genomic_accession',
              'end_position_on_the_genomic_accession',
              'GeneID',)

# Open both files in text mode with newline='', as the csv module expects in Python 3.
with open(sys.argv[1], newline='') as fi, open(sys.argv[2], 'w', newline='') as fo:
    reader = csv.DictReader(fi, delimiter='\t')
    # End each line with '\n' only, so no stray '\r' ends up in the last column.
    writer = csv.DictWriter(fo, fieldnames=fieldnames, delimiter='\t',
                            lineterminator='\n')
    fo.write('# ')  # prefix the header row so it reads as a comment line
    writer.writeheader()
    for row in reader:
        # Keep only the four columns of interest.
        out = {name: row[name] for name in fieldnames}
        # Prefix the chromosome name with "chr".
        out['chromosome'] = 'chr' + row['chromosome']
        writer.writerow(out)

Then you can use the tool suggested by Alex Reynolds.

Hope it helps!


They are different genome assemblies: NCBI vs. VISTA (my raw data, which contain the genome fragment positions, come from VISTA). I once thought this was a small task, but it turns out to be a project. I have to re-map the VISTA coordinates (I do not know which assembly they use) to NCBI. Your script showed me the minimum information a BED file needs. Thanks.


Can you edit your question so that it contains the necessary details?

Also, where do the genome positions of the conserved DNA regions come from? Are they from C. intestinalis v.2.0 - Ciona savignyi Sep. 2005 as provided in VISTA-Point?

FYI the minimum information for the BED file is described in the BEDOPS help.


You are right. The position information is from VISTA-Point. I can handle it now. The plan is to use BLAST to re-map the CNSs to the NCBI Ciona genome assembly, as I have no other idea of how to translate VISTA-Point coordinates to NCBI's. Anyway, thank you and Alex.
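One way the BLAST step could feed back into the BED workflow is sketched below in Python. It assumes the search was saved in BLAST's standard 12-column tabular format (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). Keeping only the highest-bitscore hit per query is a simplifying assumption, and the function name `best_hits_to_bed` and the example hit lines are hypothetical.

```python
def best_hits_to_bed(blast_lines):
    """Keep the highest-bitscore hit per query as BED-style tuples.

    Each tuple is (chrom, start, end, name, bitscore, strand), with
    0-based half-open coordinates on the subject sequence.
    """
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith('#'):
            continue  # skip blank lines and comment lines
        fields = line.rstrip('\n').split('\t')
        query, subject = fields[0], fields[1]
        s_start, s_end = int(fields[8]), int(fields[9])
        bitscore = float(fields[11])
        if query not in best or bitscore > best[query][4]:
            # BLAST reports minus-strand hits with sstart > send.
            strand = '+' if s_start <= s_end else '-'
            low, high = min(s_start, s_end), max(s_start, s_end)
            # BLAST positions are 1-based inclusive; shift the start for BED.
            best[query] = (subject, low - 1, high, query, bitscore, strand)
    return list(best.values())


# Hypothetical hits for two CNS queries.
example = [
    'cns1\tchrA\t98.3\t119\t2\t0\t1\t119\t5001\t5119\t1e-50\t200.0',
    'cns1\tchrB\t90.0\t100\t10\t0\t1\t100\t300\t399\t1e-20\t100.0',
    'cns2\tchrA\t95.0\t119\t6\t0\t1\t119\t9119\t9001\t1e-40\t180.0',
]
for bed in best_hits_to_bed(example):
    print('\t'.join(str(value) for value in bed))
```

Short conserved sequences can hit several places in a genome, so queries with several hits of similar score deserve a manual check before the coordinates are trusted. The bitscore in the fifth column is also not a standard BED score (nominally an integer from 0 to 1000), so it may need rescaling or replacing for stricter tools.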


You could create a new question (how to map VISTA coordinates to NCBI for Ciona?) and cross-reference it here. I don't know how to tackle that myself, but I'd be interested to learn what you find out.
