How can I map exons (FASTA format) onto the human genome to get their genomic coordinates?
10.0 years ago

Hi, I am trying to submit Sanger-sequenced, multi-exon gene sequences to NCBI and would like to get the exact genomic coordinates of my exon sequences and of the CDS. Is there a tool that does that? I have thousands of samples; only a few records are shown below. Thanks.

>SeqX [organism=Homo sapiens] [isolate=ABC] Stromal Antigen 2 (STAG2) gene, Exon3, Exon4, Exon5, Exon6, Exon7
TCCTTTCCGAATATTTTTGGTGCATTTGTAATAAATGTCATTTNTCTCCTTTTTAAAGGAATTGTCTTAGAAGAAAGAAGGCAAGCCACCATTTTACCCACGTAAATATATGAATATATTTCTGACATTGAGGTGTTCCAGAAGATGATAAAGAAATGATAGCAGCTCCAGAAATACCAACTGATTTTAATCTACTACAGTAAGTAAATTATATTCTGATAATTTTTAAATACTTGTTTATTCCACAAAATGGGGAATGCATTAACTTCAGTTAAATTTCCTTCTGCTCGAGAAGATCTAATATATAAAATAGCTTTTATGCTTTGCAAGAGTTTATATCA
>?unk100
GTTTTGGGGAACATCTTAATTACTTATAATGCTAATATGAAGTTTTGTAATGAGTTAACCAAGCCTTTCTTTTAGAAAATATGGCAAAAATTAGAAACTCAATATAAATTTCTAAGGAAGGGTTTTAATTCTTATCTTTCTGTCACAGGGAGTCAGAAACACATTTTTCTTCTGACACAGATTTTGAAGATATCGAAGGAAAAAACCAAAAGCAAGGCAAAGGCAAAGTATGTATCAAATATTTGACTTTATTTTGTTTCCTAAGATCTCACACACACACAGATTTAAGTTATGTCTCAGATAGTTTTATCTTTTAAAAATGGCTTTTTAAGGGGGTGGGAGCTGATTGGTATGGTA
>?unk100
AAGTGGATGGAATTCTTTAGGGCAAGTTTAAGCATGTTATGTACCCTATCAGCTACTTCTACTGTAGCTGTGTTTTGAACTCTCAAGGATAGTGATATAACTTAACCACCTCGTATTTTTTATGCAGACTTGTAAAAAAGGCAAAAAGGGCCCAGCAGAAAAGGGCAAAGGTGGAAATGGAGGAGGAAAACCTCCTTCTGGTCCAAACCGAATGAATGGTCATCACCAACAGAATGGAGTGGAAAACATGATGTTGTTTGAAGTTGTTAAAATGGGCAAGAGTGCTATGCAGGTAAGATTTATGTTGTTCTTCCCAGTTCATTTGTACATTTTAAACTTTAATGAGTTATATAGAGTGTAGCTCTG
>?unk100
AAGTGACTATTTGAGAGCTGCTGATTTCAAAATAAATATATCTTACCTTTACAGCCTGAACACTGAATAAAAAAGTTGATAAGGTCAAGAAGTGCTATATCTCGGTCATGCTTGTATGATTCTATCCAATCATCTACCACCGACTACAGCAGAGGGAAAAAAATAAAATCATTAGCTTCTTCTAATTTTCTCAAAATCAATTAAGTCTGATAAAGTCATAAAATTCAAGATTATATAGTATCACATTACTTTAATATAAATACTTATACACTGAAATTTAAAGTTCAATTTTAACAATAATAAAATAGAATCGAATTCAGTAAAACAATTATCTGATAACACAAAATGACCTATCAATCTTCTATTTATTTTGCATTGAAAAGAATGTG
>?unk100
TAAGTTATCAAAACACTTAAGGTAGTAAGTTACCTCATCGAATTCTTCAGTCATTTTTCGAATTATCTCAGAGTTCTGCATATGTCTAAACATTTCTGCTGTGACAACTCCTGAAATTTGCAAATGTCAGAAGTTAATATATGGTGTGATAAAAAAATAAAGAAAACTTCCAAGTAAGTCTCTAACACTAAGAAGTCTATGGTCACACAATAAAAGGCATACTTCTTCAACCATCATCTAATAATCTTTACCATGATACTCTAATCTATAAATAAAGCACAAACAAATGCTATCTATTCTCAGTATGCACAAGAAAACAGCCCCATACTTCTGACAGATATCTTTTTTCCTAACACAATTAACTTTGGCCATTTCT
sanger exons genome map sequencing • 3.3k views
10.0 years ago

You could use a tool like BLAT to query your sequences against the human genome. BLAT writes a PSL file, which you can convert to BED with psl2bed. Once your hits are in BED format, you can query them against gene annotations with bedops, bedmap, etc.


@Alex Reynolds, thanks. Since I have 3000 sequences, could you please elaborate on the command-line syntax for connecting to the UCSC BLAT server and running BLAT to generate the PSL file?


You can build and install BLAT locally, so that you don't go through their web server. BLAT is part of the Jim Kent tools, and you'll need the 2bit files for your assembly-of-interest. For hg19, at least, UCSC has a prebuilt 2bit file. At minimum, you'd then run something like:

$ blat hg19.2bit yourQuerySeqs.fa yourSearchResult.psl

There are other options depending on how much stringency you need, or if you want to mask regions, etc.

To convert to BED:

$ psl2bed < yourSearchResult.psl > yourSearchResult.bed
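
With thousands of query sequences, BLAT will often report more than one alignment per query, so it may help to keep only the top-scoring hit per query before converting to BED. Here is a minimal awk sketch of that idea. It assumes the standard 21-column PSL layout (column 1 = matches, column 2 = misMatches, column 10 = qName) and uses matches minus mismatches as a simple score, which is a simplification and not BLAT's own scoring. The function name best_psl_hits is made up for illustration.

```shell
# best_psl_hits: read PSL on stdin, keep the best alignment per query.
# Assumed PSL columns: $1 = matches, $2 = misMatches, $10 = qName.
# Header lines (first field not a plain integer) are skipped.
# Score here is matches - misMatches (an illustrative choice only).
best_psl_hits() {
  awk -F'\t' '
    $1 ~ /^[0-9]+$/ {
      score = $1 - $2
      if (!($10 in best) || score > best[$10]) {
        best[$10] = score          # best score seen so far for this query
        line[$10] = $0             # remember the whole PSL row
        if (!($10 in seen)) {      # keep queries in first-seen order
          order[++n] = $10
          seen[$10] = 1
        }
      }
    }
    END { for (i = 1; i <= n; i++) print line[order[i]] }
  '
}
```

You would then run something like `best_psl_hits < yourSearchResult.psl > bestHits.psl` (bestHits.psl is a placeholder name) and pass the filtered file to psl2bed as above. Note that the filtered output no longer contains the PSL header lines.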

Thanks. I have gotten as far as generating the .bed file from the PSL, but I don't see how bedmap or bedops can annotate my exon sequences with the exon number, the start and end of each exon on the original gene (STAG2), whether it is a CDS and, if so, the coordinates of that CDS within the exon.


Basically, you need a BED file containing exon and CDS information. Then you can do set operations on those annotations, i.e. map your results to exons or CDSs. I have an answer to another question (Locating SNP's to genes) which suggests how to get GENCODE annotations; perhaps that might help get you started with your analysis. Good luck!
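
To make the set-operation idea concrete: suppose you have a BED file of annotated features whose ID column names each feature (say STAG2_exon3 or STAG2_CDS3; these names are hypothetical, and real annotation IDs will look different). Mapping your BLAT hits against that file tells you which exon or CDS each hit overlaps, and the overlapping coordinates. bedmap does this efficiently on sorted BED input. The awk function below is only a naive stand-in that shows the logic, assuming column 4 of your hits file holds the query name; annotate_hits and the file names are made up.

```shell
# annotate_hits HITS.bed FEATURES.bed
# For every hit, print each overlapping feature and the overlap coordinates:
#   chrom  hitStart  hitEnd  hitName  featureName  overlapStart  overlapEnd
# BED intervals are 0-based and half-open, so two intervals on the same
# chromosome overlap when start1 < end2 and start2 < end1.
# Naive scan over all features per hit: fine for one gene, slow genome-wide.
annotate_hits() {
  awk -F'\t' -v OFS='\t' '
    NR == FNR {                    # first file read: the annotated features
      fchr[NR] = $1
      fs[NR] = $2 + 0              # force numeric start
      fe[NR] = $3 + 0              # force numeric end
      fid[NR] = $4
      nfeat = NR
      next
    }
    {                              # second file read: one hit per line
      hs = $2 + 0
      he = $3 + 0
      for (i = 1; i <= nfeat; i++) {
        if ($1 == fchr[i] && hs < fe[i] && fs[i] < he) {
          os = (hs > fs[i]) ? hs : fs[i]   # overlap start = larger start
          oe = (he < fe[i]) ? he : fe[i]   # overlap end = smaller end
          print $1, $2, $3, $4, fid[i], os, oe
        }
      }
    }
  ' "$2" "$1"
}
```

For example, `annotate_hits yourSearchResult.bed annotations.bed` would list, for each hit, the exon and CDS features it touches. With BEDOPS installed, something like `bedmap --echo --echo-map-id yourSearchResult.bed annotations.bed` on inputs sorted with sort-bed should report the overlapping feature IDs much faster.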

10.0 years ago

You can use GMAP with the -f 2 option to output a GFF file.
