How do I map exons (FASTA format) onto the human genome to get genomic coordinates?
7.5 years ago

Hi, I am trying to submit a gene with multiple Sanger-sequenced exons to NCBI, and I would like to get the exact genomic coordinates of my exon sequences and the CDS. Is there a tool out there that does that? I have thousands of samples; only a few records are shown below. Thanks.

>SeqX [organism=Homo sapiens] [isolate=ABC] Stromal Antigen 2 (STAG2) gene, Exon3, Exon4, Exon5, Exon6, Exon7
TCCTTTCCGAATATTTTTGGTGCATTTGTAATAAATGTCATTTNTCTCCTTTTTAAAGGAATTGTCTTAGAAGAAAGAAGGCAAGCCACCATTTTACCCACGTAAATATATGAATATATTTCTGACATTGAGGTGTTCCAGAAGATGATAAAGAAATGATAGCAGCTCCAGAAATACCAACTGATTTTAATCTACTACAGTAAGTAAATTATATTCTGATAATTTTTAAATACTTGTTTATTCCACAAAATGGGGAATGCATTAACTTCAGTTAAATTTCCTTCTGCTCGAGAAGATCTAATATATAAAATAGCTTTTATGCTTTGCAAGAGTTTATATCA
>?unk100
GTTTTGGGGAACATCTTAATTACTTATAATGCTAATATGAAGTTTTGTAATGAGTTAACCAAGCCTTTCTTTTAGAAAATATGGCAAAAATTAGAAACTCAATATAAATTTCTAAGGAAGGGTTTTAATTCTTATCTTTCTGTCACAGGGAGTCAGAAACACATTTTTCTTCTGACACAGATTTTGAAGATATCGAAGGAAAAAACCAAAAGCAAGGCAAAGGCAAAGTATGTATCAAATATTTGACTTTATTTTGTTTCCTAAGATCTCACACACACACAGATTTAAGTTATGTCTCAGATAGTTTTATCTTTTAAAAATGGCTTTTTAAGGGGGTGGGAGCTGATTGGTATGGTA
>?unk100
AAGTGGATGGAATTCTTTAGGGCAAGTTTAAGCATGTTATGTACCCTATCAGCTACTTCTACTGTAGCTGTGTTTTGAACTCTCAAGGATAGTGATATAACTTAACCACCTCGTATTTTTTATGCAGACTTGTAAAAAAGGCAAAAAGGGCCCAGCAGAAAAGGGCAAAGGTGGAAATGGAGGAGGAAAACCTCCTTCTGGTCCAAACCGAATGAATGGTCATCACCAACAGAATGGAGTGGAAAACATGATGTTGTTTGAAGTTGTTAAAATGGGCAAGAGTGCTATGCAGGTAAGATTTATGTTGTTCTTCCCAGTTCATTTGTACATTTTAAACTTTAATGAGTTATATAGAGTGTAGCTCTG
>?unk100
AAGTGACTATTTGAGAGCTGCTGATTTCAAAATAAATATATCTTACCTTTACAGCCTGAACACTGAATAAAAAAGTTGATAAGGTCAAGAAGTGCTATATCTCGGTCATGCTTGTATGATTCTATCCAATCATCTACCACCGACTACAGCAGAGGGAAAAAAATAAAATCATTAGCTTCTTCTAATTTTCTCAAAATCAATTAAGTCTGATAAAGTCATAAAATTCAAGATTATATAGTATCACATTACTTTAATATAAATACTTATACACTGAAATTTAAAGTTCAATTTTAACAATAATAAAATAGAATCGAATTCAGTAAAACAATTATCTGATAACACAAAATGACCTATCAATCTTCTATTTATTTTGCATTGAAAAGAATGTG
>?unk100
TAAGTTATCAAAACACTTAAGGTAGTAAGTTACCTCATCGAATTCTTCAGTCATTTTTCGAATTATCTCAGAGTTCTGCATATGTCTAAACATTTCTGCTGTGACAACTCCTGAAATTTGCAAATGTCAGAAGTTAATATATGGTGTGATAAAAAAATAAAGAAAACTTCCAAGTAAGTCTCTAACACTAAGAAGTCTATGGTCACACAATAAAAGGCATACTTCTTCAACCATCATCTAATAATCTTTACCATGATACTCTAATCTATAAATAAAGCACAAACAAATGCTATCTATTCTCAGTATGCACAAGAAAACAGCCCCATACTTCTGACAGATATCTTTTTTCCTAACACAATTAACTTTGGCCATTTCT

sanger exons genome map sequencing
7.5 years ago

You could use a tool like BLAT to query your sequences against the human genome. This spits out a PSL file you can convert to BED with psl2bed. Once in BED format you can query against gene annotations with bedops or bedmap, etc.

@Alex Reynolds, thanks. Since I have 3000 sequences, could you please elaborate on the command-line syntax to connect to the UCSC BLAT server and run the BLAT step to generate the PSL file?

You can build and install BLAT locally, so that you don't go through their web server. BLAT is part of the Jim Kent tools, and you'll need the 2bit files for your assembly-of-interest. For hg19, at least, UCSC has a prebuilt 2bit file. At minimum, you'd then run something like:

$ blat hg19.2bit yourQuerySeqs.fa yourSearchResult.psl

There are other options depending on how much stringency you need, or if you want to mask regions, etc. To convert to BED:

$ psl2bed < yourSearchResult.psl > yourSearchResult.bed

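
To make the PSL-to-BED step less of a black box, here is a minimal Python sketch of the coordinates that step pulls out of each PSL record. It is not psl2bed itself and does not reproduce its exact output columns. It assumes the standard 21-column PSL layout from a nucleotide (non-translated) BLAT search. The function names and the example record are made up for illustration; the coordinates are placeholders, not a real STAG2 hit.

```python
# Simplified illustration only: psl2bed writes more columns than this.
# Assumes the standard 21-column PSL layout and a single-character strand
# (nucleotide vs. nucleotide search), so tStarts are on the target + strand.

def _fields(line):
    """Split a PSL line; return None for header or malformed lines."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 21 or not f[0].isdigit():
        return None
    return f

def psl_line_to_bed(line):
    """Return (chrom, start, end, query_name, matches, strand) for one hit."""
    f = _fields(line)
    if f is None:
        return None
    # PSL target coordinates are already 0-based, half-open, like BED.
    return (f[13], int(f[15]), int(f[16]), f[9], int(f[0]), f[8])

def psl_blocks(line):
    """Return one (chrom, start, end) tuple per aligned block of the hit."""
    f = _fields(line)
    if f is None:
        return []
    sizes = [int(x) for x in f[18].split(",") if x]   # blockSizes
    starts = [int(x) for x in f[20].split(",") if x]  # tStarts
    return [(f[13], s, s + n) for s, n in zip(starts, sizes)]

# Made-up record: all values below are placeholders.
example_psl = "\t".join([
    "330", "0", "0", "1", "0", "0", "0", "0", "+",
    "SeqX", "331", "0", "331",
    "chrX", "155270560", "1000000", "1000331",
    "1", "331,", "0,", "1000000,",
])

print(psl_line_to_bed(example_psl))
print(psl_blocks(example_psl))
```

A hit that spans an intron would come back from psl_blocks as several blocks, which is usually what you want to compare against exon annotations.
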
Thanks. I have gotten as far as generating the .bed file from the PSL. But I don't see how bedmap or bedops can help annotate my exon sequences with the exon number, the start and end of each exon on the original gene (STAG2), whether it is a CDS, and, if it is, the coordinates of that CDS within the exon.

Basically, you need a BED file containing exon and CDS information. Then you can do set operations on those annotations, i.e. map your results to exons or CDSs. I have an answer to another question (Locating SNP's to genes) which suggests how to get GENCODE annotations; perhaps that might help get you started with your analysis. Good luck!
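
The kind of set operation meant here can be sketched in a few lines of Python. This is only an illustration of the overlap logic, not how bedmap or bedops are implemented. The Interval type, the function name, the labels, and every coordinate are hypothetical; real exon and CDS intervals would come from an annotation file such as GENCODE.

```python
from typing import List, NamedTuple

class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive (BED convention)
    name: str

def overlaps(hit: Interval, annotations: List[Interval]) -> List[Interval]:
    """Return the part of each annotation that the hit covers."""
    found = []
    for ann in annotations:
        if ann.chrom != hit.chrom:
            continue
        start = max(hit.start, ann.start)
        end = min(hit.end, ann.end)
        if start < end:  # non-empty intersection
            found.append(Interval(ann.chrom, start, end, ann.name))
    return found

# Placeholder data: not real STAG2 coordinates.
hit = Interval("chrX", 1000, 1400, "SeqX")
annotations = [
    Interval("chrX", 1100, 1200, "exon3"),
    Interval("chrX", 1120, 1200, "CDS_exon3"),
    Interval("chrX", 2000, 2100, "exon4"),
]

for piece in overlaps(hit, annotations):
    print(piece.name, piece.chrom, piece.start, piece.end)
```

Each returned interval carries the annotation's label, so one pass tells you which exon a sequenced fragment covers and which portion of it is coding. For thousands of hits, the sorted-input tools mentioned above will be much faster than this linear scan.
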

7.5 years ago

You can use GMAP with the -f 2 option to output GFF3.
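
If you go the GFF3 route, the exon and CDS rows can be pulled out with a short script. The sketch below assumes standard nine-column, tab-separated GFF3, where coordinates are 1-based and inclusive, and converts them to BED-style 0-based, half-open intervals. The function name and the example line are made up; the source and attribute values are placeholders rather than actual GMAP output.

```python
def gff3_feature_to_bed(line, wanted=("exon", "CDS")):
    """Return (seqid, start0, end, type, strand) for wanted features, else None."""
    if line.startswith("#") or not line.strip():
        return None  # comment, directive, or blank line
    f = line.rstrip("\n").split("\t")
    if len(f) != 9 or f[2] not in wanted:
        return None
    # GFF3 is 1-based inclusive; BED is 0-based half-open, so only the
    # start shifts by one.
    return (f[0], int(f[3]) - 1, int(f[4]), f[2], f[6])

# Made-up GFF3 line with placeholder values.
example_gff3 = "\t".join([
    "chrX", "example_source", "exon", "1001", "1100", ".", "+", ".",
    "ID=SeqX.exon1",
])

print(gff3_feature_to_bed(example_gff3))
```
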

