I am interested in mapping the latest human genome assembly, GRCh38, to the standard annotation databases ENSEMBL, UCSC and RefSeq.
I already have the GRCh38 sequence in hand. I would like to know which files I need from the above databases to map the complete genome to the corresponding gene IDs and sequences, coding (CDS) IDs and sequences, and protein IDs and sequences.
1. Mapping: Chromosome coordinates ---> Gene ID, Gene Sequence and Gene Name
2. Mapping: Gene ID, Gene Sequence and Gene Name ---> CDS ID & Seq / Exon ID & Seq / Start & End of CDS
3. Mapping: CDS ID & Seq / Exon ID & Seq / Start & End of CDS ---> Protein ID & Seq
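To make the three steps concrete, here is a minimal Python sketch of the kind of lookups I have in mind. Everything in it is made up for illustration: the IDs (GENE0001, CDS0001, PROT0001), the 14-base "chromosome" chrT and the coordinates are hypothetical and do not come from any real database. It also assumes 1-based inclusive coordinates and a plus-strand gene with a single CDS segment, which is far simpler than real annotation.

```python
from itertools import product

# Toy data: hypothetical IDs and a made-up 14-base "chromosome".
CHROM_SEQ = {"chrT": "GGATGGCCTAAGGG"}
GENES = [
    {"gene_id": "GENE0001", "name": "TOY1", "chrom": "chrT",
     "start": 2, "end": 13, "strand": "+"},
]
# gene_id -> list of CDS records (1-based, inclusive coordinates).
CDS = {
    "GENE0001": [
        {"cds_id": "CDS0001", "protein_id": "PROT0001", "start": 3, "end": 11},
    ],
}

# Standard genetic code, codons enumerated in TCAG order.
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product("TCAG", repeat=3), _AMINO)
}


def genes_at(chrom, pos):
    """Step 1: chromosome coordinate -> gene records overlapping it."""
    return [g for g in GENES
            if g["chrom"] == chrom and g["start"] <= pos <= g["end"]]


def subseq(chrom, start, end):
    """Extract a plus-strand sequence for 1-based inclusive coordinates."""
    return CHROM_SEQ[chrom][start - 1:end]


def translate(cds_seq):
    """Step 3: CDS sequence -> protein sequence, stopping at the first stop."""
    protein = []
    for i in range(0, len(cds_seq) - 2, 3):
        aa = CODON_TABLE[cds_seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


gene = genes_at("chrT", 5)[0]                      # step 1
cds = CDS[gene["gene_id"]][0]                      # step 2
cds_seq = subseq(gene["chrom"], cds["start"], cds["end"])
protein_seq = translate(cds_seq)                   # step 3
print(gene["gene_id"], cds["cds_id"], cds["protein_id"], cds_seq, protein_seq)
```

A real version would also need minus-strand genes (reverse complement), CDS split across several exons, and multiple transcripts per gene, which is why I am asking which files carry that information.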
I also want to use dbSNP and COSMIC to identify variations at the level of the protein sequence, exon sequence, gene sequence and chromosome coordinates.
I have already checked the information from ENSEMBL and learned that this is possible with BioMart, for example if I work in R/Bioconductor. However, I would prefer to do it manually and program it locally to get the mapping data mentioned above.
Is there a single source of information, such as a GTF file, from which I can draw the whole mapping? I would also be grateful to know whether there is any cross-reference or correspondence among the three databases (ENSEMBL, UCSC, NCBI) that would help me map genes, CDS and proteins in any of the three.
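For reference, this is roughly how I would expect to read such a file locally. It is only a sketch under assumptions: the input follows the usual 9-column, tab-separated GTF layout, attributes are written as key "value" pairs, and CDS lines carry a protein_id attribute (as they do in ENSEMBL GTF files). The toy lines and all IDs below are hypothetical.

```python
import re

# Hypothetical GTF lines in the usual 9-column, tab-separated layout.
TOY_GTF = "\n".join([
    "\t".join(["chrT", "toy", "gene", "2", "13", ".", "+", ".",
               'gene_id "GENE0001"; gene_name "TOY1";']),
    "\t".join(["chrT", "toy", "transcript", "2", "13", ".", "+", ".",
               'gene_id "GENE0001"; transcript_id "TX0001"; gene_name "TOY1";']),
    "\t".join(["chrT", "toy", "exon", "2", "13", ".", "+", ".",
               'gene_id "GENE0001"; transcript_id "TX0001";']),
    "\t".join(["chrT", "toy", "CDS", "3", "11", ".", "+", "0",
               'gene_id "GENE0001"; transcript_id "TX0001"; '
               'protein_id "PROT0001";']),
])

# Matches key "value" pairs; unquoted attribute values are ignored here.
_ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Turn the 9th GTF column into a dict of attribute -> value."""
    return dict(_ATTR_RE.findall(field))


def build_index(gtf_text):
    """Build gene -> transcript -> exon/CDS/protein mapping from GTF text."""
    genes = {}
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GTF record
        chrom, _src, feature, start, end, _score, strand, _frame, attr = cols
        attrs = parse_attributes(attr)
        gene_id = attrs.get("gene_id")
        if gene_id is None:
            continue
        gene = genes.setdefault(gene_id, {
            "name": None, "chrom": chrom, "strand": strand, "transcripts": {},
        })
        if "gene_name" in attrs:
            gene["name"] = attrs["gene_name"]
        tx_id = attrs.get("transcript_id")
        if tx_id is None:
            continue  # gene-level line, nothing more to record
        tx = gene["transcripts"].setdefault(
            tx_id, {"protein_id": None, "exons": [], "cds": []})
        if feature == "exon":
            tx["exons"].append((int(start), int(end)))
        elif feature == "CDS":
            tx["cds"].append((int(start), int(end)))
            tx["protein_id"] = attrs.get("protein_id", tx["protein_id"])
    return genes


index = build_index(TOY_GTF)
print(index["GENE0001"]["transcripts"]["TX0001"])
```

What I do not know is whether the IDs I would get this way from one database can be linked to the corresponding gene, CDS and protein IDs in the other two.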
More detailed suggestions would be appreciated. Thanks in advance for your response.