I need to build a file containing the coordinates of all human ribosomal RNAs in the NCBI GRCh37.p13 assembly. I have downloaded the NCBI 'ref_GRCh37.p13_scaffolds.gff3.gz' file from this link: ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/ARCHIVE/ANNOTATION_RELEASE.105/GFF/
This GFF3 file does include coordinates for all rRNAs. However, the coordinates are given per contig rather than per chromosome. Here are a few example lines:
NT_077402.2 RefSeq region 1 257719 . + . ID=id0;Name=1;Dbxref=taxon:9606;chromosome=1;gbkey=Src;genome=genomic;mol_type=genomic DNA
NT_077402.2 BestRefSeq gene 1874 4409 . + . ID=gene0;Name=DDX11L1;Dbxref=GeneID:100287102,HGNC:37102;description=DEAD%2FH %28Asp-Glu-Ala-Asp%2FHis%29 box helicase 11 like 1;gbkey=Gene;gene=DDX11L1;part=1%2F1;pseudo=true
NT_077402.2 BestRefSeq transcript 1874 4409 . + . ID=rna0;Name=NR_046018.2;Parent=gene0;Dbxref=GeneID:100287102,Genbank:NR_046018.2,HGNC:37102;gbkey=misc_RNA;gene=DDX11L1;product=DEAD%2FH %28Asp-Glu-Ala-Asp%2FHis%29 box helicase 11 like 1;transcript_id=NR_046018.2
Instead of 'chr1', 'chr2', etc., the 1st column holds contig accessions, and I assume the start/end positions in the 4th/5th columns are likewise relative to the contig rather than absolute positions on a chromosome. Can someone advise how to convert these contig positions to chromosome positions? Alternatively, is ref_GRCh37.p13_scaffolds.gff3 the wrong file to use, and is there a similar file with chromosome positions available on the NCBI FTP site?
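To make the conversion I have in mind concrete, here is a rough Python sketch. It assumes I had a table giving, for each contig, the chromosome it belongs to and the 1-based position on that chromosome where the contig starts, and that every contig is placed in forward orientation. The table `PLACEMENTS`, its values, and the function name `contig_to_chrom` are all hypothetical placeholders for illustration; where to get the real placements is part of what I am asking.

```python
# Hypothetical placement table: contig accession -> (chromosome name,
# 1-based start of the contig on that chromosome). The values are
# placeholders for illustration only, not read from any NCBI file.
PLACEMENTS = {
    "NT_077402.2": ("chr1", 10001),
}


def contig_to_chrom(line, placements):
    """Shift one tab-separated GFF3 feature line from contig to chromosome coordinates.

    Assumes the contig lies on the forward strand of the chromosome, so
    only an offset is applied. Comment lines, malformed lines, and lines
    whose contig is not in the table are returned unchanged.
    """
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return line
    cols = line.split("\t")
    if len(cols) != 9 or cols[0] not in placements:
        return line
    chrom, contig_start = placements[cols[0]]
    # Both coordinate systems are 1-based and inclusive, hence the "- 1".
    cols[3] = str(contig_start + int(cols[3]) - 1)
    cols[4] = str(contig_start + int(cols[4]) - 1)
    cols[0] = chrom
    return "\t".join(cols)


if __name__ == "__main__":
    example = "NT_077402.2\tBestRefSeq\tgene\t1874\t4409\t.\t+\t.\tID=gene0;Name=DDX11L1"
    print(contig_to_chrom(example, PLACEMENTS))
```

A contig placed in reverse orientation would also need its length and a strand flip, which this sketch does not handle.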