Question: How to map NCBI contig positions to chromosome positions
Jackie70 (United States) wrote, 2.1 years ago:

I need to make a file containing the coordinates of all ribosomal RNAs in the human GRCh37.p13 assembly (NCBI annotation). I downloaded the NCBI 'ref_GRCh37.p13_scaffolds.gff3.gz' file from ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/ARCHIVE/ANNOTATION_RELEASE.105/GFF/

This GFF3 file does include coordinates for all rRNAs. However, the coordinates are given on contigs rather than chromosomes. A few example lines:

NT_077402.2     RefSeq  region  1       257719  .       +       .       ID=id0;Name=1;Dbxref=taxon:9606;chromosome=1;gbkey=Src;genome=genomic;mol_type=genomic DNA
NT_077402.2     BestRefSeq      gene    1874    4409    .       +       .       ID=gene0;Name=DDX11L1;Dbxref=GeneID:100287102,HGNC:37102;description=DEAD%2FH %28Asp-Glu-Ala-Asp%2FHis%29 box helicase 11 like 1;gbkey=Gene;gene=DDX11L1;part=1%2F1;pseudo=true
NT_077402.2     BestRefSeq      transcript      1874    4409    .       +       .       ID=rna0;Name=NR_046018.2;Parent=gene0;Dbxref=GeneID:100287102,Genbank:NR_046018.2,HGNC:37102;gbkey=misc_RNA;gene=DDX11L1;product=DEAD%2FH %28Asp-Glu-Ala-Asp%2FHis%29 box helicase 11 like 1;transcript_id=NR_046018.2

Instead of 'chr1', 'chr2', etc. in the 1st column, the file uses contig accessions, and I assume the start/end positions in the 4th/5th columns are also relative to the contigs rather than absolute positions on a chromosome. Can someone advise how to convert these contig positions to chromosome positions? Or is ref_GRCh37.p13_scaffolds.gff3 the wrong file to use, and is there a similar file with chromosome positions available on the NCBI FTP site?

Thanks,

tdmurphy160 wrote, 13 months ago:

Try using this file instead: ftp://ftp.ncbi.nih.gov/genomes/Homo_sapiens/ARCHIVE/ANNOTATION_RELEASE.105/GFF/ref_GRCh37.p13_top_level.gff3.gz

Or you might want this file, with a more recent re-annotation of the GRCh37 assembly: ftp://ftp.ncbi.nih.gov/genomes/Homo_sapiens/GRCh37.p13_interim_annotation/interim_GRCh37.p13_top_level_2017-01-13.gff3.gz

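Then use the mappings in the assembly_report file to convert the RefSeq accessions to the naming format you'd like to use: ftp://ftp.ncbi.nih.gov/genomes/refseq/vertebrate_mammalian/Homo_sapiens/reference/GCF_000001405.37_GRCh38.p11/GCF_000001405.37_GRCh38.p11_assembly_report.txt

Note that the report linked above is for GRCh38.p11. Accession versions differ between assemblies (for example, chromosome 1 is NC_000001.10 in GRCh37 and NC_000001.11 in GRCh38), so use the assembly_report that matches the assembly you are annotating.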
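For anyone who wants to script the renaming step, here is a minimal Python sketch. It assumes the assembly_report is tab-separated with the RefSeq accession in the 7th column and the UCSC-style name in the 10th column, and that the top_level GFF3 already carries chromosome coordinates on NC_ accessions, so only the first column needs rewriting. The function names `load_accession_map` and `rename_seqids` are made up for illustration, and the optional `feature_type` filter (e.g. "rRNA") assumes the feature type is in the 3rd GFF3 column.

```python
def load_accession_map(report_lines):
    """Build {RefSeq accession: UCSC-style name} from assembly_report lines.

    Assumes the standard tab-separated layout: column 7 is RefSeq-Accn and
    column 10 is UCSC-style-name. Rows with "na" in either column are skipped.
    """
    mapping = {}
    for line in report_lines:
        if line.startswith("#") or not line.strip():
            continue  # header/comment or blank line
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 10:
            continue  # unexpected row, ignore it
        refseq_accn, ucsc_name = cols[6], cols[9]
        if refseq_accn != "na" and ucsc_name != "na":
            mapping[refseq_accn] = ucsc_name
    return mapping


def rename_seqids(gff_lines, mapping, feature_type=None):
    """Yield GFF3 lines with column 1 replaced via `mapping`.

    Comment/pragma lines are passed through unchanged (so any
    ##sequence-region lines keep their original accessions).
    Accessions missing from `mapping` are left as they are.
    If `feature_type` is given, only features of that type are kept.
    """
    for line in gff_lines:
        if line.startswith("#"):
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature line
        if feature_type is not None and cols[2] != feature_type:
            continue
        cols[0] = mapping.get(cols[0], cols[0])
        yield "\t".join(cols) + "\n"
```

To run it on the real files, open the report as text and the GFF3 with `gzip.open(path, "rt")`, pass the file objects to the two functions, and write the yielded lines to an output file. Coordinates are not touched, so this only gives chromosome positions when the input file is already on chromosome-level sequences.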

madzayasodara10 (UCLA) wrote, 15 months ago:

Hey, I'm having the same issue. Did you find a solution? Thanks.
