Question: How to obtain the chromosome out of an accession number?
eidriangm wrote (17 months ago):

Hello Community.

My problem is the following: I have some BED files whose genomic regions are annotated by chromosome name (chr__ start end ... ...), and I want to use the NCBI GFF3 to extract the info, but that file is annotated with accession.version numbers. bedtools requires both files to use the same sequence nomenclature, so I need to convert the accessions to chr-style names.
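The renaming step described here can be sketched in Python as follows. This is only an illustration under stated assumptions: `rename_seqid` is a hypothetical helper name, the mapping dictionary has to be supplied by the caller, and the feature line is a made-up toy record rather than real annotation.

```python
def rename_seqid(gff3_line, mapping):
    """Replace the seqid (column 1) of a GFF3 feature line using `mapping`.

    Comment/directive lines and blank lines are returned unchanged, and so
    are feature lines whose seqid is not in the mapping.
    """
    if gff3_line.startswith("#") or not gff3_line.strip():
        return gff3_line
    cols = gff3_line.split("\t")
    cols[0] = mapping.get(cols[0], cols[0])
    return "\t".join(cols)


# Assumed mapping; the two entries are taken from the NC_ examples in this question.
mapping = {"NC_000001.11": "chr1", "NC_000023.11": "chrX"}

# Toy feature line (not real data), tab-separated as GFF3 requires.
toy = "NC_000001.11\tsrc\tgene\t100\t200\t.\t+\t.\tID=toy\n"
renamed = rename_seqid(toy, mapping)
```

Lines starting with `#` (including `##sequence-region` directives) are passed through untouched in this sketch, so they would still carry the accession names.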

So far I know that, for accessions with the "NC_" prefix, the number specifies the chromosome (e.g. NC_000001.11: chr1, NC_000002.12: chr2, ..., NC_000023.11: chrX, NC_000024.10: chrY, NC_012920.1: chrM). But how can I find out which chromosome an accession prefixed with NW_ or NT_ belongs to?
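For reference, the NC_ rule listed above can be written as a small Python function. This is a minimal sketch that assumes human RefSeq accessions only; `nc_to_chr` is a hypothetical name, and anything that does not fit the pattern (including every NT_/NW_ accession) simply returns `None`.

```python
import re

# Special cases from the list above: 23 -> X, 24 -> Y, NC_012920 -> mitochondrion.
_SEX_CHROMOSOMES = {23: "chrX", 24: "chrY"}
_MITOCHONDRION = "NC_012920"


def nc_to_chr(accession):
    """Return the chr-style name for a human NC_ accession, or None if the rule does not apply."""
    base = accession.split(".")[0]  # drop the ".version" suffix
    if base == _MITOCHONDRION:
        return "chrM"
    match = re.fullmatch(r"NC_0000(\d{2})", base)
    if match is None:
        return None
    number = int(match.group(1))
    if not 1 <= number <= 24:
        return None
    return _SEX_CHROMOSOMES.get(number, f"chr{number}")
```

The version suffix is ignored on purpose, so the same function covers different assembly patches, but it says nothing about scaffolds outside the NC_ series.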

Some NT_/NW_ accessions are alternative assemblies of an NC_ sequence; they contain "the same" information and are listed a few lines below that NC_ record. Others are not, and they contain genes of interest that I could be losing when using bedtools, e.g. https://www.ncbi.nlm.nih.gov/gene/3806. Some of them have no known location, yet that gene is known to be on chromosome 19, and I cannot deduce this from its accession number.

Is there a way of getting the chromosome from the accession number? Or should I extract the info from another annotation file?

Thanks

modified 3 months ago by Solowars • written 17 months ago by eidriangm

Have you tried the ways of linking chromosomes to accession numbers mentioned in this post: "How to get the chromosome numbers from RefSeq accession IDs"?

modified 17 months ago • written 17 months ago by Sej Modha

I saw it, but none of the links provided there work, and the awk + sed answer only applies to NC_ accessions (already under control). Thanks anyway.

written 17 months ago by eidriangm

You may want to give some example data and the expected output.

written 17 months ago by cpad0112

Well, that is already given in the question: the gene with Entrez ID 3806 is annotated on accession NT_113949, and I want to obtain its chromosome, which is 19. I could look for more examples, but the idea is basically that: given an accession number prefixed with NT_ or NW_, obtain its chromosome if it is known.

modified 17 months ago • written 17 months ago by eidriangm

http://gtamazian.blogspot.com/2013/08/converting-chromosome-accession-numbers.html

modified 3 months ago • written 3 months ago by srijan.verma44

ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/Assembled_chromosomes/

modified 3 months ago • written 3 months ago by srijan.verma44

srijan.verma44 wrote (3 months ago):

You can find the chromosomes for the alternative accession numbers (NT_... / NW_...) in this directory.
Download the following files:
1. alts_accessions_GRCh38.p12
2. chr_NC_gi
3. chr_accessions_GRCh38.p12
4. unplaced_accessions_GRCh38.p12
5. unlocalized_accessions_GRCh38.p12

Once you download them, you might be prompted for a 'Keychain Access' password. The workaround I found is to convert the downloaded file to '.txt' format, after which you can view what is inside it.

An extract from the file is given below:

Chromosome  RefSeq Accession.version
1           NW_012132914.1
1           NW_015495298.1
9           NW_009646201.1
10          NW_011332692.1
11          NW_015148966.1

Reference: This article.
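A lookup table can be built from rows like those in the extract above. The following Python sketch assumes each row starts with the chromosome followed by the RefSeq accession.version, separated by whitespace, and that header or comment lines can be skipped; `load_accession_map` is a hypothetical helper, and any further columns in the real files are ignored.

```python
import io


def load_accession_map(handle):
    """Read 'chromosome accession.version ...' rows into {accession: 'chr<chromosome>'}."""
    mapping = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or commented header
        fields = line.split()
        if len(fields) < 2 or fields[0].lower() == "chromosome":
            continue  # malformed row or an uncommented header row
        chromosome, accession = fields[0], fields[1]
        mapping[accession] = "chr" + chromosome
    return mapping


# Rows copied from the extract above; a real file would be opened with open(...) instead.
extract = io.StringIO(
    "Chromosome RefSeq Accession.version\n"
    "1 NW_012132914.1\n"
    "1 NW_015495298.1\n"
    "9 NW_009646201.1\n"
    "10 NW_011332692.1\n"
    "11 NW_015148966.1\n"
)
chrom_of = load_accession_map(extract)
print(chrom_of["NW_009646201.1"])  # chr9
```

Note that this only recovers the chromosome label. Coordinates annotated on an NT_/NW_ scaffold are relative to that scaffold, not to the chromosome, so relabelling such features as plain chrN before a bedtools intersect would compare positions in different coordinate systems.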

modified 3 months ago • written 3 months ago by srijan.verma44

Solowars wrote (3 months ago):

Perhaps you could do it in R, using the rentrez package. Take a look here.

I'm doing something similar, and it is possible to pass those identifiers to the entrez_summary function and ask for a summary. The chromosome number/name should appear in that summary.

Let me know if you need some more help.

Cheers,

written 3 months ago by Solowars