Journey from gene id to gene sequence
14 months ago
Shweta • 0

How can I download the gene sequences for a list of 2500 gene IDs?

NCBI Gene-id • 656 views

Don't take this the wrong way, but you have been posting single-line questions for some time. This suggests that you are not putting enough effort or thought into the question at hand.

Based on the tags you added, it appears that you want to get this information from NCBI, but simply saying "gene id" does not tell us which IDs you are working with. Unless that information is included, it is difficult for people to provide answers. Please edit the original question and add some examples. Tell us what you have tried so far.

14 months ago
Dave Carlson ★ 1.7k

If you have Ensembl IDs, you could consider using gget:

https://github.com/pachterlab/gget

14 months ago
MirianT_NCBI ▴ 720

Hi Shweta,
If you are referring to NCBI Gene IDs, you can use NCBI Datasets for that task. To download only gene sequences, you can use the following command:

datasets download gene gene-id --inputfile mylist.txt --include gene

This command will download a zip archive (ncbi_dataset.zip) containing the gene sequences for the gene IDs in your list (mylist.txt in the example), plus metadata about the genes in a JSON Lines file (data_report.jsonl). Unzipping into a new folder will produce this result:

unzip ncbi_dataset.zip -d mygenes
Archive:  ncbi_dataset.zip
  inflating: mygenes/README.md       
  inflating: mygenes/ncbi_dataset/data/gene.fna  
  inflating: mygenes/ncbi_dataset/data/data_report.jsonl  
  inflating: mygenes/ncbi_dataset/data/dataset_catalog.json
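With a list of 2500 IDs, a quick sanity check on the input list and on the downloaded FASTA may save some confusion. The sketch below is not part of NCBI Datasets: `clean_gene_ids` and `count_fasta_records` are hypothetical helper names. It assumes that mylist.txt holds one numeric NCBI Gene ID per line and that every record in gene.fna starts with a `>` header line.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the NCBI Datasets CLI.

# Print the unique numeric IDs from a list file, one per line.
# Removes spaces, tabs and Windows carriage returns, then drops
# blank and non-numeric lines.
clean_gene_ids() {
  tr -d ' \t\r' < "$1" | grep -E '^[0-9]+$' | sort -n -u
}

# Count the FASTA records (lines starting with ">") in a file.
count_fasta_records() {
  awk '/^>/ { n++ } END { print n + 0 }' "$1"
}
```

For example, `clean_gene_ids mylist.txt > mylist.clean.txt` writes a cleaned list, and `count_fasta_records mygenes/ncbi_dataset/data/gene.fna` reports how many sequences were returned. If that number differs from the number of cleaned IDs, it is probably worth checking which IDs are missing before going further.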

One potential issue is that all genes will be in the same FASTA file (gene.fna). If you want each gene as a separate FASTA, you can loop over the list and download each gene-id as its own data package:

# Read one gene ID per line and download each as its own zip archive
while read -r GENEID; do
  datasets download gene gene-id "${GENEID}" --include gene --filename "${GENEID}.zip"
done < mylist.txt
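If 2500 separate downloads turn out to be slow, an alternative is to split the combined gene.fna locally. This is a minimal sketch and not a feature of NCBI Datasets: `split_fasta` is a hypothetical helper name, and it assumes plain FASTA input in which the first word of each header is a usable identifier.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the NCBI Datasets CLI.

# Split a multi-record FASTA file into one file per record.
# Usage: split_fasta <input.fna> <output_dir>
# Each output file is named after the first word of the header line,
# with any character outside [A-Za-z0-9._-] replaced by "_".
# Files are opened in append mode, so start with an empty output directory.
split_fasta() {
  local input="$1" outdir="$2"
  mkdir -p "$outdir"
  awk -v outdir="$outdir" '
    /^>/ {
      if (out != "") close(out)          # keep the number of open files low
      name = substr($1, 2)               # first word of the header, minus ">"
      gsub(/[^A-Za-z0-9._-]/, "_", name) # make it safe as a file name
      out = outdir "/" name ".fna"
    }
    out != "" { print >> out }           # write header and sequence lines
  ' "$input"
}
```

Running `split_fasta mygenes/ncbi_dataset/data/gene.fna per_gene` would then write one .fna file per record into a per_gene folder. Records that share the same first header word end up in the same file, so check the headers if you expect exactly one file per gene.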

I hope it helps!
