Storing genomes locally
9.3 years ago
moranr ▴ 290

Hi,

I will be working with the full set of CDS for the genomes of many species. I want to store information about each transcript, along with other information about the sequences. What would be the best way to set up this database so that working with the data is efficient? Would SQL be the way to go?

Thanks

Genomes Databases SQL Python • 2.3k views
9.3 years ago
Tariq Daouda ▴ 220

Hi,

If you work with Ensembl, you might want to have a look at pyGeno. It will store the genomes for you, along with all the annotation information from the GTF (IDs, names, biotypes, ...), in an SQL database and index it as well. You might have to create some datawraps yourself, but that won't take more than a few minutes.

Best


I'll look into this further, thank you.


Here is how you create a new datawrap. It basically means putting all the URLs from which the files should be downloaded into one file and compressing it into a tar.gz archive. Let me know if you encounter any issues.


Great, thank you very much.

9.3 years ago

If you just want to store information about transcripts and be able to access that easily then an SQL database would be a simple enough solution (at least if you're already familiar with SQL). Note that I wouldn't try to store the genome in one (you could, but you're better off using an indexed file).
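To make that concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, column names, transcript IDs and file paths are all made up for illustration; the idea is only that the database holds the per-transcript metadata plus a pointer to the sequence file, while the sequences themselves stay in the (indexed) FASTA files.

```python
import sqlite3


def create_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    # Hypothetical schema: adapt the columns to whatever you need to store.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS transcripts (
            transcript_id TEXT PRIMARY KEY,
            species       TEXT NOT NULL,
            gene_name     TEXT,
            biotype       TEXT,
            cds_length    INTEGER,
            fasta_path    TEXT   -- where the sequence lives, not the sequence
        )
        """
    )
    # An index on the columns you filter by keeps lookups fast.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_species_gene "
        "ON transcripts (species, gene_name)"
    )
    return conn


def add_transcripts(conn, rows):
    """Insert many rows in a single transaction."""
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO transcripts VALUES (?, ?, ?, ?, ?, ?)",
            rows,
        )


def transcripts_for_gene(conn, species, gene_name):
    """Return (transcript_id, cds_length, fasta_path) for one gene."""
    cur = conn.execute(
        "SELECT transcript_id, cds_length, fasta_path FROM transcripts "
        "WHERE species = ? AND gene_name = ? ORDER BY transcript_id",
        (species, gene_name),
    )
    return cur.fetchall()


# Made-up example records, not real identifiers.
conn = create_db()
add_transcripts(
    conn,
    [
        ("TX0001", "species_a", "geneX", "protein_coding", 1200, "cds/species_a.fa"),
        ("TX0002", "species_a", "geneX", "protein_coding", 900, "cds/species_a.fa"),
        ("TX0003", "species_b", "geneX", "protein_coding", 1180, "cds/species_b.fa"),
    ],
)
print(transcripts_for_gene(conn, "species_a", "geneX"))
```

Passing a file name instead of ":memory:" to create_db keeps the database on disk between runs.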


Great advice, thank you.
