Question: How to keep my data and analysis up to date across new genome releases
Sheila (Germany) wrote, 4.8 years ago:

What are the best methods for keeping a local copy of a biological database up to date?

I used a genome from NCBI that was distributed as a large .gz file. Now a newer release is available, also in .gz format. Every time the database changes, I have to download it completely and reanalyze the data. I have the same problem with some other databases. Is there a way to get only the information that has been updated?

 

Tags: pdb, database, uniprotkb, ncbi • 1.1k views
modified 4.8 years ago by Elisabeth Gasteiger • written 4.8 years ago by Sheila
t.candelli (France) wrote, 4.8 years ago:

It depends on which kind of analysis you do. If your outputs are simple data representations such as .bed or .bedgraph files, the liftOver tool from UCSC can help keep everything up to date. Generating the chain files may take more work if you work with unusual organisms. This method works for almost any form of feature-based annotation (or even .wig files); the only requirement is to convert it into .bed or .bedgraph first.
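For .wig files, the conversion step mentioned above can be sketched as follows. This is a minimal sketch assuming the UCSC fixedStep/variableStep wiggle layout; the function names `wig_to_bedgraph` and `to_bedgraph_line` are hypothetical, not part of any UCSC tool. The key detail is the coordinate shift: wiggle positions are 1-based, while bedGraph intervals are 0-based and half-open.

```python
def wig_to_bedgraph(lines):
    """Convert fixedStep/variableStep WIG lines into bedGraph records.

    WIG positions are 1-based; bedGraph intervals are 0-based, half-open,
    so a WIG value at position p with span s becomes [p - 1, p - 1 + s).
    """
    records = []
    mode, chrom, pos, step, span = None, None, 0, 0, 1
    for raw in lines:
        line = raw.strip()
        # skip blank, comment, track and browser lines
        if not line or line.startswith(("#", "track", "browser")):
            continue
        if line.startswith(("fixedStep", "variableStep")):
            fields = line.split()
            mode = fields[0]
            params = dict(item.split("=", 1) for item in fields[1:])
            chrom = params["chrom"]
            span = int(params.get("span", 1))
            if mode == "fixedStep":
                pos = int(params["start"])
                step = int(params["step"])
            continue
        if mode == "fixedStep":
            # one value per line; the position advances by `step`
            records.append((chrom, pos - 1, pos - 1 + span, float(line)))
            pos += step
        elif mode == "variableStep":
            # "position value" pairs
            start, value = line.split()
            records.append((chrom, int(start) - 1, int(start) - 1 + span, float(value)))
        else:
            raise ValueError(f"data line before any step declaration: {line!r}")
    return records


def to_bedgraph_line(record):
    """Format one (chrom, start, end, value) record as a tab-separated bedGraph line."""
    chrom, start, end, value = record
    return f"{chrom}\t{start}\t{end}\t{value}"
```

The resulting bedGraph lines can be written to a file and passed to liftOver together with the appropriate chain file.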

A brief word on how liftOver works: the program converts a set of genomic coordinates (with their associated values, if present) between different assemblies of the same organism. To do this, liftOver requires a special "chain file", which describes the differences between the two assemblies as a series of aligned blocks separated by gaps.
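To make the role of the chain file concrete, here is a simplified sketch of how a single position can be carried through a chain. It assumes the UCSC chain format (a `chain` header line with target/query names, strands and start coordinates, followed by `size dt dq` block lines and a final `size` line) and handles only plus-strand chains. The names `parse_chains` and `lift_position` are hypothetical. Real liftOver does considerably more (interval mapping, minimum-match thresholds, choosing among overlapping chains, minus-strand chains), so this is only an illustration of the idea, not its algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class Chain:
    t_name: str   # chromosome on the old assembly (target)
    t_start: int  # 0-based start on the target
    q_name: str   # chromosome on the new assembly (query)
    q_strand: str
    q_start: int  # 0-based start on the query
    # (size, dt, dq) per aligned block; the last block has dt = dq = 0
    blocks: list = field(default_factory=list)


def parse_chains(text):
    """Parse UCSC chain-format text into Chain objects."""
    chains, current = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        if fields[0] == "chain":
            # chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
            current = Chain(t_name=fields[2], t_start=int(fields[5]),
                            q_name=fields[7], q_strand=fields[9],
                            q_start=int(fields[10]))
            chains.append(current)
        else:
            size = int(fields[0])
            dt = int(fields[1]) if len(fields) > 1 else 0
            dq = int(fields[2]) if len(fields) > 2 else 0
            current.blocks.append((size, dt, dq))
    return chains


def lift_position(chains, chrom, pos):
    """Map a 0-based position on the old assembly to the new one, or return None."""
    for chain in chains:
        if chain.t_name != chrom or chain.q_strand != "+":
            continue  # simplification: minus-strand chains are skipped
        t, q = chain.t_start, chain.q_start
        for size, dt, dq in chain.blocks:
            if t <= pos < t + size:
                return chain.q_name, q + (pos - t)
            # step over the block and the gap that follows it
            t += size + dt
            q += size + dq
    return None  # the position falls in a gap or outside every chain
```

Positions that fall in a gap between blocks have no counterpart in the other assembly, which is why liftOver reports some features as unmapped.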

Pre-computed chain files for many different organisms, along with liftOver itself, can be downloaded here.

 

modified 4.8 years ago • written 4.8 years ago by t.candelli

Thanks. I am interested in .gz files or any flat-file data. Is there a tool or method for downloading only the changes?

written 4.8 years ago by Sheila

No, it doesn't look like UCSC publishes diffs. It is probably easier for them to publish compressed files. You may need to download the entire file.

written 4.8 years ago by Alex Reynolds
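Even when the whole file has to be downloaded again, the reanalysis can often be limited to the records that actually changed. A minimal sketch, assuming the release is a (optionally gzipped) FASTA file: fingerprint each record's sequence with SHA-256 and compare the fingerprints of the old and new releases. The function names are hypothetical, and other flat-file formats would need their own record parser.

```python
import gzip
import hashlib


def _digest(chunks):
    """SHA-256 of a sequence assembled from its FASTA lines."""
    return hashlib.sha256("".join(chunks).encode()).hexdigest()


def fasta_digests(path):
    """Return {record_id: sequence digest} for a FASTA file (.gz or plain)."""
    digests = {}
    record_id, chunks = None, []
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.rstrip()
            if line.startswith(">"):
                if record_id is not None:
                    digests[record_id] = _digest(chunks)
                # use the first word of the header as the record ID
                record_id = line[1:].split()[0]
                chunks = []
            elif line:
                chunks.append(line)
    if record_id is not None:
        digests[record_id] = _digest(chunks)
    return digests


def compare_releases(old, new):
    """Return (added, removed, changed) record IDs between two digest maps."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, removed, changed
```

Only the added and changed IDs then need to go through the analysis again, and results for removed IDs can be dropped. Headers are ignored here, so a record whose description changed but whose sequence did not is treated as unchanged.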

UCSC does publish a number of "diffs" (in this context, chain files) between different assemblies of various organisms here. I will update the answer to be more detailed.

written 4.8 years ago by t.candelli
Elisabeth Gasteiger (Geneva) wrote, 4.8 years ago:

For UniProtKB, you could use the advanced date-based query fields programmatically (cf. http://www.uniprot.org/faq/28 and http://www.uniprot.org/help/query-fields), e.g.:

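# reviewed entries created in the current release
lwp-mirror "http://www.uniprot.org/uniprot/?query=reviewed:yes+AND+created:[current+TO+current]&format=txt" new_seq.dat

# reviewed entries whose sequence was modified in the current release
lwp-mirror "http://www.uniprot.org/uniprot/?query=reviewed:yes+AND+sequence_modified:[current+TO+current]&format=txt" upd_seq.dat

# reviewed entries modified (e.g. annotation updates) in the current release
lwp-mirror "http://www.uniprot.org/uniprot/?query=reviewed:yes+AND+modified:[current+TO+current]&format=txt" upd_ann.dat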
modified 4.8 years ago • written 4.8 years ago by Elisabeth Gasteiger
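The same three downloads can be scripted in Python. This is a sketch that reuses the exact URLs from the commands above; UniProt's web endpoints and query syntax may have changed since this post, so check the current UniProt documentation before relying on them. The helper names are hypothetical. Like lwp-mirror, `mirror` sends an `If-Modified-Since` header based on the local file's modification time and keeps the local copy when the server answers 304 Not Modified; whether the server honours that header for query results is not guaranteed.

```python
import email.utils
import os
import urllib.error
import urllib.parse
import urllib.request

# endpoint as given in the post; it may be outdated
BASE_URL = "http://www.uniprot.org/uniprot/"

# date field used in the query -> local output file, as in the commands above
QUERIES = {
    "created": "new_seq.dat",
    "sequence_modified": "upd_seq.dat",
    "modified": "upd_ann.dat",
}


def build_query_url(date_field):
    """Build the query URL for reviewed entries matching `date_field` in the current release."""
    query = f"reviewed:yes AND {date_field}:[current TO current]"
    # keep ':' and '[]' literal so the URL matches the lwp-mirror commands
    return BASE_URL + "?" + urllib.parse.urlencode(
        {"query": query, "format": "txt"}, safe=":[]")


def mirror(url, path):
    """Download url to path unless the server reports it unchanged; return True if downloaded."""
    request = urllib.request.Request(url)
    if os.path.exists(path):
        mtime = os.path.getmtime(path)
        request.add_header("If-Modified-Since", email.utils.formatdate(mtime, usegmt=True))
    try:
        with urllib.request.urlopen(request) as response:
            tmp_path = path + ".part"
            with open(tmp_path, "wb") as out:
                out.write(response.read())
        os.replace(tmp_path, path)  # only replace the old copy after a full download
        return True
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: keep the local copy
            return False
        raise
```

Usage: `for field, path in QUERIES.items(): mirror(build_query_url(field), path)`.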
Powered by Biostar version 2.3.0