I am trying to build a local copy of PubMed. I downloaded all of the PubMed XML files provided by NCBI. I searched to see whether anyone had done this before and found a good paper, "Tools for loading MEDLINE into a local relational database", along with many other sources that discuss it. Another source I would like to mention is http://biostar.stackexchange.com/questions/10049/how-do-people-go-about-pubmed-text-mining.
I have parsed the XML files into flat files. I decided to load a sample of the data into MySQL, try some queries, and see how it works.
Here is what I want from the local copy of PubMed:
-I have a dictionary of terms that I want to search for in PubMed, retrieving the matching abstracts.
To achieve this, I am trying to load the data into MySQL and use full-text search (http://dev.mysql.com/doc/refman/5.1/en/fulltext-search.html) to query the database. I think full-text indexing is a good option because I will be searching the text in the abstract column, and I expect it to make the task easier.
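To make the setup concrete, here is a minimal sketch of what I have in mind. It is written in Python purely for illustration. The table name `articles`, the columns `pmid`, `title` and `abstract`, and the helper `search_abstracts` are my own placeholders, not anything defined by NCBI. It assumes a MyISAM table (the engine that supports FULLTEXT indexes in MySQL 5.1) and a DB-API driver that uses `%s` placeholders.

```python
# Hypothetical schema: table and column names are placeholders.
CREATE_TABLE_SQL = """
CREATE TABLE articles (
    pmid     INT UNSIGNED NOT NULL PRIMARY KEY,
    title    TEXT,
    abstract TEXT,
    FULLTEXT (abstract)
) ENGINE=MyISAM
"""

# Natural-language full-text search on the abstract column.
SEARCH_SQL = (
    "SELECT pmid, abstract FROM articles "
    "WHERE MATCH(abstract) AGAINST (%s)"
)


def search_abstracts(cursor, term):
    """Run one full-text search through a DB-API cursor and return all rows.

    `cursor` is assumed to come from a MySQL driver whose paramstyle is
    `%s`; the term is passed as a bound parameter, not pasted into the SQL.
    """
    cursor.execute(SEARCH_SQL, (term,))
    return cursor.fetchall()
```

With a real connection this would be called as `search_abstracts(conn.cursor(), "insulin")`, once per term or per combined query.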
Now my concerns are:
--Is the approach I am following the right one?
--I have read some reviews about full-text indexing and the computation time it requires. I am worried about how it will perform on 18 million PubMed records.
--Is there a better way than this, or am I on the right track?
--Can I also include another parser to help with natural language processing of the abstracts after querying the database? (I know I can, but is it better to include it now or later?)
--If full-text indexing is a good idea, when should I build the index: while I populate the database or after I have populated it?
--My major concern is how to query using a dictionary of terms. I have two separate dictionaries and I want to use both of them, perhaps combined with a Boolean operator. I tried using eutils with Perl to get the abstracts, but because the dictionaries contain thousands of terms, retrieving the abstracts takes a long time. I know how to query PubMed using Perl and eutils, but how can I do it once I have a local copy of the database? Can I do it using Perl?
I hope I have stated my question clearly. Let me know if I haven't and I will try to improve it. Any help is greatly appreciated. Thank you.