I am new to text mining in bioinformatics. I am learning to use the pubmed.mineR package to mine a multi-abstract text file. I have read a lot about the RISmed and easyPubMed R packages, but both seem to have drawbacks when their output has to be used in pubmed.mineR.
pubmed.mineR takes a text file containing multiple abstracts in PubMed's "abstract" format (one of several PubMed formats, XML being another). easyPubMed has a function that can produce this format, while RISmed apparently does not. On the other hand, easyPubMed can retrieve only 5000 abstracts at a time, while RISmed has no such upper limit.
I randomly chose a cancer type, "oral cancer", which has more than 100,000 (1 lakh) PMIDs. RISmed successfully retrieved the abstracts (1.1 GB of data), but its output is incompatible with pubmed.mineR. easyPubMed, on the other hand, produces compatible output but has a retrieval limit because it uses the PubMed API at the backend.
Is there a way to retrieve all ~100,000 abstracts in "abstract" format from PubMed using the command line, given that since its June 2020 update the website itself limits downloads to 10,000 abstracts at a time? Here is the short code I used with easyPubMed:
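library("easyPubMed")
# Search PubMed and check how many records match
search_topic <- 'oral cancer'
my_entrez_id <- get_pubmed_ids(search_topic)
my_entrez_id$Count
# Open the help page for the fetch function
?fetch_pubmed_data
# Request every record in "abstract" format (this is where the 5000 limit applies)
my_abstracts_txt <- fetch_pubmed_data(my_entrez_id, retmax = 142000, format = "abstract")
writeLines(my_abstracts_txt, con = "oral_cancer.txt")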
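One workaround I am considering is to request the records in batches of 5000 by stepping a start offset. The following is only a minimal sketch. It assumes fetch_pubmed_data() accepts retstart and retmax arguments, as in easyPubMed 2.x. The helper names batch_starts() and fetch_all_abstracts() are hypothetical (my own), and I have not verified that NCBI will serve offsets beyond 10,000 this way.

```r
# Compute the zero-based start offset of each batch.
batch_starts <- function(total, batch_size = 5000) {
  stopifnot(total >= 0, batch_size > 0)
  if (total == 0) return(numeric(0))
  seq(from = 0, to = total - 1, by = batch_size)
}

# Hypothetical helper: fetch every batch and write it to one text file.
# Assumes easyPubMed 2.x, where fetch_pubmed_data() takes retstart/retmax.
fetch_all_abstracts <- function(query, out_file, batch_size = 5000) {
  ids <- easyPubMed::get_pubmed_ids(query)
  total <- as.integer(ids$Count)
  con <- file(out_file, open = "w")
  on.exit(close(con))
  for (start in batch_starts(total, batch_size)) {
    chunk <- easyPubMed::fetch_pubmed_data(ids,
                                           retstart = start,
                                           retmax = batch_size,
                                           format = "abstract")
    writeLines(chunk, con = con)
    Sys.sleep(0.5)  # pause between requests to be gentle on the NCBI servers
  }
  invisible(total)
}

# Offline check of the batching arithmetic for 142,000 records:
starts <- batch_starts(142000)
length(starts)  # 29 batches, the last one starting at offset 140000
```

The intended call would be fetch_all_abstracts("oral cancer", "oral_cancer.txt"), but I would like to know whether this is a sensible approach or whether a command-line tool handles it better.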