I am trying to download free PMC articles (preferably in PDF format) through the FTP service, using PMCIDs collected through the Entrez API.
I am aware that the PMC files accessible through the FTP service are available under:
-> PMC Open Access Subset
-> Author Manuscript Collection
-> Historical OCR Collection
Here is what I observed while experimenting (with my question attached to each point):
1.) The "OA Web Service" can be used to discover downloadable resources from the PMC Open Access Subset (in the form of FTP links). Depending on the article, .tar.gz, PDF, or XML files are available for download.
(Am I on the right path, and do I need to download the tar.gz file every time, since a PDF is not directly available for all documents?)
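A minimal sketch of the lookup described in point 1: prefer the direct PDF link when the OA Web Service lists one, and fall back to the tar.gz package otherwise. The response shape (a `link` element with `format` and `href` attributes) is an assumption based on how the service appears to respond, and the sample XML, the PMCID `PMC0000000`, and the helper name `pick_download_link` are placeholders, not real data.

```python
import xml.etree.ElementTree as ET

# Placeholder response; the element and attribute names are an assumption
# about the OA Web Service output and should be checked against a real reply.
SAMPLE_OA_RESPONSE = """<OA>
  <records returned-count="1" total-count="1">
    <record id="PMC0000000" license="CC BY">
      <link format="tgz" href="ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/xx/yy/PMC0000000.tar.gz"/>
      <link format="pdf" href="ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/xx/yy/example.PMC0000000.pdf"/>
    </record>
  </records>
</OA>"""


def pick_download_link(oa_xml, preferred=("pdf", "tgz")):
    """Return (format, href) for the first preferred format present, else None."""
    root = ET.fromstring(oa_xml)
    # Map each advertised format to its FTP link.
    links = {link.get("format"): link.get("href") for link in root.iter("link")}
    for fmt in preferred:
        if fmt in links:
            return fmt, links[fmt]
    # No link at all, e.g. the article is not in the Open Access Subset.
    return None


print(pick_download_link(SAMPLE_OA_RESPONSE))
```

With this ordering, the tar.gz package is only needed for records that have no `pdf` link.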
2.) If an article is not in the Open Access Subset, I am able to find it (using the PMCID as reference) in the Author Manuscript Collection, at https://ftp.ncbi.nlm.nih.gov/pub/pmc/manuscript/.
(Files in the Author Manuscript Collection are available only in XML/TXT format, but my requirement is PDF. Is there any way to get these files as PDF?)
3.) I found one interesting document whose PMCID cannot be located in either the PMC Open Access Subset or the Author Manuscript Collection: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5743844/
(In which collection can I find this document? Will every free PMC article be available for download through the FTP service, and is this the right approach?)
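For the tar.gz packages mentioned in point 1, a minimal sketch of pulling out whatever PDF files the archive contains, using only the Python standard library. It assumes the package bytes have already been downloaded; a package may or may not include a PDF, so an empty result is possible. The archive built in `_build_demo_archive` and the helper name `extract_pdfs` are hypothetical, used only so the sketch runs without network access.

```python
import io
import tarfile


def extract_pdfs(tgz_bytes):
    """Return {member_name: pdf_bytes} for every .pdf file in a tar.gz archive."""
    pdfs = {}
    with tarfile.open(fileobj=io.BytesIO(tgz_bytes), mode="r:gz") as archive:
        for member in archive.getmembers():
            # Skip directories and non-PDF files (XML, images, supplements).
            if member.isfile() and member.name.lower().endswith(".pdf"):
                pdfs[member.name] = archive.extractfile(member).read()
    return pdfs


def _build_demo_archive():
    """Build a tiny in-memory archive standing in for a downloaded package."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, data in [("PMC0000000/article.nxml", b"<article/>"),
                           ("PMC0000000/article.pdf", b"%PDF-1.4 demo")]:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


demo = extract_pdfs(_build_demo_archive())
print(sorted(demo))
```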
NOTE: I know the documents can be retrieved by scraping, but NCBI discourages that.
Any support will be highly appreciated.