I am trying to use a text mining tool called NlProt, but to do so I need to convert a set of PubMed abstracts into the NlProt input format, which is described as:
plain natural language text (each line = one abstract/paper); lines have to start with a number followed by ">" and then the text, e.g. 0001>abstract1 abstract1 abstract1 ...
Can anyone help me?
PS: This is an example of the PubMed abstract format:
1. Biotechnol Prog. 2017 May 27. doi: 10.1002/btpr.2508. [Epub ahead of print]

Enhanced expression of cysteine-rich antimicrobial peptide snakin-1 in Escherichia coli using an aggregation-prone protein coexpression system.

Kuddus MR(1)(2), Yamano M(1), Rumi F(1), Kikukawa T(1)(3), Demura M(1)(3), Aizawa T(1)(3).

Author information:
(1)Graduate School of Life Science, Hokkaido University, Sapporo, Hokkaido, 060-0810, Japan.
(2)Dept. of Pharmaceutical Chemistry, Faculty of Pharmacy, University of Dhaka, Dhaka, 1000, Bangladesh.
(3)Global Station for Soft Matter, Global Inst. for Collaborative Research and Education, Hokkaido University, Sapporo, Japan.

Snakin-1 (SN-1) is a cysteine-rich plant antimicrobial peptide and the first purified member of the snakin family. SN-1 shows potent activity against a wide range of microorganisms, and thus has great biotechnological potential as an antimicrobial agent. Here, we produced recombinant SN-1 in Escherichia coli by a previously developed coexpression method using an aggregation-prone partner protein. Our goal was to increase the productivity of SN-1 via the enhanced formation of insoluble inclusion bodies in E. coli cells. The yield of SN-1 by the coexpression method was better than that by direct expression in E. coli cells. After refolding and purification, we obtained several milligrams of functionally active SN-1, the identity of which was verified by MALDI-TOF MS and NMR studies. The purified recombinant SN-1 showed effective antimicrobial activity against test organisms. Our studies indicate that the coexpression method using an aggregation-prone partner protein can serve as a suitable expression system for the efficient production of functionally active SN-1. © 2017 American Institute of Chemical Engineers Biotechnol. Prog., 2017.

© 2017 American Institute of Chemical Engineers.

DOI: 10.1002/btpr.2508
PMID: 28556600
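
For reference, here is a minimal Python sketch of the kind of conversion I am after. It rests on several assumptions that I have not verified: that in the exported PubMed text file each record starts with a line like "1. " preceded by a blank line, that NlProt accepts a 4-digit zero-padded number as in the "0001>" example, and that keeping the whole record (citation, authors, abstract) on the line is acceptable. The function names `split_records` and `to_nlprot_lines` are made up for illustration.

```python
import re


def split_records(text):
    """Split a PubMed text export into individual records.

    Assumption: every record begins with "N. " (a number, a dot, a space)
    on a line that follows at least one blank line.
    """
    parts = re.split(r"\n\s*\n(?=\d+\. )", text.strip())
    return [p for p in parts if p.strip()]


def to_nlprot_lines(records, width=4):
    """Return one line per record in the form NNNN>text.

    All whitespace, including line breaks inside a record, is collapsed
    to single spaces so that each record fits on exactly one line.
    The padding width of 4 is an assumption based on the "0001>" example.
    """
    lines = []
    for number, record in enumerate(records, start=1):
        flat = " ".join(record.split())
        lines.append(f"{number:0{width}d}>{flat}")
    return lines


if __name__ == "__main__":
    # Tiny made-up input, only to show the shape of the output.
    sample = (
        "1. J A. 2017.\n\nTitle one.\n\nAbstract one\ncontinues.\n\n\n"
        "2. J B. 2016.\n\nTitle two.\n"
    )
    for line in to_nlprot_lines(split_records(sample)):
        print(line)
```

This sketch does not strip the citation line, author list or affiliations. If NlProt should only see the abstract body, that would need extra parsing on top of this.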