I'm trying to obtain protein sequence information for the proteins associated with a BioProject using the esearch/efetch tools, part of the Entrez E-utilities. For some reason, efetch formats other than fasta return an incomplete number of entries.
> esearch -db bioproject -query 'PRJEB5710' | elink -target protein | efetch -format fasta
This gives the correct number of sequences: 4951 proteins.
> esearch -db bioproject -query 'PRJEB5710' | elink -target protein | efetch -format gb -mode xml
This gives me records for only 110 proteins. The same holds true for the 'gp' and 'gss' formats.
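In case it helps to reproduce the numbers, here is a rough sketch of one way the records in each output could be counted. The helper names `count_fasta` and `count_gbxml` are made up for illustration. The sketch assumes that every FASTA record starts with a `>` header line and that every GenBank XML record opens with a `<GBSeq>` tag.

```shell
# Hypothetical helpers (not part of EDirect) for counting records read from stdin.

# FASTA: assumes each record begins with a '>' header line.
count_fasta() {
  grep -c '^>'
}

# GenBank XML: assumes each record opens with a <GBSeq> tag.
# grep -o emits one line per match, so tags sharing a line are still counted.
count_gbxml() {
  grep -o '<GBSeq>' | wc -l | tr -d ' '
}

# Intended use (needs network access and EDirect, so left commented out):
#   esearch -db bioproject -query 'PRJEB5710' | elink -target protein | efetch -format fasta | count_fasta
#   esearch -db bioproject -query 'PRJEB5710' | elink -target protein | efetch -format gb -mode xml | count_gbxml
```

If the two counts differ for the same query, the difference comes from the output format and not from the esearch/elink steps, which are identical in both pipelines.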
Do you have any idea what could be causing this and how to solve it? Thanks in advance!