Rsync and python for ftp.ncbi
8.0 years ago

I'm trying to call rsync within a python loop to get files from NCBI.

After reading the man page on filtering rules and looking here:

I don't understand why the code below doesn't work.

import os
import subprocess
from ftplib import FTP

ftp_site = ''
ftp = FTP(ftp_site)
dirs = ftp.nlst()

for organism in dirs:
    latest = os.path.join(organism, "latest_assembly_versions")
    for path in ftp.nlst(latest):
        accession = path.split("/")[-1]
        fasta = accession + "_genomic.fna.gz"
        subprocess.call(['rsync',
                         '-f=+ ' + accession + '/*',
                         '-f=+ ' + fasta,
                         '-f=- *',
                         'scratch/' + organism])

Instead of '-f=- *', I also tried '--exclude=*[^'+fasta+']' to exclude every file that doesn't match fasta.
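One likely reason the bracket form does not behave as a "not this file name" test: in glob-style patterns a bracket expression is a character class that matches a single character, so '*[^...]' only asks whether the last character of the name is outside that set. The sketch below illustrates this with Python's fnmatch module, which is only a stand-in for rsync's own matcher and spells negation with '!' rather than '^'. The file names are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical file name, used only to illustrate the pattern semantics.
fasta = "GCF_1_genomic.fna.gz"

# fnmatch writes a negated character class as [!...]; the class is built
# from the individual characters of the file name, not the name as a whole.
pattern = "*[!" + fasta + "]"


def excluded(name):
    """Return True if the name matches the exclude pattern."""
    return fnmatchcase(name, pattern)


for name in [fasta, "GCF_1_rna_from_genomic.fna.gz", "GCF_1_assembly_report.txt"]:
    print(name, excluded(name))
```

Only the name ending in 't' matches the pattern here. Both names ending in 'z' escape the exclude, because 'z' is one of the characters in the set, even though only one of them is the wanted file.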

For each directory path within latest/*, I want the file that matches fasta exactly. There will always be exactly one file fasta in the directory latest/path.
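For reference, a pattern often used for "one file name, at any depth" is to include every directory, include the wanted name, and then exclude everything else, in that order. Below is a minimal sketch of building such a command, assuming the long-form --include/--exclude options rather than the -f shorthand. The function name build_rsync_args, the source URL, and the file name are hypothetical placeholders, not the real NCBI paths.

```python
import subprocess


def build_rsync_args(source, dest, fasta, dry_run=True):
    """Build an rsync argument list that keeps only files named `fasta`."""
    args = ['rsync', '--recursive', '--times', '--prune-empty-dirs']
    if dry_run:
        # List what would be transferred without downloading anything.
        args += ['--dry-run', '--verbose']
    args += [
        '--include=*/',        # let rsync descend into every directory
        '--include=' + fasta,  # keep the one wanted file
        '--exclude=*',         # drop everything not matched above
        source,
        dest,
    ]
    return args


# Hypothetical placeholders for the remote path and the file name.
cmd = build_rsync_args(
    'rsync://example.org/module/organism/latest_assembly_versions/',
    'scratch/organism',
    'ACCESSION_genomic.fna.gz',
)
print(cmd)
# subprocess.call(cmd)  # uncomment to actually invoke rsync
```

Filter rules are checked in order and the first match wins, so the directory include has to come before the catch-all exclude. If the entries under the remote directory are symbolic links, an option such as --copy-links may also be needed; that is an assumption about the server layout, not something tested here.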

EDIT: I am testing this with rsync version 3.1.0 and have seen incompatibility issues with earlier versions.

Here is a link to working code that you should be able to paste into a Python interpreter to get the results of a "dry run," which won't download anything onto your machine. As written, it gets EVERYTHING under the remote path ending in latest, which is not what I want. If I run that script with '-f=- *' uncommented, it doesn't get anything, which seems to contradict the answer here

In my script above, the variable dirs holds a list of all the organism directories you will see at the FTP site. Each one of those directories has a subdirectory latest_assembly_versions/, the contents of which I am looping through.

rsync ftp ncbi • 3.9k views

I think you are falling into the trap of the XY problem. Please tell us more about what you are trying to do. Even without the details of what you really want, I doubt you need to get Python involved here; most probably rsync by itself can do it.
