Append the last folder name plus a suffix to a path with Python (NCBI FTP path)
3 days ago
Debut ▴ 10

I'm in a bind. Is there a way to duplicate the last folder in a path and append "_genomic.fna.gz" to it? For example, how do I change this: "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/001/316/945/GCA_001316945.3_ASM131694v3" into this: "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/001/316/945/GCA_001316945.3_ASM131694v3/GCA_001316945.3_ASM131694v3_genomic.fna.gz"? Thanks

python ncbi pandas • 139 views

What have you tried? Please explain your problem to us in as much detail as you can.


I have loaded this file into a pandas DataFrame: "https://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/assembly_summary.txt". For each FTP path in this table, I would like to download the file in ".fna.gz" or ".faa.gz" format. For ".fna.gz", using the example above, I need to go from "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/001/316/945/GCA_001316945.3_ASM131694v3" to "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/001/316/945/GCA_001316945.3_ASM131694v3/GCA_001316945.3_ASM131694v3_genomic.fna.gz"; the download works with the second link. I see two possible solutions. The first is to append "/" to the ftp_path, then the last folder name again, then "_genomic.fna.gz". The second is to append "/" to the ftp_path, then the assembly accession (from the table), then "_", then the asm_name, then the same suffix. This has to be done row by row, i.e. each row's ftp_path is combined with that row's own asm_name and assembly accession.

I don't know whether this is possible; I couldn't find any code that would let me do it. Please help.


You've figured out two viable approaches to solving your problem. Try one or both of them on a few entries, and if they work, go with it. I don't see where you need any help with this - you've got it!
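For anyone who wants a starting point, here is a minimal sketch of both approaches in plain Python, tried on the example path from the question. The function names `url_from_path` and `url_from_columns` are made up for illustration. The second function assumes that spaces in the assembly name show up as underscores in the folder name, and the "_protein.faa.gz" suffix for the ".faa.gz" file is likewise an assumption about NCBI's naming; check both against a few real entries.

```python
def url_from_path(ftp_path, suffix="_genomic.fna.gz"):
    """Approach 1: repeat the last folder of ftp_path, then add the suffix."""
    base = ftp_path.rstrip("/")            # tolerate a trailing slash
    last_folder = base.rsplit("/", 1)[-1]  # text after the final "/"
    return f"{base}/{last_folder}{suffix}"


def url_from_columns(ftp_path, assembly_accession, asm_name,
                     suffix="_genomic.fna.gz"):
    """Approach 2: rebuild the file name from the accession and asm_name."""
    # Assumption: spaces in asm_name appear as underscores in the folder name.
    name = f"{assembly_accession}_{asm_name.replace(' ', '_')}"
    return f"{ftp_path.rstrip('/')}/{name}{suffix}"


example = ("ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/001/316/945/"
           "GCA_001316945.3_ASM131694v3")

# Both approaches should give the same link for this entry.
print(url_from_path(example))
print(url_from_columns(example, "GCA_001316945.3", "ASM131694v3"))

# Same idea with a different suffix (assumed name of the protein file).
print(url_from_path(example, suffix="_protein.faa.gz"))
```

With the table already in a DataFrame called, say, `df`, something like `df["ftp_path"].map(url_from_path)` would apply the first approach to every row, assuming the column is named `ftp_path` as in the question. Rows without a usable FTP path, if there are any, would need to be filtered out first.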


While this doesn't address your question directly, there are already tools that let you selectively download data from NCBI genomes (it sounds like that is where you are headed). Save time and use them; see: How to download all Pseudomonas aeruginosa Genomes from NCBI Genomes database?