Question: How to extract genome files based on genome ID
5 weeks ago by Bioinfonext, Korea
Bioinfonext wrote:

I have a list of good-quality genome IDs for 54,000 genomes, like this:

#genome
G001281285
G000014725
G000775715
G000254175
G001380675
G900057405
G001076295

I also have all 74,000 genome sequence files, compressed, in the fna/ folder, like this:

cd fna/

G001284865.fna.bz2  G002910165.fna.bz2  G009390615.fna.bz2
G001284885.fna.bz2  G002910195.fna.bz2  G009390655.fna.bz2

Could you please help me extract the 54,000 genome sequence files that match the genome IDs above from the fna/ folder?

bash linux R • 185 views
modified 5 weeks ago by bas1993 • written 5 weeks ago by Bioinfonext
bunzip2 <genome>.fna.bz2

Or are you looking for a bash script to process all the files automatically? If so, that is not clear from your post.

modified 5 weeks ago • written 5 weeks ago by lieven.sterck

Thanks lieven, I have updated the post. I want to extract 54,000 genome files, selected by genome ID, from the fna/ folder, which contains 74,000 individual genome files in compressed form.

Many thanks

written 5 weeks ago by Bioinfonext

Thanks lieven. Yes, it would be great to have a bash script that uncompresses all the files automatically.

All the compressed files are in the filtered/ folder. I am thinking of using the loop below, but I am not sure whether it is correct:

for i in $(cat filtered/ ); do  bunzip2 "$i".fna.bz2; done

or can I just use

bunzip2  *.fna.bz2

Many thanks

modified 5 weeks ago • written 5 weeks ago by Bioinfonext

The latter should normally work, and it is the simplest approach.

The bash loop will not work as written; change it to:

for i in filtered/*.fna.bz2; do bunzip2 "$i"; done
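
For reference, a minimal sketch of the same loop wrapped in a function. It lets the shell expand the glob itself and uses bunzip2 -k so the compressed files are kept next to the uncompressed ones. The function name decompress_dir and the scratch files in the demo are made up for illustration, and the demo only runs if bzip2 is installed.

```shell
#!/usr/bin/env bash
# Sketch: decompress every .fna.bz2 in a directory, one file at a time.
set -euo pipefail

decompress_dir() {
    local dir=$1 f
    # The shell expands the glob directly; no need to parse `ls` output.
    for f in "$dir"/*.fna.bz2; do
        [ -e "$f" ] || continue      # no matches: the pattern stays literal
        bunzip2 -k -- "$f"           # -k keeps the .bz2 next to the new .fna
    done
}

# Scratch demo with a hypothetical file name; needs bzip2 installed.
if command -v bzip2 >/dev/null 2>&1; then
    demo=$(mktemp -d)
    mkdir "$demo/filtered"
    printf '>seq1\nACGT\n' > "$demo/filtered/G001281285.fna"
    bzip2 "$demo/filtered/G001281285.fna"    # leaves only the .fna.bz2
    decompress_dir "$demo/filtered"
    ls "$demo/filtered"
fi
```

Drop the -k if you do not want to keep the .bz2 files; plain bunzip2 removes each one after decompressing it.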
written 5 weeks ago by lieven.sterck
5 weeks ago by bas1993, Netherlands
bas1993 wrote:
for i in $(cat list.txt); do mv "$i".fna.bz2 fna/filtered/; done

Here list.txt is your list of high-quality genome IDs and filtered/ is a new directory, which must exist before you run the loop.

modified 5 weeks ago by lieven.sterck • written 5 weeks ago by bas1993

Thanks a lot. All the compressed genome files are in the fna/ folder; is it possible to give the path to the fna/ folder?

thanks for this help.

Many thanks

written 5 weeks ago by Bioinfonext

You can change the command above to include the path:

for i in $(cat list.txt); do mv fna/"$i".fna.bz2 fna/filtered/; done

If you also need to uncompress your genome files, you can use what Lieven Sterck wrote.
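
One thing to watch with the list as posted in the question: it starts with a #genome header line, so the loop will also try to move a file called "#genome.fna.bz2" and print an error for it. A hedged sketch that skips the header and records any ID that has no matching file. The scratch directory, the sample file names and missing.txt are assumptions for the demo; point it at your real fna/ and list.txt instead.

```shell
#!/usr/bin/env bash
# Sketch: move only the listed genomes into fna/filtered/.
set -euo pipefail

# Scratch demo data (hypothetical) so the sketch runs anywhere.
work=$(mktemp -d)
cd "$work"
mkdir -p fna/filtered
touch fna/G001281285.fna.bz2 fna/G000014725.fna.bz2 fna/G009390615.fna.bz2
printf '%s\n' '#genome' G001281285 G000014725 G000775715 > list.txt

: > missing.txt                      # IDs without a matching file go here
while IFS= read -r id; do
    # Skip the "#genome" header and any blank lines.
    case "$id" in ''|'#'*) continue ;; esac
    src="fna/${id}.fna.bz2"
    if [ -e "$src" ]; then
        mv -- "$src" fna/filtered/
    else
        printf '%s\n' "$id" >> missing.txt
    fi
done < list.txt

ls fna/filtered
echo "IDs with no file: $(wc -l < missing.txt)"
```

Using cp instead of mv would leave the original fna/ folder untouched, at the cost of extra disk space.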

written 5 weeks ago by bas1993

Thank you so much, the script above worked well once I had created the filtered directory inside the fna/ folder.

Many thanks

modified 5 weeks ago • written 5 weeks ago by Bioinfonext

Thank you so much for all help.

Now I need to create a BLAST database from all the fna files. Can I use this script for that?

#!/bin/bash

files=$(find . -name "*.fna")
create="cat $files > all.fna"
eval $create

makeblastdb -dbtype nucl -in all.fna -out genome_db

I am not sure whether I should keep this line in the script or not:

eval $create

Many thanks

written 5 weeks ago by Bioinfonext

In the script you showed above, I think you do need the line with eval: the cat command is only stored as a string in $create, and eval is what actually runs it.

You can see what eval does by running:

help eval

But to create a BLAST database from all the fna files you don't really need a script; you can just type the two commands you need (the ones with cat and makeblastdb).
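
If you do want a script, here is a hedged sketch of a version without eval. find passes the file names to cat in batches, so it should not depend on how many names fit on one command line, and the combined file is written outside the searched folder so that find cannot pick it up as an input. The scratch directory and the sequence names are made up for the demo; the makeblastdb call uses the same options as in the question and only runs if BLAST+ is installed.

```shell
#!/usr/bin/env bash
# Sketch: concatenate all .fna files without building a command string.
set -euo pipefail

# Scratch demo with two tiny FASTA files (hypothetical names).
demo=$(mktemp -d)
mkdir "$demo/fna"
printf '>G001281285_contig1\nACGT\n' > "$demo/fna/G001281285.fna"
printf '>G000014725_contig1\nTTGA\n' > "$demo/fna/G000014725.fna"

# No eval needed: find runs cat directly on the matching files.
# all.fna lives outside fna/, so it is never one of its own inputs.
find "$demo/fna" -type f -name '*.fna' -exec cat {} + > "$demo/all.fna"

grep -c '^>' "$demo/all.fna"    # number of sequences in the combined file

# Build the database only if BLAST+ is available on this machine.
if command -v makeblastdb >/dev/null 2>&1; then
    makeblastdb -dbtype nucl -in "$demo/all.fna" -out "$demo/genome_db"
fi
```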

modified 27 days ago • written 27 days ago by bas1993

OK, thanks. I am thinking of making a single file with the cat command below and then running makeblastdb to build the database.

cat *.fna > all.fna

makeblastdb -dbtype nucl -in all.fna   -parse_seqids   -out genome_db
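
A possible sanity check before running makeblastdb: as far as I know, -parse_seqids expects every sequence ID to be unique, and contig names can repeat when many genomes are concatenated. The sketch below lists IDs that occur more than once in the combined file. The three-sequence all.fna is invented for the demo; run the awk line on your real all.fna.

```shell
#!/usr/bin/env bash
# Sketch: report sequence IDs that appear more than once in a FASTA file.
set -euo pipefail

# Scratch demo file (hypothetical headers) with one repeated ID.
demo=$(mktemp -d)
cat > "$demo/all.fna" <<'EOF'
>contig_1 genome A
ACGT
>contig_2 genome A
GGCC
>contig_1 genome B
TTAA
EOF

# Take the first word of each header line, drop the leading ">",
# and keep only the IDs that occur more than once.
dups=$(awk '/^>/ { print substr($1, 2) }' "$demo/all.fna" | sort | uniq -d)

if [ -n "$dups" ]; then
    echo "duplicate sequence IDs:"
    echo "$dups"
else
    echo "no duplicate sequence IDs"
fi
```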

many thanks

modified 27 days ago • written 27 days ago by Bioinfonext