How to extract genome files based on genome ID
3.7 years ago
Bioinfonext ▴ 460

I have a list of good-quality genome IDs for 54000 genomes, like below:

#genome
G001281285
G000014725
G000775715
G000254175
G001380675
G900057405
G001076295

I also have all 74000 genome sequence files, compressed, in the fna folder, like below:

cd fna/

G001284865.fna.bz2  G002910165.fna.bz2  G009390615.fna.bz2
G001284885.fna.bz2  G002910195.fna.bz2  G009390655.fna.bz2

Could you please help me extract the 54000 genome sequence files for the genome IDs listed above from the fna/ folder?

linux R BASH • 1.9k views
ADD COMMENT
0
Entering edit mode
bunzip2 <genome>.fna.bz2

or are you looking for a 'bash script' to process all files automatically? (if so, this is not clear from your post)

ADD REPLY
1
Entering edit mode

Thanks lieven, I have updated the post. I want to extract 54000 genome files, based on genome ID, from the fna folder, which contains 74000 individual genome files in compressed form.

Many thanks

ADD REPLY
0
Entering edit mode

Thanks lieven. Yes, it would be great to have a bash script to uncompress all files automatically.

All the compressed files are in the filtered/ folder. I am thinking of using the loop below, but I am not sure if it is correct:

for i in $(cat filtered/ ); do  bunzip2 "$i".fna.bz2; done

or can I just use

bunzip2  *.fna.bz2

Many thanks

ADD REPLY
1
Entering edit mode

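The latter should normally work, and it is the simplest approach (run it from inside the filtered/ folder).

The bash loop will not work as written, because cat filtered/ tries to read a directory rather than a list of names. Change it to:

for i in filtered/*.fna.bz2; do bunzip2 "$i"; done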
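
With tens of thousands of files, a bare wildcard on the command line can, on some systems, run into the limit on argument length. A minimal sketch that avoids this by letting find hand the file names to bunzip2 in batches. The function name decompress_dir is made up for illustration, and the -k option (keep the compressed originals) is optional:

```shell
# Hypothetical helper (name is illustrative): decompress_dir DIR
# Decompresses every *.fna.bz2 file directly inside DIR.
# -k keeps the .bz2 originals next to the decompressed .fna files.
decompress_dir() {
    find "$1" -maxdepth 1 -type f -name '*.fna.bz2' -exec bunzip2 -k {} +
}
```

It would be called as: decompress_dir fna/filtered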
ADD REPLY
2
Entering edit mode
3.7 years ago
bas1993 ▴ 60
for i in $(cat list.txt); do mv "$i".fna.bz2 fna/filtered/; done

Here list.txt is your list of high-quality genome IDs and filtered/ is a new directory, which has to exist before you run the loop.

ADD COMMENT
0
Entering edit mode

Thanks a lot. All the compressed genome files are in the fna/ folder; would it be possible to include the path to the fna/ folder?

thanks for this help.

Many thanks

ADD REPLY
1
Entering edit mode

You can add the path to the fna/ folder to the command above:

for i in $(cat list.txt); do mv fna/"$i".fna.bz2 fna/filtered/; done

If you also need to uncompress your genome files, you can use what Lieven Sterck wrote.
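
A slightly more defensive variant of the loop, as a sketch only: it reads the ID list line by line, skips blank lines and the #genome header shown in the question, and reports IDs that have no matching file instead of letting mv print an error for each. The function name move_listed_genomes and its argument order are made up for illustration; list.txt, fna/ and fna/filtered/ are the names used above.

```shell
# Hypothetical helper (name is illustrative): move_listed_genomes LIST SRC_DIR DEST_DIR
# Moves SRC_DIR/<ID>.fna.bz2 into DEST_DIR for every ID listed in LIST.
move_listed_genomes() {
    local list="$1" src="$2" dest="$3" id missing=0
    mkdir -p "$dest"                            # create the target directory if needed
    while IFS= read -r id || [ -n "$id" ]; do   # also handle a last line without newline
        case $id in
            ''|'#'*) continue ;;                # skip blank lines and the "#genome" header
        esac
        if [ -f "$src/$id.fna.bz2" ]; then
            mv -- "$src/$id.fna.bz2" "$dest/"
        else
            echo "missing: $id" >&2             # ID in the list but no file on disk
            missing=$((missing + 1))
        fi
    done < "$list"
    echo "IDs without a file: $missing" >&2
}
```

It would be called as: move_listed_genomes list.txt fna fna/filtered. Replacing mv with cp would leave the original fna/ folder untouched.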

ADD REPLY
0
Entering edit mode

Thank you so much, the script above worked well after I created the filtered directory within the fna/ folder.

Many thanks

ADD REPLY
0
Entering edit mode

Thank you so much for all help.

Now I need to create a BLAST database from all the fna files. Can I use this script for that?

#!/bin/bash

files=$(find . -name "*.fna")
create="cat $files > all.fna"
eval $create

makeblastdb -dbtype nucl -in all.fna -out genome_db

I am not sure whether I should keep this line in the script or not:

eval $create

Many thanks

ADD REPLY
0
Entering edit mode

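In the script that you showed above, I think you need the line with eval: the cat command is only stored as a string in the create variable, and nothing is written to all.fna until eval runs it.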

If you use the command line below you can see what "eval" does:

help eval

But for creating a blast database with all the fna files you don't really need a script as you could also just type out the two lines that you need (the ones with cat and makeblastdb).
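
The concatenation step can also be written without eval. A minimal sketch, assuming the uncompressed .fna files sit under one directory: find passes the files to cat in batches, and the combined file is written outside the input directory so that it is never picked up as one of its own inputs. The function name concat_fna and the output file name in the usage line are made up for illustration.

```shell
# Hypothetical helper (name is illustrative): concat_fna INPUT_DIR OUTPUT_FILE
# Concatenates every *.fna file under INPUT_DIR into OUTPUT_FILE.
# OUTPUT_FILE should live outside INPUT_DIR (or not end in .fna);
# otherwise find would see it as an input as well.
concat_fna() {
    find "$1" -type f -name '*.fna' -exec cat {} + > "$2"
}
```

It would be called as: concat_fna fna/filtered all_genomes.fasta, followed by the makeblastdb command from the script with -in all_genomes.fasta.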

ADD REPLY
0
Entering edit mode

OK, thanks. I am thinking of making a single file with the cat command below and then running makeblastdb to build the database.

cat *.fna > all.fna

makeblastdb -dbtype nucl -in all.fna   -parse_seqids   -out genome_db

many thanks

ADD REPLY
