how to run seqkit in a loop with multiple files
2.2 years ago
Kumar ▴ 170

Hi,

I am trying to run the following command in a for loop over multiple files. Could anyone help me write a Python or Bash script to do this?

seqkit grep -n -f Id.txt S10_scaffolds.fasta -o fetch.fasta

I tried the following script, but I am not sure if it is correct.

for file in *.fasta
do 
seqkit grep -n -f  "$file" -o $(basename "$file" .fasta)

done

Thank you,

seqkit scaffolds python • 2.3k views
ADD COMMENT
0

Try:

for file in *.fasta
do
    name=$(basename "$file" .fasta)
    seqkit grep -n -f Id.txt "$file" -o "${name}_fetch.fasta"
done

This will create one output file per input, named after the input file with _fetch.fasta appended.

ADD REPLY
0

You missed 'Id.txt' after -f.

ADD REPLY
0

I had copied the example OP had. Fixed now.

ADD REPLY
0

Sorry for the confusion, but I am updating the command.

There are also multiple .txt files, each corresponding to one scaffolds file. I have a set of 70 different .txt files, each with its matching scaffolds file.

seqkit grep -n -f Salm_10_scaffolds_uniq_Id.txt S10_scaffolds.fasta -o fetch.fasta

ADD REPLY
0

How do you associate each text file with its FASTA file? If that is your updated command, it is incorrect, and the original loop is also incorrect because it did not use Id.txt.

ADD REPLY
0

The single-line command (seqkit grep -n -f Salm_10_scaffolds_uniq_Id.txt S10_scaffolds.fasta -o fetch.fasta) works perfectly. I have text files and their associated FASTA files as follows:

Salm_10_scaffolds.fasta                Salm_10_scaffolds.txt
Salm_12_scaffolds.fasta                Salm_12_scaffolds.txt
Salm_37_scaffolds.fasta                Salm_37_scaffolds.txt

Each text file contains header lines that are present in its FASTA file. If I run the seqkit command, it extracts the matching sequences from the FASTA file. However, I would like to do this with a script, since I have many text files and their associated FASTA files.

I hope I explained my query correctly.

ADD REPLY
1

The following should work based on what you described above. I am prepending ${name} to _fetch.fasta so you get a unique output file for each name.

for file in *.txt
do
    name=$(basename "$file" .txt)
    echo seqkit grep -n -f "${name}.txt" "${name}.fasta" -o "${name}_fetch.fasta"
done

Remove the word echo when commands look correct and you want to execute them.
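
One way to switch between preview and execution without editing the loop is a small wrapper controlled by a variable. This is only a sketch: the names DRY_RUN and run are hypothetical and are not part of seqkit.

```shell
# DRY_RUN=1 (the default here) prints each command; DRY_RUN=0 executes it.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical wrapper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

for file in *.txt; do
    [ -e "$file" ] || continue      # the glob matched nothing
    name=$(basename "$file" .txt)
    run seqkit grep -n -f "${name}.txt" "${name}.fasta" -o "${name}_fetch.fasta"
done
```

Once the printed commands look right, setting DRY_RUN=0 in the environment before running the script executes them.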

ADD REPLY
0

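Having the same file extension for input and output in a loop could be problematic: if the loop is run again, the *.fasta glob will also match the _fetch.fasta files written by the previous run.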
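
A minimal sketch of one way around this, assuming the outputs end in _fetch.fasta as in the earlier loop: skip any file that already carries that suffix. The function name list_inputs is hypothetical.

```shell
# Hypothetical helper: print the FASTA files in the current directory that
# should be treated as inputs, skipping *_fetch.fasta files from earlier runs.
list_inputs() {
    for file in *.fasta; do
        [ -e "$file" ] || continue        # the glob matched nothing
        case "$file" in
            *_fetch.fasta) continue ;;    # output of a previous run
        esac
        printf '%s\n' "$file"
    done
}

# Dry run: print the commands rather than executing them.
list_inputs | while IFS= read -r file; do
    name=${file%.fasta}
    echo seqkit grep -n -f Id.txt "$file" -o "${name}_fetch.fasta"
done
```

Writing the outputs with a different extension, or into a separate directory, avoids the problem without any filtering.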

ADD REPLY
0

I changed the loop to use the .txt files to address this. Hopefully the only .txt files in the directory are the ones corresponding to the FASTA files.
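
If other .txt files might be present, a guard can help. The sketch below assumes each NAME.txt pairs with NAME.fasta in the same directory, prints the command only when the FASTA file exists, and warns otherwise. The function name print_cmds is hypothetical.

```shell
# Hypothetical helper: emit one seqkit command per .txt file that has a
# matching .fasta file; report unmatched .txt files on stderr.
print_cmds() {
    for ids in *.txt; do
        [ -e "$ids" ] || continue         # the glob matched nothing
        name=${ids%.txt}
        if [ -f "${name}.fasta" ]; then
            echo seqkit grep -n -f "$ids" "${name}.fasta" -o "${name}_fetch.fasta"
        else
            echo "skipping $ids: no ${name}.fasta found" >&2
        fi
    done
}

print_cmds
```

This is still a dry run; the commands are only printed, so they can be inspected or piped to a shell once they look correct.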

ADD REPLY
1
$ for i in *.fasta; do echo seqkit grep -n -f ${i%%\.fasta}.txt $i -o ${i%%\.fasta}_fetch.fa;done

seqkit grep -n -f Salm_10_scaffolds.txt Salm_10_scaffolds.fasta -o Salm_10_scaffolds_fetch.fa
seqkit grep -n -f Salm_12_scaffolds.txt Salm_12_scaffolds.fasta -o Salm_12_scaffolds_fetch.fa
seqkit grep -n -f Salm_37_scaffolds.txt Salm_37_scaffolds.fasta -o Salm_37_scaffolds_fetch.fa

After checking the output from dry-run, remove echo
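
For reference, the ${i%%\.fasta} expansion in the loop strips a trailing .fasta from the value of i. With a literal suffix and no wildcards, % and %% give the same result, and the backslash before the dot is optional. A small sketch of the behaviour, where the data/ directory is a hypothetical example:

```shell
i=Salm_10_scaffolds.fasta

# Strip the literal suffix ".fasta", then append a new ending.
stem=${i%.fasta}
echo "$stem"                 # Salm_10_scaffolds
echo "${stem}.txt"           # Salm_10_scaffolds.txt
echo "${stem}_fetch.fa"      # Salm_10_scaffolds_fetch.fa

# With a directory in the path (hypothetical "data/"), the expansion keeps
# the directory while basename drops it.
p=data/Salm_10_scaffolds.fasta
echo "${p%.fasta}"           # data/Salm_10_scaffolds
basename "$p" .fasta         # Salm_10_scaffolds
```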

Using parallel:

$ parallel --plus --dry-run seqkit grep -n -f {.}.txt {} -o {.}_fetch.fa ::: *.fasta

seqkit grep -n -f Salm_10_scaffolds.txt Salm_10_scaffolds.fasta -o Salm_10_scaffolds_fetch.fa
seqkit grep -n -f Salm_12_scaffolds.txt Salm_12_scaffolds.fasta -o Salm_12_scaffolds_fetch.fa
seqkit grep -n -f Salm_37_scaffolds.txt Salm_37_scaffolds.fasta -o Salm_37_scaffolds_fetch.fa

Remove --dry-run after checking the output from dry-run

ADD REPLY
0

Thank you so much for all the suggestions. These are very helpful.

ADD REPLY
