How do I loop over multiple files in the terminal?
4.5 years ago
zizigolu ★ 4.3k

Hi

I have to run this on 136 .vcf files, but doing it manually one by one takes ages.

Can you please help me with a script to run it on all of them at once in the terminal?

[fi1d18@cyan01 snp]$ bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%ID]\n' file.vcf > file.txt

Thanks

Linux

There are many threads on Biostars that deal with for loops in the shell. I will link one to get you going: Bash Script Loop Help

If you have the compute resources, you can use parallel in addition. Or, ideally, submit separate jobs on a compute cluster (if you have access to one).
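
For reference, here is a minimal sketch of the basic for-loop pattern (assuming your .vcf files sit in the current directory):

for f in *.vcf; do
    # one bcftools call per file; the output name mirrors the input name
    bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%ID]\n' "$f" > "${f%.vcf}.txt"
done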


Thank you. We have cluster computing, but wouldn't I have to prepare a .txt submission script for each file, which would take just as long?

Maybe I am wrong though :(


It should take no additional time (well, a few minutes). I will use the example posted in the answer below to show you how you could submit cluster jobs (I will use SLURM as an example, but substitute your own job scheduler as needed).

for i in /your_vcf_folder/*.vcf; do
    sbatch -p partition -o log.out -e log.err -t time -n num_cores -N 1 \
        --wrap="bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%ID]\n' ${i} > ${i}_output.txt"
done

Looks like the user who posted that answer chose to take it out, but you could use a variation like this:

for i in /your_vcf_folder/*.vcf; do
    name=$(basename "${i}" .vcf)
    # pass ${i} (the full path) to bcftools; ${name} is only used to build the output and log names
    sbatch -p partition -o "${name}_log.out" -e "${name}_log.err" -t time -n num_cores -N 1 \
        --wrap="bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%ID]\n' ${i} > ${name}_output.txt"
done

This for loop will submit an individual cluster job for each of your data files.
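
If the submissions go through, you can keep an eye on the jobs with standard SLURM commands, for example:

squeue -u $USER     # list your queued and running jobs
sacct -j <jobid>    # check the state of a finished job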


Thank you; I prepared a .txt file containing

module load bcftools/1.2.1
cd /temp/hgig/fi1d18/TRG45/snp/
for i in ./*.vcf; do sbatch -p partition -o log.out -e log.err -t time -n num_cores -N 1 --wrap="bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%ID]\n' ${i} > ${i}_output.txt"; done

Then in the terminal I ran qsub file.txt

After running the job, the error file says:

/var/spool/torque/mom_priv/jobs/7568098.blue101.SC: line 3: sbatch: command not found

I said specifically that it was an example using SLURM. If your cluster uses a different job scheduler, this is obviously not going to work.

Since you are using qsub (and the error path mentions torque), your cluster is running PBS/Torque or SGE. You will need the equivalent submission command for your job scheduler.
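
As a rough sketch, the equivalent submission loop on a Torque/PBS cluster might look like the following (the queue name, walltime, and core count are placeholders you will need to adapt; qsub accepts a job script on stdin):

for i in /your_vcf_folder/*.vcf; do
    name=$(basename "${i}" .vcf)
    # pipe the per-file command into qsub; -N names the job, -q picks the queue,
    # -l requests the walltime and cores
    echo "module load bcftools/1.2.1; bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%ID]\n' ${i} > ${name}_output.txt" |
        qsub -N "${name}" -q queue_name -o "${name}_log.out" -e "${name}_log.err" -l walltime=01:00:00,nodes=1:ppn=1
done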


Please edit your post title to make it useful for others. It doesn't currently tell anyone anything about the content.

4.5 years ago
zizigolu ★ 4.3k

Thank you @genomax

Similar to what you suggested, one can run the following in the terminal:

for f in *.vcf; do
    # ${f%.*} strips the extension, so file.vcf is written to file.txt
    bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%ID]\n' "$f" > "${f%.*}.txt"
done
4.5 years ago

GNU parallel is your best friend for things like this if you only need to run a simple task. For more complex pipelines, look into a workflow manager like Snakemake or Nextflow.

Doing this with a progress bar and 8 parallel tasks:

ls *.vcf | parallel --bar -j8 'bcftools query -f "%CHROM\t%POS\t%REF\t%ALT[\t%ID]\n" {} > {.}.txt'
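
A slightly more robust variant passes the files to parallel directly instead of parsing ls output; {} is the input filename and {.} is the input filename with its extension removed:

parallel --bar -j8 'bcftools query -f "%CHROM\t%POS\t%REF\t%ALT[\t%ID]\n" {} > {.}.txt' ::: *.vcf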

@WouterDeCoster and @genomax, seeing you both in the same post reminded me of November 2016 in Germany, when I was stuck on a task my professor had given me and you both helped me until late at night :) Now, in November 2019, I am somewhere else on this globe, again seeking help on Biostars!

Thank you guys
