Question: Categorize sequences with biobloom tool
Shelle wrote (9 weeks ago):

I am trying to use BioBloom Tools to categorize the sequences in my samples. A single run uses a command like this:

./biobloomcategorizer -e -p /output/prefix -f "filter1.bf filter2.bf filter3.bf" inputReads1_1.fq inputreads1_2.fq

Since I have so many files (about 19000), I have to use a bash script. The command I am using is the one-liner below. The FASTQ files and all .bf files are in the same directory, but when I write the script this way, biobloomcategorizer does not work at all, even though the tool itself is fine: I tried the command above with only a few files. How should I modify the script below so that the tool works with this many files?

for i in *.bf; do biobloomcategorizer -e -p /output/prefix -f echo \"$i\" filename_1.fastq filename_2.fastq; done
modified 8 weeks ago by Biostar • written 9 weeks ago by Shelle

"biobloomcategorizer is not working at all"

Clarify in what way the tool is not working. The way you have written your loop, only one of the *.bf files is used in each iteration. That is not what you have in your first example.

modified 9 weeks ago • written 9 weeks ago by genomax

It is giving me this error:

Usage of paired end mode:
BioBloomCategorizer [OPTION]... -f "[FILTER1]..." [FILEPAIR1] [FILEPAIR2]
or BioBloomCategorizer [OPTION]... -f "[FILTER1]..." [PAIREDBAMSAM]

I know biobloomcategorizer itself works: when I skip the for loop and paste a few .bf file names directly into a plain command, I don't get the error above. The one-liner I am trying to use looks fine to me, but I don't know why it gives this error.

written 9 weeks ago by Shelle
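One way to see why the loop triggers this usage message is to print the exact arguments the shell passes to the program. The sketch below uses a hypothetical helper, `show_args`, in place of biobloomcategorizer (it is not part of BioBloom Tools) and a made-up filter name, and reuses the argument pattern from the loop in the question:

```shell
#!/bin/bash
# show_args: hypothetical stand-in for biobloomcategorizer. It prints each
# argument it receives on its own line, wrapped in <...>, so that word
# boundaries and literal characters are visible.
show_args() {
  printf '<%s>\n' "$@"
}

i="filter1.bf"   # made-up filter name, for illustration only

# Same argument pattern as the loop in the question:
show_args -e -p /output/prefix -f echo \"$i\" filename_1.fastq filename_2.fastq
```

In this pattern `echo` is not run as a command: it becomes the value of `-f`, and the filter name (with literal quote characters attached, printed as `<"filter1.bf">`) becomes a third file argument next to the two FASTQ files. That would plausibly explain the paired-end usage message, which expects two read files. Writing `-f "$i"` instead passes a single filter name as the `-f` value.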

How many *.bf files are there? What does the number 19000 refer to? How many FASTQ files do you have?

written 9 weeks ago by genomax

I have also tried it as below, but then the error is different: it says "Argument is too long!" for the line starting with "biobloomcategorizer". I have only two FASTQ files (_1.fastq and _2.fastq), which are paired-end. The number of .bf files is 19000.

#!/bin/bash
Array=(*.bf)
biobloomcategorizer -e -p /output/prefix -f echo \"${Array[*]}\" filename_1.fastq filename_2.fastq
written 9 weeks ago by Shelle
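The array idea can work if the whole array is expanded into a single quoted word, which matches the `-f "filter1.bf filter2.bf ..."` form of the first command. A minimal sketch, assuming the .bf names contain no spaces (the `-f` value is itself space-separated) and using a hypothetical helper `join_filters`; the real biobloomcategorizer call is left as a comment so the snippet can be run safely:

```shell
#!/bin/bash
# join_filters (hypothetical helper): join all arguments into ONE string,
# separated by single spaces, the format shown for -f in the question.
join_filters() {
  local IFS=' '
  printf '%s' "$*"
}

shopt -s nullglob        # if no .bf files exist, the glob expands to nothing
filters=(*.bf)           # one array element per bloom filter file
filter_list="$(join_filters "${filters[@]}")"

echo "filters found: ${#filters[@]}"

# The real call would pass the joined list as a single quoted argument,
# without echo and without backslash-escaped quotes:
# biobloomcategorizer -e -p /output/prefix -f "$filter_list" filename_1.fastq filename_2.fastq
```

Writing `-f "${filters[*]}"` directly is equivalent, because `[*]` inside double quotes joins the elements with the first character of `IFS` (a space by default). Whether the program accepts 19000 filters in one run is a separate question, and the joined string may also be too long to pass as one argument.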

I have not used this specific program, but your original loop should work with one bloom filter file at a time. Are you supposed to use the program that way, one filter at a time, given that you have 19000 of these files?

modified 9 weeks ago • written 9 weeks ago by genomax

I have to use all 19000 bloom filters at once. An array was the only approach I could think of. I even tried slicing the array to use a batch of about 2000 filters instead of all 19000, and I still get the "Usage of paired end mode" error mentioned in my first reply.

Array=(*.bf)
biobloomcategorizer -e -p /output/prefix -f echo \"${Array[@]:1:2001}\" filename_1.fastq filename_2.fastq

written 9 weeks ago by Shelle
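Two details about the slice: bash array indices start at 0, so `${Array[@]:1:2001}` skips the first file and takes the next 2001, whereas `${Array[@]:0:2000}` would take the first 2000. If running in batches is acceptable (an assumption: each batch would be a separate categorization run with its own output, and the results would still have to be combined afterwards), the list can be split as below. `print_batches` is a hypothetical helper, not part of BioBloom Tools:

```shell
#!/bin/bash
# print_batches SIZE FILE...  (hypothetical helper)
# Prints the FILE arguments in groups of SIZE, one space-separated group per
# line. SIZE must be a positive integer; file names are assumed to contain
# no spaces or newlines.
print_batches() {
  local size=$1
  shift
  local files=("$@")
  local IFS=' '          # "${files[*]:...}" joins with this character
  local start
  for ((start = 0; start < ${#files[@]}; start += size)); do
    printf '%s\n' "${files[*]:start:size}"
  done
}

# Demonstration with made-up names: prints three lines of up to two names.
print_batches 2 a.bf b.bf c.bf d.bf e.bf

# Sketch of a batched run (not executed here). Each batch gets its own
# output prefix so that later runs do not overwrite earlier results:
# shopt -s nullglob
# n=0
# while IFS= read -r chunk; do
#   n=$((n + 1))
#   biobloomcategorizer -e -p "/output/prefix_batch$n" -f "$chunk" filename_1.fastq filename_2.fastq
# done < <(print_batches 2000 *.bf)
```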

Did you build these from 19000 complete RefSeq genomes (one filter per genome), and are you trying to use them to categorize the reads in your FASTQ data? If you need to pass all 19000 files to the program at the same time, then you don't need that loop.

NOTE: You may want to look at Kraken (and similar tools) to classify reads instead.

modified 9 weeks ago • written 9 weeks ago by genomax

How about this:

biobloomcategorizer -e -p /output/prefix -f echo \"`ls -1 *.bf | tr '\n' ' '`\" filename_1.fastq filename_2.fastq

You may still run into a "line too long" type of error, because all of the file names will be on one line.

modified 9 weeks ago • written 9 weeks ago by genomax
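Before running the tool, it can help to check whether the joined filter list is likely to hit that limit. This is a sketch under assumptions: on Linux, a single argument string is typically capped at 128 KiB (131072 bytes, the kernel's MAX_ARG_STRLEN with 4 KiB pages), separately from the overall limit reported by `getconf ARG_MAX`. The helper name `arg_bytes` is hypothetical. Note also that the `ls | tr` pipeline leaves a trailing space at the end of the list, which an array join avoids:

```shell
#!/bin/bash
# Assumed per-argument limit on Linux (MAX_ARG_STRLEN = 32 pages of 4 KiB).
# Other systems may differ; `getconf ARG_MAX` reports the overall limit for
# all arguments plus the environment, not this per-argument cap.
max_single_arg=131072

# arg_bytes (hypothetical helper): byte length of all arguments joined with
# single spaces, i.e. the size of the string that would follow -f.
arg_bytes() {
  local IFS=' '
  local joined="$*"
  printf '%s' "$joined" | wc -c | tr -d ' '
}

shopt -s nullglob
filters=(*.bf)
size=$(arg_bytes "${filters[@]}")
echo "size of the joined filter list: $size bytes (assumed limit: $max_single_arg)"
if (( size >= max_single_arg )); then
  echo "likely too long for a single argument; consider batches" >&2
fi
echo "overall ARG_MAX: $(getconf ARG_MAX)"
```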

Same error as in my first reply:

Usage of paired end mode:
BioBloomCategorizer [OPTION]... -f "[FILTER1]..." [FILEPAIR1] [FILEPAIR2]
or BioBloomCategorizer [OPTION]... -f "[FILTER1]..." [PAIREDBAMSAM]
written 9 weeks ago by Shelle

I am going to refer you back to my comment above.

written 9 weeks ago by genomax