Question: BBMap for loop issue
mrsmith wrote (23 days ago):

This question may be better suited for another forum, in which case I am super sorry! I am new to bioinformatics, but I could really use some help right now!

I have struggled to write a for loop that iterates through two text files, both of which I made. The first, "new-bins_list.txt", is a list of all the bins that need to be used as the ref file. The second, "new_names_metalist", contains the first part of each file name in "new-bins_list.txt"; it is the identifier of the trimmed-reads file that I would like to recruit to. Here are the first lines of each file, to illustrate what I am describing:

$ head new-bins_list.txt 
SRR4101185.maxbin.008.fasta
SRR1633224.maxbin.021.fasta
SRR1986369.maxbin.004.fasta
SRR1971621.maxbin.012.fasta
SRR2058405.maxbin.006.fasta
SRR1636509.maxbin.009.fasta
SRR1636517.maxbin.016.fasta
SRR4048936.maxbin.001.fasta
SRR4101185.maxbin.041.fasta
SRR1995427.maxbin.002.fasta

$ head new_names_metalist
SRR4101185
SRR1633224
SRR1986369
SRR1971621
SRR2058405
SRR1636509
SRR1636517
SRR4048936
SRR4101185
SRR1995427
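The second list is described as the first part of each name in the first list. Assuming both files hold one name per line in the same order, a quick way to confirm that they really line up is a sketch like the one below; lists_aligned is a hypothetical helper name, and the demo data are the first entries shown above.

```shell
#!/usr/bin/env bash
# Sketch: check that line N of the names list equals the text before the first
# "." on line N of the bins list, i.e. the two files line up.
set -euo pipefail

lists_aligned() {
  local bins_file=$1 names_file=$2
  # cut -d. -f1 prints the first dot-separated field of every line;
  # diff -q exits 0 only if that stream matches the names file exactly
  cut -d. -f1 "$bins_file" | diff -q - "$names_file" > /dev/null
}

# Demo with the first three entries of each list
tmp=$(mktemp -d)
printf '%s\n' SRR4101185.maxbin.008.fasta SRR1633224.maxbin.021.fasta \
  SRR1986369.maxbin.004.fasta > "$tmp/bins"
printf '%s\n' SRR4101185 SRR1633224 SRR1986369 > "$tmp/names"
if lists_aligned "$tmp/bins" "$tmp/names"; then
  echo "lists are aligned line by line"
else
  echo "lists differ"
fi
rm -r "$tmp"
```

On the real files this would be `lists_aligned new-bins_list.txt new_names_metalist`.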

This is the for loop I wrote to iterate through each bin and its corresponding trimmed-read files. I realize I could write out every command in a shell script, but we have several hundred of them, so that would be extremely tedious. The for loop runs; the issue is that it keeps using the exact same bin file, SRR4101185.maxbin.008.fasta, and just compares it to the different metagenome read files listed in new_names_metalist. While I am excited to get anything from this loop, I wish it would actually work! Any suggestions would be greatly appreciated.

for bin in $(cat new-bins_list.txt); do
  for read in $(cat new_names_metalist); do
    bbmap.sh ref=/storage/sequence_samples/Crump_lab_CoDL/ALL_MAGs/${bin} nodisk \
      in=/storage/sequence_samples/Crump_lab_CoDL/SRA_trim_out/${read}_1.cleaned.fq.gz \
      in2=/storage/sequence_samples/Crump_lab_CoDL/SRA_trim_out/${read}_2.cleaned.fq.gz \
      covstats=coverage_stats_SRR-SRR/${bin}.vs.SELF
  done
done

Please let me know if I need to make any clarifications to this post! Thank you so much for taking the time to read this!
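If the goal is one run per bin with its own reads, the two lists can be walked in lockstep rather than nested. The sketch below uses `paste` to pair line N of one file with line N of the other; this is an alternative technique, not the fix suggested later in the thread, and it assumes both files have one entry per line in the same order. The directory paths are the ones from the loop above, and `echo` keeps it a dry run.

```shell
#!/usr/bin/env bash
# Sketch: pair line N of the bins list with line N of the names list using
# paste, instead of nesting one loop inside the other.
# Assumes both files have one entry per line, in the same order.
# Dry run: "echo" prints each command; remove it to actually run bbmap.sh.
set -euo pipefail

mag_dir=/storage/sequence_samples/Crump_lab_CoDL/ALL_MAGs
read_dir=/storage/sequence_samples/Crump_lab_CoDL/SRA_trim_out

pair_commands() {
  local bins_file=$1 names_file=$2 bin sample
  # paste joins the two lines with a tab; read splits them back apart
  paste "$bins_file" "$names_file" | while IFS=$'\t' read -r bin sample; do
    echo bbmap.sh ref="$mag_dir/$bin" nodisk \
      in="$read_dir/${sample}_1.cleaned.fq.gz" \
      in2="$read_dir/${sample}_2.cleaned.fq.gz" \
      covstats="coverage_stats_SRR-SRR/${bin}.vs.SELF"
  done
}

# Demo with the first two entries of each list from the question
tmp=$(mktemp -d)
printf '%s\n' SRR4101185.maxbin.008.fasta SRR1633224.maxbin.021.fasta > "$tmp/bins"
printf '%s\n' SRR4101185 SRR1633224 > "$tmp/names"
pair_commands "$tmp/bins" "$tmp/names"
rm -r "$tmp"
```

On the real files this would be `pair_commands new-bins_list.txt new_names_metalist`. The trade-off is that it silently mispairs if one list is reordered or missing a line.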


I think the loop is working fine. Are you sure it is not? The following should print all the command lines. Check and verify them, then remove echo to run them.

for bin in $(cat new-bins_list.txt); do
  for read in $(cat new_names_metalist); do
    echo bbmap.sh ref=/storage/sequence_samples/Crump_lab_CoDL/ALL_MAGs/${bin} nodisk \
      in=/storage/sequence_samples/Crump_lab_CoDL/SRA_trim_out/${read}_1.cleaned.fq.gz \
      in2=/storage/sequence_samples/Crump_lab_CoDL/SRA_trim_out/${read}_2.cleaned.fq.gz \
      covstats=coverage_stats_SRR-SRR/${bin}.vs.SELF
  done
done

Here is an example of what I get

bbmap.sh ref=/storage/sequence_samples/Crump_lab_CoDL/ALL_MAGs/SRR4101185.maxbin.008.fasta nodisk in=/storage/sequence_samples/Crump_lab_CoDL/SRA_trim_out/SRR4101185_1.cleaned.fq.gz in2=/storage/sequence_samples/Crump_lab_CoDL/SRA_trim_out/SRR4101185_2.cleaned.fq.gz covstats=coverage_stats_SRR-SRR/SRR4101185.maxbin.008.fasta.vs.SELF
bbmap.sh ref=/storage/sequence_samples/Crump_lab_CoDL/ALL_MAGs/SRR4101185.maxbin.008.fasta nodisk in=/storage/sequence_samples/Crump_lab_CoDL/SRA_trim_out/SRR1995427_1.cleaned.fq.gz in2=/storage/sequence_samples/Crump_lab_CoDL/SRA_trim_out/SRR1995427_2.cleaned.fq.gz covstats=coverage_stats_SRR-SRR/SRR4101185.maxbin.008.fasta.vs.SELF
bbmap.sh ref=/storage/sequence_samples/Crump_lab_CoDL/ALL_MAGs/SRR1633224.maxbin.021.fasta nodisk in=/storage/sequence_samples/Crump_lab_CoDL/SRA_trim_out/SRR4101185_1.cleaned.fq.gz in2=/storage/sequence_samples/Crump_lab_CoDL/SRA_trim_out/SRR4101185_2.cleaned.fq.gz covstats=coverage_stats_SRR-SRR/SRR1633224.maxbin.021.fasta.vs.SELF
bbmap.sh ref=/storage/sequence_samples/Crump_lab_CoDL/ALL_MAGs/SRR1986369.maxbin.004.fasta nodisk in=/storage/sequence_samples/Crump_lab_CoDL/SRA_trim_out/SRR1995427_1.cleaned.fq.gz in2=/storage/sequence_samples/Crump_lab_CoDL/SRA_trim_out/SRR1995427_2.cleaned.fq.gz covstats=coverage_stats_SRR-SRR/SRR1986369.maxbin.004.fasta.vs.SELF
bbmap.sh ref=/storage/sequence_samples/Crump_lab_CoDL/ALL_MAGs/SRR1971621.maxbin.012.fasta nodisk in=/storage/sequence_samples/Crump_lab_CoDL/SRA_trim_out/SRR4101185_1.cleaned.fq.gz in2=/storage/sequence_samples/Crump_lab_CoDL/SRA_trim_out/SRR4101185_2.cleaned.fq.gz covstats=coverage_stats_SRR-SRR/SRR1971621.maxbin.012.fasta.vs.SELF
bbmap.sh ref=/storage/sequence_samples/Crump_lab_CoDL/ALL_MAGs/SRR2058405.maxbin.006.fasta nodisk in=/storage/sequence_samples/Crump_lab_CoDL/SRA_trim_out/SRR1636517_1.cleaned.fq.gz in2=/storage/sequence_samples/Crump_lab_CoDL/SRA_trim_out/SRR1636517_2.cleaned.fq.gz covstats=coverage_stats_SRR-SRR/SRR2058405.maxbin.006.fasta.vs.SELF
bbmap.sh ref=/storage/sequence_samples/Crump_lab_CoDL/ALL_MAGs/SRR1636517.maxbin.016.fasta nodisk in=/storage/sequence_samples/Crump_lab_CoDL/SRA_trim_out/SRR4101185_1.cleaned.fq.gz in2=/storage/sequence_samples/Crump_lab_CoDL/SRA_trim_out/SRR4101185_2.cleaned.fq.gz covstats=coverage_stats_SRR-SRR/SRR1636517.maxbin.016.fasta.vs.SELF
bbmap.sh ref=/storage/sequence_samples/Crump_lab_CoDL/ALL_MAGs/SRR4048936.maxbin.001.fasta nodisk in=/storage/sequence_samples/Crump_lab_CoDL/SRA_trim_out/SRR1995427_1.cleaned.fq.gz in2=/storage/sequence_samples/Crump_lab_CoDL/SRA_trim_out/SRR1995427_2.cleaned.fq.gz covstats=coverage_stats_SRR-SRR/SRR4048936.maxbin.001.fasta.vs.SELF
bbmap.sh ref=/storage/sequence_samples/Crump_lab_CoDL/ALL_MAGs/SRR4101185.maxbin.041.fasta nodisk in=/storage/sequence_samples/Crump_lab_CoDL/SRA_trim_out/SRR4101185_1.cleaned.fq.gz in2=/storage/sequence_samples/Crump_lab_CoDL/SRA_trim_out/SRR4101185_2.cleaned.fq.gz covstats=coverage_stats_SRR-SRR/SRR4101185.maxbin.041.fasta.vs.SELF
bbmap.sh ref=/storage/sequence_samples/Crump_lab_CoDL/ALL_MAGs/SRR1995427.maxbin.002.fasta nodisk in=/storage/sequence_samples/Crump_lab_CoDL/SRA_trim_out/SRR1995427_1.cleaned.fq.gz in2=/storage/sequence_samples/Crump_lab_CoDL/SRA_trim_out/SRR1995427_2.cleaned.fq.gz covstats=coverage_stats_SRR-SRR/SRR1995427.maxbin.002.fasta.vs.SELF
(Reply by genomax, 23 days ago)

I am not sure what I am doing incorrectly. The for loop works; I previously thought it was "stuck" on the first bin, SRR4101185.maxbin.008.fasta. It turns out, I think, that it runs each bin against every single ${read} file (about 300 of them), which is not what I want either! It just keeps overwriting the output, which is named after ${bin}. Ideally, I would like each bin to be recruited to its corresponding set of reads only once. This could be worked around if I could figure out how to only use the first text file, new-bins_list.txt and only use the first part of the file as the identifier for ${read}, but I am not adept enough with these tools to accomplish this. Dang computers doing exactly what I tell them to!

(Reply by mrsmith, 23 days ago)

Dang computers doing exactly what I tell them to!

People generally never accept that :-)

Someone may help in the meantime; otherwise, I will look at this again in a bit. Keep hacking. Use echo until you are sure the output looks right.

(Reply by genomax, 23 days ago)
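One way to follow the "use echo first" advice without editing `echo` in and out of the loop is a small wrapper that either prints or runs a command. This is a sketch only: `run` and `DRY_RUN` are hypothetical names chosen for this example, and the paths in the demo call are copied from the commands shown earlier.

```shell
#!/usr/bin/env bash
# Sketch: make the "echo first" advice switchable instead of editing the loop.
# DRY_RUN is a hypothetical variable name chosen for this example:
# unset or 1 prints the command, 0 executes it.
set -euo pipefail

run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "$@"   # show what would run
  else
    "$@"        # actually run it
  fi
}

# Prints the command unless the script is started with DRY_RUN=0
run bbmap.sh ref=/storage/sequence_samples/Crump_lab_CoDL/ALL_MAGs/SRR4101185.maxbin.008.fasta nodisk \
  in=/storage/sequence_samples/Crump_lab_CoDL/SRA_trim_out/SRR4101185_1.cleaned.fq.gz \
  in2=/storage/sequence_samples/Crump_lab_CoDL/SRA_trim_out/SRR4101185_2.cleaned.fq.gz \
  covstats=coverage_stats_SRR-SRR/SRR4101185.maxbin.008.fasta.vs.SELF
```

Defaulting to a dry run means a forgotten setting prints commands rather than launching hundreds of mapping jobs.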

Thanks for that suggestion. At least it made me realize what the real issue was!

(Reply by mrsmith, 23 days ago)
h.mon (Brazil) wrote (23 days ago):

This could be worked around if I could figure out how to only use the first text file, new-bins_list.txt and only use the first part of the file as the identifier

Something like

read=$(echo "$bin" | cut -f1 -d".")

should work. I am on my phone now and can't test it.
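Put together, the suggestion above drives the loop from new-bins_list.txt alone. The sketch below is one way to write that, assuming the directory layout from the question; the variable is named `read_id` instead of `read` only to keep it visually apart from the `read` builtin, and `echo` keeps it a dry run. Reading the list on file descriptor 3 is a precaution so that a program inside the loop cannot consume the list from standard input once `echo` is removed.

```shell
#!/usr/bin/env bash
# Sketch: drive everything from the bins list and derive the read-set ID from
# each bin name with cut. read_id plays the role of "read" in the suggestion.
# Dry run: remove "echo" to actually call bbmap.sh.
set -euo pipefail

mag_dir=/storage/sequence_samples/Crump_lab_CoDL/ALL_MAGs
read_dir=/storage/sequence_samples/Crump_lab_CoDL/SRA_trim_out

recruit_commands() {
  local bins_file=$1 bin read_id
  # read the list on fd 3 so commands in the loop cannot swallow it via stdin
  while IFS= read -r bin <&3; do
    # SRR4101185.maxbin.008.fasta -> SRR4101185 (text before the first ".")
    read_id=$(echo "$bin" | cut -f1 -d".")
    echo bbmap.sh ref="$mag_dir/$bin" nodisk \
      in="$read_dir/${read_id}_1.cleaned.fq.gz" \
      in2="$read_dir/${read_id}_2.cleaned.fq.gz" \
      covstats="coverage_stats_SRR-SRR/${bin}.vs.SELF"
  done 3< "$bins_file"
}

# Demo: two bins from the same sample share reads but get separate outputs
tmp=$(mktemp -d)
printf '%s\n' SRR4101185.maxbin.008.fasta SRR4101185.maxbin.041.fasta > "$tmp/bins"
recruit_commands "$tmp/bins"
rm -r "$tmp"
```

On the real list this would be `recruit_commands new-bins_list.txt`, giving one command per bin.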

There are several existing posts with similar questions:

How can I remove all text after a character in bash?

How to delete everything in a string after a specific character?

Find everything after first comma in lines and remove it
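Those threads cover the same task of removing everything after a character. Besides `cut`, bash can do it itself with parameter expansion, `${var%%.*}`, which removes the longest suffix matching `.*`. This is an alternative not mentioned in the thread; the sketch below (function names are hypothetical) shows that both give the same prefix for names like the ones in this question.

```shell
#!/usr/bin/env bash
# Sketch: two ways to drop everything from the first "." onward.
# cut runs in a pipeline (extra processes per call); ${var%%.*} is plain bash
# parameter expansion: remove the longest suffix matching ".*".
set -euo pipefail

prefix_cut()  { echo "$1" | cut -f1 -d"."; }
prefix_bash() { echo "${1%%.*}"; }

for bin in SRR4101185.maxbin.008.fasta SRR1995427.maxbin.002.fasta; do
  echo "$bin -> $(prefix_cut "$bin") (cut), $(prefix_bash "$bin") (expansion)"
done
```

For a few hundred bins either is fast enough; the expansion form just avoids starting extra processes inside the loop.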


That worked perfectly! Thank you both so much! I really appreciate all the help you give this community!

(Reply by mrsmith, 23 days ago)
Powered by Biostar version 2.3.0