Hello!
Today I am working on concatenating paired files. In a directory, I have 300 paired files. Here's an example of the file names:
flL1_5495_L1PA6_reactivating_recurring_12_2bit_c.bed.fa.revcom.fa
flL1_5495_L1PA6_reactivating_recurring_12_2bit_plus.bed.fa
flL1_5495_L1PA6_reactivating_recurring_13_2bit_c.bed.fa.revcom.fa
flL1_5495_L1PA6_reactivating_recurring_13_2bit_plus.bed.fa
flL1_5495_L1PA8A_reactivating_recurring_03_2bit_c.bed.fa.revcom.fa
flL1_5495_L1PA8A_reactivating_recurring_03_2bit_plus.bed.fa
flL1_5495_L1PA8A_reactivating_recurring_04_2bit_c.bed.fa.revcom.fa
flL1_5495_L1PA8A_reactivating_recurring_04_2bit_plus.bed.fa
The file names are identical except for a family name (L1PA6 and L1PA8A in this example, but there are a few more), a level (12, 13, 03, and 04 in this example, levels range from 03 to 38 ultimately), and whether they are "plus" or "revcom". There is a matching revcom file for each plus file which I would like to concatenate. I have been working with a nested bash loop like so:
#!/bin/bash
subfamilies=( \
L1HS L1PA2 L1PA3 L1PA4 L1PA5 \
L1PA6 L1PA7 L1PA8 L1PA8A L1PA10 )
recurrence=( \
03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 \
19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 \
35 36 37 38 )
for subfam in ${subfamilies[@]}; do
for level in ${recurrence[@]}; do
cat *${subfamilies[@]}_reactivating_recurring_${recurrence[@]}*.fa \
> ${subfam}${recurrence}.fa
done
done
I am trying to concatenate based on shared family names and levels, but my output is a mess: I get a lot of empty files. I'd also like the output names to be simpler, something like "L1PA6_03.fa".
Maybe there's a better way to do this?
Thanks in advance!
Don't hard code variables when you can use regex:
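A minimal sketch of that idea, assuming the naming scheme shown in the question. The function name concat_pairs and the regular expression are illustrative and not taken from the thread. Instead of listing families and levels by hand, it walks over the "plus" files that actually exist, pulls the family and level out of each name with a bash regex, and appends the matching revcom file:

```bash
#!/bin/bash
# Sketch only: concat_pairs is a hypothetical helper, and the pattern below is
# an assumption based on the example file names in the question.
concat_pairs() {
    local dir=${1:-.}
    # Capture group 1 = family (e.g. L1PA6), group 2 = level (e.g. 12).
    local re='_(L1[A-Za-z0-9]+)_reactivating_recurring_([0-9]+)_2bit_plus\.bed\.fa$'
    local plus revcom

    for plus in "$dir"/*_2bit_plus.bed.fa; do
        # Skip anything that does not follow the expected naming scheme
        # (this also covers an unmatched glob, which stays a literal string).
        [[ $plus =~ $re ]] || continue

        # The partner file shares the same prefix up to "_2bit".
        revcom=${plus%_plus.bed.fa}_c.bed.fa.revcom.fa

        if [[ -f $revcom ]]; then
            cat "$plus" "$revcom" > "$dir/${BASH_REMATCH[1]}_${BASH_REMATCH[2]}.fa"
        else
            echo "no revcom partner for $plus" >&2
        fi
    done
}
```

Calling concat_pairs with the directory that holds the files (or with no argument for the current directory) would write one output per pair, for example L1PA6_12.fa. Because only existing files are visited, no empty outputs are created. This assumes every file shares the same prefix; two files with different prefixes but the same family and level would overwrite each other.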
with parallel (in bash):
Thank you, I appreciate the education on variables/regex.
Your 'level' variable from the second loop is never used in your script; instead, you still use ${recurrence[@]}. I have not (yet) put much thought into it, but shouldn't you use $subfam and $recurrence (which should be $level, according to your variable names) instead of ${subfamilies[@]} and ${recurrence[@]}?
Moreover, for flL1_5495_L1PA6, for instance, you only have _12 and _13, so all the other numbers in your recurrence array will indeed give empty files.
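Putting those points together, a corrected version of the nested loop could look like the sketch below. It is an illustration under assumptions, not the poster's final script: the function name concat_by_loop is made up, the family list is copied from the question, and levels 03 to 38 are generated rather than typed out. It uses the loop variables $subfam and $level in the glob, and skips any family/level combination for which no input files exist.

```bash
#!/bin/bash
# Family names copied from the question.
subfamilies=(L1HS L1PA2 L1PA3 L1PA4 L1PA5 L1PA6 L1PA7 L1PA8 L1PA8A L1PA10)

# Sketch only: concat_by_loop is a hypothetical helper name.
concat_by_loop() {
    local dir=${1:-.}
    local subfam level i files

    for subfam in "${subfamilies[@]}"; do
        for ((i = 3; i <= 38; i++)); do
            # Zero-pad the level so 3 becomes 03, matching the file names.
            printf -v level '%02d' "$i"

            # Use the loop variables, not the whole arrays. The underscores
            # around the family keep L1PA8 from also matching L1PA8A.
            files=( "$dir"/*_"${subfam}"_reactivating_recurring_"${level}"_*.fa )

            # An unmatched glob stays a literal string, so test for existence
            # and skip the combination instead of writing an empty file.
            [[ -e ${files[0]} ]] || continue

            cat "${files[@]}" > "$dir/${subfam}_${level}.fa"
        done
    done
}
```

Run as concat_by_loop followed by the directory name. Within each output, the files appear in glob (alphabetical) order, so the revcom file comes before the plus file; swap the order explicitly if that matters downstream.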