How to move multiple files to multiple folders in linux/mac?
14 months ago

Hello, I have a folder containing *.fastq.gz files (R1 and R2) for many samples. For example, in the folder "Raw_WGS", I have several files like

Sample_1_R1.fastq.gz, Sample_1_R2.fastq.gz, Sample_2_R1.fastq.gz, Sample_2_R2.fastq.gz, Sample_3_R1.fastq.gz, Sample_3_R2.fastq.gz

I have another folder e.g. "Analysis" where I have different sub-folders according to the name of my sample sequences. I have sub-folders named like

Sample_1, Sample_2, Sample_3

I have a text file containing the names of all my samples, i.e. Sample_1, Sample_2, Sample_3. Now I want to move the *_R1.fastq.gz and *_R2.fastq.gz files for each sample into the matching sub-folder of the "Analysis" folder.

Can you please tell me how I can do that for all the samples at once? I can use the "mv" command to move one file at a time, but I have thousands of files, so I want to move them all by running a single script. Please let me know if you have any suggestions.

14 months ago

see if this works:

$ for i in *R1.fastq.gz; do echo mv "$i" "${i/R1/R2}" analysis/"${i/_R1.fastq.gz/}"/; done

Remove echo once you are happy with the dry run. With GNU parallel, try this:

$ parallel --plus --dry-run mv {} {=s/R1/R2/=} analysis/{=s/_R1.fastq.gz//=}/ ::: *R1.fastq.gz

Remove --dry-run once you are happy with the dry run.
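
For a variant driven by the sample list mentioned in the question, here is a minimal Python sketch. It assumes one sample name per line in the list file and reads named `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz`. The function name `move_reads` and its parameters are illustrative, not from the thread. Like the `echo` version above, it only prints the planned moves unless told otherwise.

```python
import shutil
from pathlib import Path


def move_reads(sample_list, raw_dir, analysis_dir, dry_run=True):
    """Move <sample>_R1/_R2.fastq.gz from raw_dir into analysis_dir/<sample>/.

    Returns the (source, destination) pairs that were, or would be, moved.
    """
    raw_dir, analysis_dir = Path(raw_dir), Path(analysis_dir)
    planned = []
    for line in Path(sample_list).read_text().splitlines():
        sample = line.strip()
        if not sample:  # skip blank lines in the sample list
            continue
        for read in ("R1", "R2"):
            src = raw_dir / f"{sample}_{read}.fastq.gz"
            if not src.exists():
                print(f"missing: {src}")
                continue
            dest = analysis_dir / sample / src.name
            planned.append((src, dest))
            if dry_run:
                print(f"mv {src} {dest}")
            else:
                # Create the sample sub-folder if it is not there yet.
                dest.parent.mkdir(parents=True, exist_ok=True)
                shutil.move(str(src), str(dest))
    return planned
```

Calling `move_reads("samples.txt", "Raw_WGS", "Analysis")` prints the planned `mv` lines; pass `dry_run=False` to actually move the files.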


Oh great. I will run it and see if it works. Thanks a lot.

14 months ago

Snakemake solution

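samples = [line.rstrip() for line in open('samples.txt')]

# Assumption: reads sit next to the Snakefile as {sample}_R1.fastq.gz and {sample}_R2.fastq.gz;
# the targets mirror the analysis/<sample>/ layout from the question.
analysis_target = expand('analysis/{sample}/{sample}_{read}.fastq.gz', sample=samples, read=['R1', 'R2'])

rule all:
    input: analysis_target

rule moveFiles:
    input: '{sample}_{read}.fastq.gz'
    output: 'analysis/{sample}/{sample}_{read}.fastq.gz'
    shell:
        """
        mkdir -p analysis/{wildcards.sample}
        mv {input} {output}
        """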


Thank you very much. That is awesome!


Since Snakemake automatically creates the directories of output files, mkdir -p analysis/{wildcards.sample} is not necessary. For that matter, the analysis folder itself does not need to exist beforehand: Snakemake creates it and the sample sub-folders from the output paths in the script.

Format of the samples.txt is:

sample_1
sample_2
sample_3


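Working script I used:

samples = [line.rstrip() for line in open('samples.txt')]

# Assumption: reads sit next to the Snakefile as {sample}_R1.fastq.gz and {sample}_R2.fastq.gz;
# the targets mirror the analysis/<sample>/ layout from the question.
analysis_target = expand('analysis/{sample}/{sample}_{read}.fastq.gz', sample=samples, read=['R1', 'R2'])

rule all:
    input: analysis_target

rule moveFiles:
    input: '{sample}_{read}.fastq.gz'
    output: 'analysis/{sample}/{sample}_{read}.fastq.gz'
    shell:  """
        mv {input} {output}
        """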


Instead of relying on samples.txt, we can use Snakemake's glob_wildcards function. The following is a working script (please run it on test data before executing it on the real files):

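# Assumption: reads sit next to the Snakefile as {sample}_R1.fastq.gz and {sample}_R2.fastq.gz;
# the targets mirror the analysis/<sample>/ layout from the question.
(samples, reads) = glob_wildcards("{sample}_{read}.fastq.gz")
samples = sorted(set(samples))
reads = sorted(set(reads))
exts = "fastq.gz"
analysis_target = expand("analysis/{sample}/{sample}_{read}.{ext}", sample=samples, read=reads, ext=[exts])

rule all:
    input: analysis_target

rule moveFiles:
    input: "{sample}_{read}." + exts
    output: "analysis/{sample}/{sample}_{read}." + exts
    shell:  """
        mv {input} {output}
        """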
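
To check which sample names a pattern like `{sample}_{read}.fastq.gz` would pick up before moving anything, the split can be approximated in plain Python. This is only a sketch, not Snakemake's own implementation: the regular expression and the function name `group_by_sample` are assumptions made for illustration, and the read part is restricted to `R1`/`R2`.

```python
import re
from collections import defaultdict

# Approximates "{sample}_{read}.fastq.gz", with the read limited to R1 or R2.
READ_FILE = re.compile(r"^(?P<sample>.+)_(?P<read>R[12])\.fastq\.gz$")


def group_by_sample(filenames):
    """Map each sample name to the sorted list of reads found for it."""
    groups = defaultdict(list)
    for name in filenames:
        match = READ_FILE.match(name)
        if match:  # ignore files that do not follow the naming scheme
            groups[match["sample"]].append(match["read"])
    return {sample: sorted(reads) for sample, reads in sorted(groups.items())}
```

Feeding it the output of `os.listdir(".")` gives a quick overview; a sample that lists only one read is worth checking before running the move.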


That is great. I have not tried snakemake yet. But it seems very useful for reproducible research. Thank you.