How to move multiple files to multiple folders in Linux/Mac?
3.1 years ago

Hello, I have a folder containing *.fastq.gz files (R1 and R2) for many samples. For example, in the folder "Raw_WGS", I have several files like

Sample_1_R1.fastq.gz, Sample_1_R2.fastq.gz, Sample_2_R1.fastq.gz, Sample_2_R2.fastq.gz, Sample_3_R1.fastq.gz, Sample_3_R2.fastq.gz

I have another folder e.g. "Analysis" where I have different sub-folders according to the name of my sample sequences. I have sub-folders named like

Sample_1, Sample_2, Sample_3

I have a text file containing the names of all my samples, i.e. Sample_1, Sample_2, Sample_3. Now I want to move the *_R1.fastq.gz and *_R2.fastq.gz files for each sample into the sub-folder of "Analysis" that has the same name as the sample.

Can you please tell me how I can do that for all the samples at once? I can use the "mv" command to move one file at a time, but I have thousands of files, so I want to move them all by running a single script. Please let me know if you have any suggestions.

SNP genome next-gen Assembly sequence • 2.0k views
3.1 years ago

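See if this works (run it from inside the folder that holds the fastq.gz files):

$ for i in *_R1.fastq.gz; do s=${i%_R1.fastq.gz}; echo mv "$i" "${s}_R2.fastq.gz" "analysis/$s/"; done

This only prints the mv commands. Remove "echo" once the dry run looks right, and adjust "analysis/" to the actual path and case of your folder (the question calls it "Analysis").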

With GNU parallel, try this:

$ parallel --plus --dry-run mv {} {=s/R1/R2/=} analysis/{=s/_R1.fastq.gz//=}/ ::: *R1.fastq.gz

Remove --dry-run once the printed commands look right.
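
If you would rather drive the move from the text file of sample names mentioned in the question, here is a minimal sketch. It assumes the file has one sample name per line and that every sample has files named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz. The function name move_samples, its arguments and the --run switch are made up for illustration. By default it only prints the mv commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move paired FASTQ files into per-sample sub-folders.
# Usage: move_samples SAMPLE_LIST SOURCE_DIR DEST_DIR [--run]
# Assumes SAMPLE_LIST holds one sample name per line.
# Without --run, the mv commands are only printed (dry run).
move_samples() {
    local list=$1 src=$2 dest=$3 mode=${4:-}
    local sample mate file
    # "|| [ -n ... ]" also handles a last line without a trailing newline
    while IFS= read -r sample || [ -n "$sample" ]; do
        sample=${sample%$'\r'}             # tolerate Windows line endings
        [ -z "$sample" ] && continue       # skip blank lines
        for mate in R1 R2; do
            file="$src/${sample}_${mate}.fastq.gz"
            if [ ! -e "$file" ]; then
                echo "missing: $file" >&2  # report and move on
                continue
            fi
            if [ "$mode" = "--run" ]; then
                # create the sub-folder if needed, then move the file
                mkdir -p "$dest/$sample" && mv -- "$file" "$dest/$sample/"
            else
                echo mv "$file" "$dest/$sample/"
            fi
        done
    done < "$list"
}
```

For example, after pasting the function into your shell, "move_samples samples.txt Raw_WGS Analysis" prints what would be moved, and adding --run as a fourth argument performs the move. Here samples.txt stands for whatever your list file is called.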


Oh great. I will run it and see if it works. Thanks a lot.

3.1 years ago

Snakemake solution

samples = [line.rstrip() for line in open('samples.txt')]

analysis_target = expand("analysis/{sample}/{sample}_{read}.fastq.gz",sample=samples,read=['R1','R2'])

rule all:
    input: analysis_target

rule moveFiles:
    input: "{sample}_{read}.fastq.gz"
    output: "analysis/{sample}/{sample}_{read}.fastq.gz"
    shell:
        """
        mkdir -p analysis/{wildcards.sample}
        mv {input} {output}
        """

Thank you very much. That is awesome!


Since Snakemake automatically creates the directories of a rule's output files, mkdir -p analysis/{wildcards.sample} is not necessary. For that matter, the analysis folder itself does not need to exist beforehand: Snakemake creates it and the sub-folders below it from the output paths in the script.

Format of samples.txt (one sample name per line, matching the file-name prefix exactly):

sample_1
sample_2
sample_3

Working script I used:

samples = [line.rstrip() for line in open('samples.txt')]

analysis_target = expand("analysis/{sample}/{sample}_{read}.fastq.gz",sample=samples,read=['R1','R2'])
rule all:
        input: analysis_target

rule moveFiles:
        input: "{sample}_{read}.fastq.gz"
        output: "analysis/{sample}/{sample}_{read}.fastq.gz"
        shell:  """
                mv {input} {output}
                """

Instead of relying on samples.txt, we can use Snakemake's glob_wildcards function to collect the sample names from the file names. The following script works for me (please try it on test data before running it on the real files):

(samples,reads) = glob_wildcards("{sample}_{read}.fastq.gz")
samples=sorted(set(samples))
reads=sorted(set(reads))
exts="fastq.gz"
analysis_target = expand("analysis/{sample}/{sample}_{read}.{ext}",sample=samples,read=reads,ext=exts)
rule all:
        input: analysis_target

rule moveFiles:
        input: "{sample}_{read}.{ext}"
        output: "analysis/{sample}/{sample}_{read}.{ext}"
        shell:  """
                mv {input} {output}
                """

That is great. I have not tried snakemake yet. But it seems very useful for reproducible research. Thank you.
