Question: changing the name of files
2
12 days ago by
Sam70
Sam70 wrote:

Dear All

I have about 200 libraries named in the format ALT1_1_clean.fq.gz, but I need to change the names so that a pipeline recognizes them. Could you guide me on how to do this?

Thanks

    "ALT1_1_clean.fq.gz" change to "ALT_1.R1.fq.gz"
    "ALT1_2_clean.fq.gz" change to "ALT_1.R2.fq.gz"
    "ALT2_1_clean.fq.gz" change to "ALT_2.R1.fq.gz"
    "ALT2_2_clean.fq.gz" change to "ALT_2.R2.fq.gz"
    .
    .
    .
awk bash • 311 views
ADD COMMENTlink modified 11 days ago by shenwei3563.6k • written 12 days ago by Sam70
4
12 days ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum107k wrote:
ls *_clean.fq.gz | while read -r F; do mv "$F" "$( echo "${F}" | sed 's/_\([12]\)_clean\.fq\.gz/.R\1.fq.gz/;s/ALT/ALT_/')" ; done
ADD COMMENTlink written 12 days ago by Pierre Lindenbaum107k
4
12 days ago by
igor6.0k
United States
igor6.0k wrote:

The easiest and most readable option (in my opinion):

rename ALT ALT_ *.fq.gz
rename _1_clean .R1 *.fq.gz
rename _2_clean .R2 *.fq.gz

Unfortunately, the rename utility may not be available on all systems.

ADD COMMENTlink written 12 days ago by igor6.0k
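On systems where no rename utility is installed, the same mapping can be done with shell parameter expansion and mv alone. What follows is a minimal sketch, not taken from any of the answers: the function names new_name and rename_all and the DRY_RUN variable are made up for illustration, and it assumes every file looks like <letters><number>_<1 or 2>_clean.fq.gz.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the new name for one file,
# e.g. ALT1_2_clean.fq.gz -> ALT_1.R2.fq.gz.
# Returns 1 (and prints nothing) if the name does not fit the assumed pattern.
new_name() {
    local base read_no sample prefix number
    base=${1%_clean.fq.gz}            # ALT1_2
    [ "$base" != "$1" ] || return 1   # suffix was not present
    read_no=${base##*_}               # 2
    sample=${base%_*}                 # ALT1
    prefix=${sample%%[0-9]*}          # ALT
    number=${sample#"$prefix"}        # 1
    case $read_no in 1|2) ;; *) return 1 ;; esac
    case $number in ''|*[!0-9]*) return 1 ;; esac
    [ -n "$prefix" ] || return 1
    printf '%s_%s.R%s.fq.gz\n' "$prefix" "$number" "$read_no"
}

# Hypothetical driver: rename every *_clean.fq.gz in the current directory.
# DRY_RUN defaults to 1, which only prints what would be done.
rename_all() {
    local f target
    for f in *_clean.fq.gz; do
        [ -e "$f" ] || continue       # glob matched nothing
        target=$(new_name "$f") || { echo "skip: $f" >&2; continue; }
        if [ "${DRY_RUN:-1}" = 1 ]; then
            printf 'would rename: %s -> %s\n' "$f" "$target"
        else
            mv -n -- "$f" "$target"   # -n: never overwrite an existing file
        fi
    done
}
```

Calling rename_all as is only prints the planned renames; DRY_RUN=0 rename_all performs them. Files that do not fit the assumed pattern are reported on stderr and left alone.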
3
12 days ago by
Eric Lim540
Stoke Therapeutics, Inc
Eric Lim540 wrote:

There are countless ways to do this in bash, but I always prefer to write simple rules in Snakemake.

# mvfq.py
# Default target: every renamed file we expect, in the OP's format (e.g. ALT_1.R1.fq.gz).
rule all:
    input: expand('{sample}.{read}.fq.gz', sample=['ALT_1', 'ALT_2'], read=['R1', 'R2'])

rule move_fqs:
    output: mvto = '{sample}.{read}.fq.gz'
    run:
        # Rebuild the original name: sample 'ALT_1' + read 'R1' -> 'ALT1_1_clean.fq.gz'
        mvfrom = '_'.join([wildcards.sample.replace('_',''), wildcards.read.replace('R',''), 'clean.fq.gz'])
        shell('mv {mvfrom} {output.mvto}')

I can do a dry run

snakemake -s mvfq.py --dryrun

or run a specific target to make sure everything is working

snakemake -s mvfq.py ALT_1.R1.fq.gz

or run it all on my laptop

snakemake -s mvfq.py

or run it using 4 cores

snakemake -s mvfq.py -j4

or in a cluster via qsub with 100 independent jobs

snakemake -s mvfq.py -j100 -c "qsub"

or using remote files at S3 (or dropbox, google drive, etc) in a cluster

snakemake -s mvfq.py -j100 -c "qsub" --default-remote-provider S3 --default-remote-prefix s3/location/

or I can restart from the last point of failure, and much more.

All without changing the underlying code.

ADD COMMENTlink modified 12 days ago • written 12 days ago by Eric Lim540
3
12 days ago by
h.mon15k
Brazil
h.mon15k wrote:

Honestly, change the source code of the pipeline. If this is not possible, here is a one-liner rename (which, as igor noted, may not be available or installed on some systems):

rename 's/(\d)_(\d)_clean\.fq\.gz/_$1.R$2.fq.gz/' *.gz

Note the single quotes ': if you use double quotes ", the captures will not work. As batch renaming can have catastrophic consequences, I suggest you first perform a dry run with -n, check that everything is good to go, and then do the actual renaming by omitting -n.

ADD COMMENTlink written 12 days ago by h.mon15k
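The point about quotes can be checked without touching any files. The sketch below only prints the expression that the rename tool would receive in each case; the function name show_quoting and the arguments FIRST and SECOND are made up for the demonstration. Inside double quotes the shell substitutes its own $1 and $2 before rename ever sees them, so the capture references are lost.

```shell
#!/usr/bin/env bash
# Hypothetical demo: inside a function, $1 and $2 are the function's arguments.
show_quoting() {
    # Single quotes: the text is passed through untouched.
    printf 'single: %s\n' 's/(\d)_(\d)_clean/_$1.R$2/'
    # Double quotes: the shell expands $1 and $2 itself.
    printf 'double: %s\n' "s/(\d)_(\d)_clean/_$1.R$2/"
}

show_quoting FIRST SECOND
# single: s/(\d)_(\d)_clean/_$1.R$2/
# double: s/(\d)_(\d)_clean/_FIRST.RSECOND/
```

In an interactive shell $1 and $2 are usually unset, so the double-quoted form would typically reach rename as s/(\d)_(\d)_clean/_.R/.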
1

And to make things even more complicated, the rename tool linked by igor in another answer is not the same as the rename tool in this answer, which is available at https://metacpan.org/release/File-Rename, and in the rename package on Debian and related systems.

ADD REPLYlink written 12 days ago by Charles Plessy2.5k

Indeed, good point, which I overlooked. There are several different tools called rename around: this one is a Perl script, while the other one is a binary executable, which on Debian and its relatives is called rename.ul.

That is a lot of answers for a "how to rename files" question...

ADD REPLYlink written 12 days ago by h.mon15k

I guess the code can be shortened further, and its function extended to multi-digit numbers, with:

$ rename -n 's/(\d+)_(\d+)_clean/_$1.R$2/' *.gz
ADD REPLYlink modified 11 days ago • written 12 days ago by cpad01125.3k

To further complicate things, I don't think every rename has the -n flag. Mine (from util-linux-ng) does not.

ADD REPLYlink written 11 days ago by igor6.0k
2
12 days ago by
cpad01125.3k
cpad01125.3k wrote:

Assuming that all the files follow the same pattern (especially the digit_digit part):

$  parallel cp {} '{= s:([0-9]+)_([0-9]+)_clean:_$1\.R$2: =}' ::: *.gz
ADD COMMENTlink modified 12 days ago • written 12 days ago by cpad01125.3k
1
11 days ago by
shenwei3563.6k
China
shenwei3563.6k wrote:

---- corrected answer----

Try brename, a practical cross-platform command-line tool for safely batch renaming files/directories via regular expression.

$ brename -p "(\d+)_(\d+)_clean" -r "_\$1.R\$2"
[INFO] checking: [ ok ] 'ALT1_1_clean.fq.gz' -> 'ALT_1.R1.fq.gz'
[INFO] checking: [ ok ] 'ALT1_2_clean.fq.gz' -> 'ALT_1.R2.fq.gz'
[INFO] checking: [ ok ] 'ALT2_1_clean.fq.gz' -> 'ALT_2.R1.fq.gz'
[INFO] checking: [ ok ] 'ALT2_2_clean.fq.gz' -> 'ALT_2.R2.fq.gz'
[INFO] 4 path(s) to be renamed
[INFO] renamed: 'ALT1_1_clean.fq.gz' -> 'ALT_1.R1.fq.gz'
[INFO] renamed: 'ALT1_2_clean.fq.gz' -> 'ALT_1.R2.fq.gz'
[INFO] renamed: 'ALT2_1_clean.fq.gz' -> 'ALT_2.R1.fq.gz'
[INFO] renamed: 'ALT2_2_clean.fq.gz' -> 'ALT_2.R2.fq.gz'
[INFO] 4 path(s) renamed
ADD COMMENTlink modified 11 days ago • written 11 days ago by shenwei3563.6k

That is not quite what OP wanted.

ADD REPLYlink written 11 days ago by genomax48k

Sorry for my carelessness, it's fixed.

ADD REPLYlink written 11 days ago by shenwei3563.6k

No worries. Your software is always comprehensive. Nice that you have a sanity check built in before the changes are made. I assume the software will stop if a check fails?

ADD REPLYlink modified 11 days ago • written 11 days ago by genomax48k

Right, it detects potential conflicts (overwriting existing paths and overwriting newly renamed paths) and errors (blank target).

ADD REPLYlink written 11 days ago by shenwei3563.6k
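The three checks mentioned here (target already exists, two files mapped to the same target, blank target) can also be approximated in plain shell before running any mv. This is only a rough sketch of the idea, not how brename implements it; the function name check_plan is hypothetical, and it assumes file names do not contain the | character.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check.
# Usage: check_plan old1 new1 old2 new2 ...
# Prints every problem to stderr and returns 1 if at least one was found.
check_plan() {
    local status=0 seen="" old new
    while [ "$#" -ge 2 ]; do
        old=$1
        new=$2
        shift 2
        if [ -z "$new" ]; then
            echo "error: blank target for $old" >&2
            status=1
        elif [ -e "$new" ]; then
            echo "conflict: $new already exists" >&2
            status=1
        else
            # Has an earlier pair already claimed this target?
            case "$seen" in
                *"|$new|"*)
                    echo "conflict: $new is produced more than once" >&2
                    status=1
                    ;;
            esac
        fi
        seen="$seen|$new|"
    done
    return "$status"
}
```

For example, check_plan ALT1_1_clean.fq.gz ALT_1.R1.fq.gz ALT1_2_clean.fq.gz ALT_1.R2.fq.gz && echo "safe to rename" prints the message only when none of the three problems is present.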