Question: changing the name of files
Sam wrote (6 months ago):

Dear All

I have about 200 libraries with the naming format ALT1_1_clean.fq.gz, but I have to change the naming format so that the pipeline recognizes them. Could you guide me on this?

Thanks

    "ALT1_1_clean.fq.gz" change to "ALT_1.R1.fq.gz"
    "ALT1_2_clean.fq.gz" change to "ALT_1.R2.fq.gz"
    "ALT2_1_clean.fq.gz" change to "ALT_2.R1.fq.gz"
    "ALT2_2_clean.fq.gz" change to "ALT_2.R2.fq.gz"
    .
    .
    .

Pierre Lindenbaum wrote (6 months ago):
for F in *_clean.fq.gz; do mv "$F" "$(echo "$F" | sed 's/_\([12]\)_clean\.fq\.gz$/.R\1.fq.gz/;s/^ALT/ALT_/')"; done
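
The same idea can be sketched as a small reusable script that needs only sh and sed. This is a minimal sketch and not part of the answer above: new_name and rename_clean_fastqs are hypothetical helper names, and it assumes every file name ends in <digits>_<1 or 2>_clean.fq.gz. Passing -n only prints the mv commands instead of running them.

```shell
# Hypothetical helper: print the new name for one file,
# or return 1 if the name does not end in <digits>_<1|2>_clean.fq.gz.
new_name() {
    case $1 in
        *[0-9]_[12]_clean.fq.gz) ;;
        *) return 1 ;;
    esac
    # ALT1_1_clean.fq.gz -> ALT_1.R1.fq.gz
    printf '%s\n' "$1" |
        sed 's/\([0-9][0-9]*\)_\([12]\)_clean\.fq\.gz$/_\1.R\2.fq.gz/'
}

# Hypothetical helper: rename all matching files in a directory.
# Usage: rename_clean_fastqs [-n] [dir]   (-n = print only, change nothing)
rename_clean_fastqs() {
    dry_run=0
    if [ "${1:-}" = "-n" ]; then
        dry_run=1
        shift
    fi
    dir=${1:-.}
    for f in "$dir"/*_clean.fq.gz; do
        [ -e "$f" ] || continue              # glob matched nothing
        target=$(new_name "$f") || continue  # skip names that do not fit
        if [ "$dry_run" = 1 ]; then
            printf 'mv %s %s\n' "$f" "$target"
        else
            mv -- "$f" "$target"
        fi
    done
}
```

For example, rename_clean_fastqs -n . previews the renames in the current directory, and rename_clean_fastqs . performs them. The sketch does not check whether a target already exists.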

igor wrote (6 months ago):

The easiest and most readable option (in my opinion):

rename ALT ALT_ *.fq.gz
rename _1_clean .R1 *.fq.gz
rename _2_clean .R2 *.fq.gz

Unfortunately, the rename utility may not be available on all systems.

Eric Lim wrote (6 months ago):

There are countless ways to accomplish such a bash operation, but I always prefer to write simple rules in snakemake.

# mvfq.py
# Default target: the renamed files the pipeline expects.
rule:
    input: expand('{samples}.{reads}.fq.gz', samples=['ALT_1', 'ALT_2'], reads=['R1', 'R2'])

# Build e.g. ALT_1.R1.fq.gz by moving ALT1_1_clean.fq.gz.
rule move_fqs:
    output: mvto = '{sample}.{read}.fq.gz'
    run:
        # 'ALT_1' -> 'ALT1' and 'R1' -> '1' give back the original file name.
        mvfrom = '_'.join([wildcards.sample.replace('_',''), wildcards.read.replace('R',''), 'clean.fq.gz'])
        shell('mv {mvfrom} {output.mvto}')

I can dryrun it

snakemake -s mvfq.py --dryrun

or run a specific target to make sure everything is working

snakemake -s mvfq.py ALT_1.R1.fq.gz

or run it all on my laptop

snakemake -s mvfq.py

or run it using 4 cores

snakemake -s mvfq.py -j4

or in a cluster via qsub with 100 independent jobs

snakemake -s mvfq.py -j100 --cluster "qsub"

or using remote files at S3 (or dropbox, google drive, etc) in a cluster

snakemake -s mvfq.py -j100 --cluster "qsub" --default-remote-provider S3 --default-remote-prefix s3/location/

or I can restart from the last point of failure, and much more.

All without changing the underlying code.

h.mon wrote (6 months ago):

Honestly, change the source code of the pipeline. If this is not possible, here is a one-liner rename (which, as igor noted, may not be available or installed on some systems):

rename 's/(\d)_(\d)_clean.fq.gz/_$1.R$2.fq.gz/' *.gz

Note the single quotes ': if you use double quotes ", the capture will not work. As batch renaming can have catastrophic consequences, I suggest you first perform a dry run with -n, check that everything is good to go, then proceed with the renaming by dropping -n.


And to make things even more complicated, the rename tool linked by igor in another answer is not the same as the rename tool in this answer, which is available at https://metacpan.org/release/File-Rename, and in the rename package on Debian and related systems.

Reply written 6 months ago by Charles Plessy

Indeed, good point, which I overlooked. There are renames and renames around: this one is a Perl script, while the other one is a binary executable that Debian and its relatives call rename.ul.

That is a lot of answers for a "how to rename files" question...

Reply written 6 months ago by h.mon

I guess this can be further shortened (code) and extended (function) by:

$ rename -n 's/(\d+)_(\d+)_clean/_$1.R$2/' *.gz
Reply written 6 months ago by cpad0112

To further complicate things, I don't think every rename has the -n flag. Mine (from util-linux-ng) does not.

Reply written 6 months ago by igor
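
When the installed rename has no -n flag, one way to preview a command is to run it on empty placeholder files in a scratch directory. This is a minimal sketch under stated assumptions: preview_rename is a hypothetical helper name, mktemp is assumed to be available, only *.gz names are copied, and the command must use relative paths so that it touches nothing outside the scratch directory.

```shell
# Hypothetical helper: run a renaming command on empty copies of the
# file names and list the result; the real files are never touched.
# Usage: preview_rename <dir> <command> [args...]
preview_rename() {
    src=$1
    shift
    sandbox=$(mktemp -d)
    for f in "$src"/*.gz; do
        [ -e "$f" ] || continue
        : > "$sandbox/$(basename "$f")"   # empty placeholder, same name
    done
    # Run the command inside the sandbox so globs expand there.
    ( cd "$sandbox" && "$@" )
    ls "$sandbox"
    rm -rf "$sandbox"
}
```

A call could look like preview_rename . sh -c 'rename ALT ALT_ *.fq.gz', where the command is wrapped in sh -c so that the glob is expanded inside the scratch directory rather than in the real one.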

cpad0112 wrote (6 months ago):

Assuming that the files follow the same pattern (especially the digit_digit pattern):

$  parallel cp {} '{= s:([0-9]+)_([0-9]+)_clean:_$1\.R$2: =}' ::: *.gz

shenwei356 wrote (6 months ago):

---- corrected answer----

Try brename, a practical cross-platform command-line tool for safely batch renaming files/directories via regular expression.

$ brename -p "(\d+)_(\d+)_clean" -r "_\$1.R\$2"
[INFO] checking: [ ok ] 'ALT1_1_clean.fq.gz' -> 'ALT_1.R1.fq.gz'
[INFO] checking: [ ok ] 'ALT1_2_clean.fq.gz' -> 'ALT_1.R2.fq.gz'
[INFO] checking: [ ok ] 'ALT2_1_clean.fq.gz' -> 'ALT_2.R1.fq.gz'
[INFO] checking: [ ok ] 'ALT2_2_clean.fq.gz' -> 'ALT_2.R2.fq.gz'
[INFO] 4 path(s) to be renamed
[INFO] renamed: 'ALT1_1_clean.fq.gz' -> 'ALT_1.R1.fq.gz'
[INFO] renamed: 'ALT1_2_clean.fq.gz' -> 'ALT_1.R2.fq.gz'
[INFO] renamed: 'ALT2_1_clean.fq.gz' -> 'ALT_2.R1.fq.gz'
[INFO] renamed: 'ALT2_2_clean.fq.gz' -> 'ALT_2.R2.fq.gz'
[INFO] 4 path(s) renamed

That is not quite what OP wanted.

Reply written 6 months ago by genomax

Sorry for my carelessness, it's fixed.

Reply written 6 months ago by shenwei356

No worries. Your software is always comprehensive. Nice that you have a sanity check built in before the changes are made. I assume the software will stop if a test fails?

Reply written 6 months ago by genomax

Right, it detects potential conflicts (overwriting existing paths and overwriting newly renamed paths) and errors (blank targets).

Reply written 6 months ago by shenwei356
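
The three checks described above can be illustrated in plain sh for any renaming approach. This is only a sketch of the idea and not how brename is implemented: check_rename_plan is a hypothetical helper that reads "old<TAB>new" pairs on stdin, and it assumes mktemp is available and that file names contain no tabs or newlines.

```shell
# Hypothetical checker: returns non-zero if any target is blank,
# is produced by two different files, or already exists on disk.
check_rename_plan() {
    status=0
    tab=$(printf '\t')
    seen=$(mktemp)            # targets seen so far, one per line
    while IFS=$tab read -r old new; do
        if [ -z "$new" ]; then
            echo "blank target for '$old'" >&2
            status=1
        elif grep -Fqx -- "$new" "$seen"; then
            echo "more than one file would become '$new'" >&2
            status=1
        elif [ -e "$new" ]; then
            echo "'$new' already exists" >&2
            status=1
        fi
        printf '%s\n' "$new" >> "$seen"
    done
    rm -f "$seen"
    return "$status"
}
```

A plan can be piped in before anything is moved, for example printf 'ALT1_1_clean.fq.gz\tALT_1.R1.fq.gz\n' | check_rename_plan, and the actual mv loop run only if the check succeeds.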