Question: Batch rename *fastq.gz files using regular expression
Leonardo Normando (Brazil) wrote, 23 days ago:

I'm trying to get a regex to work with rename; I've tried the approach of similar answered questions here but couldn't get the results I wanted.

The files are named as such:

SR1_S90_L001_R1_001.fastq.gz 
SR1_S90_L001_R2_001.fastq.gz
Rinc_S96_L001_R1_001.fastq.gz 
Rinc_S96_L001_R2_001.fastq.gz

And I would like to retain only the information prior to the first underscore and the _R1_ or _R2_ tags, like this:

SR1_R1_.fastq.gz
SR1_R2_.fastq.gz
Rinc_R1_.fastq.gz 
Rinc_R2_.fastq.gz

Thanks in advance!

regex fastq perl rename • 191 views
shenwei356 (China) wrote, 23 days ago:

Try brename, a safe batch-rename tool (https://github.com/shenwei356/brename):

brename -p '^(\w+?)_.+_(R[12])_.+' -r '${1}_$2.fq.gz'    # updated

# original answer
# brename -p '^(\w+)_.+_(R[12])_.+' -r '${1}_$2.fq.gz'
# if you have already run this, you can run 'brename -u' to undo it.

Almost there!

  • The first group was including the second tag in the filename (e.g. _S90_), hence the addition of the second "_.+".
  • I changed the expression so that the second group also captures the underscores around R[12].

The command with the final changes:

brename -p '^(\w+)_.+_.+(_R[12]_).+' -r '${1}$2.fastq.gz' -d

  • I included -d for the dry-run tests ;)

Thanks a bunch and congratulations on your software, Wei Shen

Reply by Leonardo Normando, written 23 days ago
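
For anyone wondering why the first group swallowed _S90: \w also matches the underscore, so a greedy \w+ gives back only as much as it has to. Here is a minimal sketch of the three patterns from this thread, using Python's re module rather than brename itself. The \1 replacement syntax (instead of ${1}) and the helper name preview are assumptions for illustration, and brename's own regex engine may differ in other details.

```python
import re

# File names taken from the question.
names = [
    "SR1_S90_L001_R1_001.fastq.gz",
    "SR1_S90_L001_R2_001.fastq.gz",
    "Rinc_S96_L001_R1_001.fastq.gz",
    "Rinc_S96_L001_R2_001.fastq.gz",
]


def preview(pattern, replacement, filenames):
    """Return the would-be new names without touching any files."""
    return [re.sub(pattern, replacement, name) for name in filenames]


# Greedy \w+ (original answer): group 1 also takes the second tag.
greedy = preview(r"^(\w+)_.+_(R[12])_.+", r"\1_\2.fq.gz", names)

# Lazy \w+? (updated answer): group 1 stops at the first underscore.
lazy = preview(r"^(\w+?)_.+_(R[12])_.+", r"\1_\2.fq.gz", names)

# Extra "_.+" and underscores inside group 2 (pattern from the reply above).
reply = preview(r"^(\w+)_.+_.+(_R[12]_).+", r"\1\2.fastq.gz", names)

for old, g_name, l_name, r_name in zip(names, greedy, lazy, reply):
    print(old, g_name, l_name, r_name, sep="\t")
```

Only the last pattern keeps both the .fastq.gz extension and the trailing underscore asked for in the question.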

Thanks for pointing that out. If you have already run the old command, you can run 'brename -u' to undo it.

Reply by shenwei356, written 22 days ago

Yeah! I saw that parameter after running the script and was amazed to see that option (I couldn't test it since I had already deleted the folder XD).

Thanks also for the seqkit software, Shen Wei!

Reply by Leonardo Normando, written 22 days ago, modified 22 days ago
st.ph.n (Philadelphia, PA) wrote, 23 days ago:

Quick Python solution:

#!/usr/bin/env python
import os, glob

for file in glob.glob("*.fastq.gz"):
    # test with print statement
    print file, '\t', file.split('_')[0] + '_' + file.split('_')[3] +  '_.fastq.gz'
    # uncomment to rename
    # os.rename(file, file.split('_')[0] + '_' + file.split('_')[3] +  '_.fastq.gz')

Save as rename_fastq.py; run as python rename_fastq.py in the directory containing fastq.gz files.

Not sure why you want to keep '_' after the R*


Hello!

I want to keep the '_' after the R* just to keep my sanity while running other scripts (which check for the pattern _R*_).

I've got a syntax error while running your script:

    import os, glob for file in glob.glob("/*.fastq.gz"):
                      ^
SyntaxError: invalid syntax

I've tried replacing the double quotes with single ones, but to no avail.

Reply by Leonardo Normando, written 23 days ago

The for statement should be on a new line after the import statement. It looks like the script was not copied and pasted correctly. I commented out the actual renaming part so you can test first and review the lines that are printed.

Reply by st.ph.n, written 23 days ago, modified 23 days ago

When running on:

python --version
Python 3.6.5 :: Anaconda, Inc.

I've got:

  File "rename_fastq.py", line 6
    print file, '\t', file.split('_')[0] + '_' + file.split('_')[3] + '_.fastq.gz'
             ^
SyntaxError: invalid syntax

But using a Python 2.7.15 environment, the script runs perfectly and as intended :D Thanks for your time!

Reply by Leonardo Normando, written 23 days ago

Yes, I'm still writing 2.7 syntax.

Reply by st.ph.n, written 23 days ago
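
For Python 3, print has to be called as a function. Below is a minimal sketch of the same split-based approach, assuming the underscore-separated naming scheme from the question. The function names new_name and rename_all and the dry_run flag are made up for illustration and are not part of the original script.

```python
#!/usr/bin/env python3
import glob
import os


def new_name(filename):
    """Build <sample>_<read>_.fastq.gz from fields 0 and 3 of the old name.

    Returns None for names with fewer than four underscore-separated
    fields (for example, files that were already renamed).
    """
    parts = filename.split("_")
    if len(parts) < 4:
        return None
    return parts[0] + "_" + parts[3] + "_.fastq.gz"


def rename_all(pattern="*.fastq.gz", dry_run=True):
    """Print old and new names; rename only when dry_run is False."""
    for old in sorted(glob.glob(pattern)):
        new = new_name(old)
        if new is None:
            continue
        print(old, new, sep="\t")
        if not dry_run:
            os.rename(old, new)


if __name__ == "__main__":
    # Review the printed pairs first, then switch to dry_run=False.
    rename_all(dry_run=True)
```

As in the original, the script is meant to be run from the directory containing the fastq.gz files.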
cpad0112 (India) wrote, 23 days ago:

rename -n 's/(\w_).*_(R[0-9])_.*(.fastq.gz)/$1$2$3/' *.fastq.gz

or

rename -n 's/(\w+_)\w+_\w+_(\w._)\w+(.\w+)/$1$2$3/' *.fastq.gz

-n runs the command in dry-run mode, and its availability is distro-specific. Check which options rename supports on your distro. The -n option is available on Ubuntu 18.04; remove -n for the final conversion.


Thanks!

It works as intended! I just modified it to include the underscore after the R[0-9] part (and changed the range to [1-2]):

rename -n 's/(\w_).*_(R[1-2]_).*(.fastq.gz)/$1$2$3/' *.fastq.gz
Reply by Leonardo Normando, written 23 days ago
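
Whichever tool does the actual renaming, it can help to check that no two files map to the same new name before dropping -n. Here is a rough Python sketch. It assumes an anchored variant of the regex above with escaped dots, which is a tightening for illustration and not the pattern tested in this thread. plan_renames is a hypothetical helper.

```python
import re
from collections import Counter

# Anchored variant with escaped dots; an assumption, not the pattern
# that was tested in the thread.
PATTERN = re.compile(r"^([^_]+_).*_(R[12]_).*(\.fastq\.gz)$")


def plan_renames(filenames):
    """Map old name -> new name; raise ValueError if two targets collide."""
    plan = {}
    for name in filenames:
        new = PATTERN.sub(r"\1\2\3", name)
        if new != name:  # names that do not match are left alone
            plan[name] = new
    counts = Counter(plan.values())
    clashes = sorted(target for target, n in counts.items() if n > 1)
    if clashes:
        raise ValueError("colliding targets: " + ", ".join(clashes))
    return plan


plan = plan_renames([
    "SR1_S90_L001_R1_001.fastq.gz",
    "SR1_S90_L001_R2_001.fastq.gz",
    "Rinc_S96_L001_R1_001.fastq.gz",
    "Rinc_S96_L001_R2_001.fastq.gz",
])
for old, new in plan.items():
    print(old, "->", new)
```

A collision would occur, for example, if the same sample had been sequenced on two lanes (L001 and L002), since the lane field is dropped from the new name.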