How to remove [ACTG] from file names in bash
22 months ago

Hello, I have several files whose names follow this pattern:

VIR3A_CCGCGGTT-CTAGCGCT_L00M_R1_001.fastq.gz

VIR3Q_TAATACAG-GTGAATAT_L00M_R2_001.fastq.gz

VIR4J_CGTTAGAA-GACCTGAA_L00M_R1_001.fastq.gz

I want to cut out the barcode part made only of A, C, G and T (for example _CCGCGGTT-CTAGCGCT). This is the desired output:

VIR3A_L00M_R1_001.fastq.gz

VIR3Q_L00M_R2_001.fastq.gz

VIR4J_L00M_R1_001.fastq.gz

How can I do that?

Thanks for your time :)


Why is it an issue?


I'm trying to rename the files to the desired names above. There are about 40 files with the same pattern of ACTG barcodes.

22 months ago
JC
$ for F in *fastq.gz; do mv "$F" "$(echo "$F" | perl -pe 's/_[ACGT]+?-[ACGT]+?_/_/')"; done

This performs the following renames:
mv VIR3A_CCGCGGTT-CTAGCGCT_L00M_R1_001.fastq.gz VIR3A_L00M_R1_001.fastq.gz
mv VIR3Q_TAATACAG-GTGAATAT_L00M_R2_001.fastq.gz VIR3Q_L00M_R2_001.fastq.gz
mv VIR4J_CGTTAGAA-GACCTGAA_L00M_R1_001.fastq.gz VIR4J_L00M_R1_001.fastq.gz
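A slightly more defensive variant of the loop, as a sketch: it quotes file names, skips names that contain no barcode, refuses to overwrite an existing file, and only prints the mv commands unless DRY_RUN=0. The function name rename_fastqs and the DRY_RUN variable are hypothetical, and the substitution is the same one written for sed -E instead of perl.

```shell
#!/usr/bin/env bash
# Sketch: defensive version of the rename loop.
# Assumes names look like SAMPLE_<ACGT>-<ACGT>_REST.fastq.gz.
# DRY_RUN is a hypothetical variable: 1 (the default) only prints the commands.

rename_fastqs() {
  local dir=${1:-.} f base new
  for f in "$dir"/*fastq.gz; do
    [ -e "$f" ] || continue                # the glob matched no files
    base=${f##*/}                          # edit the file name, not the path
    new=$(printf '%s\n' "$base" | sed -E 's/_[ACGT]+-[ACGT]+_/_/')
    [ "$new" = "$base" ] && continue       # no barcode in this name
    if [ -e "$dir/$new" ]; then            # never overwrite an existing file
      echo "skipping $f: $dir/$new already exists" >&2
      continue
    fi
    if [ "${DRY_RUN:-1}" = 1 ]; then
      echo mv -- "$f" "$dir/$new"          # dry run: show what would happen
    else
      mv -- "$f" "$dir/$new"
    fi
  done
}
```

Preview with rename_fastqs . and, once the printed commands look right, rename with DRY_RUN=0 rename_fastqs .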
22 months ago

With rename (the Perl version, sometimes installed as prename, perl-rename or file-rename):

$ rename -n 's/_[ATGC]*-[ATGC]*//' *.gz

'VIR3A_CCGCGGTT-CTAGCGCT_L00M_R1_001.fastq.gz' would be renamed to 'VIR3A_L00M_R1_001.fastq.gz'
'VIR3Q_TAATACAG-GTGAATAT_L00M_R2_001.fastq.gz' would be renamed to 'VIR3Q_L00M_R2_001.fastq.gz'
'VIR4J_CGTTAGAA-GACCTGAA_L00M_R1_001.fastq.gz' would be renamed to 'VIR4J_L00M_R1_001.fastq.gz'

Remove -n once you are satisfied with the dry run.

With GNU parallel:

$ parallel --plus --dry-run cp {} '{=s/_[ATGC]+-[ATGC]+//=}' ::: *.gz

cp VIR3A_CCGCGGTT-CTAGCGCT_L00M_R1_001.fastq.gz VIR3A_L00M_R1_001.fastq.gz
cp VIR3Q_TAATACAG-GTGAATAT_L00M_R2_001.fastq.gz VIR3Q_L00M_R2_001.fastq.gz
cp VIR4J_CGTTAGAA-GACCTGAA_L00M_R1_001.fastq.gz VIR4J_L00M_R1_001.fastq.gz

Remove --dry-run once you are happy with its output. Note that this copies the files; use mv instead of cp to rename them in place.

In plain bash (with sed):

$ for i in *.gz; do output=$(echo "$i" | sed 's/_[ATGC]*-[ATGC]*//'); echo cp "$i" "$output"; done

cp VIR3A_CCGCGGTT-CTAGCGCT_L00M_R1_001.fastq.gz VIR3A_L00M_R1_001.fastq.gz
cp VIR3Q_TAATACAG-GTGAATAT_L00M_R2_001.fastq.gz VIR3Q_L00M_R2_001.fastq.gz
cp VIR4J_CGTTAGAA-GACCTGAA_L00M_R1_001.fastq.gz VIR4J_L00M_R1_001.fastq.gz

Remove the second echo (the one before cp) once you are happy with the dry run.
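With around 40 files, it may be worth confirming before the real run that no two files map to the same new name (for instance, one sample sequenced with two different barcode pairs), since the second copy or move would silently overwrite the first. A minimal sketch reusing the sed expression above; find_collisions is a hypothetical helper name. It prints every new name that would occur more than once, so empty output means the rename is safe.

```shell
#!/usr/bin/env bash
# Sketch: list target names that two or more input files would share
# after the barcode is stripped. Empty output means no collisions.

find_collisions() {
  local dir=${1:-.} f
  for f in "$dir"/*.gz; do
    [ -e "$f" ] || continue                            # no .gz files at all
    printf '%s\n' "${f##*/}" | sed 's/_[ATGC]*-[ATGC]*//'
  done | sort | uniq -d                                # keep only duplicates
}
```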


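FYI, the rename command is not a standard Linux tool: it may need to be installed with apt, yum, pacman, etc. Some distributions also ship a different rename (from util-linux) that does not accept Perl expressions like the one above.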
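If neither the Perl rename nor GNU parallel is available, bash's built-in regex matching can do the substitution without external tools. A sketch, assuming each name contains a single _XXXX-YYYY_ barcode block made only of A, C, G and T; strip_barcode is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: strip the barcode with [[ =~ ]] and BASH_REMATCH (bash only).
# Assumes one _<ACGT>-<ACGT>_ block per name; because (.*) is greedy,
# the last such block would be removed if there were several.

strip_barcode() {
  local name=$1
  local re='^(.*)_[ACGT]+-[ACGT]+_(.*)$'
  if [[ $name =~ $re ]]; then
    printf '%s_%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
  else
    printf '%s\n' "$name"                  # leave other names unchanged
  fi
}

# Dry run over the current directory: remove "echo" to actually rename.
for f in *fastq.gz; do
  [ -e "$f" ] || continue                  # the glob matched no files
  new=$(strip_barcode "$f")
  [ "$new" = "$f" ] || echo mv -- "$f" "$new"
done
```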
