Question: A Question On Doing Substitution By Using Bash
Yunfei Li (ThermoFisher Scientific), 8.9 years ago, wrote:

I have a file like this:


DGM97JN1_135:2:1101:1283:2110    16    chr13.fa    ...
DGM97JN1_135:2:1101:1434:2186    16    chr08.fa    ...
DGM97JN1_135:2:1101:1385:2244    0    chr16.fa    ...
DGM97JN1_135:2:1101:1663:2038    0    chr13.fa    ...



and I would like a fast and easy way to delete the ".fa" after the chromosome names in the third column of every row. This is how I do it now:

                   cat <input file> | tr -d '.fa'

But it does not work well, since it also deletes every "." in the file for some reason. Why does this happen, and what is the right way to code it?

Besides, since it is a large file, I wonder whether there is a way to restrict the search and substitution to the third column only, and thereby speed up the process?

unix sam • 1.6k views
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087), 8.9 years ago, wrote:

you need sed, not tr


 sed 's/\.fa//'  <input file>
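
To see why tr misbehaves here: tr -d treats its argument as a set of single characters, not as a string, so '.fa' means "delete every '.', every 'f' and every 'a'". The following is only a small sketch on a made-up, tab-separated line (the sample values are not from the original file) comparing the two commands:

```shell
# Hypothetical sample line with tab-separated columns
line=$(printf 'read_a.1\t16\tchr13.fa\tfasta')

# tr -d takes a SET of characters: every '.', 'f' and 'a' is deleted
tr_out=$(printf '%s\n' "$line" | tr -d '.fa')

# sed matches the literal string ".fa" (first occurrence on each line)
sed_out=$(printf '%s\n' "$line" | sed 's/\.fa//')

printf 'tr : %s\nsed: %s\n' "$tr_out" "$sed_out"
```

In the tr output the first and last columns are damaged as well, whereas sed only changes chr13.fa to chr13.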

and for in-place editing, use sed -i '' (this is the BSD/macOS form; GNU sed takes -i without the separate empty argument)

Comment written 8.9 years ago by brentp
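
If the in-place edit should not depend on which sed is installed, one option is to attach a backup suffix directly to -i. The sketch below assumes a throwaway file created with mktemp and made-up sample rows; the .bak suffix is an arbitrary choice:

```shell
# Throwaway sample file (hypothetical data, tab-separated)
tmp=$(mktemp)
printf 'r1\t16\tchr13.fa\nr2\t0\tchr08.fa\n' > "$tmp"

# -i.bak edits the file in place and keeps the original as "$tmp.bak";
# the attached-suffix form is accepted by both GNU and BSD sed
sed -i.bak 's/\.fa//' "$tmp"

cat "$tmp"
```

The backup copy can be deleted once the result has been checked.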

This will probably work for his data if and only if the ".fa" appears only in the third column ... or if he wants ".fa" removed from everywhere.

If it were specifically for the third column, things get a bit more complicated ... especially if columns are separated by spaces instead of single tabs. Then I'd write a quick (perl|tcl|whatever) filter.

Comment written 8.9 years ago by Bach
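
As an alternative to a perl or tcl filter, awk can restrict the substitution to the third column. This is a minimal sketch, assuming the columns are separated by single tabs (as in SAM files); the sample rows are made up, and the first one deliberately contains ".fa" in column 1 to show that it is left alone:

```shell
# Hypothetical tab-separated sample; ".fa" also occurs in column 1 of the first row
sample=$(printf 'read.fa1\t16\tchr13.fa\tx\nread2\t0\tchr16.fa\ty')

# FS = OFS = "\t": split and re-join on tabs.
# sub() edits only field 3, and the $ anchor removes ".fa" only at the end of that field.
# The trailing 1 is an always-true pattern that prints every line.
result=$(printf '%s\n' "$sample" | awk 'BEGIN { FS = OFS = "\t" } { sub(/\.fa$/, "", $3) } 1')

printf '%s\n' "$result"
```

On a real file this would be run as awk '...' input.file > output.file. If the columns were separated by runs of spaces instead, the default field splitting would still work, but rebuilding the line would collapse the original spacing.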
Frédéric Mahé (France, Montpellier, CIRAD), 8.5 years ago, wrote:

Here is a pure bash solution:

while read -r c1 c2 c3 cn ; do printf '%s\t%s\t%s\t%s\n' "$c1" "$c2" "${c3%.fa}" "$cn" ; done < input.file

For each line in your file, it splits the line into three fields (c1, c2 and c3) and puts everything else in cn. A trailing ".fa" is removed from the third field, and the modified line is printed to standard output.
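
The ${c3%.fa} expansion used in the loop removes the shortest match of the pattern from the end of the value, so it only touches a trailing ".fa". A small sketch of its behaviour, using a hypothetical helper named strip_fa that is not part of the answer above:

```shell
# strip_fa is a hypothetical helper: prints its argument without a trailing ".fa"
strip_fa() {
    printf '%s\n' "${1%.fa}"
}

strip_fa 'chr13.fa'     # suffix removed
strip_fa 'chr.fa.fa'    # only the last ".fa" is removed
strip_fa 'chr13'        # no suffix, value unchanged
```

A shell read loop handles one line at a time and tends to be much slower than sed or awk on large files, so it is probably best kept for small inputs.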

ADD COMMENTlink written 8.5 years ago by Frédéric Mahé3.0k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1943 users visited in the last hour