Question

A Question On Doing Substitution By Using Bash

12.7 years ago

Yunfei Li ▴ 310

I have a file like this:

    DGM97JN1_135:2:1101:1283:2110    16    chr13.fa    ...
    DGM97JN1_135:2:1101:1434:2186    16    chr08.fa    ...
    DGM97JN1_135:2:1101:1385:2244    0    chr16.fa    ...
    DGM97JN1_135:2:1101:1663:2038    0    chr13.fa    ...
    ...........

and I would like a fast, easy way to delete the ".fa" suffix from the chromosome name in the third column of each row. This is what I do now:

                   cat <input file> | tr -d '.fa'

But it does not work well: for some reason it also deletes every "." in the file. Why does this happen, and what is the right way to do it?

Also, since it is a large file, is there a way to restrict the search and substitution to the third column only, and so speed up the process?

unix sam • 2.2k views

updated 5.9 years ago by Biostar 20 • written 12.7 years ago by Yunfei Li ▴ 310

12.3 years ago

Frédéric Mahé ★ 3.2k

Here is a pure bash solution:

while read -r c1 c2 c3 cn ; do printf '%s\t%s\t%s\t%s\n' "$c1" "$c2" "${c3%.fa}" "$cn" ; done < input.file

For each line in your file, read splits the line into three fields (c1, c2 and c3) and puts all the rest of the line in cn. The ${c3%.fa} expansion removes a trailing ".fa" from the third column only, and the modified line is printed on standard output.
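
A while read loop runs once per line inside the shell, which tends to be slow on a large file. The same third-column-only logic can be handed to awk instead. This is a minimal sketch, assuming the file is tab-separated (as SAM files are); input.sam and output.sam are hypothetical file names, and the "..." field is just a placeholder for the remaining columns.

```shell
# Hypothetical sample input: tab-separated rows like the ones in the question.
printf 'DGM97JN1_135:2:1101:1283:2110\t16\tchr13.fa\t...\n' > input.sam
printf 'DGM97JN1_135:2:1101:1385:2244\t0\tchr16.fa\t...\n' >> input.sam

# Split on tabs, strip ".fa" only when it ends field 3, and rejoin the
# fields with tabs so every other column is written back unchanged.
awk 'BEGIN { FS = OFS = "\t" } { sub(/\.fa$/, "", $3); print }' input.sam > output.sam

cat output.sam
```

Because it only looks at the third field, reference names in any @SQ header lines (normally in the second field) would keep their ".fa" suffix.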

score 11 · Accepted Answer · 2011-08-17

12.7 years ago

Pierre Lindenbaum 161k

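you need sed, not tr: tr works on single characters, so tr -d '.fa' deletes every '.', 'f' and 'a' anywhere in the input rather than the string ".fa".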

try:

 sed 's/\.fa//'  <input file>
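
A quick way to see the difference is to run both commands on one row. The row below is hypothetical (the read name read_1 and the fourth column frac=0.5 are made up so that they contain '.', 'f' and 'a').

```shell
# Hypothetical tab-separated row; column 3 is the chromosome name.
row=$(printf 'read_1\t16\tchr13.fa\tfrac=0.5')

# tr treats '.fa' as the SET of characters {., f, a} and deletes each one.
with_tr=$(printf '%s\n' "$row" | tr -d '.fa')

# sed treats \.fa as a regular expression: a literal dot followed by "fa".
# Without the g flag, only the first match on each line is removed.
with_sed=$(printf '%s\n' "$row" | sed 's/\.fa//')

printf 'tr : %s\n' "$with_tr"
printf 'sed: %s\n' "$with_sed"
```

On this row tr also turns read_1 into red_1 and frac=0.5 into rc=05, while sed only changes chr13.fa to chr13.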

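and for in-place editing, use sed -i '' 's/\.fa//' <input file> with BSD/macOS sed; GNU sed expects sed -i 's/\.fa//' <input file> instead.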
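
Since GNU and BSD sed disagree on how -i takes its backup suffix, a portable sketch is to write the result to a temporary file and move it over the original only if sed succeeds. input.sam and input.sam.tmp are hypothetical names.

```shell
# Hypothetical one-row input file for illustration.
printf 'DGM97JN1_135:2:1101:1283:2110\t16\tchr13.fa\t...\n' > input.sam

# Write the edited copy next to the original, then move it into place;
# the && chain skips the mv if sed fails, leaving input.sam untouched.
sed 's/\.fa//' input.sam > input.sam.tmp && mv input.sam.tmp input.sam

cat input.sam
```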

12.7 years ago by brentp 24k

This will probably work for his data if and only if ".fa" appears only in the third column, or if he wants ".fa" removed everywhere (which would also need the g flag, since without it sed replaces only the first match on each line).

If the change must apply specifically to the third column, things get a bit more complicated, especially if the columns are separated by spaces instead of single tabs. In that case I'd write a quick filter in Perl, Tcl or whatever.
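
If you would rather stay with sed, one option is to anchor the expression at the start of the line and skip the first two fields explicitly. This is a sketch, assuming fields contain no internal whitespace and may be separated by any run of spaces or tabs; it keeps the original separators and removes ".fa" only when it ends the third field. It relies on sed -E (extended regular expressions), which current GNU and BSD sed both accept. mixed.txt and fixed.txt are hypothetical file names.

```shell
# Hypothetical sample: the first row is tab-separated, the second uses spaces,
# and ".fa" also appears in column 4 to show that only column 3 is edited.
printf 'r1\t16\tchr13.fa\tx.fa\n' > mixed.txt
printf 'r2    0    chr16.fa    y.fa\n' >> mixed.txt

# Group 1: field 1, separator, field 2, separator, start of field 3.
# Then a literal ".fa" that must be followed by whitespace or end of line
# (group 2). Everything except ".fa" is written back.
sed -E 's/^([^[:space:]]+[[:space:]]+[^[:space:]]+[[:space:]]+[^[:space:]]*)\.fa([[:space:]]|$)/\1\2/' mixed.txt > fixed.txt

cat fixed.txt
```

Because ".fa" must end the field, a name such as chr13.fasta in the third column is left unchanged.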

12.7 years ago by Bach ▴ 550