Question: A Question On Doing Substitution By Using Bash
Asked by Yunfei Li (ThermoFisher Scientific), 7.9 years ago:

I have a file like this:

    DGM97JN1_135:2:1101:1283:2110    16    chr13.fa    ...
    DGM97JN1_135:2:1101:1434:2186    16    chr08.fa    ...
    DGM97JN1_135:2:1101:1385:2244    0    chr16.fa    ...
    DGM97JN1_135:2:1101:1663:2038    0    chr13.fa    ...
    ...........

I would like a fast and easy way to delete the ".fa" suffix from the chromosome names in the third column of every row. At the moment I do it like this:

    cat <input file> | tr -d '.fa'

But this does not work well: for some reason it also deletes every "." in the file. Why does this happen, and what is the right way to do it?

Also, since it is a large file, is there a way to restrict the search and substitution to the third column only, and so speed up the process?

Answer by Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087), 7.9 years ago:

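You need sed, not tr. tr works on single characters, not strings: tr -d '.fa' deletes every '.', 'f' and 'a' character in the input, which is why all the dots disappear. sed can remove the literal string ".fa" instead.

Try: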

    sed 's/\.fa//' <input file>
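A small demonstration of why the two commands behave differently, run on a made-up tab-separated line (a hypothetical test line, not taken from the question's file):

```bash
#!/usr/bin/env bash
# Hypothetical test line: three tab-separated fields; only the middle one
# ends in ".fa", but '.', 'f' and 'a' also occur in the other fields.
line=$'read.1\tchr13.fa\tfast'

# tr -d treats '.fa' as a SET of characters: every '.', 'f' and 'a' is deleted.
tr_out=$(printf '%s\n' "$line" | tr -d '.fa')

# sed treats '\.fa' as a pattern: the literal string ".fa" is removed
# (only the first match on each line, because there is no g flag).
sed_out=$(printf '%s\n' "$line" | sed 's/\.fa//')

printf 'tr : %s\n' "$tr_out"
printf 'sed: %s\n' "$sed_out"
```

With tr, "read.1" and "fast" are damaged as well; with sed only the ".fa" in the second field goes away.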

brentp commented: And for in-place editing, use sed -i ''. (The empty '' argument is the BSD/macOS sed form; with GNU sed, write sed -i without it.)
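If the same command has to run on both Linux and macOS, one commonly used workaround is to attach a backup suffix directly to -i (no space), which GNU sed and BSD sed both accept, and delete the backup afterwards. A minimal sketch on a throwaway file (the name example.sam and its contents are hypothetical):

```bash
#!/usr/bin/env bash
set -e
# Throwaway demo file in a temporary directory.
tmpdir=$(mktemp -d)
f="$tmpdir/example.sam"
printf 'r1\t16\tchr13.fa\nr2\t0\tchr08.fa\n' > "$f"

# -i.bak (suffix attached, no space) works with both GNU and BSD sed:
# $f is edited in place and the original is kept as $f.bak.
sed -i.bak 's/\.fa//' "$f"
rm -f "$f.bak"   # drop the backup once the edit has succeeded

result=$(cat "$f")
printf '%s\n' "$result"
rm -rf "$tmpdir"
```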

Bach commented: This will probably work for his data if and only if ".fa" appears only in the third column, or if he wants ".fa" removed wherever it appears (in that case use s/\.fa//g, since without the g flag sed removes only the first match on each line).

Restricting the substitution to the third column is a bit more complicated, especially if the columns are separated by runs of spaces instead of single tabs. In that case I'd write a quick filter in Perl, Tcl or whatever.
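One possible version of such a filter, using awk instead of Perl or Tcl. This is a minimal sketch that assumes the columns are separated by single tabs, as in SAM; it only strips a ".fa" at the very end of column 3, and awk is typically much faster than a shell read loop on a large file. The function name strip_fa_col3 and the demo input are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read tab-separated lines (files given as arguments,
# or stdin) and remove a trailing ".fa" from column 3 only.
# FS and OFS are both a tab, so all other columns are written back unchanged.
strip_fa_col3() {
  awk 'BEGIN { FS = OFS = "\t" }
       { sub(/\.fa$/, "", $3) }   # strip ".fa" only at the end of column 3
       1' "$@"                     # "1" is an always-true pattern: print every line
}

# Hypothetical demo input: ".fa" in column 1 and a ".fa.gz" in column 3
# must both survive.
demo=$(printf 'x.fa\t16\tchr13.fa\tACGT\ny\t0\tchr08.fa.gz\tTTGA\n' | strip_fa_col3)
printf '%s\n' "$demo"
```

On a real file this would be run as strip_fa_col3 input.sam > output.sam. If the columns are separated by runs of spaces instead, the FS setting would have to change, and awk would then rebuild modified lines with OFS between fields.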
Answer by Frédéric Mahé (France, Montpellier, CIRAD), 7.5 years ago:

Here is a pure bash solution:

while read -r c1 c2 c3 cn ; do printf '%s\t%s\t%s\t%s\n' "$c1" "$c2" "${c3%.fa}" "$cn" ; done < input.file

For each line of the file, read splits off the first three whitespace-separated columns into c1, c2 and c3 and puts the rest of the line into cn. ${c3%.fa} removes a trailing ".fa" from the third column only, and the rebuilt, tab-separated line is printed on standard output.
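A quick check of how the ${c3%.fa} expansion behaves: it removes the shortest suffix matching the glob pattern .fa, so the value changes only when it actually ends in ".fa". The helper strip_fa and the sample values are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print its argument with one trailing ".fa" removed.
# ${1%.fa} drops the shortest suffix matching the glob ".fa"; in a glob the
# dot is literal, so nothing happens unless the value ends in ".fa".
strip_fa() { printf '%s\n' "${1%.fa}"; }

for c3 in chr13.fa chr08.fa.gz chrX; do
  printf '%s -> %s\n' "$c3" "$(strip_fa "$c3")"
done
```

Only chr13.fa is shortened; chr08.fa.gz and chrX are printed unchanged.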
