A Question On Doing Substitution By Using Bash
12.7 years ago
Yunfei Li ▴ 310

I have a file like this:

"

DGM97JN1_135:2:1101:1283:2110    16    chr13.fa    ...

DGM97JN1_135:2:1101:1434:2186    16    chr08.fa    ...

DGM97JN1_135:2:1101:1385:2244    0    chr16.fa    ...

DGM97JN1_135:2:1101:1663:2038    0    chr13.fa    ...

...........

"

I would like a fast and easy way to delete the ".fa" after the chromosome name in the third column of each row. What I do now is:

                   cat <input file> | tr -d '.fa'

But this does not work well, since it also deletes every "." in the file. Why does this happen, and what is the right way to do it?

Also, since it is a large file, I wonder whether there is a way to restrict the search and substitution to the third column only, and thereby speed up the process.

unix sam • 2.2k views
12.7 years ago

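You need sed, not tr. tr -d treats its argument as a set of characters, so it deletes every ".", every "f" and every "a" in the file, not the string ".fa".

Try: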

 sed 's/\.fa//'  <input file>
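
To make the difference concrete, here is a small sketch that runs both commands on two made-up lines. The read names and the fourth column are invented for illustration and are not taken from the original file.

```bash
# Two made-up, tab-separated lines shaped like the file in the question.
sample=$(printf 'read1\t16\tchr13.fa\t0.5\nread2\t0\tchr08.fa\tfasta\n')

# tr -d treats '.fa' as a set of characters: every '.', 'f' and 'a' is deleted.
tr_out=$(printf '%s\n' "$sample" | tr -d '.fa')

# sed substitutes the literal string ".fa" (the dot is escaped),
# and only the first match on each line.
sed_out=$(printf '%s\n' "$sample" | sed 's/\.fa//')

printf 'tr:\n%s\n\nsed:\n%s\n' "$tr_out" "$sed_out"
```

With tr, "read1" becomes "red1", "0.5" becomes "05" and "fasta" becomes "st", while sed leaves everything except the ".fa" suffix untouched.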

For in-place editing, use sed -i (GNU sed) or sed -i '' (BSD/macOS sed).
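
Since the -i syntax differs between sed implementations, one form that should be accepted by both GNU and BSD sed is to attach a backup suffix directly to the flag. A minimal sketch, using a scratch file created with mktemp as a stand-in for the real input:

```bash
# Hypothetical scratch file standing in for the real input file.
tmp=$(mktemp)
printf 'r1\t16\tchr13.fa\nr2\t0\tchr08.fa\n' > "$tmp"

# Edit the file in place; the untouched original is kept as "$tmp.bak".
sed -i.bak 's/\.fa//' "$tmp"

cat "$tmp"
```

The backup can be deleted once the result has been checked.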


This will probably work for this data if ".fa" appears only in the third column. Note that, as written, it removes just the first ".fa" on each line, wherever that is; add the g flag to remove every occurrence.

If the change has to be restricted to the third column, things get a bit more complicated, especially if columns are separated by spaces instead of single tabs. In that case I'd write a quick (perl|tcl|whatever) filter.
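
One possible filter of this kind, sketched with awk rather than perl or tcl. It assumes the columns are separated by single tabs; the function name strip_fa_col3 is made up for this example.

```bash
# Remove a trailing ".fa" from column 3 only.
# Assumption: columns are separated by single tabs.
strip_fa_col3() {
    awk 'BEGIN { FS = OFS = "\t" }
         NF >= 3 { sub(/\.fa$/, "", $3) }   # touch only the third field
         { print }' "$@"
}

# Usage (file names are placeholders):
#   strip_fa_col3 input.file > output.file
```

If the columns were separated by runs of spaces instead, awk's default field splitting would still find the third column, but any line it modifies would be rebuilt with single separators, so the original spacing would not be preserved.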

12.3 years ago

Here is a pure bash solution:

while read -r c1 c2 c3 cn ; do printf '%s\t%s\t%s\t%s\n' "$c1" "$c2" "${c3%.fa}" "$cn" ; done < input.file

For each line in your file, it splits the line into the first three columns (c1, c2 and c3) and puts all the rest in cn. A trailing ".fa" is removed from the third column, and the modified line is printed on the standard output.
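
The ${c3%.fa} expansion in the loop strips the shortest match of the pattern from the end of the variable's value, so it only acts when ".fa" is a suffix. A small sketch with made-up values:

```bash
# ".fa" is at the end, so it is stripped.
name='chr13.fa'
stripped=${name%.fa}

# ".fa" is in the middle, so the value is left unchanged.
other='chr.fa.13'
unchanged=${other%.fa}

printf '%s\n%s\n' "$stripped" "$unchanged"
```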
