Your attempt is fairly close. You may want to try the following command. The parts of the pipeline are split across several lines to make the whole easier to read. The \ character at the end of a line (it must be the very last character on that line) tells the shell that the command continues on the next line.
cut -d " " -f 1 f1 | \
perl -pe 's/>/_newline_>/; s/\n/\t/' | \
perl -pe 's/_newline_//' | \
perl -pe 's/_newline_/\n/g' | \
perl -pe 's/\t$//' > f2
Here are some details about the steps:
1. cut -d " " -f 1, where -d " " specifies that the delimiter is a space and -f 1 keeps only the first field. This removes everything from the first space onward on each line.
2. perl -pe is used much like sed -e. Sometimes I find perl more convenient, so rather than learning both sed and perl, I suggest learning only perl.
3. 's/>/_newline_>/' adds a unique marker string in front of each > so that the lines can be recreated later.
4. 's/\n/\t/' replaces each newline with a tab. At this point, the whole file is a single line.
5. perl -pe 's/_newline_//' removes only the first occurrence of _newline_ in the file (there is no /g flag), so that the file does not start with an empty line later.
6. perl -pe 's/_newline_/\n/g' replaces every remaining _newline_ string with a newline.
7. perl -pe 's/\t$//' removes the tab left at the end of each line.
In this example, I use pipes (|) at a few places where their purpose may not be obvious. Perl processes the file, or the input it receives through a pipe, one line at a time, where a line is delimited by the newline character (\n by default). Thus, for example, when I remove all the newline characters at step 4, I create one long line, and I must use a pipe so that the next transformation is applied to the whole file rather than only to the line currently being processed. This is what permits the trick in item 5, where I remove only the first occurrence of _newline_ in the whole file, which is now a single line. Similarly, the pipe before the last step lets perl see the lines recreated by s/_newline_/\n/g as separate lines, so s/\t$// removes the trailing tab from each of them and not only from the end of the file.