Question: awk format question
3.1 years ago, newbiebio wrote:

I have a gene list in a text file and need to convert it to a tab-delimited file. The list has six columns: chr, startPos, endPos, width, strand, name. The code is:

awk '{printf("%s\t%s\t%s\t%s\t%s\t%s\t\n",$1,$2,$3,$4,$5,$6)};' input.txt > output.txt

When I then ran some code on the output, the result said 'Perhaps you have non-integer starts or ends at line 1?' I used 'cat -e file | more' to check the lines, and each line has an additional tab before the $ at the end. For example:

chr12<tab>1234<tab>456<tab>789<tab>+<tab>TP53<tab>$

Please let me know what is wrong. Thanks in advance.

3.1 years ago, Alex Reynolds wrote:

It probably isn't related to the error, but you print an unnecessary extra tab at the end of each line:

awk '{printf("%s\t%s\t%s\t%s\t%s\t%s\t\n",$1,$2,$3,$4,$5,$6)};' input.txt > output.txt
                                    ^
                                    |
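As a minimal sketch of the fix, the format string can end in \n rather than \t\n, so that tabs appear only between fields. The one-line input.txt created here is a hypothetical stand-in built from the example values in the question, not the real file.

```shell
# Hypothetical one-line input using the example values from the question.
printf 'chr12 1234 456 789 + TP53\n' > input.txt

# Tabs go between fields only; the format string ends with \n, not \t\n.
awk '{ printf("%s\t%s\t%s\t%s\t%s\t%s\n", $1, $2, $3, $4, $5, $6) }' input.txt > output.txt

# Show tabs as ^I and line ends as $ to confirm there is no trailing tab.
cat -et output.txt
# chr12^I1234^I456^I789^I+^ITP53$
```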

I'm not sure what the error message means, but in your example, there are a few potential problems:

  1. The end coordinate is smaller than the start coordinate.
  2. The name ("ID") generally goes into the fourth column.
  3. The score or some numerical value generally goes into the fifth column.
  4. The strand generally goes into the sixth column.

Most of these issues can be fixed with a few tweaks, like reordering the field variables $2 through $6. If you can post the actual output of running the above on your true input.txt file, that may help figure out what the real issue is.
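The reordering could look something like the sketch below, which writes the columns in the order chrom, start, end, name, score, strand. It rests on two assumptions that are not confirmed by the question: that a start larger than the end means the two values were simply swapped, and that the width column is an acceptable stand-in for the numeric fifth column. The file names genes.txt and genes.bed are hypothetical.

```shell
# Hypothetical input line in the question's column order:
# chr, startPos, endPos, width, strand, name
printf 'chr12\t1234\t456\t789\t+\tTP53\n' > genes.txt

awk 'BEGIN { OFS = "\t" }
{
  s = $2; e = $3
  # Assumption: a start larger than the end means the two were swapped.
  if (s + 0 > e + 0) { t = s; s = e; e = t }
  # Output order: chrom, start, end, name, score, strand.
  # Assumption: width stands in for the numeric fifth column.
  print $1, s, e, $6, $4, $5
}' genes.txt > genes.bed

# Show tabs as ^I and line ends as $.
cat -et genes.bed
# chr12^I456^I1234^ITP53^I789^I+$
```

Setting OFS in a BEGIN block lets print insert the tabs, which avoids the trailing-tab mistake that a hand-written format string invites.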


I removed the extra '\t' you pointed out, but the '$' is still at the end of each line. The result now looks like 'chr12<tab>1234<tab>456<tab>789<tab>+<tab>TP53$'. I will check each column as you suggested and see if it works. Thank you very much for your suggestion.

Reply written 3.1 years ago by newbiebio

The $ is just how cat -e marks the end of each line (the newline character), so that output is correct. I would recommend checking the order of $2 through $6.
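One way to check the coordinate columns is a short awk pass that reports any line whose second or third field is not an integer, or whose start is greater than its end. This is only a sketch: check.txt is a hypothetical file, and the header row in it is an assumption added to show how a non-numeric first line would be flagged.

```shell
# Hypothetical two-line file: a header row followed by the example record.
printf 'chr\tstartPos\tendPos\twidth\tstrand\tname\n' > check.txt
printf 'chr12\t1234\t456\t789\t+\tTP53\n' >> check.txt

# Report lines whose 2nd or 3rd field is not a non-negative integer,
# and lines whose start is greater than their end.
awk -F'\t' '
  $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ { print "line " NR ": non-integer start or end"; next }
  ($2 + 0) > ($3 + 0)                   { print "line " NR ": start " $2 " > end " $3 }
' check.txt > report.txt

cat report.txt
# line 1: non-integer start or end
# line 2: start 1234 > end 456
```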

Reply written 3.1 years ago by Alex Reynolds