Question: awk format question
newbiebio wrote (2.3 years ago):

I have a gene list in a .txt file and need to convert it to a tab-delimited file. The list has six columns: chr, startPos, endPos, width, strand, name. My code is:

awk '{printf("%s\t%s\t%s\t%s\t%s\t%s\t\n",$1,$2,$3,$4,$5,$6)};' input.txt > output.txt

When I then ran some other commands on the output, I got the error 'Perhaps you have non-integer starts or ends at line 1?'. I used 'cat -e file | more' to check the lines, and each line ends with an extra tab followed by $.

For example:

chr12<tab>1234<tab>456<tab>789<tab>+<tab>TP53<tab>$

Please let me know what is wrong. Thanks in advance.

Alex Reynolds (Seattle, WA USA) wrote (2.3 years ago):

It probably isn't related to the error, but you print an extra, unnecessary tab at the end of each line:

awk '{printf("%s\t%s\t%s\t%s\t%s\t%s\t\n",$1,$2,$3,$4,$5,$6)};' input.txt > output.txt
                                    ^
                                    |
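A minimal sketch of the same conversion without the trailing tab, assuming whitespace-separated input with exactly six fields per line (the function name six_cols_tsv is hypothetical). Setting OFS, awk's output field separator, to a tab lets print put the tabs between fields, so nothing is written after the last one. Equivalently, you could keep the printf version and just drop the final \t from its format string.

```shell
# Hypothetical helper: six_cols_tsv.
# Reads whitespace-separated lines on stdin and writes the same six fields,
# tab-separated, with no tab after the last field.
six_cols_tsv() {
    awk 'BEGIN { OFS = "\t" } { print $1, $2, $3, $4, $5, $6 }'
}

# Usage:
# six_cols_tsv < input.txt > output.txt
```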

I'm not sure what the error message means, but if this is meant to be a BED file, your example has a few potential problems:

  1. The end coordinate (456) is smaller than the start coordinate (1234).
  2. The name ("ID") generally goes into the fourth column.
  3. The score or some other numerical value generally goes into the fifth column.
  4. The strand generally goes into the sixth column.
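The first point, and the error's mention of non-integer starts or ends, can be checked before running any other tool. Below is a minimal sketch, assuming start and end are in columns 2 and 3 as in the question; the function name check_coords is hypothetical. It prints the line number and reason for each suspicious line (a header line such as "chr startPos endPos ..." would be reported as non-integer) and exits with status 1 if anything was found.

```shell
# Hypothetical helper: check_coords.
# Reads the six-column list on stdin and reports lines whose start ($2) or
# end ($3) is not a non-negative integer, or whose end is smaller than its start.
# Exit status: 1 if any problem was reported, 0 otherwise.
check_coords() {
    awk '
        # Non-integer start or end (this also catches a header line).
        $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ {
            printf("line %d: non-integer start or end: %s\n", NR, $0)
            bad = 1
            next
        }
        # Integer coordinates, but end before start.
        $3 + 0 < $2 + 0 {
            printf("line %d: end (%s) < start (%s)\n", NR, $3, $2)
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Usage:
# check_coords < input.txt
```

On the example line from the question, this reports "line 1: end (456) < start (1234)".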

Most of these issues can be fixed with a few tweaks, such as reordering the field variables $2 through $6. If you can post the actual output of running the command above on your real input.txt file, that may help pin down the real issue.
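A minimal sketch of the reordering, writing BED6 column order (chrom, start, end, name, score, strand) from the question's layout (chr, startPos, endPos, width, strand, name). The function name to_bed6 is hypothetical, and reusing the width as the score is only a placeholder for "some numerical value", not something the BED format requires. It assumes there is no header line and copies coordinates unchanged, so it does not fix an end that is smaller than its start; note also that BED starts are 0-based, so 1-based starts would additionally need shifting.

```shell
# Hypothetical helper: to_bed6.
# Input columns (from the question):
#   $1 chr, $2 startPos, $3 endPos, $4 width, $5 strand, $6 name
# Output, tab-separated, in BED6 order: chrom, start, end, name, score, strand.
# Assumption: the width ($4) is reused as a numeric score placeholder.
to_bed6() {
    awk 'BEGIN { OFS = "\t" } { print $1, $2, $3, $6, $4, $5 }'
}

# Usage:
# to_bed6 < input.txt > output.bed
```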


I removed the extra '\t' you pointed out, but the '$' is still at the end of each line. The result now looks like 'chr12<tab>1234<tab>456<tab>789<tab>+<tab>TP53$'. I will check each column as you suggested and see if that works. Thank you very much for your suggestion.

Reply by newbiebio (2.3 years ago)

The $ is just how cat -e marks the end of each line (the newline character), so it is expected. I would recommend checking the order of $2 through $6.
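To see exactly where the tabs are, a sketch using cat -et: -e marks each line end with $ and -t shows each tab as ^I (both options exist in GNU and BSD cat, though neither is part of POSIX). A trailing tab then shows up as ^I$ at the end of the line, while a clean line just ends in $.

```shell
# A line with a trailing tab, as the original printf format produced:
printf 'chr12\t1234\tTP53\t\n' | cat -et
# prints: chr12^I1234^ITP53^I$

# The same line without the trailing tab:
printf 'chr12\t1234\tTP53\n' | cat -et
# prints: chr12^I1234^ITP53$
```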

Reply by Alex Reynolds (2.3 years ago)