Question: File conversion in bash, discrepancy in number of lines after conversion.
0
gravatar for  DataFanatic
7 months ago by
DataFanatic270
DataFanatic270 wrote:

csv file in excel and saved it as .txt and when I count the lines using wc -l there is one line less in the .txt format 17768 vs 17769.

  1. Is this normal? and if so which line is lost in the conversion?
  2. What is the easiest way to make this conversion in bash?

Thanks!

txt csv file conversion • 266 views
ADD COMMENTlink modified 7 months ago by zx87549.7k • written 7 months ago by DataFanatic270
1

Hello again. You could check the output of head and tail on both files, and that should reveal the discrepancy.

Kevin

ADD REPLYlink written 7 months ago by Kevin Blighe67k

I did all lines match in both files

ADD REPLYlink written 7 months ago by DataFanatic270
1

You could try the diff command. The discrepancy is likely related to encoding and end-line characters, e.g., Windows carriage returns (\r) versus end-line characters on Linux (\n)

ADD REPLYlink written 7 months ago by Kevin Blighe67k

head/tail would have freaked out on one of the files if the line endings had been different between them.

ADD REPLYlink written 7 months ago by _r_am31k

The initial assumption with head and tail was that there may have been a blank line at the beginning or end of one of the files. We cannot see the user's actions from where we are sitting.

ADD REPLYlink written 7 months ago by Kevin Blighe67k
1
gravatar for genomax
7 months ago by
genomax92k
United States
genomax92k wrote:

Try this instead:

cat results1.csv | tr "," "\t"  > results1.txt
ADD COMMENTlink written 7 months ago by genomax92k

You are the winner!!! Thank you!!

ADD REPLYlink written 7 months ago by DataFanatic270
1
gravatar for _r_am
7 months ago by
_r_am31k
Baylor College of Medicine, Houston, TX
_r_am31k wrote:

wc -l counts the number of new lines, not the number of lines. If your file ends with the last character of the last line, it will amount to one fewer line than the other file but diff would ignore differences. Try running a diff.

echo "Hello World" | wc
1 2 12

As you can see, the above has ` line and 12 characters, whereas the one below where the new line is suppressed has 0 lines and 11 characters.

echo -n "Hello World" | wc
0 2 11
ADD COMMENTlink modified 7 months ago • written 7 months ago by _r_am31k

For future reference, how can I convert the .csv file to a .txt file in bash?

ADD REPLYlink written 7 months ago by DataFanatic270
1

.csv is a file name extension for comma separated values while .txt is a file name extension for a plain text file. The former strongly alludes to its content format (comma separated values) whereas the latter only speaks of the nature of the content (plain text), not its format.

What is the difference between the content in the two files? If they're the same, you can simply copy/soft-link one to the other (an unnecessary step). If there is an actual difference, you could use sed to go from one to the other.

ADD REPLYlink modified 7 months ago • written 7 months ago by _r_am31k

Thanks for the clarification. Suppose I have a .csv file and want to convert/save as this file to a .txt file how would I do that in bash? Usually what I do is from excel I save the .csv file as a .txt

ADD REPLYlink modified 7 months ago • written 7 months ago by DataFanatic270
1

You don't need to do that. Most programs that accept a .txt input will accept a .csv input. If they don't just rename the file.

ADD REPLYlink written 7 months ago by _r_am31k
1

And to rename the file in the console you can use mv file.csv file.txt, just in case that's what you were looking for.

ADD REPLYlink written 7 months ago by LilithElina10
1

If you check with file *.csv you will see that operating system will identify (a real csv) file as ASCII text.

ADD REPLYlink modified 7 months ago • written 7 months ago by genomax92k

Thanks but my questions is very simple I only have curiosity to know how I would save a .csv as a .txt. I looked this up and it seemed overwhelmingly complicated and I don’t understand why, it should be simple a simple command that can take one and produce the other.

ADD REPLYlink modified 7 months ago • written 7 months ago by DataFanatic270
3

No need to convert. csv is a text file where all the columns in the text file are separated by commas. If it is a text file where all the columns are separated by tab, then it is a tsv file. Based on the separator (in this case comma), text files are called csv, tsv etc. An apt question could be how to convert the one separator (comma) to another separator (tab) for any given text file.

ADD REPLYlink modified 7 months ago • written 7 months ago by cpad011214k
2

All CSV files are TXT files, but not all TXT files are CSV files

ADD REPLYlink written 7 months ago by Joe18k

this is all I need to know: FBgn0286941,0,0,0,NA,NA,NA,NA,NA,NA,NA,NA,0,0,0,0,0,0

not what I want, I want to print tab-separated fields like this:

FBgn0286941 0 0 0 NA NA NA NA NA NA NA NA 0 0 0 0 0 0

How to do this???

ADD REPLYlink written 7 months ago by DataFanatic270
1
cat file |    tr "," "\t"  > new_file

Example

$ echo FBgn0286941,0,0,0,NA,NA,NA,NA,NA,NA,NA,NA,0,0,0,0,0,0 | tr "," "\t"
FBgn0286941     0       0       0       NA      NA      NA      NA      NA      NA      NA      NA      0       0       0       0       0 0
ADD REPLYlink modified 7 months ago • written 7 months ago by genomax92k

tr "," "\t" results1.csv > results1.txt

usage: tr [-Ccsu] string1 string2 tr [-Ccu] -d string1 tr [-Ccu] -s string1 tr [-Ccu] -ds string1 string2

0 Apr 29 10:51 results1.txt

ADD REPLYlink modified 7 months ago • written 7 months ago by DataFanatic270
1

As mentioned csv is already a plain text format with "," or ";" as a separator. Two options:

  1. replace the comma with another separator of your choice, for example by using tr
  2. just rename the extension: mv myfile.csv myfile.txt.
ADD REPLYlink written 7 months ago by zx87549.7k

doing that will print: cat test.txt

FBgn0286941,0,0,0,NA,NA,NA,NA,NA,NA,NA,NA,0,0,0,0,0,0

not what I want, I want to print tab-separated fields like this:

FBgn0286941 0 0 0 NA NA NA NA NA NA NA NA 0 0 0 0 0 0

Also, I don't want to replace the file so instead of mv one would use cp to keep the csv and the txt versions

ADD REPLYlink modified 7 months ago • written 7 months ago by DataFanatic270
1

You can open the CSV file in Excel, again, but ensure that Excel has correctly interpreted that the commas are the field delimiters. If it has correctly interpreted this, then the values will appear in multiple columns. Then save the file as TXT (tab-delimited).

Alternatively, if you have Mac or Linux, open a terminal and convert all commas to tabs via:

sed 's/,/\t/g' test.csv > test.tsv
ADD REPLYlink written 7 months ago by Kevin Blighe67k

This sed command will print:

FBgn0033358t502.2t899.9t104.6t0.1t-3.1t2.94230548037037e-60t4.57293117759163e-56t0t0t876.1t995.1t828.4t93.4t119.6t100.7

ADD REPLYlink written 7 months ago by DataFanatic270
1

It should not:

echo "FBgn0286941,0,0,0,NA,NA,NA,NA,NA,NA,NA,NA,0,0,0,0,0,0" |\
  sed 's/,/\t/g'
FBgn0286941 0   0   0   NA  NA  NA  NA  NA  NA  NA  NA  0   0   0   0   0   0

The functioning of sed will be system-dependent. I use linux (Ubuntu), and GNU sed v4.2.2.

Thank you.

ADD REPLYlink written 7 months ago by Kevin Blighe67k

Okay, thanks for your help!

ADD REPLYlink written 7 months ago by DataFanatic270
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1344 users visited in the last hour