Question: File conversion in bash, discrepancy in number of lines after conversion.
0
gravatar for ATCG
10 months ago by
ATCG300
ATCG300 wrote:

csv file in excel and saved it as .txt and when I count the lines using wc -l there is one line less in the .txt format 17768 vs 17769.

  1. Is this normal? and if so which line is lost in the conversion?
  2. What is the easiest way to make this conversion in bash?

Thanks!

txt csv file conversion • 328 views
ADD COMMENTlink modified 10 months ago by zx875410.0k • written 10 months ago by ATCG300
1

Hello again. You could check the output of head and tail on both files, and that should reveal the discrepancy.

Kevin

ADD REPLYlink written 10 months ago by Kevin Blighe71k

I did all lines match in both files

ADD REPLYlink written 10 months ago by ATCG300
1

You could try the diff command. The discrepancy is likely related to encoding and end-line characters, e.g., Windows carriage returns (\r) versus end-line characters on Linux (\n)

ADD REPLYlink written 10 months ago by Kevin Blighe71k

head/tail would have freaked out on one of the files if the line endings had been different between them.

ADD REPLYlink written 10 months ago by Ram32k

The initial assumption with head and tail was that there may have been a blank line at the beginning or end of one of the files. We cannot see the user's actions from where we are sitting.

ADD REPLYlink written 10 months ago by Kevin Blighe71k
1
gravatar for GenoMax
10 months ago by
GenoMax96k
United States
GenoMax96k wrote:

Try this instead:

cat results1.csv | tr "," "\t"  > results1.txt
ADD COMMENTlink written 10 months ago by GenoMax96k

You are the winner!!! Thank you!!

ADD REPLYlink written 10 months ago by ATCG300
1
gravatar for Ram
10 months ago by
Ram32k
Baylor College of Medicine, Houston, TX
Ram32k wrote:

wc -l counts the number of new lines, not the number of lines. If your file ends with the last character of the last line, it will amount to one fewer line than the other file but diff would ignore differences. Try running a diff.

echo "Hello World" | wc
1 2 12

As you can see, the above has ` line and 12 characters, whereas the one below where the new line is suppressed has 0 lines and 11 characters.

echo -n "Hello World" | wc
0 2 11
ADD COMMENTlink modified 10 months ago • written 10 months ago by Ram32k

For future reference, how can I convert the .csv file to a .txt file in bash?

ADD REPLYlink written 10 months ago by ATCG300
1

.csv is a file name extension for comma separated values while .txt is a file name extension for a plain text file. The former strongly alludes to its content format (comma separated values) whereas the latter only speaks of the nature of the content (plain text), not its format.

What is the difference between the content in the two files? If they're the same, you can simply copy/soft-link one to the other (an unnecessary step). If there is an actual difference, you could use sed to go from one to the other.

ADD REPLYlink modified 10 months ago • written 10 months ago by Ram32k

Thanks for the clarification. Suppose I have a .csv file and want to convert/save as this file to a .txt file how would I do that in bash? Usually what I do is from excel I save the .csv file as a .txt

ADD REPLYlink modified 10 months ago • written 10 months ago by ATCG300
1

You don't need to do that. Most programs that accept a .txt input will accept a .csv input. If they don't just rename the file.

ADD REPLYlink written 10 months ago by Ram32k
1

And to rename the file in the console you can use mv file.csv file.txt, just in case that's what you were looking for.

ADD REPLYlink written 10 months ago by LilithElina10
1

If you check with file *.csv you will see that operating system will identify (a real csv) file as ASCII text.

ADD REPLYlink modified 10 months ago • written 10 months ago by GenoMax96k

Thanks but my questions is very simple I only have curiosity to know how I would save a .csv as a .txt. I looked this up and it seemed overwhelmingly complicated and I don’t understand why, it should be simple a simple command that can take one and produce the other.

ADD REPLYlink modified 10 months ago • written 10 months ago by ATCG300
3

No need to convert. csv is a text file where all the columns in the text file are separated by commas. If it is a text file where all the columns are separated by tab, then it is a tsv file. Based on the separator (in this case comma), text files are called csv, tsv etc. An apt question could be how to convert the one separator (comma) to another separator (tab) for any given text file.

ADD REPLYlink modified 10 months ago • written 10 months ago by cpad011215k
2

All CSV files are TXT files, but not all TXT files are CSV files

ADD REPLYlink written 10 months ago by Joe19k

this is all I need to know: FBgn0286941,0,0,0,NA,NA,NA,NA,NA,NA,NA,NA,0,0,0,0,0,0

not what I want, I want to print tab-separated fields like this:

FBgn0286941 0 0 0 NA NA NA NA NA NA NA NA 0 0 0 0 0 0

How to do this???

ADD REPLYlink written 10 months ago by ATCG300
1
cat file |    tr "," "\t"  > new_file

Example

$ echo FBgn0286941,0,0,0,NA,NA,NA,NA,NA,NA,NA,NA,0,0,0,0,0,0 | tr "," "\t"
FBgn0286941     0       0       0       NA      NA      NA      NA      NA      NA      NA      NA      0       0       0       0       0 0
ADD REPLYlink modified 10 months ago • written 10 months ago by GenoMax96k

tr "," "\t" results1.csv > results1.txt

usage: tr [-Ccsu] string1 string2 tr [-Ccu] -d string1 tr [-Ccu] -s string1 tr [-Ccu] -ds string1 string2

0 Apr 29 10:51 results1.txt

ADD REPLYlink modified 10 months ago • written 10 months ago by ATCG300
1

As mentioned csv is already a plain text format with "," or ";" as a separator. Two options:

  1. replace the comma with another separator of your choice, for example by using tr
  2. just rename the extension: mv myfile.csv myfile.txt.
ADD REPLYlink written 10 months ago by zx875410.0k

doing that will print: cat test.txt

FBgn0286941,0,0,0,NA,NA,NA,NA,NA,NA,NA,NA,0,0,0,0,0,0

not what I want, I want to print tab-separated fields like this:

FBgn0286941 0 0 0 NA NA NA NA NA NA NA NA 0 0 0 0 0 0

Also, I don't want to replace the file so instead of mv one would use cp to keep the csv and the txt versions

ADD REPLYlink modified 10 months ago • written 10 months ago by ATCG300
1

You can open the CSV file in Excel, again, but ensure that Excel has correctly interpreted that the commas are the field delimiters. If it has correctly interpreted this, then the values will appear in multiple columns. Then save the file as TXT (tab-delimited).

Alternatively, if you have Mac or Linux, open a terminal and convert all commas to tabs via:

sed 's/,/\t/g' test.csv > test.tsv
ADD REPLYlink written 10 months ago by Kevin Blighe71k

This sed command will print:

FBgn0033358t502.2t899.9t104.6t0.1t-3.1t2.94230548037037e-60t4.57293117759163e-56t0t0t876.1t995.1t828.4t93.4t119.6t100.7

ADD REPLYlink written 10 months ago by ATCG300
1

It should not:

echo "FBgn0286941,0,0,0,NA,NA,NA,NA,NA,NA,NA,NA,0,0,0,0,0,0" |\
  sed 's/,/\t/g'
FBgn0286941 0   0   0   NA  NA  NA  NA  NA  NA  NA  NA  0   0   0   0   0   0

The functioning of sed will be system-dependent. I use linux (Ubuntu), and GNU sed v4.2.2.

Thank you.

ADD REPLYlink written 10 months ago by Kevin Blighe71k

Okay, thanks for your help!

ADD REPLYlink written 10 months ago by ATCG300
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1724 users visited in the last hour
_