Question: Bedtools - writing to an output file results in empty files
Sadia10 wrote:

Hello, I have two related questions. I've always used bedtools through Cygwin on my PC but have recently switched to a Mac. It took me a while to get used to the line-ending issue with txt files on the Mac, and even though everyone recommends mac2unix to convert files, it always gives me blank files even though I have the dos2unix package installed. So although I would prefer mac2unix, I am stuck with using "perl -p -e 's/\r/\n/g' < macfile.txt > unixfile.txt".
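A minimal sketch of the conversion step, assuming the input may have either classic Mac (CR) or DOS (CRLF) line endings. The file names are hypothetical. Note that replacing every \r with \n, as in the one-liner above, would leave an extra blank line after each record of a CRLF file; the pattern below also consumes an optional \n after each \r to avoid that.

```shell
# Hypothetical sample files: one with CR-only endings, one with CRLF endings.
printf 'chr8 1 2\rchr5 3 4\r' > cr_only.txt
printf 'chr8 1 2\r\nchr5 3 4\r\n' > crlf.txt

# Replace each CR, together with an optional following LF, by a single LF.
perl -pe 's/\r\n?/\n/g' < cr_only.txt > cr_fixed.txt
perl -pe 's/\r\n?/\n/g' < crlf.txt > crlf_fixed.txt

# Both outputs should now contain two LF-terminated lines.
wc -l < cr_fixed.txt
wc -l < crlf_fixed.txt
```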

In addition, I am currently trying to use the sort function from Bedtools: sort -k1,1 -k2,2n input.bed > sortedinput.bed

When I do not specify an output filename, I can see the sorted results in the Terminal, but when I want the results to go into an output file, the file is just blank. The exact command I am using is "sort -k 1,1 -k2,2n unixfile.txt > unixfile-sort.txt"

Could someone please offer any ideas on how to make sort/bedtools work for me, and point out any issues I may have overlooked when switching from PC to Mac? Thanks.

chip-seq • 464 views
ADD COMMENTlink modified 10 months ago • written 10 months ago by Sadia10

You do not need to pipe mac2unix. Just run it like this: mac2unix file.txt. That will convert file.txt in place.
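A short sketch of in-place conversion versus writing to a new file. The -n (new-file) mode is part of the dos2unix package; the file names are hypothetical, and the tr fallback is only there so the snippet still runs where mac2unix is not installed.

```shell
# Hypothetical input with classic Mac (CR-only) line endings.
printf 'chr8 1 2\rchr5 3 4\r' > macfile.txt

if command -v mac2unix >/dev/null 2>&1; then
    # New-file mode: read macfile.txt, write the converted copy to unixfile.txt.
    mac2unix -n macfile.txt unixfile.txt
    # In-place mode: overwrite macfile.txt with the converted text.
    mac2unix macfile.txt
else
    # Fallback when the dos2unix package is missing: translate CR to LF.
    tr '\r' '\n' < macfile.txt > unixfile.txt
    cp unixfile.txt macfile.txt
fi

# Two lines are expected after conversion.
wc -l < unixfile.txt
```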

ADD REPLYlink written 10 months ago by igor4.4k

Hi Sadia. The sort command is a unix command, which you can do without having bedtools installed. Can you tell us a little bit more about the file that is created after doing the command and can you show us the first few lines of the input file that you are using? Thanks.

ADD REPLYlink written 10 months ago by eromasko120
Sadia10 wrote:

The first few lines of the file I am trying to sort are:

chr8 47953244 47954244 chr8:47953244-47954244 peak 1 chr5 143584660 143585660 chr5:143584660-143585660 peak 2 chr15 38472011 38473011 chr15:38472011-38473011 peak 3 chr5 38738013 38739013 chr5:38738013-38739013 peak 4

The output file I get after sort is simply a blank file. There is no text.

ADD COMMENTlink written 10 months ago by Sadia10

Could you format this better? I can't tell what is a line and what is a column.

ADD REPLYlink written 10 months ago by cbio370

Sorry about that, and I've checked that the file is converted to ASCII text.

chr8 47953244 47954244 chr8:47953244-47954244 peak 1

chr5 143584660 143585660 chr5:143584660-143585660 peak 2

chr15 38472011 38473011 chr15:38472011-38473011 peak 3

chr5 38738013 38739013 chr5:38738013-38739013 peak 4

chr4 128963983 128964983 chr4:128963983-128964983 peak 5

ADD REPLYlink written 10 months ago by Sadia10
apnri40 wrote:

Your bed file seems to be separated by spaces instead of tabs. You can try this:

gsort -t' ' -k1,1V -k2,2n in.bed >out.bed

EDIT: Adding to Alexander's command - to sort the first column correctly, you will need to add a -k1,1V. The updated command should now work for your file.

$ cut -d' ' -f1,2,3 test.txt | gsort -t' ' -k1,1 -k2,2n
chr15 38472011 38473011
chr4 128963983 128964983
chr5 38738013 38739013
chr5 143584660 143585660

Adding -k1,1V

$ cut -d' ' -f1,2,3 test.txt | gsort -t' ' -k1,1V -k2,2n
chr4 128963983 128964983
chr5 38738013 38739013
chr5 143584660 143585660
chr15 38472011 38473011

You can add -u to keep unique rows
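Since BED files are expected to be tab-delimited, another option is to convert the separators first and then sort. A minimal sketch, assuming single spaces between fields and using the first three columns of the sample lines posted above; peaks.txt and peaks.bed are hypothetical names, and -k1,1V needs a sort that supports version ordering (GNU sort, which Homebrew installs as gsort).

```shell
# First three columns of the sample lines from the thread, space-separated.
cat > peaks.txt <<'EOF'
chr8 47953244 47954244
chr5 143584660 143585660
chr15 38472011 38473011
chr5 38738013 38739013
chr4 128963983 128964983
EOF

# Turn every space into a tab so the file is tab-delimited.
tr ' ' '\t' < peaks.txt > peaks.bed

# Sort by chromosome (version order), then numerically by start position.
# LC_ALL=C keeps the ordering independent of the local language settings.
LC_ALL=C sort -k1,1V -k2,2n peaks.bed > peaks.sorted.bed

cat peaks.sorted.bed
```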

ADD COMMENTlink modified 10 months ago • written 10 months ago by apnri40

That didn't seem to work either.

ADD REPLYlink written 10 months ago by Sadia10

It looks like there are no newlines in your file. Can you format the snippet of file contents as code? That would be helpful. Maybe you could also add a screenshot of how the file looks when you open it in vim.
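One way to check for missing newlines without an editor is sketched below, assuming a hypothetical file input.txt written with CR-only endings. Counting LF and CR bytes shows whether line-based tools see any line breaks at all, and od -c prints \r and \n explicitly.

```shell
# Hypothetical file with classic Mac endings: records end in CR, never LF.
printf 'chr8 1 2\rchr5 3 4\r' > input.txt

# Number of LF characters; 0 means line-based tools see no line breaks.
lf_count=$(tr -cd '\n' < input.txt | wc -c | tr -d ' ')
# Number of CR characters; non-zero means the file needs converting.
cr_count=$(tr -cd '\r' < input.txt | wc -c | tr -d ' ')
echo "LF: $lf_count  CR: $cr_count"

# Dump the first bytes with control characters shown as escapes.
head -c 40 input.txt | od -c
```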

ADD REPLYlink modified 10 months ago • written 10 months ago by apnri40
ATPoint580 wrote:

Just a recommendation from experience: do not use the bedtools sort function. Use either the built-in sort of OS X or (my recommendation) get gsort from GNU coreutils, e.g. via Homebrew, which provides a multithreading option. In my experience gsort outperforms both other options.

ADD COMMENTlink written 10 months ago by ATPoint580

I would prefer to use the built-in sort of OS X but I can't get it to work thus far. Can I ask why you would not recommend the bedtools sort function?

ADD REPLYlink written 10 months ago by Sadia10

And which command would be used to sort by chromosome with GNU coreutils?

ADD REPLYlink written 10 months ago by Sadia10

bedtools sort is slower and consumes more memory than the built-in sort; see also the bedtools manual page, where this is mentioned: http://bedtools.readthedocs.io/en/latest/content/tools/sort.html The GNU command would be

gsort -k1,1 -k2,2n, optionally with --parallel=<number of cores> to use several threads and -u to keep only unique entries.
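Put together as one command, this might look like the sketch below. It assumes GNU sort, which is what Homebrew's coreutils installs as gsort; the snippet falls back to plain sort when gsort is not on the PATH, which only works if that sort is also GNU. The file names and the thread count of 2 are placeholders.

```shell
# Hypothetical tab-delimited input containing one duplicated record.
printf 'chr5\t300\t400\nchr4\t100\t200\nchr5\t300\t400\nchr5\t50\t60\n' > in.bed

# Use gsort when available (macOS with Homebrew coreutils), otherwise sort.
SORT=$(command -v gsort || command -v sort)

# --parallel sets the number of sort threads; -u drops lines whose keys repeat.
LC_ALL=C "$SORT" --parallel=2 -u -k1,1 -k2,2n in.bed > out.bed

cat out.bed
```

With -u, uniqueness is judged on the sort keys only, so two records with the same chromosome and start but different ends would collapse into one; adding -k3,3n as a third key keeps them apart.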

ADD REPLYlink modified 10 months ago • written 10 months ago by ATPoint580
Sadia10 wrote:

I have gotten it to do what I want now. First I convert to unix format using

changeNewLine.pl unixfile.txt and check that it has been converted,

file unixfile.txt gives me: unixfile.txt: ASCII text

Then I was able to sort using the same command I was using before which works now:

sort -k1,1 -k2,2n unixfile.txt > unixsort.txt

The only thing I did differently was to open the txt file in Excel and save it again as txt. This allowed changeNewLine.pl to work, which it had not done for me before. Maybe the computer is being moody or maybe I overlooked something basic, but thank goodness it finally works. Thank you everyone for the advice and your time; it was truly appreciated.
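A quick way to confirm that each step worked is sketched below, with hypothetical file contents: [ -s file ] tests that the output is not empty, and sort -c checks that a file is already in the requested order without writing anything.

```shell
# Hypothetical, already converted input (LF line endings).
printf 'chr8 47953244 47954244\nchr5 143584660 143585660\nchr5 38738013 38739013\n' > unixfile.txt

LC_ALL=C sort -k1,1 -k2,2n unixfile.txt > unixsort.txt

# -s is true only when the file exists and has a size greater than zero.
if [ -s unixsort.txt ]; then
    echo "unixsort.txt has content"
else
    echo "unixsort.txt is empty" >&2
fi

# sort -c exits with status 0 when the file is in order for the given keys.
if LC_ALL=C sort -c -k1,1 -k2,2n unixsort.txt; then
    echo "unixsort.txt is sorted"
fi
```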

ADD COMMENTlink written 10 months ago by Sadia10