Bedtools: writing to an output file results in an empty file
Sadia, 7.6 years ago

Hello, I have two questions. I've always used bedtools through Cygwin on my PC but have recently switched to a Mac. It took me a while to get used to the line-ending issue with .txt files on the Mac. Everyone recommends mac2unix to convert files, but it always gives me blank files, even though I have the dos2unix package installed. So although I would prefer mac2unix, I am stuck with using:

perl -p -e 's/\r/\n/g' < macfile.txt > unixfile.txt

I am also trying to use the sort function from Bedtools:

sort -k1,1 -k2,2n input.bed > sortedinput.bed

When I do not specify an output file, I can see the sorted results in the Terminal, but when I redirect the results into an output file, the file is just blank. The exact command I am using is:

sort -k 1,1 -k2,2n unixfile.txt > unixfile-sort.txt

Could someone please offer any ideas on how to make sort/bedtools work for me, and point out any issues I may not have addressed in switching from PC to Mac? Thanks.

ChIP-Seq

You do not need to pipe mac2unix. Just run it as mac2unix file.txt, which converts file.txt in place.
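One common way to end up with an empty file (we can't tell from the thread whether this is what happened here) is redirecting a command's output onto the same file it reads, e.g. mac2unix < file.txt > file.txt. The shell truncates the target of > before the command starts, so the command reads nothing. The sketch below shows the effect with sort, since mac2unix may not be installed everywhere; peaks.txt and its rows are placeholders:

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'chr5\t200\t300\nchr1\t100\t200\n' > peaks.txt

# Wrong: the shell empties peaks.txt for the '>' redirection before sort reads it.
sort -k1,1 -k2,2n peaks.txt > peaks.txt
size_after=$(wc -c < peaks.txt | tr -d ' ')
echo "bytes left after in-place redirect: $size_after"

# Safer: write to a different file, then replace the original only on success.
printf 'chr5\t200\t300\nchr1\t100\t200\n' > peaks.txt
sort -k1,1 -k2,2n peaks.txt > peaks.sorted.tmp && mv peaks.sorted.tmp peaks.txt

# sort can also handle this itself: the -o output file may be one of its inputs.
sort -k1,1 -k2,2n -o peaks.txt peaks.txt
cat peaks.txt
```

Running mac2unix file.txt directly avoids the problem because the tool rewrites the file itself instead of relying on a shell redirect.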

Hi Sadia. sort is a standard Unix command, so you can run it without having bedtools installed. Can you tell us a bit more about the file that is created by the command, and show us the first few lines of the input file you are using? Thanks.

Sadia, 7.6 years ago

The first few lines of the file I am trying to sort are:

chr8 47953244 47954244 chr8:47953244-47954244 peak 1 chr5 143584660 143585660 chr5:143584660-143585660 peak 2 chr15 38472011 38473011 chr15:38472011-38473011 peak 3 chr5 38738013 38739013 chr5:38738013-38739013 peak 4

The output file I get after sort is simply a blank file. There is no text.

Could you format this better? I can't tell what is a line and what is a column.

Sorry about that. I've also checked that the file has been converted to ASCII text.

chr8 47953244 47954244 chr8:47953244-47954244 peak 1
chr5 143584660 143585660 chr5:143584660-143585660 peak 2
chr15 38472011 38473011 chr15:38472011-38473011 peak 3
chr5 38738013 38739013 chr5:38738013-38739013 peak 4
chr4 128963983 128964983 chr4:128963983-128964983 peak 5

apnri, 7.6 years ago

Your BED file seems to be separated by spaces instead of tabs. You can try this:

gsort -t' ' -k1,1V -k2,2n in.bed >out.bed

EDIT: Adding to Alexander's command: to sort the first column in natural chromosome order (chr4 before chr15), use -k1,1V (version sort) instead of plain -k1,1. The updated command above includes this and should now work for your file.

$ cut -d' ' -f1,2,3 test.txt | gsort -t' ' -k1,1 -k2,2n
chr15 38472011 38473011
chr4 128963983 128964983
chr5 38738013 38739013
chr5 143584660 143585660

With -k1,1V added:

$ cut -d' ' -f1,2,3 test.txt | gsort -t' ' -k1,1V -k2,2n
chr4 128963983 128964983
chr5 38738013 38739013
chr5 143584660 143585660
chr15 38472011 38473011

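You can add -u to drop duplicates. Note that when keys are given with -k, sort judges uniqueness on the keys only, so rows that share chromosome and start position are collapsed even if their other columns differ.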
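To illustrate the -u caveat, here is a small sketch with made-up rows in the same space-separated layout (test.txt and its contents are hypothetical). It assumes a sort that supports -V, such as GNU sort or gsort:

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two rows share chromosome and start but differ in end and name.
printf 'chr5 100 200 peakA\nchr5 100 350 peakB\nchr4 50 80 peakC\n' > test.txt

# With -k keys, -u keeps only the first line of each run of equal *keys*,
# so one of the two chr5/100 rows disappears even though the rows differ.
sort -t' ' -k1,1V -k2,2n -u test.txt > unique_by_key.txt

# To remove only rows identical in every column, sort without -u and then
# de-duplicate whole lines. When keys tie, sort falls back to comparing the
# whole line, so identical rows end up adjacent, which is what uniq needs.
sort -t' ' -k1,1V -k2,2n test.txt | uniq > unique_rows.txt

wc -l < unique_by_key.txt   # 2 lines
wc -l < unique_rows.txt     # 3 lines
```

If the goal is only to drop exact duplicate rows, the second form is the safer choice.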

That didn't seem to work either.

It looks like there are no newlines in your file. Can you format the snippet of the file contents as code? That would be helpful. Maybe you could also add a screenshot of how the file looks when you open it in vim.
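A quick way to check what the file actually contains, without opening an editor, is to count the line-ending bytes. wc -l only counts LF (\n), so a file with bare CR endings reports 0 lines. Here is a sketch using a hypothetical two-line sample with old Mac endings (the sample content is made up for illustration):

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample with classic Mac (CR-only) line endings.
printf 'chr8 47953244 47954244\rchr5 143584660 143585660\r' > unixfile.txt

# Keep only LF or only CR bytes, then count them.
lf_count=$(tr -cd '\n' < unixfile.txt | wc -c | tr -d ' ')
cr_count=$(tr -cd '\r' < unixfile.txt | wc -c | tr -d ' ')
echo "LF: $lf_count  CR: $cr_count"

# od -c prints the raw bytes, so \r and \n show up explicitly.
od -c unixfile.txt | head -n 5
```

Roughly: only LF means Unix endings, only CR means old Mac endings, and equal CR and LF counts suggest Windows CRLF. Running file unixfile.txt gives a similar summary; for a CR-only file it reports something like "ASCII text, with CR line terminators".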

ATpoint, 7.6 years ago

A recommendation from experience: do not use the bedtools sort function. Use either the built-in sort of OS X or (my recommendation) gsort from GNU coreutils, available e.g. via Homebrew, which provides a multithreading option. In my experience, gsort is faster than both of the other options.

I would prefer to use the built-in sort of OS X, but I haven't been able to get it to work so far. May I ask why you would not recommend the bedtools sort function?

And which command would you use to sort by chromosome with GNU coreutils?

bedtools sort is slower and consumes more memory than the built-in sort, as the bedtools manual page also mentions: http://bedtools.readthedocs.io/en/latest/content/tools/sort.html

The GNU command would be:

gsort -k1,1 -k2,2n, adding --parallel=N (where N is the number of cores) and, if desired, -u to keep only unique entries (uniqueness is judged on the sort keys).
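Here is a sketch of that GNU command as a complete invocation. It assumes GNU coreutils was installed via Homebrew (which provides GNU sort as gsort) and falls back to plain sort, which is GNU sort on most Linux systems. The LC_ALL=C setting is an addition, not part of the answer above: it makes sort compare raw bytes, which is locale-independent and usually faster. The thread count of 4 and the sample rows are placeholders:

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'chr5\t143584660\t143585660\nchr15\t38472011\t38473011\nchr5\t38738013\t38739013\nchr4\t128963983\t128964983\n' > peaks.bed

# Prefer Homebrew's gsort; otherwise assume the system sort is GNU sort.
SORT=$(command -v gsort || command -v sort)

# LC_ALL=C: plain byte-order comparison for the chromosome column.
# --parallel=4: number of sorting threads (a GNU sort option).
LC_ALL=C "$SORT" -k1,1 -k2,2n --parallel=4 peaks.bed > peaks.sorted.bed
cat peaks.sorted.bed
```

With a plain -k1,1 under LC_ALL=C, chromosome names sort lexicographically (chr15 before chr4); swap in -k1,1V if you prefer natural chromosome order.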

Sadia, 7.6 years ago

I've got it to do what I want now. First I convert the file to Unix format using:

changeNewLine.pl unixfile.txt

and check that it has been converted:

file unixfile.txt

which gives me: unixfile.txt: ASCII text

Then I was able to sort using the same command as before, which now works:

sort -k1,1 -k2,2n unixfile.txt > unixsort.txt

The only thing I did differently was to open the .txt file in Excel and save it again as .txt. That allowed changeNewLine.pl to work, which it hadn't been doing for me before. Maybe the computer was being moody or maybe I overlooked something basic, but thank goodness it finally works. Thank you, everyone, for your advice and your time; it was truly appreciated.
