Question: New to BEDtools, can't get intersect to work
Ashley S.30 wrote:

Hi y'all, I'm new to BEDtools and this type of data analysis in general, and I'm having a lot of trouble getting going.

I have two Excel files that I saved as .txt files, then renamed to .bed and put in the bedtools folder I'm using. Each has a column for chromosome, start site, and end site. The column names are listed on the first line but are commented out with #.

I can use bedtools and follow along with the tutorial and intersect will work. However, when I try to use my own files ("bedtools intersect -a EuarchCons6.bed -b PbxEmx6.bed | head -5") the terminal takes a minute to load and then displays nothing, as if there's no overlap or something, but I'm sure there is. I even created a test file with made up sequences that I knew overlapped with the first several regions in one of the files and it returned nothing.

I'm sure I'm probably missing something simple here, but if anyone could help me understand why intersect doesn't appear to be working that'd be really helpful. Thanks so much.

bedtools • 2.4k views
ADD COMMENTlink modified 3.0 years ago by morovatunc400 • written 3.0 years ago by Ashley S.30

Hi, a couple of checks: 1) Are the chromosome names in both BED files in the same format? Sometimes different genome releases of the same organism use different chromosome naming formats. 2) I am not sure if it's necessary, but are the BED files coordinate-sorted? 3) BED files are tab-delimited. I reckon you are not using a text editor, so make sure that when you save the Excel file as .txt you choose the tab-delimited .txt option.
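
Checks 1 and 2 can be done from the terminal with standard Unix tools. This is a minimal sketch, assuming two small files a.bed and b.bed that are made up here for illustration (substitute your own files). The sort keys are the usual chromosome-then-start ordering.

```shell
# Create two tiny example BED files (hypothetical data, tab-delimited).
tmp=$(mktemp -d)
printf 'chr1\t300\t400\nchr1\t100\t200\nchr2\t50\t80\n' > "$tmp/a.bed"
printf 'chr1\t150\t250\nchr3\t10\t20\n' > "$tmp/b.bed"

# Check 1: list the distinct chromosome names used in each file.
chroms_a=$(cut -f1 "$tmp/a.bed" | sort -u | paste -sd' ' -)
chroms_b=$(cut -f1 "$tmp/b.bed" | sort -u | paste -sd' ' -)
echo "a.bed: $chroms_a"
echo "b.bed: $chroms_b"

# Check 2: coordinate-sort by chromosome (column 1),
# then numerically by start position (column 2).
sort -k1,1 -k2,2n "$tmp/a.bed" > "$tmp/a.sorted.bed"
cat "$tmp/a.sorted.bed"
```

If the two name lists use different styles (for example chr1 in one file and 1 in the other), no intervals can match even when the coordinates overlap.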

ADD REPLYlink written 3.0 years ago by Amitm1.6k

Thanks so much for your reply!

1) I think the chromosome name is in the same format? It's just listed as chr1, chr2, chr3....etc in both files.

2) I'm not actually sure what coord. sorted means, so I'm not sure. Is this something I should do?

3) I did save as the tab delimited .txt option, then once it was saved I changed from .txt to .bed.

If you have any further input or guidance that'd be really helpful...

ADD REPLYlink written 3.0 years ago by Ashley S.30

That is the problem with renaming .txt to .bed directly: you also have to change the line endings of the file. Are you working on Mac OS X, Windows, or Linux? If you are on a Mac, first open the file in a text editor and change the format from Classic Mac (CR) to Unix (LF). Then rename the file to .bed in the terminal, sort it, and finally run your command. It should work.

Try printing the first 5 lines of both files in your question so we can see what the problem is. It might be that the formatting is not correct for a BED file, or that the file is not sorted.
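
When printing the first lines, it helps to make invisible characters visible, since tabs and line endings are exactly what is in doubt here. This is a minimal sketch using od -c, a standard POSIX tool. The file it creates is a hypothetical stand-in for a file saved with Classic Mac (CR) line endings.

```shell
tmp=$(mktemp -d)
# Hypothetical file whose lines end in CR only (no LF anywhere).
printf 'chr1\t100\t200\rchr1\t300\t400\r' > "$tmp/cr.bed"

# od -c shows tabs as \t, carriage returns as \r and newlines as \n.
head -5 "$tmp/cr.bed" | od -c

# Count LF-terminated lines; 0 means Unix tools see one long line.
lf_lines=$(wc -l < "$tmp/cr.bed" | tr -d ' ')
# Count carriage-return characters.
cr_count=$(tr -cd '\r' < "$tmp/cr.bed" | wc -c | tr -d ' ')
echo "LF lines: $lf_lines, CR characters: $cr_count"
```

A file that reports zero LF lines but a non-zero CR count has Classic Mac line endings.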

ADD REPLYlink modified 3.0 years ago • written 3.0 years ago by ivivek_ngs4.8k

Hi, thanks for your help.

Here is the format of my two files:

chr1 5483534 5483544 emx2-pbx1 220 -

chr1 5665040 5665050 emx2-pbx1 226 -

chr1 5693479 5693489 emx2-pbx1 203 -

chr1 8264531 8264541 emx2-pbx1 216 +

chr1 10019964 10019974 emx2-pbx1 220 +

And:

chr1 3000305 3002480 chr1.1

chr1 3002511 3004262 chr1.2

chr1 3004282 3004535 chr1.3

chr1 3017203 3017692 chr1.4

chr1 3017906 3019013 chr1.5

I have them open in Textedit but I'm not quite sure how to change the format or sort like you mentioned.

Sorry these are such basic issues....really don't have much experience with anything like this.

ADD REPLYlink modified 3.0 years ago • written 3.0 years ago by Ashley S.30

This should be fine. bedtools only needs the first 3 columns, and the files seem to be sorted as well.

There should be TextWrangler for Mac. Open the files in TextWrangler, change the format in the bottom bar from Classic Mac (CR) to Unix (LF), and save. Then run the bedtools command on both files from the terminal. If the file was ever saved as a Mac-format .txt, bedtools sees everything as one line, so you need to save it in Unix format.
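
The same conversion can be done in the terminal without a GUI editor. This is a minimal sketch with tr, assuming the file has CR-only (Classic Mac) line endings. For Windows-style CRLF endings you would delete the CR instead, with tr -d '\r'. The file names are hypothetical.

```shell
tmp=$(mktemp -d)
# Hypothetical input with Classic Mac (CR) line endings.
printf 'chr1\t100\t200\rchr1\t300\t400\r' > "$tmp/mac.txt"

# Translate every carriage return into a Unix newline.
tr '\r' '\n' < "$tmp/mac.txt" > "$tmp/unix.bed"

# The converted file should now contain one record per line.
n_lines=$(wc -l < "$tmp/unix.bed" | tr -d ' ')
echo "lines after conversion: $n_lines"
```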

ADD REPLYlink modified 3.0 years ago • written 3.0 years ago by ivivek_ngs4.8k
morovatunc400 wrote:

Hi,

The problem might be related to the tab-delimited format, if you are sure about your columns (chr, start, end, peak name, peak height).

Don't use Excel to edit your tab-delimited text files. When you save a .txt file from Excel, it adds a different character to the end of each line, which messes up the tab-delimited format.

Since bedtools intersect checks for the tab-delimited file format, it might not recognise your files.

How can you solve this? Find an untouched BED file from GEO or somewhere else and try the same command with it. If it works there, then the problem is probably with your BED files.

Hit me up if you need more help.

T.

ADD COMMENTlink written 3.0 years ago by morovatunc400

That does sound like it's definitely my issue. Now my problem is I don't know how to appropriately convert these .txt files into .bed files for bedtools. Is there a recommended way to do this? Thanks!

ADD REPLYlink written 3.0 years ago by Ashley S.30

The file extension is not always important. If your data is in BED format, it will work even with a .txt extension (unless the code specifically rejects files with the wrong extension). So if you can put your data in the format shown below, your problem will be solved. Can you use the command line?

Also, for your next question or issue, it would be better if you specify your system info, for example "I am using Mac OS X Yosemite". That way we can suggest tools that suit your OS.

CHR"TAB"START"TAB"END"TAB"PEAKNAME"TAB"PEAK_HEIGHT
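
One way to get a pasted or hand-edited table into this layout is to let awk rewrite it. This is a minimal sketch, assuming no field contains internal spaces and that header lines start with #. The file names are hypothetical, and rows that fail the check are silently dropped, so compare line counts before and after.

```shell
tmp=$(mktemp -d)
# Hypothetical input: space-separated columns with a commented header.
printf '#chr start end name\nchr1   3000305 3002480 chr1.1\nchr1 3002511   3004262 chr1.2\n' > "$tmp/raw.txt"

# Skip comment lines, re-join the fields with single tabs, and keep only
# rows whose start and end are integers with start not greater than end.
awk -v OFS='\t' '
  /^#/ { next }
  $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && $2 + 0 <= $3 + 0 { $1 = $1; print }
' "$tmp/raw.txt" > "$tmp/clean.bed"

cat "$tmp/clean.bed"
```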

ADD REPLYlink written 3.0 years ago by morovatunc400

Ok I see. I'm using MacOsx El Capitan. I'm usually ok with command line in a terminal.

So right now I have two Textedit files that look like this:

chr1    3000305 3002480 chr1.1
chr1    3002511 3004262 chr1.2
chr1    3004282 3004535 chr1.3
chr1    3017203 3017692 chr1.4
chr1    3017906 3019013 chr1.5

and

chr1    5483534     5483544     emx2-pbx1   220 -
chr1    5665040     5665050     emx2-pbx1   226 -
chr1    5693479     5693489     emx2-pbx1   203 -
chr1    8264531     8264541     emx2-pbx1   216 +
chr1    10019964    10019974    emx2-pbx1   220 +

Is it a problem that they have different columns? The first three in both are still chromosome, start, end site.

ADD REPLYlink written 3.0 years ago by Ashley S.30

Could you try taking only the first four columns of the second file and compare again? The extra columns might be confusing the algorithm. Also, could you check the delimiters with vi?

Run $ vi filename, and once you are in vi, enter :set list. This will show the hidden characters, so you can make sure the file is tab-separated, or at least that both files use the same delimiter.
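
Both suggestions can also be done non-interactively. In this minimal sketch, cut keeps the first four tab-separated columns, and an awk one-liner reports how many tab-separated fields each line has, which flags lines that use spaces instead of tabs. The file names and contents are hypothetical.

```shell
tmp=$(mktemp -d)
# Line 1 is tab-delimited; line 2 uses spaces (a typical copy-paste problem).
printf 'chr1\t5483534\t5483544\temx2-pbx1\t220\t-\nchr1 5665040 5665050 emx2-pbx1 226 -\n' > "$tmp/peaks.bed"

# Keep only the first four tab-separated columns.
# Lines that contain no tab at all are passed through unchanged by cut.
cut -f1-4 "$tmp/peaks.bed" > "$tmp/peaks4.bed"

# Number of tab-separated fields on each line; 1 means no tabs were found.
field_counts=$(awk -F'\t' '{ print NF }' "$tmp/peaks.bed" | paste -sd' ' -)
echo "fields per line: $field_counts"
```

Any line reporting fewer than 3 fields is not being read as chromosome, start, end.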

ADD REPLYlink written 3.0 years ago by morovatunc400