Bedtools Intersect with files containing some alternative information
9.0 years ago

Do two files have to have the exact same information in them to intersect the two files? I'm trying to intersect these two files with a format similar to below:

Example of file 1:

chrom   exon_start      exon_end        strand  isoform exon_numer      gene    coding_length   total_mutations_reported        total_exonic_mutations  exonic_splicing_mutations       total_splice_site_mutations     3_ss_mutations  5_ss_mutations
chr17   7125985 7126184 +       NM_000018       10      ACADVL  199     15      11      0       4       2       2
chr17   7126962 7127049 +       NM_000018       12      ACADVL  87      7       4       0       3       1       2
chr11   108016928       108017086       +       NM_000019       11      ACAT1   158     10      7       1       3       2       1
chr12   52307342        52307554        +       NM_000020       4       ACVRL1  212     10      7       0       3       1       2

Example of file 2:

chr1    957580  957842  NM_198576       2       +       AGRN    262     0       0       0       0       0       0       exon    GTTCGGGTCTGGCGGTACTTGAAGGGCAAAGACCTGGTGGCCCGGGAGAGCCTGCTGGACGGCGGCAACAAGGTGGTGATCAGCGGCTTTGGAGACCCCCTCATCTGTGACAACCAGGTGTCCACTGGGGACACCAGGATCTTCTTTGTGAACCCTGCACCCCCATACCTGTGGCCAGCCCACAAGAACGAGCTGATGCTCAACTCCAGCCTCATGCGGATCACCCTGCGGAACCTGGAGGAGGTGGAGTTCTGTGTGGAAG  0.72692929292929
chr1    989132  989357  NM_198576       34      +       AGRN    225     0       0       0       0       0       0       exon    CGAGAAGGCACTGCAGAGCAACCACTTTGAACTGAGCCTGCGCACTGAGGCCACGCAGGGGCTGGTGCTCTGGAGTGGCAAGGCCACGGAGCGGGCAGACTATGTGGCACTGGCCATTGTGGACGGGCACCTGCAACTGAGCTACAACCTGGGCTCCCAGCCCGTGGTGCTGCGTTCCACCGTGCCCGTCAACACCAACCGCTGGTTGCGGGTCGTGGCACATAG       0.72252380952381

I want the output of intersect to contain the information from file 1. However, when I run bedtools intersect -a file1.bed -b file2.bed -wa, I get this error: "Error: unable to open file or unable to determine types for file total_splice_site_mut_greater3.bed". I only want to compare the first three columns of the two files, but I'm assuming that's not possible with bedtools if the files don't contain the same columns. Am I correct? Any ideas on how I can achieve my desired result?

Edit: total_splice_site_mut_greater3.bed is file1.bed

bedtools • 3.0k views

Your file1 doesn't seem to follow proper BED file format.
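One quick way to see which lines break the format is to list every line whose second or third column is not a plain integer. This is only a sketch: check_demo.bed and bad_lines.txt are hypothetical names, and the two printf lines just build a small tab-separated stand-in for the real file.

```shell
# Hypothetical stand-in file: a header line plus one record from the question.
printf 'chrom\texon_start\texon_end\n' > check_demo.bed
printf 'chr17\t7125985\t7126184\n' >> check_demo.bed

# Report the line number and content of every line whose start (column 2)
# or end (column 3) is not a non-negative integer.
awk -F'\t' '$2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ { print NR ": " $0 }' check_demo.bed > bad_lines.txt
cat bad_lines.txt
```

With the stand-in file above, only the header line (line 1) is reported.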

9.0 years ago
Ram 43k

File1.bed has a header line, so columns 2 and 3 of its first line are not numeric, and bedtools cannot determine the file type. Try writing the output of tail -n +2 File1.bed to File1_beheaded.bed and then running bedtools on that file.
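A minimal sketch of that suggestion, assuming tab-separated input. The printf lines only create a small stand-in for File1.bed so the snippet runs on its own; omit them when working with the real file. intersect_out.bed is a hypothetical output name, and the bedtools step runs only if bedtools is installed and file2.bed is present.

```shell
# Stand-in for File1.bed: header plus two records from the question.
# Omit these three lines when using your real file.
printf 'chrom\texon_start\texon_end\tstrand\n' > File1.bed
printf 'chr17\t7125985\t7126184\t+\n' >> File1.bed
printf 'chr17\t7126962\t7127049\t+\n' >> File1.bed

# Drop the header: tail -n +2 prints everything from the second line onward.
tail -n +2 File1.bed > File1_beheaded.bed

# Run the intersect from the question on the header-free file.
# -wa writes the original entry from -a for each overlap.
if command -v bedtools >/dev/null 2>&1 && [ -f file2.bed ]; then
    bedtools intersect -a File1_beheaded.bed -b file2.bed -wa > intersect_out.bed
fi
```

The two files do not need matching extra columns for this; the overlap is computed from the chromosome, start, and end columns.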


Thank you. I should have noticed that.
