Question: Bedtools Intersect with files containing some alternative information
christylynn002 (United States) wrote 3.7 years ago:

Do two files have to contain exactly the same information for them to be intersected? I'm trying to intersect two files formatted like the examples below:

Example of file 1:

chrom   exon_start      exon_end        strand  isoform exon_numer      gene    coding_length   total_mutations_reported        total_exonic_mutations  exonic_splicing_mutations       total_splice_site_mutations     3_ss_mutations  5_ss_mutations

chr17   7125985 7126184 +       NM_000018       10      ACADVL  199     15      11      0       4       2       2

chr17   7126962 7127049 +       NM_000018       12      ACADVL  87      7       4       0       3       1       2

chr11   108016928       108017086       +       NM_000019       11      ACAT1   158     10      7       1       3       2       1

chr12   52307342        52307554        +       NM_000020       4       ACVRL1  212     10      7       0       3       1       2

 

Example of file 2:

chr1    957580  957842  NM_198576       2       +       AGRN    262     0       0       0       0       0       0       exon    GTTCGGGTCTGGCGGTACTTGAAGGGCAAAGACCTGGTGGCCCGGGAGAGCCTGCTGGACGGCGGCAACAAGGTGGTGATCAGCGGCTTTGGAGACCCCCTCATCTGTGACAACCAGGTGTCCACTGGGGACACCAGGATCTTCTTTGTGAACCCTGCACCCCCATACCTGTGGCCAGCCCACAAGAACGAGCTGATGCTCAACTCCAGCCTCATGCGGATCACCCTGCGGAACCTGGAGGAGGTGGAGTTCTGTGTGGAAG  0.72692929292929

chr1    989132  989357  NM_198576       34      +       AGRN    225     0       0       0       0       0       0       exon    CGAGAAGGCACTGCAGAGCAACCACTTTGAACTGAGCCTGCGCACTGAGGCCACGCAGGGGCTGGTGCTCTGGAGTGGCAAGGCCACGGAGCGGGCAGACTATGTGGCACTGGCCATTGTGGACGGGCACCTGCAACTGAGCTACAACCTGGGCTCCCAGCCCGTGGTGCTGCGTTCCACCGTGCCCGTCAACACCAACCGCTGGTTGCGGGTCGTGGCACATAG       0.72252380952381

 

I want the output of the intersect to contain the information from file 1. However, when I run bedtools intersect -a file1.bed -b file2.bed -wa, I get this error: "Error: unable to open file or unable to determine types for file total_splice_site_mut_greater3.bed". I only want to compare the first three columns of the two files, but I'm assuming that isn't possible with bedtools if the files don't contain the same information. Am I correct? Any ideas on how I can achieve my desired result?

Edit: total_splice_site_mut_greater3.bed is file1.bed.


Your file1 doesn't seem to follow the proper BED file format.

Comment written 3.7 years ago by raunakms
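
One way to see which lines break the format is to list every line whose second or third column is not a plain integer. This is only a minimal sketch: the file names demo_check.bed and demo_bad_lines.txt are hypothetical stand-ins, the sample rows are cut down to the first three columns of file1 from the question, and the check covers only the numeric start/end requirement, not every rule of the BED format.

```shell
# Hypothetical sample: a header line plus one data row, tab-separated,
# mirroring the first three columns of file1 in the question.
printf 'chrom\texon_start\texon_end\n' > demo_check.bed
printf 'chr17\t7125985\t7126184\n' >> demo_check.bed

# Report the line number and content of every line whose 2nd or 3rd
# tab-separated field is not a non-negative integer.
awk -F'\t' '$2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ { print NR ": " $0 }' \
    demo_check.bed > demo_bad_lines.txt

cat demo_bad_lines.txt
```

With this sample, only the header line is reported, which matches the problem in file1.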
RamRS (Houston, TX) wrote 3.7 years ago:

File1.bed has a header line, so columns 2 and 3 of its first row are not numeric. Try writing the output of tail -n +2 File1.bed to a new file, File1_beheaded.bed, and then running bedtools on that file.
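
The steps above could look roughly like the following. Treat it as a sketch under assumptions: demo_file1.bed, demo_file1_beheaded.bed and demo_hits.bed are hypothetical names, the sample rows keep only the first three columns from the question, and the bedtools call is skipped unless bedtools is installed and a file2.bed is present in the working directory.

```shell
# Hypothetical stand-in for file1: a header line followed by two data rows.
printf 'chrom\texon_start\texon_end\n' > demo_file1.bed
printf 'chr17\t7125985\t7126184\n' >> demo_file1.bed
printf 'chr17\t7126962\t7127049\n' >> demo_file1.bed

# Drop the header: tail -n +2 prints the file starting at line 2.
tail -n +2 demo_file1.bed > demo_file1_beheaded.bed

# Intersect only when bedtools and file2.bed are actually available.
# -wa writes the original entry from the -a file for each overlap.
if command -v bedtools >/dev/null 2>&1 && [ -f file2.bed ]; then
    bedtools intersect -a demo_file1_beheaded.bed -b file2.bed -wa > demo_hits.bed
fi
```

Writing to a new file rather than editing in place keeps the original table, header included, for later use.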


Thank you. I should have noticed that.

Reply written 3.7 years ago by christylynn002