How to merge genomic features shared by more than one samples
1
0
Entering edit mode
3.0 years ago
tianshenbio ▴ 170

I have 32 bed files and I hope to generate a unique annotation file from them by merging the features that shows overlapping regions from more than one individual files. For instance, if I got region coordinates: 1-15 and 13-17, the output should be 1-17. Any features only appearing in one sample will be discarded. Is there any tools I could use? I searched bedtools but there seem to be no suitable tools for this task.

bedtools gene RNA-seq bed annotation • 886 views
ADD COMMENT
0
Entering edit mode
3.0 years ago

bedtools multiinter output filtered with awk

ADD COMMENT
0
Entering edit mode

Hi, I searched the bedtools multiinter but the usage page was empty...I wonder if you know how I can use the tool to do that?

ADD REPLY
0
Entering edit mode
$ bedtools multiinter -examples

Tool:    bedtools multiinter (aka multiIntersectBed)
Version: v2.19.1-213-ga1aa1ae-dirty
Summary: Identifies common intervals among multiple
     BED/GFF/VCF files.

Usage:   bedtools multiinter [OPTIONS] -i FILE1 FILE2 .. FILEn
     Requires that each interval file is sorted by chrom/start. 

Options: 
    -cluster    Invoke Ryan Layers's clustering algorithm.

    -header     Print a header line.
            (chrom/start/end + names of each file).

    -names      A list of names (one/file) to describe each file in -i.
            These names will be printed in the header line.

    -g      Use genome file to calculate empty regions.
            - STRING.

    -empty      Report empty regions (i.e., start/end intervals w/o
            values in all files).
            - Requires the '-g FILE' parameter.

    -filler TEXT    Use TEXT when representing intervals having no value.
            - Default is '0', but you can use 'N/A' or any text.

    -examples   Show detailed usage examples.

Example usage:

== Input files: ==

 $ cat a.bed
 chr1  6   12
 chr1  10  20
 chr1  22  27
 chr1  24  30

 $ cat b.bed
 chr1  12  32
 chr1  14  30

 $ cat c.bed
 chr1  8   15
 chr1  10  14
 chr1  32  34

 $ cat sizes.txt
 chr1  5000

== Multi-intersect the files: ==

 $ multiIntersectBed -i a.bed b.bed c.bed
chr1    6   8   1   1   1   0   0
chr1    8   12  2   1,3 1   0   1
chr1    12  15  3   1,2,3   1   1   1
chr1    15  20  2   1,2 1   1   0
chr1    20  22  1   2   0   1   0
chr1    22  30  2   1,2 1   1   0
chr1    30  32  1   2   0   1   0
chr1    32  34  1   3   0   0   1

== Multi-intersect the files, with a header line (titles are the file names): ==

 $ multiIntersectBed -header -i a.bed b.bed c.bed
 chrom  start   end num list    a.bed   b.bed   c.bed
 chr1   6   8   1   1   1   0   0
 chr1   8   12  2   1,3 1   0   1
 chr1   12  15  3   1,2,3   1   1   1
 chr1   15  20  2   1,2 1   1   0
 chr1   20  22  1   2   0   1   0
 chr1   22  30  2   1,2 1   1   0
 chr1   30  32  1   2   0   1   0
 chr1   32  34  1   3   0   0   1

== Multi-intersect the files, with a header line and custom names: ==

 $ multiIntersectBed -header -i a.bed b.bed c.bed -names A B C
 chrom  start   end num list    A   B   C
 chr1   6   8   1   A   1   0   0
 chr1   8   12  2   A,C 1   0   1
 chr1   12  15  3   A,B,C   1   1   1
 chr1   15  20  2   A,B 1   1   0
 chr1   20  22  1   B   0   1   0
 chr1   22  30  2   A,B 1   1   0
 chr1   30  32  1   B   0   1   0
 chr1   32  34  1   C   0   0   1

== Multi-intersect the files, showing empty regions (note, requires -g): ==

 $ multiIntersectBed -header -i a.bed b.bed c.bed -names A B C -empty -g sizes.txt
 chrom  start   end num list    A   B   C
 chr1   0   6   0   none    0   0   0
 chr1   6   8   1   A   1   0   0
 chr1   8   12  2   A,C 1   0   1
 chr1   12  15  3   A,B,C   1   1   1
 chr1   15  20  2   A,B 1   1   0
 chr1   20  22  1   B   0   1   0
 chr1   22  30  2   A,B 1   1   0
 chr1   30  32  1   B   0   1   0
 chr1   32  34  1   C   0   0   1
 chr1   34  5000    0   none    0   0   0
ADD REPLY
0
Entering edit mode

This is very clear!Thank you for your clarification.

ADD REPLY

Login before adding your answer.

Traffic: 1515 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6