Question: multiIntersectBed for SNPs, need to compare but not merge intervals
0
gravatar for Adrian Pelin
2.9 years ago by
Adrian Pelin2.3k
Canada
Adrian Pelin2.3k wrote:

Hello,

I have cases when there are SNPs called adjacent to each other, ex:

scaffold_1  106 107 C/A +
scaffold_1  107 108 G/A +

This is probably because this is not an SNP but an MNP (multiple nucleotide polymorphism?) where more than one bp changes at a time. Regardless, I would like to treat these as SNPs for the time being because they aren't that many and are not affecting my results.

I want to compare different isolate with multiIntersectBed, but the problem I am getting is that bedttools is merging such intervals, creating a common one 106 to 108, which really screws up the analysis. To show you what I mean:

@OHRI-bell-virt /scratch/temp_Test $ head ABWUN.NUMALT1.SNP.bed
scaffold_1  54  55  A/T +
scaffold_1  55  56  G/T +
scaffold_1  62  63  G/A +
scaffold_1  103 104 T/C +
scaffold_1  106 107 C/A +
scaffold_1  107 108 G/A +
scaffold_1  124 125 G/T +
scaffold_1  158 159 A/T +
scaffold_1  162 163 T/C +
scaffold_1  218 219 T/G +

@OHRI-bell-virt /scratch/temp_Test $ cp ABWUN.NUMALT1.SNP.bed ABWUN.NUMALT1.SNP.bed2

@OHRI-bell-virt /scratch/temp_Test $ wc -l ABWUN.NUMALT1.SNP.be*
  458044 ABWUN.NUMALT1.SNP.bed
  458044 ABWUN.NUMALT1.SNP.bed2
  916088 total

@OHRI-bell-virt /scratch/temp_Test $ multiIntersectBed -i ABWUN.NUMALT1.SNP.bed ABWUN.NUMALT1.SNP.bed2 > ABWUN.NUMALT1.SNP.multiIntersect.bed

@OHRI-bell-virt /scratch/temp_Test $ wc -l ABWUN.NUMALT1.SNP.multiIntersect.bed
402231 ABWUN.NUMALT1.SNP.multiIntersect.bed

@OHRI-bell-virt /scratch/temp_Test $ head ABWUN.NUMALT1.SNP.multiIntersect.bed
scaffold_1  54  56  2   1,2 1   1
scaffold_1  62  63  2   1,2 1   1
scaffold_1  103 104 2   1,2 1   1
scaffold_1  106 108 2   1,2 1   1
scaffold_1  124 125 2   1,2 1   1
scaffold_1  158 159 2   1,2 1   1
scaffold_1  162 163 2   1,2 1   1
scaffold_1  218 219 2   1,2 1   1
scaffold_1  225 226 2   1,2 1   1
scaffold_1  239 240 2   1,2 1   1

The fact that the merged file has less intervals (because it merged 106 to 107 and 107 to 108) is creating problems, because I want to compare amount of SNPs present in a given isolate compared to all. When I create a multiIntersect of all isolates and then use it to find out the proportion of SNPs in any given isolate present in at least 1 or 2, I sometimes get more intervals than SNPs in an isolate.

Any idea how to compare with multiIntersect but not merge intervals? Thanks, Adrian

snp intersect bedtools • 785 views
ADD COMMENTlink modified 2.9 years ago by Alex Reynolds29k • written 2.9 years ago by Adrian Pelin2.3k
1
gravatar for Alex Reynolds
2.9 years ago by
Alex Reynolds29k
Seattle, WA USA
Alex Reynolds29k wrote:

You can use bedops and bedmap --count to filter N BED files for at least X mutual exact overlaps:

$ bedops --everything file1.bed file2.bed file3.bed .. fileN.bed \
    | bedmap --count --echo --exact --delim '\t' - \
    | awk -voverlaps=${X} '$1 >= overlaps' \
    | cut -f2- \
    > answer.bed

Set $X to 1, 2, etc. Note that, when counting overlaps in a multiset union, an element will always overlap itself at least once, because it will show up in the multiset union at least once. So a threshold of 2 or more is probably more useful for this counting exercise, generally, unless you want to count everything.

ADD COMMENTlink modified 2.9 years ago • written 2.9 years ago by Alex Reynolds29k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 785 users visited in the last hour