Question

Produce a BED file with genomic coordinates outside a given interval

0

Entering edit mode

4.8 years ago

jacopo.martelossi2 • 0

Hi everyone, I'm carrying on a whole genome alignment analysis between two assemblies belonging to the same specie and obtained with different sequencing methods. In brief, they show a great different in the estimated genome size, with one of them being more in agreement with what expected from flow cytometry and I was interested in characterize the regions of the bigger one (here called as the query genome) that cannot be aligned with the smaller one ( the reference genome). I have already aligned the two assemblies using Mummer v.4 and I have produced a coord file with the following code:

show-coords -THrcl out.qdelta > out.qcoord

In the output are present the alignment coordinates of the reference and the query in a format like this:

[S1] [E1] [S2] [E2] [LEN 1] [LEN 2] [LEN R] [LEN Q] [TAGS]
5080 5651 12880   13453   572 574 58439140 50106 CM018522.1 tig00016503
5083 6078 39063   40043   996 981 58439140 62467 CM018522.1 tig00016642
5217 6079 20844   21712   863 869 58439140 22868 CM018522.1 tig00015536
5330 6079 36215   35465   750 751 58439140 61697 CM018522.1 tig00015298

where [S1] and [E1] represent respectively the start and the end alignment coordinates of the reference contig/scaffold/chromosome while [S2] and [E2] of the query contig/scaffold/chromosome.

I was wondering if there were a fast way to produce something like a BED file with the coordinates of the genomic regions that do not overlap with these intervals from which I can make a bit of filtering and extract them. E.g. Lets suppose that the contig named "tig00016503" ( first row) is 15,000 pb in length, the expected output should be something like this:

tig00016503 1 12879 
tig00016503 13454 15000

I have made some attempts with some bash codes and the intersect function from the bedtools package, but I have not found an easy way without producing a large amount of intermediate files.

Thank you very much in advance for your help!

genome alignment • 1.5k views

ADD COMMENT • link updated 4.8 years ago by Pierre Lindenbaum 166k • written 4.8 years ago by jacopo.martelossi2 • 0

score 3 · Accepted Answer · 2021-02-09

3

Entering edit mode

4.8 years ago

Pierre Lindenbaum 166k

I'm tool lazy to read your post , but I think you only want

bedtools complement

ADD COMMENT • link 4.8 years ago by Pierre Lindenbaum 166k

0

Entering edit mode

Thank you a lot, I completely miss that bedtools function! To much text for such a simple answer.

ADD REPLY • link 4.8 years ago by jacopo.martelossi2 • 0