Question: Find overlap between bam and gff
alvarocentron91 wrote (22 months ago):

Hello, I have a .bam file with RNA-seq data, a .gff with the regions I would like to study, and another .gff with repeat-masked positions.

I would like to get a file containing the regions from my .gff that overlap (completely or partially) at least X reads from my .bam and that do not overlap any repeat-masked positions.

Any tips?

Many thanks in advance!

h.mon wrote (22 months ago):

Use bedtools subtract to remove the repeat regions from the GFF of interest, then run featureCounts or bedtools coverage with the resulting GFF to count the reads mapping to the remaining features.
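
A minimal sketch of this approach, assuming bedtools 2.24 or later (where `-a` is the feature file for `bedtools coverage`). The function names, the file names in the usage comment, and the threshold are hypothetical. Note that `-A` makes `bedtools subtract` drop any feature that overlaps a repeat at all; without it, only the overlapping portion of each feature is trimmed, so pick whichever matches what "no overlap" should mean for your data.

```shell
# Keep only rows whose last tab-separated column (the read count appended
# by `bedtools coverage -counts`) is at least the threshold passed as $1.
keep_min_count() {
    awk -F'\t' -v min="$1" '$NF >= min'
}

# Hypothetical wrapper. Arguments: annotations GFF, repeats GFF, BAM, threshold.
features_with_reads() {
    bedtools subtract -A -a "$1" -b "$2" |
        bedtools coverage -counts -a stdin -b "$3" |
        keep_min_count "$4"
}

# Usage (placeholder file names, threshold of 10 reads):
# features_with_reads annotations.gff rmsk.gff reads.bam 10 > answer.gff
```

The output keeps the original GFF columns with the read count appended as a final column, so the count can be inspected or stripped afterwards.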

alvarocentron91 replied (22 months ago):

Thank you, I will give it a try!

Alex Reynolds wrote (22 months ago):

Convert to sorted BED via the BEDOPS convert2bed wrapper scripts:

$ gff2bed < annotations.gff > annotations.bed
$ gff2bed < rmsk.gff > rmsk.bed
$ bam2bed < reads.bam > reads.bed

To keep annotations that overlap at least X reads and that do not overlap repeat-masked regions:

$ X=1234
$ bedmap --count --echo --delim '\t' annotations.bed reads.bed | awk -vX=${X} '$1 >= X' | cut -f2- | bedops -n 1 - rmsk.bed > answer.bed

(Replace X=1234 with whatever threshold you want.)

The file answer.bed will contain the annotations that meet your read threshold and do not overlap repeat-masked regions.
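
The filtering stage can be checked in isolation, without BEDOPS installed. The sketch below feeds made-up rows, shaped like the output of `bedmap --count --echo --delim '\t'` (read count first, then the annotation), through the same `awk` and `cut` steps. The gene names, coordinates, and threshold are invented for illustration.

```shell
X=3
# Made-up rows (not real data): read count, then a BED-style annotation.
# awk keeps rows with at least X reads; cut then drops the count column.
filtered=$(printf '5\tchr1\t100\t200\tgeneA\n1\tchr1\t300\t400\tgeneB\n3\tchr2\t50\t150\tgeneC\n' |
    awk -v X="${X}" '$1 >= X' |
    cut -f2-)
printf '%s\n' "$filtered"
```

With X=3, the geneA and geneC rows survive (5 and 3 reads) and geneB is dropped (1 read).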

You could instead do conversion, mapping, and filtering in a single pipeline, which avoids writing intermediate files and so should be faster still. It relies on process substitution, so it needs bash or a similar shell, and X must already be set:

$ gff2bed < annotations.gff | bedmap --count --echo --delim '\t' - <(bam2bed < reads.bam) | awk -vX=${X} '$1 >= X' | cut -f2- | bedops -n 1 - <(gff2bed < rmsk.gff) > answer.bed

alvarocentron91 replied (22 months ago):

I will try this one as well so I can compare both. Thank you very much!