Question: Find overlap between bam and gff
alvarocentron91 wrote (11 months ago):

Hello, I have a .bam with RNA-seq data, a .gff with the regions I would like to study, and another .gff with repeat-masked positions.

I would like to get a file containing the regions from my .gff that overlap (fully or partially) at least X reads from my .bam and that do not overlap any of the repeat-masked positions.

Any tips?

Many thanks in advance!

h.mon wrote (11 months ago):

Use bedtools subtract to get gff of interest minus gff repeats, then use featureCounts or bedtools coverage using the resulting gff to count reads mapping to the remaining features.
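
A rough sketch of that approach with bedtools, under a few assumptions: the file names and the function name features_with_min_reads are placeholders, and it uses bedtools intersect -v rather than subtract, because subtract by default only trims the overlapping part of a feature (subtract -A would drop the whole feature instead). intersect -c then appends the number of overlapping reads as the last column:

```shell
# Sketch only: function name and arguments are illustrative, not from the thread.
# $1 = features GFF, $2 = repeat-masked GFF, $3 = reads BAM, $4 = minimum read count
features_with_min_reads() {
  # 1) keep only features with no overlap at all in the repeats file (-v)
  # 2) count overlapping reads per remaining feature (-c adds a last column)
  # 3) keep features whose count reaches the threshold
  bedtools intersect -v -a "$1" -b "$2" \
    | bedtools intersect -c -a stdin -b "$3" \
    | awk -F'\t' -v X="$4" '$NF >= X'
}

# Example call (placeholder file names); output is the GFF fields plus a count column:
# features_with_min_reads annotations.gff rmsk.gff reads.bam 10 > answer.tsv
```

For spliced RNA-seq alignments, adding -split to the counting step may be worth considering, so that a read is not counted over the introns it spans.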


alvarocentron91 replied (11 months ago):

Thank you, I will give it a try!
Alex Reynolds wrote (11 months ago):

Convert to BED via the BEDOPS convert2bed helper scripts:

$ gff2bed < annotations.gff > annotations.bed
$ gff2bed < rmsk.gff > rmsk.bed
$ bam2bed < reads.bam > reads.bed

If you want at least X reads that overlap annotations that do not overlap repeatmasked regions:

$ X=1234
$ bedmap --count --echo --delim '\t' annotations.bed reads.bed | awk -vX=${X} '$1 >= X' | cut -f2- | bedops -n 1 - rmsk.bed > answer.bed

(Replace X=1234 with whatever threshold you want.)

The file answer.bed will contain annotations that meet your read threshold and that do not overlap repeatmasked regions.
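
To make the filtering stages concrete, here is a toy run that needs only awk and cut. The input lines are made up to mimic what bedmap --count --echo --delim '\t' emits (the read count, then the original BED fields), and filter_by_count is just a name used for this illustration:

```shell
# Keep lines whose first column (the read count) is >= the threshold,
# then drop that count column so only the BED fields remain.
filter_by_count() {
  awk -v X="$1" '$1 >= X' | cut -f2-
}

# Made-up bedmap-style output: count, chrom, start, end, name.
printf '3\tchr1\t100\t200\tgeneA\n1\tchr1\t300\t400\tgeneB\n2\tchr2\t50\t150\tgeneC\n' \
  | filter_by_count 2
```

With a threshold of 2, only the geneA and geneC lines pass, and they come out without the leading count.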

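You could instead do the conversion, mapping, and filtering with the following one-liner. It avoids making intermediate files, so it should be even faster than the usual BEDOPS speedup. It still expects X to be set as above, and the <(...) process substitution needs bash or a similar shell: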

$ gff2bed < annotations.gff | bedmap --count --echo --delim '\t' - <(bam2bed < reads.bam) | awk -vX=${X} '$1 >= X' | cut -f2- | bedops -n 1 - <(gff2bed < rmsk.gff) > answer.bed

alvarocentron91 replied (11 months ago):

I will try this one as well, so I can compare both. Thank you very much!