filtering out reads from bam file
2
1
Entering edit mode
7.1 years ago
sgmiriuka ▴ 10

Dear all,

this may be simple for others, not for me... Is there any way to filter out reads from a bam file based on a gff3 file information? My purpose is to get out the microRNA info from bams to run analysis without this 'contaminating' data.

thanks!!!

sgm

bam gff3 filter RNA-seq • 3.6k views
ADD COMMENT
1
Entering edit mode

You could convert your gff3 file to BED and then use samtools view region: Extracting reads from multiple regions

ADD REPLY
1
Entering edit mode

doesn't bedtools intersect -v -abam bam1 -b gff3 work???

ADD REPLY
1
Entering edit mode

As genomax2 said, you can use samtools view with the -L option. Transform your gff3 to BED and then make the complement of that file. Then use this file for -L

samtools view -L complement.bed input.bam -o output_filtered.bam
ADD REPLY
0
Entering edit mode

thanks all of you! great help

ADD REPLY
0
Entering edit mode
7.0 years ago
jlncrnt • 0

If you have only a bam and a gff3 file, you can use GenomicAlignments:

# Import gff as GRanges
gff <- import.gff3(pathgff)
# Import your alignments 
aln <- readGAlignments("myAlns.bam")
# Filter out the rRNA annotated ranges
no.miRNA <- gff[mcols(gff)$type!="miRNA_gene",] # Example with an SGD file
# Then get GenomicAlignments on filtered GRange object
filtered.bam <- subsetByOverlaps(aln,no.miRNA)
#Then eventually export the bam
export(filtered.bam,"filtered.bam",format = "BAM")

Use carefully, you will loose the headers.

If you have a fasta file of the microRNA, a better method would be to use the --un-gz option on Bowtie2, keeping unaligned reads while mapping on generated microRNA index.

ADD COMMENT
0
Entering edit mode
7.0 years ago

Here are some commands to run this quickly:

$ gff2bed < annotations.gff > annotations.bed
$ bam2bed < reads.bam > reads.bed
$ bedops --element-of 1 reads.bed annotations.bed > answer.bed

Or if you are using bash, this will avoid making intermediate files:

$ bedops --element-of 1 <(bam2bed < reads.bam) <(gff2bed < annotations.gff) > answer.bed

If you know the specific feature type of GFF elements you want, you could filter reads even more specifically:

$ bedops --element-of 1 <(bam2bed < reads.bam) <(gff2bed < annotations.gff | grep miRNA -) > answer.bed

Use Unix pipes and streams where you can; it will save you a lot of your time over alternatives.

ADD COMMENT
0
Entering edit mode

OP either wants to extract reads out in a new file or filter bam files to remove the reads (it is not completely clear). This answer in the present form does not seem to be doing either. Unless I am misinterpreting.

ADD REPLY
0
Entering edit mode

Maybe you are? Not sure.

ADD REPLY
0
Entering edit mode

well, this is the case. I have a bunch of bam files from small-rnaseq analyses, and I want to take out the info on microRNAs (coming from the gff3 file from mirbase). I do want the bam file left without the microRNA information and work with it. This file will allow me to align moRNA reads; doing so without taking out microRNA-related reads is a problem for microRNAs closely packed in clusters. hope its clear. thanks for your help! sgm

ADD REPLY
0
Entering edit mode

If you want reads without microRNAs, use --not-element-of instead of --element-of, with the grep miRNA statement. This would filter reads in reads.bam for those reads which do not overlap miRNA features from annotations.gff. In other words:

$ bedops --not-element-of 1 <(bam2bed < reads.bam) <(gff2bed < annotations.gff | grep miRNA -) > answer.bed

There are tools available for converting BED to BAM, in case you need that format, instead.

Hope this helps!

ADD REPLY

Login before adding your answer.

Traffic: 2447 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6