I have a BAM file that I'd like to filter. I want to filter out all the reads that were aligned to intronic regions, i.e., reads whose CIGAR field contains an N.
Anyone familiar with a way to filter out reads with CIGAR field containing an N?
Note: I could convert all the BAMs to SAMs and then write my own custom script, but I was wondering whether it'd be possible with samtools or Picard tools directly. I couldn't find any direct instructions.
Note2: The bam was generated by aligning mRNA-Seq to the genome.
Please let me know.
I'm also getting reads that were unmapped, so I added a filter to make sure that $6 (the CIGAR field) is real and that the read is mapped but unspliced.
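A minimal sketch of that kind of filter, in case it helps. This is not the script posted in the thread. It assumes SAM text as input (for example from samtools view -h), and the names keep_record and filter_unspliced are made up for illustration. It relies on the SAM spec's field order (FLAG is column 2, CIGAR is column 6) and on 0x4 being the "segment unmapped" flag bit.

```python
UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def keep_record(line):
    """Return True for header lines and for mapped, unspliced alignments."""
    if line.startswith("@"):
        return True  # keep the header so the output can be converted back to BAM
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])   # column 2: FLAG
    cigar = fields[5]       # column 6: CIGAR ($6 in awk)
    if flag & UNMAPPED:
        return False        # read is unmapped
    if cigar == "*":
        return False        # no real CIGAR available
    return "N" not in cigar  # N marks a skipped region, i.e. a spliced read


def filter_unspliced(lines):
    """Yield only the SAM lines that pass keep_record."""
    for line in lines:
        if keep_record(line):
            yield line
```

Saved as a hypothetical filter_unspliced.py with a final line such as sys.stdout.writelines(filter_unspliced(sys.stdin)) (plus import sys), it could sit in a pipe between samtools view -h in.bam and samtools view -b - so that no intermediate SAM file is written to disk.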
@brentp's script will also work, with a slight modification to filter out unmapped reads (bit 0x4):
@brentp, thanks, looks like what I was looking for. I thought I could do it using samtools directly (no conversion to sam) but this is good enough.
Do you actually want to filter out reads aligning to introns, or reads that are spliced? Filtering out reads with an N in the CIGAR string will remove spliced reads, which typically do not map to an intron but rather span one.
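To make the distinction concrete, here is a rough sketch that turns a start position and a CIGAR string into the reference blocks a read actually covers. The function name aligned_blocks is hypothetical, and coordinates are 1-based and inclusive as in SAM. An N operation advances along the reference without aligning any bases, so the intron ends up in the gap between two blocks rather than inside one.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_blocks(pos, cigar):
    """Return (start, end) reference blocks, split at every N operation."""
    blocks = []
    ref = pos            # next reference position to be consumed
    block_start = None   # start of the block currently being built
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            # Skipped region: close the current block and jump over the gap.
            if block_start is not None:
                blocks.append((block_start, ref - 1))
                block_start = None
            ref += length
        elif op in "MD=X":
            # These operations consume reference bases within a block.
            if block_start is None:
                block_start = ref
            ref += length
        # I, S, H and P do not consume the reference, so they are ignored here.
    if block_start is not None:
        blocks.append((block_start, ref - 1))
    return blocks


print(aligned_blocks(100, "50M1000N50M"))  # [(100, 149), (1150, 1199)]
```

In that example the read covers two exonic blocks and skips positions 150 to 1149, so it spans the intron without any of its bases aligning inside it.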
@dpryan, thanks. Yes, I'd like to filter out reads that are spliced, i.e., that span an intron.