Understanding IntersectBed columns
5.8 years ago

I'm running intersectBed as follows:

intersectBed -a introns.bed -b accepted_hits.bam -wao > result.bed

Here, introns.bed is a BED file containing all introns in hg38, and accepted_hits.bam is the alignment output from STAR.

Here is a sample row from result.bed:

chr1 13220 14409 exon:NR_046018:3 . + refGene exon . ID=exon:NR_046018:3;Parent=NR_046018 chr1 13763 13864 SRR2149928_MCF10A_R1.17281814 0 - 101

In total there are 17 columns. I am a bit confused about what the 15th and 17th columns represent; in the example above their values are 0 and 101, respectively.

Can anyone guide me as to what they are?


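The last column is the number of base pairs of overlap between the two features. It is added by the -wao option, which also reports -a features that overlap nothing (with a null B feature and an overlap of 0). The columns before it are the full entry from -a followed by the entry from -b. Only the last column is appended by bedtools; everything else comes from your input files, except that alignments from a BAM file given as -b are reported in 6-column BED format (chromosome, start, end, read name, mapping quality, strand).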
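The layout can be checked mechanically on any output line. The sketch below labels each field of a single -wao line; the helper name label_wao_line is hypothetical, and it assumes the -a file has 10 columns (as in the sample row) and that each BAM alignment in B is reported as BED6.

```shell
#!/usr/bin/env bash
# Hypothetical helper: label_wao_line NA
#   NA = number of columns in the -a file (10 for the sample row).
# Assumes the B feature is a BAM alignment reported as BED6, and that the
# final column is the base-pair overlap added by -wao.
label_wao_line() {
  awk -v na="$1" '
    BEGIN { split("chrom,start,end,read name,MAPQ,strand", b6, ",") }
    {
      for (i = 1; i <= NF; i++) {
        if (i <= na)     label = "A col " i            # fields copied from -a
        else if (i < NF) label = "B " b6[i - na]       # BED6 fields from the BAM
        else             label = "bp overlap (-wao)"   # appended by bedtools
        printf "%2d  %-18s %s\n", i, label, $i
      }
    }'
}
```

Usage: `head -n 1 result.bed | label_wao_line 10`. For the sample row, field 15 is labelled as the MAPQ and field 17 as the overlap.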


That makes sense, but I'm still unsure about the 15th column. I understand it comes from the BAM file, but not what it represents.


OK, sorry, I misunderstood your initial question. This column comes from the 5th field of the BAM (SAM) record: the mapping quality (MAPQ) of the read alignment. Here it is 0, which marks a multimapper, i.e. a read that aligned equally well to several locations (with STAR, MAPQ 0 specifically means the read mapped to more than four loci). This is not unusual at these coordinates: the read lies at the very start of chromosome 1, which is a repetitive region.
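If multimappers should be excluded, one option is to filter the result on that column. The sketch below is a hypothetical awk-based helper (filter_by_mapq is not a bedtools command); it assumes a 10-column -a file, so that MAPQ lands in column 15, and it also drops rows whose last column is 0, i.e. -a features with no overlapping read. With STAR's default settings, uniquely mapped reads get MAPQ 255.

```shell
#!/usr/bin/env bash
# Hypothetical helper: filter_by_mapq NA MIN
#   NA  = number of columns in the -a file (10 for the sample row)
#   MIN = minimum MAPQ to keep (255 = unique mappers under STAR defaults)
# Reads -wao output on stdin and prints rows that have a real overlap
# (last column > 0) and whose B-feature MAPQ (column NA + 5) is >= MIN.
filter_by_mapq() {
  local na="$1" min="$2"
  awk -v col="$((na + 5))" -v min="$min" '$NF > 0 && $col + 0 >= min + 0'
}
```

Usage: `filter_by_mapq 10 255 < result.bed > result.unique.bed`. An alternative is to filter the BAM before intersecting, e.g. `samtools view -b -q 255 accepted_hits.bam > unique.bam`.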

ADD REPLY
0
Entering edit mode

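You should use the -split option with intersectBed, as this is RNA-seq. It makes bedtools treat each aligned block of a spliced read as a separate interval, rather than as one interval that spans the intron. This matters here because your -a file contains introns.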
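Without -split, a spliced alignment is treated as one interval from its first to its last aligned base, so a read that skips an intron appears to overlap the whole intron. The suggested command is simply the original one with -split added. To illustrate the block-splitting idea, the sketch below turns a SAM position and CIGAR string into BED blocks. The helper cigar_blocks is hypothetical and is not bedtools' own code; it assumes that N (skipped region) is the only CIGAR operation that starts a new block.

```shell
#!/usr/bin/env bash
# Suggested command (requires bedtools):
#   intersectBed -a introns.bed -b accepted_hits.bam -wao -split > result.bed
#
# Hypothetical helper: cigar_blocks CHROM POS CIGAR
#   POS is the 1-based SAM position; output is 0-based, half-open BED blocks.
#   Assumption: only N starts a new block; M, D, = and X extend the current
#   block; I, S, H and P do not consume reference bases.
cigar_blocks() {
  awk -v chrom="$1" -v pos="$2" -v cigar="$3" 'BEGIN {
    start = pos - 1                 # SAM 1-based -> BED 0-based
    cur = start
    while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
      len = substr(cigar, 1, RLENGTH - 1) + 0
      op = substr(cigar, RLENGTH, 1)
      cigar = substr(cigar, RLENGTH + 1)
      if (op == "N") {              # intron: close the block, skip the gap
        if (cur > start) print chrom "\t" start "\t" cur
        cur += len
        start = cur
      } else if (op ~ /[MD=X]/) {   # operations that consume the reference
        cur += len
      }
    }
    if (cur > start) print chrom "\t" start "\t" cur
  }'
}
```

For example, `cigar_blocks chr1 1001 30M200N20M` yields two blocks, chr1:1000-1030 and chr1:1230-1250, whereas the unsplit alignment would cover chr1:1000-1250, including the 200 bp gap.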
