How to extract reads that map exclusively to a single site with 1 or zero mismatches from BAM files
1
0
Entering edit mode
23 months ago
plicht ▴ 20

I generated BAM files by aligning human RNA reads against the human reference genome using BWA MEM and TopHat2. I would like to know how I can retain only reads that mapped exclusively to a single site in the reference genome with one or zero mismatches for downstream analysis.

RNAseq Samtools • 1.4k views
ADD COMMENT
2
Entering edit mode
23 months ago

Intersect your BAM file with a coordinate

  samtools view -h input.bam  chrom:start-end > subset.sam

to generate a sam file that only contains the alignment that overlap with your coordinate of interest, then match for tags NM:i:0 and NM:i:1

You could do the latter with a regular expression grep or with samsift (if you needed more complex matching later on).

https://github.com/karel-brinda/samsift

If you use UNIX grep then you would need to add the SAM header into the file. From UNIX it would look somewhat like this:

samtools view -h subset.sam chrom:start-end | egrep '(^@|(NM:i:(0|1)))' > results.sam
ADD COMMENT
0
Entering edit mode

Thanks a lot for your help, Ivstan.

Just for understanding purposes, what is the difference between samtools sort and samtools view -h chrom:start-end? Can I intersect the BAMs with the chrom coordinate on already sorted BAMS?

ADD REPLY
0
Entering edit mode

the BAM file needs to be sorted for the intersect to work, I forgot to mention that explicitly,

I assumed it was sorted because, as a rule, in general, all bam files should be always sorted by position.

ADD REPLY
0
Entering edit mode

Samtools tells me that:

[E::idx_find_and_load] Could not retrieve index file for 'input.bam' samtools view: Random alignment retrieval only works for indexed SAM.gz, BAM or CRAM files.

Should I perform samtools index -b input.bam beforehand and store the index in the same directory as the input.bam?

ADD REPLY
0
Entering edit mode

Would the extraction of reads that fall within a specified region (e. g. chr1) not also include multiple mapped reads?

ADD REPLY
0
Entering edit mode

some further filtering is necessary, look into what aligner you used and how that aligner marks the multimapped reads,

if it is bwa you can add -q 0 (the quality is set to zero for multimapped reads) or the presence of XA tags.

There is some variability in how aligners represent multimapped reads and that needs to be dealt with accordingly

ADD REPLY

Login before adding your answer.

Traffic: 1610 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6