BAM file reads mapping to multiple genes
6 months ago
Nicole • 0

I am unfamiliar with BAM files and pretty new to the Linux command line. I have what I suspect is a fairly simple problem to solve. I have a dataset in which I believe a large number of reads map to multiple genes, and I am trying to find a way to filter for reads that map to more than one gene. Thanks for any help!

Edit: This is data from a scRNA sequencing experiment done with 10X equipment. The cells were from a rabbit, which is not supported by 10X for genome alignment, so I made a custom reference genome. I had a low rate of mapping to the genome (~20%) and I am trying to figure out the cause: whether it is an issue with the reference or with our sample. I think it is possible that the multiple mapping is due to overlapping annotations in the reference genome (this was suggested to me by 10X support). In that case I don't want to filter out those reads; I want to fix the reference. Here I am trying to find out whether that is actually the case and, if so, whether a particular set of genes is causing the problem.

samtools sequencing 10x scrnaseq bamfile
6 months ago
tomas4482 ▴ 70

Did you align the reads to your own reference, or did you get the BAM from another data source? In the first case, you can set your aligner to ignore multi-mapping reads and align only pair-matched reads. Most aligners have this option. In the second case, the following should work (it keeps only alignments with a mapping quality of at least 1):

samtools view -bq 1 file.bam > unique.bam

Take a reference here.
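To make the MAPQ threshold concrete, here is a minimal Python sketch of the same idea applied to SAM text (for example the output of samtools view -h). The function name filter_by_mapq and the SAMPLE records are made up for illustration. One caveat: what a given MAPQ value means depends on the aligner. STAR, by default, reports 255 for uniquely mapped reads and 3, 1 or 0 for multi-mappers, so with STAR output a threshold of 1 would probably still keep some multi-mapped reads, and 255 would be the stricter choice.

```python
def filter_by_mapq(sam_lines, min_mapq=1):
    """Yield header lines and alignments whose MAPQ is at least min_mapq.

    Mirrors the effect of `samtools view -q min_mapq` on SAM text.
    """
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines are passed through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            # Not a complete SAM record (11 mandatory columns); skip it.
            continue
        # Column 5 (index 4) is MAPQ.
        if int(fields[4]) >= min_mapq:
            yield line


# Hypothetical records, for illustration only.
SAMPLE = [
    "@HD\tVN:1.6\tSO:coordinate\n",
    "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1\n",
    "r2\t0\tchr1\t200\t0\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:5\n",
    "r3\t256\tchr2\t300\t3\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:2\n",
]

for kept in filter_by_mapq(SAMPLE, min_mapq=1):
    print(kept, end="")
```

On real data, the same function can be fed sys.stdin instead of SAMPLE, with samtools view -h file.bam piped into the script.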

6 months ago
Marco Pannone ▴ 200

What sort of data are you dealing with? Is it ChIP-seq data? In any case, if you want to remove multimapping reads (for a valid reason), you can use sambamba as follows:

sambamba view -h -f bam -F "[XS] == null" input.bam -o output.bam

The XS tag is a "mark" given by certain aligners (such as Bowtie2) to reads that have more than one reported alignment. There is debate about whether or not multimapping reads should be removed. Personally, when dealing with ChIP-seq data, I always remove them.
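For readers who want to see what the "[XS] == null" filter does, here is a rough Python sketch of the same logic on SAM text: keep header lines and any alignment that does not carry the given optional tag. The names has_tag, drop_tagged and TAGGED_SAMPLE are hypothetical. Note that the meaning of XS depends on the aligner; STAR, for instance, can emit XS as a strand attribute, so this filter is only meaningful for aligners that use XS the way Bowtie2 does.

```python
def has_tag(sam_line, tag):
    """Return True if the alignment carries the optional tag (e.g. 'XS')."""
    fields = sam_line.rstrip("\n").split("\t")
    # Optional fields start at column 12 and look like TAG:TYPE:VALUE.
    return any(field.startswith(tag + ":") for field in fields[11:])


def drop_tagged(sam_lines, tag="XS"):
    """Yield header lines and alignments that do NOT have the given tag."""
    for line in sam_lines:
        if line.startswith("@") or not has_tag(line, tag):
            yield line


# Hypothetical records, for illustration only.
TAGGED_SAMPLE = [
    "@HD\tVN:1.6\n",
    "r1\t0\tchr1\t100\t42\t4M\t*\t0\t0\tACGT\tIIII\tAS:i:0\n",
    "r2\t0\tchr1\t200\t1\t4M\t*\t0\t0\tACGT\tIIII\tAS:i:0\tXS:i:-5\n",
]

for kept in drop_tagged(TAGGED_SAMPLE, tag="XS"):
    print(kept, end="")
```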


I updated my original post with more detail. This data is from a scRNA-seq experiment and was processed using 10X cellranger count, which uses the STAR aligner. The reads were removed by cellranger, but I am not sure they should have been, as I explain in my edit above.
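One possible way to check whether particular overlapping genes are responsible is to tally which gene combinations appear on reads annotated with more than one gene. This is only a sketch under an assumption that should be verified against the Cell Ranger BAM documentation for the version used: that each alignment carries a GX tag holding a semicolon-separated list of gene IDs. The names get_tag, tally_multigene and GENE_SAMPLE are hypothetical.

```python
from collections import Counter


def get_tag(fields, tag):
    """Return the value of an optional SAM tag, or None if it is absent."""
    prefix = tag + ":"
    for field in fields[11:]:
        if field.startswith(prefix):
            # Optional fields look like TAG:TYPE:VALUE; keep VALUE intact.
            return field.split(":", 2)[2]
    return None


def tally_multigene(sam_lines, gene_tag="GX"):
    """Count alignments per gene combination, for reads with >1 gene.

    Assumes the gene tag holds a semicolon-separated list of gene IDs.
    """
    combos = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        value = get_tag(fields, gene_tag)
        if not value:
            continue  # no gene annotation on this alignment
        genes = tuple(sorted(set(value.split(";"))))
        if len(genes) > 1:
            combos[genes] += 1
    return combos


# Hypothetical records, for illustration only.
GENE_SAMPLE = [
    "@HD\tVN:1.6\n",
    "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1\tGX:Z:geneA;geneB\n",
    "r2\t0\tchr1\t150\t255\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1\tGX:Z:geneA\n",
    "r3\t0\tchr1\t120\t255\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1\tGX:Z:geneB;geneA\n",
    "r4\t0\tchr2\t300\t255\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1\n",
]

for genes, n in tally_multigene(GENE_SAMPLE).most_common():
    print(n, ";".join(genes))
```

If a handful of gene pairs account for most of the counts, that would point towards overlapping annotations in the reference rather than a problem with the sample. On real data, the function can read sys.stdin with the output of samtools view on the cellranger BAM piped in.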

6 months ago
colindaven ★ 3.8k

A more general option is to filter the BAM file by mapping quality to exclude poorly aligned reads.

One answer on how to do that is here:

Filtering A Sam File For Quality Scores