Question: Larger number of reads after filtering 'chrM'??
1
gravatar for cpc5
2.6 years ago by
cpc550
chicaho
cpc550 wrote:

Hi all,

I want to remove all mitochondrial reads from my BAM files, but after doing so the number of mapped reads increased. Here's what I did after indexing:

samtools idxstats .bam | cut -f 1 | grep -v chrM | xargs samtools view -b  .bam > clean.bam

However comparing with wc -l says:

samtools view -F 0x904 -c clean.bam 
19452291
samtools view -F 0x904 -c .bam
18328474

The second number matches the summary output by tophat2 (fr-frstrand)

any suggestions?

sequencing rna-seq alignment • 1.2k views
ADD COMMENTlink modified 2.6 years ago • written 2.6 years ago by cpc550
1

I am using the exact same command, except that I put chrM in quotation marks. Also, don't forget the -h flag in the last view, as many tools need the BAM header to work properly (or work at all).

ADD REPLYlink written 2.6 years ago by ATpoint19k

He's outputting to BAM (-b) so it will always have a header. It's only SAM where it's optional for some reason.

ADD REPLYlink written 2.6 years ago by John12k
5
gravatar for WouterDeCoster
2.6 years ago by
Belgium
WouterDeCoster40k wrote:

Your approach looks pretty strange to me, and I would instead do the following:

samtools view -h yourbam.bam | grep -v 'chrM' | samtools view -b > yourbamwithoutchrM.bam
ADD COMMENTlink written 2.6 years ago by WouterDeCoster40k
2

This will also filter out reads whose mate maps to chrM which I'm not sure the OP wants to do.

ADD REPLYlink written 2.6 years ago by d-cameron2.0k
1

Hm, you are probably right. I'm not sure why OP would want to remove reads mapping to chrM in the first place. But I think the event you describe is rare and discordant reads like that aren't really desirable.

That said, I'm not sure if there is a good reason to start tampering with bam files.

ADD REPLYlink written 2.6 years ago by WouterDeCoster40k
1

All the likes.

Also OP i ran your thing on some single-end data and it worked for me giving the correct result. I'd try WDC's method and if that gives the same result, update your samtools.

ADD REPLYlink written 2.6 years ago by John12k

Wdc method gave similar output. I also tracked back to another data set where I did the same and it worked fine. There must be something odd with my data. WdC, I was asked by my PI to remove chrM, not much I can do about that. Thanks for your answers everyone

ADD REPLYlink written 2.6 years ago by cpc550
1

What does just "samtools idxstats .bam | cut -f 1 | grep -v chrM " show? I'm curious. I bet there's something interesting in there like a chromosome mentioned twice.

ADD REPLYlink written 2.6 years ago by John12k
0
gravatar for cpc5
2.6 years ago by
cpc550
chicaho
cpc550 wrote:

I found the issue! this is with samtools 1.2 with reads mapped to gencode v25

the increased number is due to chr1 doubling in number after sorting by position (default). I still dont know why this happens though.

no chrM:
chr1    248956422   5140060 0
all:
chr1    248956422   2570030 0

maybe it has something to do with the reads named, GL00000x.x...

ADD COMMENTlink written 2.6 years ago by cpc550
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1624 users visited in the last hour