Collapse mapped reads in sorted bam file based on the start and end position coordinates
3.7 years ago
xiaoleiusc

Dear Biostars Users,

Is there a way to collapse mapped reads in a sorted BAM file based on their start and end coordinates, regardless of sequence? I would like all reads that share the same start and end positions to be counted as one read, even if their sequences differ. I have attached an image (https://ibb.co/J7Fj00T) to illustrate the point in case the wording is unclear.
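To make the requested operation concrete, here is a minimal sketch of "collapse by coordinates, ignore sequence" on plain tuples rather than a real BAM. It assumes single-end reads and also keys on strand (drop `strand` from the key if you want both strands merged). The `Alignment` type and `collapse_by_coordinates` function are hypothetical names for illustration only.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Alignment(NamedTuple):
    """Hypothetical stand-in for one mapped read (not a pysam type)."""
    name: str
    chrom: str
    start: int   # 0-based leftmost aligned position
    end: int     # exclusive end of the alignment on the reference
    strand: str  # "+" or "-" (assumption: strand is part of the key)
    seq: str     # ignored when collapsing


def collapse_by_coordinates(alignments: Iterable[Alignment]) -> Counter:
    """Count reads per (chrom, start, end, strand), ignoring the sequence.

    Each key is one collapsed read; its value is how many original reads
    were merged into it.
    """
    return Counter((a.chrom, a.start, a.end, a.strand) for a in alignments)


if __name__ == "__main__":
    reads = [
        Alignment("r1", "chr1", 100, 105, "+", "ACGTA"),
        Alignment("r2", "chr1", 100, 105, "+", "ACGTT"),   # same coords, different sequence
        Alignment("r3", "chr1", 100, 106, "+", "ACGTAC"),  # different end
        Alignment("r4", "chr1", 100, 105, "-", "ACGTA"),   # opposite strand
    ]
    collapsed = collapse_by_coordinates(reads)
    print(len(collapsed))  # → 3
    for key, n in sorted(collapsed.items()):
        print(key, n)
```

Reads r1 and r2 collapse into one entry with a count of 2 even though their sequences differ, which is the behaviour described in the question.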

Thanks in advance.

Xiao

rna-seq

Thanks, man. This is right on.


Thanks, I followed your suggestions and re-uploaded my image. Feel free to let me know if there are any problems.


You could do this relatively easily in R with GenomicRanges/GenomicAlignments.
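The R route is not shown here; as a rough Python analogue, the sketch below streams a coordinate-sorted BAM with pysam and writes only the first read seen at each (chrom, start, end, strand). It assumes pysam is installed, the input BAM is coordinate-sorted, and reads are single-end (each mate of a pair would be treated independently). Which read survives at each position is arbitrary. `coordinate_key`, `dedup_sorted_reads` and `dedup_bam` are hypothetical helper names.

```python
def coordinate_key(read):
    """Identity of a read for collapsing: chrom, start, end, strand (assumption)."""
    return (read.reference_name, read.reference_start,
            read.reference_end, read.is_reverse)


def dedup_sorted_reads(reads):
    """Yield the first primary mapped read for each coordinate key.

    In a coordinate-sorted BAM all reads with the same (chrom, start) are
    adjacent, so only keys for the current start position are kept in memory.
    """
    current_locus = None
    seen = set()
    for read in reads:
        # Skip unmapped, secondary and supplementary records.
        if read.is_unmapped or read.is_secondary or read.is_supplementary:
            continue
        locus = (read.reference_name, read.reference_start)
        if locus != current_locus:
            current_locus = locus
            seen.clear()
        key = coordinate_key(read)
        if key not in seen:
            seen.add(key)
            yield read


def dedup_bam(in_path, out_path):
    """Write one read per coordinate key from a sorted BAM (requires pysam)."""
    import pysam  # imported lazily so the helpers above work without pysam

    with pysam.AlignmentFile(in_path, "rb") as bam_in, \
            pysam.AlignmentFile(out_path, "wb", template=bam_in) as bam_out:
        for read in dedup_sorted_reads(bam_in):
            bam_out.write(read)
```

Usage, with hypothetical file names: `dedup_bam("sorted.bam", "collapsed.bam")`. Clearing the `seen` set whenever the start position changes keeps memory bounded without a second pass over the file.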

3.7 years ago
GenoMax

It sounds like you need to run Clumpify on your reads BEFORE you align them, to remove sequence duplicates. See: Introducing Clumpify: Create 30% Smaller, Faster Gzipped Fastq Files


Thanks GenoMax. Can Clumpify remove duplicates whose sequences are not identical? I already removed exact sequence duplicates with the FASTX-Toolkit (FASTQ/A Collapser) before mapping. Now I want a more stringent definition of duplicates: reads that map to the reference with the same start and end positions in the alignment. Note that the sequences of these reads can differ.


Clumpify will do that, but not on aligned BAM files. To get the result you need, first collapse the reads with Clumpify, then align them. Clumpify tolerates mismatches (the hdist= parameter sets how many), so reads containing one or more SNPs can be treated as having identical sequence.
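To illustrate what "treating reads that differ by a few mismatches as identical" means, here is a deliberately simple greedy sketch. It is not Clumpify's algorithm (which groups reads by shared k-mers and is far more efficient); it only compares equal-length reads by Hamming distance. `max_mismatches` is a hypothetical parameter playing a role loosely similar to the mismatch tolerance described above, and both function names are made up for illustration.

```python
from typing import Dict, List


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("hamming() needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def collapse_with_mismatches(seqs: List[str], max_mismatches: int = 1) -> Dict[str, int]:
    """Greedily merge each read into the first representative within
    `max_mismatches` substitutions; return representative -> read count.

    O(reads * clusters) toy for illustration; not how Clumpify works.
    """
    counts: Dict[str, int] = {}
    for seq in seqs:
        for rep in counts:
            if len(rep) == len(seq) and hamming(rep, seq) <= max_mismatches:
                counts[rep] += 1  # updating a value does not resize the dict
                break
        else:
            counts[seq] = 1  # no close representative: start a new cluster
    return counts


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTACGA", "TTTTACGT", "ACGTACGT"]
    print(collapse_with_mismatches(reads, max_mismatches=1))
    # → {'ACGTACGT': 3, 'TTTTACGT': 1}
```

The second read differs from the first by one substitution (a SNP-like difference), so it is merged into the same cluster; the third differs at three positions and stays separate.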
