How to extract unique FASTQ sequences from a BAM file, irrespective of FLAG value?
14 months ago
ved_vyas ▴ 10

I need to extract FASTQ sequences from a BAM file for other experiments, but the FASTQ file should contain only unique sequences. [Here "unique" means that if a sequence occurs multiple times in the BAM file, I want only a single copy of it in the FASTQ file.]

fastq BAM • 639 views
14 months ago

The operation that you ask about is called deduplication.

But first you need to extract the sequences, which can be slightly more complicated for paired-end reads: you need to collate the BAM file by read name and then extract. See this post:

https://lh3.github.io/2021/07/06/remapping-an-aligned-bam

Deduplication:

fastp 0.22 released, with new FASTQ deduplication feature.
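To make the idea of deduplication concrete, here is a minimal pure-Python sketch that keeps only the first record for each exact sequence in a FASTQ stream. It is not how fastp implements its feature; the function names `read_fastq` and `dedup_by_sequence` are made up for illustration, and the sketch assumes well-formed 4-line records, single-end data, and that a read and its reverse complement count as different sequences.

```python
from typing import Iterable, Iterator, Tuple

FastqRecord = Tuple[str, str, str]  # (name, sequence, quality)


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield (name, sequence, quality) from 4-line FASTQ records.

    Assumes every record is exactly four lines; blank lines are skipped.
    """
    it = (line.rstrip("\n") for line in lines if line.strip())
    # Pulling four items at a time from the same iterator groups the records.
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header[1:], seq, qual


def dedup_by_sequence(records: Iterable[FastqRecord]) -> Iterator[FastqRecord]:
    """Keep the first record seen for each exact sequence string."""
    seen = set()
    for name, seq, qual in records:
        if seq not in seen:
            seen.add(seq)
            yield name, seq, qual


# Small in-memory example: r1 and r2 share the same sequence.
demo = [
    "@r1\n", "ACGT\n", "+\n", "IIII\n",
    "@r2\n", "ACGT\n", "+\n", "HHHH\n",
    "@r3\n", "TTGA\n", "+\n", "IIII\n",
]
unique = list(dedup_by_sequence(read_fastq(demo)))
for name, seq, qual in unique:
    print(f"@{name}\n{seq}\n+\n{qual}")
```

Holding every sequence in a set costs memory proportional to the number of distinct reads, so for large files a dedicated tool is the more practical choice.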

14 months ago

If you want to make sure that you get each read only once, even if it aligned to multiple places, and you don't care whether two separate reads have identical sequences, you can filter your BAM file to remove secondary alignments (FLAG bit 0x100), and likewise supplementary alignments (0x800) if your aligner emits them. Each read will then have exactly one primary alignment.
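The flag filtering described above can be sketched in plain Python on SAM text lines (for example, the output of `samtools view -h`). This is an illustration only, not a replacement for a dedicated tool: the function name `sam_to_fastq` is hypothetical, the input is assumed to be tab-separated SAM with SEQ and QUAL present on primary records, and mates of a pair are not distinguished. Reads aligned to the reverse strand are reverse-complemented so that the output matches the original read orientation.

```python
from typing import Iterable, Iterator

# FLAG bits defined in the SAM specification.
REVERSE = 0x10
SECONDARY = 0x100
SUPPLEMENTARY = 0x800

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_to_fastq(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield one FASTQ record per primary alignment record.

    Header lines and secondary/supplementary records are skipped.
    """
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank line or SAM header line
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        seq, qual = fields[9], fields[10]
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # extra record for a read that is reported elsewhere
        if flag & REVERSE:
            # SAM stores reverse-strand reads reverse-complemented.
            seq = seq.translate(_COMPLEMENT)[::-1]
            qual = qual[::-1]
        yield f"@{name}\n{seq}\n+\n{qual}\n"


# Small in-memory example with one secondary and one supplementary record.
demo_sam = [
    "@HD\tVN:1.6\n",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIIH\n",
    "r1\t256\tchr2\t500\t0\t4M\t*\t0\t0\t*\t*\n",
    "r2\t16\tchr1\t200\t60\t4M\t*\t0\t0\tAACG\tABCD\n",
    "r2\t2064\tchr3\t900\t60\t2S2M\t*\t0\t0\tAACG\tABCD\n",
]
fastq_records = list(sam_to_fastq(demo_sam))
print("".join(fastq_records), end="")
```

Each read name appears once in the output regardless of how many alignment records it had; reads with identical sequences but different names are all kept.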
