Count unique fragments from a given BAM file
4 hours ago
Ankit ▴ 520

Hi All, could anyone suggest a tool to count unique fragments from a given BAM file? My data is exome-seq.

Any leads will be appreciated

Thanks

dna bam

Please comment on or validate your previous questions.


Ankit: please also take a minute to look at what Pierre Lindenbaum asked!

(That would be much appreciated ;-) , thanks.)


Sure, I will look into it.

Thanks

4 hours ago

Does your BAM file contain UMI info, or is it a plain alignment?

Here are a few general pointers (tools): Picard MarkDuplicates; sambamba markdup; (samblaster?). If the data contains UMI tags, have a look at the UMI-tools package.

Also, the BBTools package has subcommands to achieve this: clumpify.sh, bbduk.sh, ...

Those are mainly to make your data "unique"; for the counting part you can use samtools or the like.
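A minimal sketch of that two-step idea (the Picard/samtools commands in the comments reflect typical usage, and the file names are placeholders; the executable part just illustrates the logic on toy fragment coordinates):

```shell
# On a real coordinate-sorted BAM, the dedup step could be (Picard works
# directly on coordinate-sorted input; file names are placeholders):
#   picard MarkDuplicates I=input.bam O=marked.bam M=dup_metrics.txt
# and then count only records NOT flagged as duplicates:
#   samtools view -c -F 1024 marked.bam    # 1024 = duplicate flag (0x400)

# Toy illustration of the same make-unique-then-count logic, treating a
# fragment as unique by its (chrom, start, end) coordinates:
printf 'chr1\t100\t250\nchr1\t100\t250\nchr1\t300\t450\n' | sort -u | wc -l
```

If you want fragments rather than reads on paired-end data, adding `-f 64` to the `samtools view` call counts only the first read of each pair.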


Thanks, my data is non-UMI.

Counting unique fragments how? Any flag / syntax?


The most basic one is:

samtools view -c <your bam file>

(this will count everything, ignoring your exome info)

A bit more 'advanced': use bedtools coverage or samtools depth

(those you can 'subset' to target only your exome regions)
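Extending the above (flag values are from the SAM spec; file names are placeholders): combining flag bits into one exclusion mask lets `samtools view -c` skip duplicates, secondary and supplementary alignments in a single pass.

```shell
# samtools view -c alone counts every record; combining SAM flag bits
# excludes reads you usually don't want in a "unique fragment" count:
#   0x400 (1024) duplicate, 0x100 (256) secondary, 0x800 (2048) supplementary
echo $((0x400 | 0x100 | 0x800))    # combined exclusion mask

# So, on a duplicate-marked BAM (file names here are placeholders):
#   samtools view -c -F 3328 marked.bam
#   samtools view -c -F 3328 -L exome_targets.bed marked.bam  # exome regions only
```

The `-L` option restricts counting to alignments overlapping a BED file, which is how you fold the exome target regions back in.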


That won't count unique fragments; it will just count coverage.


No, I got the logic: make unique, then count.


Exactly, first make unique and then count them :)

(I don't think there is any tool that does this in one go)

1 hour ago
GenoMax 154k

a tool to count unique fragments from a given BAM file?

This can be a bit tricky. You can imagine a situation where you have two fragments that overlap but could still be considered unique, since they don't have identical sequence.
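To make that distinction concrete, here is a toy sketch on fragment coordinates (not real BAM data): overlapping fragments with different endpoints are kept as distinct, while only the exact repeat collapses.

```shell
# chr1:100-250 and chr1:150-300 overlap, but their endpoints differ;
# uniq -c collapses only the exact repeat and reports its multiplicity.
printf 'chr1\t100\t250\nchr1\t150\t300\nchr1\t150\t300\n' | sort | uniq -c
```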

dedupebymapping.sh (https://bbmap.org/tools/dedupebymapping ) may also be useful, since it works from your existing BAM file (assuming the data is already mapped).

You could use a tool like clumpify.sh from the BBMap suite (https://bbmap.org/tools/clumpify ) to count reads that are perfectly identical (optionally allowing some mismatches) and compress the file; the read count is added to the fastq header. Another potential option is dedupe.sh (https://bbmap.org/tools/dedupe ).

