Count number of unique alignments to each chromosome from sam file
0
0
Entering edit mode
2.5 years ago
sg197 ▴ 40

Hi,

I want to count the number of unique reads aligned to each chromosome after mapping to the genome. I don't mind if something is multimapped, say if read A maps to both chr3 and chr6 that is fine, but if read A maps to chr6 twice I want to only count it once.

I've pieced together 2 commands which I think should both do this, but not sure whether they are right/ one is preferred.

option 1: samtools view file.bam | cut -f 1,3 | sort | uniq -c > chr_stats.txt

option 2: awk ' $1 !~ /@/ {print $3}' samtools view file.bam | sort | uniq -c > chr_stats.txt

1) convert a bam to sam 2) pull out column 1 and 3 from the sam file (read ID and chr) 3) sort these 4) only keep unique lines 5) count the number of unique IDs per chromosome 6) output to result.txt file

I'm new to piping commands using bash so would appreciate if someone could clarify whther these are doing what I want them to do

Thanks :)

samtools sort unix uniq sam • 883 views
ADD COMMENT
0
Entering edit mode

Take a look at pileup.sh from BBMap suite. Use secondary=f flag to exclude those alignments.

ADD REPLY
0
Entering edit mode

Thanks, I had a quick look at pileup.sh, does this work in same as samtools -F 0x100 (removes non primary alignments)? Essentially I want to keep secondary alignments but only if they're on a different chromosome to the primary alignment, so was thinking using bash commands to pull out unique lines from fields 1 and 3, the readID and chr, from a sam file would be the simplest way

ADD REPLY

Login before adding your answer.

Traffic: 1503 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6