Can anyone explain "read id sorted" on the samblaster page?
9.1 years ago
Ming Tommy Tang ★ 3.9k

Hi,

I am using samblaster to mark duplicates, and it requires the input reads to be sorted by read id.

Can anyone explain what that means? I have read the SAM specification here:

Dad's data:

@RG     ID:FLOWCELL1.LANE1      PL:ILLUMINA     LB:LIB-DAD-1 SM:DAD      PI:200
@RG     ID:FLOWCELL1.LANE2      PL:ILLUMINA     LB:LIB-DAD-1 SM:DAD      PI:200
@RG     ID:FLOWCELL1.LANE3      PL:ILLUMINA     LB:LIB-DAD-2 SM:DAD      PI:400
@RG     ID:FLOWCELL1.LANE4      PL:ILLUMINA     LB:LIB-DAD-2 SM:DAD      PI:400

Mom's data:

@RG     ID:FLOWCELL1.LANE5      PL:ILLUMINA     LB:LIB-MOM-1 SM:MOM      PI:200
@RG     ID:FLOWCELL1.LANE6      PL:ILLUMINA     LB:LIB-MOM-1 SM:MOM      PI:200
@RG     ID:FLOWCELL1.LANE7      PL:ILLUMINA     LB:LIB-MOM-2 SM:MOM      PI:400
@RG     ID:FLOWCELL1.LANE8      PL:ILLUMINA     LB:LIB-MOM-2 SM:MOM      PI:400

Kid's data:

@RG     ID:FLOWCELL2.LANE1      PL:ILLUMINA     LB:LIB-KID-1 SM:KID      PI:200
@RG     ID:FLOWCELL2.LANE2      PL:ILLUMINA     LB:LIB-KID-1 SM:KID      PI:200
@RG     ID:FLOWCELL2.LANE3      PL:ILLUMINA     LB:LIB-KID-2 SM:KID      PI:400
@RG     ID:FLOWCELL2.LANE4      PL:ILLUMINA     LB:LIB-KID-2 SM:KID      PI:400

The @RG ID identifies reads from a specific lane, and SM is the sample name. So what is the read id? I am a bit confused. My BAM file contains only one SM and one ID.

Thank you!

Ming

bam sequencing • 3.6k views

Posting as a comment because I'm not entirely sure it's correct...

From the context in the samblaster documentation, I suspect that "read-id" is what the spec normally calls the "query name" (QNAME) or "read name", not the @RG ID. In other words, use samtools sort -n. That would also make sense given that the documentation explicitly mentions that "read-id" ordering is what aligners produce.
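To make the distinction concrete, here is a minimal Python sketch, assuming plain-text SAM records (tab-separated, as printed by samtools view). It pulls out the read name (QNAME, column 1) and the read group (the optional RG:Z: tag, which refers back to an @RG ID), and checks whether all records sharing a read name are adjacent. The function names and the make_record helper are made up for illustration; this is not how samblaster itself checks its input.

```python
def split_record(line):
    """Return (read name, read group) for one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    qname = fields[0]  # column 1: QNAME, the read name / "read id"
    read_group = None
    for tag in fields[11:]:  # optional tags start at column 12
        if tag.startswith("RG:Z:"):
            read_group = tag[5:]  # matches an @RG ID in the header
    return qname, read_group


def is_grouped_by_qname(lines):
    """True if every read name occupies one contiguous run of records."""
    seen = set()
    previous = None
    for line in lines:
        if line.startswith("@"):  # header line, not an alignment
            continue
        qname, _ = split_record(line)
        if qname != previous:
            if qname in seen:  # name reappears after a gap
                return False
            seen.add(qname)
            previous = qname
    return True


def make_record(qname, pos):
    """Hypothetical helper: build a toy SAM record for the demo."""
    return "\t".join([qname, "0", "chr1", str(pos), "60", "4M",
                      "*", "0", "0", "ACGT", "IIII",
                      "RG:Z:FLOWCELL1.LANE1"])


if __name__ == "__main__":
    by_name = [make_record("r1", 100), make_record("r1", 300),
               make_record("r2", 50)]
    by_coord = [make_record("r2", 50), make_record("r1", 100),
                make_record("r2", 200), make_record("r1", 300)]
    print(is_grouped_by_qname(by_name))
    print(is_grouped_by_qname(by_coord))
```

In a coordinate-sorted file the alignments of one read can be separated by other reads, which is why the second toy list fails the check even though every record carries the same read group.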


Thanks for your reply. My BAM files are sorted by coordinate, so I might have to sort them by name. I know HTSeq requires BAM files to be sorted by name (-n); I am not sure whether samblaster has the same requirement.
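One quick way to see how a file claims to be sorted is the SO field of the @HD header line, which the SAM spec defines with values such as coordinate, queryname, unsorted, or unknown. The sketch below assumes the header has already been dumped as text (for example with samtools view -H); the function name is made up for illustration, and the SO field only reports what the last tool wrote, not a guarantee about the records themselves.

```python
def sort_order_from_header(header_lines):
    """Return the SO value from the @HD line, or None if it is absent."""
    for line in header_lines:
        if line.startswith("@HD"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None


if __name__ == "__main__":
    header = [
        "@HD\tVN:1.6\tSO:coordinate",
        "@SQ\tSN:chr1\tLN:1000",
        "@RG\tID:FLOWCELL1.LANE1\tPL:ILLUMINA\tLB:LIB-DAD-1\tSM:DAD\tPI:200",
    ]
    print(sort_order_from_header(header))
```

If this reports coordinate, re-sorting by name with samtools sort -n, as suggested above, would be the likely next step before running a tool that expects reads grouped by name.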
