How to keep only the first part of the file name when running AddOrReplaceReadGroups on multiple BAM files at once?
5 weeks ago
kabir.deb ▴ 80

Hi,

I was wondering how I can use AddOrReplaceReadGroups to assign read group (RG) tags to multiple BAM files. I have 20 BAM files, named cfR_1_valAligned_star_mrdup_sort.bam through cfR_20_valAligned_star_mrdup_sort.bam, generated with the STAR aligner. I want the RG name to be only the first part, e.g. cfR_1, without the rest of the file name.

cd /path to BAM/rnaBAM

file1=$(ls *_valAligned_star_mrdup_sort.bam | sed -n "${SLURM_ARRAY_TASK_ID}p")

picard AddOrReplaceReadGroups VALIDATION_STRINGENCY=LENIENT I="$file1" O="${file1%%.*}_rg.bam" RGID="${file1%%.*}" RGLB="${file1%%.*}" \
    RGPL=illumina RGPU=run RGSM="${file1%%.*}"


samtools view -H cfR_1_valAligned_star_mrdup_sort_rg.bam | grep "@RG"
[W::bam_hdr_read] EOF marker is absent. The input is probably truncated.
@RG ID:cfR_1_valAligned_star_mrdup_sort LB:cfR_1_valAligned_star_mrdup_sort PL:illumina SM:cfR_1_valAligned_star_mrdup_sort PU:run

I ended up with ID:cfR_1_valAligned_star_mrdup_sort, but I want the tag to contain only cfR_1.
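
The result above follows from how shell parameter expansion works: `%%pattern` removes the longest suffix that matches the pattern. A minimal sketch, using one file name from the question as a stand-in (the variable value is only an illustration):

```bash
# Example file name following the pattern in the question
file1="cfR_1_valAligned_star_mrdup_sort.bam"

# %%.* removes the longest suffix starting at a dot.
# The name has a single dot, so only ".bam" is stripped.
echo "${file1%%.*}"    # cfR_1_valAligned_star_mrdup_sort

# %%_* removes the longest suffix starting at an underscore,
# i.e. everything from the FIRST underscore onward.
echo "${file1%%_*}"    # cfR
```

So `%%.*` keeps too much, and `%%_*` keeps too little as long as the sample name itself contains an underscore.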

SLURM BASH GATK Picard • 248 views
5 weeks ago
kabir.deb ▴ 80

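Well, I found a solution. First I renamed the input BAM file from cfR_1_valAligned_star_mrdup_sort.bam to cfR-1_valAligned_star_mrdup_sort.bam, so that the sample name no longer contains an underscore. Then I ran AddOrReplaceReadGroups with ${file1%%_*}, which keeps everything before the first underscore (the read group is therefore cfR-1 rather than cfR_1):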

picard AddOrReplaceReadGroups VALIDATION_STRINGENCY=LENIENT I="$file1" O="${file1%%.*}_rg.bam" RGID="${file1%%_*}" RGLB="${file1%%_*}" \
    RGPL=illumina RGPU=run RGSM="${file1%%_*}"
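
If renaming the files is not desirable, another option might be to strip the known suffix instead of cutting at the first underscore. This is only a sketch, assuming every input ends in exactly `_valAligned_star_mrdup_sort.bam`; `suffix` and `sample_id` are hypothetical names introduced here, not part of Picard or the original script:

```bash
# Assumption: every input file ends in this exact suffix
suffix="_valAligned_star_mrdup_sort.bam"

# sample_id (hypothetical helper): remove the known suffix from a file name.
# A single % removes the shortest matching suffix; quoting "$suffix"
# makes it match literally rather than as a glob pattern.
sample_id() {
    printf '%s\n' "${1%"$suffix"}"
}

file1="cfR_1_valAligned_star_mrdup_sort.bam"
sample=$(sample_id "$file1")
echo "$sample"    # cfR_1

# The sample ID could then be passed to the read-group fields, e.g.:
#   picard AddOrReplaceReadGroups VALIDATION_STRINGENCY=LENIENT \
#       I="$file1" O="${file1%.bam}_rg.bam" \
#       RGID="$sample" RGLB="$sample" RGPL=illumina RGPU=run RGSM="$sample"
```

With this approach the original file names stay untouched and the read group keeps the underscore (cfR_1).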