Question: Sam/Bam File Samfilereader
1
gravatar for mad.cichlids
5.7 years ago by
mad.cichlids100
Texas
mad.cichlids100 wrote:

Hi, there

I was trying to add read groups to my alignments and take advantage of GATK to call variants. I first did my cleaned reads and added read groups to the new alignment with (just one example file, I have multiple alignments for the whole dataset) :

bwa aln -n 0.04 -k 5 -I -t 4 contig ~/Desktop/Test_dataset2/1_cold_trimmed_clipped.fastq | bwa samse -r '@RG\tID:1_cold\tSM:1_cold\tPL:Illumina' contig - ~/Desktop/Test_dataset2/1_cold_trimmed_clipped.fastq > ~/Desktop/Test_dataset2/auto_data_31/aligned_sam/1_cold.sam

Then I converted my generated sam file and sorted them with :

samtools view -bS 1_cold.sam | samtools sort -m 300000000 - ./sam_bam/1_cold_sorted

Then I replace the duplicates:

java -Xmx2g -jar  ~/Desktop/apps/picard-tools-1.92/MarkDuplicates.jar INPUT=1_cold_sorted.bam OUTPUT=1_cold_dedup.bam METRICS_FILE=1_cold_metricsfile MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=250 ASSUME_SORTED=true VALIDATION_STRINGENCY=SILENT REMOVE_DUPLICATES=True

Then merged all of the files after reducing duplicate:

samtools merge -rh RG.txt merged.bam *dedup.bam

 samtools index merged.bam 

 java -jar ~/Desktop/apps/picard-tools-1.92/CreateSequenceDictionary.jar R= contigs.fa o=contigs.dict

 samtools faidx contigs.fa

Then I was ready to realign with GATK:

java -Xmx2g -jar ~/Desktop/apps/GenomeAnalysisTK-2.6-5-gba531bd/GenomeAnalysisTK.jar \
  -T RealignerTargetCreator \
  -R contigs.fa \
  -o merged_output.intervals \
  -I ~/Desktop/Test_dataset2/auto_data_31/aligned_sam/sam_bam/merged.bam \
  --minReadsAtLocus 3

There was an error message:

##### ERROR MESSAGE: SAM/BAM file SAMFileReader{/home/bq/Desktop/Test_dataset2/auto_data_31/aligned_sam/sam_bam/merged.bam} is malformed: Read HWI-ST141_0363:2:1101:11445:6328#TAGCTT uses a read group (5_hot_dedup) that is not defined in the BAM header, which is not valid.  Please use http://gatkforums.broadinstitute.org/discussion/59/companion-utilities-replacereadgroups to fix this problem

And I went back to the original file to check header: They were both there.

samtools view -h 5_hot_dedup.bam | grep "^@RG"
@RG    ID:5_hot    PL:Illumina    SM:5_hot

grep "^@RG" 5_hot.sam 
@RG    ID:5_hot    SM:5_hot    PL:Illumina

And I rerun the whole analysis from the original sequence, the same error occurred again, Could you please shed some light on where I could possibly do wrong? I had worked on this for nearly weeks, but little progress has been made. There might be some other options like in picard addreadgroups, but i really want to know what did i do wrong with the analysis? Thank you very much for any suggestions and sorry for the long post.

snps • 3.7k views
ADD COMMENTlink modified 5.7 years ago by Biostar ♦♦ 20 • written 5.7 years ago by mad.cichlids100

in samtools merge -rh RG.txt what's the content of RG.txt ?

ADD REPLYlink written 5.7 years ago by Pierre Lindenbaum119k

Here is the content in RG.txt for the complete samples

    @RG    ID:1_cold    SM:1_cold    PL:Illumina
    @RG    ID:2_cold    SM:2_cold    PL:Illumina
    @RG    ID:3_cold    SM:3_cold    PL:Illumina
    @RG    ID:4_cold    SM:4_cold    PL:Illumina
    @RG    ID:5_cold    SM:5_cold    PL:Illumina
    @RG    ID:6_cold    SM:6_cold    PL:Illumina
    @RG    ID:1_hot    SM:1_hot    PL:Illumina
    @RG    ID:2_hot    SM:2_hot    PL:Illumina
    @RG    ID:3_hot    SM:3_hot    PL:Illumina
    @RG    ID:4_hot    SM:4_hot    PL:Illumina
    @RG    ID:5_hot    SM:5_hot    PL:Illumina
    @RG    ID:6_hot    SM:6_hot    PL:Illumina
ADD REPLYlink modified 5.7 years ago • written 5.7 years ago by mad.cichlids100
1
gravatar for Ashutosh Pandey
5.7 years ago by
Philadelphia
Ashutosh Pandey11k wrote:

The error clearly says that the read id associated with this read HWI-ST141_0363:2:1101:11445:6328#TAGCTT is 5_hot_dedup and this RG ID is not present in your header. Your header has 5_hot instead of 5_hot_dedup. so change your header accordingly.

ADD COMMENTlink written 5.7 years ago by Ashutosh Pandey11k

ok, here is what i did, i changed

@RG    ID:5_hot    SM:5_hot    PL:Illumina

to

@RG    ID:5_hot_dedup    SM:5_hot_dedup    PL:Illumina

then the same error occurred again:

ERROR MESSAGE: SAM/BAM file SAMFileReader{/home/bq/Desktop/Test_dataset2/auto_data_31/aligned_sam/sam_bam/merged.bam} is malformed: Read HWI-ST141_0363:2:1101:11445:6328#TAGCTT uses a read group (5_hot_dedup) that is not defined in the BAM header, which is not valid.  Please use http://gatkforums.broadinstitute.org/discussion/59/companion-utilities-replacereadgroups to fix this problem

I looked into this error:

samtools view -h merged.bam | grep "HWI-ST141_0363:2:1101:11445:6328#TAGCTT"
HWI-ST141_0363:2:1101:11445:6328#TAGCTT    16    NODE_1_length_332_cov_16.180723    1    37    50M    *    0    0    TCTCGGGTGGCCGGAACGTTTACTTTGAAAAAATTAGAGTGTTCAAAGCA    GHBIHGB@DGDJIGHEIIIIIHIJJIGJIIHGGIIGIHHHFHDFFFFCCC    X0:i:1    X1:i:0    MD:Z:50    PG:Z:MarkDuplicates    XG:i:0    NM:i:0    XM:i:0    XO:i:0    XT:A:U    RG:Z:5_hot_dedup

It seems the RG group is RG:Z:5_hot_dedup, but I never introduce this as my readgroup,

What I did for my bwa alignment and added the readgroup was

bwa aln -n 0.04 -k 5 -I -t 4 contig ~/Desktop/Test_dataset2/5_hot_trimmed_clipped.fastq | bwa samse -r '@RG\tID:5_hot\tSM:5_hot\tPL:Illumina' contig - ~/Desktop/Test_dataset2/5_hot_trimmed_clipped.fastq > ~/Desktop/Test_dataset2/auto_data_31/aligned_sam/5_hot.sam

so 5_hot_dedup was not not introduced as readgroup at the beginning, that is the part confused me. I really appreciate your time on my question, thanks a lot!

I also looked into my merged.bam file,the 5_dedup read group is not there either...

samtools view -h merged.bam | grep "^@RG"
@RG    ID:1_cold    SM:1_cold    PL:Illumina
@RG    ID:2_cold    SM:2_cold    PL:Illumina
@RG    ID:3_cold    SM:3_cold    PL:Illumina
@RG    ID:4_cold    SM:4_cold    PL:Illumina
@RG    ID:5_cold    SM:5_cold    PL:Illumina
@RG    ID:6_cold    SM:6_cold    PL:Illumina
@RG    ID:1_hot    SM:1_hot    PL:Illumina
@RG    ID:2_hot    SM:2_hot    PL:Illumina
@RG    ID:3_hot    SM:3_hot    PL:Illumina
@RG    ID:4_hot    SM:4_hot    PL:Illumina
@RG    ID:5_hot    SM:5_hot    PL:Illumina
@RG    ID:6_hot    SM:6_hot    PL:Illumina
ADD REPLYlink modified 5.7 years ago • written 5.7 years ago by mad.cichlids100
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 949 users visited in the last hour