Question: GATK: SAM file doesn't have any read groups defined in the header.
9 weeks ago, mahnoornaseer97 wrote:

##### ERROR MESSAGE: SAM/BAM/CRAM file /home/mahnoor/Documents/IGV_DATA/Wheat_DATA/test_wheat/test_data/Ta63c-A1010005_aligned-20.sorted.bam is malformed. Please see https://software.broadinstitute.org/gatk/documentation/article?id=1317 for more information. Error details: SAM file doesn't have any read groups defined in the header. The GATK no longer supports SAM files without read groups
##### ERROR ------------------------------------------------------------------------------------------

This is the error message that I received. I have tried AddOrReplaceReadGroups, FixMateInformation, ReplaceSamHeader, and ValidateSamFile, from which I got ERROR: Missing_Read_Groups.

I am very new to programming and bioinformatics.

I am using this command for HaplotypeCaller:

java -jar /home/mahnoor/Documents/SOFTWARES/GATK_MINE/GenomeAnalysisTK.jar -T HaplotypeCaller -R /home/mahnoor/Documents/IGV_DATA/Wheat_DATA/test_wheat/test_data/reference.fa -I /home/mahnoor/Documents/IGV_DATA/Wheat_DATA/reference.bam  -o /home/mahnoor/Documents/IGV_DATA/Wheat_DATA/reference.bam.vcf

As you can see, I have given all my files the same name in case the system is not recognizing the file.

Can you please let me know what I'm doing wrong or what else I can do? Any help would be much appreciated.

Thank you.


I added markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in your toolbar when you compose or edit a post.

modified 9 weeks ago • written 9 weeks ago by WouterDeCoster

Oh great! Thank you.

written 9 weeks ago by mahnoornaseer97
9 weeks ago, WouterDeCoster (Belgium) wrote:

The error is really self-explanatory and googling the error message would tell you what you need.

Error details: SAM file doesn't have any read groups defined in the header.

You need Picard AddOrReplaceReadGroups.

I am very new to programming and bioinformatics.

SPOILER ALERT: you will spend the next few months/years googling error messages and hopping from error to error until something works. Welcome to bioinformatics. But it will get better.

modified 9 weeks ago • written 9 weeks ago by WouterDeCoster
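
For readers arriving with the same error, here is a minimal sketch of the step suggested above, written in the legacy KEY=VALUE argument style that appears later in this thread. The helper name `picard_rg_cmd`, the file names, and every read-group value are placeholders chosen for illustration, not values from the original poster's data. The function only prints the command so it can be reviewed before it is run.

```shell
# Sketch: print a Picard AddOrReplaceReadGroups command (legacy KEY=VALUE syntax).
# picard_rg_cmd is a hypothetical helper; all read-group values are placeholders.
# Arguments: $1 = input BAM, $2 = output BAM, $3 = sample name (reused as RGID).
picard_rg_cmd() {
    printf 'java -jar picard.jar AddOrReplaceReadGroups I=%s O=%s RGID=%s RGLB=lib1 RGPL=illumina RGPU=unit1 RGSM=%s\n' \
        "$1" "$2" "$3" "$3"
}

# Usage: print the command, check the paths, then run it yourself.
picard_rg_cmd input.bam input.rg.bam sample1
```

The output file (here `input.rg.bam`) is the one that carries the read groups, so it is the file to pass to GATK afterwards.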

Alright, thanks! But I have tried Picard's AddOrReplaceReadGroups twice and it still gives the same error.

written 9 weeks ago by mahnoornaseer97
Hello,

What was the exact command you used for Picard's AddOrReplaceReadGroups? Please also post the BAM header. You can get it with this command:

samtools view -H /home/mahnoor/Documents/IGV_DATA/Wheat_DATA/reference.bam

fin swimmer

written 9 weeks ago by finswimmer

The exact command I used for Picard's AddOrReplaceReadGroups:

java -jar picard.jar AddOrReplaceReadGroups -I reference.bam -O reference.rg.bam RGID=s_6 RGPL=illumina RGPU=HWUSI-EAS535_0025 RGSM=s_6

I used your command:

samtools view -H /home/mahnoor/Documents/IGV_DATA/Wheat_DATA/reference.bam

This is what I got:

@SQ SN:T_aestivum_b59 LN:14738
@PG ID:bwa PN:bwa VN:0.6.0-r85

written 9 weeks ago by mahnoornaseer97
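
The header posted above contains only @SQ and @PG lines, which is why GATK reports missing read groups. A small sketch of how that check could be scripted is given here; `has_read_group` is a hypothetical helper name, and the example feeds it the header text from this thread rather than a real BAM file.

```shell
# has_read_group: reads SAM header text on stdin and succeeds
# if at least one line starts with @RG.
has_read_group() {
    grep -q '^@RG'
}

# Example using the header shown in this thread (tab-separated, as samtools prints it).
if printf '@SQ\tSN:T_aestivum_b59\tLN:14738\n@PG\tID:bwa\tPN:bwa\tVN:0.6.0-r85\n' | has_read_group; then
    echo "read groups present"
else
    echo "no @RG line in header"
fi

# With a real file the same check would be:
#   samtools view -H your.bam | has_read_group
```

Running the same check on the output of AddOrReplaceReadGroups should find an @RG line; if it does not, the wrong file is being inspected.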
You need to continue with the reference.rg.bam. Not the reference.bam.

written 9 weeks ago by WouterDeCoster

I changed the previous command a little and used this instead:

java -jar picard.jar AddOrReplaceReadGroups I=/home/mahnoor/Documents/IGV_DATA/Wheat_DATA/reference.rg.bam O=/home/mahnoor/Documents/IGV_DATA/Wheat_DATA/referenceoutput.rg.bam RGID=4 RGLB=lib1 RGPL=illumina RGPU=unit1 RGSM=20

This is what I'm getting:

[Mon Jul 17 16:27:47 PKT 2017] picard.sam.AddOrReplaceReadGroups INPUT=/home/mahnoor/Documents/IGV_DATA/Wheat_DATA/reference.bam OUTPUT=/home/mahnoor/Documents/IGV_DATA/Wheat_DATA/referenceoutput.rg.bam RGID=4 RGLB=lib1 RGPL=illumina RGPU=unit1 RGSM=20    VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Mon Jul 17 16:27:47 PKT 2017] Executing as mahnoor@biology-OptiPlex-7040 on Linux 4.4.0-83-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_131-8u131-b11-0ubuntu1.16.04.2-b11; Picard version: 2.10.2-SNAPSHOT
INFO    2017-07-17 16:27:47 AddOrReplaceReadGroups  Created read group ID=4 PL=illumina LB=lib1 SM=20
[Mon Jul 17 16:27:52 PKT 2017] picard.sam.AddOrReplaceReadGroups done. Elapsed time: 0.08 minutes.

However, the error still remains: the SAM/BAM/CRAM file is malformed.

modified 9 weeks ago by WouterDeCoster • written 9 weeks ago by mahnoornaseer97

Which command did you use with GATK this time?

written 9 weeks ago by WouterDeCoster

java -jar /home/mahnoor/Documents/SOFTWARES/GATK_MINE/GenomeAnalysisTK.jar -T HaplotypeCaller -R /home/mahnoor/Documents/IGV_DATA/Wheat_DATA/test_wheat/test_data/reference.fa -I /home/mahnoor/Documents/IGV_DATA/Wheat_DATA/reference.bam -o /home/mahnoor/Documents/IGV_DATA/Wheat_DATA/reference.bam.vcf

^ I used this command.

written 9 weeks ago by mahnoornaseer97

You need to continue with the reference.rg.bam. Not the reference.bam.

written 9 weeks ago by WouterDeCoster

java -jar /home/mahnoor/Documents/SOFTWARES/GATK_MINE/GenomeAnalysisTK.jar -T HaplotypeCaller -R /home/mahnoor/Documents/IGV_DATA/Wheat_DATA/test_wheat/test_data/reference.fa -I /home/mahnoor/Documents/IGV_DATA/Wheat_DATA/reference.rg.bam -o /home/mahnoor/Documents/IGV_DATA/Wheat_DATA/reference.bam.vcf

^ This is the command I ran with reference.rg.bam,

but it gives an error:

##### ERROR MESSAGE: Could not read file /home/mahnoor/Documents/IGV_DATA/Wheat_DATA/reference.rg.bam because java.io.FileNotFoundException: /home/mahnoor/Documents/IGV_DATA/Wheat_DATA/reference.rg.bam (No such file or directory)
##### ERROR ------------------------------------------------------------------------------------------

written 9 weeks ago by mahnoornaseer97

This means the path to reference.rg.bam is incorrect. GATK has very clear error messages most of the time, you just have to read them.

Based on your previous command the path should be /home/mahnoor/Documents/IGV_DATA/Wheat_DATA/referenceoutput.rg.bam I believe, but these are issues you should be able to solve yourself.

written 9 weeks ago by WouterDeCoster
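
The path mix-up above is easier to avoid when the read-group BAM path is written once and reused. The following is only a sketch under that assumption: `require_file` is a hypothetical helper, the file names are placeholders, and the GATK command is printed rather than executed.

```shell
# require_file: fail early with a clear message if a path does not exist.
require_file() {
    if [ ! -f "$1" ]; then
        echo "missing file: $1" >&2
        return 1
    fi
}

# Placeholder path; define the read-group BAM once and reuse the variable
# for both the Picard output and the GATK input.
rg_bam="sample.rg.bam"

if require_file "$rg_bam"; then
    echo "would run: java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R reference.fa -I $rg_bam -o sample.vcf"
fi
```

If the file named in `rg_bam` does not exist, the script reports the missing path before GATK is ever started.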

IT WORKED! I was using the previous path. I continued with the new one and it worked! Thank you!

written 9 weeks ago by mahnoornaseer97