Question: Problems running variant calling (VC) GATK: SAM/BAM/CRAM file dedup.bam is malformed
0
gravatar for Lila M
6 months ago by
Lila M 460
UK
Lila M 460 wrote:

Hi everybody, I'm trying to carry out some VC analysis, but I'm not very advanced at it, so I am a bit struggle... For my RNA-seq samples, I've run the following command lines:

#1 Mark duplicates
ls *bam | xargs -I {} -n 1 java -jar picard.jar MarkDuplicates I={} O={}.dedup  M={}.metrix.txt

#2 Index file
ls *bam | xargs -I {} -n 1 java -jar picard.jar BuildBamIndex  I={} O={}.bam.bai

#3  Split'N'Trim and reassign mapping qualities
ls *bam | xargs -I {} -n 1 java -jar GenomeAnalysisTK.jar -T SplitNCigarReads -R GRCh38.p10.genome.fa -I {} -o {}.split.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS

But then, I get the following error:

##### ERROR MESSAGE: SAM/BAM/CRAM dedup.bam is malformed. Please see https://software.broadinstitute.org/gatk/documentation/article?id=1317for more information. Error details: SAM file doesn't have any read groups defined in the header.  The GATK no longer supports SAM files without read groups
##### ERROR ------------------------------------------------------------------------------------------

Any ideas about how to solve it? Thanks!

rna-seq vc gatk • 361 views
ADD COMMENTlink modified 6 months ago • written 6 months ago by Lila M 460
1

Have a look at one of the files, do you see anything obviously wrong with it?

ADD REPLYlink written 6 months ago by Devon Ryan88k

Not at all with

samtools view -H file.bam
@HD VN:1.4  SO:coordinate
@SQ SN:chr1 LN:248956422
@SQ SN:chr2 LN:242193529
@SQ SN:chr3 LN:198295559
@SQ SN:chr4 LN:190214555

but nothing comes up with:

samtools view -H file.bam | grep '^@RG'

does it means that my BAM file doesn't have read group and sample information?

ADD REPLYlink modified 6 months ago • written 6 months ago by Lila M 460
1

does it means that my BAM file doesn't have read group and sample information?

yes

ADD REPLYlink written 6 months ago by Pierre Lindenbaum118k

PAA* because they make it harder to read your post. In this case, it's probably "variant calling" what you are talking about. While VC may be a CUA** for you, this is not not necessarily the case for the rest of the users here.

* Please Avoid Abbreviations
** Commonly Used Abbreviation

ADD REPLYlink modified 6 months ago • written 6 months ago by WouterDeCoster37k
3
gravatar for Macspider
6 months ago by
Macspider2.8k
Vienna - BOKU
Macspider2.8k wrote:

Error details: SAM file doesn't have any read groups defined in the header. The GATK no longer supports SAM files without read groups

Your sam file doesn't have read groups. This is a common issue with GATK, as the program now wants the reads to be grouped for analysis purposes.

You can assign them manually, or you can use Picard's tool AddOrReplaceReadGroups. Third possibility, if this tool of Picard doesn't work, is to make your own script to perform this task, as I had to do (this is mine: https://github.com/MatteoSchiavinato/Utilities/blob/master/add-read-groups.py ).

ADD COMMENTlink modified 6 months ago • written 6 months ago by Macspider2.8k

And how can I get this info?

ADD REPLYlink written 6 months ago by Lila M 460
2

Make stuff up, the IDs and such are arbitrary.

ADD REPLYlink written 6 months ago by Devon Ryan88k
2

Or ask whoever prepared/ran the samples for real info.This information will be embedded in your files. If you make stuff up it will be difficult to make sense of it in a few days or if someone else needs to look at this data.

ADD REPLYlink modified 6 months ago • written 6 months ago by genomax64k

Both Devon and Genomax are right, because - at least my personal experience - it depends on your analysis.

Are you running an analysis composed of different replicates/accessions/genomes/species, or are the data coming from one source and one replicate?

  • Many sources: add read groups that make sense (in that case my script might not work as you desire)
  • One source: make stuff up, like using the sample name and that's it (my script will likely do the job - not 100% guaranteed, cause it worked with my data but haven't tested it extensively)
ADD REPLYlink modified 6 months ago • written 6 months ago by Macspider2.8k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 912 users visited in the last hour