Hi,
I have samples sequenced over three lanes that I am mapping together (paired-end reads: three forward and three reverse FASTQ files) using STAR as follows:
STAR --runThreadN 8 \
  --genomeDir data/genome_files/human_Ensembl/star_index \
  --readFilesCommand zcat \
  --readFilesIn trimmed-2231_L1_1P.fastq.gz,trimmed-2231_L2_1P.fastq.gz,trimmed-2231_L3_1P.fastq.gz \
    trimmed-2231_L1_2P.fastq.gz,trimmed-2231_L2_2P.fastq.gz,trimmed-2231_L3_2P.fastq.gz \
  --outSAMattrRGline ID:860167 , ID:862659 , ID:862660 \
  --outSAMtype BAM SortedByCoordinate \
  --outFileNamePrefix mapping.dir/2231_
I plan to perform variant calling downstream, and GATK requires read groups, so I added one read group per lane using --outSAMattrRGline ID:860167 , ID:862659 , ID:862660.
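For reference, the STAR manual describes --outSAMattrRGline as taking further tags after each ID: field, with comma-separated entries (each comma surrounded by spaces) corresponding to the comma-separated files in --readFilesIn. A minimal bash sketch that builds that argument list so every lane's read group carries the same SM tag, assuming the sample name is 2231 (SAMPLE, LANE_IDS and rg_args are hypothetical variable names):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumption: the sample name is 2231; the IDs are the three per-lane read groups.
SAMPLE="2231"
LANE_IDS=(860167 862659 862660)

# Build one read-group entry per lane, all sharing the same SM tag.
# STAR expects the entries to be separated by a standalone "," argument.
rg_args=()
for i in "${!LANE_IDS[@]}"; do
  if (( i > 0 )); then
    rg_args+=(",")
  fi
  rg_args+=("ID:${LANE_IDS[i]}" "SM:${SAMPLE}")
done

echo "${rg_args[*]}"
# The array would then be passed as separate arguments, e.g.:
#   STAR ... --outSAMattrRGline "${rg_args[@]}" ...
```

Keeping each token as its own array element reproduces exactly the arguments you would otherwise type by hand, so the spaces around the commas survive without extra quoting.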
However, when I now run Picard SetNmMdAndUqTags, I get the following error:
Exception in thread "main" htsjdk.samtools.SAMFormatException: Error parsing SAM header. @RG line missing SM tag. Line: @RG ID:860167; File mapping.dir/2231_V1.dupmarked.bam; Line number 196
    at htsjdk.samtools.SAMTextHeaderCodec.reportErrorParsingLine(SAMTextHeaderCodec.java:258)
    at htsjdk.samtools.SAMTextHeaderCodec.access$200(SAMTextHeaderCodec.java:46)
    at htsjdk.samtools.SAMTextHeaderCodec$ParsedHeaderLine.requireTag(SAMTextHeaderCodec.java:358)
    at htsjdk.samtools.SAMTextHeaderCodec.parseRGLine(SAMTextHeaderCodec.java:168)
    at htsjdk.samtools.SAMTextHeaderCodec.decode(SAMTextHeaderCodec.java:110)
    at htsjdk.samtools.BAMFileReader.readHeader(BAMFileReader.java:704)
    at htsjdk.samtools.BAMFileReader.<init>(BAMFileReader.java:298)
    at htsjdk.samtools.BAMFileReader.<init>(BAMFileReader.java:176)
    at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:406)
    at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:209)
    at picard.sam.SetNmMdAndUqTags.doWork(SetNmMdAndUqTags.java:123)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
It looks like I also need to add SM tags. Based on my reading, there should be one SM value per sample, so all the reads in my BAM file should share the same SM tag. Is this correct, and if so, how can I add it? Picard AddOrReplaceReadGroups seems to replace all existing read groups with a single new one, which I don't want to do.
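If re-mapping is not convenient, one possible approach (a sketch, not tested against this exact file) is to append an SM field to each existing @RG header line that lacks one and then swap the edited header in with samtools reheader, which rewrites only the header and leaves the per-read RG tags untouched. The function name add_sm_to_rg and the sample name 2231 are hypothetical:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: read SAM header text on stdin and append SM:<sample>
# to every @RG line that does not already carry an SM field.
add_sm_to_rg() {
  local sample="$1"
  awk -v sm="$sample" 'BEGIN { FS = OFS = "\t" }
    /^@RG\t/ && !/\tSM:/ { $0 = $0 "\tSM:" sm }
    { print }'
}

# Example on a small tab-separated header fragment.
printf '@HD\tVN:1.6\tSO:coordinate\n@RG\tID:860167\n' | add_sm_to_rg 2231

# On a real BAM this would be used roughly as (assumes samtools is installed):
#   samtools view -H in.bam | add_sm_to_rg 2231 > new_header.sam
#   samtools reheader new_header.sam in.bam > out.bam
```

Because the three lane-level IDs are kept and only an SM field is added, the read groups stay distinct per lane while all of them report the same sample.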
Many thanks,
Lucy