Question

Gatk-2.3-9 And Input File Format

11.2 years ago

GPR ▴ 390

Hello, I am having a nightmare trying to format my BAM files as required by GATK and have pretty much run out of ideas. I would therefore appreciate some help.

I have followed the instructions given here (http://www.broadinstitute.org/gatk/guide/article?id=1204). My BAM files are sorted and have read groups added, both done with Picard-Tools. I have also tried indexing them with Picard-Tools, SAMTools, and even BAMTools.

One of the problems I am facing is that, although GATK is said to accept only indexed BAM files, it gives me the following error every time I pass a *.bai file as input:

<< Invalid command line: The GATK reads argument (-I, --input_file) supports only BAM files with the .bam extension and lists of BAM files with the .list extension, but the file /home/gp53/tophat2-eber-2nd-R1-readgroups-reorder.bai has neither extension. Please ensure that your BAM file or list of BAM files is in the correct format, update the extension, and try again.

>
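Going by the wording of that message, -I wants the path to the .bam (or a .list of BAMs), not the path to the index. Here is a minimal Python sketch of a pre-flight check along those lines. The function name `check_gatk_input` is made up for illustration, and the two index names it looks for (`sample.bam.bai` and `sample.bai` next to the BAM) are an assumption about the usual naming conventions, not something stated in the error text.

```python
from pathlib import Path


def check_gatk_input(path):
    """Return a list of problems with a path meant for the -I argument.

    Hypothetical helper: it only mirrors the extension rule quoted in the
    error message and looks for an index file sitting next to the BAM.
    """
    p = Path(path)
    problems = []
    if p.suffix not in (".bam", ".list"):
        problems.append(f"{p.name}: extension must be .bam or .list")
    elif p.suffix == ".bam":
        # Assumed index naming conventions: sample.bam.bai or sample.bai
        candidates = [Path(str(p) + ".bai"), p.with_suffix(".bai")]
        if not any(c.exists() for c in candidates):
            problems.append(f"{p.name}: no .bai index found next to it")
    return problems


if __name__ == "__main__":
    import sys

    for arg in sys.argv[1:]:
        for problem in check_gatk_input(arg):
            print(problem)
```

Run on the path from the error above, this would report the extension problem, since the file passed ends in .bai rather than .bam.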

I have checked my read groups and headers to make sure they look like those specified on the GATK website (http://www.broadinstitute.org/gatk/guide/article?id=1204). Using a non-indexed, yet sorted and read-group-added BAM file, I ran RealignerTargetCreator and got the following error:

<< ERROR MESSAGE: SAM/BAM file SAMFileReader{/home/gp53/tophat2-eber-2nd-R1-readgroups-reorder.bam} is malformed: Read HWI-ST830:129:D1459ACXX:8:1208:6666:45578 is either missing the read group or its read group is not defined in the BAM header, both of which are required by the GATK

>

My header looks like this:

@VN:1.0  SO:coordinate
@SQ     SN:chrM LN:16571        UR:file:/home/gp53/bwa/genome.fa        M5:d2ed829b8a1628d16cbeee88e88e39eb
@SQ     SN:chr1 LN:249250621    UR:file:/home/gp53/bwa/genome.fa        M5:1b22b98cdeb4a9304cb5d48026a85128
@SQ     SN:chr2 LN:243199373    UR:file:/home/gp53/bwa/genome.fa        M5:a0d9851da00400dec1098a9255ac712e
@SQ     SN:chr3 LN:198022430    UR:file:/home/gp53/bwa/genome.fa        M5:641e4338fa8d52a5b781bd2a2c08d3c3
@SQ     SN:chr4 LN:191154276    UR:file:/home/gp53/bwa/genome.fa        M5:23dccd106897542ad87d2765d28a19a1
@SQ     SN:chr5 LN:180915260    UR:file:/home/gp53/bwa/genome.fa        M5:0740173db9ffd264d728f32784845cd7
@SQ     SN:chr6 LN:171115067    UR:file:/home/gp53/bwa/genome.fa        M5:1d3a93a248d92a729ee764823acbbc6b
@SQ     SN:chr7 LN:159138663    UR:file:/home/gp53/bwa/genome.fa        M5:618366e953d6aaad97dbe4777c29375e
@SQ     SN:chr8 LN:146364022    UR:file:/home/gp53/bwa/genome.fa        M5:96f514a9929e410c6651697bded59aec
@SQ     SN:chr9 LN:141213431    UR:file:/home/gp53/bwa/genome.fa        M5:3e273117f15e0a400f01055d9f393768
@SQ     SN:chr10        LN:135534747    UR:file:/home/gp53/bwa/genome.fa        M5:988c28e000e84c26d552359af1ea2e1d
@SQ     SN:chr11        LN:135006516    UR:file:/home/gp53/bwa/genome.fa        M5:98c59049a2df285c76ffb1c6db8f8b96
@SQ     SN:chr12        LN:133851895    UR:file:/home/gp53/bwa/genome.fa        M5:51851ac0e1a115847ad36449b0015864
@SQ     SN:chr13        LN:115169878    UR:file:/home/gp53/bwa/genome.fa        M5:283f8d7892baa81b510a015719ca7b0b
@SQ     SN:chr14        LN:107349540    UR:file:/home/gp53/bwa/genome.fa        M5:98f3cae32b2a2e9524bc19813927542e
@SQ     SN:chr15        LN:102531392    UR:file:/home/gp53/bwa/genome.fa        M5:e5645a794a8238215b2cd77acb95a078
@SQ     SN:chr16        LN:90354753     UR:file:/home/gp53/bwa/genome.fa        M5:fc9b1a7b42b97a864f56b348b06095e6
@SQ     SN:chr17        LN:81195210     UR:file:/home/gp53/bwa/genome.fa        M5:351f64d4f4f9ddd45b35336ad97aa6de
@SQ     SN:chr18        LN:78077248     UR:file:/home/gp53/bwa/genome.fa        M5:b15d4b2d29dde9d3e4f93d1d0f2cbc9c
@SQ     SN:chr19        LN:59128983     UR:file:/home/gp53/bwa/genome.fa        M5:1aacd71f30db8e561810913e0b72636d
@SQ     SN:chr20        LN:63025520     UR:file:/home/gp53/bwa/genome.fa        M5:0dec9660ec1efaaf33281c0d5ea2560f
@SQ     SN:chr21        LN:48129895     UR:file:/home/gp53/bwa/genome.fa        M5:2979a6085bfe28e3ad6f552f361ed74d
@SQ     SN:chr22        LN:51304566     UR:file:/home/gp53/bwa/genome.fa        M5:a718acaa6135fdca8357d5bfe94211dd
@SQ     SN:chrX LN:155270560    UR:file:/home/gp53/bwa/genome.fa        M5:7e0e2e580297b7764e31dbc80c2540dd
@SQ     SN:chrY LN:59373566     UR:file:/home/gp53/bwa/genome.fa        M5:1e86411d73e6f00a10590f976be01623
@RG     ID:null PL:illumina     PU:single_lane  LB:unstranded   SM:tophat-eber-2nd-R1
@PG     ID:TopHat       VN:2.0.5        CL:/usr/local/bin/tophat2 -p 16 -g 1 -z pigz -G /home/gp53/tophat/genes.gtf --no-novel-juncs -o tophat-eber-2nd-R1 /home/administrator/Bowtie2Index/genome /media/Elements/Genaro/input/eber-2nd-R1.fastq

I would appreciate your help on this. G.

gatk bam • 3.9k views

ADD COMMENT • link updated 11.2 years ago by vdauwera ★ 1.2k • written 11.2 years ago by GPR ▴ 390

score 1 · Answer 1 · 2013-02-15

11.2 years ago

vdauwera ★ 1.2k

Attempting to answer at http://gatkforums.broadinstitute.org/discussion/2201/indelrealigner-input-file

ADD COMMENT • link 11.2 years ago by vdauwera ★ 1.2k

Thanks so much. I now have RealignerTargetCreator running on both the BWA and TopHat2 alignments. The one thing I changed was to leave the ID=string option of AddOrReplaceReadGroups.jar at its default of 1.

That pretty much eliminated the recurring error: Read HWI-ST830:129:D1459ACXX:8:1208:6666:45578 is either missing the read group or its read group is not defined in the BAM header, both of which are required by the GATK
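In case it helps anyone else, the error names two conditions: a read has no read group, or its read group is not defined in the header. Below is a rough Python sketch that checks both on SAM text. It assumes tab-delimited SAM lines with the header first (for example, text produced by something like `samtools view -h`), and the function name `reads_with_undefined_rg` is made up for illustration. It is not how GATK itself performs the check.

```python
def reads_with_undefined_rg(sam_lines):
    """Yield (read_name, rg) for reads whose RG tag is missing or not in the header.

    Assumes tab-delimited SAM text with all header lines before the reads.
    rg is None when the read carries no RG:Z: tag at all.
    """
    header_ids = set()
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("@"):
            # Collect the ID of every @RG header line
            if fields[0] == "@RG":
                for field in fields[1:]:
                    if field.startswith("ID:"):
                        header_ids.add(field[3:])
            continue
        # Optional tags start at column 12 of an alignment line
        rg = None
        for tag in fields[11:]:
            if tag.startswith("RG:Z:"):
                rg = tag[5:]
                break
        if rg is None or rg not in header_ids:
            yield fields[0], rg


if __name__ == "__main__":
    import sys

    for name, rg in reads_with_undefined_rg(sys.stdin):
        print(name, rg, sep="\t")
```

Any read it prints either has no RG tag (shown as None) or carries an ID that does not match any @RG line in the header.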

ADD REPLY • link 11.2 years ago by GPR ▴ 390