Question: GATK-2.3-9 And Input File Format
GPR (Mexico) wrote, 5.8 years ago:

Hello, I am having something of a nightmare trying to format my BAM files as required by GATK, and I have pretty much run out of ideas. I would therefore appreciate some help.

I have followed the instructions given here (http://www.broadinstitute.org/gatk/guide/article?id=1204). My BAM files are sorted and have read groups added, both done with Picard-Tools. I have also tried indexing them with Picard-Tools, SAMTools and even BAMTools.

One of the problems I am facing is that, although the documentation says GATK only takes indexed BAM files, it gives me the following error every time I pass it a *.bai file:

<< Invalid command line: The GATK reads argument (-I, --input_file) supports only BAM files with the .bam extension and lists of BAM files with the .list extension, but the file /home/gp53/tophat2-eber-2nd-R1-readgroups-reorder.bai has neither extension. Please ensure that your BAM file or list of BAM files is in the correct format, update the extension, and try again.

>

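The message itself points at the fix: -I takes the .bam file, not the index. As far as I understand it, GATK then picks up the index on its own when it sits next to the BAM under a matching name. Below is a minimal Python sketch of that lookup, assuming the two usual naming conventions (name.bam.bai and name.bai). The helper names index_candidates and find_index are made up for illustration and are not part of GATK.

```python
import os


def index_candidates(bam_path):
    """Return the two index file names commonly looked for next to a BAM."""
    stem, ext = os.path.splitext(bam_path)
    if ext != ".bam":
        raise ValueError("expected a path ending in .bam, got: " + bam_path)
    # Assumed convention: either <name>.bam.bai or <name>.bai.
    return [bam_path + ".bai", stem + ".bai"]


def find_index(bam_path):
    """Return the first index candidate that exists on disk, or None."""
    for candidate in index_candidates(bam_path):
        if os.path.exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Path taken from the error message, with the .bam extension restored.
    bam = "/home/gp53/tophat2-eber-2nd-R1-readgroups-reorder.bam"
    print("pass to -I:", bam)
    print("index found:", find_index(bam))
```

If find_index returns None, the BAM has no index under either name and would need to be indexed again before running GATK.
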
I have checked my read groups and headers to make sure they look like the ones specified on the GATK website (http://www.broadinstitute.org/gatk/guide/article?id=1204). Using a BAM file that is sorted and has read groups added, but is not indexed, I ran RealignerTargetCreator and got the following error:

<< ERROR MESSAGE: SAM/BAM file SAMFileReader{/home/gp53/tophat2-eber-2nd-R1-readgroups-reorder.bam} is malformed: Read HWI-ST830:129:D1459ACXX:8:1208:6666:45578 is either missing the read group or its read group is not defined in the BAM header, both of which are required by the GATK

>

My header looks like this:

@VN:1.0  SO:coordinate
@SQ     SN:chrM LN:16571        UR:file:/home/gp53/bwa/genome.fa        M5:d2ed829b8a1628d16cbeee88e88e39eb
@SQ     SN:chr1 LN:249250621    UR:file:/home/gp53/bwa/genome.fa        M5:1b22b98cdeb4a9304cb5d48026a85128
@SQ     SN:chr2 LN:243199373    UR:file:/home/gp53/bwa/genome.fa        M5:a0d9851da00400dec1098a9255ac712e
@SQ     SN:chr3 LN:198022430    UR:file:/home/gp53/bwa/genome.fa        M5:641e4338fa8d52a5b781bd2a2c08d3c3
@SQ     SN:chr4 LN:191154276    UR:file:/home/gp53/bwa/genome.fa        M5:23dccd106897542ad87d2765d28a19a1
@SQ     SN:chr5 LN:180915260    UR:file:/home/gp53/bwa/genome.fa        M5:0740173db9ffd264d728f32784845cd7
@SQ     SN:chr6 LN:171115067    UR:file:/home/gp53/bwa/genome.fa        M5:1d3a93a248d92a729ee764823acbbc6b
@SQ     SN:chr7 LN:159138663    UR:file:/home/gp53/bwa/genome.fa        M5:618366e953d6aaad97dbe4777c29375e
@SQ     SN:chr8 LN:146364022    UR:file:/home/gp53/bwa/genome.fa        M5:96f514a9929e410c6651697bded59aec
@SQ     SN:chr9 LN:141213431    UR:file:/home/gp53/bwa/genome.fa        M5:3e273117f15e0a400f01055d9f393768
@SQ     SN:chr10        LN:135534747    UR:file:/home/gp53/bwa/genome.fa        M5:988c28e000e84c26d552359af1ea2e1d
@SQ     SN:chr11        LN:135006516    UR:file:/home/gp53/bwa/genome.fa        M5:98c59049a2df285c76ffb1c6db8f8b96
@SQ     SN:chr12        LN:133851895    UR:file:/home/gp53/bwa/genome.fa        M5:51851ac0e1a115847ad36449b0015864
@SQ     SN:chr13        LN:115169878    UR:file:/home/gp53/bwa/genome.fa        M5:283f8d7892baa81b510a015719ca7b0b
@SQ     SN:chr14        LN:107349540    UR:file:/home/gp53/bwa/genome.fa        M5:98f3cae32b2a2e9524bc19813927542e
@SQ     SN:chr15        LN:102531392    UR:file:/home/gp53/bwa/genome.fa        M5:e5645a794a8238215b2cd77acb95a078
@SQ     SN:chr16        LN:90354753     UR:file:/home/gp53/bwa/genome.fa        M5:fc9b1a7b42b97a864f56b348b06095e6
@SQ     SN:chr17        LN:81195210     UR:file:/home/gp53/bwa/genome.fa        M5:351f64d4f4f9ddd45b35336ad97aa6de
@SQ     SN:chr18        LN:78077248     UR:file:/home/gp53/bwa/genome.fa        M5:b15d4b2d29dde9d3e4f93d1d0f2cbc9c
@SQ     SN:chr19        LN:59128983     UR:file:/home/gp53/bwa/genome.fa        M5:1aacd71f30db8e561810913e0b72636d
@SQ     SN:chr20        LN:63025520     UR:file:/home/gp53/bwa/genome.fa        M5:0dec9660ec1efaaf33281c0d5ea2560f
@SQ     SN:chr21        LN:48129895     UR:file:/home/gp53/bwa/genome.fa        M5:2979a6085bfe28e3ad6f552f361ed74d
@SQ     SN:chr22        LN:51304566     UR:file:/home/gp53/bwa/genome.fa        M5:a718acaa6135fdca8357d5bfe94211dd
@SQ     SN:chrX LN:155270560    UR:file:/home/gp53/bwa/genome.fa        M5:7e0e2e580297b7764e31dbc80c2540dd
@SQ     SN:chrY LN:59373566     UR:file:/home/gp53/bwa/genome.fa        M5:1e86411d73e6f00a10590f976be01623
@RG     ID:null PL:illumina     PU:single_lane  LB:unstranded   SM:tophat-eber-2nd-R1
@PG     ID:TopHat       VN:2.0.5        CL:/usr/local/bin/tophat2 -p 16 -g 1 -z pigz -G /home/gp53/tophat/genes.gtf --no-novel-juncs -o tophat-eber-2nd-R1 /home/administrator/Bowtie2Index/genome /media/Elements/Genaro/input/eber-2nd-R1.fastq
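
One way to check the condition that the second error describes is to compare the ID values on the @RG header lines with the RG:Z: tag carried by each read. Below is a rough Python sketch, assuming the header and reads are available as SAM text (for example from samtools view -h). The function names are hypothetical, and the alignment fields in the sample record are placeholders rather than real data.

```python
def header_read_group_ids(header_text):
    """Collect the ID value of every @RG line in a SAM header."""
    ids = set()
    for line in header_text.splitlines():
        # Splitting on any whitespace is a simplification: the SAM format
        # uses tabs, but the header pasted above shows spaces.
        fields = line.split()
        if fields and fields[0] == "@RG":
            ids.update(f[3:] for f in fields[1:] if f.startswith("ID:"))
    return ids


def read_group_of(sam_record):
    """Return the RG:Z: tag value of one alignment line, or None if absent."""
    columns = sam_record.rstrip("\n").split("\t")
    for tag in columns[11:]:  # optional tags start at column 12
        if tag.startswith("RG:Z:"):
            return tag[5:]
    return None


def reads_with_bad_read_group(header_text, sam_records):
    """List (read name, read group) pairs with a missing or undefined group."""
    known = header_read_group_ids(header_text)
    bad = []
    for record in sam_records:
        if not record.strip() or record.startswith("@"):
            continue  # skip blank lines and header lines
        group = read_group_of(record)
        if group is None or group not in known:
            bad.append((record.split("\t", 1)[0], group))
    return bad


if __name__ == "__main__":
    header = "@RG     ID:null PL:illumina     PU:single_lane  LB:unstranded   SM:tophat-eber-2nd-R1"
    # Placeholder alignment fields; only the read name comes from the error.
    untagged = "\t".join([
        "HWI-ST830:129:D1459ACXX:8:1208:6666:45578",
        "0", "chr1", "100", "50", "4M", "*", "0", "0", "ACGT", "IIII",
    ])
    print(reads_with_bad_read_group(header, [untagged]))
```

A read shows up in the result either because it carries no RG:Z: tag at all or because its tag names a group that has no matching @RG line.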

I would appreciate your help on this. G.

vdauwera (Cambridge, MA) wrote, 5.8 years ago:

Attempting to answer at http://gatkforums.broadinstitute.org/discussion/2201/indelrealigner-input-file


Thanks so much. I now have RealignerTargetCreator running on both the BWA and TopHat2 alignments. The one thing I changed was to leave the ID=string option of AddOrReplaceReadGroups.jar at its default value of 1.

That pretty much eliminated the recurring error: Read HWI-ST830:129:D1459ACXX:8:1208:6666:45578 is either missing the read group or its read group is not defined in the BAM header, both of which are required by the GATK.
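
The read-group step described above can be sketched roughly as follows. This Python snippet only assembles the command line and does not run it. The option names (INPUT, OUTPUT, RGID, RGLB, RGPL, RGPU, RGSM) are the Picard 1.x spellings as far as I recall and should be checked against the installed version. The file names are hypothetical, and the remaining values are copied from the @RG line in the question.

```python
def add_read_groups_command(input_bam, output_bam, sample, library,
                            platform, unit, group_id="1"):
    """Build the argument list for Picard's AddOrReplaceReadGroups."""
    return [
        "java", "-jar", "AddOrReplaceReadGroups.jar",
        "INPUT=" + input_bam,
        "OUTPUT=" + output_bam,
        "RGID=" + group_id,  # left at "1", as described above
        "RGLB=" + library,
        "RGPL=" + platform,
        "RGPU=" + unit,
        "RGSM=" + sample,
    ]


if __name__ == "__main__":
    command = add_read_groups_command(
        "aligned.bam",             # hypothetical input file name
        "aligned-readgroups.bam",  # hypothetical output file name
        sample="tophat-eber-2nd-R1",
        library="unstranded",
        platform="illumina",
        unit="single_lane",
    )
    print(" ".join(command))
```

Building the arguments as a list keeps each KEY=value pair intact, so the same list could be passed to subprocess.run without extra quoting.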

written 5.8 years ago by GPR