Question: Contig Incompatibility For Depth Of Coverage Calculations Using GATK
6.7 years ago by
agatorano50
agatorano50 wrote:

I am trying to calculate the depth of coverage of an exome using gatk and am having trouble with two parts:

This is the command I am trying to emulate -

java -Xmx3072m -jar ./Sting/dist/GenomeAnalysisTK.jar \
-T DepthOfCoverage -I group1.READS.bam.list -L EXOME.interval_list \
-R ./human_g1k_v37.fasta \
-dt BY_SAMPLE -dcov 5000 -l INFO --omitDepthOutputAtEachBase --omitLocusTable \
--minBaseQuality 0 --minMappingQuality 20 --start 1 --stop 5000 --nBins 200 \
--includeRefNSites \
-o group1.DATA

My first question is about group1.READS.bam.list: I am confused by what they are asking for. Do they simply want the paths to a handful of my BAM files, separated by newlines?

My second question: when I run the command, it fails with an error saying my contigs are incompatible.

 Input files reads and reference have incompatible contigs: No overlapping contigs found.

ERROR   reads contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y, MT, GL000207.1, GL000226.1, GL000229.1, GL000231.1, GL000210.1, GL000239.1, GL000235.1, GL000201.1, GL000247...

ERROR   reference contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chrX, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr20, chrY, chr19, chr22, chr21, chr6_ssto_hap7, chr6_mcf_hap5...

Is this an issue with the reference genome I used or with my BAM files? I used hg19 because human_g1k_v37 gave an error on the reference while hg19 did not.

Thank you enormously.

gatk depth-of-coverage contigs • 5.6k views
ADD COMMENTlink modified 6.6 years ago • written 6.7 years ago by agatorano50
6.6 years ago by
Philadelphia
Ashutosh Pandey11k wrote:

Hi,

I am not sure, but I think the last contig may be the one causing the problem. The last contig in the BAM file, "SN:NC_007605", is not present in the reference file or the dict file. I would remove all the reads belonging to that contig from the BAM file and try again. That is all I can say right now. You can use this command: samtools view -h Input.bam | grep -v "NC_007605" | samtools view -bS - > Output.bam, and then try the new BAM file. Basically, I want to make sure that the chromosome names and their order in the reference FASTA file are the same as in the BAM file. Once the incompatibility is removed, I think it should work.

ADD COMMENTlink written 6.6 years ago by Ashutosh Pandey11k
6.7 years ago by
Johan860
Sweden
Johan860 wrote:

My first question is about group1.READS.bam.list: I am confused by what they are asking for. Do they simply want the paths to a handful of my BAM files, separated by newlines?

Yes
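
For illustration, such a list file can be produced with a few lines of Python. This is only a sketch: the helper name bam_list_text and the BAM paths are made up for the example, and the only thing taken from the thread is the layout of one path per line.

```python
def bam_list_text(bam_paths):
    """Return the contents of a list file: one BAM path per line."""
    return "".join(f"{path}\n" for path in bam_paths)


# Hypothetical BAM paths, used only for illustration.
example_bams = ["/data/bams/sample1.bam", "/data/bams/sample2.bam"]

# The text printed here is what would be saved as group1.READS.bam.list.
print(bam_list_text(example_bams), end="")
```

Saving that text under the name used in the original command (group1.READS.bam.list) would give the file passed to -I.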

Is this an issue with the reference genome I used or with my BAM files? I used hg19 because human_g1k_v37 gave an error on the reference while hg19 did not.

The issue here is that you have to use the same reference you aligned against when you are doing the coverage calculations. So if you aligned against hg19, then you need to use hg19 again here. Otherwise the sequence dictionaries will not be identical (and if the references have incompatible coordinates, there will be even more trouble, for obvious reasons).

Cheers.

ADD COMMENTlink written 6.7 years ago by Johan860

Thank you for your response! I aligned against hg19 initially and this error was still present when attempting to run the function using hg19 as a reference.

ADD REPLYlink written 6.7 years ago by agatorano50

Sounds weird. If you use the exact same reference in both steps, this shouldn't happen. Check that the contig names in the hg19 version you have are the same as the ones in your BAM file. If those are the same, the only things I can think of off the top of my head are that EXOME.interval_list might be malformed, or that you could try regenerating the index/dictionary files.

ADD REPLYlink written 6.7 years ago by Johan860

I was told that I needed to use the human_g1k_v37 reference but I got this error:

Badly formed genome loc: Contig 'chr start stop name' does not match any contig in the GATK sequence dictionary derived from the reference; are you sure you are using the correct reference fasta file?

Does this mean that I don't have the reference dict in gatk?

ADD REPLYlink written 6.6 years ago by agatorano50

Try removing the .dict file; if GATK does not find it, it should try to recreate it. You should also check that your EXOME.interval_list has the same contig names and is derived from the same reference that you are using.
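
The interval-list check can be sketched in Python. This is an assumption-laden example, not GATK's own parsing: the function name unknown_interval_contigs is invented, and it assumes each non-header line begins with the contig name, followed either by a tab (Picard-style or BED-style columns) or by a colon (contig:start-stop). Lines starting with "@" are treated as header lines and skipped.

```python
def unknown_interval_contigs(interval_lines, known_contigs):
    """Return (line_number, contig) pairs whose contig is not in known_contigs."""
    known = set(known_contigs)
    problems = []
    for number, line in enumerate(interval_lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue  # skip blank lines and SAM-style header lines
        if "\t" in line:
            contig = line.split("\t", 1)[0]  # tab-separated columns
        else:
            contig = line.split(":", 1)[0]   # contig:start-stop form
        if contig not in known:
            problems.append((number, contig))
    return problems
```

With a check like this, a column-title line such as "chr start stop name" would be reported as an unknown contig. The error message above quotes exactly that text as the contig, so a title line in the interval file is one possible cause, though that is a guess.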

ADD REPLYlink written 6.6 years ago by Johan860

Is the .dict in the gatk jar file? I am unsure how to delete what is within it

ADD REPLYlink written 6.6 years ago by agatorano50

Johan is right. The chromosome names in the headers of the FASTA sequences that were used as the reference genome for alignment should match those in the reference file you are providing for your current analysis (coverage). GATK will even complain if the first reference file had chromosome names 1, 2, 3, ..., MT and the current reference has chromosome names chr1, chr2, ..., chrMT. The order of the chromosomes matters too: the BAM file should be sorted in the same order as the chromosomes appear in the new reference file, or vice versa.
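
A rough way to express both checks (names and order) in Python, given two contig lists that have already been extracted. The function name compare_contigs and the returned keys are invented for this sketch; it does not reproduce GATK's actual validation.

```python
def compare_contigs(bam_contigs, ref_contigs):
    """Summarise how two ordered contig lists differ."""
    ref_set = set(ref_contigs)
    bam_set = set(bam_contigs)
    # Contigs present in both lists, kept in each list's own order.
    shared_in_bam_order = [c for c in bam_contigs if c in ref_set]
    shared_in_ref_order = [c for c in ref_contigs if c in bam_set]
    return {
        "only_in_bam": [c for c in bam_contigs if c not in ref_set],
        "only_in_ref": [c for c in ref_contigs if c not in bam_set],
        "same_order": shared_in_bam_order == shared_in_ref_order,
    }
```

For the lists in the original error (1, 2, 3, ... against chr1, chr2, chr3, ...), every contig would land in only_in_bam or only_in_ref, which corresponds to "No overlapping contigs found".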

ADD REPLYlink written 6.7 years ago by Ashutosh Pandey11k

1) Can you run the command grep ">" Reference_file.fasta (on the reference file you are giving as input to GATK) and paste the output here? It should be the header line for each chromosome.
2) Also, run samtools view -H Input.bam (on the BAM file going into GATK as input) and paste the header information of your BAM file here.
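
For the FASTA side, the same information can be pulled out in Python. This sketch assumes the contig name is the first whitespace-delimited word after ">", which is the usual convention; the function name fasta_contig_names is made up for the example.

```python
def fasta_contig_names(fasta_lines):
    """Return the first word of each '>' header line, in file order."""
    names = []
    for line in fasta_lines:
        if line.startswith(">"):
            words = line[1:].split()
            names.append(words[0] if words else "")
    return names
```

Passing an open file handle for the reference (for example open("Reference_file.fasta")) as fasta_lines would give the contig names one per list entry, which avoids the wrapping problem when pasting.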

ADD REPLYlink written 6.6 years ago by Ashutosh Pandey11k

Also, paste the GATK command you are using.

ADD REPLYlink written 6.6 years ago by Ashutosh Pandey11k

Header lines for each chromosome:

1 dna:chromosome chromosome:GRCh37:1:1:249250621:1 2 dna:chromosome chromosome:GRCh37:2:1:243199373:1 3 dna:chromosome chromosome:GRCh37:3:1:198022430:1 4 dna:chromosome chromosome:GRCh37:4:1:191154276:1 5 dna:chromosome chromosome:GRCh37:5:1:180915260:1 6 dna:chromosome chromosome:GRCh37:6:1:171115067:1 7 dna:chromosome chromosome:GRCh37:7:1:159138663:1 8 dna:chromosome chromosome:GRCh37:8:1:146364022:1 9 dna:chromosome chromosome:GRCh37:9:1:141213431:1 10 dna:chromosome chromosome:GRCh37:10:1:135534747:1 11 dna:chromosome chromosome:GRCh37:11:1:135006516:1 12 dna:chromosome chromosome:GRCh37:12:1:133851895:1 13 dna:chromosome chromosome:GRCh37:13:1:115169878:1 14 dna:chromosome chromosome:GRCh37:14:1:107349540:1 15 dna:chromosome chromosome:GRCh37:15:1:102531392:1 16 dna:chromosome chromosome:GRCh37:16:1:90354753:1 17 dna:chromosome chromosome:GRCh37:17:1:81195210:1 18 dna:chromosome chromosome:GRCh37:18:1:78077248:1 19 dna:chromosome chromosome:GRCh37:19:1:59128983:1 20 dna:chromosome chromosome:GRCh37:20:1:63025520:1 21 dna:chromosome chromosome:GRCh37:21:1:48129895:1 22 dna:chromosome chromosome:GRCh37:22:1:51304566:1 X dna:chromosome chromosome:GRCh37:X:1:155270560:1 Y dna:chromosome chromosome:GRCh37:Y:2649521:59034049:1 MT gi|251831106|ref|NC_012920.1| Homo sapiens mitochondrion, complete genome GL000207.1 dna:supercontig supercontig::GL000207.1:1:4262:1 GL000226.1 dna:supercontig supercontig::GL000226.1:1:15008:1 GL000229.1 dna:supercontig supercontig::GL000229.1:1:19913:1 GL000231.1 dna:supercontig supercontig::GL000231.1:1:27386:1 GL000210.1 dna:supercontig supercontig::GL000210.1:1:27682:1 GL000239.1 dna:supercontig supercontig::GL000239.1:1:33824:1 GL000235.1 dna:supercontig supercontig::GL000235.1:1:34474:1 GL000201.1 dna:supercontig supercontig::GL000201.1:1:36148:1 GL000247.1 dna:supercontig supercontig::GL000247.1:1:36422:1 GL000245.1 dna:supercontig supercontig::GL000245.1:1:36651:1 GL000197.1 dna:supercontig supercontig::GL000197.1:1:37175:1 
GL000203.1 dna:supercontig supercontig::GL000203.1:1:37498:1 GL000246.1 dna:supercontig supercontig::GL000246.1:1:38154:1 GL000249.1 dna:supercontig supercontig::GL000249.1:1:38502:1 GL000196.1 dna:supercontig supercontig::GL000196.1:1:38914:1 GL000248.1 dna:supercontig supercontig::GL000248.1:1:39786:1 GL000244.1 dna:supercontig supercontig::GL000244.1:1:39929:1 GL000238.1 dna:supercontig supercontig::GL000238.1:1:39939:1 GL000202.1 dna:supercontig supercontig::GL000202.1:1:40103:1 GL000234.1 dna:supercontig supercontig::GL000234.1:1:40531:1 GL000232.1 dna:supercontig supercontig::GL000232.1:1:40652:1 GL000206.1 dna:supercontig supercontig::GL000206.1:1:41001:1 GL000240.1 dna:supercontig supercontig::GL000240.1:1:41933:1 GL000236.1 dna:supercontig supercontig::GL000236.1:1:41934:1 GL000241.1 dna:supercontig supercontig::GL000241.1:1:42152:1 GL000243.1 dna:supercontig supercontig::GL000243.1:1:43341:1 GL000242.1 dna:supercontig supercontig::GL000242.1:1:43523:1 GL000230.1 dna:supercontig supercontig::GL000230.1:1:43691:1 GL000237.1 dna:supercontig supercontig::GL000237.1:1:45867:1 GL000233.1 dna:supercontig supercontig::GL000233.1:1:45941:1 GL000204.1 dna:supercontig supercontig::GL000204.1:1:81310:1 GL000198.1 dna:supercontig supercontig::GL000198.1:1:90085:1 GL000208.1 dna:supercontig supercontig::GL000208.1:1:92689:1 GL000191.1 dna:supercontig supercontig::GL000191.1:1:106433:1 GL000227.1 dna:supercontig supercontig::GL000227.1:1:128374:1 GL000228.1 dna:supercontig supercontig::GL000228.1:1:129120:1 GL000214.1 dna:supercontig supercontig::GL000214.1:1:137718:1 GL000221.1 dna:supercontig supercontig::GL000221.1:1:155397:1 GL000209.1 dna:supercontig supercontig::GL000209.1:1:159169:1 GL000218.1 dna:supercontig supercontig::GL000218.1:1:161147:1 GL000220.1 dna:supercontig supercontig::GL000220.1:1:161802:1 GL000213.1 dna:supercontig supercontig::GL000213.1:1:164239:1 GL000211.1 dna:supercontig supercontig::GL000211.1:1:166566:1 GL000199.1 dna:supercontig 
supercontig::GL000199.1:1:169874:1 GL000217.1 dna:supercontig supercontig::GL000217.1:1:172149:1 GL000216.1 dna:supercontig supercontig::GL000216.1:1:172294:1 GL000215.1 dna:supercontig supercontig::GL000215.1:1:172545:1 GL000205.1 dna:supercontig supercontig::GL000205.1:1:174588:1 GL000219.1 dna:supercontig supercontig::GL000219.1:1:179198:1 GL000224.1 dna:supercontig supercontig::GL000224.1:1:179693:1 GL000223.1 dna:supercontig supercontig::GL000223.1:1:180455:1 GL000195.1 dna:supercontig supercontig::GL000195.1:1:182896:1 GL000212.1 dna:supercontig supercontig::GL000212.1:1:186858:1 GL000222.1 dna:supercontig supercontig::GL000222.1:1:186861:1 GL000200.1 dna:supercontig supercontig::GL000200.1:1:187035:1 GL000193.1 dna:supercontig supercontig::GL000193.1:1:189789:1 GL000194.1 dna:supercontig supercontig::GL000194.1:1:191469:1 GL000225.1 dna:supercontig supercontig::GL000225.1:1:211173:1 GL000192.1 dna:supercontig supercontig::GL000192.1:1:547496:1

ADD REPLYlink written 6.6 years ago by agatorano50

The samtools view -H input.bam command gives me too much output to paste in here.

It has a section of:

@HD VN:1.0 GO:none SO:coordinate @SQ SN:1 LN:249250621 UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homosapiensassembly19.fasta AS:GRCh37 M5:1b22b98cdeb4a9304cb5d48026a85128 SP:Homo Sapiens @SQ SN:2 LN:243199373 UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homosapiensassembly19.fasta AS:GRCh37 M5:a0d9851da00400dec1098a9255ac712e SP:Homo Sapiens @SQ SN:3 LN:198022430 UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homosapiensassembly19.fasta AS:GRCh37 M5:fdfd811849cc2fadebc929bb925902e5 SP:Homo Sapiens @SQ SN:4 LN:191154276 UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homosapiensassembly19.fasta AS:GRCh37 M5:23dccd106897542ad87d2765d28a19a1 SP:Homo Sapiens @SQ SN:5 LN:180915260 UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homosapiensassembly19.fasta AS:GRCh37 M5:0740173db9ffd264d728f32784845cd7 SP:Homo Sapiens @SQ SN:6 LN:171115067 UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homosapiensassembly19.fasta AS:GRCh37 M5:1d3a93a248d92a729ee764823acbbc6b SP:Homo Sapiens @SQ SN:7 LN:159138663 UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homosapiensassembly19.fasta AS:GRCh37 M5:618366e953d6aaad97dbe4777c29375e SP:Homo Sapiens @SQ SN:8 LN:146364022 UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homosapiensassembly19.fasta AS:GRCh37 M5:96f514a9929e410c6651697bded59aec SP:Homo Sapiens

a section of:

@RG ID:C08VM.1 PL:illumina PU:C08VMACXX111121.1.AGGTTATC LB:Catch-103331 DT:2011-11-21T00:00:00-0500 SM:ASD-371-3-4 CN:BI @RG ID:C08VM.2 PL:illumina PU:C08VMACXX111121.2.AGGTTATC LB:Catch-103331 DT:2011-11-21T00:00:00-0500 SM:ASD-371-3-4 CN:BI @RG ID:C08VM.3 PL:illumina PU:C08VMACXX111121.3.AGGTTATC LB:Catch-103331 DT:2011-11-21T00:00:00-0500 SM:ASD-371-3-4 CN:BI @RG ID:C08VM.4 PL:illumina PU:C08VMACXX111121.4.AGGTTATC LB:Catch-103331 DT:2011-11-21T00:00:00-0500 SM:ASD-371-3-4 CN:BI @RG ID:C08VM.5 PL:illumina PU:C08VMACXX111121.5.AGGTTATC LB:Catch-103331 DT:2011-11-21T00:00:00-0500 SM:ASD-371-3-4 CN:BI @RG ID:C08VM.6 PL:illumina PU:C08VMACXX111121.6.AGGTTATC LB:Catch-103331 DT:2011-11-21T00:00:00-0500 SM:ASD-371-3-4 CN:BI @RG ID:C08VM.7 PL:illumina PU:C08VMACXX111121.7.AGGTTATC LB:Catch-103331 DT:2011-11-21T00:00:00-0500 SM:ASD-371-3-4 CN:BI @RG ID:C08VM.8 PL:illumina PU:C08VMACXX111121.8.AGGTTATC LB:Catch-103331 DT:2011-11-21T00:00:00-0500 SM:ASD-371-3-4 CN:BI @RG ID:D0CGV.4 PL:illumina PU:D0CGVACXX111118.4.AGGTTATC LB:Catch-103331 DT:2011-11-18T00:00:00-0500 SM:ASD-371-3-4 CN:BI @RG ID:D0D16.1 PL:illumina PU:D0D16ACXX111121.1.AGGTTATC LB:Catch-103331 DT:2011-11-21T00:00:00-0500 SM:ASD-371-3-4 CN:BI @RG ID:D0D16.2 PL:illumina PU:D0D16ACXX111121.2.AGGTTATC LB:Catch-103331 DT:2011-11-21T00:00:00-0500 SM:ASD-371-3-4 CN:BI @RG ID:D0D16.3 PL:illumina PU:D0D16ACXX111121.3.AGGTTATC LB:Catch-103331 DT:2011-11-21T00:00:00-0500 SM:ASD-371-3-4 CN:BI @RG ID:D0D16.4 PL:illumina PU:D0D16ACXX111121.4.AGGTTATC LB:Catch-103331 DT:2011-11-21T00:00:00-0500 SM:ASD-371-3-4 CN:BI @RG ID:D0D16.5 PL:illumina PU:D0D16ACXX111121.5.AGGTTATC LB:Catch-103331 DT:2011-11-21T00:00:00-0500 SM:ASD-371-3-4 CN:BI @RG ID:D0D16.6 PL:illumina PU:D0D16ACXX111121.6.AGGTTATC LB:Catch-103331 DT:2011-11-21T00:00:00-0500 SM:ASD-371-3-4 CN:BI @RG ID:D0D16.7 PL:illumina PU:D0D16ACXX111121.7.AGGTTATC LB:Catch-103331 DT:2011-11-21T00:00:00-0500 SM:ASD-371-3-4 CN:BI

and a section of:

@PG ID:GATK TableRecalibration VN:1.2-64-g2ddbb7b CL:defaultreadgroup=null defaultplatform=null forcereadgroup=null forceplatform=null windowsizenqs=5 homopolymernback=7 exceptionifnotile=false solidrecalmode=SETQZERO solidnocallstrategy=THROWEXCEPTION recalfile=/seq/picard/D0D6UACXX/C1-1602011-11-182011-11-28/6/Catch-103331/D0D6UACXX.6.recaldata.csv preserveqscoreslessthan=5 smoothing=1 maxqualityscore=50 doNotWriteOriginalQuals=false nopgtag=false failwithnoeofmarker=false skipUQUpdate=false Covariates=[ReadGroupCovariate, QualityScoreCovariate, CycleCovariate, DinucCovariate] @PG ID:GATK TableRecalibration.1 VN:1.2-64-g2ddbb7b CL:defaultreadgroup=null defaultplatform=null forcereadgroup=null forceplatform=null windowsizenqs=5 homopolymernback=7 exceptionifnotile=false solidrecalmode=SETQZERO solidnocallstrategy=THROWEXCEPTION recalfile=/seq/picard/C08VMACXX/C1-1602011-11-212011-12-01/8/Catch-103331/C08VMACXX.8.recaldata.csv preserveqscoreslessthan=5 smoothing=1 maxqualityscore=50 doNotWriteOriginalQuals=false nopgtag=false failwithnoeofmarker=false skipUQUpdate=false Covariates=[ReadGroupCovariate, QualityScoreCovariate, CycleCovariate, DinucCovariate] @PG ID:GATK TableRecalibration.10 VN:1.2-64-g2ddbb7b CL:defaultreadgroup=null defaultplatform=null forcereadgroup=null forceplatform=null windowsizenqs=5 homopolymernback=7 exceptionifnotile=false solidrecalmode=SETQZERO solidnocallstrategy=THROWEXCEPTION recalfile=/seq/picard/C08VMACXX/C1-1602011-11-212011-12-01/5/Catch-103331/C08VMACXX.5.recaldata.csv preserveqscoreslessthan=5 smoothing=1 maxqualityscore=50 doNotWriteOriginalQuals=false nopgtag=false failwithnoeofmarker=false skipUQUpdate=false Covariates=[ReadGroupCovariate, QualityScoreCovariate, CycleCovariate, DinucCovariate] @PG ID:GATK TableRecalibration.11 VN:1.2-64-g2ddbb7b CL:defaultreadgroup=null defaultplatform=null forcereadgroup=null forceplatform=null windowsizenqs=5 homopolymernback=7 exceptionifnotile=false solidrecalmode=SETQZERO 
solidnocallstrategy=THROWEXCEPTION recalfile=/seq/picard/C08VMACXX/C1-1602011-11-212011-12-01/7/Catch-103331/C08VMACXX.7.recaldata.csv preserveqscoreslessthan=5 smoothing=1 maxqualityscore=50 doNotWriteOriginalQuals=false nopgtag=false failwithnoeofmarker=false skipUQUpdate=false Covariates=[ReadGroupCovariate, QualityScoreCovariate, CycleCovariate, DinucCovariate]

Is this proper?

ADD REPLYlink written 6.6 years ago by agatorano50

The approximate command I ran is in the original post.

ADD REPLYlink written 6.6 years ago by agatorano50
6.6 years ago by
Philadelphia
Ashutosh Pandey11k wrote:

Hi Agatorano,

I need to see the BAM header lines that start with @SQ. Something like: @SQ SN:1 LN:197195432

You can run the following command: samtools view -H input.bam | grep "^@SQ". I sent you my email ID. You can rerun all the commands I asked you to run. Paste the output so that the different lines are separated; when you pasted the chromosome names here, everything got wrapped together. Send the file to me by email.

Thanks

ADD COMMENTlink written 6.6 years ago by Ashutosh Pandey11k
6.6 years ago by
Philadelphia
Ashutosh Pandey11k wrote:

Hey, I just realised that I can make sense of the things you pasted. I can clearly see that the chromosome names are different in the BAM header and the reference genome file. So you need to change the chromosome headers in your FASTA file from ">1 dna:chromosome chromosome:GRCh37:1:1:249250621:1" to ">1", and likewise for the other chromosomes. The chromosome order seems right to me.
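
The header change described here could be scripted roughly as follows. This is a sketch under the assumption that everything after the first word of a ">" line is description text that can be dropped; strip_fasta_descriptions is an invented name.

```python
def strip_fasta_descriptions(fasta_lines):
    """Yield FASTA lines with each header cut down to '>' plus its first word."""
    for line in fasta_lines:
        if line.startswith(">"):
            words = line[1:].split()
            name = words[0] if words else ""
            yield f">{name}\n"
        else:
            yield line  # sequence lines pass through unchanged
```

The result would be written to a new file rather than over the original, and any index or dict file built from the old FASTA would presumably need to be regenerated afterwards.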

ADD COMMENTlink modified 6.6 years ago • written 6.6 years ago by Ashutosh Pandey11k

I did as you suggested but got the same error:

Badly formed genome loc: Contig 'chr start stop name' does not match any contig in the GATK sequence dictionary derived from the reference; are you sure you are using the correct reference fasta file?

Here is what my file headers look like:

>1
>2
>3
>4
>5
>6
>7
>8
>9
>10
>11
>12
>13
>14
>15
>16
>17
>18
>19
>20
>21
>22
>X
>Y
>MT
>GL000207.1
>GL000226.1
>GL000229.1
>GL000231.1
>GL000210.1
>GL000239.1
>GL000235.1
>GL000201.1
>GL000247.1
>GL000245.1
>GL000197.1
>GL000203.1
>GL000246.1
>GL000249.1
>GL000196.1
>GL000248.1
>GL000244.1
>GL000238.1
>GL000202.1
>GL000234.1
>GL000232.1
>GL000206.1
>GL000240.1
>GL000236.1
>GL000241.1
>GL000243.1
>GL000242.1
>GL000230.1
>GL000237.1
>GL000233.1
>GL000204.1
>GL000198.1
>GL000208.1
>GL000191.1
>GL000227.1
>GL000228.1
>GL000214.1
>GL000221.1
>GL000209.1
>GL000218.1
>GL000220.1
>GL000213.1
>GL000211.1
>GL000199.1
>GL000217.1
>GL000216.1
>GL000215.1
>GL000205.1
>GL000219.1
>GL000224.1
>GL000223.1
>GL000195.1
>GL000212.1
>GL000222.1
>GL000200.1
>GL000193.1
>GL000194.1
>GL000225.1
>GL000192.1
ADD REPLYlink modified 6.6 years ago • written 6.6 years ago by agatorano50

Hi Agatorano,

I am really sorry for this whole thing. It also happened to me the first time, but it was not that bad. OK, final try: I think ".dict" is a text file which should be created by GATK in the same location as your reference genome. Can you see one? If yes, can you paste the content here, or check yourself whether the chromosome names, starts and ends in the dict file match those in the reference file? If not, delete the old dict file. The dict file should also contain all the contigs such as GL000225.1, in addition to your 1-22, X, Y and MT chromosomes.
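
Checking the dict file against the reference by hand is tedious, so here is a minimal Python sketch of the comparison. It assumes the dict file has tab-separated @SQ lines with SN and LN tags (as in a SAM header) and that contig names are the first word of each FASTA header; both function names are invented for this example.

```python
def dict_contigs(dict_lines):
    """Return [(name, length), ...] from the @SQ lines of a .dict file."""
    contigs = []
    for line in dict_lines:
        if not line.startswith("@SQ"):
            continue
        tags = {}
        for field in line.rstrip("\n").split("\t")[1:]:
            key, sep, value = field.partition(":")
            if sep:
                tags[key] = value
        contigs.append((tags["SN"], int(tags["LN"])))
    return contigs


def fasta_contigs(fasta_lines):
    """Return [(name, length), ...] computed from the FASTA text itself."""
    contigs = []
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            words = line[1:].split()
            contigs.append([words[0] if words else "", 0])
        elif line and contigs:
            contigs[-1][1] += len(line)  # add this sequence line's bases
    return [(name, length) for name, length in contigs]
```

If dict_contigs(...) and fasta_contigs(...) return the same list, the names, lengths and order agree; any difference points at a stale or mismatched dict file.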

ADD REPLYlink modified 6.6 years ago • written 6.6 years ago by Ashutosh Pandey11k
@HD    VN:1.0    SO:unsorted
@SQ    SN:1    LN:249250621    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:1b22b98cdeb4a9304cb5d48026a85128
@SQ    SN:2    LN:243199373    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:a0d9851da00400dec1098a9255ac712e
@SQ    SN:3    LN:198022430    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:fdfd811849cc2fadebc929bb925902e5
@SQ    SN:4    LN:191154276    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:23dccd106897542ad87d2765d28a19a1
@SQ    SN:5    LN:180915260    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:0740173db9ffd264d728f32784845cd7
@SQ    SN:6    LN:171115067    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:1d3a93a248d92a729ee764823acbbc6b
@SQ    SN:7    LN:159138663    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:618366e953d6aaad97dbe4777c29375e
@SQ    SN:8    LN:146364022    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:96f514a9929e410c6651697bded59aec
@SQ    SN:9    LN:141213431    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:3e273117f15e0a400f01055d9f393768
@SQ    SN:10    LN:135534747    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:988c28e000e84c26d552359af1ea2e1d
@SQ    SN:11    LN:135006516    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:98c59049a2df285c76ffb1c6db8f8b96
@SQ    SN:12    LN:133851895    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:51851ac0e1a115847ad36449b0015864
@SQ    SN:13    LN:115169878    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:283f8d7892baa81b510a015719ca7b0b
@SQ    SN:14    LN:107349540    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:98f3cae32b2a2e9524bc19813927542e
@SQ    SN:15    LN:102531392    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:e5645a794a8238215b2cd77acb95a078
@SQ    SN:16    LN:90354753    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:fc9b1a7b42b97a864f56b348b06095e6
@SQ    SN:17    LN:81195210    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:351f64d4f4f9ddd45b35336ad97aa6de
@SQ    SN:18    LN:78077248    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:b15d4b2d29dde9d3e4f93d1d0f2cbc9c
@SQ    SN:19    LN:59128983    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:1aacd71f30db8e561810913e0b72636d
@SQ    SN:20    LN:63025520    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:0dec9660ec1efaaf33281c0d5ea2560f
@SQ    SN:21    LN:48129895    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:2979a6085bfe28e3ad6f552f361ed74d
@SQ    SN:22    LN:51304566    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:a718acaa6135fdca8357d5bfe94211dd
@SQ    SN:X    LN:155270560    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:7e0e2e580297b7764e31dbc80c2540dd
@SQ    SN:Y    LN:59373566    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:1fa3474750af0948bdf97d5a0ee52e51
@SQ    SN:MT    LN:16569    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:c68f52674c9fb33aef52dcf399755519
@SQ    SN:GL000207.1    LN:4262    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:f3814841f1939d3ca19072d9e89f3fd7
@SQ    SN:GL000226.1    LN:15008    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:1c1b2cd1fccbc0a99b6a447fa24d1504
@SQ    SN:GL000229.1    LN:19913    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:d0f40ec87de311d8e715b52e4c7062e1
@SQ    SN:GL000231.1    LN:27386    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:ba8882ce3a1efa2080e5d29b956568a4
@SQ    SN:GL000210.1    LN:27682    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:851106a74238044126131ce2a8e5847c
@SQ    SN:GL000239.1    LN:33824    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:99795f15702caec4fa1c4e15f8a29c07
@SQ    SN:GL000235.1    LN:34474    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:118a25ca210cfbcdfb6c2ebb249f9680
@SQ    SN:GL000201.1    LN:36148    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:dfb7e7ec60ffdcb85cb359ea28454ee9
@SQ    SN:GL000247.1    LN:36422    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:7de00226bb7df1c57276ca6baabafd15
@SQ    SN:GL000245.1    LN:36651    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:89bc61960f37d94abf0df2d481ada0ec
@SQ    SN:GL000197.1    LN:37175    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:6f5efdd36643a9b8c8ccad6f2f1edc7b
@SQ    SN:GL000203.1    LN:37498    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:96358c325fe0e70bee73436e8bb14dbd
@SQ    SN:GL000246.1    LN:38154    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:e4afcd31912af9d9c2546acf1cb23af2
@SQ    SN:GL000249.1    LN:38502    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:1d78abec37c15fe29a275eb08d5af236
@SQ    SN:GL000196.1    LN:38914    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:d92206d1bb4c3b4019c43c0875c06dc0
@SQ    SN:GL000248.1    LN:39786    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:5a8e43bec9be36c7b49c84d585107776
@SQ    SN:GL000244.1    LN:39929    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:0996b4475f353ca98bacb756ac479140
@SQ    SN:GL000238.1    LN:39939    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:131b1efc3270cc838686b54e7c34b17b
@SQ    SN:GL000202.1    LN:40103    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:06cbf126247d89664a4faebad130fe9c
@SQ    SN:GL000234.1    LN:40531    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:93f998536b61a56fd0ff47322a911d4b
@SQ    SN:GL000232.1    LN:40652    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:3e06b6741061ad93a8587531307057d8
@SQ    SN:GL000206.1    LN:41001    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:43f69e423533e948bfae5ce1d45bd3f1
@SQ    SN:GL000240.1    LN:41933    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:445a86173da9f237d7bcf41c6cb8cc62
@SQ    SN:GL000236.1    LN:41934    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:fdcd739913efa1fdc64b6c0cd7016779
@SQ    SN:GL000241.1    LN:42152    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:ef4258cdc5a45c206cea8fc3e1d858cf
@SQ    SN:GL000243.1    LN:43341    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:cc34279a7e353136741c9fce79bc4396
@SQ    SN:GL000242.1    LN:43523    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:2f8694fc47576bc81b5fe9e7de0ba49e
@SQ    SN:GL000230.1    LN:43691    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:b4eb71ee878d3706246b7c1dbef69299
@SQ    SN:GL000237.1    LN:45867    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:e0c82e7751df73f4f6d0ed30cdc853c0
@SQ    SN:GL000233.1    LN:45941    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:7fed60298a8d62ff808b74b6ce820001
@SQ    SN:GL000204.1    LN:81310    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:efc49c871536fa8d79cb0a06fa739722
@SQ    SN:GL000198.1    LN:90085    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:868e7784040da90d900d2d1b667a1383
@SQ    SN:GL000208.1    LN:92689    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:aa81be49bf3fe63a79bdc6a6f279abf6
@SQ    SN:GL000191.1    LN:106433    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:d75b436f50a8214ee9c2a51d30b2c2cc
@SQ    SN:GL000227.1    LN:128374    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:a4aead23f8053f2655e468bcc6ecdceb
@SQ    SN:GL000228.1    LN:129120    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:c5a17c97e2c1a0b6a9cc5a6b064b714f
@SQ    SN:GL000214.1    LN:137718    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:46c2032c37f2ed899eb41c0473319a69
@SQ    SN:GL000221.1    LN:155397    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:3238fb74ea87ae857f9c7508d315babb
@SQ    SN:GL000209.1    LN:159169    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:f40598e2a5a6b26e84a3775e0d1e2c81
@SQ    SN:GL000218.1    LN:161147    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:1d708b54644c26c7e01c2dad5426d38c
@SQ    SN:GL000220.1    LN:161802    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:fc35de963c57bf7648429e6454f1c9db
@SQ    SN:GL000213.1    LN:164239    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:9d424fdcc98866650b58f004080a992a
@SQ    SN:GL000211.1    LN:166566    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:7daaa45c66b288847b9b32b964e623d3
@SQ    SN:GL000199.1    LN:169874    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:569af3b73522fab4b40995ae4944e78e
@SQ    SN:GL000217.1    LN:172149    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:6d243e18dea1945fb7f2517615b8f52e
@SQ    SN:GL000216.1    LN:172294    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:642a232d91c486ac339263820aef7fe0
@SQ    SN:GL000215.1    LN:172545    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:5eb3b418480ae67a997957c909375a73
@SQ    SN:GL000205.1    LN:174588    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:d22441398d99caf673e9afb9a1908ec5
@SQ    SN:GL000219.1    LN:179198    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:f977edd13bac459cb2ed4a5457dba1b3
@SQ    SN:GL000224.1    LN:179693    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:d5b2fc04f6b41b212a4198a07f450e20
@SQ    SN:GL000223.1    LN:180455    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:399dfa03bf32022ab52a846f7ca35b30
@SQ    SN:GL000195.1    LN:182896    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:5d9ec007868d517e73543b005ba48535
@SQ    SN:GL000212.1    LN:186858    UR:file:/data/Reference/human_g1k_v37/fixed_g1k_v37.fasta    M5:563531689f3dbd691331fd6c5730a88b
ADD REPLYlink written 6.6 years ago by agatorano50

That is about all of it although some was cut off

ADD REPLYlink written 6.6 years ago by agatorano50

Hey Agatorano, your reference FASTA and dict file have the same chromosomes in the same order, so I think they are not causing any problem. I think your BAM file may have contigs that are not part of the reference file (I am just guessing; I have no clue what's going on). Can you repaste the BAM header? The one you pasted above only goes up to chromosome 8. Try this: samtools view -H input.bam | grep "^@SQ" | cut -f1, and paste the output. If it is too big, paste the last 10 lines.

ADD REPLYlink written 6.6 years ago by Ashutosh Pandey11k

Doing exactly that I get this output:

@SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ @SQ

ADD REPLYlink written 6.6 years ago by agatorano50

Hi, I was stupid to ask you to run that command. Basically, I wanted you to print the chromosome names from the BAM header. The command should have been samtools view -H input.bam | grep "^@SQ" | cut -f2, with -f2 instead of -f1. Again, I apologize. Can you try the new command? Thanks.
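
To see why -f1 only printed @SQ, here is a small Python sketch that roughly mirrors the grep and cut steps on header text. The function name sq_field is invented, and it assumes the header fields are tab-separated, which is what cut splits on by default.

```python
def sq_field(header_lines, index):
    """Return field number `index` (1-based, like cut -f) of each @SQ line."""
    values = []
    for line in header_lines:
        if line.startswith("@SQ"):
            fields = line.rstrip("\n").split("\t")
            if index <= len(fields):
                values.append(fields[index - 1])
    return values
```

Field 1 of an @SQ line is the record type itself, so index 1 returns "@SQ" for every line, while index 2 returns the SN:... entries that carry the contig names.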

ADD REPLYlink written 6.6 years ago by Ashutosh Pandey11k
SN:1
SN:2
SN:3
SN:4
SN:5
SN:6
SN:7
SN:8
SN:9
SN:10
SN:11
SN:12
SN:13
SN:14
SN:15
SN:16
SN:17
SN:18
SN:19
SN:20
SN:21
SN:22
SN:X
SN:Y
SN:MT
SN:GL000207.1
SN:GL000226.1
SN:GL000229.1
SN:GL000231.1
SN:GL000210.1
SN:GL000239.1
SN:GL000235.1
SN:GL000201.1
SN:GL000247.1
SN:GL000245.1
SN:GL000197.1
SN:GL000203.1
SN:GL000246.1
SN:GL000249.1
SN:GL000196.1
SN:GL000248.1
SN:GL000244.1
SN:GL000238.1
SN:GL000202.1
SN:GL000234.1
SN:GL000232.1
SN:GL000206.1
SN:GL000240.1
SN:GL000236.1
SN:GL000241.1
SN:GL000243.1
SN:GL000242.1
SN:GL000230.1
SN:GL000237.1
SN:GL000233.1
SN:GL000204.1
SN:GL000198.1
SN:GL000208.1
SN:GL000191.1
SN:GL000227.1
SN:GL000228.1
SN:GL000214.1
SN:GL000221.1
SN:GL000209.1
SN:GL000218.1
SN:GL000220.1
SN:GL000213.1
SN:GL000211.1
SN:GL000199.1
SN:GL000217.1
SN:GL000216.1
SN:GL000215.1
SN:GL000205.1
SN:GL000219.1
SN:GL000224.1
SN:GL000223.1
SN:GL000195.1
SN:GL000212.1
SN:GL000222.1
SN:GL000200.1
SN:GL000193.1
SN:GL000194.1
SN:GL000225.1
SN:GL000192.1
SN:NC_007605
ADD REPLYlink modified 6.6 years ago • written 6.6 years ago by agatorano50
6.6 years ago by
Philadelphia
Ashutosh Pandey11k wrote:

Hi,

I assume I pasted a solution sometime last night, but I think it somehow got deleted, or maybe I forgot to push the submit button. Anyway, as you can see, there is a contig "NC_007605" in your BAM file that is not present in your reference FASTA file, and as a result the dict file created by GATK doesn't contain this contig. I am not sure whether taking out all the reads mapped to this contig from your BAM file will help or not. Try running the command samtools view -h in.bam | grep -v "NC_007605" | samtools view -bS - > out.bam and then use the out.bam file in your GATK command.
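
The grep -v step drops every line that contains the string anywhere. A column-aware variant can be sketched in Python on SAM text (for example the output of samtools view -h). This is only an illustration with an invented function name, drop_contig; it removes the contig's @SQ line plus any alignment whose reference name (column 3) or mate reference name (column 7) is that contig, and it is no more certain to fix the GATK error than the command above.

```python
def drop_contig(sam_lines, contig):
    """Yield SAM lines that do not involve `contig`."""
    for line in sam_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("@"):
            # Header: drop only the @SQ line that declares this contig.
            if fields[0] == "@SQ" and f"SN:{contig}" in fields[1:]:
                continue
            yield line
        elif len(fields) >= 7 and contig in (fields[2], fields[6]):
            continue  # read or its mate is placed on the unwanted contig
        else:
            yield line
```

The filtered text would still have to be converted back to BAM, for instance with the samtools view -bS step from the command above.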

Thanks.

ADD COMMENTlink modified 6.6 years ago • written 6.6 years ago by Ashutosh Pandey11k