Index Vcf File
10.2 years ago
mad.cichlids ▴ 140

Hi, I was trying to index my VCF file. I first sorted it and then compressed it with bgzip before indexing, as in this post: Tabix -p vcf ERROR. However, a similar error message still shows up. Could you give some suggestions?

cat z.vcf | vcf-sort > out.vcf
bgzip out.vcf 
tabix -p vcf out.vcf.gz

[ti_index_core] the file out of order at line 19

Here are the first 19 lines:

head -19 out.vcf

##fileformat=VCFv4.1
##fileDate=2014-02-26 22:00:03
##source=VCF_popgen.pl
##reference=file:Genome/AGTA02_WGS.fasta
##contig=N/A
##phasing=none
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Number of Alleles in Population">
##FILTER=<ID=q10,Description="Quality below 10">
##FILTER=<ID=s50,Description="Less than 50% of samples have data">
##FORMAT=<ID=GT,Number=1,Type=String,Description="FGX Consensus Genotype (threshold model)">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Sample Read Depth">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality (SAMtools bayesian framework)">
##FORMAT=<ID=EC,Number=.,Type=String,Description="Alternate Allele Counts in Sample">
##FORMAT=<ID=SG,Number=.,Type=String,Description="SAMtools Consensus Genotype (diploid model)">
#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    100_sequence_1_pileup.txt    105_sequence_1_pileup.txt    10_sequence_1_pileup.txt    110_sequence_1_pileup.txt    112_sequence_1_pileup.txt    114_sequence_1_pileup.txt    118_sequence_1_pileup.txt    120_sequence_1_pileup.txt    122_sequence_1_pileup.txt    126_sequence_1_pileup.txt    130_sequence_1_pileup.txt    13_sequence_1_pileup.txt    147_sequence_1_pileup.txt    153_sequence_1_pileup.txt    154_sequence_1_pileup.txt    158_sequence_1_pileup.txt    15_sequence_1_pileup.txt    164_sequence_1_pileup.txt    168_sequence_1_pileup.txt    16_sequence_1_pileup.txt    171_sequence_1_pileup.txt    174_sequence_1_pileup.txt    179_sequence_1_pileup.txt    183_sequence_1_pileup.txt    188_sequence_1_pileup.txt    198_sequence_1_pileup.txt    1_sequence_1_pileup.txt    202_sequence_1_pileup.txt    203_sequence_1_pileup.txt    206_sequence_1_pileup.txt    208_sequence_1_pileup.txt    212_sequence_1_pileup.txt    214_sequence_1_pileup.txt    216_sequence_1_pileup.txt    218_sequence_1_pileup.txt    219_sequence_1_pileup.txt    21_sequence_1_pileup.txt    220_sequence_1_pileup.txt    22_sequence_1_pileup.txt    30_sequence_1_pileup.txt    32_sequence_1_pileup.txt    37_sequence_1_pileup.txt    38_sequence_1_pileup.txt    3_sequence_1_pileup.txt    44_sequence_1_pileup.txt    45_sequence_1_pileup.txt    49_sequence_1_pileup.txt    4_sequence_1_pileup.txt    51_sequence_1_pileup.txt    53_sequence_1_pileup.txt    57_sequence_1_pileup.txt    61_sequence_1_pileup.txt    66_sequence_1_pileup.txt    67_sequence_1_pileup.txt    69_sequence_1_pileup.txt    74_sequence_1_pileup.txt    86_sequence_1_pileup.txt    87_sequence_1_pileup.txt    90_sequence_1_pileup.txt    93_sequence_1_pileup.txt    95_sequence_1_pileup.txt    98_sequence_1_pileup.txt    9_sequence_1_pileup.txt    dam01_sequence_1_pileup.txt    dam02_sequence_1_pileup.txt    dam03_sequence_1_pileup.txt    dam04_sequence_1_pileup.txt    
FGXCONTROL_sequence_1_pileup.txt    mbsire_sequence_1_pileup.txt    mzdam_sequence_1_pileup.txt    sire01_sequence_1_pileup.txt    sire02_sequence_1_pileup.txt    sire03_sequence_1_pileup.txt    w118_sequence_1_pileup.txt    w119_sequence_1_pileup.txt    w120_sequence_1_pileup.txt    w121_sequence_1_pileup.txt    w174_sequence_1_pileup.txt    w175_sequence_1_pileup.txt    w176_sequence_1_pileup.txt    w177_sequence_1_pileup.txt
gi|393925858|gb|AGTA02071966.1|    0000000739    .    G    A    121.20    PASS    NS=74:AN=2:DP=8448    GT:DP:GQ:EC:SG    0/1:262:144:116:R    1:32:93:32:A    0/1:87:42:72:R    .:0:0:0:.    .:0:0:0:.    0/1:222:167:113:R    0/1:93:128:55:R    1:77:186:77:A    0/1:207:144:124:R    1:56:42:52:A    0/1:310:104:203:R    0/1:84:29:17:R    1:153:225:153:A    1:57:149:56:A    0/1:81:127:44:R    0/1:425:110:162:R    0/1:71:117:29:R    .:0:0:0:.    0/1:66:75:53:R    0/1:130:28:103:R    1:101:193:100:A    0:32:123:0:G    1:68:180:68:A    0/1:76:0:66:A    1:30:87:30:A    0/1:72:95:54:R    .:0:0:0:.    1:28:81:28:A    1:40:117:40:A    1:15:42:15:A    1:30:87:30:A    0/1:98:129:53:R    0/1:59:131:36:R    1:93:147:90:A    1:82:189:82:A    0/1:62:28:53:R    1:121:216:121:A    1:136:225:136:A    1:131:225:131:A    0/1:79:37:66:R    0/1:82:119:34:R    0/1:105:98:75:R    1:67:179:67:A    0/1:223:160:116:R    0/1:125:126:81:R    1:147:122:136:A    0/1:30:53:25:R    0/1:176:97:151:A    0/1:167:112:109:R    0/1:145:13:119:A    0/1:76:130:38:R    1:104:206:104:A    0/1:172:129:109:R    1:104:199:104:A    1:45:132:45:A    1:35:102:35:A    1:109:211:109:A    1:53:157:53:A    1:118:220:118:A    0/1:265:166:133:R    1:67:179:67:A    0/1:65:103:48:R    0/1:130:24:109:A    0/1:285:101:195:R    0/1:208:19:162:A    0/1:295:126:189:R    0/1:288:48:221:A    .:0:0:0:.    1:141:225:141:A    0/1:166:141:99:R    0/1:213:115:137:R    1:132:225:132:A    1:126:225:126:A    1:21:60:21:A    0/1:24:123:15:R    0/1:120:129:59:R    .:9:24:0:A    1:10:27:10:A    .:0:0:0:.    1:19:54:19:A    0:15:72:0:G
gi|393925858|gb|AGTA02071966.1|    0000000781    .    G    A    120.61    PASS    NS=74:AN=2:DP=8484    GT:DP:GQ:EC:SG    0/1:264:49:148:R    0:32:123:0:G    0/1:86:105:14:G    .:0:0:0:.    .:0:0:0:.    0:222:255:0:G    0/1:93:3:38:R    0:78:255:0:G    0/1:209:4:84:G    0:56:128:0:G    0/1:313:23:108:G    0/1:85:31:68:G    0:153:255:0:G    0:57:199:0:G    0/1:82:6:38:R    0/1:426:7:263:R    0/1:71:25:42:R    .:0:0:0:.    0/1:66:63:13:G    0/1:131:110:27:G    0:101:255:0:G    1:33:58:33:A    0:69:235:0:G    0/1:76:84:11:G    0:30:117:0:G    0/1:72:33:18:G    .:0:0:0:.    0:28:111:0:G    0:41:150:0:G    0:15:72:0:G    0:30:117:0:G    0/1:98:5:45:R    0/1:59:4:23:R    0:93:253:0:G    0:84:255:0:G    0/1:62:86:9:G    0:122:255:0:G    0:136:255:0:G    0:131:255:0:G    0/1:80:97:13:G    0/1:84:29:48:R    0/1:105:42:30:G    0:66:226:0:G    0/1:224:37:107:R    0/1:126:10:46:G    0:147:255:0:G    0/1:30:45:5:G    0/1:178:239:25:G    0/1:167:33:58:G    0/1:146:160:26:G    0/1:76:14:38:R    0:106:255:0:G    0/1:172:12:63:G    0:104:255:0:G    0:45:162:0:G    0:35:132:0:G    0:110:255:0:G    0:53:187:0:G    0:118:255:0:G    0/1:262:24:131:R    0:67:229:0:G    0/1:66:24:17:G    0/1:130:161:20:G    0/1:286:82:89:G    0/1:210:159:46:G    0/1:296:39:107:G    0/1:288:154:68:G    .:0:0:0:.    0:143:255:0:G    0/1:168:7:68:R    0/1:214:29:77:G    0:132:255:0:G    0:126:255:0:G    0:21:90:0:G    0/1:24:10:9:R    0/1:122:28:63:R    .:9:54:0:G    0:10:57:0:G    .:0:0:0:.    0:19:84:0:G    1:15:42:15:A
vcf tabix • 6.9k views

Show us lines 18, 19 and 20.

Thanks. I did not have a good way to display these lines. I am trying to use head -19 out.vcf, but the display is really messy.
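
One way to get a readable view of just those lines is to print the requested range and keep only the first few columns. This is a minimal sketch, assuming the file is tab-delimited; `vcf_peek` is a made-up helper name, not part of tabix or vcftools.

```shell
# Hypothetical helper: print lines FIRST..LAST of a VCF, keeping only the
# first five tab-separated columns (CHROM, POS, ID, REF, ALT) so the long
# per-sample genotype columns do not flood the terminal.
vcf_peek() {
    # $1 = file, $2 = first line number, $3 = last line number
    sed -n "${2},${3}p" "$1" | cut -f1-5
}

# Usage for the file in this thread:
#   vcf_peek out.vcf 18 20
```

Header lines that contain no tab are printed whole by `cut`, so they remain visible if the range includes them.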

My bet is that the leading zeros on your positions are screwing up vcf-sort. Try "sort -k1,1 -k2,2n your.vcf > your.sorted.vcf"

Thank you, that still did not do the trick. It now gives a newer error message, "[ti_index_core] the file out of order at line 13". I really appreciate your input, though.
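
A plain `sort` over the whole file treats the `#` header lines like records and can move them around, which may be why the reported line number changed (an assumption; the thread does not confirm the cause). A minimal sketch of a sort that leaves the header alone, assuming an uncompressed, tab-delimited VCF; `vcf_sort_keep_header` is a hypothetical name:

```shell
# Hypothetical helper: sort a VCF by CHROM, then numerically by POS, while
# keeping every header line (those starting with '#') at the top in its
# original order.
vcf_sort_keep_header() {
    # $1 = input VCF (uncompressed, tab-delimited)
    grep '^#' "$1"
    # LC_ALL=C gives a plain byte-wise ordering of contig names;
    # -k2,2n compares POS as a number, so zero padding does not matter here.
    grep -v '^#' "$1" | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n
}

# Usage:
#   vcf_sort_keep_header your.vcf > your.sorted.vcf
```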

This "should" only strip the leading zeros off your positions. Give it a shot? It is worth testing on a couple hundred lines first.

perl -lane '$_ =~ s/^0+// ; print $_' your.vcf > stripped.leading.zeros.vcf

Thank you so much! Here is what I did:

perl -lane '$_ =~ s/^0+// ; print $_' z.vcf > stripped.leading.zeros.vcf
head -n 20 stripped.leading.zeros.vcf

It seems that the zeros are still there.

##fileformat=VCFv4.1
##fileDate=2014-02-26 22:00:03
##source=VCF_popgen.pl
##reference=file:Genome/AGTA02_WGS.fasta
##contig=N/A
##phasing=none
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Number of Alleles in Population">
##FILTER=<ID=q10,Description="Quality below 10">
##FILTER=<ID=s50,Description="Less than 50% of samples have data">
##FORMAT=<ID=GT,Number=1,Type=String,Description="FGX Consensus Genotype (threshold model)">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Sample Read Depth">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality (SAMtools bayesian framework)">
##FORMAT=<ID=EC,Number=.,Type=String,Description="Alternate Allele Counts in Sample">
##FORMAT=<ID=SG,Number=.,Type=String,Description="SAMtools Consensus Genotype (diploid model)">
#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    100_sequence_1_pileup.txt    105_sequence_1_pileup.txt    10_sequence_1_pileup.txt    110_sequence_1_pileup.txt    112_sequence_1_pileup.txt    114_sequence_1_pileup.txt    118_sequence_1_pileup.txt    120_sequence_1_pileup.txt    122_sequence_1_pileup.txt    126_sequence_1_pileup.txt    130_sequence_1_pileup.txt    13_sequence_1_pileup.txt    147_sequence_1_pileup.txt    153_sequence_1_pileup.txt    154_sequence_1_pileup.txt    158_sequence_1_pileup.txt    15_sequence_1_pileup.txt    164_sequence_1_pileup.txt    168_sequence_1_pileup.txt    16_sequence_1_pileup.txt    171_sequence_1_pileup.txt    174_sequence_1_pileup.txt    179_sequence_1_pileup.txt    183_sequence_1_pileup.txt    188_sequence_1_pileup.txt    198_sequence_1_pileup.txt    1_sequence_1_pileup.txt    202_sequence_1_pileup.txt    203_sequence_1_pileup.txt    206_sequence_1_pileup.txt    208_sequence_1_pileup.txt    212_sequence_1_pileup.txt    214_sequence_1_pileup.txt    216_sequence_1_pileup.txt    218_sequence_1_pileup.txt    219_sequence_1_pileup.txt    21_sequence_1_pileup.txt    220_sequence_1_pileup.txt    22_sequence_1_pileup.txt    30_sequence_1_pileup.txt    32_sequence_1_pileup.txt    37_sequence_1_pileup.txt    38_sequence_1_pileup.txt    3_sequence_1_pileup.txt    44_sequence_1_pileup.txt    45_sequence_1_pileup.txt    49_sequence_1_pileup.txt    4_sequence_1_pileup.txt    51_sequence_1_pileup.txt    53_sequence_1_pileup.txt    57_sequence_1_pileup.txt    61_sequence_1_pileup.txt    66_sequence_1_pileup.txt    67_sequence_1_pileup.txt    69_sequence_1_pileup.txt    74_sequence_1_pileup.txt    86_sequence_1_pileup.txt    87_sequence_1_pileup.txt    90_sequence_1_pileup.txt    93_sequence_1_pileup.txt    95_sequence_1_pileup.txt    98_sequence_1_pileup.txt    9_sequence_1_pileup.txt    dam01_sequence_1_pileup.txt    dam02_sequence_1_pileup.txt    dam03_sequence_1_pileup.txt    dam04_sequence_1_pileup.txt    
FGXCONTROL_sequence_1_pileup.txt    mbsire_sequence_1_pileup.txt    mzdam_sequence_1_pileup.txt    sire01_sequence_1_pileup.txt    sire02_sequence_1_pileup.txt    sire03_sequence_1_pileup.txt    w118_sequence_1_pileup.txt    w119_sequence_1_pileup.txt    w120_sequence_1_pileup.txt    w121_sequence_1_pileup.txt    w174_sequence_1_pileup.txt    w175_sequence_1_pileup.txt    w176_sequence_1_pileup.txt    w177_sequence_1_pileup.txt
gi|393925858|gb|AGTA02071966.1|    0000000739    .    G    A    121.20    PASS    NS=74:AN=2:DP=8448    GT:DP:GQ:EC:SG    0/1:262:144:116:R    1:32:93:32:A    0/1:87:42:72:R    .:0:0:0:.    .:0:0:0:.    0/1:222:167:113:R    0/1:93:128:55:R    1:77:186:77:A    0/1:207:144:124:R    1:56:42:52:A    0/1:310:104:203:R    0/1:84:29:17:R    1:153:225:153:A    1:57:149:56:A    0/1:81:127:44:R    0/1:425:110:162:R    0/1:71:117:29:R    .:0:0:0:.    0/1:66:75:53:R    0/1:130:28:103:R    1:101:193:100:A    0:32:123:0:G    1:68:180:68:A    0/1:76:0:66:A    1:30:87:30:A    0/1:72:95:54:R    .:0:0:0:.    1:28:81:28:A    1:40:117:40:A    1:15:42:15:A    1:30:87:30:A    0/1:98:129:53:R    0/1:59:131:36:R    1:93:147:90:A    1:82:189:82:A    0/1:62:28:53:R    1:121:216:121:A    1:136:225:136:A    1:131:225:131:A    0/1:79:37:66:R    0/1:82:119:34:R    0/1:105:98:75:R    1:67:179:67:A    0/1:223:160:116:R    0/1:125:126:81:R    1:147:122:136:A    0/1:30:53:25:R    0/1:176:97:151:A    0/1:167:112:109:R    0/1:145:13:119:A    0/1:76:130:38:R    1:104:206:104:A    0/1:172:129:109:R    1:104:199:104:A    1:45:132:45:A    1:35:102:35:A    1:109:211:109:A    1:53:157:53:A    1:118:220:118:A    0/1:265:166:133:R    1:67:179:67:A    0/1:65:103:48:R    0/1:130:24:109:A    0/1:285:101:195:R    0/1:208:19:162:A    0/1:295:126:189:R    0/1:288:48:221:A    .:0:0:0:.    1:141:225:141:A    0/1:166:141:99:R    0/1:213:115:137:R    1:132:225:132:A    1:126:225:126:A    1:21:60:21:A    0/1:24:123:15:R    0/1:120:129:59:R    .:9:24:0:A    1:10:27:10:A    .:0:0:0:.    1:19:54:19:A    0:15:72:0:G
gi|393925858|gb|AGTA02071966.1|    0000000781    .    G    A    120.61    PASS    NS=74:AN=2:DP=8484    GT:DP:GQ:EC:SG    0/1:264:49:148:R    0:32:123:0:G    0/1:86:105:14:G    .:0:0:0:.    .:0:0:0:.    0:222:255:0:G    0/1:93:3:38:R    0:78:255:0:G    0/1:209:4:84:G    0:56:128:0:G    0/1:313:23:108:G    0/1:85:31:68:G    0:153:255:0:G    0:57:199:0:G    0/1:82:6:38:R    0/1:426:7:263:R    0/1:71:25:42:R    .:0:0:0:.    0/1:66:63:13:G    0/1:131:110:27:G    0:101:255:0:G    1:33:58:33:A    0:69:235:0:G    0/1:76:84:11:G    0:30:117:0:G    0/1:72:33:18:G    .:0:0:0:.    0:28:111:0:G    0:41:150:0:G    0:15:72:0:G    0:30:117:0:G    0/1:98:5:45:R    0/1:59:4:23:R    0:93:253:0:G    0:84:255:0:G    0/1:62:86:9:G    0:122:255:0:G    0:136:255:0:G    0:131:255:0:G    0/1:80:97:13:G    0/1:84:29:48:R    0/1:105:42:30:G    0:66:226:0:G    0/1:224:37:107:R    0/1:126:10:46:G    0:147:255:0:G    0/1:30:45:5:G    0/1:178:239:25:G    0/1:167:33:58:G    0/1:146:160:26:G    0/1:76:14:38:R    0:106:255:0:G    0/1:172:12:63:G    0:104:255:0:G    0:45:162:0:G    0:35:132:0:G    0:110:255:0:G    0:53:187:0:G    0:118:255:0:G    0/1:262:24:131:R    0:67:229:0:G    0/1:66:24:17:G    0/1:130:161:20:G    0/1:286:82:89:G    0/1:210:159:46:G    0/1:296:39:107:G    0/1:288:154:68:G    .:0:0:0:.    0:143:255:0:G    0/1:168:7:68:R    0/1:214:29:77:G    0:132:255:0:G    0:126:255:0:G    0:21:90:0:G    0/1:24:10:9:R    0/1:122:28:63:R    .:9:54:0:G    0:10:57:0:G    .:0:0:0:.    0:19:84:0:G    1:15:42:15:A
gi|393925983|gb|AGTA02071903.1|    0000000957    .
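
The zeros survive because `s/^0+//` is anchored to the start of the whole line, and every data line starts with the contig name rather than the position. A small illustration of the difference, using `sed` instead of perl purely for the demonstration; the sample record is abbreviated from the output above and assumed to be tab-separated:

```shell
# A sample record: contig name, then a zero-padded POS (tab-separated).
record="$(printf 'gi|393925858|gb|AGTA02071966.1|\t0000000739\t.\tG\tA')"

# s/^0+// only matches zeros at the very start of the line, and the line
# starts with "gi|...", so nothing is changed.
whole_line="$(printf '%s\n' "$record" | sed -E 's/^0+//')"

# Applying the same substitution to column 2 alone removes the padding.
pos_only="$(printf '%s\n' "$record" | cut -f2 | sed -E 's/^0+//')"

printf 'whole line: %s\n' "$whole_line"
printf 'POS only:   %s\n' "$pos_only"
```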

OK, I copied your example and fixed my code:

perl -lane 'if($_ =~ /^#/){print; next}else{$F[1] =~ s/^0+//; print join "\t", @F}' your.vcf > your.zero.stripped.vcf

This indeed WORKED! Thanks, man!

10.2 years ago
mad.cichlids ▴ 140

As suggested by Zev.Kronenberg, the problem was caused by the leading zeros in the position column. Zev's one-liner fixed it; I am including it as an answer in case somebody else runs into the same problem.

perl -lane 'if($_ =~ /^#/){print; next}else{$F[1] =~ s/^0+//; print join "\t", @F}' your.vcf > your.zero.stripped.vcf
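
For anyone who prefers awk, the same idea can be sketched as below, together with a quick ordering check to run before indexing. Both function names are hypothetical, the input is assumed to be tab-delimited, and the check only imitates the kind of test an indexer applies; it is not tabix's own code. Unlike the perl one-liner, this version splits on tabs only and keeps a single `0` if POS consists entirely of zeros.

```shell
# Hypothetical helper: strip leading zeros from POS (column 2) of data lines,
# passing header lines through untouched.
vcf_strip_pos_zeros() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/  { print; next }            # header: print unchanged
               { sub(/^0+/, "", $2)       # drop zero padding from POS
                 if ($2 == "") $2 = "0"   # POS was all zeros: keep one
                 print }' "$1"
}

# Hypothetical helper: print the number of the first data line whose POS is
# smaller than the previous POS on the same contig; print nothing if the
# records are in order within each contig.
vcf_first_unsorted() {
    awk 'BEGIN { FS = "\t" }
         /^#/ { next }
         $1 == chrom && $2 + 0 < pos { print NR; exit }
         { chrom = $1; pos = $2 + 0 }' "$1"
}

# Usage:
#   vcf_strip_pos_zeros your.vcf > your.zero.stripped.vcf
#   vcf_first_unsorted your.zero.stripped.vcf
```

If the second command prints nothing, the file can then go through `bgzip` and `tabix -p vcf` as in the question.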