##contig=<ID=Cla97Chr01,length=36935898,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Chr02,length=37915939,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Chr03,length=31872261,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Chr04,length=27110815,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Chr05,length=35887987,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Chr06,length=29507460,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Chr07,length=31939013,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Chr08,length=28201227,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Chr09,length=37727573,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Chr10,length=35099344,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Chr11,length=30886124,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf001,length=233319,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf002,length=184572,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf003,length=114230,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf004,length=101662,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf005,length=94675,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf006,length=84208,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf007,length=81764,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf008,length=72605,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf009,length=71887,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf010,length=71582,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf011,length=65584,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf012,length=60969,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf013,length=58660,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf014,length=54089,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf015,length=52299,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf016,length=46704,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf017,length=44379,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf018,length=42155,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf019,length=33490,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf020,length=31069,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf021,length=30825,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf022,length=28329,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf023,length=27981,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf024,length=25595,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf025,length=25293,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf026,length=24210,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf027,length=23408,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf028,length=21950,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf029,length=20859,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf030,length=20840,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf031,length=19595,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf032,length=19375,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf033,length=18359,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf034,length=17995,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf035,length=17722,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf036,length=17502,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf037,length=16522,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf038,length=14416,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf039,length=13492,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf040,length=12677,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf041,length=12562,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf042,length=12490,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf043,length=12451,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf044,length=12340,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf045,length=12169,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf046,length=11870,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf047,length=11623,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf048,length=11207,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf049,length=10862,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf050,length=10717,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf051,length=10458,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf052,length=9933,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf053,length=9867,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf054,length=9408,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf055,length=9274,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf056,length=8131,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf057,length=7996,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf058,length=7778,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf059,length=7696,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf060,length=7606,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf061,length=7597,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf062,length=7489,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf063,length=7365,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf064,length=7286,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf065,length=7025,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf066,length=6976,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf067,length=6921,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf068,length=6573,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf069,length=6212,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf070,length=5790,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf071,length=5405,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf072,length=5173,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf073,length=5121,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf074,length=4784,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf075,length=4023,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf076,length=3821,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf077,length=3610,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf078,length=3275,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf079,length=2787,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf080,length=2584,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf081,length=2456,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf082,length=2421,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf083,length=2406,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf084,length=2401,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf085,length=2127,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf086,length=2108,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf087,length=1913,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf088,length=1838,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf089,length=1798,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf090,length=1621,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf091,length=1603,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf092,length=1527,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf093,length=1513,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf094,length=1512,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf095,length=1462,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf096,length=890,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf097,length=807,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf098,length=802,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf099,length=787,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf100,length=623,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf101,length=600,assembly=97103_genome_v2.fa>
##contig=<ID=Cla97Scf102,length=504,assembly=97103_genome_v2.fa>
I'm researching plants, and I use SnpEff to annotate crops such as peppers and radishes. In plants, chromosome names are often arbitrary, and in many cases the lengths have to be extracted by hand.
What I want to do is obtain the chromosome lengths from the meta-information (header) lines of the annotated VCF.
I also tried a filtering approach (take column 1, `sort`, `uniq`, and then `grep -w`), but it frequently fails because contigs remain after the filter.
I've been trying vcftools and bcftools, but so far I haven't found an approach that does what I want.
In the example header above, Cla97Chr is the chromosome information I want, and Cla97Scf is a contig.
I want to extract only the entries that contain Chr, without removing the others one by one.
Please advise on how I can select just the entries I want.
Thank you in advance.
Not sure if this is what you are looking for. The following one-liner will extract chromosome names and their lengths.
If it is, I can move this comment to an answer.
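A minimal sketch of such a one-liner, wrapped in a function for reuse. It assumes the header lines follow the `##contig=<ID=...,length=...>` layout shown above, with `ID` first and `length` second; the function name `contig_lengths` and the file name `input.vcf` are made up for illustration.

```shell
# Print "name<TAB>length" for every ##contig header line of a plain-text VCF.
# Assumption: each line looks like ##contig=<ID=NAME,length=N,...>,
# so splitting on '=', ',' and '>' puts NAME in field 3 and N in field 5.
contig_lengths() {
    awk -F'[=,>]' '
        /^##contig=<ID=/ { print $3 "\t" $5 }   # header line: emit name and length
        !/^#/            { exit }               # first variant record: header is over
    ' "$1"
}

# Usage (hypothetical file name):
#   contig_lengths input.vcf
```

This lists chromosomes and scaffolds alike, since every `##contig` line is printed.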
Thank you very much for the reply. However, I wanted to extract only the chromosomes, and the goal is to extract them from VCFs other than the example file as well. Again, thanks for the reply.
If you only need the entries that have `Chr` in them, then you can use the following. This solution should work with any plain-text VCF file.
Your code is not a perfect answer, but I think it will be of great help to the scripts I'm making. Thanks again for your advice.
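As a minimal sketch of the `Chr`-only filter described in the answer: it assumes the same `##contig=<ID=...,length=...>` header layout as the example, and that chromosome IDs contain a recognizable substring such as `Chr`. The function name `chr_lengths`, its optional pattern argument, and the file name `input.vcf` are hypothetical.

```shell
# Print "name<TAB>length" only for ##contig lines whose ID matches a pattern.
# $1 = plain-text VCF, $2 = regex matched against the contig ID (default: Chr).
# Assumption: each line looks like ##contig=<ID=NAME,length=N,...>.
chr_lengths() {
    awk -F'[=,>]' -v pat="${2:-Chr}" '
        /^##contig=<ID=/ && $3 ~ pat { print $3 "\t" $5 }   # keep matching IDs only
        !/^#/                        { exit }               # stop once the header ends
    ' "$1"
}

# Usage (hypothetical file name):
#   chr_lengths input.vcf          # IDs containing "Chr"
#   chr_lengths input.vcf '^chr'   # another naming scheme
```

Because naming differs between assemblies, the pattern is passed in rather than hard-coded; scaffold entries such as `Cla97Scf001` are skipped because their IDs do not match.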