Question: How can I count how many times each chromosome appears in the human GTF annotation file using Linux commands?
Y Tb wrote:

I am checking the human GTF annotation file and noticed that the first column, which holds the chromosome names, contains just 1, 2, ..., X, Y without the "chr" prefix. Since the file is large, I want to count the lines for each chromosome (how many are 1, how many are 2, and so on). Is there a Linux command to do that? So far I have used the command

   cut -f 1 .gtf_file > test

to extract the first column and save it to a file named test.


Look at the sort and uniq commands. You may also find grep useful, especially its -v option, which you can use to remove header and comment lines.

(comment by Giovanni M Dall'Olio)
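Following the suggestion to combine grep -v with sort and uniq, here is a minimal sketch wrapped in a shell function. It assumes that header and comment lines start with "#" (Ensembl GTFs, for example, begin with lines such as "#!genome-build ..."). Without the filter, cut passes such lines through whole, because they contain no tab, and they show up as spurious entries in the counts. The function name count_records_per_chrom is only an illustrative choice.

```shell
# Count GTF records per chromosome/contig, ignoring lines that start with "#".
# Assumptions: header/comment lines begin with "#"; column 1 is the chromosome name.
count_records_per_chrom() {
    grep -v '^#' "$1" |   # drop header and comment lines
        cut -f 1 |        # keep only the first (chromosome) column
        sort |            # bring identical names next to each other
        uniq -c           # prefix each distinct name with its count
}

# Example call (the file name is a placeholder):
#   count_records_per_chrom Homo_sapiens.gtf
```

Each output line has the form "count name", with the count right-aligned by uniq.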
Devon Ryan wrote:

cut -f 1 foo.gtf | sort | uniq -c

Note that this tells you the total number of records (lines) for each chromosome or contig. If you instead wanted the number of genes on each chromosome, it would take a more involved script.
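For the gene-count case, one possible approach (a hedged sketch, not the answerer's own script) is to count distinct gene_id values per chromosome with awk. It assumes the file is tab-delimited, that header lines start with "#", and that column 9 contains an attribute of the form gene_id "..." as in GTF 2.2. Counting distinct IDs, rather than rows whose feature type is "gene", also works for GTFs that list only transcript, exon, or CDS rows. The function name count_genes_per_chrom is hypothetical.

```shell
# Count distinct genes per chromosome in a GTF file.
# Assumptions: tab-delimited input, header/comment lines start with "#",
# and column 9 contains an attribute of the form: gene_id "SOME_ID";
count_genes_per_chrom() {
    awk -F'\t' '
        /^#/ { next }                                  # skip headers/comments
        match($9, /gene_id "[^"]*"/) {
            id = substr($9, RSTART + 9, RLENGTH - 10)  # text between the quotes
            if (!seen[$1 SUBSEP id]++) genes[$1]++     # count each gene once per chromosome
        }
        END { for (chrom in genes) printf "%s\t%d\n", chrom, genes[chrom] }
    ' "$1" | sort -k1,1                                # awk for-in order is unspecified
}

# Example call (the file name is a placeholder):
#   count_genes_per_chrom Homo_sapiens.gtf
```

The output is tab-separated, with the chromosome first and the gene count second. This is the reverse of the order uniq -c uses.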

Jason wrote:

awk '{print $1}' foo.gtf | sort | uniq -c

 

The first answer is correct, but I thought I'd share another approach that uses awk instead.
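As a variation on the awk answer, awk can also do the tallying itself with an associative array. In that case uniq is not needed, and sort is used only to order the output. This is a sketch, not the answerer's code. It sets the field separator to a tab explicitly, because GTF is tab-delimited and awk's default splits on any run of whitespace. It also skips lines starting with "#". The function name count_records_awk is a hypothetical placeholder.

```shell
# Tally GTF records per chromosome entirely in awk.
# Assumptions: tab-delimited input; header/comment lines start with "#".
count_records_awk() {
    awk -F'\t' '
        /^#/ { next }      # skip headers/comments
        { n[$1]++ }        # one more record for this chromosome/contig
        END { for (chrom in n) printf "%s\t%d\n", chrom, n[chrom] }
    ' "$1" | sort -k1,1    # for-in order is unspecified, so sort the result
}

# Example call (the file name is a placeholder):
#   count_records_awk Homo_sapiens.gtf
```

With GNU sort, you can change sort -k1,1 to sort -k1,1V (version sort) to list numeric chromosome names in natural order (1, 2, ..., 10) instead of lexical order (1, 10, 11, ...).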


+1. Awk is good to know.

(comment by Alex Reynolds)

+1 from me too. awk is one of those indispensable tools that everyone who does file processing should know at least a little about.

(comment by Devon Ryan)