How can I count the occurrences of each chromosome in the human annotation GTF file using Linux commands?
10.0 years ago
Y Tb ▴ 230

I am checking the human GTF annotation file and found that the first column, which holds the chromosome names, uses just 1, 2, ..., X, Y without the chr prefix. Since the file is big, I want to know how I can count the occurrences of each chromosome (I mean how many lines for 1, how many for 2, and so on). Is there a Linux command to do that? BTW, I used the command

cut -f 1 .gtf_file > test

to cut the first column and save it in a file called test.

RNA-Seq • 5.8k views

Look at the sort and uniq commands. You may also find grep useful, especially the -v option, which you can use to remove header and comment lines.
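As a rough sketch of how those three commands could fit together: the file name demo.gtf and its contents are made up for illustration, and it assumes header/comment lines start with "#" and that the columns are tab-separated.

```shell
# Build a tiny, made-up GTF (hypothetical demo.gtf) with one header line.
tmpdir=$(mktemp -d)
gtf="$tmpdir/demo.gtf"
printf '#!genome-build GRCh38\n' > "$gtf"
printf '1\thavana\tgene\t100\t900\t.\t+\t.\tgene_id "g1";\n' >> "$gtf"
printf '1\thavana\ttranscript\t100\t900\t.\t+\t.\tgene_id "g1";\n' >> "$gtf"
printf '2\thavana\tgene\t50\t400\t.\t-\t.\tgene_id "g2";\n' >> "$gtf"
printf 'X\thavana\tgene\t10\t300\t.\t+\t.\tgene_id "g3";\n' >> "$gtf"

# grep -v drops lines starting with "#", cut keeps column 1,
# sort groups identical names together, uniq -c counts each group.
counts=$(grep -v '^#' "$gtf" | cut -f 1 | sort | uniq -c)
printf '%s\n' "$counts"
```

The sort step matters because uniq only collapses adjacent identical lines.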

10.0 years ago

cut -f 1 foo.gtf | sort | uniq -c

I should note that this will tell you how many records there are in total for each chromosome or contig. If you instead wanted to know how many genes then it'd take a more involved script.
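For the gene case, one possible starting point is to filter on the feature type first. This is only a sketch: it assumes the file has rows whose third column is exactly "gene" (not every GTF does), and demo.gtf with its contents is invented for the example.

```shell
# Build a tiny, made-up GTF (hypothetical demo.gtf): 3 genes, 2 transcripts.
tmpdir=$(mktemp -d)
gtf="$tmpdir/demo.gtf"
printf '#!genome-build GRCh38\n' > "$gtf"
printf '1\thavana\tgene\t100\t900\t.\t+\t.\tgene_id "g1";\n' >> "$gtf"
printf '1\thavana\ttranscript\t100\t900\t.\t+\t.\tgene_id "g1";\n' >> "$gtf"
printf '1\thavana\ttranscript\t150\t900\t.\t+\t.\tgene_id "g1";\n' >> "$gtf"
printf '2\thavana\tgene\t50\t400\t.\t-\t.\tgene_id "g2";\n' >> "$gtf"
printf 'X\thavana\tgene\t10\t300\t.\t+\t.\tgene_id "g3";\n' >> "$gtf"

# Skip comment lines, keep only rows whose feature type (column 3) is "gene",
# print the chromosome name, then count as before.
gene_counts=$(awk -F'\t' '!/^#/ && $3 == "gene" {print $1}' "$gtf" | sort | uniq -c)
printf '%s\n' "$gene_counts"
```

If the file has no "gene" rows, you would instead have to collect unique gene_id values from the attribute column, which is where the more involved script comes in.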

10.0 years ago
Jason ▴ 920
awk '{print $1}' foo.gtf | sort | uniq -c

The first answer is correct, but I thought I'd share another one that uses awk instead of cut.
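awk can also do the counting itself with an associative array, so only the much shorter summary needs sorting. A minimal sketch, again on a made-up demo.gtf and assuming comment lines start with "#":

```shell
# Build a tiny, made-up GTF (hypothetical demo.gtf) with one header line.
tmpdir=$(mktemp -d)
gtf="$tmpdir/demo.gtf"
printf '#!genome-build GRCh38\n' > "$gtf"
printf '1\thavana\tgene\t100\t900\t.\t+\t.\tgene_id "g1";\n' >> "$gtf"
printf '1\thavana\ttranscript\t100\t900\t.\t+\t.\tgene_id "g1";\n' >> "$gtf"
printf '2\thavana\tgene\t50\t400\t.\t-\t.\tgene_id "g2";\n' >> "$gtf"
printf 'X\thavana\tgene\t10\t300\t.\t+\t.\tgene_id "g3";\n' >> "$gtf"

# n[$1]++ tallies each chromosome name in one pass over the file;
# the END block prints "count name" pairs, which are then sorted by name.
totals=$(awk -F'\t' '!/^#/ {n[$1]++} END {for (c in n) print n[c], c}' "$gtf" | sort -k2,2)
printf '%s\n' "$totals"
```

The order of a for-in loop in awk is unspecified, which is why the output is piped through sort at the end.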


+1. Awk is good to know.


+1 from me too. awk is one of those indispensable tools that everyone who does file processing should know at least a little about.
