Question

Remove all rows that begin with characters and keep only those that begin with numbers in Linux.


3.0 years ago

salman_96 ▴ 70

I have a tab-separated file that looks like this. I need to prepare it so that I can import it into R. It is an hg19 genome file.

1      344544     rs30540
2      284783     rs34560
14     384643     rs30567
19     584643     rs31110
Genome_phase,common=1,19,genomes=hg19
11    222643     rs30543
44    544643     rs32345
Genome_phase,common=1,23,genomes=hg19

I want to keep only the rows that start with a number and drop all rows that start with a letter. The file is huge (a few GB). Is there a way to do this in Linux?

Any assistance will be appreciated. Regards

hg19 linux • 817 views


score 1 · Answer 1 · 2021-05-02


3.0 years ago

Alex Reynolds 35k

$ awk -v FS="\t" -v OFS="\t" '($1 ~ /^[0-9]/)' in.txt > out.txt
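The pattern above keeps any line whose first tab-separated field starts with a digit. If some unwanted lines could also start with a digit (the `1000G_phase` line below is a made-up example, not from the question), a stricter variant is to require the whole first field to be digits. A minimal sketch, assuming hypothetical file names `sample.txt` and `filtered.txt`:

```shell
# Build a small tab-separated sample (hypothetical file name) mirroring the question's layout.
printf '1\t344544\trs30540\n' > sample.txt
printf 'Genome_phase,common=1,19,genomes=hg19\n' >> sample.txt
printf '11\t222643\trs30543\n' >> sample.txt
# Made-up line that starts with a digit but is not a data row.
printf '1000G_phase,common=1\n' >> sample.txt

# Keep only rows whose entire first field consists of digits.
# A pattern with no action prints the matching line unchanged.
awk -F'\t' '$1 ~ /^[0-9]+$/' sample.txt > filtered.txt
cat filtered.txt
```

Since awk prints the original line untouched when no field is modified, setting `OFS` is not strictly needed for this kind of filter.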


score 1 · Answer 2 · 2021-05-03


3.0 years ago

cpad0112 21k

$ sed -n '/^[0-9]/p' test.txt
$ grep '^[0-9]' test.txt
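Both commands print to the terminal, so for a file of a few GB you would probably want to redirect the result to a new file. Forcing the C locale may also speed up grep on large ASCII input, though the gain depends on the grep version and system. A rough sketch, with `test_sample.txt` and `numeric_rows.txt` as hypothetical file names:

```shell
# Small stand-in for the real file (hypothetical name), tab-separated like the question's data.
printf '1\t344544\trs30540\nGenome_phase,common=1,19,genomes=hg19\n44\t544643\trs32345\n' > test_sample.txt

# Keep lines that start with a digit and write them to a new file.
# LC_ALL=C makes grep treat the input as plain bytes.
LC_ALL=C grep '^[0-9]' test_sample.txt > numeric_rows.txt

# Optional sanity check: count the lines that were dropped.
LC_ALL=C grep -c -v '^[0-9]' test_sample.txt
```

The resulting file contains only the numeric rows and can then be imported into R as an ordinary tab-separated table.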
