Script for extracting atomic positions of nucleotide bases
9.1 years ago
vahapel ▴ 210

Hi everyone.

I have a tab-delimited file (sample below) containing information about the atomic positions of nucleotide bases. How can I extract the first 10 lines of every 20,000 lines from a file that has 10^7 lines? Is there a script for this purpose?

BaseAtomNumber        atomic distances    NumberofNeighbour    IndexofAtom
1                     1.94895             655                  153   
1                     2.34545             566                  543
..
..

Many thanks in advance for your help!

next-gen Assembly genome • 1.7k views
9.1 years ago
george.ry ★ 1.2k

Assuming your files have a single line header that needs stripping first, as shown, then something like:

tail -n+2 <yourfile> \
| split -l 20000 - <yourprefix> \
&& find <yourprefix>* -exec head -n10 {} \; \
> <youroutfile> \
&& rm <yourprefix>*

This strips the header, splits the rest into separate files of 20,000 lines each, writes the first 10 lines of each piece to an output file, and then deletes the intermediate files (make sure nothing else matches <yourprefix>*, or it will be deleted too).
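
If writing intermediate files is a concern, the same selection can probably be done in a single pass with awk. The sketch below assumes the same single-line header; the function name first_of_each_block and its optional block-size and keep-count arguments are made up for illustration and are not part of the command above.

```shell
# Hypothetical helper: print the first `keep` data lines of every `block`
# data lines, skipping a one-line header.
# $1 = input file, $2 = block size (default 20000), $3 = lines kept (default 10)
first_of_each_block() {
    # NR == 1 is the header; data line k is at NR = k + 1, so (NR - 2) is its
    # zero-based index, and the modulo gives its position within the block.
    awk -v block="${2:-20000}" -v keep="${3:-10}" \
        'NR > 1 && (NR - 2) % block < keep' "$1"
}
```

Usage would look like `first_of_each_block <yourfile> > <youroutfile>`. For 10^7 data lines this should yield 500 blocks of 10 lines each, i.e. 5,000 output lines, without creating or deleting any temporary files.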

We tried this and it works well, thanks for your help.
