Unique lines among 200 files
4.4 years ago
User000 ▴ 690

I have 200 txt files, each with one read name per line. The command line below finds the intersection.

cat *.mapped.txt | sort | uniq -d > intersection.out
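
A note on what this pipeline reports: uniq -d prints every line that occurs more than once in the sorted stream, so a read shared by only two of the 200 files is also listed. The sketch below uses three made-up toy files (the s1/s2/s3 names and readA..readD are hypothetical) to contrast that with counting occurrences, and it assumes each read appears at most once per file.

```shell
# Toy data: three hypothetical files, one read name per line.
tmp=$(mktemp -d)
printf 'readA\nreadB\nreadC\n' > "$tmp/s1.mapped.txt"
printf 'readA\nreadB\n'        > "$tmp/s2.mapped.txt"
printf 'readA\nreadD\n'        > "$tmp/s3.mapped.txt"

# uniq -d: reads that occur in at least two files.
dups=$(cat "$tmp"/*.mapped.txt | LC_ALL=C sort | uniq -d)

# Reads present in every file: occurrence count equals the number of files.
# Assumes no read is repeated within a single file.
set -- "$tmp"/*.mapped.txt
n=$#
in_all=$(cat "$@" | LC_ALL=C sort | uniq -c | awk -v n="$n" '$1 == n {print $2}')

echo "in at least two files:"; echo "$dups"
echo "in all $n files:";       echo "$in_all"
```

On the toy data, readB shows up in the uniq -d output although it is missing from the third file; only readA is in all three.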

How can I find the reads that are unique among these 200 files?

My files are called:

accepted.name.mapped.txt
...

The reads are like this:

HISEQ1:105:C0A57ACXX:2:1105:12172:84568
HISEQ1:105:C0A57ACXX:2:1108:17762:41110
HISEQ1:105:C0A57ACXX:2:1204:3007:9349
HISEQ1:105:C0A57ACXX:2:1204:11087:160507
HISEQ1:105:C0A57ACXX:2:1301:18982:79651
HISEQ1:105:C0A57ACXX:2:1307:3766:23853
bash NGS • 582 views
4.4 years ago
cat *.mapped.txt | sort | uniq -u

??
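
A small sketch of what uniq -u returns, using made-up toy files (file names and read names are hypothetical). uniq -u keeps only lines that occur exactly once in the whole sorted stream, so a read repeated inside a single file is dropped too. If that can happen in your data, deduplicating each file first, as in the second variant, is one way around it.

```shell
# Toy data: readD is repeated within one file only.
tmp=$(mktemp -d)
printf 'readA\nreadB\n' > "$tmp/s1.mapped.txt"
printf 'readA\nreadC\n' > "$tmp/s2.mapped.txt"
printf 'readD\nreadD\n' > "$tmp/s3.mapped.txt"

# Reads that occur exactly once across all files.
uniques=$(cat "$tmp"/*.mapped.txt | LC_ALL=C sort | uniq -u)

# Variant: collapse repeats inside each file first, so "unique" means
# "present in exactly one file".
one_file_only=$(for f in "$tmp"/*.mapped.txt; do
    LC_ALL=C sort -u "$f"
done | LC_ALL=C sort | uniq -u)

echo "exactly once overall:"; echo "$uniques"
echo "in exactly one file:";  echo "$one_file_only"
```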


Yes, I was thinking about this, but I wasn't sure. Will it be very slow on my 200 files of 800 MB each?


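This should be faster (LC_ALL=C compares raw bytes instead of using locale collation, -T . puts the temporary files in the current directory, and --buffer-size sets the in-memory sort buffer):

cat *.mapped.txt | LC_ALL=C sort -T . --buffer-size=5G | LC_ALL=C uniq -u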

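sort -m *.mapped.txt | uniq -u > uniques.txt

Maybe this one works as well, but note that sort -m only merges its inputs, so every file has to be sorted already (and in the same locale).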
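
If the files are not sorted yet, a minimal sketch of the merge approach could look like the following. The toy files and read names are hypothetical, and the in-place sort overwrites the originals, so this assumes you are fine with that or are working on copies.

```shell
# Toy data: two hypothetical unsorted files.
tmp=$(mktemp -d)
printf 'readC\nreadA\n' > "$tmp/s1.mapped.txt"
printf 'readB\nreadA\n' > "$tmp/s2.mapped.txt"

# Use the same byte-wise collation for sorting and merging.
export LC_ALL=C

# Sort each file in place; sort -o may safely name one of its inputs.
for f in "$tmp"/*.mapped.txt; do
    sort -o "$f" "$f"
done

# Merge the pre-sorted files and keep lines that occur exactly once.
sort -m "$tmp"/*.mapped.txt | uniq -u > "$tmp/uniques.txt"
cat "$tmp/uniques.txt"
```

Each file is sorted on its own, which needs far less memory and temporary space than sorting the concatenation of all 200 files at once.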
