4.4 years ago
User000
I have 200 txt files, each with one read name per line. The command line below finds the shared reads (those that occur in more than one file):
cat *.mapped.txt | sort | uniq -d > intersection.out
How can I find the reads that are unique among these 200 files?
My files are called:
accepted.name.mapped.txt
...
The reads are like this:
HISEQ1:105:C0A57ACXX:2:1105:12172:84568
HISEQ1:105:C0A57ACXX:2:1108:17762:41110
HISEQ1:105:C0A57ACXX:2:1204:3007:9349
HISEQ1:105:C0A57ACXX:2:1204:11087:160507
HISEQ1:105:C0A57ACXX:2:1301:18982:79651
HISEQ1:105:C0A57ACXX:2:1307:3766:23853
Yes, I was thinking about this, but I wasn't sure... Will it be very slow on my 200 files of 800 MB each?
This should be faster, provided each file is already sorted (sort -m only merges its inputs, it does not sort them):
sort -m *.mapped.txt | uniq -u > uniques.txt
Maybe this one works as well...
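For anyone trying the sort -m approach above, here is a minimal sketch on toy data. The file names accepted.s1.mapped.txt and accepted.s2.mapped.txt and the read names readA to readD are made up for illustration, and the sketch assumes a read name occurs at most once within any single file. Each file is sorted first, because sort -m expects sorted input; LC_ALL=C is set so that sort and uniq compare lines byte by byte, which is usually faster.

```shell
set -eu

# Toy data in a scratch directory (hypothetical file names that follow
# the *.mapped.txt pattern from the question).
workdir=$(mktemp -d)
cd "$workdir"
printf 'readB\nreadA\nreadC\n' > accepted.s1.mapped.txt
printf 'readC\nreadD\nreadA\n' > accepted.s2.mapped.txt

# Byte-wise collation: consistent ordering for sort and uniq, and usually faster.
export LC_ALL=C

# sort -m only merges, so sort every file in place first.
for f in *.mapped.txt; do
    sort -o "$f" "$f"
done

# Merge the sorted files and keep the lines that occur exactly once overall.
sort -m *.mapped.txt | uniq -u > uniques.txt

cat uniques.txt
```

On the toy data this leaves readB and readD in uniques.txt. If a read could be listed twice in the same file, sorting each file with sort -u instead of plain sort would collapse those repeats before the merge.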