Merging many files with the same columns using a bash script
4.2 years ago
mel22 ▴ 100

Hi, I would like to merge a large number of .txt files into one. They all have the same columns. How can I merge them and keep only the header from the first file? There are so many files that I think I will need to use xargs. Here are examples of what I have in my files:

CH        SNP   A1  A2  N   P
20        SNP2  A   G   2   3.70E-03
19        SNP3  C   T   2   6.47E-03




CH       SNP    A1  A2  N   P
10       SNP1   G   C   2   3.11E-05

In total I have about 100000 files to merge under a single header. What I want to have is:

    CH        SNP   A1  A2  N   P
    20        SNP2  A   G   2   3.70E-03
    19        SNP3  C   T   2   6.47E-03
    10      SNP1    G   C   2   3.11E-05

Thanks !


Not really a bioinfo question I would say.

Can you add why it might relate to bioinformatics anyway?

And how would you like to see those files merged? It's always helpful to add a small example of what you are trying to achieve.


OK, I updated my post. Thanks.

4.2 years ago
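# Start from a clean output file
rm -f out.txt
find dir1/dir2 -type f -name "*blabla.tsv" | while IFS= read -r F
do
  if [ -f out.txt ] ; then
      # Later files: skip the header (line 1) and append the rest
      tail -n +2 "$F" >> out.txt
  else
      # First file: copy it whole, header included
      cp "$F" out.txt
  fi
done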
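
With about 100000 files, starting one `tail` per file can be slow. A possible alternative, sketched here under some assumptions, is to write the header once and let `find -exec ... {} +` pass the files to `awk` in batches (much like xargs would). The function name `merge_with_one_header` and its three arguments are made up for this illustration. It assumes every file has exactly one header on line 1, that file names contain no newlines, and that the output file is not matched by the search itself.

```bash
# Hypothetical helper: merge_with_one_header DIR PATTERN OUTFILE
# Assumes each input file carries its header on line 1 only.
merge_with_one_header() {
  dir=$1
  pattern=$2
  out=$3

  # Pick any one matching file to supply the header.
  first=$(find "$dir" -type f -name "$pattern" | head -n 1)
  [ -n "$first" ] || return 1
  head -n 1 "$first" > "$out"

  # FNR restarts at 1 for every input file, so 'FNR > 1' drops
  # line 1 of each file, even when find splits the list into
  # several awk invocations.
  find "$dir" -type f -name "$pattern" -exec awk 'FNR > 1' {} + >> "$out"
}
```

For the layout in the answer above this could be called as `merge_with_one_header dir1/dir2 '*.txt' out.txt`, with `out.txt` kept outside `dir1/dir2` so it is not read back in as input.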

Thank you very much


I tried this solution, but I still get a header line between the merged files. My files are space-separated .txt files.


Ah, looking at your example: does ONE file contain the header line more than once?


I just tested it:

$  seq 1 10 | while read F ; do echo -e "CHR\n$F" > $F.tmp.txt; done 
$  rm -f out.txt && find . -type f -name "*.tmp.txt" | while IFS= read -r F; do   if [ -f out.txt ] ; then       tail -n +2 "$F" >> out.txt ;   else      cp "$F" out.txt;   fi; done
$ cat out.txt 
CHR
10
4
5
9
3
1
2
6
8
7

They are all outputs from PLINK, so they all have the same structure.


That doesn't answer my question.


I mean they all have exactly the same structure, and none of them contains the header line more than once.
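
If header lines still end up in the merged file, one way to sidestep the question of where they come from is to filter by content rather than by line position. The sketch below is only an illustration: `drop_repeated_headers` is a made-up name, and it assumes a header can be recognised by its first field (`CH` in the example files) and that no data row starts with that same value. Because awk splits fields on runs of whitespace and ignores leading blanks, differences in spacing between headers do not matter.

```bash
# Hypothetical helper: drop_repeated_headers MERGED_FILE
# Prints the file with every repeat of the header removed.
drop_repeated_headers() {
  awk '
    NR == 1 { key = $1; print; next }  # remember the first field of the header
    $1 != key                          # keep only lines that do not start with it
  ' "$1"
}
```

For example, `drop_repeated_headers out.txt > out.clean.txt` would write a cleaned copy and leave `out.txt` untouched.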
