Question: How to generate a combined read count txt file with file names as column headers
Bioinfonext (Korea) wrote, 14 days ago:

I have multiple txt files of RNA-seq read counts. Is it possible to generate a single txt file with the file names as column headers?

Each txt file has read counts like this; the first column is the same in all files:

BGIOSGA000001   0
BGIOSGA000002   12
BGIOSGA000003   0
BGIOSGA000004   0
BGIOSGA000005   0
BGIOSGA000006   0
BGIOSGA000007   0
BGIOSGA000008   15

and the txt file names are like this:

Root_T3_S_R7_S56_L001.COUNT.txt
Leaf_T2_F_R5_S8_L001.COUNT.txt

so I want output like this:

                  Root_T3_S_R7_S56       Leaf_T2_F_R5_S8

BGIOSGA000001         0                           4
BGIOSGA000002        12                           0
BGIOSGA000003         0                           3
BGIOSGA000004         0                           2
BGIOSGA000005         0                           4

I will be thankful for your help.

Kind Regards, Bioinfonext

Tags: linux, awk, bash, R
written 14 days ago by Bioinfonext

You could have used featureCounts, which does this when you give it multiple BAM files on the command line: featureCounts options BAM1 BAM2 BAM3. Provide them in the order you want them grouped so you don't need to rearrange columns afterwards.

written 14 days ago by genomax

Hi genomax,

I used HTSeq for read counting and I have about 60 read count txt files.

Thanks Bioinfonext

written 14 days ago by Bioinfonext

Consider redoing the counts with featureCounts. You would be done with creating the count matrix in less time than it is going to take you to deal with 60 separate files :-)

written 13 days ago by genomax
echo -e '\tfile1\tfile2' && join -t $'\t' -1 1 -2 1 <(sort -t  $'\t' -k1,1 file1.txt) <(sort -t  $'\t' -k1,1 file2.txt)
modified 14 days ago • written 14 days ago by Pierre Lindenbaum

Hi Pierre,

I have 60 read count txt files, so should I keep adding them all as you have shown with two files?

Thanks Bioinfonext

modified 14 days ago • written 14 days ago by Bioinfonext
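
For the 60-file case, the two-file join above can be repeated in a loop rather than typed out by hand. The following is only a minimal sketch, assuming tab-separated HTSeq output and file names ending in _L001.COUNT.txt; join_counts is a hypothetical helper name, not part of any tool. Because join performs an inner join by default, an ID missing from any one file is dropped from the result.

```shell
# Hypothetical helper: join any number of two-column count files on column 1.
# Assumes tab-separated input and file names ending in _L001.COUNT.txt.
join_counts() {
    tab=$(printf '\t')
    acc=$(mktemp)      # running joined table
    sorted=$(mktemp)   # current file, sorted by ID
    next=$(mktemp)     # scratch output of each join
    header="ID"
    first=1
    for f in "$@"; do
        # Column name: file name without directory and _L001.COUNT.txt suffix
        header="$header$tab$(basename "$f" _L001.COUNT.txt)"
        # Drop HTSeq summary rows (IDs starting with "__"), then sort by ID
        grep -v '^__' "$f" | LC_ALL=C sort -t "$tab" -k1,1 > "$sorted"
        if [ "$first" -eq 1 ]; then
            cp "$sorted" "$acc"
            first=0
        else
            # sort and join must agree on collation, hence LC_ALL=C on both
            LC_ALL=C join -t "$tab" "$acc" "$sorted" > "$next"
            cp "$next" "$acc"
        fi
    done
    printf '%s\n' "$header"
    cat "$acc"
    rm -f "$acc" "$sorted" "$next"
}
```

It could then be called as join_counts *.COUNT.txt > all_counts.txt, where all_counts.txt is just an example output name.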
lieven.sterck (VIB, Ghent, Belgium) wrote, 13 days ago:

Something I wrote a while back (i.e., there is likely a better/more efficient approach ;) ):

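n=0
for i in *.txt
do
    # Column header: the file name with everything from "_L001" onwards removed
    name=$(echo "$i" | sed 's/_L001.*//')
    echo -e "ID\t$name" > "${i}_tmp"
    # Drop the last line (GNU head), keep ID and count, sort by ID
    head -n -1 "$i" | cut -f 1,2 | sort -k1,1 >> "${i}_tmp"
    ((n++))
done

# Put the per-sample tables side by side (all files must list the same IDs)
paste *_tmp > tmpOK
rm -f *_tmp

# Keep the ID column once, plus every count column (fields 2, 4, 6, ...)
c="-f1"
for j in $(seq "$n")
do
    c="$c,$((2 * j))"
done

cut "$c" tmpOK > final_file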
written 13 days ago by lieven.sterck
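
The paste-based script above only lines up correctly when every file lists exactly the same IDs. As an alternative, here is a rough single-pass awk sketch that matches rows by ID instead. It assumes two whitespace-separated columns (ID, count) per file and file names containing _L001. merge_counts is a hypothetical name, and filling missing values with NA is an illustrative choice, not something from the thread.

```shell
# Hypothetical helper: build the count matrix in one awk pass.
# Assumes two whitespace-separated columns (ID, count) per file.
merge_counts() {
    awk '
        FNR == 1 {                       # first line of each input file
            nfiles++
            name = FILENAME
            sub(/.*\//, "", name)        # strip any directory part
            sub(/_L001.*$/, "", name)    # strip "_L001.COUNT.txt"
            header = header "\t" name
        }
        /^__/ { next }                   # skip HTSeq summary rows
        {
            # Remember each ID in first-seen order, then store its count
            if (!($1 in seen)) { seen[$1] = 1; ids[++n] = $1 }
            count[$1, nfiles] = $2
        }
        END {
            print "ID" header
            for (i = 1; i <= n; i++) {
                row = ids[i]
                for (f = 1; f <= nfiles; f++) {
                    key = ids[i] SUBSEP f
                    # NA marks an ID absent from file f (assumed convention)
                    row = row "\t" ((key in count) ? count[key] : "NA")
                }
                print row
            }
        }
    ' "$@"
}
```

For example: merge_counts *.COUNT.txt > final_file. Rows come out in the order the IDs are first seen, so no sorting is needed beforehand.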

Thanks Lieven, your script works perfectly.

Thanks again, Bioinfonext

written 13 days ago by Bioinfonext