Question: htseq-counts output merge into one matrix ??
1
gravatar for bioinforupesh2009.au
5.6 years ago by
Spain
bioinforupesh2009.au110 wrote:

Dear all,

I just need a little help to merge my all features counts into one matrix. I have counted features using htseq-counts and now want to merge into one file like....

ID     c1 c2 c3..............t1 t2 t2.......etc

 

The problem is that i have 36 sample and corresponding counts so i am lazy to write ...

paste *.txt | cut -d -f1,2,4,6,8...............................72 (because each file contain two column i.e ID and counts )

and moreover, apart from laziness, one problem also happened here that while pasting its not in sequence  like :

paste 1.txt 2.txt 3.txt............

 

Help me please...

 

ADD COMMENTlink modified 14 months ago by Ahmed Alhendi40 • written 5.6 years ago by bioinforupesh2009.au110
2

Finally got a solution to this problem

#!/bin/bash

FILES=$(ls -t -v *.txt | tr '\n' ' ');

awk 'NF > 1{ a[$1] = a[$1]"\t"$2} END {for( i in a ) print i a[i]}' $FILES > merged.tmp

 

This woks like you asked.

ADD REPLYlink written 5.6 years ago by EagleEye6.6k

Thank you man for your kind reply but i have already done, i am sorry for the delay. Was just needed to change the file name like 1.txt -> 01.txt, 2.txt -> 02.txt....................... etc and after i used your suggested awk code and worked fine.

By the way thanks a million.

ADD REPLYlink modified 5.6 years ago • written 5.6 years ago by bioinforupesh2009.au110
1
gravatar for EagleEye
5.6 years ago by
EagleEye6.6k
Sweden
EagleEye6.6k wrote:

This could be the simple way for as many as files:

awk 'NF > 1{ a[$1] = a[$1]"\t"$2} END {for( i in a ) print i a[i]}' read_counts/*.gene_counts.htseq > merged.htseq

ADD COMMENTlink written 5.6 years ago by EagleEye6.6k
1

This outputs: Merge htseq files into one based on geneNames. Sample output

ENSG00000251243.1    0    0    0    0    0
ENSG00000240637.2    0    0    0    0    0
ENSG00000227366.1    0    0    0    0    0
ENSG00000104903.4    665    216    884    3254    689
ENSG00000228626.1    0    0    0    0    1

 

Forgot to mention, the order of pasted column will be in FileName order.

 

Like if you do listing using unix command 'ls -l read_counts/'  

You will get the order in which it has merged.

ADD REPLYlink modified 5.6 years ago • written 5.6 years ago by EagleEye6.6k

Dear Santhilal,

Your code is working as mine too...but the main problem what i mentioned on question is.......ITS NOT IN SEQUENCE.

 

I NEED...

 

ID     C1 C1 C3.............UPTO 36

 

your code or mine work as terminal shows the directory by hit of ls

 

10.txt

11.txt

31.txt

12.txt

13.txt

4.txt

3.txt

1.txt

20.txt

...

....

....

36.txt

 

However, i tried by

for i in $(seq 1 36); do awk 'NF > 1{ a[$1] = a[$1]"\t"$2} END {for( i in a ) print i a[i]}' $i.txt > merged.htseq; done

 

but its not working ???

ADD REPLYlink modified 5.6 years ago • written 5.6 years ago by bioinforupesh2009.au110

you can try this 

ls -l -v | awk '{print $9}'

and then AWK. I don't know whether it will work. But you can try it.

ADD REPLYlink written 5.6 years ago by EagleEye6.6k

Thanks but while using this only its work but not with advised final awk code.

i used...but not worked for me.

ls -l -v | awk '{print $9}' | awk 'NF > 1{ a[$1] = a[$1]"\t"$2} END {for( i in a ) print i a[i]}' *.txt > merged.htseq

however, i have done by my way........

Thanks anyway....very kind of you. :))

by the way.....

Wishing you a very happy and prosperous Diwali!!

ADD REPLYlink written 5.6 years ago by bioinforupesh2009.au110

Could please provide your working solution?

ADD REPLYlink written 2.7 years ago by Ric290
1
gravatar for Devon Ryan
5.6 years ago by
Devon Ryan95k
Freiburg, Germany
Devon Ryan95k wrote:

I wrote this a while ago. It can also change the file labels.

Note that it's so easy to just do this in R that there's really little reason to manually merge files like this.

ADD COMMENTlink modified 5.6 years ago • written 5.6 years ago by Devon Ryan95k
1

and if you use DESeq2 you can simply give the file names in input ( see DESeq2 vignette 1.2.4 HTseq input )

ADD REPLYlink written 5.6 years ago by Nicolas Rosewick8.8k
1

Exactly, and even if you're using something like edgeR, it's simple enough to just use the same method employed by DESeq2 (namely, lapply() reading files and then lapply a function to get the counts out).

ADD REPLYlink written 5.6 years ago by Devon Ryan95k
1

i prefer first by terminal or bash.

 

Thanks by the for your valuable answer.

ADD REPLYlink written 5.6 years ago by bioinforupesh2009.au110
1
gravatar for Arindam Ghosh
24 months ago by
Arindam Ghosh250
India
Arindam Ghosh250 wrote:

I managed to do this by R. The script is available here. It works for me. Please try and check if it works fine.

ADD COMMENTlink written 24 months ago by Arindam Ghosh250
0
gravatar for bmahesh07
3.8 years ago by
bmahesh070
United States
bmahesh070 wrote:

I found a useful R script (htseq-combine_all.R script) for merging HT-seq counts at the following resource http://wiki.bits.vib.be/index.php/NGS_RNASeq_DE_Exercise.4

ADD COMMENTlink written 3.8 years ago by bmahesh070

I tried the the above R script (htseq-combine_all.R script) but I ran into an error

ADD REPLYlink written 2.7 years ago by Ric290

You can use my R script on https://github.com/AAlhendi1707/htseq-merge/blob/master/htseq-merge_all.R

ADD REPLYlink modified 14 months ago • written 14 months ago by Ahmed Alhendi40
0
gravatar for oriolebaltimore
15 months ago by
United States
oriolebaltimore130 wrote:

Is it considered best practice to sum up reads across replicates or to run HTSeq on merged BAM File after ccombing fastqs. Thanks Adrian

ADD COMMENTlink written 15 months ago by oriolebaltimore130
1

No, you should use hts-seq on each replicates otherwise there is no meaning of replicates. By the way, you could use "featureCounts tool" for each bam at a time. Then, further you don't need to merge as its give you a matrix of count for each sample of bam.

Hope this helps Good luck.

ADD REPLYlink written 15 months ago by unique37980
0
gravatar for Ahmed Alhendi
14 months ago by
University of Southampton, UK
Ahmed Alhendi40 wrote:

This is easily done using R script! You can use my R script on https://github.com/AAlhendi1707/htseq-merge/blob/master/htseq-merge_all.R

For more details about how to generate and combine HTSeq outputs into one read count matrix in R https://ahmedalhendi0.wordpress.com/2019/03/20/combine-htseq-outputs-into-one-read-count-matrix-in-r/

ADD COMMENTlink modified 14 months ago • written 14 months ago by Ahmed Alhendi40
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1226 users visited in the last hour