Question

Convert with respect to first column (repeated values) into rows (transposon)

0

Entering edit mode

9 months ago

waqasnayab ▴ 250

Hello,

I have a file like (just a snippet of 2 sample IDs):

Sample Name Target Name CT
17BE0195802 LOC100996437-Hs04403548_m1  Undetermined
17BE0195802 LINC01372-Hs01911576_s1 Undetermined
17BE0195802 CD27-AS1-Hs05048951_s1  33.012
17BE0195802 COMMD10-Hs01024000_m1   33.817
17BE0195802 ABAT-Hs00609436_m1  34.328
17BE0195802 PPARGC1A-Hs00173304_m1  Undetermined
17BE0195802 VN1R2-Hs00545195_s1 Undetermined
17BE0195802 NR_051985.1-APXGXPZ 29.600
17BE0195802 TNFSF10-Hs00921974_m1   33.189
17BE0195802 MEST-Hs00853380_g1  31.964
17BE0195802 SNORA81-APWC343 31.774
17BE0195802 SNORA64-Mm04336067_s1   Undetermined
17BE0195802 RNA45S5-Hs05057753_g1   31.831
17BE0195802 LINC00905-Hs01126887_g1 Undetermined
17BE0195802 FAM72C-Hs00822131_m1    Undetermined
17BE0195802 DTX2P1-UPK3BP1-PMS2P11-Hs03653706_m1    Undetermined
17BE0195802 ADAM32-Hs00991935_m1    35.367
17BE0195802 LOC100996437-Hs04403548_m1  Undetermined
17BE0195802 LINC01372-Hs01911576_s1 Undetermined
17BE0195802 CD27-AS1-Hs05048951_s1  32.527
17BE0195802 COMMD10-Hs01024000_m1   Undetermined
17BE0195802 ABAT-Hs00609436_m1  33.963
17BE0195802 PPARGC1A-Hs00173304_m1  Undetermined
17BE0195802 VN1R2-Hs00545195_s1 Undetermined
17BE0195802 NR_051985.1-APXGXPZ 28.917
17BE0195802 TNFSF10-Hs00921974_m1   32.797
17BE0195802 MEST-Hs00853380_g1  31.926
17BE0195802 SNORA81-APWC343 32.128
17BE0195802 SNORA64-Mm04336067_s1   Undetermined
17BE0195802 RNA45S5-Hs05057753_g1   30.637
17BE0195802 LINC00905-Hs01126887_g1 Undetermined
17BE0195802 FAM72C-Hs00822131_m1    34.719
17BE0195802 DTX2P1-UPK3BP1-PMS2P11-Hs03653706_m1    Undetermined
17BE0195802 ADAM32-Hs00991935_m1    Undetermined
17BE0196102 LOC100996437-Hs04403548_m1  32.596
17BE0196102 LINC01372-Hs01911576_s1 Undetermined
17BE0196102 CD27-AS1-Hs05048951_s1  35.263
17BE0196102 COMMD10-Hs01024000_m1   Undetermined
17BE0196102 ABAT-Hs00609436_m1  Undetermined
17BE0196102 PPARGC1A-Hs00173304_m1  Undetermined
17BE0196102 VN1R2-Hs00545195_s1 Undetermined
17BE0196102 NR_051985.1-APXGXPZ 30.639
17BE0196102 TNFSF10-Hs00921974_m1   33.113
17BE0196102 MEST-Hs00853380_g1  33.556
17BE0196102 SNORA81-APWC343 33.803
17BE0196102 SNORA64-Mm04336067_s1   Undetermined
17BE0196102 RNA45S5-Hs05057753_g1   33.795
17BE0196102 LINC00905-Hs01126887_g1 Undetermined
17BE0196102 FAM72C-Hs00822131_m1    Undetermined
17BE0196102 DTX2P1-UPK3BP1-PMS2P11-Hs03653706_m1    Undetermined
17BE0196102 ADAM32-Hs00991935_m1    Undetermined
17BE0196102 LOC100996437-Hs04403548_m1  Undetermined
17BE0196102 LINC01372-Hs01911576_s1 Undetermined
17BE0196102 CD27-AS1-Hs05048951_s1  34.599
17BE0196102 COMMD10-Hs01024000_m1   Undetermined
17BE0196102 ABAT-Hs00609436_m1  Undetermined
17BE0196102 PPARGC1A-Hs00173304_m1  Undetermined
17BE0196102 VN1R2-Hs00545195_s1 Undetermined
17BE0196102 NR_051985.1-APXGXPZ 30.521
17BE0196102 TNFSF10-Hs00921974_m1   32.791
17BE0196102 MEST-Hs00853380_g1  32.778
17BE0196102 SNORA81-APWC343 35.372
17BE0196102 SNORA64-Mm04336067_s1   Undetermined
17BE0196102 RNA45S5-Hs05057753_g1   33.783
17BE0196102 LINC00905-Hs01126887_g1 Undetermined
17BE0196102 FAM72C-Hs00822131_m1    Undetermined
17BE0196102 DTX2P1-UPK3BP1-PMS2P11-Hs03653706_m1    Undetermined
17BE0196102 ADAM32-Hs00991935_m1    Undetermined

As you can 17BE0195802 is repeated 34 times, since the experiment is repeated on 17 biomarkers (starting from LOC100996437-Hs04403548_m1 to ADAM32-Hs00991935_m1, then the same order is repeated again another time for the same sample).

My desired output would be:

               LOC100996437-Hs04403548_m1   LINC01372-Hs01911576_s1 CD27-AS1-Hs05048951_s1  COMMD10-Hs01024000_m1   ABAT-Hs00609436_m1  PPARGC1A-Hs00173304_m1  VN1R2-Hs00545195_s1 NR_051985.1-APXGXPZ TNFSF10-Hs00921974_m1   MEST-Hs00853380_g1  SNORA81-APWC343 SNORA64-Mm04336067_s1   RNA45S5-Hs05057753_g1   LINC00905-Hs01126887_g1 FAM72C-Hs00822131_m1    DTX2P1-UPK3BP1-PMS2P11-Hs03653706_m1    ADAM32-Hs00991935_m1
17BE0195802 their corresponding values  their corresponding values  .   .                                                   
17BE0195802 their corresponding values  their corresponding values  .   .                                                   
17BE0196102 their corresponding values  their corresponding values  .   .                                                   
17BE0196102 their corresponding values  their corresponding values  .                                                       '

using the thread: https://www.unix.com/shell-programming-and-scripting/158305-convert-columns-into-rows-respect-first-column.html I came up with the following output:

nawk '{a[$1]=($1 in a)?a[$1] OFS sprintf("%-*s",w,$3):sprintf("%-*s",w,$3);h[$2]}END{for(i in h) printf("\t%-*s", w,i);print "";for(i in a) print i,a[i]}' OFS='\t' waqas_trial.txt

Picture is attached .

As I have repeated experimental conditions, I am not able to write two lines for each sample. Your kind help is highly appreciated.

Thanks,

Waqas.

qPCR transposons • 628 views

ADD COMMENT • link updated 9 months ago by Trivas ★ 1.7k • written 9 months ago by waqasnayab ▴ 250

score 0 · Answer 1 · 2023-07-12

0

Entering edit mode

9 months ago

biofalconch ★ 1.1k

You could use R and tidyr, something like this:

library(tidyr)
File <- read.table("File.txt",header=T,sep="\t")
CoolMatrix <- pivot_wider(File,  names_from = "Sample Name", values_from = "Target Name")

Just a note: If you are using single spaces as separators, you might want to change your column names as they also include them :)

ADD COMMENT • link 9 months ago by biofalconch ★ 1.1k

0

Entering edit mode

Thanks biofalconch

But this is the best which I got:

> CoolMatrix <- pivot_wider(File,  names_from = "Target.Name", values_from = "CT")

Attaching the output

Looking for your cooperation.

Waqas.

ADD REPLY • link 9 months ago by waqasnayab ▴ 250

0

Entering edit mode

This won't get OP's desired output since they want to have two rows per sample name, I'm assuming for each replicate. I would probably add _1 or _2 to the sample name using some wizardry with duplicated() which would then make your above code work.

ADD REPLY • link 9 months ago by Trivas ★ 1.7k

score 0 · Answer 2 · 2023-07-13

Thanks Travis.

Even when I use _1 or _2, the output is:

enter image description here

Is there any way to ask whenever there is 18th line of the same sample ID occurrence, consider it as a second single line of the same sample and write it below the same sample?:

            LOC100996437-Hs04403548_m1   LINC01372-Hs01911576_s1 CD27-AS1-Hs05048951_s1  COMMD10-Hs01024000_m1   ABAT-Hs00609436_m1  PPARGC1A-Hs00173304_m1  VN1R2-Hs00545195_s1 NR_051985.1-APXGXPZ TNFSF10-Hs00921974_m1   MEST-Hs00853380_g1  SNORA81-APWC343 SNORA64-Mm04336067_s1   RNA45S5-Hs05057753_g1   LINC00905-Hs01126887_g1 FAM72C-Hs00822131_m1    DTX2P1-UPK3BP1-PMS2P11-Hs03653706_m1    ADAM32-Hs00991935_m1
17BE0195802 their corresponding values  their corresponding values  .   .                                                   
17BE0195802 their corresponding values  their corresponding values  .   .                                                   
17BE0196102 their corresponding values  their corresponding values  .   .                                                   
17BE0196102 their corresponding values  their corresponding values  .                                                       '

Waqas.