Question: Curating a file: sort (largest to smallest) and then extract unique values
0
4 months ago, sanjalidhole (0) wrote:

I have a file with library, ID, and count columns, as follows:

*Searching cagcaccaccaagauucacau*
CC2_B ta_iwgsc_7bs_v1_3148968_39029 33  
CC2_B ta_iwgsc_7bs_v1_3150171_39041 38  
CC2_D ta_iwgsc_7ds_v1_3966917_41463 156 
CC3_B ta_iwgsc_7bs_v1_3148968_41273 56
CC2_A ta_iwgsc_6al_v1_5830987_31258 18  
CC2_B ta_iwgsc_6bl_v1_4279451_30909 18  
CC2_D ta_iwgsc_6dl_v1_3311975_32342 18  
CI2_A ta_iwgsc_6al_v1_5830987_27002 30  
CI2_B ta_iwgsc_6bl_v1_4279451_26849 30  
CI2_D ta_iwgsc_6dl_v1_3311975_28474 30  
*Found(s) in 6 file(s)*         

*Searching ugccuggcucccugaaugcca*
CC2_B ta_iwgsc_6bs_v1_1636307_32644 3275    
CC2_B ta_iwgsc_6bs_v1_1636307_32645 3575
CC3_B ta_iwgsc_6bs_v1_1636307_34610 3449    
CI1_B ta_iwgsc_6bs_v1_1636307_28706 3509            
CC2_A ta_iwgsc_7as_v1_4255214_39664 1809    
CC2_B ta_iwgsc_7bs_v1_3149865_39035 1809    
CC2_D ta_iwgsc_7ds_v1_3850348_38998 1809    
*Found(s) in 3 file(s)*

I want each library (CC1_A, CC2_B, etc.) to keep only its highest count; as you can see, the counts differ for the same library. Each library should then be printed in the following format, separately for each block (paragraph):

Searching cagcaccaccaagauucacau:
CC2_B 38
CC2_D 156
CC3_B 56
CC2_A 18
CI1_A 30
CI1_B 30
CI1_D 30
Searching ugccuggcucccugaaugcca:
CC2_B 3575
CC3_B 3449
CI1_B 3509
CC2_A 1809
CC2_B 1809
CC2_D 1809
next-gen • 179 views
modified 4 months ago • written 4 months ago by sanjalidhole (0)

Please click the 101010 button to format code or file content!

written 4 months ago by shenwei356 (3.1k)

similar post: how to remove rows based on certain characters

written 4 months ago by shenwei356 (3.1k)
2
4 months ago, shenwei356 (3.1k), China, wrote:

You can do this with the help of rush (a GNU parallel-like tool written in Go that executes shell commands in parallel; it supports Linux, OS X, and Windows). Love it so much...

$ cat d.txt | rush -d "\n" -D "file(s)*" -T b \
    'echo "{1}"; \
    echo "{}" | sed 1d | sed "$ d" | \
        sort -k 1,1 -k 3,3nr | sort -k 1,1 -u | cut -d " " -f 1,3;\
    echo '

*Searching cagcaccaccaagauucacau*            
CC2_A 18
CC2_B 38
CC2_D 156
CC3_B 56
CI2_A 30
CI2_B 30
CI2_D 30

*Searching ugccuggcucccugaaugcca*
CC2_A 1809
CC2_B 3575
CC2_D 1809
CC3_B 3449
CI1_B 3509

Limitation: it may fail when a block (the records under one Searching line) is very long, because the whole block is passed to echo as a command-line argument and argument length is limited.
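
One way around that limitation is to stream the file through awk instead of handing each block to echo. This is only a sketch, not the method used above. It assumes every block starts with a line containing Searching, and that data lines have exactly three whitespace-separated fields with a numeric count last. The function name max_per_block is made up for illustration. Libraries are printed in the order they first appear in each block, and a colon is appended to the header as in the requested format.

```shell
# Hypothetical helper: print the highest count per library, block by block.
max_per_block() {
    awk '
        # Print every library collected for the current block, then reset.
        function emit_block(   i, lib) {
            for (i = 1; i <= n; i++) {
                lib = order[i]
                print lib, max[lib]
                delete max[lib]
            }
            n = 0
        }
        # A header line closes the previous block and opens a new one.
        /Searching/ {
            emit_block()
            h = $0
            gsub(/^[* \t]+|[* \t]+$/, "", h)   # strip asterisks and padding
            print h ":"
            next
        }
        # Data line: column 1 is the library, column 3 the count.
        NF == 3 && $3 ~ /^[0-9]+$/ {
            if (!($1 in max)) { order[++n] = $1; max[$1] = $3 + 0 }
            else if ($3 + 0 > max[$1]) max[$1] = $3 + 0
        }
        # The "Found(s) in ..." lines and blank lines match nothing and are skipped.
        END { emit_block() }
    ' "$@"
}
```

Usage would be `max_per_block d.txt`. Because awk reads the input line by line, the size of a block does not matter here.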

modified 4 months ago • written 4 months ago by shenwei356 (3.1k)

Thanks a lot, it's working fine.

written 4 months ago by sanjalidhole (0)

Since you write that this answer works, you should accept it, as it solves your question.
I have now marked this answer as accepted, but please keep this in mind for next time, when people spend time helping you out.

modified 4 months ago • written 4 months ago by WouterDeCoster (20k)

I tried it and it works; it's good. I got my solution and people helped me. But I also tried it my own way, since I should also learn by myself how to solve a difficulty, so I gave it a try. I think that's a positive attitude and does no harm to anyone. Thanks.

written 3 months ago by sanjalidhole (0)
1

Fixing things on your own is great, but preferably you should try that before opening a question. And people who spend time helping you with a working solution also deserve recognition for that.

written 3 months ago by WouterDeCoster (20k)
2
4 months ago, sanjalidhole (0) wrote:

modified 3 months ago • written 4 months ago by sanjalidhole (0)
1

100% working without any flaws:

Are you sure you aren't a bit overconfident? How can you claim there are no flaws, have you tested all possible use cases?

written 4 months ago by WouterDeCoster (20k)

It's working fine for my case :) No offence about the confidence.

modified 3 months ago • written 3 months ago by sanjalidhole (0)
0
4 months ago, shenwei356 (3.1k), China, wrote:

reference: how to remove rows based on certain characters

$ cat data.tsv 
CC2_B   ta_iwgsc_7bs_v1_3148968_39029   33
CC2_B   ta_iwgsc_7bs_v1_3150171_39041   38
CC2_D   ta_iwgsc_7ds_v1_3966917_41463   156
CC3_B   ta_iwgsc_7bs_v1_3148968_41273   56
CC2_A   ta_iwgsc_6al_v1_5830987_31258   18
CC2_B   ta_iwgsc_6bl_v1_4279451_30909   18
CC2_D   ta_iwgsc_6dl_v1_3311975_32342   18
CI2_A   ta_iwgsc_6al_v1_5830987_27002   30
CI2_B   ta_iwgsc_6bl_v1_4279451_26849   30
CI2_D   ta_iwgsc_6dl_v1_3311975_28474   30

$ cat data.tsv | sort -t $'\t' -k 1,1 -k 3,3nr | sort -t $'\t' -k 1,1 -u | cut -f 1,3
CC2_A   18
CC2_B   38
CC2_D   156
CC3_B   56
CI2_A   30
CI2_B   30
CI2_D   30
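
The second `sort -u` above keeps one line per library and depends on that line being the first of each group. GNU sort documents that behaviour, but as far as I know POSIX only promises one line per set of equal keys. A sketch that makes the "keep the first line" step explicit with awk, assuming the same tab-separated layout (library, ID, count) as data.tsv; the name top_per_library is hypothetical:

```shell
# Hypothetical helper: highest count per library from a tab-separated file
# whose columns are library, ID, count.
top_per_library() {
    tab=$(printf '\t')
    # Sort by library, then by count in descending numeric order.
    sort -t "$tab" -k 1,1 -k 3,3nr "$@" |
        # Keep only the first line seen for each library (its highest count).
        awk -F '\t' -v OFS='\t' '!seen[$1]++ { print $1, $3 }'
}
```

Called as `top_per_library data.tsv`, this should give the same two-column result as the pipeline above, still for the file as a whole rather than per block.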

modified 4 months ago • written 4 months ago by shenwei356 (3.1k)

But I need separate results for each block, not for my file as a whole.

I want each library (CC1_A, CC2_B, etc.) to keep only its highest count; as you can see, the counts differ for the same library. Each library should then be printed in the following format, separately for each block (paragraph):

modified 4 months ago • written 4 months ago by sanjalidhole (0)

Oh no, a little bug. Corrected now.

written 4 months ago by shenwei356 (3.1k)

I want separate results, delimited by the Searching lines (e.g. Searching cagcaccaccaagauucacau), for the single file data.tsv:

    Searching cagcaccaccaagauucacau:
    CC2_B 38
    CC2_D 156
    CC3_B 56
    CC2_A 18
    CI1_A 30
    CI1_B 30
    CI1_D 30
    Searching ugccuggcucccugaaugcca:
    CC2_B 3575
    CC3_B 3449
    CI1_B 3509
    CC2_A 1809
    CC2_B 1809
    CC2_D 1809

modified 4 months ago • written 4 months ago by sanjalidhole (0)

You have to write the script yourself. It's not hard.

written 4 months ago by shenwei356 (3.1k)

The highest count is not considered.

written 4 months ago by sanjalidhole (0)