perl script for mapping two files
5.5 years ago

How can I combine two files, one the output of getorf and the other the output of CPC? I want a single output file that contains the entries with a score < -1 from the CPC output and the sequences longer than 30 amino acids from the getorf output.

RNA-Seq perl • 1.4k views

Please add input examples and the expected output. What have you tried to resolve your issue?

If you found an answer to your previous issue, please close that thread or add details to your question.


cpcout: I want the headers whose score is < -1. Example file:

hg38_ct_UserTrack_3545_CUFF.4097.1  27422   noncoding   -1.00386
hg38_ct_UserTrack_3545_CUFF.4100.1  1203    noncoding   -0.783661
hg38_ct_UserTrack_3545_CUFF.5257.2  75281   noncoding   -0.560148
hg38_ct_UserTrack_3545_CUFF.5257.1  93082   noncoding   -1.12595
hg38_ct_UserTrack_3545_CUFF.5651.1  2893    noncoding   -0.697666
hg38_ct_UserTrack_3545_CUFF.6611.1  68571   noncoding   -0.576123
hg38_ct_UserTrack_3545_CUFF.7487.1  142900  noncoding   -0.784945
hg38_ct_UserTrack_3545_CUFF.9288.1  170566  noncoding   -0.721422
hg38_ct_UserTrack_3545_CUFF.133.1   129494  noncoding   -0.726058
hg38_ct_UserTrack_3545_CUFF.649.1   50162   noncoding   -0.676869
getorf output: I want the sequences longer than 30 amino acids.

example file:

>hg38_ct_UserTrack_3545_CUFF.4031.2_1 [35 - 292] range=chr1:117367430-117524283 5'pad=0 3'pad=0 strand=+ repeatMasking=none_len156854
QQRRRPRRLAGLRCFGRYCPSRQPSSGQQRDSAGRQVPARQRREARAAENSFLLLRPAPL
LRLPSEVGDLPPCLQTSVGDPYFFHR
>hg38_ct_UserTrack_3545_CUFF.4031.2_2 [19 - 336] range=chr1:117367430-117524283 5'pad=0 3'pad=0 strand=+ repeatMasking=none_len156854
QTRAVATAAAAAAAGRTQVFRTLLPFAPAVEWAAAGLSRAPGSCQAAPGSAGGRELLPAT
SPSAAASASQRSGRPSSLFADVRGRPLFFPPLRLRDSGIEASKEIK
>hg38_ct_UserTrack_3545_CUFF.4031.2_3 [317 - 463] range=chr1:117367430-117524283 5'pad=0 3'pad=0 strand=+ repeatMasking=none_len156854
KRRRRSSEPSTTPRMSPVSLSGRKTTFEHLTEVQEWKNPPCSFSAVWLA
>hg38_ct_UserTrack_3545_CUFF.4031.2_4 [3 - 506] range=chr1:117367430-117524283 5'pad=0 3'pad=0 strand=+ repeatMasking=none_len156854
KFRSLTDPCGSNSGGGRGGWPDSGVSDAIALRASRRVGSSGTQPGARFLPGSAGKRGRPR
TPSCYFAQRRCFGFPAKWETFLPVCRRPWETLIFSTAKVKRFWNRSVEGDQVNLLQLLGC
RQSPFRGGRLRLSISLRCRNGRTHLAAFLQCGLPDLPLGMKRRLVIIR

I need to map these two outputs into a single output file.
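
The two filters described above can be sketched as follows. This is a minimal illustration in Python rather than Perl, assuming the CPC output is whitespace-separated with the score in the fourth column and the getorf output is FASTA with sequences wrapped over several lines. The function names (`read_cpc`, `read_fasta`, `long_orfs`) and the short `orf_a`/`orf_b` demo records are hypothetical; the two CPC rows are copied from the example file.

```python
import io

def read_cpc(handle, max_score=-1.0):
    """Yield (id, length, label, score) for rows whose score is below max_score."""
    for line in handle:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        seq_id, length, label, score = fields
        if float(score) < max_score:
            yield seq_id, int(length), label, float(score)

def read_fasta(handle):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def long_orfs(handle, min_len=30):
    """Yield (header, sequence) for ORFs strictly longer than min_len residues."""
    for header, seq in read_fasta(handle):
        if len(seq) > min_len:
            yield header, seq

# Two rows taken from the CPC example; only the first has a score below -1.
cpc_demo = io.StringIO(
    "hg38_ct_UserTrack_3545_CUFF.4097.1  27422   noncoding   -1.00386\n"
    "hg38_ct_UserTrack_3545_CUFF.4100.1  1203    noncoding   -0.783661\n"
)
# Hypothetical ORF records: orf_a has 3 residues, orf_b has 86 over two lines.
orf_demo = io.StringIO(
    ">orf_a [1 - 30]\nMKV\n"
    ">orf_b [1 - 120]\n" + "A" * 60 + "\n" + "C" * 26 + "\n"
)

kept_cpc = list(read_cpc(cpc_demo))
kept_orfs = list(long_orfs(orf_demo))
print(kept_cpc)
print([(header, len(seq)) for header, seq in kept_orfs])
```

Sequence length is measured after joining the wrapped lines, so an ORF split across several lines is counted as a whole.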


Please edit your original post to add these input examples instead of posting them as an answer. Also, try to format your input examples; as posted they are quite hard to read.

i need to map these two outputs into a single file as output

Do you mean doing a join on the ID (hg38_ct_UserTrack_3545_CUFF.4031.2_1)?

After filtering, what do you want as output? The cpcout results plus the getorf results, like this:

hg38_ct_UserTrack_3545_CUFF.4097.1 27422 noncoding -1.00386 [3 - 506] range=chr1:117367430-117524283 5'pad=0 3'pad=0 strand=+ repeatMasking=none_len156854 KFRSLTDPCGSNSGGGRGGWPDSGVSDAIALRASRRVGSSGTQPGARFLPGSAGKRGRPRTPSCYFAQRRCFGFPAKWETFLPVCRRPWETLIFSTAKVKRFWNRSVEGDQVNLLQLLGCRQSPFRGGRLRLSISLRCRNGRTHLAAFLQCGLPDLPLGMKRRLVIIR

?

Share what you have tried so far so that we can start from there.


No. There are 240 files of each output, and I need to map each corresponding pair of files into a single output.
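
Pairing up corresponding files could be sketched like this, in Python rather than Perl. The thread does not say how the 240 files are named, so the suffixes `.cpc.txt` and `.getorf.fa` and the function name `pair_files` are purely hypothetical; the sketch assumes that the two files of a pair share everything before the suffix.

```python
from pathlib import Path
import tempfile

def pair_files(directory, cpc_suffix=".cpc.txt", orf_suffix=".getorf.fa"):
    """Return sorted (cpc_path, orf_path) pairs that share a filename prefix.

    The suffixes are assumptions; adjust them to the real naming scheme.
    """
    directory = Path(directory)
    pairs = []
    for cpc_path in sorted(directory.glob("*" + cpc_suffix)):
        prefix = cpc_path.name[: -len(cpc_suffix)]
        orf_path = directory / (prefix + orf_suffix)
        if orf_path.exists():  # skip CPC files that have no getorf partner
            pairs.append((cpc_path, orf_path))
    return pairs

# Demo in a temporary directory: s1 has both files, s2 has only a CPC file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("s1.cpc.txt", "s1.getorf.fa", "s2.cpc.txt"):
        (Path(tmp) / name).write_text("")
    found = [(cpc.name, orf.name) for cpc, orf in pair_files(tmp)]

print(found)
```

Each returned pair can then be filtered and written to its own output file, one output per input pair.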


Note that to get help from people who have no knowledge of what you are doing, you first need to create a simplified example of what you want.

Take the time to simplify your problem, or you will lose time trying to explain it to someone. Also, try to strip out unnecessary information.

I'll make an example for you:

Let's say you have 2 files to compare

cpcout result :

cpcout_id1    cpcout_Info1.1    cpcout_Info2.1    cpcout_Score1
cpcout_id2    cpcout_Info1.2    cpcout_Info2.2    cpcout_Score2
....

getorf result :

getorf_id1    getorf_Info1.1    getorf_Info2.1    getorf_Info3.1    getorf_Info4.1    getorf_Info5.1    getorf_Info6.1    getorf_Sequence1
getorf_id2    getorf_Info1.2    getorf_Info2.2    getorf_Info3.2    getorf_Info4.2    getorf_Info5.2    getorf_Info6.2    getorf_Sequence2
....

So if cpcout_id1 and getorf_id1 are identical, with cpcout_Score1 < -1 and length(getorf_Sequence1) > 30, what do you want as output? Something like this:

cpcout_id1    cpcout_Info1.1    cpcout_Info2.1    cpcout_Score1    getorf_Info1.1    getorf_Info2.1    getorf_Info3.1    getorf_Info4.1    getorf_Info5.1    getorf_Info6.1    getorf_Sequence1

?
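
A join of this kind might look like the sketch below, written in Python rather than Perl. It assumes that a getorf ID is the CPC transcript ID followed by `_<n>` (as in `hg38_ct_UserTrack_3545_CUFF.4031.2_1`), that the CPC rows are already parsed into `(id, length, label, score)` tuples, and that the ORFs are `(header, sequence)` pairs. The names `transcript_id` and `join_records` and the `tx1`/`tx2` demo records are hypothetical.

```python
import re

def transcript_id(orf_header):
    """Return the transcript ID from a getorf header (assumes a trailing _<n>)."""
    first_word = orf_header.split()[0]
    return re.sub(r"_\d+$", "", first_word)

def join_records(cpc_rows, orfs, max_score=-1.0, min_len=30):
    """Join CPC rows with ORFs on the transcript ID and apply both filters.

    cpc_rows: iterable of (id, length, label, score)
    orfs:     iterable of (header, sequence)
    Returns one tab-separated line per ORF that passes both filters.
    """
    passing = {row[0]: row for row in cpc_rows if row[3] < max_score}
    lines = []
    for header, seq in orfs:
        row = passing.get(transcript_id(header))
        if row is None or len(seq) <= min_len:
            continue
        parts = header.split(None, 1)
        annotation = parts[1] if len(parts) > 1 else ""
        fields = [row[0], str(row[1]), row[2], str(row[3]), annotation, seq]
        lines.append("\t".join(fields))
    return lines

# Hypothetical records: only tx1 scores below -1, and only tx1_1 is > 30 aa.
cpc_rows = [("tx1", 500, "noncoding", -1.2), ("tx2", 300, "noncoding", -0.5)]
orfs = [
    ("tx1_1 [3 - 110]", "M" * 35),
    ("tx1_2 [5 - 40]", "M" * 10),
    ("tx2_1 [1 - 200]", "M" * 60),
]
joined = join_records(cpc_rows, orfs)
for line in joined:
    print(line)
```

Because one transcript can yield several ORFs, the output here is one line per passing ORF, not one line per transcript.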


Your question is very poorly explained, and you are not being helpful at all. Please take some time to explain what you want and what you have tried, including examples.
