Question: Remove gap from files
skjobs12340 wrote:

I want to remove the long unaligned gaps from my files, position by position. My input looks like this:

 >f1
--------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------GTTYGVC
SKAFKFLGTPADTGHGTVVLELQYTGTDGPCKVPISSVASLNDLTPVGRLVTVNPFVSVA

>f2
--------------------------------------------------------------------------------------------------------------------
-----------------------------MVHRQWFFDLPLPWA-----------------------------------------GTTYGMC
TEKFSFAKNPADTGHGTVVIELSYSGSDGPCKIPIVSVASLNDMTPVGRLVTVNPFVATS

>f3
MRCVGVGNRDFVEGLSGATWVDVVLEHGGCVTTMAKNKPTLDIELQKTEATQLATLRKLC
IEGKITNITTDSRCPTQGEAVLPEEQDQNYVCKHTYVDRGWGNGCGLFGKGSLVTCAKFQ
CLEPIEGKVVQYENLKYTVIITVHTGDQHQVGNETQGVTAEITPQASTTEAILPEYGTLGGG

And I want the output to look like this:

>f1
------------------------------------------------------------------------------------------------------GTTYGVC
SKAFKFLGTPADTGHGTVVLELQYTGTDGPCKVPISSVASLNDLTPVGRLVTVNPFVSVA

>f2
-----------------------------MVHRQWFFDLPLPWA-----------------------------------------GTTYGMC
TEKFSFAKNPADTGHGTVVIELSYSGSDGPCKIPIVSVASLNDMTPVGRLVTVNPFVATS

>f3
IEGKITNITTDSRCPTQGEAVLPEEQDQNYVCKHTYVDRGWGNGCGLFGKGSLVTCAKFQ
CLEPIEGKVVQYENLKYTVIITVHTGDQHQVGNETQGVTAEITPQASTTEAILPEYGTLGGG

So an equal number of gaps is removed from f1 and f2, and one line is removed from f3 at the same positions that were removed from f1 and f2.

Thanks in advance

sequence alignment • 285 views
ADD COMMENTlink modified 4 weeks ago by shoujun.gu240 • written 5 weeks ago by skjobs12340

I want remove the long unaligned gap from the files with position wise

Why? This sounds like an XY problem: http://xyproblem.info/

ADD REPLYlink written 5 weeks ago by Pierre Lindenbaum99k

How about Gblocks: http://molevol.cmima.csic.es/castresana/Gblocks/Gblocks_documentation.html

ADD REPLYlink written 5 weeks ago by Sej Modha2.0k

I have tried Gblocks and trimAl, but neither is suitable for my data set. Please help me write a Perl or Python script.

ADD REPLYlink written 5 weeks ago by skjobs12340

I want remove the long unaligned gap from the files with position wise

Do you know the positions of the gaps to remove? Are they based on the blast result?

ADD REPLYlink written 5 weeks ago by st.ph.n1.8k

Yes, the gaps are based on the BLAST result, and the positions are known. I used PROMALS3D to produce the aligned file. Please guide me.

ADD REPLYlink written 5 weeks ago by skjobs12340

Please provide examples of the positions at which gaps should be removed.

ADD REPLYlink written 5 weeks ago by st.ph.n1.8k

Example Input file

>f1

------------------------------------------------------------------------------------------------------GTTYGVC SKAFKFLGTPADTGHGTVVLELQYTGTDGPCKVPISSVASLNDLTPVGRLVTVNPFVSVA

>f2

-----------------------------MVHRQWFFDLPLPWA-----------------------------------------GTTYGMC TEKFSFAKNPADTGHGTVVIELSYSGSDGPCKIPIVSVASLNDMTPVGRLVTVNPFVATS

f3 MRCVGVGNRDFVEGLSGATWVDVVLEHGGCVTTMAKNKPTLDIELQKTEATQLATLRKLC IEGKITNITTDSRCPTQGEAVLPEEQDQNYVCKHTYVDRGWGNGCGLFGKGSLVTCAKFQ CLEPIEGKVVQYENLKYTVIITVHTGDQHQVGNETQGVTAEITPQASTTEAILPEYGTLGGG

Output file example

f1 ------------------------------------------------------------------------------------------------------GTTYGVC SKAFKFLGTPADTGHGTVVLELQYTGTDGPCKVPISSVASLNDLTPVGRLVTVNPFVSVA

f2 -----------------------------MVHRQWFFDLPLPWA-----------------------------------------GTTYGMC TEKFSFAKNPADTGHGTVVIELSYSGSDGPCKIPIVSVASLNDMTPVGRLVTVNPFVATS

f3 IEGKITNITTDSRCPTQGEAVLPEEQDQNYVCKHTYVDRGWGNGCGLFGKGSLVTCAKFQ CLEPIEGKVVQYENLKYTVIITVHTGDQHQVGNETQGVTAEITPQASTTEAILPEYGTLGGG

ADD REPLYlink written 5 weeks ago by skjobs12340

Why do you keep the gaps at the beginning of f1 and f2 in the output? How many gaps do you want to keep?

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by shoujun.gu240

It does not look like you want to remove gaps; it looks like you just want to remove the newline characters after the FASTA header lines.

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by genomax34k

This question is unrelated to Perl or Blast so I removed those tags.

ADD REPLYlink written 4 weeks ago by Brian Bushnell14k

Hello skjobs1234!

It appears that your post has been cross-posted to another site: https://bioinformatics.stackexchange.com/questions/2524/r

This is typically not recommended as it runs the risk of annoying people in both communities.

ADD REPLYlink written 4 weeks ago by Pierre Lindenbaum99k
shoujun.gu240 wrote:
  1. Save the following code in a file called biostar.py (or any other name).
  2. In a shell, run: python3 biostar.py input_file output_file
  3. Note that this code may not work on very large files, because it reads the whole file into memory.

code:

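import sys

inp=sys.argv[1]
output=sys.argv[2]

# Pass 1: find the largest number of gap-only lines in any single record.
count=0
tempcount=0
with open(inp, 'r') as file:
    for line in file:
        if line.startswith('>'):
            count=max(count, tempcount)
            tempcount=0
        elif set(line.strip())=={'-'}:
            # strip() also handles a last line without a trailing newline
            tempcount=tempcount+1
count=max(count, tempcount)

# Pass 2: read the whole file into memory and split it into records.
with open(inp, 'r') as file2:
    records=file2.read().split('>')[1:]

# Drop the first `count` sequence lines of every record, whatever they contain.
out_lines=[]
for record in records:
    fasta=record.split('\n')
    out_lines=out_lines+['>'+fasta[0]]+fasta[count+1:]
# Skip empty lines and restore the newline on each remaining line.
out_lines=[line+'\n' for line in out_lines if line]

with open(output, 'w') as out:
    out.writelines(out_lines)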
ADD COMMENTlink modified 4 weeks ago • written 4 weeks ago by shoujun.gu240

I'm getting this error on Terminal

Traceback (most recent call last):
  File "biostar.py", line 3, in <module>
    inp=sys.argv[1]
IndexError: list index out of range

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by skjobs12340

Did you provide the absolute paths of the input and output files on the command line? Or could you copy the command you ran here?

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by shoujun.gu240
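
An `IndexError` on `sys.argv[1]` usually means the script was started without its two file arguments. A minimal sketch of an argument check that could sit at the top of the script; the helper name `parse_args` is hypothetical and not part of the posted answer:

```python
import sys


def parse_args(argv):
    """Return (input_path, output_path), or exit with a usage message."""
    # argv[0] is the script name, so exactly two extra arguments are expected.
    if len(argv) != 3:
        sys.exit("usage: python3 biostar.py input_file output_file")
    return argv[1], argv[2]
```

Calling `inp, output = parse_args(sys.argv)` would then replace the two `sys.argv[...]` assignments, so a missing argument produces a usage line instead of a traceback.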

Yes, it's working now. Thank you for your valuable suggestion and your time.

ADD REPLYlink written 4 weeks ago by skjobs12340

Dear Shoujun, your script runs fine, but the output is not what I want. Please take another look at this problem. Thank you for helping me.

Actually, your script does not remove the gaps properly. I want to remove only the unaligned (gap) regions from the > records. Here is a brief example with the desired output:

tem1 ------------------------------------------------------FHLTTR GGEPHMIVSKQERGKSLLFKTSAGVNMCTLIAMDLGELCEDTMTYKCPRITETEPDDVDC

>tem2

--------------------------TPVECFEPSMLKKKQLTVLDLHPG-G-KTRRVLP

query

MNNQRKKTGKPSINMLKRVRNRVSTGSQLAKRFSKGLLNGQGPMKLVMAFIAFLRFLAIP PTAGVLARWGTFKKSGAIKVLKGFKKEISNMLSIINQRKKTSLCLMMILPAALAFHLTSR

I want output like this

tem1 FHLTTR GGEPHMIVSKQERGKSLLFKTSAGVNMCTLIAMDLGELCEDTMTYKCPRITETEPDDVDC

>tem2

--------------------------TPVECFEPSMLKKKQLTVLDLHPG-G-KTRRVLP

query

MNNQR PTAGVLARWGTFKKSGAIKVLKGFKKEISNMLSIINQRKKTSLCLMMILPAALAFHLTSR

Condition 1: As you can see here, if gaps are found at the same positions in both tem1 and tem2, remove those gaps and also remove the same positions from the query (it does not matter whether the query has a gap there; an equal number of residues at the same positions should be removed).

Condition 2: If a stretch of more than 20 gaps is found at the same positions in all >tem sequences, it should be removed.

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by skjobs12340
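
One possible reading of the two conditions above, as a minimal Python sketch rather than a definitive solution. It assumes each record has already been joined into a single string, that all sequences are aligned to the same coordinates, and that "more than 20" means a run of at least 21 shared gap columns. The names `shared_gap_columns`, `trim`, and `min_run` are hypothetical.

```python
def shared_gap_columns(templates, min_run=21):
    """Return the column indices that lie in a run of at least `min_run`
    consecutive columns where every template sequence has a gap."""
    if not templates:
        return set()
    length = min(len(seq) for seq in templates)
    all_gap = [all(seq[i] == '-' for seq in templates) for i in range(length)]
    drop = set()
    start = None
    # The trailing False is a sentinel that closes a run ending at the last column.
    for i, flag in enumerate(all_gap + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_run:
                drop.update(range(start, i))
            start = None
    return drop


def trim(records, template_names, min_run=21):
    """Remove the shared gap columns from every record, the query included."""
    templates = [seq for name, seq in records if name in template_names]
    drop = shared_gap_columns(templates, min_run)
    return [(name, ''.join(c for i, c in enumerate(seq) if i not in drop))
            for name, seq in records]
```

With the toy records `[('tem1', '-----ABC'), ('tem2', '---DEFGH'), ('query', 'MNOPQRST')]` and `min_run=3`, `trim(records, {'tem1', 'tem2'}, min_run=3)` drops columns 0 to 2 from all three sequences, so the query loses the same positions as the templates.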

The formatting of your example is messed up, so I cannot really understand how you want the gaps removed.

Could you upload it as a figure instead?

ADD REPLYlink written 4 weeks ago by shoujun.gu240

Yes, I can.

Could you give me your email address?

You can also reach me at mine:

redacted (see below)

ADD REPLYlink modified 4 weeks ago by genomax34k • written 4 weeks ago by skjobs12340

You can upload it to the forum, or send it to my email: redacted

ADD REPLYlink modified 4 weeks ago by genomax34k • written 4 weeks ago by shoujun.gu240

Personal email addresses are not shared on Biostars.

ADD REPLYlink written 4 weeks ago by genomax34k

Please use the "101" button in the edit window to apply code formatting to your text. Do not use spaces or the (") icon to format text. I tried to format your example above but can't make exact sense of what you have or need.

ADD REPLYlink written 4 weeks ago by genomax34k

I want to remove all positions containing a gap in a multiple alignment, using Perl, Python, or any other scripting language.

ADD REPLYlink written 4 weeks ago by skjobs12340
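
Removing every alignment column that contains a gap in any sequence can be sketched in a few lines of Python. This is a minimal sketch that assumes the sequences have already been read into equal-length strings; the function name `drop_gapped_columns` is hypothetical.

```python
def drop_gapped_columns(seqs):
    """Keep only the alignment columns in which no sequence has a gap."""
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    # zip(*seqs) walks the alignment column by column.
    keep = [i for i, column in enumerate(zip(*seqs)) if '-' not in column]
    return [''.join(s[i] for i in keep) for s in seqs]
```

For example, `drop_gapped_columns(['A-CD', 'AB-D', 'ABCD'])` keeps only the first and last columns and returns `['AD', 'AD', 'AD']`.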

If you want to remove all gaps from the file (leaving the header lines untouched, in case they contain hyphens):

sed -i '/^>/!s/-//g' input.fasta

Otherwise, if you're not removing all of them, you need to know the positions of the specific gaps. The OP is vague on how you intend to identify the gaps to be removed.

ADD REPLYlink written 4 weeks ago by st.ph.n1.8k