Question: Trimming gapped ends from a multiple alignment
0
gravatar for marta.brozynska
4.1 years ago by
marta.brozynska0 wrote:

Hi! I have a following problem. I have a multiple alignment that looks something like that:

SpeciesA --GTACCTAGGTACCT
SpeciesB AGGT---TAGGTACCT
SpeciesC AGGTACCTAGGT----

 

What I want to get is the following:

SpeciesA GTACCTAGGT
SpeciesB GT---TAGGT
SpeciesC GTACCTAGGT

i.e. gaps at the ENDS of the alignment trimmed off, the internal gaps preserved and all the sequences of the same length.

 

I’ve been looking everywhere to find a tool to help me but no luck so far. Most of the programmes like Gblocks, trimAl, t-coffee and similar can remove gapped columns but throughout the entire alignment. Due to that I lose the internal indels and the resulting alignment is like that:

SpeciesA GTTAGGT
SpeciesB GTTAGGT
SpeciesC GTTAGGT

Which I’m not interested in.

I know I can do it manually but I have about 5,000 alignment files… If I spent a minute on editing each of them I’d probably need 3,5 days of manual editing non-stop…  That's way I'd appreciate any help VERY VERY much!

Cheers,

 

alignment • 2.1k views
ADD COMMENTlink modified 3.0 years ago by Suzanne60 • written 4.1 years ago by marta.brozynska0

You can edit the alignment and delete columns manually in Jalview and ClustalX.

ADD REPLYlink written 4.1 years ago by Pappu1.9k

Pretty sure Stack Overflow could give you a great awk one-liner in less than one minute. If they do, please post it back here as well :)

ADD REPLYlink modified 4.1 years ago • written 4.1 years ago by 5heikki8.6k
1
gravatar for 5heikki
4.1 years ago by
5heikki8.6k
Finland
5heikki8.6k wrote:

I wrote a very ugly Bash script that should do what you want (not tested at all, assumes there are no spaces in names and names are separated from the alignments by a single space). Name it e.g. trimEnds.sh, then chmod +x trimEnds.sh and use it by ./trimEnds.sh alignmentFile

#!/bin/bash
#separate the alignment from the names
cut -f2 -d " " $1 > $1.ali
cut -f1 -d " " $1 > $1.names

#Set start at 1
START=1

#Set end at alignment length
END=$(head -n1 $1.ali | awk '{print length}')

#Find out where the good stuff starts
for (( i=$START; i<=$END; i++ ))
    do
    nonLetterCount=$(cut -c$i $1.ali | tr -d [A,T,G,C] | grep . | grep -c .)
    if [ "$nonLetterCount" -gt "0" ]
        then
        (( START = $i + 1 ))
    else
        break
    fi
    done

#Find out where the good stuff ends
for (( i=$END; i>=$START; i-- ))
        do
    nonLetterCount=$(cut -c$i $1.ali | tr -d [A,T,G,C] | grep . | grep -c .)
        if [ "$nonLetterCount" -gt "0" ]
                then
                (( END = $i - 1 ))
        else
                break
        fi
    done

#Reunite names with seqs
paste -d " " $1.names <(cut -c$START-$END $1.ali) > $1.mod

#Remove unnecessary files
rm $1.names $1.ali
ADD COMMENTlink modified 4.1 years ago • written 4.1 years ago by 5heikki8.6k

THANKS A LOT! The script works for me and does exactly what I wanted. I run it on a few different cases of alignments and worked like a dream on each and every one of them. I just had to play a little bit with the formats but wasn't difficult. You save me a lot of time and nerves!

Thanks again for your time and effort!

ADD REPLYlink written 4.1 years ago by marta.brozynska0

Hi

Now I have the same problem. I tried using the shell script but I am getting the error. "cut: invalid decreasing range" What would be the reason?

ADD REPLYlink written 3.2 years ago by ThulasiS60

My guess is that your data is not in the same format as OP's.

ADD REPLYlink written 3.2 years ago by 5heikki8.6k

Oh.. Maybe.. But I edited to be one line fasta like in example. Still the same result.. Thank you

ADD REPLYlink written 3.2 years ago by ThulasiS60

Hi I am trying this script but I got this error cut: invalid byte, character or field list. any suggestion what is the problem?

ADD REPLYlink written 3.0 years ago by a.moner0

I am trying it but I have this error cut: invalid byte, character or field list do you have any suggestion

ADD REPLYlink written 3.0 years ago by a.moner0
0
gravatar for Pappu
4.1 years ago by
Pappu1.9k
Pappu1.9k wrote:

You can edit the alignment and delete columns manually in Jalview and ClustalX.

ADD COMMENTlink written 4.1 years ago by Pappu1.9k

Only if I didn't have more than 5,000 files to edit... I'm looking for a way to avoid doing this manually.

ADD REPLYlink modified 4.1 years ago • written 4.1 years ago by marta.brozynska0
0
gravatar for Suzanne
3.0 years ago by
Suzanne60
Dundee, Scotland
Suzanne60 wrote:

Here is link to a Jalview video about editing sequences: https://youtu.be/ZI4NW6kQeHc. It is part of 'Selecting and Editing Sequences' in Jalview YouTube playlist https://www.youtube.com/playlist?list=PLpU3VZmUmrT27N2Uy32qszCOoznvN-07N.

ADD COMMENTlink written 3.0 years ago by Suzanne60
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1781 users visited in the last hour