LAST alignment: different strands for single CDS query?
10.2 years ago
ksamuk • 0

I am aligning CDS sequences that I downloaded from Ensembl to some published draft assemblies. I used LAST to align them, and it appears to have worked; see the example here:

# name start alnSize strand seqSize alignment
# batch 1
a score=54 EG2=6.6e-09 E=8.8e-16
s L_fuelleborni.EQ177488.1 627 100 + 1093 GTGCTCCACTCTGCTCATTGTAGGCGATAATTCTCCAGCTGTGGAAGCTGTGGTGGACTGCAACTCTAA
s O_niloticus.ENSONIG00000002093_ENSONIT00000002611 704 100 + 1074 GTGCCCTTCTCTGCTGGTGGTTGGCGATAGCTCTCCTGCCGTGGAGGCCGTGGTTGAGTGCAACACTAA
a score=42 EG2=0.0034 E=4.7e-10
s L_fuelleborni.EQ177488.1 278 148 + 1093 AGGATGAAATCCTGACCAATCACGACCTCATCGCCACATACCGCCACCGCatcacaacaacaatgaACC
s O_niloticus.ENSONIG00000002093_ENSONIT00000002611 541 148 + 1074 AGGAGGAAATCCACCACAACCATGATCTAATCGCCACATACCGCCACCACATCATGAATGACATGAACC
a score=41 EG2=0.01 E=1.4e-09
s L_fuelleborni.ABPK01036261.1 317 93 + 1071 ACTGACCTTTGAAGCAGCCCAGTCAATCCAGCCTTCAGCACACGGGTCGACATTAATGAGTACCAGCCCC
s O_niloticus.ENSONIG00000002093_ENSONIT00000002611 582 93 - 1074 ACTGATCTTGTGTGCAGCCCAGTCCATCCATCCCTCAGCACACGAGTTGATGTTGATGAGAACAAGGCCT

What I can't figure out is that, for most of the query CDSs (the "O_niloticus" lines in the example), some parts of the transcript align to the + strand of the assembly and others to the - strand. This seems to happen only when a single CDS aligns to multiple contigs (e.g. L_fuelleborni.EQ177488.1 and L_fuelleborni.ABPK01036261.1 are different contigs, I think); all of the CDS alignments within a single contig seem to be on the same strand.
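To check the "same strand within a contig" observation across the whole output rather than by eye, something like the following sketch could tally the query strand per contig. It assumes the file is MAF-like as in the example above (an "a" line followed by two "s" lines, contig first, query second); the function name `strands_by_contig` and the `query_prefix` argument are made up for illustration, and the sequences in the toy input are shortened placeholders.

```python
from collections import defaultdict

def strands_by_contig(maf_text, query_prefix="O_niloticus."):
    """Map each contig name to the set of query strands aligned to it.

    Assumes each alignment block lists the contig "s" line first and the
    query "s" line second, as in the example output above.
    """
    strands = defaultdict(set)
    contig = None
    for line in maf_text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "#":
            continue
        if fields[0] == "a":
            contig = None  # a new alignment block starts
            continue
        if fields[0] != "s":
            continue
        name, strand = fields[1], fields[4]
        if name.startswith(query_prefix):
            if contig is not None:
                strands[contig].add(strand)
        else:
            contig = name
    return dict(strands)

# Toy input: names and coordinates from the example, sequences shortened.
example = """\
a score=54
s L_fuelleborni.EQ177488.1 627 4 + 1093 GTGC
s O_niloticus.ENSONIG00000002093_ENSONIT00000002611 704 4 + 1074 GTGC
a score=41
s L_fuelleborni.ABPK01036261.1 317 4 + 1071 ACTG
s O_niloticus.ENSONIG00000002093_ENSONIT00000002611 582 4 - 1074 ACTG
"""
result = strands_by_contig(example)
```

A contig whose set contains both "+" and "-" would be an exception to the pattern described above.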

So, my question is: why would this be the case? These are fairly closely related species, so I don't think chimerism or rearrangement is plausible (again, it's nearly every CDS). Is it possible that the orientation of the contigs in the assembly is not yet determined? If that's the case, can I just reverse complement the - strand alignments and tack them (with spaces) together with the + strand bits? The end application is a FASTA file for each query CDS containing all the matching sequences from a number of other species (for PAML).
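For concreteness, here is a minimal sketch of the reverse-complement step I have in mind. It assumes the output follows the usual MAF convention, where a "-" on an "s" line means the row shows the reverse complement of that sequence and the start is counted from the start of the reverse-complemented sequence. The helper names (`reverse_complement`, `to_query_forward`, `forward_start`) are my own, not part of LAST.

```python
# Complement table for plain DNA letters; gap characters ("-") and any
# other symbols are left unchanged by str.translate.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(aligned):
    """Reverse complement a (possibly gapped) alignment row."""
    return aligned.translate(_COMPLEMENT)[::-1]

def to_query_forward(contig_row, query_row, query_strand):
    """Return both rows oriented so the query CDS reads on its + strand.

    Both rows are flipped together so the columns stay aligned.
    """
    if query_strand == "-":
        return reverse_complement(contig_row), reverse_complement(query_row)
    return contig_row, query_row

def forward_start(start, aln_size, seq_size, strand):
    """0-based start on the forward strand for a MAF-style 's' line."""
    if strand == "-":
        return seq_size - start - aln_size
    return start

# Coordinates of the third alignment's query line in the example above.
cds_start = forward_start(582, 93, 1074, "-")
flipped = to_query_forward("AAC-G", "AACTG", "-")
```

With the forward-strand start in hand, each flipped contig piece could be placed at its position along the CDS before the pieces are joined.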

Any insight would be greatly appreciated!

blast alignment • 1.8k views