Msa Preserve Strandedness
13.1 years ago
Lee Katz ★ 3.1k

I am creating a set of alignment output files from an XMFA (one per alignment), but I want to retain strandedness. Is there a format that will retain the strand? I could hack it so that it is on the FASTA defline but I want the strandedness to be preserved when running another BioPerl script.

For example, when I have FASTA deflines that carry the strand:

>sequence1 strand|+
AAAAAAAA
>sequence2 strand|-
AAAAAAAA

BioPerl will not retain the strand information. The output file will not tell me that sequence2 has been reverse-complemented. Is there a format that would retain this information?

Edit: Additionally, is there a format that would retain strandedness through this conversion and then through a refinement by ClustalW?

multiple format bioperl fasta • 2.2k views
12.2 years ago
Hamish ★ 3.2k

Short of encoding the strand as part of the sequence identifier, something that used to be done for gene predictions, where the strand would be encoded using a 'c' (Crick strand) or 'w' (Watson strand) suffix on the identifier, I am not aware of any formats that will preserve strand information through the type of processing you describe.
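As a rough illustration of the identifier-suffix idea, here is a minimal Python sketch. The underscore separator, the function names, and the mapping of 'w' to '+' and 'c' to '-' are assumptions made for this example, not a standard that any tool enforces.

```python
# Hypothetical convention: append "_w" (Watson, taken here as "+")
# or "_c" (Crick, taken here as "-") to the sequence identifier.
SUFFIX = {"+": "w", "-": "c"}
STRAND = {"w": "+", "c": "-"}

def encode_strand(seq_id, strand):
    """Return the identifier with a strand suffix appended."""
    return f"{seq_id}_{SUFFIX[strand]}"

def decode_strand(tagged_id):
    """Return (identifier, strand); strand is None if no suffix is found."""
    base, sep, suffix = tagged_id.rpartition("_")
    if sep and suffix in STRAND:
        return base, STRAND[suffix]
    return tagged_id, None

print(encode_strand("sequence2", "-"))
print(decode_strand("sequence2_c"))
```

One caveat with this approach: an identifier that already happens to end in "_c" or "_w" would be misread, and any tool that truncates or rewrites identifiers may drop the suffix.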

Probably the simplest method would be to write a file containing a mapping table of the sequence identifiers to their strand (and any other annotations you would like to preserve). Then, after your processing has completed, use this file of metadata to add the required information back to your preferred output format. This has the advantage of allowing you to cope with any other mangling of the data that occurs during processing (for example, identifier truncation).
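The mapping-table approach could look roughly like the following Python sketch. It uses only the standard library rather than BioPerl, and it assumes the "strand|+" defline tag from the question; the function names and the tab-separated table layout are hypothetical choices for this example.

```python
import csv
import io

def parse_fasta(text):
    """Yield (defline, sequence) pairs from FASTA text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def split_strand(defline):
    """Split 'sequence2 strand|-' into ('sequence2', '-'); '.' if untagged."""
    seq_id, _, rest = defline.partition(" ")
    strand = "."
    for token in rest.split():
        if token.startswith("strand|"):
            strand = token[len("strand|"):]
    return seq_id, strand

def write_strand_table(fasta_text, handle):
    """Write an id-to-strand table (tab-separated) before processing."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "strand"])
    for defline, _ in parse_fasta(fasta_text):
        writer.writerow(split_strand(defline))

def read_strand_table(handle):
    """Load the table back into a dict of id -> strand."""
    return {row["id"]: row["strand"] for row in csv.DictReader(handle, delimiter="\t")}

def restore_strand(fasta_text, strands):
    """Re-attach the strand tag to each defline after processing."""
    records = []
    for defline, seq in parse_fasta(fasta_text):
        seq_id = defline.split()[0]
        records.append(f">{seq_id} strand|{strands.get(seq_id, '.')}\n{seq}")
    return "\n".join(records) + "\n"

# Before processing: save the strand of every sequence.
original = ">sequence1 strand|+\nAAAAAAAA\n>sequence2 strand|-\nAAAAAAAA\n"
table = io.StringIO()  # stands in for a real file on disk
write_strand_table(original, table)
table.seek(0)
strands = read_strand_table(table)

# After processing: the tool has dropped everything but the identifier.
processed = ">sequence1\nAAAAAAAA\n>sequence2\nAAAAAAAA\n"
restored = restore_strand(processed, strands)
print(restored)
```

This sketch matches identifiers exactly. If a tool truncates identifiers, the lookup in restore_strand would need to be loosened, for example by matching on a prefix, which is not handled here.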
