Parsing a FASTA file
5.1 years ago
a.rex ▴ 290

I have a FASTA file that is formatted in the following way:

> gene1 
atctgtctgct 
atcgtc
at

and I want to put it in a format like the following:

> gene1            atctgtctgctatcgtcat

I am struggling to get rid of the whitespace and line breaks between the sequence lines under the header. Does anyone have any idea how I can do this in Python?

fasta • 1.3k views

Why? Why not leave it in FASTA (a format most tools already accept)?


As Devon is getting at, whatever method you're planning to use to 'read' this FASTA file is probably not a good one. Is it, by any chance, awk/sed/tr/grep?

If you tell us what bigger problem you're trying to solve, we might be able to help you with that. But creating a new format, unless there is a really good reason for it, is generally a bad idea for everyone.

5.1 years ago

Do you want to convert FASTA format to tab-delimited format?

Writing a Python script with Biopython, or a Perl script with BioPerl, is a convenient way to do this. It can also be done with one or more shell commands.
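Since the question asked for Python, here is a minimal sketch using only the standard library (no Biopython). The function name fasta_to_tab and the in-memory example input are made up for illustration; it assumes header lines start with ">" and that any whitespace inside sequence lines can simply be dropped.

```python
import io


def fasta_to_tab(handle):
    """Yield (name, sequence) pairs from an open FASTA file handle.

    Hypothetical helper for illustration: joins all sequence lines of a
    record and removes any whitespace inside them.
    """
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header: emit the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif name is not None:
            # Drop spaces/tabs inside the sequence line before storing it.
            chunks.append("".join(line.split()))
    if name is not None:
        yield name, "".join(chunks)


# Example input mirroring the sample in the question.
example = io.StringIO("> gene1 \natctgtctgct \natcgtc\nat\n")
for name, seq in fasta_to_tab(example):
    print(f"{name}\t{seq}")
```

For a real file you would pass open("seqs.fa") instead of the StringIO object and write the tab-separated lines to an output file.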

I'd like to introduce the FASTA/Q toolkit SeqKit, which can do this with one command:

seqkit fx2tab seqs.fa > formated.txt

Because the sequence lines in your sample data contain spaces, I added a cleaning step to remove them. Also, seqkit fx2tab outputs three columns for compatibility with both FASTA and FASTQ, so cut is used to drop the empty third column:

$ seqkit seq --remove-gaps seqs.fa | seqkit fx2tab | cut -f 1,2
gene1   atctgtctgctatcgtcat

After manipulating the tabular data, you can use seqkit tab2fx to convert it back to FASTA format.
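If SeqKit is not available, the reverse conversion can also be sketched in plain Python. This is only an illustration, not what seqkit tab2fx does internally: the function name tab_to_fasta and the 60-character default line width are assumptions, and the input is expected to hold the name in the first tab-separated column and the sequence in the second.

```python
import io


def tab_to_fasta(handle, width=60):
    """Yield FASTA lines from tab-delimited 'name<TAB>sequence' lines.

    Hypothetical helper for illustration; 'width' is the maximum
    sequence line length in the output.
    """
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        name = fields[0]
        seq = fields[1] if len(fields) > 1 else ""
        yield f">{name}"
        # Wrap the sequence into fixed-width chunks.
        for i in range(0, len(seq), width):
            yield seq[i:i + width]


# Example input: the tabular output shown above.
table = io.StringIO("gene1\tatctgtctgctatcgtcat\n")
for out_line in tab_to_fasta(table):
    print(out_line)
```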

5.1 years ago
$ pip install pyfaidx
$ faidx --transform transposed input.fa | cut -f1,4 > out.tab

This works if you really want to do this, but I agree it would be better to address the real question you may not have asked yet.

5.1 years ago
rkostadi ▴ 60
fold -w 60 input.fa