fasta filtering & sorting for SAGE [String-overlap Assembly of GEnomes]
8.6 years ago
adhb • 0

Hi all,

I have a single FASTA file of paired-end reads intended for mitochondrial de novo assembly with SAGE [String-overlap Assembly of GEnomes, not Serial Analysis of Gene Expression]. I've already run it through the error-correction software RACER, but there are some lingering format issues I need to clear up before SAGE will run.

Unix/perl solutions preferred.

(1) Remove all reads that aren't 90 bases long (discard or write into new file)

(2) Remove unpaired reads - i.e., remove those reads for which the ID does not exactly match any other ID in the file (discard or write into new file)

(3) Reorder reads alphabetically by ID so the forward and reverse reads are interleaved

Sorry to post a multi-part problem, but I think it's a set of simple tasks that I can't find leads for in other posts. Help with one or more of the tasks would be greatly appreciated.

next-gen Assembly genome • 1.5k views
8.6 years ago
h.mon 35k

Use programs that are aware of and respect paired reads, so you do not have to worry about (2) and (3). My current Swiss Army knife is BBTools - reformat.sh should do all you want.
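
If you would rather script the three steps yourself, here is a minimal sketch in Python (not the Unix/perl you asked for, and not what reformat.sh does internally). It makes several assumptions that are not confirmed by the post: the read ID is the whole header line after ">", both mates carry exactly the same ID, the file fits in memory, and an ID that survives the length filter with anything other than two reads is treated as unpaired. The function names are made up for illustration.

```python
from collections import defaultdict


def parse_fasta(lines):
    """Yield (read_id, sequence) per record; sequences may span several lines."""
    read_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if read_id is not None:
                yield read_id, "".join(chunks)
            # Assumption: the ID is the entire header line after ">".
            read_id, chunks = line[1:], []
        elif read_id is not None:
            chunks.append(line)
    if read_id is not None:
        yield read_id, "".join(chunks)


def filter_and_interleave(records, read_length=90):
    """Return (kept, rejected) lists of (read_id, sequence) tuples."""
    by_id = defaultdict(list)
    rejected = []
    for read_id, seq in records:
        if len(seq) == read_length:
            by_id[read_id].append(seq)
        else:
            rejected.append((read_id, seq))  # task (1): wrong length
    kept = []
    for read_id in sorted(by_id):  # task (3): alphabetical by ID
        seqs = by_id[read_id]
        # Task (2): keep only IDs with exactly two reads left.
        # Mates stay in their original file order within each pair.
        target = kept if len(seqs) == 2 else rejected
        target.extend((read_id, seq) for seq in seqs)
    return kept, rejected


def write_fasta(records, handle):
    """Write records as single-line FASTA to an open text handle."""
    for read_id, seq in records:
        handle.write(f">{read_id}\n{seq}\n")
```

To use it, pass an open file to `parse_fasta`, feed the result to `filter_and_interleave`, and write the two returned lists to separate files with `write_fasta`. Note that a read whose mate fails the length filter ends up in the rejected list as well, since it is no longer paired.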
