Question: Filtering 50 biggest contigs
6 weeks ago, stan.aanhane wrote:

Hi everyone,

After performing a de novo assembly with the following command, I want to extract the 50 largest contigs.

spades.py --untrusted-contigs lclav_genome.fa -1 randomnietnfectedFP.fastq.gz -2 randomnietinfectedRP.fastq.gz -t 2 -m 28 NINnovo --phred-offset 33

This creates a directory containing the contigs file. The file is sorted from largest to smallest, and we want just the top 50 contigs. I have tried something with awk, but it is not working the way I want it to. Can someone help me out?

Thank you!


You could convert the multi-line FASTA to single-line FASTA (see the thread "Multiline Fasta To Single Line Fasta") and then take the first 100 lines with head. Each record then occupies exactly two lines (header and sequence), so 100 lines gives you the top 50 contigs.
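A minimal sketch of a closely related approach, for anyone who would rather skip the conversion step: instead of linearising first, awk can count header lines and stop when record 51 begins. The function name top_contigs is made up for illustration, and the sketch assumes the file is already sorted from largest to smallest, as described in the question.

```shell
# top_contigs N FILE: print the first N records of a FASTA file.
# Hypothetical helper; assumes FILE is already sorted by contig length.
top_contigs() {
    awk -v n="$1" '
        /^>/ { count++ }      # a header line starts a new record
        count > n { exit }    # stop as soon as record n+1 begins
        { print }             # otherwise pass the line through unchanged
    ' "$2"
}
```

Usage would look something like `top_contigs 50 NINnovo/contigs.fasta > top50_contigs.fasta`, assuming NINnovo is the SPAdes output directory. Because lines are passed through unchanged, the output keeps the original line wrapping and stays in FASTA format.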

written 6 weeks ago by Sej Modha

Don't forget to change them back to fasta format.

written 6 weeks ago by genomax

See past threads for inspiration:
How To Filter Multi Fasta By Length??
how to rearrange fasta file according to its length (add a filter for 50)
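If the file could not be trusted to be sorted already, the general idea behind those threads might be sketched as follows. This is not the code from either thread, just one possible way to do it with awk, sort and head; the function name top_contigs_by_length is hypothetical, and the sketch assumes header lines contain no tab characters.

```shell
# top_contigs_by_length N FILE: print the N longest records of a FASTA file.
# Hypothetical helper; assumes headers contain no tab characters.
top_contigs_by_length() {
    awk '
        # On each header, emit the previous record as: length, header, sequence
        /^>/ { if (name != "") print length(seq) "\t" name "\t" seq
               name = $0; seq = ""; next }
        { seq = seq $0 }      # join wrapped sequence lines
        END { if (name != "") print length(seq) "\t" name "\t" seq }
    ' "$2" |
    sort -t "$(printf '\t')" -k1,1nr |      # longest first
    head -n "$1" |                          # keep the top N records
    awk -F '\t' '{ print $2; print $3 }'    # back to FASTA: header, sequence
}
```

Called as `top_contigs_by_length 50 contigs.fasta > top50_contigs.fasta`, this writes each sequence on a single line, which is still valid FASTA. The order of contigs with identical lengths is not guaranteed.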

written 6 weeks ago by genomax