Question

Give a list of contigs as input to CheckM

2.1 years ago

alexander.brg • 0

Hello! I ran MetaBAT to separate my metagenomic contigs into bins and obtained a set of files, each listing the names of all contigs that belong to one bin, like this:

Example bin_1:

k105_10322
k105_20691
k105_133304
k105_31104
...

Now I would like to assess the quality of my binning using CheckM. In all the examples I have seen, CheckM expects FASTA files as input.

Is it possible to give CheckM a list of the contigs in a bin together with a FASTA file containing the sequences from all bins? The file with all the sequences looks like this:

>k105_92090 flag=1 multi=2.0000 len=532
TAACTT...
>k105_102322 flag=1 multi=2.0000 len=528
GGAAGA...
>k105_92091 flag=1 multi=2.0000 len=409
AAAAAA...
>k105_92092 flag=1 multi=2.0000 len=332
TGAATC...
>k105_102323 flag=1 multi=1.0000 len=455
GAATAC...
...

The other option I see is to use grep/awk and extract the contigs from the file that contains all of them, but that would be a bit of a hassle...

Thank you for your help!

binning metagenomics metabat • 825 views

ADD COMMENT • link updated 2.1 years ago by andres.firrincieli 3.6k • written 2.1 years ago by alexander.brg • 0

Just to be clear: you have already run MetaBAT, but you don't have a FASTA file for each bin (e.g. bin1.fa, bin2.fa, bin3.fa)?

ADD REPLY • link 2.1 years ago by andres.firrincieli 3.6k

Yes! I haven't figured out how to submit a list of contigs to CheckM, but the grep approach wasn't as hard as I anticipated. Here is some example code that should work for others facing the same issue:

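for bin in <bin_contig_list_files>
do
    # -w prevents partial matches (e.g. k105_1032 matching k105_10322);
    # -A1 assumes each sequence sits on a single line after its header
    while read -r line ; do grep -w -A1 -- "$line" <contigs.fa> >> "bins/$bin.fa" ; done < "$bin"
done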

I don't know if it's the most efficient way of doing it, but creating the FASTA bin files for about 120 Mbp of data and 160,000 contigs took just a few seconds on 4 cores.
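
For larger assemblies, the loop above rescans the whole contig file once per contig name. A possible single-pass alternative is sketched below with awk. It is only an illustration, not something from MetaBAT or CheckM: `extract_bin` is a hypothetical helper name, the file names in the usage comment are placeholders, and the sketch assumes that a contig's ID is the header text between `>` and the first whitespace. Unlike `grep -A1`, it also keeps sequences that are wrapped over several lines.

```shell
#!/usr/bin/env bash
# extract_bin LIST FASTA
# Hypothetical helper: print every FASTA record whose ID appears in LIST.
# Assumption: the ID is the header text between ">" and the first whitespace.
extract_bin() {
    awk '
        # First file (the list): remember each non-empty contig name.
        NR == FNR { if ($1 != "") wanted[$1] = 1; next }
        # Header line in the FASTA: decide once whether to keep this record.
        /^>/ { id = substr($1, 2); keep = (id in wanted) }
        # Print the header and every following sequence line of a kept record.
        keep
    ' "$1" "$2"
}

# Example usage (file names are placeholders):
# mkdir -p bins
# for list in bin_*.txt; do
#     extract_bin "$list" contigs.fa > "bins/$(basename "$list" .txt).fa"
# done
```

Because the names are looked up in an awk array, each bin needs only one read of the contig file, and exact ID comparison avoids the partial-match problem of a plain grep.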

ADD REPLY • link 2.1 years ago by alexander.brg • 0

The most efficient way to do this is to rerun MetaBAT without the -l option. If you omit -l, you should get a FASTA file for each bin.

ADD REPLY • link 2.1 years ago by andres.firrincieli 3.6k