Give a list of contigs as input to CheckM
2.1 years ago

Hello! I have run MetaBAT to separate my metagenomic contigs into bins and obtained a number of files, each containing the names of all contigs that belong to a particular bin, like this:

Example bin_1:

k105_10322
k105_20691
k105_133304
k105_31104
...

Now I would like to assess the quality of my binning using CheckM. In all the examples I have seen, CheckM expects FASTA files as input.

Is it possible to give CheckM a list of the contigs in a bin together with a single FASTA file that holds the sequences from all bins? The file with all the sequences looks like this:

>k105_92090 flag=1 multi=2.0000 len=532
TAACTT...
>k105_102322 flag=1 multi=2.0000 len=528
GGAAGA...
>k105_92091 flag=1 multi=2.0000 len=409
AAAAAA...
>k105_92092 flag=1 multi=2.0000 len=332
TGAATC...
>k105_102323 flag=1 multi=1.0000 len=455
GAATAC...
...

The other option I see is to use grep/awk to extract each bin's contigs from the file that contains all of them, but that would be a bit of a hassle...

Thank you for your help!

binning metagenomics metabat • 826 views

Just to be clear: you have already run MetaBAT, but you don't have a FASTA file for each bin, e.g. bin1.fa, bin2.fa, bin3.fa?


Yes! I haven't figured out how to submit a list of contigs to CheckM, but the grep approach wasn't as hard as I anticipated. Here is some example code that should work for others facing the same issue:

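mkdir -p bins
for bin in <bin_contig_list_files>
do
    # -w matches whole contig names only, so k105_1032 does not also pull in k105_10322
    # -A1 assumes each sequence sits on a single line directly after its header
    while read -r line ; do grep -w -A1 -- "$line" <contigs.fa> >> "bins/$bin.fa" ; done < "$bin"
done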

I don't know if it's the most efficient way of doing it, but creating the FASTA bin files for ca. 120 Mbp of data and 160,000 contigs took just a few seconds on 4 cores.
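For anyone whose assembly has sequences wrapped over several lines, or who wants to avoid running grep once per contig, the same extraction can be sketched as a single awk pass per bin. This is only a sketch under a few assumptions: extract_bin is a made-up helper name (not part of MetaBAT or CheckM), each list file holds one contig name per line, and the contig ID is the part of the FASTA header before the first whitespace.

```shell
# Sketch: pull every contig named in a list file out of a multi-FASTA in one pass.
# extract_bin is a hypothetical helper name, not part of MetaBAT or CheckM.
# usage: extract_bin <contig_list> <contigs.fa> > bin.fa
extract_bin() {
    awk '
        # First file: remember each contig name (first field of the line).
        NR == FNR { if ($1 != "") wanted[$1] = 1; next }
        # Header line: strip ">" and compare the ID up to the first whitespace.
        /^>/ { id = substr($1, 2); keep = (id in wanted) }
        # Print headers and sequence lines of wanted contigs, wrapped or not.
        keep
    ' "$1" "$2"
}

# Example loop (directory and file names below are placeholders):
# mkdir -p bins
# for list in bin_lists/*.txt; do
#     extract_bin "$list" contigs.fa > "bins/$(basename "$list" .txt).fa"
# done
```

Contigs come out in the order they appear in the FASTA file, not in list order, and an empty list file is not handled. The resulting directory of .fa files should then be usable as the bin folder for CheckM, for example with its -x fa extension option.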


The most efficient way to do this is to rerun MetaBAT without the -l option. If you omit -l, you should get a FASTA file for each bin.
