Kmer counts per contig
3.8 years ago
GLR ▴ 20

Hello,

I need to extract kmers and their counts per contig in an assembly file and I was wondering what would be the most efficient way to do this?

For previous whole-genome kmer counts I've used BBTools kmercountexact.sh, and I have considered ways to feed each scaffold into that program, but I have two issues with that approach. The first is the sheer number of output files that would result, although I guess I could just cat them all at the end. The second is that I am very unfamiliar with awk/bioawk: I know bioawk lets you extract sequences very easily, but I don't know how to set up a loop with awk/bioawk that does this and then pipes each contig into another program.

Would anyone be kind enough to help me with this or direct me to a more appropriate solution?

Thank you!

Assembly • 867 views

You mean split the multifasta file into individual contigs? See here: Splitting A Fasta File


Hi Asaf,

Not really. I want to pipe each contig into kmer counting software. I could split them into multiple files and feed them in individually, which would certainly achieve what I need, but I imagine that has a high I/O cost and isn't very efficient.
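
For illustration, here is a minimal Python sketch of one way to avoid splitting the assembly at all: read the multi-FASTA once and write every contig's kmer counts to a single tab-separated stream. It is not a replacement for kmercountexact.sh, and it makes several assumptions: the function names (`read_fasta`, `count_kmers`, `kmers_per_contig`) are made up for this example, kmers are counted as they appear on the given strand (reverse complements are not merged), kmers containing N are skipped, and each contig's counts are held in memory while that contig is processed.

```python
from collections import Counter


def read_fasta(handle):
    """Yield (contig_name, sequence) for each record in a FASTA stream."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the name.
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def count_kmers(seq, k):
    """Count forward-strand kmers of length k, skipping any that contain N."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if "N" not in kmer:
            counts[kmer] += 1
    return counts


def kmers_per_contig(handle, k, out):
    """Write one 'contig<TAB>kmer<TAB>count' line per distinct kmer per contig."""
    for name, seq in read_fasta(handle):
        for kmer, n in sorted(count_kmers(seq, k).items()):
            out.write(f"{name}\t{kmer}\t{n}\n")
```

Calling `kmers_per_contig(sys.stdin, 21, sys.stdout)` (after `import sys`) would let the assembly be piped in and the table redirected to one file, so there is a single output rather than one file per contig. Pure Python will be much slower than a dedicated counter on large assemblies, so treat this as a sketch of the single-pass idea rather than a tuned solution.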
