How To Find The Occurrences Of A Set Of Given K-Mers In A List Of DNA Sequences
11.3 years ago
wjlgatech • 0

I want to scan a list of DNA sequences against a list of given k-mer sets. Each element of the list is a set of similar k-mers of equal length, for example:

myKmer1=c("TATGGGTTT", "TAAGGGTTT", ...,"CAAGGGTTT")
...
myKmer10=c("GGATTCCAG","CCATTCTTT",..., "CGATTCCTT")

What software or R scripts are available to find the occurrences of each k-mer set in each sequence? The output should be tables that look like this:

K-mer occurrence table 1: counts of each k-mer set in each sequence

           myKmer1   myKmer2   ...   myKmer10
seq1       2         0               3
seq2       1         3               0
...
seq1000    0         1               0

K-mer occurrence table 2: locations of each k-mer set in each sequence

           myKmer1    myKmer2         ...   myKmer10
seq1       111, 888   0                     123, 456, 3333
seq2       123        111, 223, 333         0
...
seq1000    0          1234                  0
sequence • 4.9k views
11.3 years ago

I'd do that with DSK.

It counts the k-mers in your reads (converted to FASTA) and writes the counts to a binary file. The DSK archive includes a Python script called parse_results.py, which prints the count for each k-mer. I think it shouldn't be too hard to modify that script to take the individual reads into account as well.
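If the k-mer sets are small, a direct scan may be enough to produce the two tables from the question without going through a k-mer counter. Below is a minimal sketch in plain Python, not DSK's parse_results.py. The function names `find_positions` and `scan` and the toy sequences are hypothetical, made up for illustration. It assumes the sequences are already loaded into a dict, reports 1-based start positions, counts overlapping matches, and only searches the forward strand.

```python
import re


def find_positions(seq, kmer):
    """Return 1-based start positions of every match of kmer in seq."""
    # A zero-width lookahead lets finditer report overlapping matches too.
    pattern = re.compile("(?=" + re.escape(kmer) + ")")
    return [m.start() + 1 for m in pattern.finditer(seq)]


def scan(sequences, kmer_sets):
    """Build the count table and the position table.

    sequences: dict mapping sequence name -> DNA string
    kmer_sets: dict mapping set name -> list of equal-length k-mers
    Returns (counts, positions), both keyed as [seq_name][set_name].
    """
    counts, positions = {}, {}
    for seq_name, seq in sequences.items():
        seq = seq.upper()
        counts[seq_name], positions[seq_name] = {}, {}
        for set_name, kmers in kmer_sets.items():
            # Pool the hits of all k-mers that belong to the same set.
            hits = set()
            for kmer in kmers:
                hits.update(find_positions(seq, kmer.upper()))
            positions[seq_name][set_name] = sorted(hits)
            counts[seq_name][set_name] = len(hits)
    return counts, positions


# Toy input (hypothetical data, forward strand only).
sequences = {
    "seq1": "AATATGGGTTTCCCAAGGGTTTGG",
    "seq2": "GGATTCCAGTT",
}
kmer_sets = {
    "myKmer1": ["TATGGGTTT", "CAAGGGTTT"],
    "myKmer10": ["GGATTCCAG"],
}

counts, positions = scan(sequences, kmer_sets)
for name in sequences:
    print(name, counts[name], positions[name])
```

For a thousand sequences and ten small sets this brute-force scan should be fast enough. If reverse-complement hits matter, the reverse complements of the k-mers would have to be added to each set before scanning.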

11.3 years ago
Josh Herr 5.8k

To second Philipp, DSK is a good option. I would also try khmer and kmer-genie. I guess the choice of tool depends on the source of your sequences and on the next step of your analysis.
