Question: k-mer enrichment analysis
yuabrahamliu wrote (2.8 years ago):

Hi all, I have two groups of RNAs: an experimental group and a control group. I need to check whether any k-mer sequences are highly enriched in the experimental group compared with the control group, or vice versa. My questions are:

1) How can I get the RNA sequences from their genomic coordinates? I think I could use bedtools to extract the FASTA sequence of each exon separately, concatenate the exons, and then convert 'T' in the DNA to 'U' in the RNA, but is there a more efficient method?

2) I don't know of any tools that can do this kind of k-mer comparison between groups of RNAs. Could anyone recommend some? Thank you so much.
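For question 1, here is a minimal Python sketch of the exon-joining step. It assumes BED-style 0-based, half-open coordinates, that all exons of a transcript lie on one chromosome and strand, and that the genome is already loaded into a dictionary. The names `transcript_rna` and `reverse_complement` are illustrative and do not come from any library. If the coordinates are in BED12 format, `bedtools getfasta` with `-split` (concatenate the blocks, i.e. exons) and `-s` (reverse-complement minus-strand features) should do the same joining in one step. The T to U conversion is only cosmetic: k-mer counts come out the same either way.

```python
# Sketch: build a spliced RNA sequence from exon coordinates.
# Assumptions: 0-based half-open (BED) coordinates, all exons on one
# chromosome/strand, genome held in a dict of chromosome name -> sequence.

COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def transcript_rna(genome: dict[str, str], chrom: str, strand: str,
                   exons: list[tuple[int, int]]) -> str:
    """Join exons in genomic order and return the RNA sequence (U instead of T)."""
    chrom_seq = genome[chrom]
    # Concatenate exons left to right along the + strand of the genome.
    dna = "".join(chrom_seq[start:end] for start, end in sorted(exons))
    if strand == "-":
        # Minus-strand transcripts are read from the reverse complement.
        dna = reverse_complement(dna)
    return dna.upper().replace("T", "U")


if __name__ == "__main__":
    # Toy genome and a hypothetical two-exon transcript, for illustration only.
    genome = {"chr1": "AAACCCGGGTTTACGT"}
    exons = [(0, 3), (6, 9)]
    print(transcript_rna(genome, "chr1", "+", exons))
    print(transcript_rna(genome, "chr1", "-", exons))
```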

Tags: rna-seq (question last modified 2.8 years ago by Devon Ryan)

What is your end goal in the comparison?

(Comment by Devon Ryan, 2.8 years ago)

An RNA-binding protein binds to the RNAs in the experimental group but not to those in the control group. So I want to see whether any k-mer sequence is enriched in the experimental group; it might be the binding motif of the protein.
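One way to frame this comparison, sketched in Python under stated assumptions: for each possible k-mer, count how many sequences in each group contain it, then test for enrichment with a one-sided Fisher's exact test. The presence/absence counting, the 0.5 pseudocount and the function names are choices made for this illustration, not a specific published tool. `scipy.stats.fisher_exact(table, alternative="greater")` could replace the hand-written test. Enumerating all 4^k k-mers is only practical for small k (k = 6 already gives 4,096).

```python
# Sketch of a k-mer enrichment test between two groups of sequences.
# Assumptions (illustrative, not from the thread): each sequence is counted
# once per k-mer (presence/absence) and enrichment is tested with a one-sided
# Fisher's exact test computed from the hypergeometric distribution.
from itertools import product
from math import comb, log2


def kmers_present(seq: str, k: int) -> set[str]:
    """Return the set of k-mers that occur at least once in seq."""
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA alike
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact p-value for table [[a, b], [c, d]].

    a/b = experimental sequences with/without the k-mer,
    c/d = control sequences with/without the k-mer.
    """
    n_exp, n_with, total = a + b, a + c, a + b + c + d
    # P(X >= a) with X ~ Hypergeometric(total, n_with, n_exp).
    tail = sum(comb(n_with, x) * comb(total - n_with, n_exp - x)
               for x in range(a, min(n_exp, n_with) + 1))
    return tail / comb(total, n_exp)


def kmer_enrichment(exp: list[str], ctrl: list[str], k: int) -> list[tuple]:
    """Return (kmer, n_exp_with, n_ctrl_with, log2_odds_ratio, p) sorted by p."""
    exp_sets = [kmers_present(s, k) for s in exp]
    ctrl_sets = [kmers_present(s, k) for s in ctrl]
    results = []
    for kmer in ("".join(p) for p in product("ACGT", repeat=k)):
        a = sum(kmer in s for s in exp_sets)
        c = sum(kmer in s for s in ctrl_sets)
        b, d = len(exp) - a, len(ctrl) - c
        # A 0.5 pseudocount keeps the odds ratio finite when a cell is zero.
        log2_or = log2((a + 0.5) * (d + 0.5) / ((b + 0.5) * (c + 0.5)))
        results.append((kmer, a, c, log2_or, fisher_greater(a, b, c, d)))
    return sorted(results, key=lambda r: r[4])


if __name__ == "__main__":
    experimental = ["AUGCAUGC", "CCAUGCCC", "AUGCAAAA"]  # toy data
    control = ["GGGGGGGG", "CCCCCCCC", "UUUUAAAA"]
    for row in kmer_enrichment(experimental, control, k=4)[:3]:
        print(row)
```

Calling `kmer_enrichment(control, experimental, k)` with the groups swapped tests the "vice versa" direction.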

(Comment by yuabrahamliu, 2.8 years ago)
Devon Ryan (Freiburg, Germany) wrote (2.8 years ago):

The obvious choice would be to run khmer on the FASTQ files from both datasets to count k-mer abundances, then load those counts into R and treat them as if they were RNA-seq counts (i.e., test each k-mer for differential abundance between the groups, as you would for genes).
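To make the "treat it like RNA-seq" idea concrete, here is a rough Python sketch. It assumes k-mer counts per sample are already available; for self-containment it counts them from toy reads rather than parsing khmer's output, whose format is not shown here. It only normalizes for library size (counts per million) and computes a log2 fold change of group means. In R, a count-based package such as DESeq2 or edgeR would model replicates and dispersion properly; this sketch does not.

```python
# Sketch of "treating k-mer counts like RNA-seq counts".
# Assumptions: per-sample k-mer counts (here counted from toy reads), replicate
# samples per group, CPM normalization plus a log2 fold change of group means.
# This is NOT a statistical test; a real analysis would use DESeq2/edgeR in R.
from collections import Counter
from math import log2


def count_kmers(reads: list[str], k: int) -> Counter:
    """Count every k-mer occurrence across a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def cpm(counts: Counter) -> dict[str, float]:
    """Scale raw counts to counts per million so samples are comparable."""
    total = sum(counts.values())
    return {kmer: 1e6 * n / total for kmer, n in counts.items()}


def log2_fold_changes(exp_samples: list[Counter], ctrl_samples: list[Counter],
                      pseudo: float = 1.0) -> dict[str, float]:
    """log2((mean CPM in experimental + pseudo) / (mean CPM in control + pseudo))."""
    exp_cpm = [cpm(c) for c in exp_samples]
    ctrl_cpm = [cpm(c) for c in ctrl_samples]
    kmers = set().union(*exp_cpm, *ctrl_cpm)
    lfc = {}
    for kmer in kmers:
        e = sum(s.get(kmer, 0.0) for s in exp_cpm) / len(exp_cpm)
        c = sum(s.get(kmer, 0.0) for s in ctrl_cpm) / len(ctrl_cpm)
        lfc[kmer] = log2((e + pseudo) / (c + pseudo))
    return lfc


if __name__ == "__main__":
    # Toy reads standing in for two replicate samples per group.
    exp_samples = [count_kmers(["ACGTACGT", "ACGTTT"], 3),
                   count_kmers(["ACGTAC"], 3)]
    ctrl_samples = [count_kmers(["GGGCCC"], 3), count_kmers(["GGGGAC"], 3)]
    lfc = log2_fold_changes(exp_samples, ctrl_samples)
    for kmer, value in sorted(lfc.items(), key=lambda kv: -kv[1])[:3]:
        print(kmer, round(value, 2))
```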

That said, at least with the protocol people use here, I can already see the binding motif in the FastQC report.


That's a good idea, but the problem is that I only have a list of the names of the mRNAs that bind the protein; I don't have FASTQ files. So I want to do the k-mer analysis on the FASTA sequences of the bound fragments of these mRNAs. I looked at khmer, but it seems to work only on sequencing data, not on FASTA sequences. Is there any way to do this kind of analysis on FASTA sequences?
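Counting k-mers does not require quality scores, so FASTA input is enough in principle. Below is a standard-library Python sketch, assuming ordinary FASTA records (a header line starting with `>`, followed by sequence that may be wrapped over several lines); `read_fasta` and `kmer_counts` are hypothetical helper names. U is mapped to T so that RNA and DNA FASTA files give the same k-mers, and windows containing N are skipped.

```python
# Sketch: read FASTA records and count k-mers per sequence (stdlib only).
# Assumption: ordinary FASTA, ">" header lines followed by one or more
# sequence lines; no quality scores are needed to count k-mers.
import io
from collections import Counter
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers, skipping any window that contains N."""
    seq = seq.upper().replace("U", "T")
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return Counter(w for w in windows if "N" not in w)


if __name__ == "__main__":
    # Toy in-memory FASTA; in practice pass open("fragments.fa") instead.
    fasta = io.StringIO(">frag1\nAUGGAC\nUGG\n>frag2\nGGACU\n")
    for name, seq in read_fasta(fasta):
        print(name, kmer_counts(seq, 3).most_common(2))
```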

(Comment by yuabrahamliu, 2.8 years ago)

I'm not sure how useful that'll be unless the gene list is quite long, as there will be a LOT of spurious hits. The term you need to google for is "motif search".
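The "spurious hits" point is largely a multiple-testing problem: with k = 6 there are 4^6 = 4,096 possible k-mers, so on the order of 4,096 × 0.05 ≈ 205 could pass p < 0.05 by chance even if none were truly enriched. Below is a Python sketch of one common correction, the Benjamini-Hochberg false-discovery-rate adjustment; it is chosen here as an illustration, not something proposed in the thread.

```python
# Sketch: why many k-mer tests produce spurious hits, and one common fix.
# Benjamini-Hochberg FDR adjustment is an illustrative choice, not from the thread.


def n_possible_kmers(k: int) -> int:
    """Number of distinct k-mers over a 4-letter alphabet (A, C, G, U/T)."""
    return 4 ** k


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return BH-adjusted p-values (q-values) in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so the q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    m = n_possible_kmers(6)
    print(m, "k-mers;", m * 0.05, "expected below p < 0.05 by chance alone")
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```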

(Comment by Devon Ryan, 2.8 years ago)
Powered by Biostar version 2.3.0