Question

Comparing libraries of siRNA to a template and extracting exact matches.

0

7.1 years ago

wowbaggerz • 0

I am taking on an ambitious project but have little experience with bioinformatics, so any help with the following problem would be really appreciated!

I have four large libraries of raw siRNA reads in FASTA (.fa) format. I would like to clean up this data, merge the libraries, and then compare the combined library to a template strand of DNA.

I would like an output of all siRNA from the combined library that align exactly with the template strand.

FastQC has been recommended to me as a program to clean up my data, followed by Bowtie 2 to compare my library to the template strand.

However, I have no experience with these programs and realized I was out of my depth when I couldn't even figure out how to open my .fa file in FastQC to begin tinkering. (That should give you an idea of my level of competence.)

I would love recommendations from this community on resources that will help me figure this all out, as well as any general advice or software recommendations (OS: Windows 7).

alignment sequence • 1.3k views

ADD COMMENT • link 7.1 years ago by wowbaggerz • 0

2

Hi, I need to ask a few points for clarification:

Where is your data coming from? siRNA reads in FASTA format sound like a contradiction; I would expect FASTQ format. Do you know the differences between those formats?
How large is large?
siRNA means small interfering RNA; these are normally produced in a technical process as defined sequences. Do you mean small RNA instead?
Is your data sensitive? Could you submit it to a web service?
As you say you have no experience with these programs, would you mind using a web service like Galaxy, where all your needs are most likely covered?
Related to the question above: how invested are you in using your legacy operating system?

ADD REPLY • link 7.1 years ago by Michael 54k

0

The data comes from unpublished studies conducted by other members of my lab, and I believe I was mistaken: they are indeed small RNA libraries. All I know about the formats is that FASTQ attaches quality values to each read, whereas FASTA does not.

Not especially large. I'm sorry I can't give a more detailed answer at this stage, but I doubt it will be an issue.

I have permission to submit my data to a web service for purposes of analysis.

I have looked at Galaxy briefly and will begin reading the documentation. Thank you for the recommendation. I would rather not change my OS, but I can get access to a Windows 10 machine if need be.

I will edit my OP to correct my mistakes regarding siRNA/small RNA.

ADD REPLY • link 7.1 years ago by wowbaggerz • 0

0

You cannot use FastQC if you do not have quality values. I am guessing that these sequences are either derived, assembled, or coming from another sequencing technology. You could try to get the raw data, or skip QC. The few remaining steps could be to scan for adapter sequences, check for contamination, and collapse duplicates. Then use the aligner of your choice to map those sequences to the genome. In addition, a BLAST search against miRBase and Rfam for annotation comes to mind.
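
If the template is a single, fairly short sequence, the merge, collapse, and exact-match steps can be sketched in plain Python without an aligner. This is only a rough illustration, not a replacement for a proper aligner on a genome-sized reference. It assumes adapters have already been trimmed, that reads may match either strand, and that U and T should be treated as equivalent; the function names and file names below are made up for the example.

```python
from collections import Counter

# Complement table for DNA; reads are converted to DNA letters first.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def normalize(seq):
    """Upper-case a sequence and write RNA (U) as DNA (T)."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def collapse(sequences):
    """Collapse duplicate reads into a Counter of sequence -> read count."""
    return Counter(normalize(s) for s in sequences if s)


def load_libraries(paths):
    """Merge several FASTA files and collapse duplicates across all of them."""
    counts = Counter()
    for path in paths:
        with open(path) as handle:
            counts.update(collapse(seq for _, seq in read_fasta(handle)))
    return counts


def exact_matches(counts, template):
    """Return (sequence, count, strand) for reads found verbatim in the template."""
    forward = normalize(template)
    reverse = reverse_complement(forward)
    hits = []
    for seq, n in counts.items():
        if seq in forward:
            hits.append((seq, n, "+"))
        elif seq in reverse:
            hits.append((seq, n, "-"))
    return hits


# Hypothetical usage (file names are placeholders, not from the thread):
# counts = load_libraries(["lib1.fa", "lib2.fa", "lib3.fa", "lib4.fa"])
# _, template = next(read_fasta(open("template.fa")))
# for seq, n, strand in exact_matches(counts, template):
#     print(seq, n, strand)
```

Substring search like this reports whether a read occurs in the template, not where or how many times; for positions, or for anything larger than a single template, an aligner run with no mismatches allowed is the better tool.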

ADD REPLY • link 7.1 years ago by Michael 54k