I am taking on an ambitious project but have little experience with bioinformatics, so any help with the following problem would be much appreciated!
I have four large libraries of raw siRNA reads in .fa (FASTA) format. I would like to clean up this data, merge the libraries, and then compare the combined library to a template DNA strand.
As output, I would like every siRNA read from the combined library that aligns exactly to the template strand.
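For a single, short template strand, the "merge, then keep exact matches" step can be illustrated without an aligner. The Python below is a minimal sketch, not a replacement for Bowtie 2. It assumes the libraries fit in memory as text, that sequences use A/C/G/T (U is converted to T), and that only the given template strand is searched (no reverse complement). The helper names `read_fasta`, `merge_libraries`, and `exact_hits`, and the `libN_` ID prefix, are hypothetical.

```python
# Minimal sketch (hypothetical helpers): merge FASTA libraries and keep reads
# that match the template strand exactly. Assumes in-memory text input,
# A/C/G/T/U sequences, and a search of the given strand only.
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def merge_libraries(libraries: Iterable[str]) -> List[Tuple[str, str]]:
    """Concatenate several FASTA libraries, prefixing IDs so they stay unique."""
    merged: List[Tuple[str, str]] = []
    for i, library_text in enumerate(libraries, start=1):
        for read_id, seq in read_fasta(library_text):
            merged.append((f"lib{i}_{read_id}", seq))
    return merged


def exact_hits(reads: Iterable[Tuple[str, str]], template: str) -> Dict[str, List[int]]:
    """Map each read ID to every 0-based start where it occurs exactly in the template."""
    # Small RNA may be written with U; convert to T so it can match the DNA template.
    template = template.upper().replace("U", "T")
    hits: Dict[str, List[int]] = {}
    for read_id, seq in reads:
        seq = seq.upper().replace("U", "T")
        if not seq:
            continue  # an empty sequence would "match" everywhere
        positions = []
        start = template.find(seq)
        while start != -1:
            positions.append(start)
            start = template.find(seq, start + 1)  # +1 keeps overlapping matches
        if positions:
            hits[read_id] = positions
    return hits


if __name__ == "__main__":
    lib1 = ">r1\nACGTAC\n>r2\nTTTT\n"
    lib2 = ">r3\nGUAC\n"
    merged = merge_libraries([lib1, lib2])
    print(exact_hits(merged, "AACGTACGTACG"))  # {'lib1_r1': [1, 5], 'lib2_r3': [3, 7]}
```

Plain substring search is fine for one short template; against a whole genome, an indexed aligner such as Bowtie 2 is the practical choice.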
I was advised to use FastQC to clean up my data, followed by Bowtie 2 to align my library to the template strand.
However, I have no experience with these programs, and I realized I was out of my depth when I couldn't even figure out how to open my .fa file in FastQC to begin tinkering. (That should give you an idea of my level of competence.)
I would love recommendations from this community on resources that could help me figure this all out, as well as any general advice or software suggestions (OS: Windows 7).
Hi, I need to ask a few questions for clarification:
The data comes from unpublished studies conducted by other members of my lab, and I believe I was mistaken: they are indeed small RNA libraries. I only know that FASTQ files carry extra values describing base-call quality, whereas FASTA files do not.
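To make the FASTQ/FASTA difference concrete: a FASTQ record has four lines (a header starting with @, the sequence, a + separator, and one quality character per base), while FASTA keeps only the header and sequence. The sketch below assumes well-formed four-line records and Phred+33 quality encoding (the usual convention for current Illumina data); the helper names are hypothetical.

```python
# Sketch of the FASTQ vs FASTA difference. Assumes well-formed 4-line FASTQ
# records with Phred+33 quality encoding; helper names are hypothetical.
from typing import Iterator, List, Tuple


def read_fastq(text: str) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (read_id, sequence, phred_scores) from 4-line FASTQ text."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ text does not contain a whole number of 4-line records")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting at non-blank line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in record {header}")
        # Phred+33: quality score = ASCII code of the character minus 33.
        yield header[1:], seq, [ord(c) - 33 for c in qual]


def fastq_to_fasta(text: str) -> str:
    """Convert FASTQ text to FASTA; the quality line is simply dropped."""
    return "".join(f">{read_id}\n{seq}\n" for read_id, seq, _ in read_fastq(text))


if __name__ == "__main__":
    record = "@r1\nACGU\n+\nII#I\n"
    print(list(read_fastq(record)))  # [('r1', 'ACGU', [40, 40, 2, 40])]
    print(fastq_to_fasta(record), end="")
```

Going from FASTQ to FASTA discards information, so quality values cannot be recovered from a .fa file alone.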
Not especially large. I'm sorry I can't give a more detailed answer at this stage, but I doubt it will be an issue.
I have permission to submit my data to a web service for purposes of analysis.
I have looked at Galaxy briefly and will begin reading the documentation. Thank you for the recommendation. I would rather not change my OS, but I can get access to a Windows 10 machine if need be.
I will edit my OP to correct my mistakes regarding siRNA vs. small RNA.
You cannot use FastQC if you do not have quality values. I am guessing that these reads are either derived, assembled, or from another sequencing technology. You could try to get the raw data or skip QC. The remaining steps could be to scan for adapter sequences and contamination, and to collapse duplicates. Then use the aligner of your choice to map those sequences to the genome. In addition, a BLAST search against miRBase and Rfam for annotation comes to mind.
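The adapter-scanning and duplicate-collapsing steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a standard pipeline: the adapter sequence is a hypothetical placeholder (use the 3' adapter from your own library-prep kit), matching is exact with no mismatches allowed, and the 3 nt minimum overlap, 15 nt minimum length, and `seqN_xCOUNT` header style are arbitrary choices. Dedicated tools such as cutadapt also tolerate sequencing errors in the adapter.

```python
# Sketch: exact 3' adapter trimming plus duplicate collapsing for small-RNA reads.
# The adapter, minimum overlap, minimum length, and header format are
# hypothetical choices for illustration; no mismatches are tolerated.
from collections import Counter
from typing import Iterable, List, Tuple


def trim_3prime_adapter(seq: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a full adapter, or an adapter prefix left dangling at the 3' end."""
    if not adapter:
        return seq
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx]  # cut at the first full adapter occurrence
    # The read may end partway into the adapter: try the longest prefix first.
    for k in range(min(len(adapter) - 1, len(seq)), max(min_overlap, 1) - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k]
    return seq


def collapse(seqs: Iterable[str], min_length: int = 15) -> List[Tuple[str, int]]:
    """Count identical sequences, dropping very short ones; most abundant first."""
    counts = Counter(s for s in seqs if len(s) >= min_length)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def to_collapsed_fasta(collapsed: List[Tuple[str, int]]) -> str:
    """Write collapsed reads as FASTA with a hypothetical seqN_xCOUNT header."""
    return "".join(
        f">seq{i}_x{count}\n{seq}\n" for i, (seq, count) in enumerate(collapsed, start=1)
    )


if __name__ == "__main__":
    adapter = "TGGAATTC"  # placeholder; use your kit's actual 3' adapter
    raw = [
        "ACGTACGTACGTACGTACGTTGGAATTCAA",  # full adapter plus trailing bases
        "ACGTACGTACGTACGTACGTTGG",          # partial adapter at the 3' end
        "ACGTACGTACGTACGTACGT",             # no adapter
        "GGGGCCCCAAAATTTTGGGG",
        "TTTT",                             # too short, dropped by collapse()
    ]
    trimmed = [trim_3prime_adapter(r, adapter) for r in raw]
    print(to_collapsed_fasta(collapse(trimmed)), end="")
```

Requiring only a few bases of overlap is a trade-off: it catches short adapter remnants but will also clip reads that happen to end in those bases. The collapsed FASTA can then be passed to whichever aligner you choose.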