I would like to download some public small RNA libraries, clean up the data if needed (using FastQC), and then merge these libraries into a single .FA file that I can align to a sequence template.
I am very inexperienced in the domain of bioinformatics and would really appreciate some help in the initial steps I need to take. At this point, I am even struggling to be able to download the right files.
I would like to download the following libraries of small RNA:
https://www.ncbi.nlm.nih.gov/sra/SRX065853[accn]
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=gsm1495677
I am not even sure how to download these libraries, as so far my attempts have come out with file formats I was not expecting (I was told they would be in either FastA or FastQ format).
This problem is probably trivial to anybody who isn't just starting out, so if I could get a walk-through on how to get the libraries I want in the file format I want, I would really appreciate it.
For context, I plan to use Galaxy to clip adapters and collapse duplicates using FastQC. After that, I would like to merge my libraries and then search for perfect alignments against a template strand that I have. I have been told to use Bowtie2, also through the Galaxy interface.
Where possible, just use EBI-ENA to download the original FASTQ files (search with the same SRA accession number), as in this case: http://www.ebi.ac.uk/ena/data/search?query=SRX065853. This avoids the complexities of the SRA Toolkit.
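If you would rather script the ENA lookup than click through the browser, something like the sketch below may help. It assumes ENA's Portal API "filereport" endpoint and its `fastq_ftp` field, neither of which is mentioned in this thread, so please check them against the current ENA documentation before relying on this. The function names are made up for illustration, and the `ftp://` prefix is an assumption based on the field usually listing paths without a scheme.

```python
import csv
import io
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: ENA Portal API file report (verify against the ENA docs).
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def build_filereport_url(accession):
    """Build a file report query URL for an SRA/ENA accession such as SRX065853."""
    params = {
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp",
        "format": "tsv",
    }
    return ENA_FILEREPORT + "?" + urlencode(params)


def parse_filereport(tsv_text):
    """Return the FASTQ URLs listed in a tab-separated file report.

    A run can list several files separated by ';' (for example paired-end data).
    """
    urls = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        for path in (row.get("fastq_ftp") or "").split(";"):
            path = path.strip()
            if path:
                # Assumption: paths come without a scheme, so one is added here.
                urls.append(path if "://" in path else "ftp://" + path)
    return urls


def fetch_fastq_urls(accession):
    """Query ENA over the network and return FASTQ URLs for the accession."""
    with urlopen(build_filereport_url(accession)) as response:
        return parse_filereport(response.read().decode("utf-8"))


# Example (needs network access):
#     for url in fetch_fastq_urls("SRX065853"):
#         print(url)
```

The printed URLs can then be downloaded with a browser or `wget`, or pasted into Galaxy's upload tool.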
wowbaggerz: Since you are planning to use Galaxy (@PSU or a local instance), you can send the data directly to the PSU Galaxy server.
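For the merging step mentioned in the question (several libraries into one .FA file), here is a minimal sketch of doing it outside Galaxy. It assumes plain four-line FASTQ records (no wrapped sequences) and that adapter clipping has already been done. The function names and the `label|read` header scheme are hypothetical, chosen only so that reads from different libraries stay distinguishable.

```python
import gzip
from itertools import islice


def open_maybe_gzip(path):
    """Open a FASTQ file as text, transparently handling .gz files."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def fastq_to_fasta_records(handle, label):
    """Yield (header, sequence) pairs from a four-line-per-record FASTQ handle."""
    while True:
        block = list(islice(handle, 4))
        if len(block) < 4:
            break  # end of file (or a truncated final record)
        header = block[0].rstrip("\n")
        sequence = block[1].rstrip("\n")
        fields = header[1:].split()
        read_id = fields[0] if fields else "read"
        # Prefix the library label so reads from different libraries do not clash.
        yield ">" + label + "|" + read_id, sequence


def merge_fastq_to_fasta(inputs, out_handle):
    """Write every read from each (label, handle) pair to one FASTA handle.

    Returns the number of reads written.
    """
    count = 0
    for label, handle in inputs:
        for header, sequence in fastq_to_fasta_records(handle, label):
            out_handle.write(header + "\n" + sequence + "\n")
            count += 1
    return count


# Example with files on disk (the file names are placeholders):
#     with open_maybe_gzip("libA.fastq.gz") as a, open_maybe_gzip("libB.fastq.gz") as b, \
#             open("merged.fa", "w") as out:
#         merge_fastq_to_fasta([("libA", a), ("libB", b)], out)
```

The quality lines are simply dropped, so any quality filtering should happen before this step.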