I downloaded the .sra file SRR052645.sra (http://www.ncbi.nlm.nih.gov/sra?term=SRR052645) from NCBI SRA, and I want to map these reads to the Arabidopsis genome with bowtie, but I'm not sure whether my data processing procedure is right.
Here is what I did:
1. Download the .sra file and convert it to a FASTQ file with the command line: ./fastq-dump -SL -SF SRR052645.sra
2. Map the reads to the genome with the command line:
../bowtie -q -m 1 -n 0 ../genomeindex/athaliana SRR052645.fastq > nucleosomeuniquebowtie.out
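Between the two steps, one thing I could do is inspect the FASTQ file itself. The following is only a rough sketch, not something I have validated on this dataset: the function names are my own, and the adapter prefix AGATCGGAAGAGC is an assumption (it is the start of many Illumina adapters, but the adapter actually used for this library is not given in the SRA record). If a paired-end run had been dumped without splitting, I would expect the two mates to be joined into one sequence, so the read lengths would look about twice as long as the sequencing cycles.

```shell
# Tabulate sequence lengths in a FASTQ file.
# Assumes the plain 4-lines-per-record layout (no wrapped sequences).
read_length_table() {
    awk 'NR % 4 == 2 { n[length($0)]++ }
         END { for (len in n) print len, n[len] }' "$1" | sort -n
}

# Count reads that contain a given adapter string anywhere in the sequence.
# The adapter passed as $2 is an assumption, not taken from the SRA record.
count_adapter_reads() {
    awk -v a="$2" 'NR % 4 == 2 { total++; if (index($0, a) > 0) hits++ }
         END { printf "%d of %d reads contain %s\n", hits, total, a }' "$1"
}

# Example calls (file name from step 1):
#   read_length_table SRR052645.fastq
#   count_adapter_reads SRR052645.fastq AGATCGGAAGAGC
```

If a large share of the reads contain the adapter string, trimming before mapping would probably be worth trying. If almost none do, the adapter is unlikely to explain the unaligned reads.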
My questions are:
(1) My dataset was generated by Illumina sequencing. I don't know whether I need the -SL and -SF parameters in my command line: someone told me that -SL and -SF are only needed for paired-end sequencing, but I cannot tell whether this is a paired-end run.
(2) I don't know whether I have to trim the adaptor sequences from the reads in the .sra file. I didn't see any adaptor sequences provided at NCBI, so does this mean that I don't have to trim the adaptors?
(3) After mapping the reads to the genome with bowtie, a large proportion of the reads fail to align. The output message is as follows:
reads processed: 3572622
reads with at least one reported alignment: 1558809 (43.63%)
reads that fail to align: 1180787 (33.05%)
reads with alignments suppressed due to -m: 833026 (23.32%)
I think this is very strange. Has anyone had the same experience? (Or is it just because I didn't clip the adaptors?)
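As a quick arithmetic check on the summary above: the three categories should add up to the number of reads processed, so the aligned count can be recomputed from the other lines. A minimal sketch in shell, using only the numbers from the output (the helper name pct is my own, not part of bowtie):

```shell
# Percentage of n out of t, printed with two decimals.
pct() {
    awk -v n="$1" -v t="$2" 'BEGIN { printf "%.2f\n", 100 * n / t }'
}

total=3572622       # reads processed
failed=1180787      # reads that fail to align
suppressed=833026   # reads with alignments suppressed due to -m

# Whatever is neither failed nor suppressed must have a reported alignment.
aligned=$((total - failed - suppressed))

echo "aligned:    $aligned ($(pct "$aligned" "$total")%)"
echo "failed:     $failed ($(pct "$failed" "$total")%)"
echo "suppressed: $suppressed ($(pct "$suppressed" "$total")%)"
```

With these numbers the recomputed aligned count is 1558809, which is 43.63% of the reads processed. So roughly a third of the reads have no alignment at all, and almost a quarter align to more than one place and are dropped by -m 1.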