Question: Where To Get The Adaptor Sequence And Mapping Reads With Bowtie
Sunflow wrote, 9.2 years ago:

Hello all, I downloaded the .sra file (SRR052645.sra) from the NCBI SRA and want to map these reads to the Arabidopsis genome with bowtie, but I'm not sure whether my data-processing procedure is right. Here is what I did:
1. Downloaded the .sra file and converted it to FASTQ with the command: ./fastq-dump -SL -SF SRR052645.sra

2. Mapped the reads to the genome with the command:
../bowtie -q -m 1 -n 0 ../genomeindex/athaliana SRR052645.fastq > nucleosomeuniquebowtie.out
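Before mapping, a quick sanity check on the dumped FASTQ can help with question (1) below. This is a minimal Python sketch, not part of any toolkit; `read_fastq` and `length_histogram` are hypothetical helper names, and it assumes plain four-line FASTQ records. With current SRA Toolkit releases (older ones may differ), a paired-end run dumped without `--split-files` or `--split-3` gives each spot as one concatenated read, so read lengths that are all about twice the expected single-read length are worth a closer look. The total count should also match bowtie's "reads processed" line.

```python
import io
from collections import Counter
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual


def length_histogram(handle: TextIO) -> Counter:
    """Map each read length to the number of reads with that length."""
    return Counter(len(seq) for _, seq, _ in read_fastq(handle))


if __name__ == "__main__":
    demo = io.StringIO(
        "@r1\nACGTACGT\n+\nIIIIIIII\n"
        "@r2\nACGTAC\n+\nIIIIII\n"
        "@r3\nTTTTACGT\n+\nIIIIIIII\n"
    )
    hist = length_histogram(demo)
    print("reads:", sum(hist.values()), "lengths:", dict(hist))
```

On the real data this would be `with open("SRR052645.fastq") as fh: print(length_histogram(fh))`.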

My questions are:
(1) My dataset was generated by Illumina sequencing. I don't know whether I need the -SL and -SF parameters in my command, because someone told me they are only needed for paired-end sequencing, but I can't tell whether this is a paired-end run.
(2) I don't know whether I have to trim the adaptor sequences from the .sra file. I didn't see any adaptor sequences provided on NCBI, so does this mean I don't have to trim them?
(3) After mapping the reads to the genome with bowtie, a large proportion of reads failed to align. Here is the output message:

reads processed: 3572622
reads with at least one reported alignment: 1558809 (43.63%)
reads that fail to align: 1180787 (33.05%)
reads with alignments suppressed due to -m: 833026 (23.32%)

This seems very strange to me. Has anyone had the same experience? Or is it just because I didn't clip the adaptor?

Best Regards~
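bowtie's summary lines partition the processed reads: each read either has a reported alignment, fails to align, or has its alignments suppressed by `-m`. With `-m 1`, that last group consists of reads with more than one reportable alignment, i.e. reads that did match the genome but were discarded as non-unique, not alignment failures. A short arithmetic check in Python, using the counts from the output above (the aligned count taken as 1,558,809, the value consistent with both 43.63% and the total):

```python
# Counts from the bowtie summary above.
processed = 3572622
aligned = 1558809       # reads with at least one reported alignment
failed = 1180787        # reads with no valid alignment under the policy used (-n 0)
suppressed_m = 833026   # reads with more than one reportable alignment, dropped by -m 1

# The three categories partition the processed reads.
assert aligned + failed + suppressed_m == processed

for label, n in [("reported", aligned), ("failed", failed), ("suppressed by -m", suppressed_m)]:
    print(f"{label}: {100 * n / processed:.2f}%")

# Reads that matched the genome at least once, uniquely or not.
hit = aligned + suppressed_m
print(f"matched at least once: {100 * hit / processed:.2f}%")
```

So only the 33.05% in the "failed" line are reads with no hit at all under these settings.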

ALchEmiXt (The Netherlands) wrote, 9.2 years ago:

I don't use SRA data much (aren't the files database-formatted, or are they just plain dumps?). Usually you can tell whether an Illumina run was paired-end (PE) by inspecting its FASTQ headers; see the FASTQ wiki for details. Not sure if that will help you here, though.
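As a rough illustration of the header check, here is a Python sketch covering two common Illumina conventions: an older `/1` or `/2` suffix on the read name, and the Casava 1.8+ second field such as `1:N:0:ATCACG`. The regexes, function names and example headers are assumptions for illustration only. fastq-dump may rewrite or drop the original read names, so, as said above, this may not settle the question for SRA data.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Older Illumina naming: mate number as a "/1" or "/2" suffix on the read name.
OLD_STYLE = re.compile(r"/([12])$")
# Casava 1.8+ naming: second header field "<read>:<filtered Y/N>:<control>:<index>".
CASAVA_18 = re.compile(r"^([12]):[YN]:\d+:")


def mate_number(header: str) -> Optional[int]:
    """Return 1 or 2 if any header field encodes a mate number, else None."""
    for field in header.lstrip("@").split():
        match = OLD_STYLE.search(field) or CASAVA_18.match(field)
        if match:
            return int(match.group(1))
    return None


def mate_summary(headers: Iterable[str]) -> Counter:
    """Tally mate numbers over many headers; None means no mate info was found."""
    return Counter(mate_number(h) for h in headers)


if __name__ == "__main__":
    example_headers = [  # hypothetical headers, one per case
        "@HWUSI-EAS100R:6:73:941:1973#0/1",
        "@EAS139:136:FC706VJ:2:2104:15343:197393 2:Y:18:ATCACG",
        "@SRR052645.1 length=36",
    ]
    print(mate_summary(example_headers))
```

If every header in a file reports mate 1 (or nothing), that is consistent with a single-end run, but it is a hint rather than proof.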

Regarding the mapping: yes, we have seen similar proportions of unmapped reads with bowtie. Inspecting them usually reveals that bowtie is (too) strict in its mapping, since it doesn't allow any indels, or that there is still a lot of PhiX in the data. You could quite easily move to BWA, which does allow indels.
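To see why an ungapped aligner struggles with indels, this Python sketch compares the best ungapped placement of a read (mismatches only, as in bowtie 1) with a gapped semi-global edit distance (mismatches plus inserted or deleted bases, the kind of alignment BWA can report). It is a toy illustration, not how either tool is implemented; the reference and read are made-up sequences in which the read carries a single 1-bp deletion.

```python
def best_ungapped_mismatches(read: str, ref: str) -> int:
    """Fewest mismatches over every ungapped placement of read inside ref (no indels)."""
    best = len(read)
    for start in range(len(ref) - len(read) + 1):
        window = ref[start:start + len(read)]
        best = min(best, sum(a != b for a, b in zip(read, window)))
    return best


def best_gapped_edits(read: str, ref: str) -> int:
    """Fewest mismatches + inserted/deleted bases to fit the whole read anywhere in ref.

    Semi-global dynamic programming: the read must align end to end,
    but it may start and stop at any reference position for free.
    """
    prev = [0] * (len(ref) + 1)  # empty read prefix: free start anywhere in ref
    for i, r in enumerate(read, 1):
        cur = [i] + [0] * len(ref)  # read prefix against empty ref: i extra read bases
        for j, g in enumerate(ref, 1):
            cur[j] = min(
                prev[j - 1] + (r != g),  # match or mismatch
                prev[j] + 1,             # extra base in the read (insertion)
                cur[j - 1] + 1,          # reference base missing from the read (deletion)
            )
        prev = cur
    return min(prev)  # free end anywhere in ref


if __name__ == "__main__":
    ref = "GGATCCGATTACAGGCTAGC"  # made-up reference
    read = ref[2:8] + ref[9:16]    # ATCCGA + TACAGGC: one reference base deleted
    print("ungapped mismatches:", best_ungapped_mismatches(read, ref))
    print("gapped edits:", best_gapped_edits(read, ref))
```

The gapped score is a single edit, while the best ungapped placement needs more than one mismatch, so a strict mismatch limit rejects a read that differs from the reference by only one indel.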

For inspecting quality and finding overrepresented sequences, adapters and such, have a look at the FastQC suite, which can also be found online in the framework.
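FastQC is the tool to use here; purely to show the idea, here is a minimal Python sketch of two checks of that kind: what fraction of reads contain an adapter, and which exact sequences make up an unusually large share of the reads (FastQC warns above 0.1% of the total). It also includes a naive exact-match 3' trimmer. The adapter prefix `AGATCGGAAGAG` is an assumption: it is the Illumina Universal Adapter prefix that FastQC screens for, but an older run may have used a different adapter, which you would need from the library-prep details. Real trimmers also tolerate sequencing errors within the adapter; this sketch does not.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Assumption: Illumina Universal Adapter prefix as screened by FastQC.
# Replace with the adapter actually used for this library if it differs.
ADAPTER_PREFIX = "AGATCGGAAGAG"


def adapter_fraction(seqs: Iterable[str], adapter: str = ADAPTER_PREFIX) -> float:
    """Fraction of reads containing an exact copy of the adapter prefix."""
    seqs = list(seqs)
    if not seqs:
        return 0.0
    return sum(adapter in s for s in seqs) / len(seqs)


def overrepresented(seqs: Iterable[str], min_fraction: float = 0.001) -> List[Tuple[str, int, float]]:
    """Exact sequences making up at least min_fraction of all reads (0.001 = 0.1%)."""
    counts = Counter(seqs)
    total = sum(counts.values())
    return [(s, n, n / total) for s, n in counts.most_common() if n / total >= min_fraction]


def trim_3prime_adapter(seq: str, adapter: str = ADAPTER_PREFIX, min_overlap: int = 5) -> str:
    """Naive exact-match trimming: cut at a full adapter match, or at a partial
    adapter prefix of at least min_overlap bases hanging off the 3' end."""
    pos = seq.find(adapter)
    if pos != -1:
        return seq[:pos]
    # Try the longest partial adapter first; never test an empty overlap.
    for k in range(min(len(adapter) - 1, len(seq)), max(min_overlap, 1) - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k]
    return seq


if __name__ == "__main__":
    reads = ["ACGTACGTAGATCGGAAGAGCA", "ACGTACGTTTGCAAGATCG", "TTTTTTTTTTTTTTTTTTTT"]
    print(adapter_fraction(reads))
    print(overrepresented(reads, min_fraction=0.3))
    print([trim_3prime_adapter(r) for r in reads])
```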
