Alignment of Query-RNA Seq
5.3 years ago

I am trying to align a query sequence against RNA-Seq data (100 bp paired-end). I have tried Bowtie2 and BWA. However, neither reported any reads aligning to the query sequence, with or without mismatches. (I am not interested in aligning the RNA-Seq data to a reference genome.)

RNA-Seq alignment • 1.3k views

(I am not interested in aligning RNA-Seq data with Reference genome).

That is fine but why are you not using the query sequence as your reference to make an index? Then align your data against it.

Note: Doing this always runs the risk of having some reads align to locations that they did not originate from.
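
For reference, the index-then-align approach could look roughly like this with Bowtie2. This is only a sketch: the function name align_to_query, the index prefix query_idx, the thread count and the file names are placeholders for illustration, not something taken from the posts in this thread.

```shell
# Sketch: treat the query sequence as the reference.
# align_to_query, query_idx and the file names are hypothetical placeholders.
align_to_query() {
    local query_fa="$1"
    local r1="$2"
    local r2="$3"
    local prefix="${4:-query_idx}"

    # Build a Bowtie2 index from the query FASTA.
    bowtie2-build "$query_fa" "$prefix" || return 1

    # Align the paired reads in local mode; --no-unal keeps unaligned
    # reads out of the SAM output.
    bowtie2 --local -p 4 -x "$prefix" -1 "$r1" -2 "$r2" --no-unal -S "${prefix}.sam"
}

# Example call (uncomment once the files exist):
# align_to_query your_query.fa R1.fq.gz R2.fq.gz
```

Bowtie2 prints the overall alignment rate to STDERR at the end of the run, so that number can be checked without opening the SAM file.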


I indexed my query sequence and tried aligning the RNA-Seq data against it with both the --end-to-end and --local parameters. However, the overall alignment rate was 0.00%.


What is the length of the query sequence?


Length of the query sequence: 2600 bp


I suggest that you try bbmap.sh from BBTools (https://sourceforge.net/projects/bbmap/). Something like this:

bbmap.sh -Xmx10g threads=4 in1=R1.fq.gz in2=R2.fq.gz out=file.bam maxindel=2000 intronlen=10 ambig=random ref=your_query.fa mappedonly=t

This will write only mapped reads to the BAM file (this requires samtools to be in your PATH; otherwise SAM format will be used). If you only want to see how many reads are aligning, omit out=. All the stats will still be written to STDERR.

Was there no alignment even if you used command defaults for bwa and bowtie2?


There was no alignment with bowtie2 (all default parameters, with both --end-to-end and --local) or with bwa mem.


Hi, I have tried bbmap.sh and it shows 0.0011% mapped reads. I am reading the BBMap reference guide (it is new to me). Thank you for the help. Please give me more suggestions.


Not something you want to hear, but here goes:

  • Your data only has a tiny fraction of reads that map to the query you have.
  • You could try mapping the reads from each file (R1 or R2) as single-end data to the query: remove in1= and in2= from the command line above and use in= with the R1 file, then with the R2 file. See if the mapping improves; if a good number of reads align, other explanations can be explored from there.
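
The single-end variant could be sketched like this. It reuses the options from the paired command earlier in the thread; the wrapper function map_single_end and the file names are hypothetical, and running each file separately is one reading of the suggestion above.

```shell
# Sketch: map one read file at a time as single-end data.
# map_single_end and the file names are hypothetical placeholders.
map_single_end() {
    local reads="$1"
    local query_fa="$2"

    # Same options as the paired command, but with a single in= file.
    # No out= is given, so only the mapping statistics are reported (on STDERR).
    bbmap.sh -Xmx10g threads=4 in="$reads" ref="$query_fa" \
        maxindel=2000 intronlen=10 ambig=random
}

# Example calls (uncomment once the files exist):
# map_single_end R1.fq.gz your_query.fa
# map_single_end R2.fq.gz your_query.fa
```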

Take 10 reads (from R1 and R2) and BLAST them at NCBI to confirm that you are looking at the correct sample set and that the reads are aligning to the right genome. That would eliminate the possibility that you have contamination of some kind.
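
One way to pull a handful of reads out for BLAST is to convert the first few FASTQ records to FASTA. A minimal sketch, assuming standard 4-line FASTQ records; the function name fastq_head_to_fasta and the output file name are made up for illustration, and it takes the first reads in the file rather than a random sample.

```shell
# Sketch: print the first N reads of a FASTQ stream (stdin) as FASTA.
# Assumes each record is exactly 4 lines: header, sequence, '+', qualities.
fastq_head_to_fasta() {
    local n="${1:-10}"
    head -n "$((n * 4))" |
        awk 'NR % 4 == 1 { print ">" substr($0, 2) }   # header: swap leading @ for >
             NR % 4 == 2 { print }'                    # sequence line
}

# Example call (uncomment once the file exists):
# gzip -dc R1.fq.gz | fastq_head_to_fasta 10 > R1_first10.fa
```

The resulting FASTA can be pasted into the NCBI BLAST web form.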


It shows almost the same result (0.0008%). I have checked a few reads at NCBI. They match my plant species.


Then the only plausible explanation is this:

Your data only has a tiny fraction of reads that map to the query you have