Find the number of reads from a particular transcript
8.4 years ago
shmaisrael • 0

I got a few FASTQ files from an RNA-seq experiment, and somebody asked me to check the number of reads that come from the retroelements L1-ORF1/2p. Is there a simple way to BLAST the whole FASTQ file against the sequence of the elements (instead of aligning it to the reference genome using hisat/bowtie/bwa etc.) and count the reads in each sample? I could try grep, but it may take a long time. Thank you for any advice.

RNA-Seq fastq • 1.9k views

"instead of align it to the reference genome using hisat/bowtie/bwa etc"

Since you are talking about RNA-seq, you want a splice-aware aligner when aligning to the reference genome, so bwa and bowtie will not suffice. HISAT is a good one; alternatives are STAR and BBMap.


Thank you. I'll try to run it after indexing. By the way, is it possible to run BAM files against an indexed sequence?


Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

If you only need a rough idea of the reads that could be from that transcript you may be able to use kallisto or Salmon to do pseudoalignments.
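As a rough illustration of the pseudoalignment route: Salmon writes a tab-separated `quant.sf` table whose columns include `Name` and `NumReads`, so the estimated read count for the element can be pulled out with a few lines of Python. This is only a sketch. The target names (`L1_ORF1`, `L1_ORF2`), the helper name, and the numbers in the toy table are hypothetical placeholders, not results from any real sample.

```python
import csv
import io


def reads_for_targets(quant_sf_handle, target_names):
    """Return {target name: estimated read count} from a Salmon quant.sf table."""
    reader = csv.DictReader(quant_sf_handle, delimiter="\t")
    return {
        row["Name"]: float(row["NumReads"])
        for row in reader
        if row["Name"] in target_names
    }


# Toy quant.sf content (made-up values, for illustration only).
toy_quant = io.StringIO(
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "L1_ORF1\t1017\t850.0\t120.5\t340.0\n"
    "L1_ORF2\t3828\t3661.0\t45.2\t512.0\n"
    "GAPDH\t1285\t1118.0\t900.1\t7800.0\n"
)

counts = reads_for_targets(toy_quant, {"L1_ORF1", "L1_ORF2"})
print(counts)
```

For a real run, open each sample's `quant.sf` with `open(path, newline="")` and pass the handle in place of the toy table. Because repeats are multi-mapped, these estimates are approximate.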


Thanks. Could you please update the link to the instructions?


Again. Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. I moved your post now.

8.4 years ago

You are better off with NGS aligners like bwa. Reasons:

1) They would be faster than blast / grep.

2) They are quality aware.

3) grep cannot do a fuzzy search, so reads with sequencing errors or polymorphisms will not be found. grep is also terribly slow for NGS searches unless you optimize the search parameters and restrict the locale (for example, LC_ALL=C).

4) It's super easy to do! (Just index the spliced sequence and run bwa mem, for example.)
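A minimal sketch of point 4, assuming `bwa` is installed and on the PATH: index the element sequence, run `bwa mem`, and count reads whose SAM flag marks them as mapped (skipping secondary and supplementary records so each read is counted once). The function names and the toy SAM records are made up for illustration. The toy records only exercise the counting step, and nothing is aligned unless you call `align_and_count` yourself.

```python
import shutil
import subprocess


def count_mapped_reads(sam_lines):
    """Count primary alignments of mapped reads in an iterable of SAM lines."""
    mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        flag = int(line.split("\t")[1])
        if flag & 0x4:
            continue  # read is unmapped
        if flag & 0x900:
            continue  # secondary (0x100) or supplementary (0x800) record
        mapped += 1
    return mapped


def align_and_count(reference_fasta, fastq):
    """Index the reference, align with bwa mem, and count mapped reads."""
    if shutil.which("bwa") is None:
        raise RuntimeError("bwa not found on PATH")
    subprocess.run(["bwa", "index", reference_fasta], check=True)
    proc = subprocess.Popen(
        ["bwa", "mem", reference_fasta, fastq],
        stdout=subprocess.PIPE,
        text=True,
    )
    count = count_mapped_reads(proc.stdout)
    if proc.wait() != 0:
        raise RuntimeError("bwa mem failed")
    return count


# Toy SAM records (hypothetical) to show the counting logic.
toy_sam = [
    "@SQ\tSN:L1\tLN:6000",
    "r1\t0\tL1\t100\t60\t50M\t*\t0\t0\t*\t*",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r3\t16\tL1\t200\t60\t50M\t*\t0\t0\t*\t*",
    "r3\t2064\tL1\t900\t60\t20M30H\t*\t0\t0\t*\t*",
]
print(count_mapped_reads(toy_sam))
```

With paired-end data each mate is counted separately. Reads from related repeats elsewhere in the genome may also align to the element when it is the only reference, so treat the number as an upper-bound estimate.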

8.4 years ago

Since you're interested in a repeat, you might find TEtranscripts useful. That will allow more accurate quantification. You might also try TEcount from my pull request on that repository.
