Question

Find number of reads from particular transcript

0

8.6 years ago

shmaisrael • 0

I have a few FASTQ files from an RNA-seq experiment, and somebody asked me to check the number of reads that come from retroelements L1-ORF1/2p. Is there a simple way to BLAST the whole FASTQ file against the sequence of the elements (instead of aligning it to the reference genome using hisat/bowtie/bwa etc.) and count the matching reads in each sample? I could try grep, but it may take a long time. Thank you for any advice.

RNA-Seq fastq • 2.0k views

ADD COMMENT • link 8.6 years ago by shmaisrael • 0

0

instead of aligning it to the reference genome using hisat/bowtie/bwa etc.

Since you are talking about RNA-seq, you want a splice-aware aligner, so bwa and bowtie will not suffice. HISAT is a good one; alternatives are STAR and BBMap.

ADD REPLY • link 8.6 years ago by WouterDeCoster 48k

0

Thank you. I'll try to run it after indexing. By the way, is it possible to run BAM files against an indexed sequence?

ADD REPLY • link 8.6 years ago by shmaisrael • 0

1

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

If you only need a rough idea of how many reads could be from that transcript, you may be able to use kallisto or Salmon to do pseudoalignments.
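As a rough sketch of how the kallisto route could end: `kallisto quant` writes an `abundance.tsv` file with the columns `target_id`, `length`, `eff_length`, `est_counts` and `tpm`. Assuming the index was built from a FASTA file whose L1 records carry names starting with `L1_` (a hypothetical naming scheme, adjust the pattern to your own sequence names), something like the following would sum the estimated counts for the matching targets. The function name and the example table are illustrative only.

```python
import csv
import io


def sum_est_counts(abundance_tsv, name_pattern):
    """Sum kallisto est_counts for targets whose ID contains name_pattern.

    abundance_tsv is an open text handle on a kallisto abundance.tsv file.
    """
    reader = csv.DictReader(abundance_tsv, delimiter="\t")
    total = 0.0
    for row in reader:
        if name_pattern in row["target_id"]:
            total += float(row["est_counts"])
    return total


# Tiny made-up table in the abundance.tsv layout, for illustration only.
example = io.StringIO(
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "L1_ORF1\t1017\t818\t120.5\t10.0\n"
    "L1_ORF2\t3828\t3629\t79.5\t5.0\n"
    "GAPDH\t1285\t1086\t5000\t900.0\n"
)
print(sum_est_counts(example, "L1_"))
```

On real output you would pass an open file handle instead, one `abundance.tsv` per sample. Keep in mind that these are estimated counts from pseudoalignment, so they give a rough idea and not an exact read tally.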

ADD REPLY • link 8.6 years ago by GenoMax 154k

0

Thanks. Could you please update the link to instructions?

ADD REPLY • link 8.6 years ago by shmaisrael • 0

0

Again, please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. I have moved your post now.

ADD REPLY • link 8.6 years ago by WouterDeCoster 48k

score 0 · Answer 1 · 2017-04-06

You are better off with NGS aligners like bwa. Reasons:

1) They would be faster than blast / grep.

2) They are quality aware.

3) grep cannot do a fuzzy search, so reads with a sequencing error or polymorphism will not be found with grep. grep is also terribly slow for NGS searches unless you optimize the search parameters and restrict the search to a specific locale (for example, LC_ALL=C).

4) It's super easy to do! (Just index the spliced sequence and run bwa mem, for example.)
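The bwa route from point 4 could be sketched roughly as follows. The two shell commands in the comments are standard bwa usage, but the file names (`l1.fa`, `sample.fastq`, `sample.sam`) and the helper function are placeholders and not something from this thread. The function counts each mapped read once by checking the SAM FLAG field: 0x4 marks an unmapped read, while 0x100 and 0x800 mark secondary and supplementary alignments that would otherwise be counted twice.

```python
# Assumed preparation, run in a shell (file names are placeholders):
#   bwa index l1.fa
#   bwa mem l1.fa sample.fastq > sample.sam

def count_mapped_reads(sam_lines):
    """Count primary alignments of mapped reads in SAM-formatted lines."""
    mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blank and header lines
            continue
        flag = int(line.split("\t")[1])
        if flag & 0x4:      # read is unmapped
            continue
        if flag & 0x900:    # secondary (0x100) or supplementary (0x800)
            continue
        mapped += 1
    return mapped


# Made-up SAM records for illustration only.
example_sam = [
    "@SQ\tSN:L1\tLN:6000",
    "r1\t0\tL1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTTTTTTTT\tIIIIIIIIII",
    "r3\t16\tL1\t250\t60\t10M\t*\t0\t0\tGGGGCCCCAA\tIIIIIIIIII",
    "r3\t2048\tL1\t900\t60\t5M5H\t*\t0\t0\tGGGGC\tIIIII",
]
print(count_mapped_reads(example_sam))
```

For a real sample you would pass an open file handle, for example `with open("sample.sam") as fh: count_mapped_reads(fh)`, once per sample. With paired-end data each mate is counted separately, so halve the number if you want fragments.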

score 0 · Answer 2 · 2017-04-06

0

8.6 years ago

Devon Ryan 105k

Since you're interested in a repeat, you might find TEtranscripts useful. That will allow more accurate quantification. You might also try TEcount from my pull request on that repository.

ADD COMMENT • link 8.6 years ago by Devon Ryan 105k