Question

finding reads from fastq files corresponds to particular transcript from fasta file

0

Entering edit mode

8.4 years ago

amoltej ▴ 100

Hello all,

I have performed mRNA seq for few samples and analysed them by using edgeR pipeline. Here I am looking for one particular trancript's expression profile. After analysis I find out that it is not expressing at all in any sample. This transcript does not have single read matching from the data set after featureCount analysis.

But when I did tblastx with the trinity denovo fasta file, this transcript is present there. which means there is a transcript present matching exactly with the query transcript that I am looking for. This means there are some reads present corresponds to my query transcript. but this is not being detected by the edgeR pipeline.

Can somebody please tell me how can I retrieve all the fastq reads corresponding to the particular transcript from fastq file?

Thank you in advance.

RNA-Seq • 3.1k views

ADD COMMENT • link updated 20 months ago by Ram 43k • written 8.4 years ago by amoltej ▴ 100

Ram · Answer 1 · 2015-11-21

1

Entering edit mode

8.4 years ago

Matt Shirley 10k

You could try a tool like RapMap

You should be able to use your assembled transcripts and map your reads to your assembly. This would answer the question of which reads were used in assembly that were not counted for differential expression analysis.

ADD COMMENT • link updated 4.4 years ago by Ram 43k • written 8.4 years ago by Matt Shirley 10k

0

Entering edit mode

Yup; RapMap should handle this use case well. You could simply map all of your reads to your assembled transcriptome and then look for ones that map to your transcript of interest, or, if you're only interested in knowing what maps to this transcript, simply build the index on that transcript alone.

ADD REPLY • link updated 4.4 years ago by Ram 43k • written 8.4 years ago by Rob 6.5k

Ram · Answer 2 · 2015-11-21

1

Entering edit mode

8.4 years ago

Adrian Pelin ★ 2.6k

You can use the bbmap toolkit as such:

bbduk.sh -Xmx128544m in=YOUR_FASTQ_file.fastq ref=Your_transcript.fasta k=13 outm=OUTPUT_FASTQ_file_with_transcript_reads_only.fastq

This will generated a fastq file with reads matching your transcript as you requested. The fasta file you supply should have only your transcript of interest for this to work as you want it.

ADD COMMENT • link updated 4.4 years ago by Ram 43k • written 8.4 years ago by Adrian Pelin ★ 2.6k

2

Entering edit mode

Additionally, you can separate the reads into one file per fasta sequence in a single command with Seal (also part of BBMap):

seal.sh ref=transcripts.fasta in=reads.fastq pattern=out_%.fastq

ADD REPLY • link updated 4.4 years ago by Ram 43k • written 8.4 years ago by Brian Bushnell 20k

Ram · Answer 3 · 2015-11-21

0

Entering edit mode

8.4 years ago

abascalfederico ★ 1.2k

You could make a fasta file with your transcripts and then align the reads to these transcripts. Or, if you have a genome and transcript annotations, you could use tophat for aligning reads to exons.

If you are only interested in one particular transcript, you can rely on blast as you already have done.

HTH
Federico

ADD COMMENT • link updated 4.4 years ago by Ram 43k • written 8.4 years ago by abascalfederico ★ 1.2k