Question: Finding reads in a FASTQ file that correspond to a particular transcript from a FASTA file
amoltej (Australia) wrote 3.4 years ago:

Hello all, 

I have performed mRNA-seq on a few samples and analysed them using the edgeR pipeline. I am looking for the expression profile of one particular transcript. After the analysis I found that it is not expressed at all in any sample: not a single read from the data set is assigned to this transcript after the featureCounts analysis.

But when I ran tblastx against the Trinity de novo assembly FASTA file, the transcript is present there, which means the assembly contains a transcript that exactly matches the query transcript I am looking for. So there must be some reads corresponding to my query transcript, yet they are not being detected by the edgeR pipeline.

Can somebody please tell me how I can retrieve all the reads corresponding to this particular transcript from the FASTQ file?

thank you in advance. 

 

rna-seq searchingdata • 1.4k views
Matt Shirley (Cambridge, MA) wrote 3.4 years ago:

You could try a tool like RapMap: https://github.com/COMBINE-lab/RapMap

You should be able to use your assembled transcripts and map your reads to your assembly. This would answer the question of which reads were used in assembly that were not counted for differential expression analysis.


Yup; RapMap should handle this use case well.  You could simply map all of your reads to your assembled transcriptome and then look for ones that map to your transcript of interest, or, if you're only interested in knowing what maps to this transcript, simply build the index on that transcript alone.

Reply written 3.4 years ago by Rob
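
As a rough illustration of the "map everything, then look for reads that hit your transcript" approach described above, here is a minimal Python sketch. It assumes the mapper wrote standard SAM output and that the FASTQ file uses plain 4-line records; the function names and the transcript ID in the usage note are hypothetical, and this is not part of RapMap itself.

```python
def reads_mapped_to(sam_lines, transcript_id):
    """Collect names of reads aligned to transcript_id from SAM-formatted lines."""
    names = set()
    for line in sam_lines:
        if line.startswith("@"):        # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:            # not a complete alignment record
            continue
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        if flag & 0x4:                  # SAM flag bit 0x4 means "read unmapped"
            continue
        if rname == transcript_id:
            names.add(qname)
    return names


def filter_fastq(fastq_lines, names):
    """Yield 4-line FASTQ records (as lists) whose read name is in names.

    Assumes well-formed, unwrapped FASTQ: exactly four lines per record.
    """
    record = []
    for line in fastq_lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            # Read name = header without '@', up to the first whitespace
            read_name = record[0][1:].split()[0]
            # Drop an old-style /1 or /2 mate suffix so it matches the SAM QNAME
            if read_name.endswith(("/1", "/2")):
                read_name = read_name[:-2]
            if read_name in names:
                yield record
            record = []
```

In use, you would pass open file handles, for example `names = reads_mapped_to(open("mapped.sam"), "TRINITY_DN1_c0_g1_i1")` followed by a loop over `filter_fastq(open("reads.fastq"), names)` that writes each record back out; both file names and the transcript ID here are placeholders.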
Adrian Pelin (Canada) wrote 3.4 years ago:

You can use the bbmap toolkit as such:

bbduk.sh -Xmx128544m in=YOUR_FASTQ_file.fastq ref=Your_transcript.fasta k=13 outm=OUTPUT_FASTQ_file_with_transcript_reads_only.fastq

This will generate a FASTQ file containing the reads that match your transcript, as you requested. For this to work as intended, the FASTA file you supply should contain only your transcript of interest.

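
To make the idea behind k-mer matching with k=13 a bit more concrete, here is a small Python sketch: a read is kept if it shares at least one 13-mer with the transcript on either strand. This is a simplified illustration under that assumption, not BBDuk's actual implementation, and all function names are made up for the example.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (unknown bases become N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))


def kmers(seq, k):
    """Return the set of all k-mers in seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_reference_kmers(transcript, k):
    """Index the transcript's k-mers on both strands."""
    return kmers(transcript, k) | kmers(reverse_complement(transcript), k)


def read_matches(read, ref_kmers, k):
    """True if the read shares at least one k-mer with the reference index."""
    return any(kmer in ref_kmers for kmer in kmers(read, k))


# Toy example with a made-up transcript sequence
transcript = "ACGTTGCATGCCGATAGCTAGGCTAACGT"
index = build_reference_kmers(transcript, k=13)
print(read_matches(transcript[5:25], index, k=13))
```

A short k such as 13 makes the filter sensitive, but it also raises the chance that unrelated reads share a k-mer by coincidence, so the output is worth checking (for example by aligning the retained reads back to the transcript).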

Additionally, you can separate the reads into one file per fasta sequence in a single command with Seal (also part of BBMap):

seal.sh ref=transcripts.fasta in=reads.fastq pattern=out_%.fastq

Reply written 3.4 years ago by Brian Bushnell
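
For readers curious what "one file per fasta sequence" amounts to, the following Python sketch bins reads by the reference sequence with which they share the most k-mers. It is only a conceptual illustration: the tie-breaking rule (first reference wins), the in-memory dictionaries, and the function names are assumptions of this sketch and say nothing about how Seal behaves internally.

```python
from collections import defaultdict


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (unknown bases become N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))


def kmers(seq, k):
    """Return the set of all k-mers in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bin_reads(reads, references, k):
    """Assign each read to the reference sharing the most k-mers with it.

    reads and references are dicts mapping name -> sequence.
    Returns a dict mapping reference name -> list of read names.
    Reads sharing no k-mer with any reference are left out.
    """
    # Index every reference on both strands
    ref_index = {
        name: kmers(seq, k) | kmers(reverse_complement(seq), k)
        for name, seq in references.items()
    }
    bins = defaultdict(list)
    for read_name, read_seq in reads.items():
        read_kmers = kmers(read_seq, k)
        best_ref, best_hits = None, 0
        for ref_name, ref_kmers in ref_index.items():
            hits = len(read_kmers & ref_kmers)
            if hits > best_hits:        # ties keep the first reference seen
                best_ref, best_hits = ref_name, hits
        if best_ref is not None:
            bins[best_ref].append(read_name)
    return dict(bins)
```

Each list in the returned dictionary corresponds to what would end up in one output file, so writing `bins[name]` to a file named after the reference mirrors the `pattern=out_%.fastq` idea in the command above.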
abascalfederico (Spain) wrote 3.4 years ago:

You could make a fasta file with your transcripts and then align the reads to these transcripts. Or, if you have a genome and transcript annotations, you could use tophat for aligning reads to exons.

If you are only interested in one particular transcript, you can rely on blast as you already have done.

HTH

Federico
