finding reads from fastq files corresponds to particular transcript from fasta file
3
0
Entering edit mode
8.4 years ago
amoltej ▴ 100

Hello all,

I have performed mRNA seq for few samples and analysed them by using edgeR pipeline. Here I am looking for one particular trancript's expression profile. After analysis I find out that it is not expressing at all in any sample. This transcript does not have single read matching from the data set after featureCount analysis.

But when I did tblastx with the trinity denovo fasta file, this transcript is present there. which means there is a transcript present matching exactly with the query transcript that I am looking for. This means there are some reads present corresponds to my query transcript. but this is not being detected by the edgeR pipeline.

Can somebody please tell me how can I retrieve all the fastq reads corresponding to the particular transcript from fastq file?

Thank you in advance.

RNA-Seq • 3.1k views
ADD COMMENT
1
Entering edit mode
8.4 years ago

You could try a tool like RapMap

You should be able to use your assembled transcripts and map your reads to your assembly. This would answer the question of which reads were used in assembly that were not counted for differential expression analysis.

ADD COMMENT
0
Entering edit mode

Yup; RapMap should handle this use case well. You could simply map all of your reads to your assembled transcriptome and then look for ones that map to your transcript of interest, or, if you're only interested in knowing what maps to this transcript, simply build the index on that transcript alone.

ADD REPLY
1
Entering edit mode
8.4 years ago
Adrian Pelin ★ 2.6k

You can use the bbmap toolkit as such:

bbduk.sh -Xmx128544m in=YOUR_FASTQ_file.fastq ref=Your_transcript.fasta k=13 outm=OUTPUT_FASTQ_file_with_transcript_reads_only.fastq

This will generated a fastq file with reads matching your transcript as you requested. The fasta file you supply should have only your transcript of interest for this to work as you want it.

ADD COMMENT
2
Entering edit mode

Additionally, you can separate the reads into one file per fasta sequence in a single command with Seal (also part of BBMap):

seal.sh ref=transcripts.fasta in=reads.fastq pattern=out_%.fastq
ADD REPLY
0
Entering edit mode
8.4 years ago
abascalfederico ★ 1.2k

You could make a fasta file with your transcripts and then align the reads to these transcripts. Or, if you have a genome and transcript annotations, you could use tophat for aligning reads to exons.

If you are only interested in one particular transcript, you can rely on blast as you already have done.

HTH
Federico

ADD COMMENT

Login before adding your answer.

Traffic: 2714 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6