SNV and indel calling with RNA-seq data without a reference genome
3.8 years ago
sam891 ▴ 10

Hello everyone,

Is it possible to do SNV and indel calling on differentially expressed genes with RNA-seq data and without a reference genome? What are the best pipelines?

Thank you.

rna-seq SNP genome snp RNA-Seq • 1.6k views
3.8 years ago
eyonesi ▴ 60

Hi, I am running a similar project. First, to create a reference transcriptome, assemble the reads with a de novo assembler such as Trinity or Bridger. I guess you have already done this step, because differentially expressed genes are available to you. The second step is to create supertranscripts and an annotation file. If your assembler was Trinity, the following link (the GATK-based variant calling pipeline bundled with Trinity) will be helpful: https://github.com/trinityrnaseq/trinityrnaseq/wiki/Variant-Calling For other assemblers, you can use the Lace software to build the supertranscripts and annotation file from your reference transcriptome. The GATK pipeline is then used to produce the VCF file.
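The three steps above (assembly, supertranscripts, variant calling) can be sketched as a small Python helper that only builds and prints the command lines, so nothing runs until you have checked them. The script paths and flags are written as I recall them from the Trinity wiki linked above and should be verified against your installed version. The install path `/opt/trinityrnaseq`, the read file names, the memory and CPU values, and the location of `Trinity.fasta` are all assumptions for illustration.

```python
import shlex
from pathlib import Path


def build_pipeline_commands(trinity_home, left_fq, right_fq, out_dir="trinity_out"):
    """Return the three command lines (as argv lists) for the steps described above."""
    home = Path(trinity_home)
    # Assumed assembly location; some Trinity versions write <out_dir>.Trinity.fasta instead.
    assembly = f"{out_dir}/Trinity.fasta"

    # Step 1: de novo assembly of the reads into a reference transcriptome.
    assemble = [
        str(home / "Trinity"),
        "--seqType", "fq",
        "--left", left_fq,
        "--right", right_fq,
        "--max_memory", "50G",  # placeholder resources, adjust to your machine
        "--CPU", "8",
        "--output", out_dir,
    ]

    # Step 2: collapse each gene's isoforms into a supertranscript plus a GTF annotation.
    supertranscripts = [
        str(home / "Analysis/SuperTranscripts/Trinity_gene_splice_modeler.py"),
        "--trinity_fasta", assembly,
    ]

    # Step 3: align the reads to the supertranscripts and call variants with GATK.
    call_variants = [
        str(home / "Analysis/SuperTranscripts/AllelicVariants/run_variant_calling.py"),
        "--st_fa", "trinity_genes.fasta",
        "--st_gtf", "trinity_genes.gtf",
        "-p", left_fq, right_fq,
        "-o", "variant_calls_outdir",
    ]
    return [assemble, supertranscripts, call_variants]


if __name__ == "__main__":
    for argv in build_pipeline_commands("/opt/trinityrnaseq", "reads_1.fq", "reads_2.fq"):
        print(shlex.join(argv))
```

Printing the commands first makes it easy to compare them with the wiki page before passing each list to `subprocess.run`.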


Do you think that this pipeline will work with aligners other than STAR? What about CLC Genomics Workbench?


Dear sam, sorry for the delayed response. STAR is a splice-aware aligner and is suitable for the SNP calling step. I am not sure about other aligners, but other splice-aware aligners, such as TopHat, may also be appropriate. CLC is a comprehensive and very useful analysis package, but I have not worked with it, so I don't know the details or which aligner it uses at this step.

3.8 years ago

Please elaborate further on what you are aiming to do, and be more specific on what data you currently have.

If you have X number of bulk RNA-seq samples, you could technically create a consensus transcriptome via de novo assembly and then find a way to call variants against it. Be wary of the pitfalls of using RNA-seq data for variant calling, though: A: Inferring genotype based on RNA sequences

Kevin


I want to investigate the SNVs and indels of differentially expressed genes between various genotypes of the same plant species (18 RNA-seq samples). The genotypes are divided into 2 groups according to differences in their content of specific metabolites. Unfortunately, the genome is not available yet.
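A minimal sketch of the group comparison described above, assuming the calls for all samples end up in one multi-sample VCF whose FORMAT column starts with `GT`. It reports sites where the ALT allele frequency differs strongly between the two metabolite groups. The sample names, the `min_diff` threshold, and the function names are hypothetical, and genotypes inferred from RNA-seq carry the caveats raised elsewhere in this thread.

```python
def alt_allele_frequency(genotypes):
    """Fraction of ALT alleles among called alleles; None if nothing was called."""
    alleles = []
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele != ".":  # skip missing calls such as ./.
                alleles.append(allele)
    if not alleles:
        return None
    return sum(a != "0" for a in alleles) / len(alleles)


def group_differences(vcf_lines, group_a, group_b, min_diff=0.8):
    """Yield (contig, pos, freq_a, freq_b) where ALT frequency differs between groups."""
    samples = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the nine fixed VCF columns
            continue
        # Assumes GT is the first FORMAT key, as the VCF spec requires when GT is present.
        gts = {s: f.split(":")[0] for s, f in zip(samples, fields[9:])}
        freq_a = alt_allele_frequency([gts[s] for s in group_a])
        freq_b = alt_allele_frequency([gts[s] for s in group_b])
        if freq_a is None or freq_b is None:
            continue
        if abs(freq_a - freq_b) >= min_diff:
            yield fields[0], int(fields[1]), freq_a, freq_b


if __name__ == "__main__":
    header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\thigh1\thigh2\tlow1\tlow2"
    vcf = [
        "##fileformat=VCFv4.2",
        header,
        "gene1\t120\t.\tA\tG\t50\tPASS\t.\tGT:DP\t1/1:30\t1/1:25\t0/0:40\t0/0:22",
        "gene1\t300\t.\tC\tT\t50\tPASS\t.\tGT:DP\t0/1:30\t0/1:25\t0/1:40\t./.:0",
    ]
    for hit in group_differences(vcf, ["high1", "high2"], ["low1", "low2"]):
        print(hit)
```

With the toy records above, only the site at position 120 is reported, because the second site has the same ALT frequency in both groups.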


Cool - thanks. Then, perhaps, first creating a reference transcriptome via de novo assembly may be a good starting point.


Which pipeline do you recommend for SNV and indel calling with this setup?


I don't recommend variant calling on RNA-seq data unless one is very familiar with the pitfalls of doing this (see my link).

GATK has a pipeline for it, but it's not well tested, as far as I know.


Very useful, thank you!


Please do look at the solution mentioned by eyonesi, too. I have not used Trinity but they may have a good solution for de novo assembly and variant calling.
