Question: Best Pipeline For Doing Variant Analysis On Rna-Seq Data?
4.2 years ago by
adrianj.randall30 wrote:

Hello everyone,

I inherited some colorspace data from various samples that I am trying to make use of. I don't have access to LifeScope/BioScope, and finding open-source tools that handle colorspace data well seems like a challenge. What I have are basically *.bam files of mapped/unmapped reads as well as *.csfasta and *.quals files. What I am trying to do is perform variant analysis to see if I can find SNPs/indels in my data, both across samples and in comparison to the reference genome.

I am thinking about using the GATK pipeline, but I wanted to ask if there is anything 'better' for doing what I'd like to do. The bam files were generated using the hg18 reference genome, and common aligners like bwa don't seem to support colorspace anymore. From what I understand, due to differences in how errors are handled, converting the csfasta/quals files to fastq files isn't recommended, although it seems there are some people who analyze colorspace data this way.

Can anyone here recommend a pipeline for me to basically take my RNA-seq data and either 1) re-align using a newer reference genome or 2) use the existing *.bam files to perform variant analysis to find sequence differences?

Thanks, and all the best.

variant analysis pipeline rna-seq • 4.4k views

Keep in mind that most aligners being discussed here are not splice-aware. Using a non-splice-aware aligner is likely to make your job of finding variants in RNA-seq even harder (and it is a hard problem to begin with).
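One quick way to check whether an existing BAM came from a splice-aware aligner is to look for the N (skipped region) operation in the CIGAR strings, which is how SAM records a read that spans an intron. A minimal Python sketch, assuming the CIGAR strings have already been extracted (for example from column 6 of `samtools view` output); the function names are made up for illustration:

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def is_spliced(cigar):
    """Return True if the CIGAR string contains an N (skipped region) operation."""
    return any(op == "N" for _, op in CIGAR_OP.findall(cigar))


def spliced_fraction(cigars):
    """Fraction of aligned reads whose CIGAR contains at least one N operation."""
    aligned = [c for c in cigars if c != "*"]  # "*" means no alignment recorded
    if not aligned:
        return 0.0
    return sum(is_spliced(c) for c in aligned) / len(aligned)


if __name__ == "__main__":
    example = ["50M", "20M1500N30M", "*", "48M2S"]
    print(spliced_fraction(example))
```

A fraction at or near zero on RNA-seq data would suggest the reads were aligned without splice awareness.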

Comment written 4.2 years ago by Sean Davis
4.2 years ago by
Steve Lianoglou4.7k
US
Steve Lianoglou4.7k wrote:

> From what I understand, due to differences in how errors are handled, converting the csfasta/quals files to fastq files isn't recommended, although it seems there are some people who analyze colorspace data this way.

Yeah, don't do it this way.
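The reason is easiest to see from the decoding itself. In SOLiD di-base encoding each color describes the transition between two adjacent bases, so translating a read to nucleotides means walking the colors from the primer base onward, and one miscalled color changes every base after it. A minimal Python sketch (the function name is illustrative, not taken from any SOLiD tool):

```python
# 2-bit codes chosen so that color = code(base1) XOR code(base2),
# which reproduces the SOLiD di-base color table.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"


def decode(csread):
    """Translate a colorspace read such as 'T0120' into nucleotides.

    The first character is the known primer base; it is not part of the
    decoded sequence.
    """
    prev = CODE[csread[0]]
    bases = []
    for color in csread[1:]:
        prev ^= int(color)  # next base = previous base XOR color
        bases.append(BASE[prev])
    return "".join(bases)


if __name__ == "__main__":
    good = "T32013201"
    bad = "T32113201"  # same read with the third color miscalled
    print(decode(good))
    print(decode(bad))
```

The two decoded reads agree on the first two bases and disagree on every base from the third onward. An aligner working in colorspace can instead treat the isolated color mismatch as a likely sequencing error.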

> Can anyone here recommend a pipeline for me to basically take my RNA-seq data and either 1) re-align using a newer reference genome

If I'm not mistaken, colorspace alignment was dropped in bwa 0.6, so you should be able to use the latest 0.5.x release (0.5.10) to align your *.csfasta and *.qual files to hg19.
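A rough Python sketch of what that bwa 0.5.x run could look like. It only builds the command lines and does not run them. The `-c` flag on `index` and `aln` is the colorspace switch in the 0.5.x releases; the file names are placeholders, and the input FASTQ is assumed to have been prepared from the csfasta/qual pair with the conversion script shipped with bwa 0.5.x (check that release's documentation for the exact usage).

```python
import shlex


def bwa_colorspace_commands(reference, reads_fastq):
    """Return bwa 0.5.x command lines for single-end colorspace alignment."""
    return [
        # Build a colorspace index from the nucleotide reference FASTA.
        ["bwa", "index", "-a", "bwtsw", "-c", reference],
        # Align in colorspace; bwa writes the .sai result to stdout.
        ["bwa", "aln", "-c", reference, reads_fastq],
        # Convert the .sai (assumed saved as reads.sai) to SAM, also on stdout.
        ["bwa", "samse", reference, "reads.sai", reads_fastq],
    ]


if __name__ == "__main__":
    # Placeholder file names; redirect stdout of steps 2 and 3 to files.
    for cmd in bwa_colorspace_commands("hg19.fa", "reads.fastq"):
        print(shlex.join(cmd))
```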

I'm pretty sure that SHRiMP also aligns colorspace data, so you can try that; it has had a more recent release than the bwa-0.5.x branch.

4.2 years ago by
Philadelphia
Ashutosh Pandey10k wrote:

Hey

Second question: if you plan to work with the existing BAM files, you are better off using GATK. Make sure you do the realignment around indels, then call SNPs and indels with UnifiedGenotyper. I think most aligners that can handle SOLiD reads output a BAM file containing both the nucleotide read (colorspace is translated to nucleotides during alignment) and the colorspace-encoded read. The colorspace read is kept in the BAM file so that downstream SOLiD-specific tools can use it when calling SNPs and indels. I am not aware of any current open-source variant caller that uses color information when calling SNPs and indels. I assume GATK, Samtools, and other tools use the nucleotide reads (colors translated to nucleotides) in the BAM file, and that works fairly well.
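The realign-then-call sequence can be sketched as three GATK invocations. This Python snippet only assembles the command lines; the jar location and file names are assumptions. The walkers shown belong to GATK 3 and earlier (they are not part of GATK 4), and the reference passed with `-R` has to be the same build the BAM was aligned to, hg18 in the question.

```python
import shlex

GATK = ["java", "-jar", "GenomeAnalysisTK.jar"]  # assumed jar location


def realign_and_call(bam, reference, prefix):
    """Return the three GATK command lines as argument lists (nothing is run)."""
    intervals = prefix + ".intervals"
    realigned = prefix + ".realigned.bam"
    vcf = prefix + ".vcf"
    return [
        # 1. Find intervals around indels that would benefit from realignment.
        GATK + ["-T", "RealignerTargetCreator", "-R", reference,
                "-I", bam, "-o", intervals],
        # 2. Locally realign the reads in those intervals.
        GATK + ["-T", "IndelRealigner", "-R", reference, "-I", bam,
                "-targetIntervals", intervals, "-o", realigned],
        # 3. Call SNPs and indels together (-glm BOTH).
        GATK + ["-T", "UnifiedGenotyper", "-R", reference, "-I", realigned,
                "-glm", "BOTH", "-o", vcf],
    ]


if __name__ == "__main__":
    for cmd in realign_and_call("sample.bam", "hg18.fa", "sample"):
        print(shlex.join(cmd))
```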

First question: you can use SHRiMP2 to align your SOLiD data. I think it is the best aligner available for SOLiD reads right now. You will have to convert the colorspace reads to csfastq using a script that comes with the BFAST software; let me know and I can get it for you. This csfastq format is different from regular FASTQ: it contains the color-coded read, not the colors translated to nucleotides. SHRiMP2 expects its input in this format. Then you can use GATK to call variants. The mapping quality (Phred) scores produced by SHRiMP differ from those produced by BWA, so you may have to tweak your BAM file before using it with GATK. I think GATK has an option to deal with such issues.
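To make the csfastq idea concrete, here is a small Python sketch that merges one csfasta record and its matching qual record into a FASTQ-style record while leaving the colors untouched. It is not the BFAST script. The layout (primer base kept in the sequence, one Phred+33 character per color, negative qualities clamped to 0) is an assumption that should be checked against what SHRiMP2 actually accepts.

```python
def parse_qual_line(line):
    """Turn a qual line such as '20 15 -1 30' into a list of integers."""
    return [int(x) for x in line.split()]


def to_csfastq(name, colors, quals, offset=33):
    """Build one FASTQ-style record that keeps the read in colorspace.

    name   -- read name without the leading '>'
    colors -- csfasta sequence line, e.g. 'T0120' (primer base + colors)
    quals  -- list of integer qualities, one per color
    """
    n_colors = len(colors) - 1  # assumed: the primer base carries no quality
    if len(quals) != n_colors:
        raise ValueError("expected %d qualities, got %d" % (n_colors, len(quals)))
    # Assumed handling: negative qualities (missing calls) are clamped to 0.
    qual_string = "".join(chr(max(q, 0) + offset) for q in quals)
    return "@%s\n%s\n+\n%s\n" % (name, colors, qual_string)


if __name__ == "__main__":
    record = to_csfastq("1_2_3_F3", "T0120", parse_qual_line("20 15 -1 30"))
    print(record, end="")
```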

Let me know if you need any other help. Be aware that using RNA-seq data for variant calling may give some weird results.

Thanks

2.7 years ago by
Hong Kong/The University of Hong Kong
Ting-You Wang50 wrote:

Currently, I think you can use the STAR and GATK pipeline for RNA-seq variant calling.

http://gatkforums.broadinstitute.org/discussion/3891/calling-variants-in-rnaseq
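A rough outline of that workflow in Python, again only building command lines. The STAR and GATK 3 options used here are the ones I know from their documentation for this workflow; file names, thread count, and the jar path are placeholders, and read-group assignment and duplicate marking are left out for brevity. As far as I know STAR does not read colorspace data, so this applies to base-space reads.

```python
import shlex

GATK = ["java", "-jar", "GenomeAnalysisTK.jar"]  # assumed jar location


def star_gatk_commands(reference, genome_dir, fastq, prefix, threads=4):
    """Return STAR + GATK 3 command lines for RNA-seq variant calling."""
    star_bam = prefix + "Aligned.sortedByCoord.out.bam"  # STAR's output name
    split_bam = prefix + "split.bam"
    return [
        # Build the STAR genome index.
        ["STAR", "--runMode", "genomeGenerate", "--genomeDir", genome_dir,
         "--genomeFastaFiles", reference, "--runThreadN", str(threads)],
        # Splice-aware alignment, written as a coordinate-sorted BAM.
        ["STAR", "--genomeDir", genome_dir, "--readFilesIn", fastq,
         "--runThreadN", str(threads), "--outFileNamePrefix", prefix,
         "--outSAMtype", "BAM", "SortedByCoordinate"],
        # Split reads at N operations and remap STAR's MAPQ 255 to 60.
        GATK + ["-T", "SplitNCigarReads", "-R", reference, "-I", star_bam,
                "-o", split_bam, "-rf", "ReassignOneMappingQuality",
                "-RMQF", "255", "-RMQT", "60", "-U", "ALLOW_N_CIGAR_READS"],
        # Call variants, ignoring soft-clipped bases near splice junctions.
        GATK + ["-T", "HaplotypeCaller", "-R", reference, "-I", split_bam,
                "-dontUseSoftClippedBases", "-stand_call_conf", "20.0",
                "-o", prefix + "raw.vcf"],
    ]


if __name__ == "__main__":
    for cmd in star_gatk_commands("hg19.fa", "star_index", "reads.fastq", "s1_"):
        print(shlex.join(cmd))
```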

4.2 years ago by
decodenomics10
decodenomics10 wrote:

For RNA-Seq data, you need to worry about SNPs called near exon-intron junctions, where an apparent SNP may actually be an alignment mismatch.
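One simple way to act on this is to flag calls that fall within a few bases of a known junction and review them separately. A minimal Python sketch, assuming variant and junction positions on the same chromosome are already available as integers; the function names and the 5 bp default window are arbitrary choices for illustration:

```python
from bisect import bisect_left


def near_junction(pos, junctions, window=5):
    """Return True if pos lies within `window` bases of any junction.

    `junctions` must be a sorted list of positions on the same chromosome.
    """
    # First junction at or after the left edge of the window.
    i = bisect_left(junctions, pos - window)
    return i < len(junctions) and junctions[i] <= pos + window


def flag_variants(positions, junctions, window=5):
    """Pair each variant position with a flag saying whether it is near a junction."""
    junctions = sorted(junctions)
    return [(p, near_junction(p, junctions, window)) for p in positions]


if __name__ == "__main__":
    for pos, flagged in flag_variants([48, 100, 398], [50, 400], window=3):
        print(pos, "near junction" if flagged else "ok")
```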

4.2 years ago by
Sean Davis22k
National Institutes of Health, Bethesda, MD
Sean Davis22k wrote:

I think that NovoalignCS can handle colorspace reads and can perform splice-aware alignments. There is a license cost, but the last time I checked it was very reasonable.
