Question: Ways To Filter Noise From Rnaseq Data
4.8 years ago by
United States
Wayne910 wrote:

Hello all, I am working with both DNA exome sequencing and RNAseq data. The samples are not directly matched, but they are of the same disease, and therefore I was hoping to use the RNAseq data to check for the variants detected in DNA sequencing and vice versa. The problem is that the RNAseq data is so noisy that there are far too many variants (most of which are systematic artifacts) to do this analysis. So far I have filtered using dbSNP, depth and frequency, and I have also filtered out anything detected in a separate panel of RNAseq normal cells that I have. Does anyone have any advice on how to trim the list down further? Perhaps some papers or examples of other groups that have done something similar? Any advice would be greatly appreciated! Thanks for your time.
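For concreteness, the filters described above (depth, frequency, dbSNP, and a panel of normals) could be sketched roughly as follows. This is only an illustration, not the poster's actual pipeline: the `Variant` class, the `filter_variants` function, and the thresholds `min_depth=10` and `min_allele_fraction=0.1` are all hypothetical, and the known-variant sets are assumed to be pre-loaded as `(chrom, pos, ref, alt)` tuples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal record for one single nucleotide variant call."""
    chrom: str
    pos: int
    ref: str
    alt: str
    depth: int      # total reads covering the site
    alt_reads: int  # reads supporting the alternate allele

    @property
    def key(self):
        # Identity used to look the call up in dbSNP or the normal panel.
        return (self.chrom, self.pos, self.ref, self.alt)

    @property
    def allele_fraction(self):
        return self.alt_reads / self.depth if self.depth else 0.0


def filter_variants(variants, dbsnp_keys, normal_panel_keys,
                    min_depth=10, min_allele_fraction=0.1):
    """Keep calls that pass depth and frequency cutoffs and are absent
    from both dbSNP and the panel of normals. Thresholds are placeholders."""
    kept = []
    for v in variants:
        if v.depth < min_depth:
            continue  # too little coverage to trust the call
        if v.allele_fraction < min_allele_fraction:
            continue  # alternate allele too rare; likely noise
        if v.key in dbsnp_keys or v.key in normal_panel_keys:
            continue  # known polymorphism or recurrent artifact in normals
        kept.append(v)
    return kept
```

Keeping each filter as a separate test makes it easy to count how many calls each step removes, which helps show where the remaining noise comes from.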


After your filtering, have you determined whether the variation is of one particular type, e.g. single nucleotide variation, splice differences, etc.? Perhaps you would have to establish a different algorithm for each type of variation?

Reply written 4.8 years ago by Josh Herr

Yes, I have different algorithms for fusions, amplifications, deletions, and variants. What I need is a way to filter the single nucleotide variants specifically, beyond simply looking at the depth and quality. Filtering against a composite normal (a collection of RNAseq samples of "normal", non-tumor cells that correspond in some way to the particular disease you are looking at) seems to be the way to go, but I cannot find a good source of such samples. Additionally, some have suggested that mapping with both bwa and bowtie and taking only the intersection might be beneficial, but for me this has only removed about 4%.
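The two-aligner intersection mentioned above can be sketched as a set operation on the variant calls. This is a minimal illustration under assumptions: each call set is available as VCF-formatted text lines, and the function names `variant_keys` and `intersect_calls` are hypothetical.

```python
def variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from VCF text lines.

    Header lines starting with '#' are skipped, and multi-allelic
    records are split into one key per alternate allele.
    """
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        # The first five VCF columns are CHROM, POS, ID, REF, ALT.
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):
            keys.add((chrom, int(pos), ref, allele))
    return keys


def intersect_calls(vcf_a_lines, vcf_b_lines):
    """Return calls found in both sets, plus the fraction of set A removed."""
    a = variant_keys(vcf_a_lines)
    b = variant_keys(vcf_b_lines)
    shared = a & b
    removed_fraction = 1 - len(shared) / len(a) if a else 0.0
    return shared, removed_fraction
```

A small `removed_fraction`, like the 4% reported here, suggests that both aligners are making largely the same calls, so the intersection alone does little to separate artifacts from real variants.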

Reply modified 4.8 years ago • written 4.8 years ago by Wayne

You are not using a splice-aware aligner such as TopHat, STAR, MapSplice, etc. to align your RNA-seq data to the genome? In my experience attempting to align RNA-seq reads with BWA or Bowtie will lead to many read misplacements, soft-clipping, etc. The end result can be many, many false positive SNVs. If this is happening in your case, investing more effort in achieving high quality RNA-seq BAMs may help your problem considerably...
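One rough way to check whether soft-clipping is a problem in a given set of alignments is to summarize the CIGAR strings (column 6 of a SAM file). The sketch below is an illustrative diagnostic only, not part of any aligner; the function names are hypothetical, and the CIGAR strings are assumed to have been extracted already.

```python
import re

# One CIGAR operation: a length followed by a SAM operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_summary(cigar):
    """Return (soft_clipped_bases, read_length, is_spliced) for one CIGAR."""
    soft_clipped = 0
    read_length = 0
    spliced = False
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "S":
            soft_clipped += n
        if op in "MIS=X":
            # Operations that consume bases of the read sequence.
            read_length += n
        if op == "N":
            # Skipped reference region, i.e. an intron in a spliced alignment.
            spliced = True
    return soft_clipped, read_length, spliced


def soft_clip_fraction(cigars):
    """Fraction of all read bases that are soft-clipped across alignments."""
    clipped = 0
    total = 0
    for cigar in cigars:
        s, length, _ = cigar_summary(cigar)
        clipped += s
        total += length
    return clipped / total if total else 0.0
```

A high soft-clipped fraction combined with no `N` operations at all would be consistent with reads being forced onto the genome without splicing.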

Reply written 4.8 years ago by Malachi Griffith
4.8 years ago by
JC6.2k wrote:

Calling variants in RNAseq is noisy but you can improve your calling in several ways:

  1. First, be sure that your library is good: remove low-quality reads before mapping, and check whether you need trimming. Istvan already mentioned some tools for that.
  2. Use a good mapper; Malachi pointed out some tools.
  3. Call your variants with a tool that can perform realignment, such as GATK or samtools, and use stricter quality filters.
  4. Besides dbSNP, you can verify if the variant is known in other databases such as Kaviar.
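As an illustration of step 1, removing low-quality reads before mapping might look like the sketch below. It assumes plain four-line FASTQ records with Phred+33 quality encoding; the function names and the mean-quality cutoff of 20 are hypothetical choices, and dedicated QC and trimming tools do considerably more than this.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 by default)."""
    if not quality_string:
        return 0.0
    return sum(ord(c) - offset for c in quality_string) / len(quality_string)


def filter_fastq(lines, min_mean_quality=20):
    """Yield (header, sequence, separator, quality) for reads that pass.

    Assumes every record occupies exactly four lines; the cutoff is a
    placeholder, not a recommendation.
    """
    lines = [ln.rstrip("\n") for ln in lines]
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if mean_phred(qual) >= min_mean_quality:
            yield header, seq, plus, qual
```

Mean quality is a blunt criterion, since a read with a good start and a poor 3' end can still pass, which is why trimming is listed as a separate check.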
4.8 years ago by
Istvan Albert ♦♦ 73k
University Park, USA
Istvan Albert ♦♦ 73k wrote:

Here is a post on different RNA-seq quality filtering options: "RSeqQC and RNA-SeqQC - quality control software for RNA-Seq data"
