Question: 20 million reads per sample for SNP calling
archie90 wrote:

Dear all,

I have RNA-seq data (tumour) in which each sample has approximately 20 million reads. My goals are to identify SNPs and to predict isoform expression variability from the RNA-seq data. Before doing any analysis, I wanted to check the coverage of the sequencing data. To estimate coverage, I used the formula N*L/G, where N is the number of reads per sample, L is the read length (150, since this is paired-end data, 75+75), and G is the human genome size. For a sample with 26130832 reads, I got a coverage of 1.3. Does this 1.3 indicate that the sequence coverage is about 1X? Am I right? My goal is to identify SNPs. Can I proceed with this amount of data per sample?
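The N*L/G estimate described in the question can be reproduced with a few lines of Python. This is a sketch: the genome size of 3.0e9 bp is an assumed round figure for the human genome (the question does not say which value was used), and, following the question, L = 150 is applied per read pair.

```python
def mean_coverage(n_reads: int, read_length: int, target_size: float) -> float:
    """Average fold coverage C = N * L / G (Lander-Waterman style estimate)."""
    return n_reads * read_length / target_size


# Values from the question; GENOME_SIZE is an assumed approximate human genome size.
N_READS = 26_130_832
READ_LENGTH = 150       # 75 + 75 bp paired-end, counted per read pair
GENOME_SIZE = 3.0e9     # assumption: ~3.0 Gb

coverage = mean_coverage(N_READS, READ_LENGTH, GENOME_SIZE)
print(f"{coverage:.2f}X")  # 1.31X
```

So the value is an average of roughly 1.3-fold across the whole genome; the replies below explain why this single average says little about depth at expressed transcripts.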

Thank you in advance

swbarnes2 wrote:

For what it's worth, in my lab we aim for 30M reads just to assess expression differences; you'd want more for variant calling. For variant calling, the average coverage is not helpful. You need a finer breakdown: you need to know what percentage of the transcriptome is adequately covered (say, at 25x). You will need something like BEDTools to assess that. My guess is that you might have enough depth to examine the most highly expressed genes, but not much else.
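The finer breakdown suggested above (what fraction of transcribed bases reaches, say, 25x) can be computed from a per-base depth table. A minimal sketch, assuming tab-separated lines of chromosome, position and depth, the three-column layout written by `samtools depth` (and by `bedtools genomecov -d`). Restricting to exonic positions is assumed to happen upstream, for example with `samtools depth -a -b exons.bed sample.bam`, where `exons.bed` and `sample.bam` are hypothetical file names. The `-a` flag matters: without it, zero-depth positions are omitted and the fraction comes out inflated.

```python
import io
from typing import Iterable


def fraction_at_depth(depth_lines: Iterable[str], min_depth: int = 25) -> float:
    """Return the fraction of listed positions with depth >= min_depth.

    Each line is assumed to be tab-separated: chrom, pos, depth.
    """
    total = 0
    covered = 0
    for line in depth_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        total += 1
        if int(fields[2]) >= min_depth:
            covered += 1
    return covered / total if total else 0.0


# Toy input in the assumed format (hypothetical values, not real data).
toy = io.StringIO(
    "chr1\t1000\t0\n"
    "chr1\t1001\t3\n"
    "chr1\t1002\t30\n"
    "chr1\t1003\t120\n"
)
print(fraction_at_depth(toy))  # 0.5
```

On a real file, `with open("depth.txt") as fh: fraction_at_depth(fh, min_depth=25)` streams the table line by line, so it does not need to fit in memory (`depth.txt` is again a hypothetical name).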

Titus wrote:

Hi,

You should use G = exome size; with the whole genome size, your formula also counts the non-coding part. :)
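Swapping the denominator changes the estimate by about two orders of magnitude. A worked version, assuming an exome size of roughly 30 Mb (a commonly quoted approximation for the human protein-coding exome; the thread gives no figure, and the transcribed exonic space including UTRs is larger):

```latex
C = \frac{N L}{G}, \qquad
C_{\text{genome}} \approx \frac{26{,}130{,}832 \times 150}{3.0 \times 10^{9}} \approx 1.3\times, \qquad
C_{\text{exome}} \approx \frac{26{,}130{,}832 \times 150}{3.0 \times 10^{7}} \approx 131\times
```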

Best


Dear Titus,

Yes, you are right. Using the exome size for G represents the actual targeted size. Thanks :)

Based on this, I think my data is good enough to proceed with the analysis.

Reply by archie90

RNA-seq has a very uneven coverage pattern. Some highly abundant genes will have high coverage (>1000 reads), while other genes are moderately expressed (~40-80 reads), low-abundance genes have just a few reads (<5), and many genes are not expressed at all in your tissue of interest (0 reads). As such, simply averaging the read count is meaningless.
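To see why an average misleads here, compare the mean and median of a small set of per-gene read counts. The numbers below are hypothetical, picked only to match the ranges described above; the mean is dominated by the single highly expressed gene, while the median is roughly ten times lower.

```python
from statistics import mean, median

# Hypothetical per-gene read counts mirroring the ranges described above
# (one highly expressed gene, two moderate, one low, two unexpressed).
gene_counts = [1500, 60, 45, 3, 0, 0]

print(f"mean:   {mean(gene_counts):.1f}")    # 268.0
print(f"median: {median(gene_counts):.1f}")  # 24.0
print(f"unexpressed genes: {gene_counts.count(0)} of {len(gene_counts)}")  # 2 of 6
```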

I'm not saying that you can't call variants from RNA-seq, but it is far from optimal for variant calling.

Reply by WouterDeCoster