Question: Genomics/Computational Biology Jargon
2.4 years ago, biochemist87 (Washington State University) wrote:

Hello,

I will be starting my third rotation next semester, moving from experimental biology to more computational work in a lab where the investigator studies evolutionary genetics in the context of adaptation to environmental stress. I was assigned some reading for her lab, and one of the papers has a lot of jargon I don't know (thanks to my limited computational background).

They mention that there are "35,468 transcripts from 29,143 unique loci", which to me sounds like there are alternative splice products, etc. The other thing is the mention of an N50: "When limiting the analyses to the longest transcript from each locus, the transcriptome size was 71,518,404 bp, with an N50 of 3,694 bp and a genome size of approximately 860 Mbp based on C-value estimates. The longest transcript was 66,752 bp in length, stemming from the gene coding for the largest known protein. This indicated that our analysis effectively captured even long transcripts present in the transcriptome."

There is also a mention of WGCNA: "Weighted gene correlation network analysis (WGCNA) of the top 10,000 expressed genes revealed 15 modules of coexpressed genes (fig. 3A). Ten of the 15 modules were significantly correlated with habitat type (presence or absence of H2S), with modules 5 and 10 exhibiting correlation coefficients >0.9".

Any help would be great. I am hoping that I will still want to pursue computational biology after this rotation.

2.4 years ago, datascientist28 (University of Washington) wrote:

Google is always your friend:

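  1. N50 - A way to summarize the contiguity of an assembly. Sort the contigs from longest to shortest and add up their lengths until you reach 50% of the total assembly length; the N50 is the length of the contig that gets you there. Put another way, half of the assembled bases sit in contigs of at least that length. Note that this is not the median contig length: in an assembly of 4000 contigs, the N50 contig usually comes well before the 2000th in the sorted list. A higher N50 tends to go with a better assembly (although it is NOT absolute). In your paper the N50 is computed over transcripts (the longest one per locus) rather than genome contigs. https://en.wikipedia.org/wiki/N50,_L50,_and_related_statistics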

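  2. WGCNA is an R package from Peter Langfelder and Steve Horvath at UCLA (https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/). It builds a weighted network from the pairwise correlations between gene expression profiles and clusters the genes into modules of coexpressed genes, which can then be correlated with sample traits (habitat type in your paper). Horvath is also the author of the well-known epigenetic clock paper.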

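  3. The transcripts-to-loci statement needs more context, but having more transcripts than loci does mean that at least some loci have more than one transcript, which fits your reading (e.g. alternative splice products).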
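
If it helps to see the N50 definition from point 1 spelled out, here is a minimal Python sketch. The function name `n50` and the toy lengths are made up for illustration; they are not from the paper or from any particular assembly tool, and real tools may differ in edge-case handling.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Illustrative helper (hypothetical name): sort lengths from longest
    to shortest, accumulate them, and return the length at which the
    running total first reaches half of the overall total.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare doubled running total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Toy example: nine contigs, total length 54, so the halfway mark is 27.
contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(contigs))  # 10 + 9 + 8 = 27, so the N50 is 8
```

In this toy set the median length is 6 but the N50 is 8, which shows why "the length of the middle contig" is not the same thing.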
