Determine whether a gene is totally missing
2
0
Entering edit mode
7.7 years ago
Canh • 0

I'm just a newbie in bioinformatics and need to do some testing on human genome.

After using some tools for SNP calling and annotation (platypus, snpeff), I can see a list of SNPs and which genes these SNPs affects, so I know these genes are modified compared to the reference genome. But how could I know if a gene is totally missing? Which files should I analyze? sam/bam/vcf... Or there's any tool doing this out there?

Thanks in advance.

gene genome • 1.7k views
ADD COMMENT
0
Entering edit mode
7.7 years ago
Biogeek ▴ 470

More info needed...

Have you got a transcriptome? If so, best aligning to genome and doing gene counts. I'd recommend RSEM for counts . Gene which is missing in your reads will have a zero expression value at that locus.

ADD COMMENT
0
Entering edit mode

I have one reference fasta file, 2 fastq files containing reads. I use bwa and samtools to align reads, then use platypus to call SNPs, then snpeff to annotate. That's all I did. So I need a transcriptome? Actually, I don't even know what a transcriptome is.

ADD REPLY
0
Entering edit mode

It's not transcriptome sequencing if I understood correctly.

ADD REPLY
0
Entering edit mode
7.7 years ago
DG 7.3k

Look at your alignment (SAM or BAM) that will show the actual sequencing reads mapped to the reference genome. A tool like IGV or IGB will let you view this and browse yo any area of interest (such as a gene) and visually see the reads as they are mapped. Are you dealing with Whole Genome or Exome sequencing data? In the case of Exome, depending on the gene, lack of data isn't conclusive for the gene actually being missing. Some genes and some exons in particular do not have very effective capture and are sometimes not sequenced, or have very little coverage.

ADD COMMENT
0
Entering edit mode

I wanna write a script to do this, say input a gene (name, location) and output whether it is missing, modified, or the same as in the reference genome. Visual tools would not help me much. Are there any command line tools for this? I'm dealing with Exome, using Python. Thanks.

ADD REPLY
1
Entering edit mode

You can use bedtools coverage to get the depth of coverage for specified regions (can be genes, exons, etc.): http://bedtools.readthedocs.io/en/latest/content/tools/coverage.html

ADD REPLY
0
0
Entering edit mode

Yup, was going to be my suggestion as well

ADD REPLY
0
Entering edit mode

What does it mean for a gene to be missing or modified? Missing = not covered by the sequencing? Modified = contains at least one variant? Does it have to be a coding variant?

I think you need think what exactly you want to achieve.

ADD REPLY
0
Entering edit mode

This is important as well, what you mean by modified is important.

ADD REPLY
0
Entering edit mode

See below for some caveats and suggestions. I'd also stress, again that absence of evidence is not the same thing as evidence of absence when it comes to exome sequencing. You get all sorts of exons and even whole genes with little to know coverage due to failure to capture or sequence, not necessarily because there is anything wrong with the actual gene in your sample.

What exactly are you looking for? Are you trying to identify copy-number variations, whole gene deletions, partial gene deletions, re-arrangements or other Structural Variants? What do you mean by modified? SNPs and small Indels? Or more like what I described above?

ADD REPLY

Login before adding your answer.

Traffic: 2618 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6