First thing to do when you receive your sequenced genome of interest.
6
9.9 years ago
Medhat 9.7k

As the title makes clear, I am asking for advice.

What is the first thing I should do when I receive the sequencing data for the genome I am interested in?

For example, is there a tool to check that the whole genome was sequenced properly, i.e. that no parts were left unsequenced? Is the sequence good enough that I can proceed to the next steps, or do I need to repeat something (if some regions are not covered, for example)? And what else should I take into consideration, from your experience and point of view?

Thanks in advance,

Assembly genome sequence next-gen • 3.5k views
4
9.9 years ago

Map it (or a subset of it) against the reference genome; that will tell you right away what the data look like.
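For the "subset" part, here is a minimal sketch of drawing a random subset of reads before mapping. It assumes plain 4-line FASTQ records (no wrapped sequence lines); the function name `subsample_fastq` and its parameters are made up for illustration, and dedicated tools will be faster on real data.

```python
import random

def subsample_fastq(lines, n, seed=0):
    """Reservoir-sample up to n records from an iterable of FASTQ lines.

    Assumes each record is exactly 4 lines (header, sequence, '+', qualities).
    Returns a list of records, each a list of 4 strings.
    """
    rng = random.Random(seed)
    reservoir = []
    record = []
    seen = 0
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            seen += 1
            if len(reservoir) < n:
                reservoir.append(record)
            else:
                # Replace an existing record with probability n / seen.
                j = rng.randrange(seen)
                if j < n:
                    reservoir[j] = record
            record = []
    return reservoir
```

Usage would be something like `subsample_fastq(open("reads.fastq"), 100000)` (file name hypothetical). Because the choice depends only on the record count and the seed, running it with the same seed on both files of a paired-end run with equal record counts should keep the mates in sync.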

1

What if I do not have a reference genome?

1

Well, that's trickier. You can always try a closely related species.

Also look for and remove contaminants; we just had two situations where not dealing with contamination right away led to setbacks.

0

About removing contamination: did you mean using BLAST, for example, to decide whether there are sequences from species other than the expected one, and then removing them? Or did I misunderstand?

If I understood correctly, how do I remove them if there are any?

1

Yes. Align the reads to the contaminant with a short-read aligner (not BLAST, because you want to remove only reads that match very closely), then export the unaligned reads from the alignment into a new FASTQ file.
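To make the export step concrete, here is a rough sketch of pulling the unaligned reads out of a SAM-format alignment against the contaminant. It assumes single-end reads and uncompressed SAM text; `unmapped_to_fastq` is a hypothetical helper name, and in practice a tool such as samtools can do this kind of filtering directly.

```python
def unmapped_to_fastq(sam_lines):
    """Yield FASTQ records for reads that did NOT align to the contaminant.

    Assumes single-end reads in plain-text SAM. A read is treated as
    unaligned when bit 0x4 of the FLAG field is set.
    """
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment line
        name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if flag & 0x900:
            continue  # skip secondary/supplementary records
        if flag & 0x4:
            yield "@%s\n%s\n+\n%s\n" % (name, seq, qual)
```

Writing the cleaned reads is then `out.writelines(unmapped_to_fastq(open("vs_contaminant.sam")))` (file name hypothetical). For paired-end data you would also need to decide what to do when only one mate aligns, which this sketch does not handle.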

4
9.9 years ago
JC 13k

I would do a quality check first with FastQC or simple stats in R; sometimes the data are bad from the beginning. After that, as Istvan said, you can map your sequences to the genome (with BWA, STAR or your preferred mapper), compute the average coverage, and check for uncovered or suspicious regions, such as high coverage in repetitive regions.
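As an illustration of the coverage check, here is a small sketch that takes per-base depth as three whitespace-separated columns (chromosome, position, depth) and reports the mean depth plus the stretches below a threshold. It assumes every position is listed, including zero-depth ones (for example the output of `samtools depth -a`); the function name and the `min_depth` parameter are my own.

```python
def coverage_summary(depth_lines, min_depth=1):
    """Return (mean_depth, low_regions) from per-base depth lines.

    Each line: chrom, 1-based position, depth. low_regions is a list of
    (chrom, start, end) runs of consecutive positions with depth < min_depth.
    Assumes zero-depth positions are present in the input.
    """
    total_depth = 0
    n_positions = 0
    low_regions = []
    current = None  # [chrom, start, end] of the low-coverage run being built
    for line in depth_lines:
        if not line.strip():
            continue
        chrom, pos, depth = line.split()[:3]
        pos, depth = int(pos), int(depth)
        total_depth += depth
        n_positions += 1
        if depth < min_depth:
            if current and current[0] == chrom and current[2] == pos - 1:
                current[2] = pos  # extend the current run
            else:
                if current:
                    low_regions.append(tuple(current))
                current = [chrom, pos, pos]
        elif current:
            low_regions.append(tuple(current))
            current = None
    if current:
        low_regions.append(tuple(current))
    mean_depth = total_depth / n_positions if n_positions else 0.0
    return mean_depth, low_regions
```

Calling it with a higher `min_depth` gives weakly covered regions rather than only completely uncovered ones; flagging unusually high coverage (repeats) would need a similar pass with the comparison reversed.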

3
9.9 years ago
xb ▴ 420

Check the quality of the sequencing reads before mapping (?), for instance using fastx, and trim (adapters/primers, if any) accordingly. Then map!

3
9.9 years ago
FastQC first. Then, for genome re-sequencing, map and check coverage metrics; for a de novo genome, assemble contigs and check metrics like N50 contig size.
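For reference, N50 is the contig length such that contigs of that length or longer contain at least half of the total assembled bases. A minimal sketch, with contig lengths passed in as a plain list (getting them from your assembler's FASTA output is left out here):

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # First contig at which the cumulative length reaches half the total.
        if running >= half_total:
            return length
    return 0
```

For example, for contigs of length 2 to 10 (total 54), the three longest (10, 9, 8) already sum to 27, so the N50 is 8.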
2
9.9 years ago
lexnederbragt ★ 1.3k

If you do not have a reference genome, it is hard to find regions not covered. A few tips:

  • run SGA's preqc, this will tell you quite a bit about your genome and dataset
  • run assemblies and use tools like blobology to assess species content
  • if your species is a eukaryote, run CEGMA to check whether the gene space of your assembly seems complete (this also helps to choose between assemblies)
  • if it is a bacterium, on the other hand, run iMetAMOS; it does much of the above in an automated fashion.
0

+1, very informative. But do you have any advice if I'm dealing with a plant genome?

1
9.9 years ago
Prakki Rama ★ 2.7k

Adapters can make an assembly a real mess. Sometimes partial adapters are also present in the reads, so it is better to trim them, at least as far as we are able to. If the adapters are not known, it is worth requesting them from the sequencing center and running experiments.
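To show what "partial adapters" means in practice, here is a simplified sketch of 3' adapter trimming: it removes a full adapter found anywhere in the read, or a prefix of the adapter hanging off the end of the read. It uses exact matching only (real trimmers tolerate mismatches), and the function name and the `min_overlap` default are assumptions of mine. The adapter sequence in the usage note is only an example; use the one your sequencing center gives you.

```python
def trim_adapter(seq, adapter, min_overlap=3):
    """Trim a 3' adapter (full or partial) from a read using exact matching.

    - If the whole adapter occurs in the read, cut the read at its first occurrence.
    - Otherwise, if the read ends with a prefix of the adapter that is at
      least min_overlap bases long, cut that prefix off.
    """
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx]
    min_overlap = max(min_overlap, 1)
    longest = min(len(adapter) - 1, len(seq))
    # Try the longest possible partial adapter first.
    for k in range(longest, min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k]
    return seq
```

With the adapter `AGATCGGAAGAGC`, the read `ACGTACGTAGATC` ends in the first five adapter bases and is trimmed to `ACGTACGT`. Remember that the quality string has to be cut to the same length as the trimmed sequence.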
