Question

Uneven amount of total sequences across individuals in RAD-seq dataset

1

6.2 years ago

CaffeSospeso ▴ 50

I have a RAD-seq genomic dataset with around 180 individuals, which I want to analyse for a phylogenomic project. However, when assessing the quality of the data with FastQC, I realised that the total number of sequences is very uneven across individuals, ranging from 6.5 million for some individuals to 0.2 million for others.

I'm afraid this could lead to issues during the analyses. The only problem I see is that individuals will differ in how much missing data they have, which will obviously affect the total number of loci that I can use for phylogenetics.

What do you think? Do you have any suggestions? Should I exclude some individuals? Are there any established criteria, such as excluding the 5% of individuals with the lowest total number of sequences?
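To make the percentile idea concrete, here is a minimal sketch of what such a cutoff would do. It assumes the total sequences per individual have already been collected into a dictionary. The function name `flag_low_depth`, the sample names, and the counts are all hypothetical and only for illustration. This is not an established criterion, just the rule I am asking about.

```python
def flag_low_depth(read_counts, percent=5):
    """Return the names of the individuals in the lowest `percent` of total reads.

    read_counts: dict mapping individual name -> total number of sequences
    percent:     integer percentage of individuals to flag (hypothetical cutoff)
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Rank individuals from fewest to most reads.
    ranked = sorted(read_counts, key=read_counts.get)
    # Integer arithmetic, rounding down, so small datasets flag nobody.
    n_flag = len(ranked) * percent // 100
    return ranked[:n_flag]


# Made-up totals spanning the range I see in my data (0.2 to 6.5 million).
read_counts = {
    "ind_A": 6_500_000,
    "ind_B": 3_100_000,
    "ind_C": 1_200_000,
    "ind_D": 200_000,
}

# With only four individuals, use 25% here so that one individual is flagged.
print(flag_low_depth(read_counts, percent=25))
```

With 180 individuals and the default of 5%, this would flag the 9 individuals with the fewest reads, regardless of how low their counts actually are, which is part of why I am unsure whether a fixed percentage is a sensible rule.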

next-gen sequencing • 1.3k views

ADD COMMENT • link updated 5.4 years ago by Gio12 ▴ 220 • written 6.2 years ago by CaffeSospeso ▴ 50

0

Were the libraries not QC'ed before pooling?

ADD REPLY • link 6.2 years ago by GenoMax 147k

0

The samples were checked for a sufficiently high concentration, and they were quality-controlled before pooling.

ADD REPLY • link 6.2 years ago by CaffeSospeso ▴ 50

score 1 · Answer 1 · 2019-06-24

1

5.4 years ago

Gio12 ▴ 220

You may find this paper interesting. A second read can also be found here.

ADD COMMENT • link 5.4 years ago by Gio12 ▴ 220

0

These are some nice papers that specifically relate to RAD-Seq!

If people have generally encountered problems with observed read counts not matching the expected read counts, I would also like to hear about that (since unexpected differences between observed and expected reads can mean that reads need to be combined between runs). In other words, this relates to the proposed QC flag b) in this post: Calling Single-Barcode Samples from Mixed Runs as Dual-Barcode Samples | Possible Illumina Run QC Flags?

However, that post is not specifically related to RAD-Seq (in fact, there were 0 RAD-Seq samples among those runs).

ADD REPLY • link 5.4 years ago by Charles Warden 8.3k