I have a RAD-seq genomic dataset of around 180 individuals that I want to analyse for a phylogenomic project. However, when I assessed the quality of the data with FastQC, I realised that the total number of sequences is very uneven across individuals, ranging from 6.5 million for some individuals to 0.2 million for others.
I'm afraid this could cause issues during the analyses. The only problem I can see is that individuals will differ in how much missing data they have, which will obviously affect the total number of loci that I can use for phylogenetics.
What do you think? Do you have any suggestions? Should I exclude some individuals? Are there any established criteria, such as excluding the 5% of individuals with the lowest total number of sequences?
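To make the 5% idea concrete, here is a minimal sketch of what I have in mind, assuming the per-individual read counts have already been collected (for example from the "Total Sequences" value FastQC reports). The function name `flag_low_depth`, the sample IDs and the counts are made up for illustration, and the cutoff is just the example from my question, not an established criterion.

```python
def flag_low_depth(read_counts, percent=5):
    """Return the IDs of the individuals in the lowest `percent` of read counts.

    read_counts: dict mapping sample ID -> total number of sequences.
    percent: integer percentage of individuals to flag (hypothetical cutoff).
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Integer arithmetic avoids floating-point rounding surprises.
    n_flag = len(read_counts) * percent // 100
    # Sort sample IDs from lowest to highest read count.
    ranked = sorted(read_counts, key=read_counts.get)
    return ranked[:n_flag]


# Made-up counts for 20 individuals, for illustration only.
example_values = [
    6_500_000, 200_000, 3_100_000, 2_800_000, 4_200_000,
    1_900_000, 5_600_000, 900_000, 3_300_000, 2_100_000,
    4_800_000, 1_200_000, 3_900_000, 2_500_000, 5_100_000,
    700_000, 3_600_000, 2_900_000, 4_400_000, 1_500_000,
]
example_counts = {
    f"ind{i:02d}": n for i, n in enumerate(example_values, start=1)
}

flagged = flag_low_depth(example_counts, percent=5)
print("Individuals flagged for possible exclusion:", flagged)
```

With 180 individuals, a 5% cutoff would flag the 9 individuals with the fewest reads. Whether those should actually be dropped is exactly what I am unsure about.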
Were the libraries not QC'ed before pooling?
The samples were checked for a sufficiently high concentration, and they were quality-controlled before pooling.