I have wheat transcriptome data generated on an Illumina platform, and I would like to know whether it is reliable. The wheat genome is about 17 Gb, and each FASTQ file contains ~32 million reads. The sequencing is paired-end. The problem is that when I call SNPs from this data, the read depth is very low: many sites have a read depth (DP) of 0, 1, or 2, and the average of all DP values is around 3, which seems very low. I also mapped the reads to the wheat genome. These are the mapping statistics generated by STAR:
Number of input reads | 28480666
Average input read length | 200
UNIQUE READS:
Uniquely mapped reads number | 20282536
Uniquely mapped reads % | 71.22%
Average mapped length | 194.59
Number of splices: Total | 998719
Number of splices: Annotated (sjdb) | 604635
Number of splices: GT/AG | 759478
Number of splices: GC/AG | 25924
Number of splices: AT/AC | 3437
Number of splices: Non-canonical | 209880
Mismatch rate per base, % | 0.69%
Deletion rate per base | 0.01%
Deletion average length | 1.72
Insertion rate per base | 0.02%
Insertion average length | 1.92
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 5096669
% of reads mapped to multiple loci | 17.90%
Number of reads mapped to too many loci | 445314
% of reads mapped to too many loci | 1.56%
UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.00%
% of reads unmapped: too short | 9.21%
% of reads unmapped: other | 0.12%
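To put the low DP values in context, here is a rough back-of-envelope depth calculation from the numbers above. It is only a sketch: it assumes that, for paired-end input, STAR counts each pair as one read and reports lengths as the sum of both mates (my understanding of the log, not something I have verified on this run). The `assumed_transcribed_bp` value is a made-up placeholder, not a measured figure for wheat.

```python
# Rough expected depth from the STAR summary above.
# Assumption: for paired-end input, STAR counts a pair as one "read"
# and "Average mapped length" is the sum over both mates.
uniquely_mapped_reads = 20_282_536   # "Uniquely mapped reads number"
avg_mapped_length = 194.59           # "Average mapped length" (bases)
genome_size_bp = 17e9                # wheat genome size stated in the post

# Total bases placed uniquely on the reference.
mapped_bases = uniquely_mapped_reads * avg_mapped_length


def mean_depth(target_size_bp):
    """Mean depth if the mapped bases were spread evenly over the target."""
    return mapped_bases / target_size_bp


# Depth averaged over the whole genome (mostly non-transcribed sequence).
genome_wide_depth = mean_depth(genome_size_bp)

# HYPOTHETICAL: size of the expressed transcript space, for illustration only.
assumed_transcribed_bp = 150e6
transcript_depth = mean_depth(assumed_transcribed_bp)

print(f"Uniquely mapped bases: {mapped_bases:.3e}")
print(f"Mean depth over the whole genome: {genome_wide_depth:.2f}x")
print(f"Mean depth over assumed transcript space: {transcript_depth:.1f}x")
```

RNA-seq reads land only on expressed genes, and unevenly, so neither figure predicts the depth at a given SNP site. The calculation mainly shows that an average DP computed over sites outside well-expressed genes would be expected to be very low.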