Dear Biostars team,
I'm relatively new to the Galaxy online platform and have been using it to run an RNA-seq analysis on paired-end Illumina data against the rat rn5 genome. Having completed that analysis, we are now trying to look for SNPs/variants within the reads, but I am having trouble pre-processing the files with Picard tools: the Mark Duplicate Reads tool throws a couple of errors that halt progress. These are the steps I have taken so far:
- Raw Illumina FASTQ files FTP'd to the usegalaxy public instance (forward and reverse reads for 2 lanes)
- FASTQ Groomer - convert to fastqsanger
- Trim by FASTQ quality score >=20
- Map with BWA for Illumina using rn5 and paired end reads
- Convert SAM to BAM for both mapped lane files
- Reorder BAM for both files
- Add read groups for both files
- Mark Duplicate reads - removing duplicates from output -> here is where we get the issue.
The two errors thrown are "MAPQ should be 0 for unmapped read." and "Value was put into PairInfoMap more than once". They halt this pre-processing step, so I cannot move the BAM files on to GATK variant analysis.
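In case it helps with diagnosis, here is a minimal Python sketch of what I understand the two messages to be complaining about. It works on SAM text lines and relies only on the SAM flag bits and column order. The first check looks for records with the unmapped flag (0x4) set but a nonzero MAPQ. The second check rests on an assumption of mine: that the PairInfoMap error arises when the same read name and mate (first or second) occurs more than once among primary records. The function names and the toy records are made up for illustration; they are not my real data or any Picard API.

```python
from collections import Counter

# SAM flag bits (from the SAM format specification)
PAIRED = 0x1
UNMAPPED = 0x4
FIRST_IN_PAIR = 0x40
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def _records(sam_lines):
    """Yield (name, flag, mapq) for each alignment line, skipping headers."""
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        yield fields[0], int(fields[1]), int(fields[4])


def find_unmapped_with_mapq(sam_lines):
    """Return records flagged unmapped whose MAPQ is not 0."""
    return [
        (name, flag, mapq)
        for name, flag, mapq in _records(sam_lines)
        if flag & UNMAPPED and mapq != 0
    ]


def find_repeated_pair_ends(sam_lines):
    """Return (name, end) keys seen more than once among primary records.

    Assumption: repeated keys like these are what trigger the
    PairInfoMap error; this is my reading, not confirmed behaviour.
    """
    counts = Counter()
    for name, flag, _mapq in _records(sam_lines):
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # only primary alignments are counted
        if not flag & PAIRED:
            end = "unpaired"
        elif flag & FIRST_IN_PAIR:
            end = "first"
        else:
            end = "second"
        counts[(name, end)] += 1
    return sorted(key for key, n in counts.items() if n > 1)


# Toy records, invented for illustration only.
TOY_SAM = [
    "@HD\tVN:1.4\tSO:coordinate",
    "r1\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",            # unmapped, MAPQ 0: fine
    "r2\t69\tchr1\t100\t37\t*\t=\t100\t0\tACGT\tIIII",    # unmapped, MAPQ 37
    "r2\t137\tchr1\t100\t37\t4M\t=\t100\t0\tACGT\tIIII",  # mapped mate
    "r3\t99\tchr1\t200\t60\t4M\t=\t300\t104\tACGT\tIIII",
    "r3\t99\tchr1\t200\t60\t4M\t=\t300\t104\tACGT\tIIII",  # first mate repeated
    "r3\t147\tchr1\t300\t60\t4M\t=\t200\t-104\tACGT\tIIII",
]

print(find_unmapped_with_mapq(TOY_SAM))
print(find_repeated_pair_ends(TOY_SAM))
```

On a real file the same functions could be fed the lines of a SAM export of the BAM (for example via Galaxy's BAM-to-SAM conversion), though for large files a proper BAM library would be a better fit than plain text parsing.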
In addition, for running the GATK analysis with a custom genome, is the best practice simply to FTP the UCSC rn5.fa file into a history and use that, or is an additional index file needed?
Any help with these issues would be greatly appreciated, and please let me know if I need to clarify anything for a solution to be found!