Hi, sorry if this is a very basic question!
I'm trying to analyse paired-end sequencing data using UMI-tools. I was instructed by my supervisor to first extract the UMIs from the R1 and R2 files separately. This gives me a Processed_R1_fastq.gz file, a Processed_R2_fastq.gz file and a processed.log file.
The UMI-tools guide doesn't really explain what to do next. Regarding paired-end reads, it just says: "After paired-end mapping, paired end deduplication can be achieved by adding the --paired option to the call to dedup".
My question is, how should I map the files? Do I map them both separately? What do I do after that?
Also, I wanted to check that I am using the correct files for genome indexing/mapping/alignment. I would like to use the hg19 genome. I downloaded hg19.fa.gz and hg19.2bit from https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/, unzipped them and placed them in my directory. Are these the correct files needed for mapping?
Thanks in advance!
Are you certain your data has UMIs?
Yes, I have been told there are UMIs as this data was successfully analysed by someone else previously.
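For what it's worth, here is a rough sketch of the usual order of steps: extract, map both mates together in one aligner call, sort and index, then dedup with --paired. It only builds the command strings and does not run anything. The UMI pattern `NNNNNNNN`, the choice of bwa and samtools, the `sample` prefix and the output file names are all assumptions for illustration; the real UMI pattern depends on the library prep. It passes R1 and R2 to a single `umi_tools extract` call so that both mates get the same UMI appended to their read names.

```python
import shlex


def build_commands(r1, r2, genome_fasta, umi_pattern, prefix="sample"):
    """Return shell command strings for a paired-end UMI workflow.

    Nothing is executed here. File names, the prefix and the UMI pattern
    are hypothetical placeholders, not values from the original post.
    """
    q = shlex.quote
    proc_r1 = f"{prefix}_processed_R1.fastq.gz"
    proc_r2 = f"{prefix}_processed_R2.fastq.gz"
    mapped = f"{prefix}_mapped.bam"
    dedup = f"{prefix}_dedup.bam"
    log = f"{prefix}_extract.log"

    return [
        # 1. Extract the UMI from read 1 and add it to the names of both mates.
        f"umi_tools extract --bc-pattern={q(umi_pattern)} "
        f"--stdin={q(r1)} --stdout={q(proc_r1)} "
        f"--read2-in={q(r2)} --read2-out={q(proc_r2)} --log={q(log)}",
        # 2. Index the reference FASTA (only needed once per genome).
        f"bwa index {q(genome_fasta)}",
        # 3. Map both mates together, then coordinate-sort the alignments.
        f"bwa mem {q(genome_fasta)} {q(proc_r1)} {q(proc_r2)} "
        f"| samtools sort -o {q(mapped)} -",
        # 4. Index the sorted BAM; dedup needs an indexed BAM.
        f"samtools index {q(mapped)}",
        # 5. Deduplicate using the UMIs, treating the reads as pairs.
        f"umi_tools dedup --paired --stdin={q(mapped)} --stdout={q(dedup)}",
    ]


if __name__ == "__main__":
    for cmd in build_commands("R1.fastq.gz", "R2.fastq.gz", "hg19.fa", "NNNNNNNN"):
        print(cmd)
```

On the reference files: aligners such as bwa index the FASTA (`hg19.fa`); the `.2bit` file is a UCSC format used by tools such as BLAT and is not needed for this. It is also worth double-checking the download, since the URL in the question points at the hg38 directory rather than hg19.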