Hello,
I am trying to learn how to map ribosome profiling data but have a few questions about the process.
So I have about a dozen FASTQ files that are the result of sequencing on a MiSeq machine. The files are named like this:
23_S1_L001_I1_001.fastq.gz
23_S1_L001_R1_001.fastq.gz
...
Undetermined_S0_L001_I1_001.fastq.gz
Undetermined_S0_L001_R1_001.fastq.gz
According to this helpful documentation, the file names containing R1
are the reads (the nucleotide sequences). The file names containing I1
are the index reads.
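For anyone sorting a folder of these files, here is a minimal sketch of splitting such a name into its parts. It assumes the usual Illumina layout of sample name, S number, lane, read type (R1, I1, ...) and a trailing three-digit counter; the function name parse_fastq_name and the field names are made up for illustration.

```python
import re

# Assumed layout: <sample>_S<number>_L<lane>_<R1|R2|I1|I2>_<counter>.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<sample_number>\d+)_L(?P<lane>\d{3})"
    r"_(?P<read>[RI]\d)_(?P<counter>\d{3})\.fastq\.gz$"
)


def parse_fastq_name(name):
    """Return the filename fields as a dict (hypothetical helper)."""
    match = FASTQ_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected FASTQ file name: {name}")
    return match.groupdict()


if __name__ == "__main__":
    for name in ("23_S1_L001_I1_001.fastq.gz",
                 "Undetermined_S0_L001_R1_001.fastq.gz"):
        fields = parse_fastq_name(name)
        kind = "index read" if fields["read"].startswith("I") else "sequence read"
        print(name, "->", fields["sample"], fields["read"], kind)
```

Grouping on the sample field would pair each R1 file with its I1 file.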
To map this data, I believe I need to follow this process (please advise):
- Upload data to Galaxy
- Clip adapter sequence
  - NGS: QC and Manipulation > Clip
  - min. length: 25 nt
  - custom adapter seq: CTGTAGGCACCATCAAT
  - do not discard sequences with unknown bases
  - output only clipped sequences
- Trim adapter sequence
  - NGS: QC and Manipulation > Trim
  - first base to keep: 2
  - last base to keep: 50
- Map with Bowtie
  - NGS: Mapping/Map with Bowtie for Illumina
  - settings: -v 1 -k 1 -m 16
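To make the clip and trim steps concrete, here is a rough sketch of what they do to a single read. It is not the Galaxy tools' actual implementation: it assumes an exact match to the adapter (the real Clip tool may tolerate partial or mismatched adapters), and the function names clip_adapter and trim are invented for illustration.

```python
ADAPTER = "CTGTAGGCACCATCAAT"  # custom adapter sequence from the Clip step
MIN_LENGTH = 25                # "min. length: 25 nt"


def clip_adapter(seq, adapter=ADAPTER, min_length=MIN_LENGTH):
    """Cut the read at the first exact adapter match.

    Returns None when no adapter is found ("output only clipped
    sequences") or when the remainder is shorter than min_length.
    Reads containing N are kept, as in the settings above.
    """
    pos = seq.find(adapter)
    if pos == -1:
        return None
    clipped = seq[:pos]
    if len(clipped) < min_length:
        return None
    return clipped


def trim(seq, first=2, last=50):
    """Keep bases first..last, counted from 1 and inclusive."""
    return seq[first - 1:last]


if __name__ == "__main__":
    insert = "G" + "ACGT" * 7           # made-up 29 nt fragment
    read = insert + ADAPTER + "AAAA"    # fragment, adapter, read-through
    clipped = clip_adapter(read)
    print(clipped)
    print(trim(clipped))
```

Note that a read with no adapter in it is dropped entirely under "output only clipped sequences", so a file whose adapters were already removed upstream would come out of this step empty.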
My questions are:
- What is the purpose of the index read files (the file names containing I1)?
- What do I do with the index read files (discard, merge with R1, send them through the mapping process)?
- ...same regarding the Undetermined files.
Thanks, I appreciate your help.
Thank you, Devon. Question: is it possible that I received the FASTQ read files (R1 files) with the adapter sequence already removed? I ask because I could not use the Clip tool without first grooming the FASTQ files, and when I then sent one groomed file to Clip, I got an empty file back. So perhaps the adapter was removed during sequencing?
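Yes, there's a setting on the MiSeq to have it trim the adapters automatically. To check, run FastQC on an R1 file and scroll down to the sequence length distribution plot: if the reads vary in length rather than all being the full read length, they have already been trimmed.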
BTW, you'll only see adapters if your fragments were short and your reads were long (i.e., if you sequenced the whole fragment plus some).
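If you'd rather check outside of FastQC, something like the following would tally read lengths and count how many reads still contain the adapter. It is only a sketch: it assumes plain four-line FASTQ records in a gzipped file, looks for exact adapter matches only, and the function name summarize_reads is made up.

```python
import gzip
from collections import Counter

ADAPTER = "CTGTAGGCACCATCAAT"  # adapter sequence from the question


def summarize_reads(fastq_gz_path, adapter=ADAPTER):
    """Return (length counts, reads containing adapter, total reads).

    Assumes each record is exactly four lines, so the sequence is
    the second line of every group of four.
    """
    lengths = Counter()
    with_adapter = 0
    total = 0
    with gzip.open(fastq_gz_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 != 1:
                continue  # skip header, '+' and quality lines
            seq = line.strip()
            total += 1
            lengths[len(seq)] += 1
            if adapter in seq:
                with_adapter += 1
    return lengths, with_adapter, total


if __name__ == "__main__":
    import sys
    lengths, with_adapter, total = summarize_reads(sys.argv[1])
    print(f"{with_adapter} of {total} reads contain the adapter")
    for length in sorted(lengths):
        print(length, lengths[length])
```

A spread of different lengths together with essentially no adapter hits would be consistent with the adapters having been trimmed on the instrument.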