Question

Mapping Ribosome Profiling Data

9.4 years ago

fire_water ▴ 80

Hello,

I am trying to learn how to map ribosome profiling data but have a few questions about the process.

So I have about a dozen FASTQ files that are the result of sequencing on a MiSeq machine. The files are named like this:

23_S1_L001_I1_001.fastq.gz
23_S1_L001_R1_001.fastq.gz
...
Undetermined_S0_L001_I1_001.fastq.gz
Undetermined_S0_L001_R1_001.fastq.gz

According to this helpful documentation, the file names containing R1 are the reads (the nucleotide sequences). The file names containing I1 are the index reads.
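To make the naming convention concrete, here is a minimal Python sketch that separates read files from index read files by name. The pattern is inferred only from the file names listed above, and `classify` is a hypothetical helper, not part of any Illumina or Galaxy tool.

```python
import re

# Assumed pattern, based on the file names above:
# <sample>_S<n>_L<lane>_<R or I><k>_001.fastq.gz
NAME_RE = re.compile(
    r"^(?P<sample>.+)_S(?P<sample_number>\d+)_L(?P<lane>\d{3})_"
    r"(?P<kind>[RI])(?P<read_number>\d)_001\.fastq(?:\.gz)?$"
)


def classify(filename):
    """Return (sample, 'read' or 'index'), or None if the name does not match."""
    match = NAME_RE.match(filename)
    if match is None:
        return None
    kind = "read" if match.group("kind") == "R" else "index"
    return match.group("sample"), kind
```

For example, `classify("23_S1_L001_R1_001.fastq.gz")` gives `("23", "read")`, while the matching I1 file is classified as `"index"`.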

To map this data, I believe I need to follow this process (please advise):

Upload data to Galaxy
Clip adapter sequence
- NGS: QC and Manipulation > Clip
- min. length: 25 nt
- custom adapter seq: CTGTAGGCACCATCAAT
- do not discard sequences with unknown bases
- output only clipped sequences
Trim adapter sequence
- NGS: QC and Manipulation > Trim
- first base to keep: 2
- last base to keep: 50
Map with Bowtie
- NGS: Mapping/Map with Bowtie for Illumina
- settings: -v 1 -k 1 -m 16
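
For reference, the clip and trim steps above can be sketched in plain Python to show what they do to a single read. This is a simplified illustration, not the Galaxy tools themselves: it assumes the adapter appears as an exact match (the real Clip tool is more tolerant), and `clip_and_trim` is a hypothetical name.

```python
ADAPTER = "CTGTAGGCACCATCAAT"   # custom adapter sequence from the Clip settings
MIN_LENGTH = 25                 # minimum length after clipping
FIRST_BASE, LAST_BASE = 2, 50   # 1-based, inclusive, from the Trim settings


def clip_and_trim(seq, qual):
    """Return (seq, qual) after clipping and trimming, or None if the read is dropped."""
    pos = seq.find(ADAPTER)  # assumption: exact adapter match only
    if pos == -1:
        # "output only clipped sequences": reads without the adapter are dropped
        return None
    seq, qual = seq[:pos], qual[:pos]  # remove the adapter and everything after it
    if len(seq) < MIN_LENGTH:
        return None  # too short after clipping
    # Keep bases FIRST_BASE..LAST_BASE (converted to a 0-based slice)
    return seq[FIRST_BASE - 1:LAST_BASE], qual[FIRST_BASE - 1:LAST_BASE]
```

Reads containing unknown bases (N) are kept, matching the "do not discard" setting. Note that with "output only clipped sequences", a file whose adapters were already removed upstream would come out empty.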

My questions are:

1. What is the purpose of the index read files (the file names containing I1)?
2. What do I do with the index read files (discard, merge with R1, send them through the mapping process)?
3. The same questions apply to the Undetermined files.

Thanks, I appreciate your help.

sequencing • 4.4k views

updated 2.4 years ago by Cristian • written 9.4 years ago by fire_water ▴ 80

score 1 · Answer 1 · 2016-03-05

9.4 years ago

Devon Ryan 105k

Answers, in order:

1. You can ignore the index read files; they serve no purpose for you (most people never even create them).
2. Delete them, they're just wasting space.
3. Delete them, they're just wasting space.

BTW, you might use tophat2 or hisat2 (at least tophat2 tends to be available in most Galaxy instances, but hisat/hisat2 are faster). This would allow you to handle spliced reads.

Thank you, Devon. Question: is it possible that I received the FASTQ read files (R1 files) with the adapter sequence already removed? I ask because I could not use the Clip tool without first grooming the FASTQ files. Then, when I sent one file to Clip, I received an empty file. So perhaps the adapter was removed during sequencing?

Reply 9.4 years ago by fire_water ▴ 80

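Yes, there's a setting on the MiSeq to have it trim the adapters automatically. Run FastQC on an R1 file and scroll down to the length distribution plot. If every read has the same length, nothing was trimmed; a spread of shorter lengths means the adapters were already removed.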

BTW, you'll only see adapters if your fragments were short and your reads were long (i.e., if you sequenced the whole fragment plus some).
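
If FastQC is not to hand, the same length check can be approximated with a few lines of Python. This is a rough sketch that assumes a well-formed gzipped FASTQ with four lines per record; `read_length_counts` is a hypothetical helper name.

```python
import gzip
from collections import Counter


def read_length_counts(path):
    """Count read lengths in a gzipped FASTQ file (assumes 4 lines per record)."""
    counts = Counter()
    with gzip.open(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the sequence is the second line of each record
                counts[len(line.strip())] += 1
    return counts
```

Calling `read_length_counts("23_S1_L001_R1_001.fastq.gz")` and sorting the result by length gives a text version of the length distribution: a single length suggests untrimmed reads, while many different lengths suggest the adapters were already removed.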

Reply 9.4 years ago by Devon Ryan 105k

score 1 · Answer 2 · 2016-03-07

Hi,

You could try using RiboGalaxy (on riboseq.org), which is a Galaxy-based platform specifically for ribosome profiling data. The adapter sequence can be removed with Cutadapt (available in the Pre-processing Tools suite). As suggested by Devon, you can run FastQC (also in the Pre-processing Tools suite) if you are not sure whether the adapter sequence has already been removed.

With ribosome profiling data, it is also useful to remove sequences that map to ribosomal RNA (use Remove rRNA using Bowtie). The remaining reads can then be mapped with Bowtie to your transcriptome/genome (see the Help pages on RiboGalaxy).

After mapping, there are a number of tools under RiboSeq Analysis which you may find useful. The RiboGalaxy forum may also be of interest (http://gwips.ucc.ie/Forum/forumdisplay.php?fid=24).