Question

Pipeline for >100 RNA-Seq PE samples

8.2 years ago

umn_bist ▴ 390

Although I am familiar with the RNA-seq workflow for n < 20 samples, this is my first time handling a large RNA-seq data set. The samples are tumors with matched normals.

Are there any automated pipelines commonly used in the field for pre-alignment QC, alignment, and post-alignment QC of a large set of RNA-seq samples?

Two areas that I am having difficulty automating are:

[cutadapt] Providing an adapter list for both the forward and reverse reads of each pair. I have a single list, but I do not know whether cutadapt will automatically reverse-complement the adapter sequences. I am also unsure how to choose a value for --overlap=LENGTH. I may use BBMap in place of cutadapt.
```
cutadapt -q 10,10 -a "${adapter}" -A "${adapter}" -o "${file1%_1.fastq}_1_trimmed.fastq" -p "${file2%_2.fastq}_2_trimmed.fastq" "${file1}" "${file2}"
```
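One way to run this over every sample is sketched below. In cutadapt's paired-end mode, `-a` is matched against read 1 and `-A` against read 2, so the sketch takes the two adapters as separate arguments rather than relying on any automatic conversion. The helper names `trim_pair` and `trim_all`, the `DRY_RUN` variable, and the `_1.fastq` / `_2.fastq` naming scheme are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: trim every *_1.fastq / *_2.fastq pair in a directory with cutadapt.
# trim_pair and trim_all are hypothetical helper names, not part of cutadapt.

trim_pair() {
    local r1="$1" adapter_r1="$2" adapter_r2="$3"
    local base="${r1%_1.fastq}"          # strip the read-1 suffix
    local r2="${base}_2.fastq"           # assumed name of the mate file

    # With DRY_RUN set, the command is printed instead of executed.
    ${DRY_RUN:+echo} cutadapt -q 10,10 \
        -a "$adapter_r1" -A "$adapter_r2" \
        -o "${base}_1_trimmed.fastq" -p "${base}_2_trimmed.fastq" \
        "$r1" "$r2"
}

trim_all() {
    local dir="$1" adapter_r1="$2" adapter_r2="$3" r1
    for r1 in "$dir"/*_1.fastq; do
        [ -e "$r1" ] || continue         # skip if the glob matched nothing
        trim_pair "$r1" "$adapter_r1" "$adapter_r2"
    done
}
```

Calling `DRY_RUN=1 trim_all . "$ADAPTER_R1" "$ADAPTER_R2"` prints the commands without running them, which is a cheap check before launching >100 jobs. The adapter values depend on the library kit; for Illumina TruSeq libraries they are commonly `AGATCGGAAGAGCACACGTCTGAACTCCAGTCA` (read 1) and `AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT` (read 2), but that is an assumption about the data here.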

[TopHat2] Providing --mate-inner-dist and --mate-std-dev, as these will vary from sample to sample.

```
tophat -p 10 --mate-inner-dist {} --mate-std-dev {} --no-coverage-search --output-dir "${file}" --transcriptome-index
```
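The two per-sample values could be estimated from observed fragment lengths. TopHat defines the inner distance as the mean fragment length minus both read lengths (for example, 300 bp fragments with 50 bp reads give 200). The sketch below assumes fragment lengths are already available one per line, for instance from aligning a subsample of each library; `estimate_mate_params` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: derive --mate-inner-dist and --mate-std-dev from fragment lengths.
# estimate_mate_params is a hypothetical helper, not a TopHat utility.
# stdin : fragment lengths, one integer per line
# $1    : read length (assumes both mates have the same length)
# stdout: "<inner-dist> <std-dev>", rounded to whole bases
estimate_mate_params() {
    local read_len="$1"
    awk -v rl="$read_len" '
        { n++; sum += $1; sumsq += $1 * $1 }
        END {
            if (n == 0) exit 1                # no input, nothing to estimate
            mean = sum / n
            var  = sumsq / n - mean * mean    # population variance
            if (var < 0) var = 0              # guard against rounding error
            printf "%.0f %.0f\n", mean - 2 * rl, sqrt(var)
        }'
}

# Possible use (assumes samtools and a BAM of a subsample aligned to
# transcript sequences, so that TLEN is not inflated by introns):
#   read -r inner sd < <(samtools view -f 66 sub.bam \
#       | awk '$9 > 0 { print $9 }' | estimate_mate_params 100)
#   tophat --mate-inner-dist "$inner" --mate-std-dev "$sd" ...
```

Fragment lengths taken from genome alignments of spliced reads would include intron spans, which is why the usage comment assumes a transcriptome alignment for this step.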

RNA-Seq • 1.9k views

updated 21 months ago by Ram 43k • written 8.2 years ago by umn_bist ▴ 390

Also, with cutadapt I am trimming bases with a quality score below 10. Is this acceptable if I am going to filter out variants with MQ < 30 and QUAL < 20 using SnpSift?
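For reference, the variant-level filter described here might look like the sketch below. The trimming cutoff acts on per-base Phred scores in the reads, whereas MQ and QUAL describe mapping and call quality in the VCF, so the two thresholds are set independently. `snpsift_expr` is a hypothetical helper, and the example assumes MQ is present as an INFO field in the VCF.

```shell
#!/usr/bin/env bash
# Sketch: build a SnpSift filter expression that keeps variants at or above
# the given QUAL and MQ thresholds. snpsift_expr is a hypothetical helper.
snpsift_expr() {
    local min_qual="$1" min_mq="$2"
    printf '(QUAL >= %s) & (MQ >= %s)\n' "$min_qual" "$min_mq"
}

# Possible use (assumes SnpSift.jar is in the working directory and that
# calls.vcf is a placeholder for the real input file):
#   java -jar SnpSift.jar filter "$(snpsift_expr 20 30)" calls.vcf > calls.filtered.vcf
```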

8.2 years ago by umn_bist ▴ 390