Single Cell RNA-Seq: Should I trim Poly-T & Clontech sequences called as "Overrepresented sequences" by FastQC?
4.4 years ago
gsr9999 ▴ 300

Dear Biostars Leaders,

I am the bioinformatician in our lab, and I have received raw data (bcl files) for ~300 single-cell RNA-Seq samples from a biologist here. I ran bcl2fastq and then ran FastQC on the resulting fastq files. In the "Overrepresented sequences" module of the FastQC output, about half of my samples' fastq files (~160) are flagged with a WARNING because of Poly-T tail sequences and other Clontech sequences. Do I need to trim these sequences before the alignment step? I would appreciate any advice. The other FastQC metrics (Basic Statistics, Adapter Content, Per base sequence quality, etc.) PASSED for all of my samples.

The single-cell RNA-Seq was performed using Takara Bio/Clontech's SMART-Seq v4 Ultra Low Input RNA Kit chemistry, and the samples were indexed with Illumina's Nextera XT adapters and index sequences. I made sure to put the adapter sequence and the sample indexes (i5 & i7) in the Sample Sheet file that was given as input to bcl2fastq. I used the default settings of bcl2fastq, and I believe it performed adapter trimming and demultiplexing automatically.

"GTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT" is the most common overrepresented Poly-T sequence. It is present in at least 160 sample fastq files, and FastQC reports its Possible Source as "No Hit". A simple Google search for this sequence shows that other people have observed the same thing (see the reports linked below). Should I ignore this, or remove/trim the sequence?

http://bioinformatics.stemcells.cam.ac.uk/Files_for_transfer_LILA/Fernando/extras/SLX-9555.C89V9ANXX.s_1.r_1.fastqc.html

http://single-cell.clst.riken.jp/fastqc/GSE68981_QC/SRR2031413_2_fastqc.html

http://waxmanlabvm.bu.edu/waxmanlab/FASTQC/SRR/SRR6576929_1_fastqc.html

FastQC reports other Poly-T tail sequences at lower frequencies. The remaining overrepresented sequences are annotated as "Clontech SMARTer…" or "Clontech Universal Primer Mix…".

Thanks,

GSR

Here is the complete list of the 29 "Overrepresented Sequences" reported for my fastq files. Please advise me on how to proceed:

Overrepresented_Sequence  Possible_Source  Affected_Samples_Count

GTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT  No Hit  160

TATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT  No Hit  19

GGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT  No Hit  11

GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG  No Hit  3

GTACATGGGAAGCAGTGGTATCAACGCAGAGTACATGGGAAGCAGTGGTA  Clontech SMARTer II A Oligonucleotide (100% over 25bp)  2

AAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAA  No Hit  1

ACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT  No Hit  1

CAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACA  No Hit  1

ATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT  No Hit  1

GTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTG  No Hit  1

TATCAACGCAGAGTACATGGGAAGCAGTGGTATCAACGCAGAGTACATGG  Clontech Universal Primer Mix Long (96% over 26bp)  1

ACGCAGAGTACATGGGAAGCAGTGGTATCAACGCAGAGTACATGGGAAGC  Clontech Universal Primer Mix Long (96% over 26bp)  1

TTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTT  No Hit  1

GGTATCAACGCAGAGTACATGGGAAGCAGTGGTATCAACGCAGAGTACAT  Clontech SMARTer II A Oligonucleotide (100% over 25bp)  1

CTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCT  No Hit  1

TATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTA  No Hit  1

GAGTACATGGGAAGCAGTGGTATCAACGCAGAGTACATGGGAAGCAGTGG  Clontech Universal Primer Mix Long (96% over 26bp)  1

CCCATGTACTCTGCGTTGATACCACTGCTTCCCATGTACTCTGCGTTGAT  Clontech Universal Primer Mix Long (96% over 26bp)  1

GTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTA  No Hit  1

GTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAA  No Hit  1

GTATCAACGCAGAGTACATGGGAAGCAGTGGTATCAACGCAGAGTACATG  Clontech SMARTer II A Oligonucleotide (100% over 25bp)  1

TTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTT  No Hit  1

GAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGA  No Hit  1

GTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT  No Hit  1

TCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTC  No Hit  1

AGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAG  No Hit  1

TATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAA  No Hit  1

GTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAA  No Hit  1

GGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTA  No Hit  1
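
Most of the 29 entries above appear to be variants of a few motifs. The following is a minimal Python sketch for grouping them, not a validated tool. The names `SMART_CORE` and `classify`, the 20-T threshold, and the category labels are assumptions made for illustration; the primer core is simply the 25-base stretch that recurs in the reads listed above.

```python
import re

# Assumption: primer core copied from the reads in the list above,
# not taken from the kit documentation.
SMART_CORE = "AAGCAGTGGTATCAACGCAGAGTAC"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def classify(seq):
    """Assign a coarse, illustrative label to an overrepresented sequence."""
    # A single repeated base, e.g. the poly-G entry.
    if len(set(seq)) == 1:
        return "homopolymer"
    # End of the primer (GTAC) followed by a long run of T.
    if re.search(r"GTACT{20,}", seq):
        return "primer + poly-T"
    # Primer core in either orientation.
    if SMART_CORE in seq or revcomp(SMART_CORE) in seq:
        return "SMART primer"
    # Every base equals the base three positions later.
    if all(a == b for a, b in zip(seq, seq[3:])):
        return "trinucleotide repeat"
    return "other"
```

Calling `classify` on each row of the table would give a per-category count, which may be easier to reason about than 29 separate sequences.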
RNA-Seq rna-seq next-gen sequencing • 3.5k views

Although many aligners will handle these oddities, you may still want to scan and re-trim the data, even though bcl2fastq has already done some trimming.

Have you looked at Appendix C of Takara's manual for this kit? It has instructions on what you need to do.
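
To illustrate what re-trimming would mean for the dominant sequence in the question, here is a minimal Python sketch that removes a leading primer remnant plus poly-T run from a single read and keeps the quality string in sync. The function name `trim_read`, the regular expression, and the 10-T minimum are assumptions made for illustration. They are not taken from Takara's manual, and in practice a dedicated trimmer such as cutadapt would normally do this job.

```python
import re

# Assumption: up to 30 leading bases, then the end of the primer (AGAGTAC)
# as seen in the reported reads, then at least 10 consecutive T bases.
LEADING_PRIMER_POLYT = re.compile(r"^[ACGTN]{0,30}?AGAGTACT{10,}")


def trim_read(seq, qual):
    """Strip a leading primer + poly-T stretch from one read.

    seq and qual are the sequence and quality lines of a FASTQ record.
    Reads without the motif are returned unchanged.
    """
    match = LEADING_PRIMER_POLYT.match(seq)
    if match is None:
        return seq, qual
    cut = match.end()
    # Cut both strings at the same position so they stay aligned.
    return seq[cut:], qual[cut:]
```

For a read made of the reported primer fragment, a T run, and then `GATTACA`, the function returns only `GATTACA` with its seven quality characters. Reads that come back very short or empty after trimming would still need to be filtered before alignment.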
