Question

Please advise me on PE sequencing reads prepared with the NEBNext kit and sequenced on a HiSeq 2000

7.0 years ago

seta ★ 1.9k

Hi all friends,

I have 10 files of Illumina paired-end reads from libraries prepared with the NEBNext kit (Prep Master Mix Set for Illumina, E6040, BioLabs) and sequenced on a HiSeq 2000. According to FastQC, in every sample one set of reads (of the pair) is 100 bp long, passes the per-base sequence quality check, shows adapter contamination, and has over-represented sequences that are various TruSeq adapters with different indexes (index 4, index 12, index 10, …). The second set of reads is 80 bp long, fails the per-base sequence quality check, also shows adapter contamination, and has over-represented sequences that are either Illumina Single End PCR Primer 1 or sequences labelled "No Hit". Could you please advise me on the following issues:

1) Why do the two sets of reads in a pair have different lengths? Is this normal, or is something wrong?

2) What are the exact adapter sequences that should be used for adapter trimming?

3) How can I find out whether the data are stranded or unstranded?

Thank you in advance

NEBNex kit Illumina Fastqc adapter length • 1.7k views

1) This is probably due to some problem during sequencing. Did you pay for this service? If so, the sequencing provider should run your samples again.

2) You do not need to know the exact adapter sequences; use BBDuk (or other software) with the adapter list it ships with (resources/adapters.fa).

3) If you know the library prep kit, read its manual. If you don't, you can use RSeQC or this method from the Trinity wiki.
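Regarding point 1, it can help to confirm the read lengths directly from the FASTQ files rather than relying only on the FastQC summary. Below is a minimal sketch in Python, assuming plain 4-line FASTQ records (optionally gzipped); the function names are made up for illustration and are not part of any tool mentioned here.

```python
import gzip
from collections import Counter
from itertools import islice


def open_fastq(path):
    """Open a FASTQ file as text, handling .gz files transparently."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_length_counts(path, max_reads=None):
    """Return a Counter mapping read length -> number of reads.

    Assumes 4 lines per record (sequences are not wrapped).
    max_reads limits how many records are inspected (None = all).
    """
    counts = Counter()
    with open_fastq(path) as handle:
        # The sequence is the 2nd line of every 4-line record.
        seq_lines = islice(handle, 1, None, 4)
        for line in islice(seq_lines, max_reads):
            counts[len(line.rstrip("\r\n"))] += 1
    return counts
```

Calling `read_length_counts("sample_1.fastq.gz")` and `read_length_counts("sample_2.fastq.gz")` (hypothetical file names) shows whether every read in a file has the same length or whether the lengths vary from read to read.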

Edit: what level of adapter contamination does FastQC report?
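For a rough, independent number, one can count how many reads contain an adapter prefix. This is only a sketch: it assumes the 13-base prefix shared by the standard TruSeq adapters (`AGATCGGAAGAGC`) is the relevant sequence for these libraries, it only finds exact matches, and it misses partial adapters at the very end of a read, so it is cruder than the FastQC report. The function name is hypothetical.

```python
import gzip
from itertools import islice

# Assumption: prefix shared by the standard Illumina TruSeq adapters.
# Swap in another sequence if the library used different adapters.
TRUSEQ_PREFIX = "AGATCGGAAGAGC"


def adapter_fraction(path, adapter=TRUSEQ_PREFIX, max_reads=100_000):
    """Fraction of inspected reads containing an exact copy of `adapter`.

    Assumes 4 lines per FASTQ record; only the first `max_reads`
    records are inspected (None = all). Returns 0.0 for an empty file.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    total = 0
    hits = 0
    with opener(path, "rt") as handle:
        # The sequence is the 2nd line of every 4-line record.
        seq_lines = islice(handle, 1, None, 4)
        for line in islice(seq_lines, max_reads):
            total += 1
            if adapter in line:
                hits += 1
    return hits / total if total else 0.0
```

Running this on both files of a pair gives a quick idea of whether adapter read-through affects a handful of reads or a large share of them.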

7.0 years ago by h.mon 35k

No, I didn't pay for it; I downloaded the data from SRA. However, the related paper was published in PNAS. How could it have been published if there was a serious problem with the sequencing data? RSeQC sounds great, thanks for suggesting it.

7.0 years ago by seta ★ 1.9k