Understanding NCBI vs ENA data
11 weeks ago
wes ▴ 90

I want to download PacBio RSII data with accession number SRR6037732 for further analysis. In NCBI, there are both SRA archive files and original format data available.

How can I identify which file contains subreads without PacBio adapter contamination?

Under the original format section, there is a file listed as type pacbio_native, available at this AWS link: https://sra-pub-src-1.s3.amazonaws.com/SRR6037732/D1_filtered_subreads.fastq.gz.1. Since the file is named "filtered_subreads", can I assume it is free from PacBio adapter contamination?

A similar file is available on the ENA, named SRR6037732_subreads.fastq.gz. Is this file identical to the D1_filtered_subreads.fastq.gz.1 file from NCBI?

ENA NCBI SRA • 672 views

Both archives should publish checksums for their files; you could compare them to see whether the files are identical.
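If you have already downloaded both files, a minimal sketch of the comparison in Python could look like the following. The local file names and the helper names `md5_of_file` and `md5_of_gzip_content` are placeholders of mine, not anything defined by NCBI or ENA. Note that two gzip files can hold identical data and still have different checksums, because the gzip header stores things like a timestamp and the original file name, so the sketch also hashes the decompressed content.

```python
import gzip
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """MD5 of the file exactly as stored on disk (the compressed bytes)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large FASTQ archives do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_of_gzip_content(path, chunk_size=1 << 20):
    """MD5 of the decompressed content, ignoring gzip header differences."""
    digest = hashlib.md5()
    with gzip.open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical usage, assuming both files were saved under these local names:
# ncbi = "D1_filtered_subreads.fastq.gz.1"
# ena = "SRR6037732_subreads.fastq.gz"
# print(md5_of_file(ncbi) == md5_of_file(ena))
# print(md5_of_gzip_content(ncbi) == md5_of_gzip_content(ena))
```

If the compressed checksums differ but the decompressed ones match, the reads are the same and only the packaging differs. If both differ, the files are not byte-identical, although that alone does not tell you how they differ (read names, ordering, or the sequences themselves).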


Thanks for the suggestion to compare the checksums.


How can I ensure that the subread file is free of PacBio adapter contamination, apart from checking with FastQC? Although the FastQC results show no obvious adapter contamination, I’m concerned there might be residual adapters that were not detected, which could potentially affect the assembly process. Since the data from the PacBio RSII system undergoes primary read preprocessing onboard the instrument, does that mean the output is guaranteed to be free of adapter contamination?


While the file name seems to indicate the data is filtered, you can use lima (LINK) to confirm that. You will need to know which library prep method was used for your data. You can also use one of the workflows PacBio provides, if it fits your needs: https://github.com/PacificBiosciences
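As a quick sanity check before setting up lima, you could also count how many reads contain an exact copy of the adapter. The sketch below is only an illustration: the function names are mine, the adapter sequence is something you would have to supply for your library prep, and it assumes a standard four-line-per-record FASTQ. Raw subreads carry sequencing errors, so exact matching will miss adapters with mismatches or indels; a zero count here is not proof that the data is clean.

```python
import gzip


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


def count_reads_with_adapter(fastq_gz_path, adapter):
    """Count reads containing an exact match to the adapter on either strand.

    Assumes four lines per FASTQ record (no wrapped sequence lines).
    Returns (reads_with_adapter, total_reads).
    """
    adapter = adapter.upper()
    adapter_rc = reverse_complement(adapter)
    hits = 0
    total = 0
    with gzip.open(fastq_gz_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # The sequence is the second line of every four-line record.
            if line_number % 4 != 1:
                continue
            total += 1
            sequence = line.strip().upper()
            if adapter in sequence or adapter_rc in sequence:
                hits += 1
    return hits, total


# Hypothetical usage; replace ADAPTER with the sequence for your library prep:
# hits, total = count_reads_with_adapter("SRR6037732_subreads.fastq.gz", ADAPTER)
# print(f"{hits} of {total} reads contain an exact adapter match")
```

A non-zero count is a clear sign that adapter remains. For anything beyond that, an error-tolerant tool such as lima is the better test.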
