3.8 years ago
c910816946
I downloaded data from ENA; the BioProject is PRJNA545730.
I split SRA files with:
fastq-dump --split-files SRR9167437
It generates 4 files:
SRR9167437_1.fastq
SRR9167437_2.fastq
SRR9167437_3.fastq
SRR9167437_4.fastq
Running head on each of these files gives:
head SRR9167437_1.fastq
@SRR9167437.1 700175F:CAPTEANXX170817:CAPTEANXX:1:1101:10000:48224 length=50
GATGCAGATTAAGCAAGCACCACACACCACCCCCAACAACCGCCCCGGGG
+SRR9167437.1 700175F:CAPTEANXX170817:CAPTEANXX:1:1101:10000:48224 length=50
<BB###############################################
@SRR9167437.2 700175F:CAPTEANXX170817:CAPTEANXX:1:1101:10000:6480 length=50
AAGTTTAAGGTACTGCCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
+SRR9167437.2 700175F:CAPTEANXX170817:CAPTEANXX:1:1101:10000:6480 length=50
/BBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBF##
@SRR9167437.3 700175F:CAPTEANXX170817:CAPTEANXX:1:1101:10001:25061 length=50
TTCCGGTTGATCGCTTCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
head SRR9167437_2.fastq
@SRR9167437.1 700175F:CAPTEANXX170817:CAPTEANXX:1:1101:10000:48224 length=50
GTTCCTCTCACCATAAAATGAGGAATCCAGATTGTTTCAAAGGATGGTGC
+SRR9167437.1 700175F:CAPTEANXX170817:CAPTEANXX:1:1101:10000:48224 length=50
BBBBBFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@SRR9167437.2 700175F:CAPTEANXX170817:CAPTEANXX:1:1101:10000:6480 length=50
TCCCAGGGGTTCGATAGAAGGAGGATTTCAGCTTTGCCCAAGAATGTCTA
+SRR9167437.2 700175F:CAPTEANXX170817:CAPTEANXX:1:1101:10000:6480 length=50
BBBBBFFFBFFFF/FFFFFFFFFBFF<FFFFFF<BFFFFFFFFFB<BFF<
@SRR9167437.3 700175F:CAPTEANXX170817:CAPTEANXX:1:1101:10001:25061 length=50
GTAAACATATTTTTAATGCATACTTAAGTAATATTTAAGAAACTAAACAA
head SRR9167437_3.fastq
@SRR9167437.1 700175F:CAPTEANXX170817:CAPTEANXX:1:1101:10000:48224 length=10
ACCAGGCGCA
+SRR9167437.1 700175F:CAPTEANXX170817:CAPTEANXX:1:1101:10000:48224 length=10
BBBBBFFFFF
@SRR9167437.2 700175F:CAPTEANXX170817:CAPTEANXX:1:1101:10000:6480 length=10
ACCAGGCGCA
+SRR9167437.2 700175F:CAPTEANXX170817:CAPTEANXX:1:1101:10000:6480 length=10
BBBBBFFFFF
@SRR9167437.3 700175F:CAPTEANXX170817:CAPTEANXX:1:1101:10001:25061 length=10
ACCAGGCGCA
head SRR9167437_4.fastq
@SRR9167437.1 700175F:CAPTEANXX170817:CAPTEANXX:1:1101:10000:48224 length=10
GATGCAGTTC
+SRR9167437.1 700175F:CAPTEANXX170817:CAPTEANXX:1:1101:10000:48224 length=10
BBBBBFFFFF
@SRR9167437.2 700175F:CAPTEANXX170817:CAPTEANXX:1:1101:10000:6480 length=10
GATGCAGTTC
+SRR9167437.2 700175F:CAPTEANXX170817:CAPTEANXX:1:1101:10000:6480 length=10
BBBBBFFFF<
@SRR9167437.3 700175F:CAPTEANXX170817:CAPTEANXX:1:1101:10001:25061 length=10
GATGCAGTTC
I also tried without splitting the files:
fastq-dump SRR9167437
head SRR9167437.fastq
@SRR9167437.1 700175F:CAPTEANXX170817:CAPTEANXX:1:1101:10000:48224 length=120
GATGCAGATTAAGCAAGCACCACACACCACCCCCAACAACCGCCCCGGGGGTTCCTCTCACCATAAAATGAGGAATCCAGATTGTTTCAAAGGATGGTGCACCAGGCGCAGATGCAGTTC
+SRR9167437.1 700175F:CAPTEANXX170817:CAPTEANXX:1:1101:10000:48224 length=120
<BB###############################################BBBBBFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBBBFFFFFBBBBBFFFFF
@SRR9167437.2 700175F:CAPTEANXX170817:CAPTEANXX:1:1101:10000:6480 length=120
AAGTTTAAGGTACTGCCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTCCCAGGGGTTCGATAGAAGGAGGATTTCAGCTTTGCCCAAGAATGTCTAACCAGGCGCAGATGCAGTTC
+SRR9167437.2 700175F:CAPTEANXX170817:CAPTEANXX:1:1101:10000:6480 length=120
/BBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBF##BBBBBFFFBFFFF/FFFFFFFFFBFF<FFFFFF<BFFFFFFFFFB<BFF<BBBBBFFFFFBBBBBFFFF<
@SRR9167437.3 700175F:CAPTEANXX170817:CAPTEANXX:1:1101:10001:25061 length=120
TTCCGGTTGATCGCTTCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGTAAACATATTTTTAATGCATACTTAAGTAATATTTAAGAAACTAAACAAACCAGGCGCAGATGCAGTTC
I'm wondering why it generates 4 files.
Is there any way to split my paired-end data into read 1 and read 2?
Thanks!
Judging by the length of reads in fastqs 3 and 4, my guess would be that they had barcodes and/or UMIs present in the forward and/or reverse adapters that got sequenced. In order to definitively answer this you would need to know the library prep kit or sequencing adapter structure. That info is sometimes included in the GEO submission, and should be in the paper.
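One quick way to check the read-length observation across more than the first few records is to tally sequence lengths per file. This is a minimal sketch, not part of sra-tools: the function name `read_length_counts` and the `max_reads` cap are hypothetical, and it assumes plain four-line FASTQ records like the ones shown in the question.

```python
from collections import Counter
from itertools import islice


def read_length_counts(path, max_reads=10000):
    """Tally sequence lengths for the first `max_reads` records of a FASTQ file.

    Assumes unwrapped four-line records (header, sequence, '+', quality).
    """
    counts = Counter()
    with open(path) as handle:
        # The sequence is the second line of every four-line record.
        sequence_lines = islice(handle, 1, None, 4)
        for sequence in islice(sequence_lines, max_reads):
            counts[len(sequence.strip())] += 1
    return counts
```

Calling it on each of the four split files in turn (for example `read_length_counts("SRR9167437_3.fastq")`) would show whether a file holds only short reads, which points towards barcodes or UMIs rather than biological sequence.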
Thanks rpolicastro, I read the SRA page, where the authors of the paper annotated the data. _3.fastq and _4.fastq are actually barcodes.
Thanks again! Baoqiang.
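To see how the unsplit 120-base record relates to the four split files, here is a minimal sketch that cuts one concatenated spot into consecutive segments. The helper name `split_by_lengths` is hypothetical, and the default lengths (50, 50, 10, 10) are an assumption taken from the `length=` values in the head output above, not from the run metadata.

```python
def split_by_lengths(text, lengths=(50, 50, 10, 10)):
    """Cut a concatenated spot (sequence or quality string) into consecutive segments."""
    if sum(lengths) != len(text):
        raise ValueError(
            f"segment lengths sum to {sum(lengths)}, but the record has {len(text)} characters"
        )
    segments = []
    start = 0
    for length in lengths:
        # Each segment starts where the previous one ended.
        segments.append(text[start:start + length])
        start += length
    return segments
```

Applied to the sequence line of the first unsplit record in the question, the four pieces should correspond to the reads shown for _1.fastq to _4.fastq, with the first two being the 50-base paired reads and the last two the 10-base barcodes. The same call works on the quality line.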
A couple of points:
Please use the code option to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.