Question: cannot read data from simulated FASTQ files
szuszmok (Hungary) wrote, 3.6 years ago:

Hi All,

After some struggling I realized that my simulated paired-end FASTQ files may be malformed in some way. The Bowtie2 log says 'no input read files were valid', and Trimmomatic couldn't handle these files either.

Below is an excerpt from one of my paired-end files. I can't see what's wrong with it:

@s_forw_0_1-17:0-81195210_1-5957-4/1
TCCCTCAGTTTCCCCATCTGTGAAATGGGCTGGCCATGCTTAACCCCTGGAGTTGCCAAGGTAGCCCATCAGGGAACACAGCGCCCCTGTACCTCAGGCACTCCC
+
??AA??B?DDD?D.DBGGCCCFIEIEHIIHIIIIIEIIHFHIGHIIHIHHIIIIIHIFFHIIHHHGHHIIHHHIIHHHFDI-IIHHHHIHCHHHIDIFHI,*HHH
@s_forw_0_1-17:0-81195210_1-5957-2/1
CCCTCAGTTTCCCCATCTGTGAAATGGGCTGGCCATGCTTAACCCCTGGAGTTGCCAAGGTAGCCCATCAGGGAACACAGCGCCCCTGTACCTCAGGCACTCCCT
+
?A???BBBDDADBDDBGGGGGGIIIFIHFHIIFHIHIDCHIFEIHIIGHIEGHDFGH+IHGIHHEC@5FEHIHHHIHH?IGFFIGHDHHIFIHHFHHGHIHCHHH
@s_revc_0_2-17:0-81195210_1-5957-4/1
GGGCGGGCTCAGGAAGAGGAGGCCAAGGACGGCCAAGAGAGGGGTGGGAGCAGGTGAGGACTGAGTGACCGTCCCGTCCAACAGGTGGACAAGACCGAAGACAAG
+
?????B?BBBDDEDDDFGGEGGHCHIHBHEHIIIHHIHHIIIHIHIHIHGIHEHHHHIHI==IIIFHHIIHHIAFAAIIHDIIGDHIHFHHGHFFHHIDFHHHGH
@s_revc_0_2-17:0-81195210_1-5957-2/1
GGTCACTCAGTCCTCACCTGCTCCCACCCCTCTCTTGGCCGTCCTTGGCCTCCTCTTCCTGAGCCCGCCCATCCGGCTGCTGCAGCCGGGCCTGGTCACGGTCCC
+
??AAABBBDDDDBDDEGGFCGFFIHFHIIIHHHFGGDFIICHHFIFFHDHHIIIHHIIIHHFIIHEEHHAFHHIHFIIH=GEIFIFHIIHHHIHHEHHHHHHFDH
@s_forw_1_1-17:0-81195210_1-5957-4/1
CAGGTGAGGACTGAGTGACCGTCCCGTCCAACAGGTGGACAAGACCGAAGACAAGTCCCTGGAGGAGCGGGGCCGCATCCTCATCTCGCTCAAGTACAGCTCACA
+
?,A??BBBBDDDBD@DFFGCFFIIIFIIFIIIIHIHHHIHHHHHFHIHIIIA=HIIIIIIHHICDHHHIHIFIHIEIHIH=HIBIH8HFIHHHGHHIG=HHI=HF
@s_forw_1_1-17:0-81195210_1-5957-2/1
CAGCTCACAGAAGCAAGGCCTGCTGGTAGGCATCGTGCGGTGCGCCCACCTGGCCGCCATGGACGCCAACGGCTACTCGGACCCCTACGTAAAAACGTGAGTGTG
+
??9??B?@DDDDDD6EGFGGGCHIB@ICICIIIIHHFIIIDHHIAHIIIHIHHFGIFIIIBHIIFHHHHHDHIHFIDIIIHHHCIII5HH+IAIHGHFHIHHGHH
@s_forw_1_2-17:0-81195210_1-5957-4/1
ACGGCACACTCACGTTTTCACGTAGGGGTCCGAGTAGCCGTTGGCGTCCATGGCGGCCAGGTGGGCGCACCGCACGATGCCTACCAGCAGGCCTTGCTTCTGTGA
+
?,????B?BBD-DDBDGGFGGCIAIHH/IFHHGAEIFIDHIIHHHIIHIF?HGIHIFIIFFHIHHIHIHHDHGHIIFH?IICGCII>EI.HIHCHIIHHHFH4HF
@s_forw_1_2-17:0-81195210_1-5957-2/1
GGTGCGCCCACCTGGCCGCCATGGACGCCAACGGCTACTCGGACCCCTACGTGAAAACGTGAGTGTGCCGTGCGCGTGACCACCTGCCACGTCTTCACCTCCAAG
+
????ABBBDDBDDDDDFCGGGFHHIIFHHIHIC=H9IEHHIIHGHIHICHHIHIFHIHIEHFIHFHHEHIHGHIHIICHIH@IAIIHIHFIIHIFHHHI=HHGDG
@s_forw_0_1-17:0-81195210_1-11156-4/1
CCCGATGAACTCATTGTGCCGGAATTTGTCCTCGTCACACACAGAGATCCTAGAGGGGGCGGTGGTGAGGGGCACAGCCAGTGCCTCAGACGCACTGGGCATGGT
+
?????B@BD@DDDDDDCGECFGIIFIFHIAIIHFEGIHFHHHFGIIHIIICCHIIIHIHFIEFHHHIGIIIFHIHHIDHHIHHHHFH#EIFDIHHFIGHHGFHHB
@s_forw_0_1-17:0-81195210_1-11156-2/1
CCTTCCTCACCTCCACCCCCTTGACTCTCCATGCTCACCTCCCCGGTCTCCCCTCCCCTCTCACTCTGCCCCTCATGAGTCCCATCACAGGCAGGAAGTTATGCC
+

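One thing worth checking in simulated paired-end data is that both files contain the same number of records, in the same order, with matching read names. The sketch below assumes plain (uncompressed) 4-line FASTQ records and that mate names differ only by a trailing `/1` or `/2`; the function names `read_names`, `mate_base` and `check_pairs` are hypothetical helpers, not part of any tool.

```python
import itertools
from typing import Iterator, List, Tuple


def read_names(path: str) -> Iterator[str]:
    """Yield the header line (without the leading '@') of every 4-line record."""
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0:
                yield line.rstrip("\r\n")[1:]


def mate_base(name: str) -> str:
    """Drop anything after whitespace and a trailing /1 or /2 (assumed convention)."""
    parts = name.split()
    name = parts[0] if parts else ""
    if name.endswith(("/1", "/2")):
        return name[:-2]
    return name


def check_pairs(path1: str, path2: str) -> Tuple[int, List[str]]:
    """Return (number of pairs compared, list of problems found)."""
    problems: List[str] = []
    count = 0
    for n1, n2 in itertools.zip_longest(read_names(path1), read_names(path2)):
        if n1 is None or n2 is None:
            problems.append(f"files differ in record count after {count} pairs")
            break
        count += 1
        if mate_base(n1) != mate_base(n2):
            problems.append(f"pair {count}: {n1!r} vs {n2!r}")
    return count, problems
```

For example, `check_pairs("./fastq/control_0_D4_1.fq", "./fastq/control_0_D4_2.fq")` would return an empty problem list if the two files line up record by record.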
The Trimmomatic command I used:

java -jar /opt/Trimmomatic-0.33/trimmomatic-0.33.jar PE -trimlog ./fastq/Trimmomatic/trimlog_control_0_D1.txt ./fastq/control_0_D4_1.fq ./fastq/control_0_D4_2.fq ./fastq/Trimmomatic/Paired/control_0_D1_forward-paired.fastq ./fastq/Trimmomatic/Unpaired/control_0_D1_forward-unpaired.fastq ./fastq/Trimmomatic/Paired/control_0_D1_reverse-paired.fastq ./fastq/Trimmomatic/Unpaired/control_0_D1_reverse-unpaired.fastq ILLUMINACLIP:/opt/Trimmomatic-0.33/adapters/TruSeq3-PE.fa:2:40:10

And its output:

TrimmomaticPE: Started with arguments: -trimlog ./fastq/Trimmomatic/trimlog_control_0_D1.txt ./fastq/control_0_D4_1.fq ./fastq/control_0_D4_2.fq ./fastq/Trimmomatic/Paired/control_0_D1_forward-paired.fastq ./fastq/Trimmomatic/Unpaired/control_0_D1_forward-unpaired.fastq ./fastq/Trimmomatic/Paired/control_0_D1_reverse-paired.fastq ./fastq/Trimmomatic/Unpaired/control_0_D1_reverse-unpaired.fastq ILLUMINACLIP:/opt/Trimmomatic-0.33/adapters/TruSeq3-PE.fa:2:40:10
Multiple cores found: Using 4 threads
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 0 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
Input Read Pairs: 204352 Both Surviving: 204352 (100.00%) Forward Only Surviving: 0 (0.00%) Reverse Only Surviving: 0 (0.00%) Dropped: 0 (0.00%)
TrimmomaticPE: Completed successfully
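Trimmomatic reports the quality encoding it detected. As a rough illustration of why such detection is possible (a simple heuristic, not Trimmomatic's actual algorithm): Phred+33 quality characters start at `!` (ASCII 33), while Phred+64 characters start at `@` (ASCII 64) and old Solexa+64 scores go no lower than `;` (ASCII 59). The function name `guess_offset` is hypothetical.

```python
from typing import Iterable, Optional


def guess_offset(quality_lines: Iterable[str]) -> Optional[int]:
    """Guess the FASTQ quality offset from quality strings (simple heuristic).

    Returns 33 if any character is below ';' (ASCII 59), which +64 encodings
    never use; 64 if every character is at or above '@' (ASCII 64);
    otherwise None, because the data are ambiguous.
    """
    lowest = None
    for qual in quality_lines:
        qual = qual.rstrip("\r\n")
        if qual:
            m = min(ord(c) for c in qual)
            lowest = m if lowest is None else min(lowest, m)
    if lowest is None:
        return None
    if lowest < 59:
        return 33
    if lowest >= 64:
        return 64
    return None
```

The quality lines in the excerpt above contain characters such as `.`, `,`, `-` and `*`, all below ASCII 59, so this heuristic would also classify them as Phred+33, consistent with the log.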

Thank you for any help!


I suspect the problem relates to something else. I can't see anything wrong here.

(reply by SmallChess)

It could be a newline or path issue, or something else. Posting the full Trimmomatic and/or Bowtie2 commands may help resolve the issue.
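To follow up on the newline/path suggestion, here is a quick sketch for checking the input files themselves: whether each path exists and how large it is, whether it is gzip-compressed (gzip data starts with the bytes `1f 8b`), whether it uses Windows-style `\r\n` line endings, and whether it starts with `@`. The function name `inspect_file` is hypothetical, and whether CRLF line endings are what trips up Bowtie2 here is an assumption to test, not a diagnosis.

```python
import os


def inspect_file(path: str, sample_bytes: int = 65536) -> dict:
    """Report basic properties of a (supposed) FASTQ file from its first bytes."""
    info = {"exists": os.path.isfile(path)}
    if not info["exists"]:
        return info
    info["size"] = os.path.getsize(path)
    with open(path, "rb") as handle:
        head = handle.read(sample_bytes)
    info["gzip"] = head[:2] == b"\x1f\x8b"
    if not info["gzip"]:
        # Only meaningful for uncompressed text.
        info["crlf"] = b"\r\n" in head
        info["starts_with_at"] = head[:1] == b"@"
    return info
```

Running it on both input files (for example `./fastq/control_0_D4_1.fq`) shows at a glance whether the path resolves and what the file actually contains.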

(reply by SES)

I'm sure that the path is correct.

(reply by szuszmok)
John (Germany) answered, 3.6 years ago:

Have you tried FastQValidator (http://genome.sph.umich.edu/wiki/FastQValidator)? It's pretty old, but it might work.

If you didn't spot the bug in the first few lines, you're not going to see it manually. I would definitely look for a validator, or write some tests into your code to make sure it's doing what you think it should be doing :)
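Along the lines of writing tests, a minimal FASTQ checker might look like the sketch below. It is not FastQValidator; it assumes uncompressed 4-line records (no line-wrapped sequences) and only checks that each header starts with `@`, the third line starts with `+`, the sequence uses `ACGTN` (case-insensitive), and the quality string is as long as the sequence. The function name `validate_fastq` is hypothetical.

```python
from typing import Iterable, List


def validate_fastq(lines: Iterable[str], allowed: str = "ACGTN") -> List[str]:
    """Check 4-line FASTQ records; return error messages (empty list if none found)."""
    errors: List[str] = []
    record: List[str] = []
    for n, raw in enumerate(lines, start=1):
        record.append(raw.rstrip("\r\n"))
        if len(record) < 4:
            continue
        header, seq, plus, qual = record
        start = n - 3  # line number of this record's header
        if not header.startswith("@"):
            errors.append(f"line {start}: header does not start with '@'")
        if not plus.startswith("+"):
            errors.append(f"line {start + 2}: separator does not start with '+'")
        bad = set(seq.upper()) - set(allowed)
        if bad:
            errors.append(f"line {start + 1}: unexpected bases {sorted(bad)}")
        if len(seq) != len(qual):
            errors.append(
                f"line {start + 3}: quality length {len(qual)} != sequence length {len(seq)}"
            )
        record = []
    if record:
        errors.append(f"incomplete final record ({len(record)} of 4 lines)")
    return errors
```

Because a text file object iterates over lines, it can be called as `validate_fastq(handle)` on a file opened with `open(path)`; printing the first few messages is usually enough to locate the problem.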

Antonio R. Franco (Spain, Universidad de Córdoba) answered, 3.6 years ago:

@HWI-EAS209_0006_FC706VJ:5:58:5894:21141#ATCACG/1
TTAATTGGTAAATAAATCTCCTAATAGCTTAGATNTTACCTTNNNNNNNNNNTAGTTTCTTGAGATTTGTTGGGGGAGACATTTTTGTGATTGCCTTGAT
+HWI-EAS209_0006_FC706VJ:5:58:5894:21141#ATCACG/1
efcfffffcfeefffcffffffddf`feed]`]_Ba_^__[YBBBBBBBBBBRTT\]][]dddd`ddd^dddadd^BBBBBBBBBBBBBBBBBBBBBBBB

What you can see above is a nice and useful FASTQ record taken from http://en.wikipedia.org/wiki/FASTQ_format:

@HWI-EAS209_0006_FC706VJ -> the name of the machine

5 -> the flowcell lane

58 -> the tile within the flowcell lane

5894:21141 -> the x:y coordinates, within that tile, of the cluster containing the DNA you are sequencing

#ATCACG -> the index (barcode); optional, present only when a barcode is used

/1 -> the read number (/1 or /2) if the reads are paired
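The header layout described above can be parsed with a regular expression. This is a sketch for the older Illumina style of the Wikipedia example (`instrument:lane:tile:x:y#index/read`); the pattern, the function name `parse_illumina_header` and the field names are my own assumptions, and newer Illumina headers use a different, space-separated layout that this pattern will not match.

```python
import re
from typing import Optional

# Assumed layout: @<instrument>:<lane>:<tile>:<x>:<y>[#<index>][/<read>]
HEADER_RE = re.compile(
    r"^@(?P<instrument>[^:]+):(?P<lane>\d+):(?P<tile>\d+):(?P<x>\d+):(?P<y>\d+)"
    r"(?:#(?P<index>[^/\s]+))?(?:/(?P<read>[12]))?$"
)


def parse_illumina_header(header: str) -> Optional[dict]:
    """Return the header fields as a dict, or None if the pattern does not match."""
    match = HEADER_RE.match(header.strip())
    return match.groupdict() if match else None
```

On the Wikipedia header this yields lane `5`, tile `58`, coordinates `5894`/`21141`, index `ATCACG` and read `1`. On a name from the question such as `@s_forw_1_2-17:0-81195210_1-5957-4/1` it returns `None`, which only shows that those names do not follow this particular convention.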

I don't see this structure in your data:

@s_forw_1_2-17:0-81195210_1-5957-4/1 
ACGGCACACTCACGTTTTCACGTAGGGGTCCGAGTAGCCGTTGGCGTCCATGGCGGCCAGGTGGGCGCACCGCACGATGCCTACCAGCAGGCCTTGCTTCTGTGA
+
?,????B?BBD-DDBDGGFGGCIAIHH/IFHHGAEIFIDHIIHHHIIHIF?HGIHIFIIFFHIHHIHIHHDHGHIIFH?IICGCII>EI.HIHCHIIHHHFH4HF