Question: What could be wrong with my FASTQ files? Picard suggests that there is missing header information.
kmurph55 wrote (19 months ago):

Hello, I have two FASTQ files, 3D_1.fastq and 3d_2.fastq. To the best of my knowledge, the first file contains the forward reads and the second contains the reverse reads. I can confirm that the files were generated as paired-end reads, 101 base pairs in length, with Illumina/Sanger 1.9+ encoding. The data are nucleotide sequences from a single sample, sequenced on a highseq (Illumina HiSeq) machine. For some reason, Picard gives me an error message indicating that read group information is missing from the header of my files. I mapped the reads against a reference genome with Bowtie2 and gave the sorted BAM file to Picard as input for validation. These are the first few lines of my first FASTQ file:

 @SN996:194:H5V7HBCXY:1:1108:1872:2028 1:N:0:TCTCGCGC
NTATTTCATAGCATACTTTTCCGGGCTCGCCGGGCCTAAGAAAGTTGCAAAAATTTTTCAATCGAAATACAAATGAAATTAAAACCTACGCGCGTGTGTGG
+
DHHIIIIIIIIIHHFHIHHIHGIIICHHGHIIHIHHHEHIDGHHFEHIHGHHIIHIIHGIIIIIHIHIIECHIIGFFHHIHIHCFHIIG<<E0CFHH
@SN996:194:H5V7HBCXY:1:1108:1995:2062 1:N:0:TCTCGCGC
CATCGATATGTATTTCTATTAACAAATTGCAAACATTACGATTAAATGAAAGAGTTGTGGCGTCCCTCGTTCTTGACCCGCGGACTGACTCACAGTCCCGA

These are the first few lines of my second FASTQ file:

@SN996:194:H5V7HBCXY:1:1108:1872:2028 2:N:0:TCTCGCGC
GCCGGCGGCAGTTTGTGCATTGCTTTTGAAGTGGCAACAATTTCGCCACGATTCTCTTGGTCTTTCTTCGGTTGCTGTTGCTGGAGGAGCCTCCATTATTC
+
DDCDCIICC<ECDHHHEHIHGHEFGGHIHEHHIIIIH?GH1CHH?EGHHHCE<1D@1<<@<FEEFCF1GHHIFHC1<F<<@<E111<EEEHHIIIG1CCD1
@SN996:194:H5V7HBCXY:1:1108:1995:2062 2:N:0:TCTCGCGC
CTGACCGCAGTGAATCGGAAGGTGGCCTACGAGTACCAGTCGAATACGAAGAACGAGGCCCTCAACCAGATGAAGGAAATGCCCAACTTTATGTCGACACT

I know that the FASTQ files were generated from a single sample, so it would make sense that they do not contain read group identification, because all reads belong to that one sample. I would assume that sequencing a single sample is fairly common, and that if this information were strictly required in the header, the sequencing company would have formatted the data so that it would not block downstream analyses. Why would I be getting this error from Picard? Does anyone have a suggestion for how to move past this issue?


Is the space before " @SN996" a copy-and-paste problem from when you wrote this post? If not, that is your problem.

Comment written 19 months ago by Pierre Lindenbaum

Yes, this was just an error that I made in my post.

Comment written 19 months ago by kmurph55

Illumina highseq for all your stoner sequencing!

Comment written 19 months ago by WouterDeCoster
Santosh Anand wrote (19 months ago):

Picard is a companion toolset to GATK, and GATK requires read group (RG) information both in the header and on each read, so Picard expects it too. The RG information is added by the user, following these guidelines:

http://gatkforums.broadinstitute.org/gatk/discussion/6472/read-groups

First decide what your RG (read group) string should be, following the guidelines linked above. Since you have already mapped the reads, it is easiest to add the RG information with another Picard tool, AddOrReplaceReadGroups.
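
As a rough sketch of that step, the Python snippet below derives read group values from the first FASTQ header in the question and assembles an AddOrReplaceReadGroups command line. It assumes one common convention (ID as flowcell.lane, PU as flowcell.lane.barcode); the sample name `3D`, library name `lib1`, the BAM file names, and the `picard.jar` path are placeholders that are not given in the thread.

```python
# Sketch: build read group (RG) values from an Illumina FASTQ header and
# assemble a Picard AddOrReplaceReadGroups command. Nothing is executed here.
# Assumptions: sample/library names and file paths are hypothetical.

def read_group_from_header(header, sample, library):
    """Return Picard RG arguments derived from a Casava 1.8+ style header."""
    name, _, comment = header.strip().lstrip("@").partition(" ")
    # name layout: instrument:run:flowcell:lane:tile:x:y
    instrument, run, flowcell, lane = name.split(":")[:4]
    # comment layout: read:is_filtered:control_number:index_sequence
    parts = comment.split(":")
    barcode = parts[3] if len(parts) >= 4 else "NA"
    return {
        "RGID": f"{flowcell}.{lane}",            # read group identifier
        "RGPU": f"{flowcell}.{lane}.{barcode}",  # platform unit
        "RGSM": sample,                          # sample name (user-supplied)
        "RGLB": library,                         # library name (user-supplied)
        "RGPL": "ILLUMINA",                      # sequencing platform
    }


def picard_command(rg, in_bam, out_bam, picard_jar="picard.jar"):
    """Return the command as an argument list suitable for subprocess.run."""
    args = ["java", "-jar", picard_jar, "AddOrReplaceReadGroups",
            f"I={in_bam}", f"O={out_bam}"]
    args += [f"{key}={value}" for key, value in rg.items()]
    return args


header = "@SN996:194:H5V7HBCXY:1:1108:1872:2028 1:N:0:TCTCGCGC"
rg = read_group_from_header(header, sample="3D", library="lib1")
cmd = picard_command(rg, "3D.sorted.bam", "3D.sorted.rg.bam")
print(" ".join(cmd))
```

Returning the command as a list rather than a string makes it easy to inspect before running it, and avoids shell quoting problems if it is later passed to `subprocess.run`.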

Next time, you can also supply the RG information at mapping time. Bowtie2 can do this with:

--rg-id <text>
Set the read group ID to <text>. This causes the SAM @RG header line to be printed, with <text> as the value associated with the ID: tag. It also causes the RG:Z: extra field to be attached to each SAM output record, with value set to <text>.
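
A minimal sketch of what such a mapping command might look like, written as a Python helper that only assembles the argument list. Besides `--rg-id`, it uses Bowtie2's `--rg` option to add further TAG:VALUE fields to the @RG line. The index name `ref_index`, the output file name, and the read group values are assumptions for illustration; the FASTQ names come from the question.

```python
# Sketch: assemble a bowtie2 paired-end command that writes read group info.
# Nothing is executed here. Index name, output name, and RG values are
# hypothetical placeholders.

def bowtie2_command(index, fastq_1, fastq_2, out_sam, rg_id, rg_fields):
    """Return the bowtie2 call as an argument list suitable for subprocess.run."""
    args = ["bowtie2", "--rg-id", rg_id]
    # Each --rg adds one TAG:VALUE field to the @RG header line.
    for tag, value in rg_fields.items():
        args += ["--rg", f"{tag}:{value}"]
    args += ["-x", index, "-1", fastq_1, "-2", fastq_2, "-S", out_sam]
    return args


cmd = bowtie2_command(
    index="ref_index",
    fastq_1="3D_1.fastq",
    fastq_2="3d_2.fastq",
    out_sam="3D.sam",
    rg_id="H5V7HBCXY.1",
    rg_fields={"SM": "3D", "LB": "lib1", "PL": "ILLUMINA"},
)
print(" ".join(cmd))
```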

Remember that RG information is required for most GATK analyses.
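
To see whether a file already carries this information, one option is to scan its SAM text for an @RG header line and for RG:Z: tags on the records. The sketch below does that on a few made-up SAM lines; the records are illustrative only, and in practice the lines might come from something like `samtools view -h` on the BAM file.

```python
# Sketch: report which @RG IDs a SAM text declares and which reads lack a
# matching RG:Z: tag. The example records below are made up for illustration.

def missing_read_groups(sam_lines):
    """Return (set of @RG IDs in the header, names of reads without a valid RG)."""
    header_ids = set()
    untagged = []
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("@"):
            # Header line: collect the ID field of every @RG entry.
            if fields[0] == "@RG":
                header_ids.update(f[3:] for f in fields[1:] if f.startswith("ID:"))
            continue
        # Alignment line: optional tags start at the 12th column.
        rg = next((t[5:] for t in fields[11:] if t.startswith("RG:Z:")), None)
        if rg is None or rg not in header_ids:
            untagged.append(fields[0])
    return header_ids, untagged


sam = [
    "@HD\tVN:1.0\tSO:coordinate",
    "@RG\tID:H5V7HBCXY.1\tSM:3D\tPL:ILLUMINA",
    "read1\t0\tchr1\t100\t42\t5M\t*\t0\t0\tACGTA\tIIIII\tRG:Z:H5V7HBCXY.1",
    "read2\t0\tchr1\t200\t42\t5M\t*\t0\t0\tACGTA\tIIIII",
]
ids, untagged = missing_read_groups(sam)
print(ids, untagged)
```

An empty set of IDs, or any read names in the second result, would be consistent with the kind of missing read group error described in the question.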


Thanks! I didn't realize that this information needed to be set by the user.

Comment written 19 months ago by kmurph55