Question: Processing FASTQ files after fastq-dump
Asked 3.5 years ago by SOHAIL (Beijing Institute of Genomics, CAS):

Hi All,

I downloaded data from the SRA archive and used fastq-dump to convert it into FASTQ files:

fastq-dump --outdir $OUTPUT -I --split-files $INPUT/SRR00000.sra

The FASTQ file headers were something like this:

@SRR1101035.5.1 5 length=100
CCTTGCTCAGACCTTGCCTTGAACTCTTGGCTTCAAGTGATCCNNNNNNNNCGACCTCTCAAAGNGCTGAGGTNATAGGGATGAGCCACTGTGCCTGGCC
+SRR1101035.5.1 5 length=100
@@@FFEFDCFCFBHEHHHIIGC@EHHIIII>DEG1@?1?DG########(0(7<;FCGHC;=#--5A5?#############################

@SRR1101035.6.1 6 length=100
AGCCAATCTGAAAAGGTTACATACTNNANNNTNTNAANNNNNNNNNNNNNNNNAAAAGGCNNNNNTGNNNNNNNNGTGAAAAGATCAGTGGTTGCCAGGG
+SRR1101035.6.1 6 length=100
<@@FFDEFHHD>?FEGHIEI>HEHI##########################################################################

I then stripped the extra information from the FASTQ headers with a command suggested here before:

paste - - - - < your_original_fastq_file | awk -v OFS="\t" '{print $1, $4, "+", $8}' | tr "\t" "\n" > new_fastq_file

The results were:

@SRR1101035.5.1
CCTTGCTCAGACCTTGCCTTGAACTCTTGGCTTCAAGTGATCCNNNNNNNNCGACCTCTCAAAGNGCTGAGGTNATAGGGATGAGCCACTGTGCCTGGCC
+
@@@FFEFDCFCFBHEHHHIIGC@EHHIIII>DEG1@?1?DG########(0(7<;FCGHC;=#--5A5?#############################

@SRR1101035.6.1
AGCCAATCTGAAAAGGTTACATACTNNANNNTNTNAANNNNNNNNNNNNNNNNAAAAGGCNNNNNTGNNNNNNNNGTGAAAAGATCAGTGGTTGCCAGGG
+
<@@FFDEFHHD>?FEGHIEI>HEHI##########################################################################
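The paste/awk pipeline works because each joined record has exactly eight whitespace-separated fields. An equivalent line-based sketch, assuming plain four-line FASTQ records (no wrapped sequence lines), rewrites only the header and "+" lines by their position in the record; original.fastq and stripped.fastq are placeholder names and the reads are truncated toy examples:

```shell
# Truncated toy records in the original fastq-dump header style (hypothetical file name).
printf '%s\n' \
  '@SRR1101035.5.1 5 length=4' 'CCTT' '+SRR1101035.5.1 5 length=4' '@@@F' \
  '@SRR1101035.6.1 6 length=4' 'AGCC' '+SRR1101035.6.1 6 length=4' '<@@F' \
  > original.fastq

# Record line 1: keep only the read name (first whitespace-separated word).
# Record line 3: replace the repeated header with a bare "+".
# Record lines 2 and 4 (sequence and quality) are printed unchanged.
awk 'NR % 4 == 1 { print $1; next }
     NR % 4 == 3 { print "+"; next }
                 { print }' original.fastq > stripped.fastq

cat stripped.fastq
```

Working by line position rather than by a leading @ matters because quality strings can also start with @, as the first record here does.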

Here, @SRR1101035.5.1 follows the pattern @run_accession.read_id.mate, where the last field (1 or 2) says which read of the pair it is in paired-end mode.

However, when I use BWA to map the first and second read files, it still reports this error:

paired reads have different names: "SRR1101036.5.1", "SRR1101036.5.2"

QUESTION:

I want to edit the FASTQ files so that, in the future, any tool that relies on read-pair information can process the data without problems.

As with other conventionally formatted files, one possible approach is to edit the header of each FASTQ record, for example converting @SRR1101035.5.1 into @SRR1101035.5#/1.

Could you please suggest how I can do this on the Linux command line, or by any other means? I want to keep the sample and read IDs in the header.
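The conversion in the question only has to touch the first line of each four-line record. A minimal awk sketch, assuming the headers have already been reduced to @accession.read.mate as above and that records are not line-wrapped, splits the name on dots and rewrites the final mate field as #/1 or #/2; stripped.fastq and renamed.fastq are placeholder names and the reads are truncated toy examples:

```shell
# Truncated toy records after the header clean-up (hypothetical file name).
printf '%s\n' \
  '@SRR1101035.5.1' 'CCTT' '+' '@@@F' \
  '@SRR1101035.6.1' 'AGCC' '+' '<@@F' \
  > stripped.fastq

# Only record line 1 is a header; quality lines may also begin with "@",
# so a plain sed on "^@" would corrupt them.
awk 'NR % 4 == 1 {
       n = split($1, part, "[.]")   # "@SRR1101035.5.1" -> "@SRR1101035", "5", "1"
       mate = part[n]               # last field: 1 or 2
       print substr($1, 1, length($1) - length(mate) - 1) "#/" mate
       next
     }
     { print }' stripped.fastq > renamed.fastq

cat renamed.fastq
```

As far as I know, BWA drops a trailing /1 or /2 from read names before comparing mates, which is why names ending in /1 and /2 are usually accepted while .1 and .2 are not; it is worth confirming with your BWA version.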

Thanks

Tags: linux, command-line, sequencing
Answer posted 3.5 years ago by Antonio R. Franco (Universidad de Córdoba, Spain):

It is a good idea to run fastq-dump with the legacy --split-3 option, because it ensures that your two mate files are paired and synchronized. Any read without a mate is written to a third file.
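Whether two mate files really are in sync can also be checked directly. A minimal sketch, assuming four-line records and a mate number written as a trailing .1/.2 or /1 and /2, compares the read names of both files after dropping that suffix; reads_1.fastq and reads_2.fastq are placeholder names, and the reads are truncated toy examples (the second sequence is made up):

```shell
# Truncated toy mate files (hypothetical names).
printf '%s\n' '@SRR1101035.5.1' 'CCTT' '+' '@@@F' > reads_1.fastq
printf '%s\n' '@SRR1101035.5.2' 'GGAA' '+' '@@@F' > reads_2.fastq

# Print each record's read name with a trailing ".1", ".2", "/1" or "/2" removed.
names() {
  awk 'NR % 4 == 1 { name = $1; sub("[./][12]$", "", name); print name }' "$1"
}

names reads_1.fastq > names_1.txt
names reads_2.fastq > names_2.txt

# Identical name lists, in the same order, mean the two files are synchronized.
if cmp -s names_1.txt names_2.txt; then
  echo "mate files are synchronized"
else
  echo "mate files are NOT synchronized"
fi
```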

Give it a try, because your problem may have a different cause.

If the problem persists, we can suggest a way to edit the names.


Thanks for the suggestion, Antonio!

I tried the --split-3 option on the command line:

fastq-dump --gzip --outdir $OUTPUT -I --split-3 $INPUT/SRR1101035.sra

The output files contain the same number of spots as with the previous command, and no third file was generated, so I believe every read has its mate.
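Spot counts per file are easy to confirm from the gzipped output. A minimal sketch, assuming unwrapped four-line records; toy_1.fastq.gz and toy_2.fastq.gz are placeholder names with truncated toy reads (some sequences made up), not fastq-dump's actual output. Equal counts are necessary but do not by themselves prove that the mates are in the same order:

```shell
# Truncated toy gzipped mate files (hypothetical names).
printf '%s\n' '@SRR1101035.5.1' 'CCTT' '+' '@@@F' '@SRR1101035.6.1' 'AGCC' '+' '<@@F' | gzip > toy_1.fastq.gz
printf '%s\n' '@SRR1101035.5.2' 'GGAA' '+' '@@@F' '@SRR1101035.6.2' 'TTCA' '+' '<@@F' | gzip > toy_2.fastq.gz

# A FASTQ record is four lines, so the read count is the line count divided by 4.
# A non-integer result would point to a truncated or malformed file.
count_reads() {
  gzip -dc "$1" | awk 'END { print NR / 4 }'
}

for f in toy_1.fastq.gz toy_2.fastq.gz; do
  echo "$f: $(count_reads "$f") reads"
done
```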

But the problem persists. Could you please suggest a one-step command-line way to convert this record:

@SRR1101035.6.1 6 length=100
AGCCAATCTGAAAAGGTTACATACTNNANNNTNTNAANNNNNNNNNNNNNNNNAAAAGGCNNNNNTGNNNNNNNNGTGAAAAGATCAGTGGTTGCCAGGG
+SRR1101035.6.1 6 length=100
<@@FFDEFHHD>?FEGHIEI>HEHI##########################################################################

Expected Results:

@SRR1101035.6#/1
AGCCAATCTGAAAAGGTTACATACTNNANNNTNTNAANNNNNNNNNNNNNNNNAAAAGGCNNNNNTGNNNNNNNNGTGAAAAGATCAGTGGTTGCCAGGG
+
<@@FFDEFHHD>?FEGHIEI>HEHI##########################################################################
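Since fastq-dump --gzip writes compressed files, the whole conversion can be done in one streamed pass. A minimal sketch, assuming unwrapped four-line records and headers of the form @accession.read.mate followed by optional extra words; example_1.fastq.gz and example_1.renamed.fastq.gz are placeholder names and the read is a truncated toy example:

```shell
# Truncated toy input in the original fastq-dump header style (hypothetical file name).
printf '%s\n' '@SRR1101035.6.1 6 length=4' 'AGCC' '+SRR1101035.6.1 6 length=4' '<@@F' \
  | gzip > example_1.fastq.gz

# One pass over the decompressed stream:
#   record line 1 -> "@accession.read#/mate", extra header words dropped
#   record line 3 -> bare "+"
#   record lines 2 and 4 -> unchanged
gzip -dc example_1.fastq.gz \
  | awk 'NR % 4 == 1 {
           n = split($1, part, "[.]")
           mate = part[n]
           print substr($1, 1, length($1) - length(mate) - 1) "#/" mate
           next
         }
         NR % 4 == 3 { print "+"; next }
                     { print }' \
  | gzip > example_1.renamed.fastq.gz

gzip -dc example_1.renamed.fastq.gz
```

Running the same command on the second mate file yields #/2 headers. Alternatively, fastq-dump has --defline-seq and --defline-qual options for choosing the header format at dump time (for example --defline-seq '@$ac.$si/$ri' --defline-qual '+', which produces @SRR1101035.6/1 rather than #/1); check fastq-dump --help to confirm the variables your version supports.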

-- Thanks

Reply posted 3.5 years ago by SOHAIL.