Question

Aligning read files with different numbers of reads using bowtie2

4.1 years ago

Bioinfo ▴ 20

Hello everyone

I tried to use bowtie2 to align my reads to a reference genome, but my forward and reverse read files contain different numbers of reads (R1.fastq = 39123, R2.fastq = 38456). When I run bowtie2, it shows this message:

Error, fewer reads in file specified with -2 than in file specified with -1

Can anyone tell me what to do, please?

Thank you

sequence alignment • 2.1k views

ADD COMMENT • link updated 4.0 years ago by Biostar 20 • written 4.1 years ago by Bioinfo ▴ 20

First of all, find out why there are different read numbers. Did you manipulate the files somehow? Is this paired-end data? You can try to repair them with repair.sh from the BBMap suite.
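
A quick way to confirm the read counts is sketched below. It assumes uncompressed FASTQ files with exactly four lines per read (no wrapped sequence lines); count_fastq_reads is a hypothetical helper name, and R1.fastq and R2.fastq are the file names from the question.

```shell
# Count FASTQ records, assuming the standard 4-lines-per-read layout.
# count_fastq_reads is a hypothetical helper, not part of bowtie2 or BBMap.
count_fastq_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Report the count for each file that exists in the current directory.
for f in R1.fastq R2.fastq; do
    if [ -f "$f" ]; then
        printf '%s\t%s\n' "$f" "$(count_fastq_reads "$f")"
    fi
done
```

If the line count of a file is not a multiple of four, the file is probably truncated or contains non-FASTQ lines, which is worth checking before any repair.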

ADD REPLY • link 4.1 years ago by ATpoint 81k

Can you please tell me the command line I can use to get two files with the same number of reads, plus one file containing the reads that are not in file 1?

Thank you very much

ADD REPLY • link 4.1 years ago by Bioinfo ▴ 20

repair.sh in=R1.fastq in2=R2.fastq out=R1_repair.fastq out2=R2_repair.fastq

Still, this should not even happen. Why are there different read numbers?

First of all find out why there are different read numbers. Did you manipulate the files somehow?
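
After running the repair.sh command above, one way to sanity-check that the two output files are in sync is to compare read names position by position. This is only a sketch: check_pairing is a hypothetical helper name, and it assumes four-line FASTQ records whose header lines contain no tab characters.

```shell
# check_pairing is a hypothetical helper: it prints how many record
# positions have different read names in the two files.
# Names are compared after dropping a trailing /1 or /2 and any comment
# that follows the first space.
check_pairing() {
    paste "$1" "$2" | awk -F '\t' '
        NR % 4 == 1 {
            a = $1; b = $2
            sub("[ /].*$", "", a)
            sub("[ /].*$", "", b)
            if (a != b) bad++
        }
        END { print bad + 0 }'
}

# Example call, using the output names from the repair.sh command:
# check_pairing R1_repair.fastq R2_repair.fastq
```

A result of 0 means every read in the first file has its mate at the same position in the second file. Files of different lengths also show up as mismatches, because paste pads the shorter one with empty fields.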

ADD REPLY • link 4.1 years ago by ATpoint 81k

I tried it, but it shows this error:

Started output stream.
java.lang.AssertionError:
Error in wPru_2.fastq, line 1867, with these 4 lines:
@HD     VN:1.0  SO:unsorted
@SQ     SN:AJ300578.1   LN:425
@SQ     SN:AJ275212.1   LN:428
@SQ     SN:AJ275211.1   LN:428

        at stream.FASTQ.quadToRead_slow(FASTQ.java:697)
        at stream.FASTQ.toReadList(FASTQ.java:646)
        at stream.FastqReadInputStream.fillBuffer(FastqReadInputStream.java:107)
        at stream.FastqReadInputStream.hasMore(FastqReadInputStream.java:73)
        at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:667)
        at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:656)

ADD REPLY • link updated 4.1 years ago by GenoMax 141k • written 4.1 years ago by Bioinfo ▴ 20

What is the output of head R1.fastq and head R2.fastq? These do not look like FASTQ files, but rather like SAM files.

Again, and please answer, otherwise this is not productive:

First of all find out why there are different read numbers. Did you manipulate the files somehow?

In other words: How did you get these files and what did you do with them?

ADD REPLY • link 4.1 years ago by ATpoint 81k

I'm sorry.

Yes, they are FASTQ files, I checked them. I got them by merging HiSeq and MiSeq data from the same strain, but when I checked the number of reads in each file, I found that they differ.

ADD REPLY • link 4.1 years ago by Bioinfo ▴ 20

What is the output of head R1.fastq and head R2.fastq?

According to the error message above, the second file is not a FASTQ file. It has a SAM header.
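
To see where such header lines sit in the file, something like the following could be used. It is a rough sketch: find_sam_headers is a hypothetical helper name, and it relies on SAM header lines starting with a two-letter record type (@HD, @SQ, @RG, @PG, @CO) followed by a tab, which FASTQ read names normally do not do.

```shell
# find_sam_headers is a hypothetical helper: it prints "line_number: line"
# for every line that looks like a SAM header record.
find_sam_headers() {
    awk '/^@(HD|SQ|RG|PG|CO)\t/ { print NR ": " $0 }' "$1"
}

# Example call, using the file name from the error message:
# find_sam_headers wPru_2.fastq
```

No output means no SAM-style header lines were found. Any hit tells you the line number where non-FASTQ content begins.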

ADD REPLY • link 4.1 years ago by ATpoint 81k

Ahhhh, so do I need to delete these lines?

ADD REPLY • link 4.1 years ago by Bioinfo ▴ 20

The files you have are in a completely different format (SAM), which is used to store alignments. This is NOT primary sequence data in FASTQ format. Edit: It is technically possible to store FASTQ reads in an unaligned SAM format file.

You can use a different tool from BBMap suite if you want to get fastq format files from SAM files. You will do something like:

reformat.sh in=your_current_file_in_SAM out=file.fq (If you have single-end data)
reformat.sh in=your_current_file_in_SAM out1=file_R1.fq out2=file_R2.fq (if you have paired-end data)
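
To decide between the two commands, one rough check is whether the SAM records carry the paired flag (bit 0x1 of the FLAG column, as defined in the SAM specification). A minimal sketch, where count_paired_records is a hypothetical helper name and the input is assumed to be uncompressed, tab-separated SAM:

```shell
# count_paired_records is a hypothetical helper: it counts alignment
# records whose FLAG (column 2) has bit 0x1 set, i.e. an odd FLAG value.
# Header lines (starting with @) are skipped.
count_paired_records() {
    awk -F '\t' '
        !/^@/ && NF >= 11 && int($2) % 2 == 1 { n++ }
        END { print n + 0 }' "$1"
}

# Example call:
# count_paired_records your_current_file_in_SAM
```

A count of 0 suggests single-end data. A non-zero count suggests that at least some reads are flagged as paired.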

ADD REPLY • link 4.1 years ago by GenoMax 141k

Your second file is apparently not in FASTQ but in SAM format. This is a different format, as GenoMax says. Therefore, please post exactly the code you used to generate these files (and I really mean exactly). How did you obtain these files? Is there anyone in your lab who can help you? You (no offense) lack some essential NGS basics, so it is difficult to help remotely.

ADD REPLY • link 4.1 years ago by ATpoint 81k

Hello. I found that the number of reads is higher in R1Hiseq than in R2Hiseq. I used the command you told me and it works well! I also added the -outs option:

repair.sh in=R1.fastq in2=R2.fastq out=R1_repair.fastq out2=R2_repair.fastq

After that, I merged the outputs with the MiSeq data and ran bowtie2, and it works well.

Thank you very much for your help. I learned something new. And yes, I had been working for more than 8 hours and felt super tired.

ADD REPLY • link 4.1 years ago by Bioinfo ▴ 20