Question

Having trouble aligning paired-end reads with STAR

17 months ago

kcarey • 0

Code used:

STAR --runThreadN 20 --genomeDir ~/FinalMayodownloads/Matched/Fastq/Indexes/ncbi-genomes-2022-09-19/ --readFilesIn ~/directory/tofile/OV105.FCD1U68ACXX_L4_IGTTTCG.fastq_R1.fastq ~/directory/tofile/OV105.FCD1U68ACXX_L4_IGTTTCG.fastq_R2.fastq --outFileNamePrefix HG38Aligned/OV105 --outSAMtype BAM Unsorted

I am getting this error message:

ReadAlignChunk_processChunks.cpp:204:processChunks EXITING because of FATAL ERROR in input reads: wrong read ID line format: the read ID lines should start with @ or > 
Offending line for read # 2
-- 
SOLUTION: verify and correct the input read files

Nov 22 11:25:11 ...... FATAL ERROR, exiting

Head of OV105.R1

@R0212989_0257:4:1101:1334:1834#GTTTCG/1
NAAATTTATCCTTTCCTTTAATTTTTATCACGAGGCTATGTTTTATGTTC
+
#1=DDFFFHHHHHJJJJJJJJJJJJJIJJIJJGHIJJIJIIJJJJJJIII
--
@R0212989_0257:4:1101:1389:1874#GTTTCG/1
NTCTGGATTTACTAAAGCCTTTATCATCAATACATATCTCTGTTTCTGTG
+
#1=DDFFFHHHHHJJJJJJJJJIJJJJJJGIIJJJIIGIIJJFHIIIIFH

--

Head of OV105.R2

@R0212989_0257:4:1101:1334:1834#GTTTCG/2
TTGCATTTTACAAGCTCAAGAGTAAAGCACAATAATTATCAGTGCTTTAT
+
CCCFFFFFHHHHHIJJJJJJIICHJJJJIJJJJIJJJIIIIJIIJJJJJJ
--
@R0212989_0257:4:1101:1389:1874#GTTTCG/2
GTGCAGTACTGCCAGAGCTGGTTTCAACATCTCCCCAGAAACAGAGATGG
+
CBBFFFEFHHHHHJJIIJJJJHJJIJJIJJIIGIIJGHGIJJJHHIGGIJ

--

RNA STAR 38 HG Sequencing • 1.1k views

ADD COMMENT • link 17 months ago by kcarey • 0

GenoMax · Answer 1 · 2022-11-22

17 months ago

GenoMax 142k

Do you actually have a -- separator line between FASTQ records, as we see above? How did those get there?

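You will need to remove those separator lines from both files. For gzipped input, zgrep -v '^--$' file.fq.gz | gzip > clean.fq.gz will work (zgrep writes uncompressed text, so it has to be piped through gzip if the output is meant to be a .gz file). For plain FASTQ, grep -v '^--$' file.fastq > clean.fastq does the same. Check the result afterwards.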
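One way to do that check is to test the 4-line FASTQ layout by position rather than by content. The sketch below is only an illustration: the function name check_fastq and the toy reads are made up here, and it assumes uncompressed, non-wrapped FASTQ (four lines per record).

```shell
# Hypothetical helper: report the first line that breaks the 4-line FASTQ layout.
# Assumes uncompressed FASTQ with exactly four lines per record.
check_fastq() {
    awk '
        NR % 4 == 1 && !/^@/  { printf "line %d: expected @ header, got: %s\n", NR, $0; bad = 1; exit }
        NR % 4 == 3 && !/^\+/ { printf "line %d: expected + line, got: %s\n", NR, $0; bad = 1; exit }
        END {
            if (!bad && NR % 4 != 0) { printf "line count %d is not a multiple of 4\n", NR; bad = 1 }
            exit bad + 0
        }
    ' "$1"
}

# Toy input with a stray "--" line between two records (made-up reads).
tmp=$(mktemp)
printf '@r1/1\nACGT\n+\nIIII\n--\n@r2/1\nACGT\n+\nIIII\n' > "$tmp"
check_fastq "$tmp" || echo "FASTQ layout problem found"
rm -f "$tmp"
```

On the toy input the stray line sits where the second record's header should be, which is the same position STAR complains about ("Offending line for read # 2").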

ADD COMMENT • link 17 months ago by GenoMax 142k

Yes, and I am not sure how they got there. I originally had one file containing both R1 and R2, and I ran this command

# First separate paired-end reads into mates 1 and 2.
for f in *.fastq
do
    cat ${f} | grep '^@.*/1$' -A 3 > Separated_fastq/${f}_R1.fastq
    cat ${f} | grep '^@.*/2$' -A 3 > Separated_fastq/${f}_R2.fastq
done

to separate them so I could align them to HG-38

ADD REPLY • link updated 17 months ago by GenoMax 142k • written 17 months ago by kcarey • 0

It got there because of grep: with context options such as -A, grep prints a -- line between non-adjacent groups of matches. With GNU grep you can suppress it with --no-group-separator. That option may not be available in every version of grep, in which case an additional grep -v '^--$' will remove those lines. You could simply re-run your command with the additional option and be good to go.
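The behaviour is easy to reproduce on a toy file. This is a minimal sketch, assuming GNU grep (the --no-group-separator option is not part of every grep implementation); the reads are made up for illustration.

```shell
# Toy interleaved file: /1 and /2 mates alternate (made-up reads).
demo=$(mktemp)
printf '@a/1\nAAAA\n+\nIIII\n@a/2\nCCCC\n+\nIIII\n@b/1\nGGGG\n+\nIIII\n@b/2\nTTTT\n+\nIIII\n' > "$demo"

# Default: the two /1 groups are not adjacent in the file,
# so grep prints "--" between them.
with_sep=$(grep -A 3 '^@.*/1$' "$demo")

# GNU grep only: suppress the group separator.
without_sep=$(grep --no-group-separator -A 3 '^@.*/1$' "$demo")

# Count the separator lines in each result.
printf '%s\n' "$with_sep"    | awk '$0 == "--" { n++ } END { print "separators with plain -A 3:", n + 0 }'
printf '%s\n' "$without_sep" | awk '$0 == "--" { n++ } END { print "separators with --no-group-separator:", n + 0 }'

rm -f "$demo"
```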

ADD REPLY • link 17 months ago by GenoMax 142k

Wow, that makes so much sense....thanks!

So would you recommend doing this:

for f in *.fastq
do
    cat ${f} | grep -v --no- separator '^@.*/1$' -A 3 > Separated_fastq/${f}_R1.fastq
    cat ${f} | grep -v --no- separator '^@.*/2$' -A 3 > Separated_fastq/${f}_R2.fastq
done

ADD REPLY • link 17 months ago by kcarey • 0

No -v is needed in the command above. You just need the separator option, whose full name in GNU grep is --no-group-separator (one word, with no space after the hyphen).
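Put together, the corrected loop might look like the sketch below. It assumes GNU grep, interleaved *.fastq files in the current directory, and read IDs ending in /1 and /2 as in the question. The function name split_interleaved is made up for illustration, and the output naming (file.fastq_R1.fastq) simply follows the original command.

```shell
# Hypothetical helper: split one interleaved FASTQ into R1 and R2 files.
# Assumes GNU grep (for --no-group-separator) and /1, /2 read ID suffixes.
split_interleaved() {
    in=$1
    outdir=$2
    name=$(basename "$in")
    mkdir -p "$outdir"
    # -A 3 keeps the three lines after each header; no "--" lines are written.
    grep --no-group-separator -A 3 '^@.*/1$' "$in" > "$outdir/${name}_R1.fastq"
    grep --no-group-separator -A 3 '^@.*/2$' "$in" > "$outdir/${name}_R2.fastq"
}

for f in *.fastq
do
    [ -e "$f" ] || continue   # skip if the glob matched no files
    split_interleaved "$f" Separated_fastq
done
```

One caveat of the pattern-based approach: a quality line that happens to start with @ and end with /1 or /2 would also match, so it is worth validating the output files before aligning.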

That said, you could have used reformat.sh from the BBMap suite to do the separation (as long as the file is interleaved) by running:

reformat.sh -Xmx2g in=orig.fastq.gz out1=Read1.fastq.gz out2=Read2.fastq.gz
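Whichever tool does the split, STAR expects the mates to appear in the same order in both files, so a quick pairing check before alignment can save a failed run. The following is a rough sketch: the function name pairs_in_sync is made up, and it assumes uncompressed 4-line FASTQ, read IDs ending in /1 and /2, and no tab characters in the header lines.

```shell
# Hypothetical helper: verify that R1 and R2 list the same read IDs in the same order.
# Assumes uncompressed 4-line FASTQ and IDs ending in /1 (R1) and /2 (R2).
pairs_in_sync() {
    # paste joins line N of R1 with line N of R2 using a tab;
    # header lines are lines 1, 5, 9, ... of each file.
    paste "$1" "$2" | awk -F '\t' '
        NR % 4 == 1 {
            a = $1; b = $2
            sub(/\/1$/, "", a)   # drop the mate suffix before comparing
            sub(/\/2$/, "", b)
            if (a != b) {
                printf "mismatch at read %d: %s vs %s\n", (NR + 3) / 4, $1, $2
                bad = 1
                exit
            }
        }
        END { exit bad + 0 }
    '
}
```

It would be called as pairs_in_sync R1.fastq R2.fastq and returns a non-zero status at the first header pair that does not match, including the case where one file has fewer reads than the other.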

ADD REPLY • link 17 months ago by GenoMax 142k

Oh great, I am not familiar with the BBMap suite, so I will give it a look. I was doing all of this on my server. Do you recommend this approach over the grep command?

I appreciate your advice. Going to give it a go

ADD REPLY • link 17 months ago by kcarey • 0