Having trouble aligning paired-end reads with STAR
17 months ago
kcarey • 0

Code used:

STAR --runThreadN 20 --genomeDir ~/FinalMayodownloads/Matched/Fastq/Indexes/ncbi-genomes-2022-09-19/ --readFilesIn ~/directory/tofile/OV105.FCD1U68ACXX_L4_IGTTTCG.fastq_R1.fastq ~/directory/tofile/OV105.FCD1U68ACXX_L4_IGTTTCG.fastq_R2.fastq --outFileNamePrefix HG38Aligned/OV105 --outSAMtype BAM Unsorted 

I am getting this error message:

ReadAlignChunk_processChunks.cpp:204:processChunks EXITING because of FATAL ERROR in input reads: wrong read ID line format: the read ID lines should start with @ or > 
Offending line for read # 2
-- 
SOLUTION: verify and correct the input read files

Nov 22 11:25:11 ...... FATAL ERROR, exiting

Head of OV105.R1

@R0212989_0257:4:1101:1334:1834#GTTTCG/1
NAAATTTATCCTTTCCTTTAATTTTTATCACGAGGCTATGTTTTATGTTC
+
#1=DDFFFHHHHHJJJJJJJJJJJJJIJJIJJGHIJJIJIIJJJJJJIII
--
@R0212989_0257:4:1101:1389:1874#GTTTCG/1
NTCTGGATTTACTAAAGCCTTTATCATCAATACATATCTCTGTTTCTGTG
+
#1=DDFFFHHHHHJJJJJJJJJIJJJJJJGIIJJJIIGIIJJFHIIIIFH

--

Head of OV105.R2

@R0212989_0257:4:1101:1334:1834#GTTTCG/2
TTGCATTTTACAAGCTCAAGAGTAAAGCACAATAATTATCAGTGCTTTAT
+
CCCFFFFFHHHHHIJJJJJJIICHJJJJIJJJJIJJJIIIIJIIJJJJJJ
--
@R0212989_0257:4:1101:1389:1874#GTTTCG/2
GTGCAGTACTGCCAGAGCTGGTTTCAACATCTCCCCAGAAACAGAGATGG
+
CBBFFFEFHHHHHJJIIJJJJHJJIJJIJJIIGIIJGHGIJJJHHIGGIJ

--

RNA STAR 38 HG Sequencing • 1.1k views
17 months ago
GenoMax 142k

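Do you actually have a -- separator line between each FASTQ record, as we see above? How did those get there?

You will need to remove those separator lines from both files. For plain-text files, grep -v '^--$' file.fq > clean.fq will work. For gzipped files, use zgrep -v '^--$' file.fq.gz | gzip > clean.fq.gz (zgrep writes uncompressed text, so it has to be piped through gzip to get a real .gz file). Either way, check the result.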
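A minimal sketch of that clean-up plus a sanity check, run on a tiny made-up FASTQ rather than the real data. The file names demo.fastq and clean.fastq are placeholders, and the sketch assumes strict 4-line records and that no genuine sequence or quality line consists solely of --.

```shell
# Build a tiny FASTQ containing stray "--" lines (made-up reads, placeholder names).
workdir=$(mktemp -d)
printf '%s\n' \
  '@read1/1' 'NAAATTTATC' '+' '#1=DDFFFHH' \
  '--' \
  '@read2/1' 'NTCTGGATTT' '+' '#1=DDFFFHH' \
  '--' > "$workdir/demo.fastq"

# Drop every line that consists solely of "--".
grep -v '^--$' "$workdir/demo.fastq" > "$workdir/clean.fastq"

# Check the result: count the lines, and count records whose 1st line does not
# start with "@" or whose 3rd line does not start with "+".
total=$(awk 'END { print NR }' "$workdir/clean.fastq")
bad=$(awk 'NR % 4 == 1 && !/^@/ { n++ }
           NR % 4 == 3 && !/^\+/ { n++ }
           END { print n + 0 }' "$workdir/clean.fastq")
echo "lines=$total misplaced_lines=$bad"
```

A line count that is not a multiple of 4, or a non-zero misplaced_lines value, would suggest the file still contains something other than clean 4-line records.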


Yes, I am not sure how they got there. I originally had one file containing both R1 and R2, and I ran this to separate them so I could align them to HG38:

# first separate paired-end reads into 1 and 2
for f in *.fastq
do
    cat ${f} | grep '^@.*/1$' -A 3 > Separated_fastq/${f}_R1.fastq
    cat ${f} | grep '^@.*/2$' -A 3 > Separated_fastq/${f}_R2.fastq
done


They got there because of grep: with -A (context lines), grep prints -- as a separator between non-adjacent groups of matches. You should use --no-group-separator with grep. It may not be available in all versions of grep; in that case, piping the output through an additional grep -v '^--$' will remove those lines. You could simply re-run your command with the additional option and be good to go.
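To see the behaviour in isolation, here is a small sketch on made-up interleaved reads (the file names and read IDs are placeholders). It tries the GNU grep option first and falls back to the extra grep -v filter if this grep does not accept it.

```shell
# Two made-up read pairs in one interleaved file (placeholder names).
workdir=$(mktemp -d)
printf '%s\n' \
  '@read1/1' 'ACGT' '+' 'IIII' \
  '@read1/2' 'TTGC' '+' 'IIII' \
  '@read2/1' 'GGCA' '+' 'IIII' \
  '@read2/2' 'CATG' '+' 'IIII' > "$workdir/interleaved.fastq"

# Plain -A 3: a "--" line appears between the two non-adjacent /1 records.
grep -A 3 '^@.*/1$' "$workdir/interleaved.fastq" > "$workdir/with_sep.fastq"

# Same extraction without the separator. --no-group-separator is a GNU grep
# option; if it is rejected, filter the "--" lines out afterwards instead.
if ! grep --no-group-separator -A 3 '^@.*/1$' "$workdir/interleaved.fastq" \
     > "$workdir/R1.fastq" 2>/dev/null; then
  grep -A 3 '^@.*/1$' "$workdir/interleaved.fastq" \
    | grep -v '^--$' > "$workdir/R1.fastq"
fi

echo "separator lines with plain -A 3: $(grep -c '^--$' "$workdir/with_sep.fastq")"
echo "separator lines after the fix:   $(grep -c '^--$' "$workdir/R1.fastq")"
```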


Wow, that makes so much sense... thanks!

So would you recommend doing this?

for f in *.fastq
do
    cat ${f} | grep -v --no- separator '^@.*/1$' -A 3 > Separated_fastq/${f}_R1.fastq
    cat ${f} | grep -v --no- separator '^@.*/2$' -A 3 > Separated_fastq/${f}_R2.fastq
done


No -v is needed in the command above; -v inverts the match, which is not what you want here. The option is --no-group-separator, written as one word with no spaces.

That said, you could have used reformat.sh from the BBMap suite to do the separation (as long as the file was interleaved) by doing

reformat.sh -Xmx2g in=orig.fastq.gz out1=Read1.fastq.gz out2=Read2.fastq.gz
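If BBMap is not at hand, the split can also be done record by record instead of line by line. That avoids the -- separators entirely, and also the case where a quality line happens to start with @ and end in /1, which the grep pattern would treat as a read ID. A rough awk sketch, assuming strict 4-line records and read IDs ending in /1 or /2; the file names and reads are made up:

```shell
# Made-up reads; the quality line of read1/2 deliberately looks like a read ID.
workdir=$(mktemp -d)
printf '%s\n' \
  '@read1/1' 'ACGTA' '+' 'IIIII' \
  '@read1/2' 'TTGCA' '+' '@II/1' \
  '@read2/1' 'GGCAT' '+' 'IIIII' \
  '@read2/2' 'CATGC' '+' 'IIIII' > "$workdir/mixed.fastq"

# Only the 1st line of each 4-line record decides where the record goes,
# so sequence and quality lines are never inspected as potential headers.
awk -v r1="$workdir/R1.fastq" -v r2="$workdir/R2.fastq" '
  NR % 4 == 1 {
    if      ($0 ~ /\/1$/) out = r1
    else if ($0 ~ /\/2$/) out = r2
    else { print "unexpected read ID: " $0 > "/dev/stderr"; exit 1 }
  }
  { print > out }
' "$workdir/mixed.fastq"

echo "R1 lines: $(awk 'END { print NR }' "$workdir/R1.fastq")"
echo "R2 lines: $(awk 'END { print NR }' "$workdir/R2.fastq")"
```

Like the grep approach, this keeps reads in their original order, so mates stay in sync only if every /1 read in the input has its /2 partner.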

Oh great, I am not familiar with the BBMap suite... I will give it a look. I was doing all of this on my server. Do you recommend this approach over the grep command?

I appreciate your advice. Going to give it a go.
