Question: fastq files do not look as they should (running HISAT2)
luzglongoria wrote:

Hi there, I'm working with bird samples that are infected with malaria parasites. I need to do a de novo assembly, but before that I have to filter my reads (distinguish between parasite and host RNA). Trinity needs FASTQ input, so I need the HISAT2 output to be a FASTQ file. To get that, I ran HISAT2 with this command:

hisat2 --dta -x /home/luz_garcia_longoria/workspace/reference_genomes/hisat2/parasitereference.fasta --al-conc-gz aligned_read_hisat2.fastq -1 /home/luz_garcia_longoria/workspace/s21_1.fq.gz,s22_1.fq.gz,s23_1.fq.gz,s24_1.fq.gz,s25_1.fq.gz,s31_1.fq.gz,s32_1.fq.gz,s33_1.fq.gz,s34_1.fq.gz,s35_1.fq.gz -2 /home/luz_garcia_longoria/workspace/s21_2.fq.gz,s22_2.fq,s23_2.fq.gz,s24_2.fq.gz,s25_2.fq.gz,s31_2.fq.gz,s32_2.fq.gz,s33_2.fq.gz,s34_2.fq.gz,s35_2.fq.gz

When the program finishes I get these two files:

ls
aligned_read_hisat2.1.fastq
aligned_read_hisat2.2.fastq

and if I try to see what there is inside:

head -8 aligned_read_hisat2.1.fastq 
[several lines of unreadable binary characters]

If I check what my raw data look like, they are fine:

zcat s34_1.fq.gz | head -8
@ST-E00494:182:HCT5FALXX:6:1101:5213:1309 1:N:0:NAAGGAGC
NGGCTCCTTAGTGGTACTTGCGGGCCAGGGCGTGGGCGACCACACGCACCAGCTTCTGCCAGGCAACCTGGCAGTCGGGAGTGAAGTCCTTGCCGAAGTGGGCGGCCAGGACAACGACCAGGATGTCACTGAGGAGCCTGAAGTTCTCGG
+
#AAAFAJFJFJJAJFFJFFJJJFJFJFFFJJJJJJJFF<JJJJJFJFAJJJJJJJJFJJJJJJJ7<JFJFJJFF-7FAJFJFJJFFJJJAJ<JJJF<7AJAAAA<JJF7FAFAFJJ<A7<JF-<FJJFJ---7)-A<<)A)AA--<7AAF
@ST-E00494:182:HCT5FALXX:6:1101:5233:1309 1:N:0:NAAGGAGC
NGGGTGGCCAGTGTCACCTGCAAGCACTGCGACAGGAACTTGAAGTTGACGGGGTCCACACGCCGGTTGTAGGCGTGCAGGTTGCTGAGCTCAGACAGAGCCTGGCTCAGGTTGTCCCGGTTCTTGATGGCATTGCCCAGGGCAGCCACC
+
#AAA<<<JFJJJJJAFJJJJJFFAF7FFFJF<<JJJAJAFF<<JFJJJ7JFA<JJJAFJF<AF-77<-AJ-7<JJAAJ7<AAAFJJA-7FAAAAFJFFFJJ<-A<J<JJJ-7-7AJJ-7AJJJAA777F--A-7-AAF)-<AA<-AF<AF

I wonder if it has something to do with the option '--al-conc-gz'. Should I use '--al-conc' instead? My input files are compressed, which is why I used '--al-conc-gz' and not '--al-conc'. So frustrating.

Any help is more than welcome :)

Tags: hisat2, rna-seq, fastq
modified 4 months ago • written 4 months ago by luzglongoria

Maybe they are still zipped. What do you see if you check this?

zcat aligned_read_hisat2.1.fastq | head -8
written 4 months ago by b.nota

Your files aligned_read_hisat2.1.fastq and aligned_read_hisat2.2.fastq look like either indexed or compressed files. Did you try:

zcat aligned_read_hisat2.1.fastq | head -10
written 4 months ago by Bastien Hervé

Damn, too slow! :)

written 4 months ago by Bastien Hervé

Haha, do you want me to remove my answer?

written 4 months ago by b.nota

Of course not! You answered while I was writing, so I didn't see your answer.

written 4 months ago by Bastien Hervé

What is the output of

file *.fastq

?

written 4 months ago by Pierre Lindenbaum
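The checks suggested above can be tried on a small stand-in file. This is only a sketch: demo_reads.1.fastq and the helper is_gzip are hypothetical names, and the demo file is built here to imitate gzip data that carries a .fastq extension. The helper relies on gzip's integrity test (gzip -t), which looks at the content and not at the file name.

```shell
# Build a tiny gzip-compressed FASTQ whose name ends in .fastq (demo data only).
workdir=$(mktemp -d)
demo="$workdir/demo_reads.1.fastq"
printf '@read1\nACGT\n+\nFFFF\n' | gzip -c > "$demo"

# is_gzip FILE: succeeds if FILE passes gzip's integrity test,
# i.e. it holds gzip data whatever its extension says.
is_gzip() {
    gzip -t < "$1" 2>/dev/null
}

if is_gzip "$demo"; then
    echo "compressed: view it with gzip -dc (what zcat does)"
    gzip -dc < "$demo" | head -n 4
else
    echo "plain text: view it with head"
    head -n 4 "$demo"
fi
```

Running file on the same path should report gzip compressed data where the file utility is installed.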

Hi there, I'm working with bird samples that are infected with parasites (malaria parasites) and I need to do a de novo assembly, but before that I have to filter my reads (distinguish between parasite and host RNA).

If you have a reference genome for at least one of the species, you can bin those reads away from the others using bbsplit.sh from the BBMap suite.

modified 4 months ago • written 4 months ago by genomax

If I do that (zcat aligned_read_hisat2.1.fastq | head -8) I get:

@ST-E00494:182:HCT5FALXX:5:1101:5233:1309 1:N:0:NCAGATTC
NTTCACCTTCTGCAGAAAGTGATTTATCAAGTAGACAAAGTGACAATTATGAATATTGCGAAAGTCCCTCAACAAGTGGAAGAAGTTCACCTACTGTCAGAAGTGAGTTATCAAGTAGGTGCAGTATCCCTGGTGTTGATATAGGAGGAC
+
#-AAFF<F<A-<JJJJ7J<AJ-FJJFAJFFFFJAJJJJJFJFFJJFJJ-JJ-7FFJJJJJAF-A<7<F<J<JJJF<7FJFAJFFJA-FJJJJFFFFFAA--77-AAF<FAJF-AAF-FJFFJF<FJFF<-AFFJJA-<-777--7A<)7)
@ST-E00494:182:HCT5FALXX:5:1101:5781:1309 1:N:0:NCAGATTC
NTTTTATTTGCTATATTGGTTGCAGTTTTAATTATTTGTTGATATCCTTCTTTTAATATAGGTTCATTAATAGAAATTTCTTTATTTGTTGATTCATCATAAATAAAACAAGACAAAAAAGCACATATATAATCATTATTATATTTACTT
+
#<AAAJJJJFAF7FAJJ<FAFA<-FFFJJ--FJJJJJJJJA-FJFF<FFAJ-FFF7FFJJAF-<F<FJAJJF-AJJJJJJJJJAFJJFJJA<FJ<7F-AA77---7F<<A7FFAF7<<-FFFF-FJ7<77--FF7A-7<<-7-7AF--7-

Now the problem is how to change the extension from .fastq to .gz, because if I do:

gunzip aligned_read_hisat2.1.fastq
gzip: aligned_read_hisat2.1.fastq: unknown suffix -- ignored

Can I do it just with the 'cp' command?

written 4 months ago by luzglongoria

You could do mv aligned_read_hisat2.1.fastq aligned_read_hisat2.1.fastq.gz.

written 4 months ago by genomax

What about

cp aligned_read_hisat2.1.fastq aligned_read_hisat2.1.fastq.gz

?

written 4 months ago by luzglongoria

You could do that. It will duplicate the data, and you will have two files with different names but identical content.

modified 4 months ago • written 4 months ago by genomax
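To make the rename-versus-copy discussion concrete, here is a minimal sketch on a stand-in file. The name aligned_demo.1.fastq is hypothetical and the data are created on the spot. It shows two routes: decompressing through stdin/stdout, where gzip never sees the file name, and renaming with mv so that gunzip accepts the suffix.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in for gzip output that was written under a .fastq name (demo data).
printf '@read1\nACGT\n+\nFFFF\n' | gzip -c > aligned_demo.1.fastq

# Option 1: decompress via stdin/stdout; the original file is left untouched.
gzip -dc < aligned_demo.1.fastq > aligned_demo.1.plain.fastq

# Option 2: rename so the suffix matches the content, then gunzip in place.
mv aligned_demo.1.fastq aligned_demo.1.fastq.gz
gunzip aligned_demo.1.fastq.gz    # replaces the .gz with aligned_demo.1.fastq

head -n 4 aligned_demo.1.fastq
```

Using cp in place of mv in option 2 would work the same way but leaves a second compressed copy on disk, which matters for large read files.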
luzglongoria wrote:

Yes! It works!! Thank you so much, guys! I wonder if I could also run the program again like this:

hisat2 --dta -x /home/luz_garcia_longoria/workspace/reference_genomes/hisat2/parasitereference.fasta --al-conc aligned_read_hisat2.fastq -1 /home/luz_garcia_longoria/workspace/s21_1.fq.gz,s22_1.fq.gz,s23_1.fq.gz,s24_1.fq.gz,s25_1.fq.gz,s31_1.fq.gz,s32_1.fq.gz,s33_1.fq.gz,s34_1.fq.gz,s35_1.fq.gz -2 /home/luz_garcia_longoria/workspace/s21_2.fq.gz,s22_2.fq,s23_2.fq.gz,s24_2.fq.gz,s25_2.fq.gz,s31_2.fq.gz,s32_2.fq.gz,s33_2.fq.gz,s34_2.fq.gz,s35_2.fq.gz

That is, asking for the output as a FASTQ file. This way the output would be an uncompressed FASTQ file directly, wouldn't it?

written 4 months ago by luzglongoria

Your reads are already in FASTQ format; they are just compressed (--al-conc-gz). Trinity should be fine with using that file.

Edit: Your confusion may come from expecting that, since you gave the output a .fastq extension, the file would contain uncompressed FASTQ reads. While some programs (e.g. bbmap) honor those extensions (or use them to adjust input/output), Unix does not require them. The command-line option --al-conc-gz wrote your files gzip-compressed regardless of the name. If you had used --al-conc-bz2, the files would have been compressed with bzip2.

modified 4 months ago • written 4 months ago by genomax
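Since the extension says nothing reliable about the content, one way to tell the cases apart is to look at the first bytes of the file. The sketch below uses a hypothetical helper, detect_compression, and assumes only the standard magic numbers: gzip data starts with bytes 1f 8b, and bzip2 data starts with the ASCII characters BZh.

```shell
# detect_compression FILE: print gzip, bzip2, or uncompressed,
# judged from the first three bytes of FILE and not from its name.
detect_compression() {
    magic=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
    case "$magic" in
        1f8b*)  echo gzip ;;
        425a68) echo bzip2 ;;
        *)      echo uncompressed ;;
    esac
}

# Demo: gzip data stored under a .fastq name (stand-in file).
workdir=$(mktemp -d)
printf '@read1\nACGT\n+\nFFFF\n' | gzip -c > "$workdir/reads.fastq"
detect_compression "$workdir/reads.fastq"    # prints: gzip
```

An uncompressed FASTQ file starts with '@', so it falls through to the last branch.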

OK. But now how can I change the format? I know I can compress a file using 'zip', but my file is already compressed, so I only need to change the extension. Is it fine if I just use the 'cp' (copy) command to change the name of the file to aligned_read_hisat2.fq.gz, and then uncompress the file using 'gunzip'?

modified 4 months ago • written 4 months ago by luzglongoria