Question: hisat2 warning message during alignment of FASTA files to a reference genome
9 months ago, shuksi1984 (50) wrote:

I ran the following command:

path/to/hisat2 -f -x /path/to/grch38/genome -1 path/to/SRR925687_1.fa -2 path/to/SRR925687_2.fa -S path/to/SRR925687.sam

I got the following HISAT2 alignment statistics:

Warning: Same mate file "//path to/SRR925687_1.fa" appears as argument to both -1 and -2
31525247 reads; of these:
  31525247 (100.00%) were paired; of these:
    31445543 (99.75%) aligned concordantly 0 times
    2010 (0.01%) aligned concordantly exactly 1 time
    77694 (0.25%) aligned concordantly >1 times
    ----
    31445543 pairs aligned concordantly 0 times; of these:
      20260019 (64.43%) aligned discordantly 1 time
    ----
    11185524 pairs aligned 0 times concordantly or discordantly; of these:
      22371048 mates make up the pairs; of these:
        7707288 (34.45%) aligned 0 times
        5218508 (23.33%) aligned exactly 1 time
        9445252 (42.22%) aligned >1 times
  

The output is a 25 GB SRR925687.sam file. Everything seems to be fine except for the warning: Warning: Same mate file "//path to/SRR925687_1.fa" appears as argument to both -1 and -2

Could someone explain why hisat2 is emitting this warning?

modified 9 months ago by singh.vijender (30) • written 9 months ago by shuksi1984 (50)

Are you sure your command wasn't the following one?

path/to/hisat2 -f -x /path/to/grch38/genome -1 path/to/SRR925687_1.fa -2 path/to/SRR925687_1.fa -S path/to/SRR925687.sam

That would make more sense given the warning message.

written 9 months ago by Carlo Yague (4.5k)

Yes, I am sure. My command is:

path/to/hisat2 -f -x /path/to/grch38/genome -1 path/to/SRR925687_1.fa -2 path/to/SRR925687_2.fa -S path/to/SRR925687.sam

I passed two separate FASTA files to -1 and -2.

written 9 months ago by shuksi1984 (50)

That's weird... it might be a bug then, although I don't know for sure. This is the code responsible for that warning in hisat2:

    // Check for duplicate mate input files
    if(format != CMDLINE) {
        for(size_t i = 0; i < mates1.size(); i++) {
            for(size_t j = 0; j < mates2.size(); j++) {
                if(mates1[i] == mates2[j] && !gQuiet) {
                    cerr << "Warning: Same mate file \"" << mates1[i].c_str() << "\" appears as argument to both -1 and -2" << endl;
                }
            }
        }
    }

So it should only trigger when mates1[i] == mates2[j], i.e., when a path given to -1 is exactly the same string as a path given to -2. Only the file names are compared; the file contents are not inspected.
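
To make that logic concrete, here is a small Python restatement of the check. It is only a sketch for illustration, not hisat2 code, and the function name `duplicate_mates` is made up. It shows that the test is a plain string comparison on the paths as typed on the command line.

```python
def duplicate_mates(mates1, mates2):
    """Return every path that appears in both the -1 and the -2 list.

    Mirrors the nested loop in the C++ excerpt: plain string equality on
    the paths, nothing is read from disk.
    """
    return [m1 for m1 in mates1 for m2 in mates2 if m1 == m2]


# Different names on each side: the list is empty, so no warning is expected.
print(duplicate_mates(["path/to/SRR925687_1.fa"], ["path/to/SRR925687_2.fa"]))

# Same name on both sides: this is the case the warning is meant to catch.
print(duplicate_mates(["path/to/SRR925687_1.fa"], ["path/to/SRR925687_1.fa"]))
```

If this reading of the excerpt is right, two differently named files should never trigger the warning, whatever they contain.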

written 9 months ago by Carlo Yague (4.5k)

Thank you for your response. I will check.

written 9 months ago by shuksi1984 (50)

Is there a specific reason why you're using fasta files instead of fastq files?

written 9 months ago by Sej Modha (4.2k)

Can you post the exact command you used? (Do not doctor it with /path/to/... etc.)

Also show us the first few lines of each of your fastas.

written 9 months ago by jrj.healey (13k)

The command is:

/tools/hisat/hisat2 -f -x /references/grch38/genome -1 /inputfile/SRR925687_1.fa -2 /inputfile/SRR925687_2.fa -S /rnaseq/SRR925687.sam

First few lines of SRR925687_1.fa:

>SRR925687.1 HWUSI-EAS053R_0010:2:1:1100:13816/1
GTGAGATCTTGTCTTAGNAACAAACAAANNACGANTAAAAAAAAAANANNNAAGGCCGGGCCTGGNNNNNNNNNNN
>SRR925687.2 HWUSI-EAS053R_0010:2:1:1101:5022/1
GCAGAAGTGACACAGCCATCCTTGGGTGTAGGCTNTGAGCTGGGCCNGNNNGTGGCCTTTAACAANNNNNNNNNNN
>SRR925687.3 HWUSI-EAS053R_0010:2:1:1101:9481/1
GATCGGAAGAGCGGTTCNGCAGGAATGCCGCGACNGACCTCGTCTCNGNNNTTCTGCTTGAACAANNNNNNNNNNN
>SRR925687.4 HWUSI-EAS053R_0010:2:1:1101:11425/1
GGCCAACAGCTCACCTCNAAAACTTCCCCACTGANAATAATGGCATNGNNNGGAAACTCGGGTCCNNNNNNNNNNN
>SRR925687.5 HWUSI-EAS053R_0010:2:1:1102:6924/1
CTCATCATCTTCAGCTGCCCGCTTGCCCGTAGCTNACTCAGCTTCCNCNNNTTCATCTCCATCCCNNNNNNNNNNN

First few lines of SRR925687_2.fa:

>SRR925687.1 HWUSI-EAS053R_0010:2:1:1100:13816/2
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
>SRR925687.2 HWUSI-EAS053R_0010:2:1:1101:5022/2
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
>SRR925687.3 HWUSI-EAS053R_0010:2:1:1101:9481/2
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
>SRR925687.4 HWUSI-EAS053R_0010:2:1:1101:11425/2
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
>SRR925687.5 HWUSI-EAS053R_0010:2:1:1102:6924/2
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

modified 9 months ago by jrj.healey (13k) • written 9 months ago by shuksi1984 (50)

I've edited your markup to remove the unnecessary quotes, but the way you've copied the data appears to leave each header and its sequence on a single line. Is this correct, or did you make an error when copying the data here?

written 9 months ago by jrj.healey (13k)

The header and the sequence are on two separate lines.

written 9 months ago by shuksi1984 (50)

Perhaps the files are named differently but they contain the same reads inside and there was a mixup before alignment?

written 9 months ago by Macspider (2.8k)

That's my thinking. If hisat inspects the files at all, it could be that R1 got duplicated and renamed to R2, so the R1 file and R2 file are the same but named differently.
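
One way to test this hypothesis is to compare the sequences of the two files record by record while ignoring the headers, since those differ by the /1 and /2 suffixes anyway. The Python sketch below is an illustration only: `read_sequences` and `identical_fraction` are made-up helper names, and it assumes plain, uncompressed FASTA with the records in the same order in both files.

```python
from itertools import islice


def read_sequences(path):
    """Yield the sequence of each FASTA record, ignoring header text."""
    parts = None  # None until the first header line has been seen
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if parts is not None:
                    yield "".join(parts)
                parts = []
            elif line and parts is not None:
                parts.append(line)
    if parts is not None:
        yield "".join(parts)


def identical_fraction(path_r1, path_r2, n_records=100_000):
    """Fraction of the first n_records pairs whose R1 and R2 sequences match."""
    pairs = list(islice(zip(read_sequences(path_r1), read_sequences(path_r2)),
                        n_records))
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)
```

A call such as `identical_fraction("/inputfile/SRR925687_1.fa", "/inputfile/SRR925687_2.fa")` returning a value close to 1.0 would suggest that R2 is a renamed copy of R1; a value close to 0.0 would rule that out for the sampled records.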

written 9 months ago by jrj.healey (13k)

9 months ago, singh.vijender (30) wrote:

The quickest test I would do is to check the insert size distribution of the first million reads. If the insert size is zero, then the R1 and R2 files contain the same data. Otherwise, if the insert size distribution matches what is expected from the library prep, I would consider it a bug in hisat2 and proceed with the analysis.
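
A rough way to run this check is to tally the TLEN field (column 9 of a SAM alignment line) over the first records of the output file. The Python sketch below is one possible approach under stated assumptions: `tlen_distribution` is a made-up helper name, the input is an uncompressed SAM file, and the absolute value of TLEN is used as the insert size. Note that TLEN is also 0 when the template length is unavailable (for example, when a mate is unaligned), so a spike at zero needs to be interpreted with care.

```python
from collections import Counter
from itertools import islice


def tlen_distribution(sam_lines, max_records=1_000_000):
    """Count absolute template lengths (SAM column 9) over the first records."""
    counts = Counter()
    # Header lines start with "@" and carry no alignment fields.
    records = (line for line in sam_lines if not line.startswith("@"))
    for line in islice(records, max_records):
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment line
        counts[abs(int(fields[8]))] += 1
    return counts


# Example usage (the path is the one from the command quoted in this thread):
# with open("/rnaseq/SRR925687.sam") as sam:
#     for size, n in tlen_distribution(sam).most_common(10):
#         print(size, n)
```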

written 9 months ago by singh.vijender (30)