I aligned (mapped) an exome with bwa mem using the following command:
bwa mem -t 12 -B 4 -O 6 -E 1 -M -R '@RG\tID:SRR1517898\tSM:HG00096\tPL:ILLUMINA' \
  /home/ims.santos06/reference/hg38.fa /home/ims.santos06/fastq/SRR1517898_1.fastq.gz /home/ims.santos06/fastq/SRR1517898_2.fastq.gz \
  | samtools view -1 - > /home/ims.santos06/bam/SRR1517898.bam
I received this message from bwa:
(base) ims.santos06@nodesgi4:~$ bwa mem -t 12 -B 4 -O 6 -E 1 -M -R '@RG\tID:SRR1517898\tSM:HG00096\tPL:ILLUMINA' /home/ims.santos06/reference/hg38.fa /home/ims.santos06/fastq/SRR1517898_1.fastq.gz /home/ims.santos06/fastq/SRR1517898_2.fastq.gz | samtools view -1 - > /home/ims.santos06/bam/SRR1517898.bam
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[W::bseq_read] the 1st file has fewer sequences.
[W::bseq_read] the 1st file has fewer sequences.
[main] Version: 0.7.17-r1188
[main] CMD: bwa mem -t 12 -B 4 -O 6 -E 1 -M -R @RG\tID:SRR1517898\tSM:HG00096\tPL:ILLUMINA /home/ims.santos06/reference/hg38.fa /home/ims.santos06/fastq/SRR1517898_1.fastq.gz /home/ims.santos06/fastq/SRR1517898_2.fastq.gz
[main] Real time: 10.202 sec; CPU: 10.160 sec
I then followed the usual steps to view the alignment in IGV. First I sorted the BAM:
samtools sort /home/ims.santos06/bam/SRR1517898.bam > /home/ims.santos06/bam/SRR1517898.sorted.bam
Then I indexed it. I tried two different commands and still couldn't view the file in IGV:
samtools index /home/ims.santos06/bam/SRR1517898.sorted.bam /home/ims.santos06/bam/SRR1517898.bam.bai
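One likely issue with the index command above: IGV looks for the index next to the BAM, named after that BAM (here SRR1517898.sorted.bam.bai), so an index written as SRR1517898.bam.bai is not picked up for SRR1517898.sorted.bam. A minimal sketch of sorting, validating and indexing with matching names, assuming samtools 1.x is on PATH; `sorted_name`, `index_name` and `sort_and_index` are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Hypothetical helper: foo.bam -> foo.sorted.bam
sorted_name() {
    printf '%s.sorted.bam\n' "${1%.bam}"
}

# Hypothetical helper: name the index after the BAM it belongs to,
# which is where IGV looks for it (<file>.bam.bai).
index_name() {
    printf '%s.bai\n' "$1"
}

# Sort, sanity-check and index a BAM so IGV can load it.
# Prints the path of the sorted BAM on success.
sort_and_index() {
    local in_bam="$1" sorted
    sorted=$(sorted_name "$in_bam")
    samtools sort -o "$sorted" "$in_bam" || return 1
    # quickcheck exits non-zero for truncated or malformed BAM files
    samtools quickcheck "$sorted" || { echo "invalid BAM: $sorted" >&2; return 1; }
    samtools index "$sorted" "$(index_name "$sorted")" || return 1
    echo "$sorted"
}
```

Usage would be something like `sort_and_index /home/ims.santos06/bam/SRR1517898.bam`, then loading SRR1517898.sorted.bam in IGV with SRR1517898.sorted.bam.bai in the same folder.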
An error message about file compression appears, even though the FASTQ files were already compressed when I downloaded them from the 1000 Genomes browser.
I don't know how to solve this problem, or whether something is wrong with my alignment or with the formats of the generated files. Can someone help me?
There is a problem with your FASTQ files: R1 and R2 do not contain the same number of reads.
What is the output of:
I noticed that too, but when I checked the FASTQ files in FastQC, R1 and R2 had the same number of sequences.
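To check this independently of FastQC, one option is to test gzip integrity and count the records directly. A minimal sketch, assuming standard 4-line FASTQ records (no wrapped sequence lines); `count_fastq_reads` and `check_pair` are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Count FASTQ records in a gzip-compressed file.
# Assumes the standard 4-line FASTQ layout.
count_fastq_reads() {
    gzip -cd -- "$1" | awk 'END { printf "%d\n", NR / 4 }'
}

# Check that both files are intact gzip streams and have equal read counts.
# Returns 0 if so, 1 otherwise.
check_pair() {
    local r1="$1" r2="$2" n1 n2 f
    for f in "$r1" "$r2"; do
        # gzip -t detects truncated or corrupted downloads
        if ! gzip -t -- "$f" 2>/dev/null; then
            echo "corrupt or truncated gzip: $f" >&2
            return 1
        fi
    done
    n1=$(count_fastq_reads "$r1")
    n2=$(count_fastq_reads "$r2")
    echo "R1: $n1 reads, R2: $n2 reads"
    [ "$n1" -eq "$n2" ]
}
```

For example: `check_pair /home/ims.santos06/fastq/SRR1517898_1.fastq.gz /home/ims.santos06/fastq/SRR1517898_2.fastq.gz`. The integrity test runs first because counting a truncated file would silently report a partial number.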
Something went wrong during alignment, probably a shortage of memory that corrupted the file parsing. Your input files are probably OK. How much RAM is available? Also, did you manipulate the files after checking them with fastqc, for example adapter trimming in non-paired mode?
I used FastQC only to evaluate base quality. I believe the problem happened during the alignment. I could not find the library of my FASTQ files because I took them from the 1000 Genomes browser, and only later realized that this could affect the alignment. How do I find out the library of my FASTQ files?
My machine's memory is 19.6. Could the problem be my read group? I did not add this field: LB = DNA preparation library identifier.
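For reference, LB is just another tab-separated field in the same -R string as ID, SM and PL. A minimal sketch; `make_read_group` is a hypothetical helper, and any library name passed to it here is a placeholder, not the real library of these reads:

```shell
#!/usr/bin/env bash
# Build a bwa -R read-group string. The \t sequences are emitted literally;
# bwa mem expands them into real tabs itself.
make_read_group() {
    local id="$1" sm="$2" lb="$3" pl="${4:-ILLUMINA}"
    printf '@RG\\tID:%s\\tSM:%s\\tLB:%s\\tPL:%s\n' "$id" "$sm" "$lb" "$pl"
}
```

It would be used as `bwa mem ... -R "$(make_read_group SRR1517898 HG00096 LIB_PLACEHOLDER)" ...`, where LIB_PLACEHOLDER stands in for whatever library identifier applies.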
Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. SUBMIT ANSWER is for new answers to the original question.
19.6 GB? An odd number. Anyway, try aligning with around 4 threads; the file is not big, so it should not take long. Then repeat the BAM generation and indexing. No, IGV does not need read groups, and neither does bwa.
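A sketch of that rerun, assuming bwa and samtools are on PATH and keeping the original scoring options. It uses 4 threads, propagates a bwa failure (for example being killed for lack of memory) through the pipe with `set -o pipefail`, and also fails on the `fewer sequences` warning, since in the log above bwa still finished normally after printing it. `realign` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Re-run bwa mem | samtools view with 4 threads and fail loudly instead of
# leaving a silently truncated BAM. bwa's stderr goes to <out>.bwa.log.
realign() {
    local ref="$1" r1="$2" r2="$3" rg="$4" out_bam="$5"
    local log="${out_bam%.bam}.bwa.log"
    # pipefail makes the subshell fail if bwa fails, not only samtools
    if ! (
        set -o pipefail
        bwa mem -t 4 -B 4 -O 6 -E 1 -M -R "$rg" "$ref" "$r1" "$r2" 2> "$log" |
            samtools view -1 -o "$out_bam" -
    ); then
        echo "alignment pipeline failed, see $log" >&2
        rm -f -- "$out_bam"
        return 1
    fi
    # The R1/R2 mismatch only produced a warning in the log above,
    # so treat that warning as an error as well.
    if grep -q 'fewer sequences' "$log"; then
        echo "R1/R2 read counts differ, see $log" >&2
        rm -f -- "$out_bam"
        return 1
    fi
}
```

After a successful run, the BAM can be sorted and indexed again as before.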
By memory, manubiomed20 likely means hard disk space; ATpoint meant RAM, which is typically something like 16 or 32 GB.