I am trying to take the PE reads (100 bp) aligned to chr2 of one species and then convert it to PE fastq again. Then I want to align those reads to chr2 of another species to see the alignment percentage and mismatch ratios in different regions.
this is what I did:
# extract alignment at chr2. The chromosome has no prefixes so only 2
    samtools view -h -b sorted_2ms04h.bam 2 > chr2.2ms04h.bam  
# then bamtofastq
    bamToFastq -i chr2.sorted.2ms04h.bam -fq chr2.2ms04h.R1 -fq2 chr2.2ms04h.R2
# I am getting lots of warning message I got only about 20 mb of fastq (R1 and R2 each) from almost 500 mb BAM file
    *****WARNING: Query HWI-D00123R2:155:C4AM3ACXX:2:1101:7097:5645 is marked as paired, but its mate does not occur next to it in your BAM file.  Skipping. 
    *****WARNING: Query HWI-D00123R2:155:C4AM3ACXX:2:1101:7103:24283 is marked as paired, but its mate does not occur next to it in your BAM file.  Skipping. 
    *****WARNING: Query HWI-D00123R2:155:C4AM3ACXX:2:1101:7109:36458 is marked as paired, but its mate does not occur next to it in your BAM file.  Skipping. 
    *****WARNING: Query HWI-D00123R2:155:C4AM3ACXX:2:1101:7111:45560 is marked as paired, but its mate does not occur next to it in your BAM file.  Skipping. 
# I also tired Picard
    java -jar picard2.9.jar SamToFastq I=chr2.sorted.2ms04h.bam FASTQ=R1.fq SECOND_END_FASTQ = R2.fq UNPAIRED_FASTQ = unPaired.fq
   # Error message
   [Mon Jun 12 08:04:10 EDT 2017] picard.sam.SamToFastq INPUT=chr2.2ms04h.bam FASTQ=R1.fq SECOND_END_FASTQ=R2.fq UNPAIRED_FASTQ=unPaired.fq    OUTPUT_PER_RG=false RG_TAG=PU RE_REVERSE=true INTERLEAVE=false INCLUDE_NON_PF_READS=false CLIPPING_MIN_LENGTH=0 READ1_TRIM=0 READ2_TRIM=0 INCLUDE_NON_PRIMARY_ALIGNMENTS=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
   [Mon Jun 12 08:04:10 EDT 2017] Executing as everestial007@everestial007-Inspiron-3647 on Linux 4.4.0-57-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_131-8u131-b11-0ubuntu1.16.04.2-b11; Picard version: 2.9.2-SNAPSHOT
   [Mon Jun 12 08:04:11 EDT 2017] picard.sam.SamToFastq done. Elapsed time: 0.01 minutes.
   Runtime.totalMemory()=189267968
   To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
   Exception in thread "main" picard.PicardException: Illegal mate state: HWI-D00123R2:155:C4AM3ACXX:2:1101:20700:91575
at picard.sam.SamToFastq.assertPairedMates(SamToFastq.java:385)
at picard.sam.SamToFastq.doWork(SamToFastq.java:194)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:205)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:94)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:104)
But,  samtools flagstat chr2.sorted.2ms04h.bam > chr2.stats 
   # shows there are good amount of paired end/mate paired reads.
       7549722 + 0 in total (QC-passed reads + QC-failed reads)
       1421823 + 0 duplicates
       7549722 + 0 mapped (100.00%:-nan%)
       7549722 + 0 paired in sequencing
       3771662 + 0 read1
       3778060 + 0 read2
       7548507 + 0 properly paired (99.98%:-nan%)
       7548963 + 0 with itself and mate mapped
       759 + 0 singletons (0.01%:-nan%)
       0 + 0 with mate mapped to a different chr
       0 + 0 with mate mapped to a different chr (mapQ>=5)
So, what is wrong with my process?
FYI, there is a
fastqcommand in new version ofsamtools. Check hereyour bam is corrupted: some mates are missing from the BAM.
But, 99.98 % are properly paired. Shouldn't it work ?
You'll need 100%, since the difference aren't marked as singletons. Run
samtools fixmate, that might resolve the issue.I am having issues even after fixmate.
What is really going on? grr