Question: DEXSeq missing mate encountered
0
gravatar for Anil Kesarwani
4.1 years ago by
United States
Anil Kesarwani80 wrote:

Hi,

I have BAM files for pairs-end reads aligned using TopHat2. The BAM files were sorted based on read name using samtools –n. But when I performed read count analysis, I found almost all reads to have missing mates. Below is the error message:
 
/.local/lib/python2.7/site-packages/HTSeq-0.6.1-py2.7-linux-x86_64.egg/HTSeq/__init__.py:622: UserWarning: 197873794 reads with missing mate encountered.

Here are the top few lines from BAM file. The read names from mates are differing by .1 and .2 extension. Is this an issue? I will highly appreciate if you could help me out in this regard.

######
@HD    VN:1.0    SO:queryname
@SQ    SN:chr1    LN:249250621
@SQ    SN:chr10    LN:135534747
@SQ    SN:chr11    LN:135006516
@SQ    SN:chr12    LN:133851895
@SQ    SN:chr13    LN:115169878
@SQ    SN:chr14    LN:107349540
@SQ    SN:chr15    LN:102531392
@SQ    SN:chr16    LN:90354753
@SQ    SN:chr17    LN:81195210
@SQ    SN:chr18    LN:78077248
@SQ    SN:chr19    LN:59128983
@SQ    SN:chr2    LN:243199373
@SQ    SN:chr20    LN:63025520
@SQ    SN:chr21    LN:48129895
@SQ    SN:chr22    LN:51304566
@SQ    SN:chr3    LN:198022430
@SQ    SN:chr4    LN:191154276
@SQ    SN:chr5    LN:180915260
@SQ    SN:chr6    LN:171115067
@SQ    SN:chr7    LN:159138663
@SQ    SN:chr8    LN:146364022
@SQ    SN:chr9    LN:141213431
@SQ    SN:chrM    LN:16571
@SQ    SN:chrX    LN:155270560
@SQ    SN:chrY    LN:59373566
@PG    ID:TopHat    VN:2.0.10
.
.
.    
SRR1660308.1.1    81    chr8    145140609    50    38M138N2M    =    145140331    -456    GTGCCCACTNCTGCGCTTGCCAAGCCTCATGGGCCCCGGA    I
IGGHB;<0#FJIIGJJJIJIJJJJJJHHHHHFFFFFCCC    AS:i:0    XN:i:0    XM:i:1    XO:i:0    XG:i:0    NM:i:1    MD:Z:9G30    YT:Z:UU    XS:A:+    NH:i:1
SRR1660308.1.2    161    chr8    145140331    50    39M    =    145140609    456    CACTGCTGGCGATTTATGCAGCTGGCCTGGCCCTGCCCC    22AFHFFHJIJGIIBEF
BGIJJIEGIJ@HIHIJJG?FIE    AS:i:0    XN:i:0    XM:i:0    XO:i:0    XG:i:0    NM:i:0    MD:Z:39    YT:Z:UU    XS:A:+    NH:i:1
SRR1660308.3.1    97    chr17    40925519    50    28M204N12M    =    40926678    1198    GGCAGTATCGCTTCCCACCCTTCTTTACGTNACAACCGAA    @
CCFFFBDDFHHHIIIIIIIIIIIIIIIII#1:CFHGFIG    AS:i:0    XN:i:0    XM:i:1    XO:i:0    XG:i:0    NM:i:1    MD:Z:30T9    YT:Z:UU    XS:A:+    NH:i:1
SRR1660308.3.2    145    chr17    40926678    50    39M    =    40925519    -1198    GAGTCGATCCAGATTGTATTAGAGGAACTGAGGAAGAAA    GGFAC94IGCJIGIGII
IJHEGGHGE<C4IGFDHDDA22    AS:i:0    XN:i:0    XM:i:0    XO:i:0    XG:i:0    NM:i:0    MD:Z:39    YT:Z:UU    XS:A:+    NH:i:1
SRR1660308.4.1    83    chr5    154194381    50    40M    =    154194287    -134    TTTATTTTTNTCTGACCTTTCCATCCTTGAAAAAACGGGG    JJJJJHG?1
#GJIGGJJIJIIJJJIIJHHHHHFFDFFCCC    AS:i:-6    XN:i:0    XM:i:2    XO:i:0    XG:i:0    NM:i:2    MD:Z:9C25T4    YT:Z:UU    XS:A:+    NH:i:1
SRR1660308.4.2    163    chr5    154194287    50    39M    =    154194381    134    CAGCCTCTCCCCACTCCCCTTGCCCCACACCCTTTACTC    22AFHHHHJJJJJJJ3C
GHJIIHIJIIJJJGHHIGGJII    AS:i:0    XN:i:0    XM:i:0    XO:i:0    XG:i:0    NM:i:0    MD:Z:39    YT:Z:UU    XS:A:+    NH:i:1
SRR1660308.5.1    99    chrM    8899    3    30M    =    9023    154    CCTAGCCCACTTCTTACCACAAGGCACAAC    =;:B7A++::;4D4+<CF@?D3AF=+<B)?    AS:i:-2    X
N:i:0    XM:i:1    XO:i:0    XG:i:0    NM:i:1    MD:Z:28C1    YT:Z:UU    NH:i:2    XS:A:-    HI:i:1

 

######


Best regards,
Anil

 

rna-seq tools • 1.6k views
ADD COMMENTlink modified 4.1 years ago by Devon Ryan90k • written 4.1 years ago by Anil Kesarwani80
0
gravatar for Devon Ryan
4.1 years ago by
Devon Ryan90k
Freiburg, Germany
Devon Ryan90k wrote:

The difference in read names are almost certainly breaking things. You could likely fix this with awk, but next time use the -F option with fastq-dump. That'll prevent this sort of thing from happening any give you more meaningful read IDs in addition.

ADD COMMENTlink written 4.1 years ago by Devon Ryan90k

Thanks a lot for you suggestion. Can awk function work with bam (binary)

ADD REPLYlink written 4.1 years ago by Anil Kesarwani80

No, but I suspect you can pipe to the dexseq counting script.

ADD REPLYlink written 4.1 years ago by Devon Ryan90k

The easy way could be to convert BAM to SAM and then perform awk function

ADD REPLYlink written 4.1 years ago by Anil Kesarwani80

Nice catch, what should the header look like? Guess I ran in to the same problem.

ADD REPLYlink written 2.2 years ago by badribio240
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1573 users visited in the last hour