I asked this question a while back and it got bumped to chat because it drew a lot of subjective opinions.

Original message here for reference: https://chat.stackexchange.com/rooms/139176/discussion-on-question-by-sabre-dns-forwarding-issue

I also found a seemingly similar issue, also unanswered: "Conditional Forwarding intermittent failures".

So I figured I would consolidate it down to the basic information and try again, with log files for demonstration.

The core question is, without reporting errors, why would a DNS forwarder selectively fail for one host at random and then resume normal operation later? The details of how are as follows...


Edit: I can add to this; the issue happened again this morning (the day after posting). The logs show that when the incident occurred, one query was resolved correctly, and then less than a second later the server asked its WAN forwarder instead of its cache or the LAN server. That cached the external IP, and every lookup failed from that point forward until we cleared the cache. The first query after that followed the conditional forwarder and cached the correct IP. To make this more mysterious: if the record was cached, the server should not have asked for a new IP at all.


I have two DNS servers, two domains, both on LAN, both DC and DNS for their respective AD domains.

One domain is .local, so it cannot be queried from public DNS; the other is .ORG. The .ORG domain is split between hosts on the LAN and hosts on the internet. In this scenario we are only concerned with intercepting queries for hosts on the LAN and letting public DNS deal with the rest. So LAN hosts are handled by the local server, and anything else (not a LAN host) goes out to the next forwarder, which is OpenDNS (and ultimately our SOA is at GoDaddy). I have learned this is referred to as split-brain DNS, and it is apparently normal for a hybrid DNS scenario like mine.

So if you ask the DNS server for A.local where one of the hosts on B.org is, it should, and almost always does, ask the B.org DNS server (and the query never leaves the LAN unless no matching hostname is found there).

I included a picture of the host that is failing to forward so we do not go down the "there is no such things as DNS forwarding" path again.

[Image: DNS config]

What is happening is that, at random, a host on the .org domain does not resolve because the .local DNS server does not ask the DNS server for the .org domain; that is, it never tries the conditional forwarder. I have now confirmed this with simultaneous packet captures on both hosts: the path goes A.local => OpenDNS, not A.local => (forward) => B.org.

When it fails, the .local server does not even try to send to the .org server, and the .org server is confirmed to never receive any request.

If you query the .org server directly rather than through the forwarder (with NSLOOKUP), it works fine: the host is there, and I can see its DNS record. The forwarder also works fine during this time for other hosts on the .org domain, and the host that has these failures is not consistently the same one.

This happens off and on, very infrequent, and random, with no change in configuration, and resumes normal operation later, again with no change in configuration.

Log files attached (From DNS logging on .local DNS server where failure is occurring), of the correct chain and the incorrect. The IP 10.1.1.250 is the DNS server for the .org, 10.1.0.16 is the IP of the client requesting the host resolution.

LogCorrect

  • Request from client
  • Request to .org DNS server (Forward/10.1.1.250)
  • Response from .org DNS server to .local DNS server
  • Response to client with information obtained via forward.

LogFailed

  • Request from client
  • Response from .local DNS server, directing it to external DNS (Forwarder never asked)

Hopefully those details will keep it in the question realm, not a chat :-)

Thank you.

LogCorrect:

10/12/2022 9:48:12 AM 0758 PACKET  000000883A640200 UDP Rcv 10.1.0.16       001e   Q [0001   D   NOERROR] A      (3)myhost(4)mydomain(3)org(0)
UDP question info at 000000883A640200
  Socket = 592
  Remote addr 10.1.0.16, port 57756
  Time Query=2147027, Queued=0, Expire=0
  Buf length = 0x0fa0 (4000)
  Msg length = 0x001e (30)
  Message:
    XID       0x001e
    Flags     0x0100
      QR        0 (QUESTION)
      OPCODE    0 (QUERY)
      AA        0
      TC        0
      RD        1
      RA        0
      Z         0
      CD        0
      AD        0
      RCODE     0 (NOERROR)
    QCOUNT    1
    ACOUNT    0
    NSCOUNT   0
    ARCOUNT   0
    QUESTION SECTION:
    Offset = 0x000c, RR count = 0
    Name      "(3)myhost(4)mydomain(3)org(0)"
      QTYPE   A (1)
      QCLASS  1
    ANSWER SECTION:
      empty
    AUTHORITY SECTION:
      empty
    ADDITIONAL SECTION:
      empty

10/12/2022 9:48:12 AM 0758 PACKET  000000883A4581A0 UDP Snd 10.1.1.250      d94e   Q [0001   D   NOERROR] A      (3)myhost(4)mydomain(3)org(0)
UDP question info at 000000883A4581A0
  Socket = 10476
  Remote addr 10.1.1.250, port 53
  Time Query=0, Queued=0, Expire=0
  Buf length = 0x0fa0 (4000)
  Msg length = 0x0029 (41)
  Message:
    XID       0xd94e
    Flags     0x0100
      QR        0 (QUESTION)
      OPCODE    0 (QUERY)
      AA        0
      TC        0
      RD        1
      RA        0
      Z         0
      CD        0
      AD        0
      RCODE     0 (NOERROR)
    QCOUNT    1
    ACOUNT    0
    NSCOUNT   0
    ARCOUNT   1
    QUESTION SECTION:
    Offset = 0x000c, RR count = 0
    Name      "(3)myhost(4)mydomain(3)org(0)"
      QTYPE   A (1)
      QCLASS  1
    ANSWER SECTION:
      empty
    AUTHORITY SECTION:
      empty
    ADDITIONAL SECTION:
    Offset = 0x001e, RR count = 0
    Name      "(0)"
      TYPE   OPT  (41)
      CLASS  4000
      TTL    32768
      DLEN   0
      DATA   
        Buffer Size  = 4000
        Rcode Ext    = 0
        Rcode Full   = 0
        Version      = 0
        Flags        = 80 DO

10/12/2022 9:48:12 AM 0758 PACKET  000000883E98E210 UDP Rcv 10.1.1.250      d94e R Q [8085 A DR  NOERROR] A      (3)myhost(4)mydomain(3)org(0)
UDP response info at 000000883E98E210
  Socket = 10476
  Remote addr 10.1.1.250, port 53
  Time Query=2147027, Queued=0, Expire=0
  Buf length = 0x0fa0 (4000)
  Msg length = 0x0039 (57)
  Message:
    XID       0xd94e
    Flags     0x8580
      QR        1 (RESPONSE)
      OPCODE    0 (QUERY)
      AA        1
      TC        0
      RD        1
      RA        1
      Z         0
      CD        0
      AD        0
      RCODE     0 (NOERROR)
    QCOUNT    1
    ACOUNT    1
    NSCOUNT   0
    ARCOUNT   1
    QUESTION SECTION:
    Offset = 0x000c, RR count = 0
    Name      "(3)myhost(4)mydomain(3)org(0)"
      QTYPE   A (1)
      QCLASS  1
    ANSWER SECTION:
    Offset = 0x001e, RR count = 0
    Name      "[C00C](3)myhost(4)mydomain(3)org(0)"
      TYPE   A  (1)
      CLASS  1
      TTL    1200
      DLEN   4
      DATA   10.1.1.218
    AUTHORITY SECTION:
      empty
    ADDITIONAL SECTION:
    Offset = 0x002e, RR count = 0
    Name      "(0)"
      TYPE   OPT  (41)
      CLASS  4000
      TTL    32768
      DLEN   0
      DATA   
        Buffer Size  = 4000
        Rcode Ext    = 0
        Rcode Full   = 0
        Version      = 0
        Flags        = 80 DO

10/12/2022 9:48:12 AM 0758 PACKET  000000883A640200 UDP Snd 10.1.0.16       001e R Q [8081   DR  NOERROR] A      (3)myhost(4)mydomain(3)org(0)
UDP response info at 000000883A640200
  Socket = 592
  Remote addr 10.1.0.16, port 57756
  Time Query=2147027, Queued=2147027, Expire=2147032
  Buf length = 0x0200 (512)
  Msg length = 0x002e (46)
  Message:
    XID       0x001e
    Flags     0x8180
      QR        1 (RESPONSE)
      OPCODE    0 (QUERY)
      AA        0
      TC        0
      RD        1
      RA        1
      Z         0
      CD        0
      AD        0
      RCODE     0 (NOERROR)
    QCOUNT    1
    ACOUNT    1
    NSCOUNT   0
    ARCOUNT   0
    QUESTION SECTION:
    Offset = 0x000c, RR count = 0
    Name      "(3)myhost(4)mydomain(3)org(0)"
      QTYPE   A (1)
      QCLASS  1
    ANSWER SECTION:
    Offset = 0x001e, RR count = 0
    Name      "[C00C](3)myhost(4)mydomain(3)org(0)"
      TYPE   A  (1)
      CLASS  1
      TTL    1199
      DLEN   4
      DATA   10.1.1.218
    AUTHORITY SECTION:
      empty
    ADDITIONAL SECTION:
      empty

LogFailed:

10/12/2022 9:39:38 AM 0748 PACKET  000000883EE821D0 UDP Rcv 10.1.0.16       3858   Q [0001   D   NOERROR] A      (3)myhost(4)mydomain(3)ORG(0)
UDP question info at 000000883EE821D0
  Socket = 592
  Remote addr 10.1.0.16, port 62365
  Time Query=2146514, Queued=0, Expire=0
  Buf length = 0x0fa0 (4000)
  Msg length = 0x001e (30)
  Message:
    XID       0x3858
    Flags     0x0100
      QR        0 (QUESTION)
      OPCODE    0 (QUERY)
      AA        0
      TC        0
      RD        1
      RA        0
      Z         0
      CD        0
      AD        0
      RCODE     0 (NOERROR)
    QCOUNT    1
    ACOUNT    0
    NSCOUNT   0
    ARCOUNT   0
    QUESTION SECTION:
    Offset = 0x000c, RR count = 0
    Name      "(3)myhost(4)mydomain(3)ORG(0)"
      QTYPE   A (1)
      QCLASS  1
    ANSWER SECTION:
      empty
    AUTHORITY SECTION:
      empty
    ADDITIONAL SECTION:
      empty

10/12/2022 9:39:38 AM 0748 PACKET  000000883EE821D0 UDP Snd 10.1.0.16       3858 R Q [8081   DR  NOERROR] A      (3)myhost(4)mydomain(3)ORG(0)
UDP response info at 000000883EE821D0
  Socket = 592
  Remote addr 10.1.0.16, port 62365
  Time Query=2146514, Queued=0, Expire=0
  Buf length = 0x0200 (512)
  Msg length = 0x0067 (103)
  Message:
    XID       0x3858
    Flags     0x8180
      QR        1 (RESPONSE)
      OPCODE    0 (QUERY)
      AA        0
      TC        0
      RD        1
      RA        1
      Z         0
      CD        0
      AD        0
      RCODE     0 (NOERROR)
    QCOUNT    1
    ACOUNT    0
    NSCOUNT   1
    ARCOUNT   0
    QUESTION SECTION:
    Offset = 0x000c, RR count = 0
    Name      "(3)myhost(4)mydomain(3)ORG(0)"
      QTYPE   A (1)
      QCLASS  1
    ANSWER SECTION:
      empty
    AUTHORITY SECTION:
    Offset = 0x001e, RR count = 0
    Name      "[C010](4)mydomain(3)ORG(0)"
      TYPE   SOA  (6)
      CLASS  1
      TTL    466
      DLEN   61
      DATA   
        PrimaryServer: (6)pdns07(13)domaincontrol(3)com(0)
        Administrator: (3)dns(5)jomax(3)net(0)
        SerialNo     = 2022083000
        Refresh      = 28800
        Retry        = 7200
        Expire       = 604800
        MinimumTTL   = 600
    ADDITIONAL SECTION:
      empty
  • "so we do not go down the 'there is no such things as DNS forwarding'" There is still no DNS "forwarding". There are DNS servers that can be configured to forward queries for some zones to other nameservers. It is not a feature of the protocol (DNS), but of some servers. It is important to understand the distinction.
    Commented Oct 12, 2022 at 19:57
  • I understand that, but there IS DNS forwarding. In this context, clearly it is named and presented so. It is analogous to saying there is no PDF in your email because the SMTP protocol does not mention PDF, only binary data. There is an argument from the protocol level, but since that is specifically not the context of the question (the context is the implementation of DNS in a Microsoft DNS server), it is an argument about semantics. I get what you are saying, I just do not understand why you raise it or what it contributes.
    – Sabre
    Commented Oct 12, 2022 at 21:13
  • Do the local DNS servers use forwarders, and have they been configured to not use root servers, and the %systemroot%\system32\dns\cache.dns file has been deleted?
    – Greg Askew
    Commented Oct 13, 2022 at 17:04
  • Both use forwarders, and are configured not to use root hints if forwarders are unavailable. They forward through OpenDNS. So DNS queries for any host on the LAN go to one of them, depending on which domain you are on. If you are on one and seeking a host on the other domain, it goes through the conditional forwarder to the other domain's DC/DNS; all other traffic goes through the OpenDNS forwarders. That file has not been removed, but it should be ignored because of that setting, unless I misunderstand. They are just DC DNS servers serving internet needs as well.
    – Sabre
    Commented Oct 13, 2022 at 17:42
  • How are IPs assigned to all interfaces on A and B? Static or DHCP?
    – gapsf
    Commented Oct 17, 2022 at 16:18

1 Answer


This appears to be a cached, non-authoritative referral response for a requested record that does not exist (or did not exist at the time it was cached). That can be a normal response.

You would need to hunt through earlier logs to see what occurred during the previous outbound request from the conditional forwarder for myhost.mydomain.org. I suspect the following sequence: a working cached entry becomes stale and is removed from the cache; a request comes in; the conditional forwarder attempts to forward it; the authoritative server does not respond; the forwarder falls back to a regular lookup through its forwarders or root hints; and it caches that result.

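It's cached because you don't see any outgoing packet from the conditional forwarder. It's non-authoritative because the DNS server is operating as a conditional forwarder, so it does not have an authoritative zone file. It's a referral in the loose sense that the response has no answer record, only an authority record. (Strictly, a NOERROR response with an empty answer section and an SOA in the authority section is what RFC 2308 calls a NODATA response; a true referral carries NS records.)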
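
The three properties above can be read straight from the `Flags` value and section counts in the logs. The following is a small Python sketch that decodes the 16-bit header flags as laid out in RFC 1035 and labels a response; `decode_flags` and `classify` are hypothetical helpers, and the label names (including "NODATA", the RFC 2308 term for NOERROR with an empty answer and an SOA in the authority section) are my own choice of wording.

```python
from typing import Dict, Sequence


def decode_flags(flags: int) -> Dict[str, int]:
    """Split the 16-bit DNS header flags word into its fields (RFC 1035 layout)."""
    return {
        "QR": (flags >> 15) & 1,       # 0 = query, 1 = response
        "OPCODE": (flags >> 11) & 0xF,
        "AA": (flags >> 10) & 1,       # authoritative answer
        "TC": (flags >> 9) & 1,        # truncated
        "RD": (flags >> 8) & 1,        # recursion desired
        "RA": (flags >> 7) & 1,        # recursion available
        "Z": (flags >> 6) & 1,
        "AD": (flags >> 5) & 1,
        "CD": (flags >> 4) & 1,
        "RCODE": flags & 0xF,
    }


def classify(flags: int, answer_count: int, authority_types: Sequence[str]) -> str:
    """Give a rough label for a DNS message based on flags and section contents."""
    f = decode_flags(flags)
    if f["QR"] == 0:
        return "query"
    if f["RCODE"] == 3:
        return "name error (NXDOMAIN)"
    if f["RCODE"] != 0:
        return "error"
    if answer_count > 0:
        return "authoritative answer" if f["AA"] else "non-authoritative answer"
    if "SOA" in authority_types:
        return "NODATA (negative answer)"
    if "NS" in authority_types:
        return "referral"
    return "empty response"
```

For the LogFailed response, `classify(0x8180, 0, ["SOA"])` gives the negative-answer label with AA clear, while the reply from 10.1.1.250 in LogCorrect, `classify(0x8580, 1, [])`, is an authoritative answer.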

RFC 1034 describes this response with an example in which the hostname has been mistyped; however, this is not the only scenario in which it can happen. For example, it also happens if the record has been deleted.

RFC 1034 DOMAIN NAMES - CONCEPTS AND FACILITIES

6.2.5. QNAME=SIR-NIC.ARPA, QTYPE=A

If a user mistyped a host name, we might see this type of query.

C.ISI.EDU would answer it with:

Header OPCODE=SQUERY, RESPONSE, AA, RCODE=NE
Question QNAME=SIR-NIC.ARPA., QCLASS=IN, QTYPE=A
Answer <empty>
Authority . SOA SRI-NIC.ARPA. HOSTMASTER.SRI-NIC.ARPA. 870611 1800 300 604800 86400
Additional <empty>

This response states that the name does not exist. This condition is signalled in the response code (RCODE) section of the header.

The SOA RR in the authority section is the optional negative caching information which allows the resolver using this response to assume that the name will not exist for the SOA MINIMUM (86400) seconds.
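
As a worked example of the negative-caching arithmetic, here is a hedged Python sketch. RFC 2308 refines the rule quoted above: the negative-cache lifetime is the smaller of the SOA record's own TTL and its MINIMUM field. The second function rests on an assumption that is not in the logs: that the cached entry started at 600 seconds (the MinimumTTL shown in LogFailed) and that the 466 seen there is that TTL counting down. Server-side limits on negative caching may also apply, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def negative_cache_ttl(soa_ttl: int, soa_minimum: int) -> int:
    """RFC 2308: cache a negative answer for min(SOA TTL, SOA MINIMUM) seconds."""
    return min(soa_ttl, soa_minimum)


def estimate_cached_at(seen_at: datetime, initial_ttl: int, remaining_ttl: int) -> datetime:
    """Estimate when an entry was cached, ASSUMING its TTL started at
    `initial_ttl` and has been counting down to `remaining_ttl` since then."""
    return seen_at - timedelta(seconds=initial_ttl - remaining_ttl)


if __name__ == "__main__":
    # Values from LogFailed: SOA TTL 466, MinimumTTL 600, logged at 9:39:38 AM.
    print(negative_cache_ttl(466, 600))
    print(estimate_cached_at(datetime(2022, 10, 12, 9, 39, 38), 600, 466))
```

Under that assumption, the negative entry would have been cached about 134 seconds before the LogFailed response, which narrows where in the earlier logs to look for the lookup that went to the external forwarder.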

The referral response expects the client to follow up with the authoritative server to get an authoritative answer. In this case, the result appears to be that clients are directed to the external records rather than the internal ones.

In a general sense, if the record has not been mistyped, deleted, or otherwise affected then your configuration is correct and should have worked. This could be a defect, however I am not inclined to believe that is the case. Take a very close look at other traffic and logs, and I suspect you will find the DNS is working as it is supposed to even if it is not as you expect it to.

That said, while conditional forwarders are a valid solution, they suffer from some weaknesses. In particular, they require the two DNS servers to communicate at the moment the (uncached) query is made. If the server is down, the LAN is down, or some other communication failure occurs, the DNS query fails. Note that DNS uses UDP first, which is not a reliable protocol (it does not guarantee that data will reach its destination). The conditional forwarder is non-authoritative for the zone, so it cannot directly answer with a negative response. The conditional forwarder is also limited to the servers entered in the forwarding configuration, which may be fewer than those listed in the zone's NS records.
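
To check whether the authoritative server ever fails to answer over UDP, one option is a small probe that queries it directly and retries on timeout. This is a minimal Python sketch using only the standard library; `build_query` and `query_udp` are hypothetical helpers, the timeout and retry values are arbitrary, and the probe sends a plain query without the EDNS OPT record that the server-to-server query in LogCorrect carries.

```python
import socket
import struct
from typing import Optional


def build_query(name: str, xid: int, qtype: int = 1) -> bytes:
    """Build a minimal DNS query (RD set, one question, class IN)."""
    header = struct.pack("!HHHHHH", xid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def query_udp(server: str, name: str, xid: int = 0x1234,
              timeout: float = 2.0, retries: int = 2) -> Optional[bytes]:
    """Send the query over UDP, retrying on timeout; return the raw reply or None."""
    packet = build_query(name, xid)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for _attempt in range(retries + 1):
            sock.sendto(packet, (server, 53))
            try:
                data, _addr = sock.recvfrom(4096)
            except socket.timeout:
                continue  # UDP gives no delivery guarantee, so just try again
            if data[:2] == packet[:2]:  # reply carries the same transaction ID
                return data
    return None
```

Calling `query_udp("10.1.1.250", "myhost.mydomain.org")` in a loop from the .local server and logging every `None` would show whether the .org server ever goes silent, which is the precondition for the fallback suspected above.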

As an alternative, are your domains joined as a single forest? From your naming it appears not, however if so consider using Active Directory Integrated DNS zones so that all DNS servers in the forest are authoritative and contain complete replicated zones for all the domains via Active Directory. This can have side effects where a conditional forwarder is more "real-time" as it talks to the authoritative server, while an Active Directory replica will be slightly behind due to AD replication delays.

If you do not have a single forest, consider using older style Zone Transfer replicas. Like above with AD, it promotes the DNS server to be authoritative and gives it a copy of the zone file. This can be more resilient, as the forwarder is no longer dependent on communicating with the authoritative server(s) in the moment the query is required.

  • Thank you for the comprehensive answer, the situation is temporary, and yes they are independent of one another. Since our count of hosts to need the forwarding is low we reserved some IP space, set them static, and put the forward lookup zone in the DNS servers direct. But I still wanted to know why we had to do that. Where you said "Take a very close look at other traffic and logs" That is what drove me to ask this question to begin with. I have, exhaustively. At the point in time the server requests from its public forwarder a packet capture on both DNS servers shows no attempt...
    – Sabre
    Commented Oct 22, 2022 at 13:30
  • ...Other DNS queries through the forwarder, for other hosts on the same domain and for the victim, occur without issue within 1-2ms before (After the incorrect lookup is cached), as do other queries. The hosts are in continuous use, and no network issues of any kind are observed, logged, or can be detected, even at the packet level. They are two VMs in the same ESXi instance, on the same virtual switch. Is it possible this occurs in some race condition between the expired cache and the next correct query internally in the DNS server?
    – Sabre
    Commented Oct 22, 2022 at 13:38
  • Best I can say at this point would be "maybe". The conditional forwarder should work as you have described it, yet what you are seeing tells us "something" happened. It would be a deep dive to find what, as you have already seen.
    – Doug
    Commented Oct 22, 2022 at 19:54
  • Good deal, and fair response. Since you have intelligently engaged the question and had expanded suggestions, I will take that. At the very least it confirms I am not crazy: this should work as configured. As configured, it does not represent some abject failure in comprehending "the way it should be", and the behavior is indeed a transient anomaly rather than a definable administration issue. I did not expect resolution on something this fleeting and specific, as much as peer confirmation. Thank you very much for your input.
    – Sabre
    Commented Oct 23, 2022 at 19:46
