DNS MX vs IPv4 & IPv6

Yesterday – a very interesting issue cropped up…

Another local provider was getting email warnings saying their servers could not connect to our frontend MX servers.

The error itself was:

IPv6 is not supported

Which is not a very clear reason for email being undeliverable. I mean, my systems are working fine with IPv6, and we see native IPv6 connections regularly.

So I scour our logs looking for case-insensitive ‘ipv6’ and all I find are hostnames with that listed, no errors, nothing with the word ‘supported’ listed.

The other provider thinks that maybe it is an issue on their side, which looks to be right, but for a different reason. Speculation: Their mail servers were trying IPv6 connections but were (at the time of discussion) not necessarily configured to handle IPv6 connections. No route (and router) for IPv6 means error. This makes sense.

But the culprit looks to be how different resource records are handled, and a probable mistake by me (humans…).

Original DNS record lookup result looked like:

;; ADDITIONAL SECTION:
smtpgrey-1.iphouse.net.	13	IN	A	216.250.190.160
smtpgrey-1.iphouse.net.	83213	IN	AAAA	2001:4980:0:ffff:25::160
smtpgrey-1.iphouse.net.	83213	IN	AAAA	2001:4980:0:1000:25::160
smtpgrey-2.iphouse.net.	71	IN	A	216.250.190.161
smtpgrey-2.iphouse.net.	83213	IN	AAAA	2001:4980:0:ffff:25::161
smtpgrey-2.iphouse.net.	83213	IN	AAAA	2001:4980:0:1000:25::161

New DNS record results now look like:

;; ADDITIONAL SECTION:
smtpgrey-1.iphouse.net.	100	IN	A	216.250.190.160
smtpgrey-1.iphouse.net.	100	IN	AAAA	2001:4980:0:ffff:25::160
smtpgrey-1.iphouse.net.	100	IN	AAAA	2001:4980:0:1000:25::160
smtpgrey-2.iphouse.net.	100	IN	A	216.250.190.161
smtpgrey-2.iphouse.net.	100	IN	AAAA	2001:4980:0:ffff:25::161
smtpgrey-2.iphouse.net.	100	IN	AAAA	2001:4980:0:1000:25::161

and things are now doing much better, as the TTLs now match and the addresses expire together as expected. There are still brief periods where only one IPv4 address is returned, but within 2-3 seconds both are listed again. All four IPv6 records appear to be listed at all times.

What was happening, Mavis?

If you look at the original output from the UNIX dig command, the TTLs (time to live) of the IPv4 records (13 and 71 seconds) are much lower than the TTL of the IPv6 records (83213 seconds). The IPv4 records were expiring, as their TTLs said they should, but because other valid records for the same names were still in cache, the resolver was not going back to refresh the IPv4 records.
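To make the timing concrete, here is a minimal Python sketch that takes the original dig lines from above and reports which record types would still be unexpired after a given number of seconds. It only does the TTL arithmetic; it is not a model of how any particular resolver caches, and the function names (parse_records, surviving_types) are made up for this illustration.

```python
from collections import defaultdict

# The ADDITIONAL SECTION lines from the original lookup shown earlier.
DIG_OUTPUT = """\
smtpgrey-1.iphouse.net. 13    IN A    216.250.190.160
smtpgrey-1.iphouse.net. 83213 IN AAAA 2001:4980:0:ffff:25::160
smtpgrey-1.iphouse.net. 83213 IN AAAA 2001:4980:0:1000:25::160
smtpgrey-2.iphouse.net. 71    IN A    216.250.190.161
smtpgrey-2.iphouse.net. 83213 IN AAAA 2001:4980:0:ffff:25::161
smtpgrey-2.iphouse.net. 83213 IN AAAA 2001:4980:0:1000:25::161
"""


def parse_records(text):
    """Parse dig-style lines into (name, ttl, rtype, address) tuples."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blanks and dig comment lines
        name, ttl, _cls, rtype, address = line.split()
        records.append((name, int(ttl), rtype, address))
    return records


def surviving_types(records, elapsed):
    """Return {name: set of record types} whose TTL exceeds `elapsed` seconds."""
    alive = defaultdict(set)
    for name, ttl, rtype, _address in records:
        if ttl > elapsed:
            alive[name].add(rtype)
    return dict(alive)


records = parse_records(DIG_OUTPUT)
for seconds in (0, 20, 100):
    print(seconds, surviving_types(records, seconds))
```

At 20 seconds smtpgrey-1 is left with only AAAA records, and by 100 seconds both hosts are IPv6-only as far as this arithmetic goes, which matches the behavior described above.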

I tested this against FreeBSD 6-STABLE, FreeBSD 8-STABLE, Ubuntu 10.04, and RHEL5, and all exhibited the exact same results: the IPv4 addresses were expiring ahead of the IPv6 addresses. My conjecture is that only systems using IPv6 natively were regularly successful at delivering email, provided their resolver was handing back both types of records and their SMTP subsystems were configured to handle IPv6 connections.

For the other provider (and I’ll update this post if I am wrong, including apology) – I think they are internally dual-stacked but do not have anything more than the link-local auto-configured IPv6 address on their ethernet port(s). Without external IPv6 route auto-configuration or specific IPv6 routing, it is going to fail. There would be others out there that would have the same problem, so some email was probably delayed depending on how their resolvers returned information (and cached said info) to the calling program.
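A rough way to check for the link-local-only situation I am guessing at is to look at the addresses configured on a host and see whether any IPv6 address is something other than link-local or loopback. The Python sketch below does only that classification, using the standard ipaddress module. The function name and the sample addresses are hypothetical, gathering the addresses from the interfaces is left out, and having a global address still does not prove there is a working IPv6 route.

```python
import ipaddress


def has_routable_ipv6(addresses):
    """Return True if any address is IPv6 and neither link-local nor loopback."""
    for text in addresses:
        # Strip a zone suffix such as "%eth0" before parsing.
        addr = ipaddress.ip_address(text.split("%")[0])
        if addr.version == 6 and not (addr.is_link_local or addr.is_loopback):
            return True
    return False


# Hypothetical host with only an auto-configured link-local IPv6 address.
link_local_only = ["192.0.2.10", "fe80::1%eth0", "::1"]
# Hypothetical host that also has a global IPv6 address.
dual_stacked = ["192.0.2.10", "fe80::1%eth0", "2001:4980:0:ffff:25::160"]

print(has_routable_ipv6(link_local_only))
print(has_routable_ipv6(dual_stacked))
```

A host in the first state can still be handed AAAA records by its resolver, and an SMTP client that tries them has nowhere to send the packets.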

BTW – the not-so-useful error message came from CommuniGate Pro.

Update: RFC 4472 has some great info for the other side of the conversation, and I’m going to shorten the TTL of the IPv6 records and see what happens long term.
