
Recent IoT-based Attacks: What Is the Impact On Managed DNS Operators?

Everyone from the C-suite to K Street has seen the news of the most recent rounds of DDoS attacks against the likes of Krebs, OVH and others. Widespread cries for BCP 38 have been renewed, source address validation everywhere (SAVE) is a hot topic, and talk of a solution centered on reputation-based peering is bubbling up. But has anything changed for the internet operator community? Or has the social amplification of risk simply increased awareness of known faults and gaps in internet infrastructure? The focus of this piece is on attack traffic against which BCP 38 / SAVE have no impact.

In the trenches

Let’s look at this operationally. An attack happens … now what? You have some logs, network usage metrics and a timeline of alerts from monitoring systems tripped during the attack. As an authoritative DNS provider, the data we have from an attack often isn’t directly actionable without cooperation from recursive resolver operators or amplification honeypots. BCP 38 / SAVE would remove the need for this step of analysis. These changes are needed because the Internet’s design inherently enables certain kinds of attacks. At the risk of oversimplification, here are some quick characterizations of each kind:

Attacks which focus on web service resource exhaustion can be harder to defend against because the attacker is requesting resources, often in a manner similar to a normal end user: they connect to your web server and request the images, HTML and other resources needed to render a page. These attacks carry higher risk for the botnet operator because connecting to the web server and making resource requests requires a TCP handshake, which exposes the IP address of the compromised device. The large population of vulnerable connected devices and the ease of exploitation have increased the viability and sustainability of layer 7 HTTP / HTTPS attacks.
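To make the handshake point concrete, here is a minimal sketch (not from any production system) of a TCP listener: once accept() has completed the three-way handshake, the server sees the peer’s real source address, which is why layer 7 floods are attributable to individual bots (or at least to the NAT in front of them). The listener port and logging format below are arbitrary.

    /* Minimal sketch: why a completed TCP handshake exposes the client.
     * By the time accept() returns, the kernel has exchanged the handshake
     * with the peer's real address, so the server (or its logs) sees the
     * source IP of every layer 7 request. Error handling trimmed for brevity. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int srv = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(8080);          /* hypothetical listener port */
        bind(srv, (struct sockaddr *)&addr, sizeof(addr));
        listen(srv, 128);

        for (;;) {
            struct sockaddr_in peer;
            socklen_t len = sizeof(peer);
            int cli = accept(srv, (struct sockaddr *)&peer, &len);
            if (cli < 0)
                continue;
            char ip[INET_ADDRSTRLEN];
            inet_ntop(AF_INET, &peer.sin_addr, ip, sizeof(ip));
            printf("request from %s:%u\n", ip, ntohs(peer.sin_port));
            close(cli);
        }
    }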

More common attacks focus on generating volumes of traffic large enough to prevent legitimate data from reaching the targeted endpoint. These attacks don’t require a large botnet; they only require connectivity to a provider which doesn’t perform source address validation. The lack of source address validation allows requests to be issued seemingly on behalf of another system, and the response is then directed at the unsuspecting device or service. When launching such a volumetric attack the operator has their choice of protocol: DNS, NTP, SSDP, TFTP, even services as benign as TeamSpeak and the Valve Source Engine can be used, as their responses are larger than the requests made to them. In these scenarios, finding the reflector or issuer of the larger response feels like a waste of time. ShadowServer, DShield, the Open Resolver Project and others have made reporting on these sources available for years. So the problem is not accessibility of data, availability of reporting, or awareness of the issue. (If you own and operate IP space, please sign up for ShadowServer reports to make sure you aren’t facilitating these attacks: https://www.shadowserver.org/wiki/pmwiki.php/Services/Reports.)
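As a rough illustration of why reflection is attractive, the arithmetic below uses purely hypothetical request and response sizes (not measurements of any particular service) to show how the attacker’s spoofed-query bandwidth gets multiplied at the victim.

    /* Back-of-the-envelope amplification arithmetic. The sizes below are
     * illustrative placeholders, not measurements of any real protocol. */
    #include <stdio.h>

    int main(void)
    {
        double request_bytes  = 64.0;    /* hypothetical spoofed query size    */
        double response_bytes = 3000.0;  /* hypothetical reflected answer size */

        double amplification = response_bytes / request_bytes;

        /* Every 1 Mbps of spoofed queries becomes roughly this much traffic
         * aimed at the victim whose address was forged. */
        printf("amplification factor: ~%.0fx\n", amplification);
        printf("1 Mbps of requests -> ~%.1f Mbps at the victim\n",
               1.0 * amplification);
        return 0;
    }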

[Figure: authoritative resolvers]

The goal of an authoritative DNS exhaustion attack is to remove the protection of the recursive caching layer from the authoritative DNS servers. To be effective, the attacker wants each client request to result in an authoritative lookup, ideally placing enough strain on the authoritative servers that they stop functioning. To do this the client needs to request records which will not appear in the cache of the recursive layer, because if the result isn’t found in the cache the recursive resolver must request that value from the authoritative. This cache-busting technique is used frequently when collecting DNS real user measurement (RUM) data; in the case of RUM requests, the goal is to force authoritative resolution in order to collect timing and performance telemetry. The Mirai botnet, recently in the news after being identified as a source of the attack on Krebs on Security, has an authoritative exhaustion function in its arsenal. Mirai implements it by prepending a pseudorandom 12-character subdomain to the target domain. This leaves the authoritative DNS provider with a fingerprint: at time X, machine Y requested a domain with a pseudorandom subdomain. With this information, you can go to the owner/operator of machine Y and inform them that at time X you received a request from machine Y for a domain with the specified subdomain. They can then tie that request to the machine which asked them about that domain, at which point they know the client IP of the infected system or the outbound IP of a carrier-grade NAT. However, as the above description outlines, this requires an in-depth logging history for high-volume systems, with some potential privacy implications.
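For illustration, here is a minimal sketch of the cache-busting construction described above. It is not Mirai’s code (Mirai uses its own rand_next() PRNG); it simply produces query names of the same shape an authoritative operator would see in its logs. The target zone is a hypothetical placeholder.

    /* Minimal sketch of the cache-busting construction described above:
     * prepend a pseudorandom 12-character label so the recursive resolver
     * cannot answer from cache and must ask the authoritative servers. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static void random_label(char *out, size_t len)
    {
        static const char alphabet[] = "abcdefghijklmnopqrstuvwxyz0123456789";
        for (size_t i = 0; i < len; i++)
            out[i] = alphabet[rand() % (sizeof(alphabet) - 1)];
        out[len] = '\0';
    }

    int main(void)
    {
        const char *target = "example.com";   /* hypothetical target zone */
        char label[13];

        srand((unsigned)time(NULL));
        for (int i = 0; i < 5; i++) {
            random_label(label, 12);
            printf("%s.%s\n", label, target); /* e.g. k3f9a1x0q2zm.example.com */
        }
        return 0;
    }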

Reviewing the data

When we look at our data from these authoritative exhaustion attacks, we can identify which infrastructure was interacting with ours. Allowing for issues with geolocation of IP addresses, we can draw some conclusions about where the operators are that we need to work with to resolve the issue. The sample represented in the chart below is taken from some Internet of Things (IoT) based attacks on our authoritative DNS platform. At first, we were surprised to see the US take the top rank in percent of total queries by recursive resolver autonomous system. The released Mirai source code, combined with the attack fingerprint, gives reason to believe a Mirai botnet was the source of these attacks.

Resolver Country Code   Number of Prefixes   Number of ASNs   % of Traffic
US                      226                  99               31.54
CN                      269                  42               15.12
BR                      580                  264              8.33
RU                      522                  373              5.58
TW                      22                   12               3.88
UA                      181                  139              3.03
KR                      26                   14               2.95
BG                      128                  86               2.71
CL                      18                   9                1.95
ID                      35                   21               1.94

If we break down the US component into its top contributors, we see that the autonomous systems with the highest volume belong to the default resolvers listed in the Mirai malware source: Google, Hurricane Electric, Verisign, and Level 3. These are used if the local resolver isn’t found:

attack_udp.c line 536

    switch (rand_next() % 4)
    {
    case 0:
        return INET_ADDR(8,8,8,8);     /* Google Public DNS   */
    case 1:
        return INET_ADDR(74,82,42,42); /* Hurricane Electric  */
    case 2:
        return INET_ADDR(64,6,64,6);   /* Verisign            */
    case 3:
        return INET_ADDR(4,2,2,2);     /* Level 3             */
    }

This is proof of the power of defaults: these recursive provider ASNs make up 24% of the total traffic we saw. An interesting detail, which we also see when looking at our recursive resolver data, is the varying volume of queries per second by requestor. This might be because the IP we see is the outbound IP of a NAT, or it might be an indicator of the type of infected device. The anomalies really stand out when we dig into our query logs and look at the distribution of queries per second the resolver was seeing per IP.

[Chart: queries per second by endpoint]
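For the curious, the reduction behind a chart like this can be sketched roughly as follows. The log format used here (epoch second, client IP, query name, one query per line) is a hypothetical stand-in for real query logs, and the code assumes roughly time-ordered input; it reports the busiest one-second bucket observed for each source IP.

    /* Rough sketch of a per-IP QPS reduction over query logs.
     * Assumes (hypothetically) one query per line: "epoch_seconds client_ip qname".
     * Fixed-size linear-scan table; fine for a sketch, not for production volumes. */
    #include <stdio.h>
    #include <string.h>

    #define MAX_IPS 4096

    struct entry {
        char ip[64];
        long second;     /* one-second bucket currently being counted */
        long count;      /* queries seen in that bucket               */
        long peak_qps;   /* busiest bucket seen so far for this IP    */
    };

    static struct entry table[MAX_IPS];
    static int used;

    static struct entry *lookup(const char *ip)
    {
        for (int i = 0; i < used; i++)
            if (strcmp(table[i].ip, ip) == 0)
                return &table[i];
        if (used == MAX_IPS)
            return NULL;
        struct entry *e = &table[used++];
        snprintf(e->ip, sizeof(e->ip), "%s", ip);
        e->second = -1;
        return e;
    }

    int main(void)
    {
        char line[1024], ip[64], qname[512];
        long ts;

        while (fgets(line, sizeof(line), stdin)) {
            if (sscanf(line, "%ld %63s %511s", &ts, ip, qname) != 3)
                continue;
            struct entry *e = lookup(ip);
            if (!e)
                continue;
            if (ts != e->second) {            /* new one-second bucket */
                if (e->count > e->peak_qps)
                    e->peak_qps = e->count;
                e->second = ts;
                e->count = 0;
            }
            e->count++;
        }
        for (int i = 0; i < used; i++) {
            if (table[i].count > table[i].peak_qps)
                table[i].peak_qps = table[i].count;
            printf("%s peak %ld qps\n", table[i].ip, table[i].peak_qps);
        }
        return 0;
    }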

Summing It Up

If broken devices, or devices with poorly configured defaults, in any autonomous system can be aggregated into a force that can attack any individual, company or network, then it seems logical to conclude that a susceptible device anywhere poses a risk to everyone. The network isolationist response is a shift toward the “Golden Networks” paradigm: the basic concept is that a set of networks (Google, Microsoft, Alibaba, Yandex, etc.) can be whitelisted in response to internet turbulence. It is mentioned here mainly as a red herring, as every operator will tell you it is infeasible; to find out for yourself, just ask the folks on the NANOG list.

A more feasible step in this direction is reputation-based peering, which would use observations of abuse or neglect over time to rate or rank members of the networking community. The idea can be compared to the tradition of email abuse mitigation: a sender has a reputation, with identity components derived from the source address range, the individual IP reputation of the MTA, or the sending domain, constructed over time from observed behaviour. In mail, this notion of sending reputation underlies important internet institutions such as Spamhaus, SURBL, etc. Email provider consolidation (Yahoo, Google, Microsoft, Mail.ru, Tencent, etc.) decreased the surface area of the problem. On the other hand, a spammer or phisher left unchecked can fill end user inboxes in the blink of an eye, which is why many reputation services block new domains and rate limit new senders; timeliness is a key component of email reputation. The feasibility of implementing a similar ranking for infrastructure initially feels more complex, but reputation-based peering should be less time sensitive, which might facilitate wider implementation.
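To make the idea slightly more concrete, here is a purely hypothetical sketch of how a time-decayed reputation score per network might be accumulated. The event weights, half-life and scenario are invented for illustration and do not describe any existing ranking system.

    /* Purely hypothetical sketch of a time-decayed reputation score per
     * network. Each abuse observation adds weight; the score decays with
     * a half-life so old incidents fade unless behaviour repeats. */
    #include <math.h>
    #include <stdio.h>

    struct reputation {
        double score;        /* higher = worse observed behaviour   */
        double last_update;  /* seconds since epoch of last event   */
    };

    /* Decay the stored score to "now", then add the new observation. */
    static void observe_abuse(struct reputation *r, double now,
                              double weight, double half_life_secs)
    {
        double elapsed = now - r->last_update;
        r->score *= pow(0.5, elapsed / half_life_secs);
        r->score += weight;
        r->last_update = now;
    }

    int main(void)
    {
        const double day = 86400.0;
        struct reputation asn = { 0.0, 0.0 };          /* hypothetical network */

        observe_abuse(&asn, 1 * day, 10.0, 30 * day);  /* open resolver report  */
        observe_abuse(&asn, 5 * day, 25.0, 30 * day);  /* participated in attack */
        observe_abuse(&asn, 60 * day, 5.0, 30 * day);  /* minor incident later  */

        printf("reputation score: %.2f\n", asn.score);
        return 0;
    }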

When thinking about operational solutions to modern Internet woes, it’s important to dig into the past. It turns out that, like most things, none of this is new. Back in the mid-1990s, in the days of the Palo Alto Internet Exchange (PAIX), operators were talking about the importance of peering agreements having teeth. The canines would come in the form of abuse queue service level agreements and penalties for violations. But history took a different route; in many cases, settlement-free peering and “beer and peer” style informal arrangements took the place of formal contracts.

Of course, in real life, humans are good at this sort of quick, knee-jerk reputation analysis. It’s the basis of the observation that something is a “sketchy neighbourhood”, and of the worry we have all had about using a credit card in certain establishments. These quick and dirty analyses are not perfect, but they are part of how we cope as individuals with a big world. The same kinds of mechanisms may be needed to protect the Internet from bad actors.

As the Internet grows to support more people and societal functions, its perceived criticality increases, but its core architecture remains designed for openness, not security. Will we treat device security and stability as a common problem, or will we implement a series of rankings and measures to segment the Internet into “internets” based on perceived stability? How does IPv6 play into reputation-based peering, and does the volume of addressable space factor into a ranking system? If we decide that individual device security is a collective problem, what is the multi-pronged solution we use to move in that direction? Does it start with sweeping trade embargoes on devices with insecure defaults, paired with a global effort to help fund infected device remediation? We need to act soon, as exploiting the growing number of insecure devices has already proved an effective method of disruption. A new CVE was issued last Friday detailing yet another vulnerable population of devices (https://www.flashpoint-intel.com/when-vulnerabilities-travel-downstream/), and there is no market pressure in place to slow the production of more of these devices.

What’s Next?

Assess the feasibility of network reputation by drafting criteria for reputation rankings. This will help build an understanding of the complexity of collecting the source data and the rigor required to establish consensus. What is the risk rating of misconfigured NTP servers vs. compromised IP cameras and DVRs? How often does reputation change, and what are the implications for changes to routing policy?

Plan and structure research to understand why current remediation methods don’t seem to work. If there are still Conficker infections, then I’m sure the Morris worm would still be alive and kicking somewhere if fingerd were still in use.

Evaluate why end users and device owners don’t respond or don’t take action. Collect concrete data on this subject, and find ISPs with success stories that they can share and others can replicate. Collect data from end users and device owners on why they aren’t responding, to help describe how to confront the problem. This may seem elementary, but think about the history of environmental PSAs, from “Don’t pour motor oil in the gutters” to “Food isn’t trash, put it in a biodegradable bin.” We might also find that relying on the end user is infeasible and we need to exert pressure elsewhere in the ecosystem.




Whois: Chris Baker

Chris Baker is a Systems Guru for Dyn, a cloud-based Internet Performance company that helps companies monitor, control, and optimize online infrastructure for an exceptional end-user experience. You can hear more from Chris and other Dyn employees by following us on Twitter (@Dyn) or on Facebook.