This article is the second in a series that is designed to help readers to assess the risk that their Internet-connected systems are exposed to. In the first installment, we established the reasons for doing a technical risk assessment. In this installment, we'll start discussing the methodology that we follow in performing this kind of assessment.
Why all the fuss about a Methodology?
If you ever read anything SensePost publishes on assessments, or if you attend our training, you'll notice that we tend to go on a bit about methodology. The methodology is the set of steps we follow when performing a certain task. We try and work according to methodologies with just about everything we do. We're not fanatical about it, and the methodologies change and adapt to new and different environments, but they always play a big role in the way we approach our work. Why such a big fuss then? There are a few good reasons for performing assessments according to a strict methodology:
Firstly, it gives us a game plan. Rather than stare blankly at a computer screen or a network diagram, an analyst now has a fixed place to start and a clear task to perform. This takes the whole "guru" element out of what we do. The exact steps that we will follow are clear, both to us and to the customer, from the outset.
Secondly, a methodology ensures that our work is consistent and complete. I've worked on projects where the target organization has in excess of 150 registered DNS domains. Can you imagine how many IP addresses that eventually translates to? I don't have to imagine - I know it was almost 2000. Consider how hard it must be to keep track of every DNS domain, every network and every IP to ensure that you don't miss something. Consider also what happens when the actual "hacking" starts (we'll get to this later) and the analyst's heart is racing. A strict methodology ensures that we always cover all the bases and that our work is always of the same quality. This holds true no matter how big or small the environment you're assessing is.
Finally, our methodology gives our customers something to measure us against. Remember, to date there are really no norms or standards for technical assessment work. How does the customer know that she's getting what she paid for? This is an especially pertinent question when the assessment findings are (how can I put this?) dull. By working strictly according to a sensible methodology with clear deliverables at each stage we can verify the quality of the assessment even when there's very little to report.
A Methodology that Works
I'm completely sure that, when it comes to security assessment, there's more than one way to skin the cat. What follows is a description of a methodology that we like to use when performing security assessments over the Internet. It's certainly not the only way to approach this task, but it's one way that works, I believe.
1. Intelligence Gathering
The first thing we do when we begin an assessment is to try and figure out who the target actually is. Primarily we use the Web for this. Starting with the customer's own Web site(s), we mine for information about the customer that might be helpful to an attacker. Miscellaneous tidbits of useful data aside, our primary objective is to derive the DNS domain names that the target uses. If you're assessing your own infrastructure, you may already have this information but if the organization is big, it can be a fascinating exercise. Later, these domain names will be mapped to the IP addresses we will actually analyze. Some companies have a small Internet presence, and discovering the DNS names they use may be simple. Other companies we've worked with have hundreds of domains, and discovering all of them is no mean feat.
How do we get the DNS domain names? Well, usually we have an e-mail address, the company's name or some other logical place to begin. From there we have a number of techniques:
- We use search engines to search for all instances of the company's name. This not only provides links to the company's own site (from which DNS domain information can be easily derived), but also yields information about mergers and acquisitions, partnerships and company structure that may be useful.
- We use a tool like httrack to dump all the relevant Web sites to disk. We then scan those files to extract all mail and HTTP links, which are then parsed again to extract more DNS domains.
- Then, we use the various domain registries. Tools like geektools.com, register.com and the like are simple and can often be used in one of two ways:
- To help verify whether the domains we have identified actually belong to the organization we are assessing.
- To extract any additional information that may be recorded in a specific domain's record. For example, you'll often find that the technical contact for a given domain has provided an e-mail address at a different domain. The second domain then automatically falls under the spotlight as a potential part of the assessment.
- Many of the registries provide for wildcard searches. This allows us to search for all domains containing a certain string, like "*abc*". I would use such a search to identify all the domains that may be associated with the company ABC Apples Inc, for example.
- Then, we need to apply some human intelligence - using the information we read on Web sites, company reports and news items we attempt to make logical jumps to other domains that may be relevant to our analysis.
The output of this phase is a comprehensive list of DNS domains that are relevant to the target company. You'll notice that the next phase of the assessment may provide additional domain names that weren't found during this phase. In that case, those domains are used as inputs during this phase and the entire process is repeated. Phases 1 and 2 may recur a number of times before we've located all the relevant domains. Typically, we'll check this list with the customer once we're done to ensure that we haven't missed anything or included something inappropriate.
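The link-extraction step described above (dump the sites to disk, then pull out mail and HTTP links) might be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function names `hosts_in_text` and `hosts_in_mirror`, the regular expressions and the `*.htm*` file pattern are ours, not part of httrack or of any tool the article names. The sketch returns full host names and leaves it to the analyst to decide which registered domain each one belongs to.

```python
import re
from pathlib import Path
from urllib.parse import urlparse

# Illustrative patterns (assumptions): http(s) URLs and mailto: addresses in raw HTML.
URL_RE = re.compile(r"""https?://[^\s"'<>]+""", re.IGNORECASE)
MAILTO_RE = re.compile(r"""mailto:[^\s"'<>?]+@([A-Za-z0-9.-]+)""", re.IGNORECASE)


def hosts_in_text(text):
    """Return the set of lowercase host names found in links and mail addresses."""
    hosts = set()
    for url in URL_RE.findall(text):
        try:
            host = urlparse(url).hostname
        except ValueError:  # malformed URL, e.g. a broken IPv6 literal
            continue
        if host:
            hosts.add(host.lower())
    for mail_host in MAILTO_RE.findall(text):
        hosts.add(mail_host.lower().rstrip("."))
    return hosts


def hosts_in_mirror(root):
    """Walk a mirrored site on disk and collect host names from every HTML file."""
    found = set()
    for path in Path(root).rglob("*.htm*"):
        found |= hosts_in_text(path.read_text(errors="ignore"))
    return found
```

Calling `hosts_in_mirror` on the directory that holds the mirrored site gives a candidate list, which still needs the registry checks and human judgment described above before it can be trusted.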
2. Foot Printing
At the start of phase two we have a list of DNS domains - things like apples.com, apples-inc.com, applesonline.com, apples.co.uk, etc. The reason these domains exist is to provide Internet users with a simple way of reaching and using the resources they require. For example, instead of typing a numeric IP address, a user simply needs to remember https://sensepost.com/. Within a domain, therefore, there are a number of records - specific mappings between machine names and their actual Internet Protocol (IP) numbers. The objective of this phase is to identify as many of those IP/name mappings as we possibly can in order to understand which address spaces on the Internet are actually being used by the target organization. There are a few different techniques for identifying these mappings. Without going into too much detail, these techniques are all derived from the same assumptions, namely:
- Some IP/name mapping must exist for a domain to be functional. These include the name server records (NS) and the mail exchanger records (MX). If a company is actually using a domain then you will be able to request these two special entries. Immediately you have one or more actual IP addresses to work with.
- Some IP/name mappings are very likely to exist on an active domain. For example, "www" is a machine that exists in just about every domain. Names like "mail", "firewall" and "gateway" are also likely candidates. We have a long list of common names that we test. This is by no means a watertight approach, but one is more often lucky than not.
- An organization's machines usually live close together. This means that if we've found one IP address, we have a good idea of where to look for the rest of the addresses.
- The Name -> IP mapping (the forward lookup), and the IP -> Name mapping (the reverse lookup) need not necessarily be the same.
- The technology is fundamentally verbose. DNS, as a technology, was designed for dissemination of what is essentially considered "public" information. With one or two simple tricks we can usually extract all the information there is to be had. The DNS zone transfer - a feature of DNS literally designed for the bulk transfer of DNS records - is a fine example of this. Other, craftier, techniques fall beyond the scope of this paper.
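The "common names" assumption above can be illustrated with a short Python sketch. The function name `guess_hosts`, the four-entry name list and the injectable `resolve` parameter are assumptions made for the example. A real list would be much longer, and a domain with wildcard DNS records would make every guess appear to succeed.

```python
import socket

# Names taken from the article's examples; a working list would be far longer.
COMMON_NAMES = ["www", "mail", "firewall", "gateway"]


def guess_hosts(domain, names=COMMON_NAMES, resolve=socket.gethostbyname):
    """Try a forward lookup for each candidate name and keep those that resolve."""
    found = {}
    for name in names:
        fqdn = f"{name}.{domain}"
        try:
            found[fqdn] = resolve(fqdn)
        except OSError:  # socket.gaierror (name not found) is a subclass of OSError
            continue
    return found
```

Passing the resolver in as a parameter keeps the sketch easy to try against a canned table of names before pointing it at a real domain.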
Once we have all the relevant DNS names we can find, we attempt to identify the distinct network "blocks" in which the target organization operates. As stated previously, IPs tend to be grouped together. The nature of IP networking is to group addresses together in what are known as subnets. The expected output of this phase is a list of all the IP subnets in which the target organization has machines active. At this stage, our broad reasoning is that if we find even a single IP in a given subnet we include that entire subnet in the list. The technically astute among you will already be crying "False assumption! False assumption!" and you'd be right. But bear with me. At this stage we tend to over-estimate rather than under-estimate. Later, we will do our best to prune the list to a more accurate depiction of what's actually there.
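As a rough illustration of that over-estimating step, the sketch below maps each discovered address to an enclosing block using Python's standard `ipaddress` module. The default block size of /24 is purely an assumption for the example (the article does not say what size is used), and `candidate_subnets` is a hypothetical name.

```python
import ipaddress


def candidate_subnets(addresses, prefix_len=24):
    """Map each address to its enclosing network of the assumed size, without duplicates."""
    nets = {
        # strict=False lets us pass a host address and get its containing network
        ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        for addr in addresses
    }
    return sorted(nets)


# Example with documentation-range addresses:
for net in candidate_subnets(["192.0.2.10", "192.0.2.25", "198.51.100.7"]):
    print(net)
```

Two of the three sample addresses fall in the same block, so the list that goes forward to the next phase has two entries rather than three.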
3. Vitality
We ended the last phase with a list of IP subnets in which we believe the target organization to have a presence and a horde of technocrats objecting loudly to our assumptions about the subnet size. Let's quickly make a list of some of the facts we need to know before we can move on with the process:
- An organization does not need to own the entire subnet in which it operates. IP addresses can be lent, leased or shared. Nor do all of an organization's IPs have to be grouped together; they can be spread as widely across the Internet as the organization wishes.
- Just because a Name / IP mapping exists for a machine, doesn't mean that machine actually exists. Conversely, just because a Name / IP mapping doesn't exist for a machine, doesn't mean the machine doesn't exist. There are thousands of nameless addresses on the Internet. Yes, it's sad, but true nevertheless.
- Without a route to describe how an IP address can be reached, that address can never be used on the Internet.
So we see that, although DNS gives us a logical starting point for our search, it by no means provides a comprehensive list of potential targets. This is why we work with the rather loose subnet definitions we derived in the previous phase. The objective of the "Vitality" phase of the assessment is to determine, within the subnet blocks that we have, which IP addresses are actually active and being used on the Internet. We now leave the wonderful world of DNS behind us, and begin to concentrate solely on the IP address space.
So how does one determine if an address is active on the Internet or not? Well, let's recall the third "fact" from our list above. If there's no route to a given IP subnet, that subnet is as good as dead. Various core routers on the Internet graciously allow technicians and administrators to query them regarding routes to any given address. At the time of writing, one such router is route-views.oregon-ix.net. Such a router can't tell us that an IP address is alive. If there's no route for a subnet on the core routers, however, then we can conclude that all the IPs in that subnet are dead.
The next, and probably the most obvious, technique is the famous IP "ping". Pinging works just like sonar. You send a ping to a specific address and the machine responds with a "pong" indicating that it is alive and received your request. Ping is a standard component of the Internet Protocol (IP), and machines that talk IP are compelled to respond when they receive a ping request. With simple and freely available tools we are able to ping an entire subnet. This is known as a "ping scan". Without going into too much detail, the responses to such a ping scan can be interpreted as follows:
- A reply from an IP address indicates that the address is probably in use and accessible from the Internet.
- Multiple replies from a single IP address indicate that the address is probably actually a subnet address or a broadcast address and suggest a subnet border.
- No reply can only be interpreted to mean that the machine is not replying to IP ping requests.
I realize that the latter point is a bit vague, but that really is the only conclusion that can be drawn from the information available. I said that all machines that speak IP are obliged to respond to ping requests. Why not simply conclude that if the IP doesn't respond, it isn't being used? The confusion is introduced by modern network security products like firewalls and screening routers. In the real world, one often sees networks configured in such a way that the IP ping packet is blocked by the firewall before the packet reaches the machine. Thus the machine would respond if it could, but it's prevented from doing so.
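The three outcomes listed above can be written down as a small decision function. The sketch below assumes the replies have already been counted per address by whatever ping tool was used. The function name `interpret_ping` and the wording of the verdicts are ours.

```python
def interpret_ping(reply_counts):
    """Classify addresses; reply_counts maps an address to its number of echo replies."""
    verdicts = {}
    for addr, count in reply_counts.items():
        if count == 0:
            # Silence proves nothing: a firewall may be dropping the ping.
            verdicts[addr] = "unknown: not answering ping"
        elif count == 1:
            verdicts[addr] = "probably alive"
        else:
            # Several machines answered one request: likely a subnet border.
            verdicts[addr] = "probable subnet or broadcast address"
    return verdicts
```

Note that the zero-reply case is deliberately labelled "unknown" rather than "dead", which is exactly why the next technique is needed.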
So we haul out the heavy artillery. Just about every machine on the Internet works with a series of little Internet "post boxes" called ports. Ports are used to receive incoming traffic for a specific service or application. Each port on a machine has a number and there are 65536 possible port numbers. A modern machine that is connected to the Internet and actually functioning is almost certain to be using at least one port. Thus, if an IP address does not respond to our ping request, we can further probe it using a tool called a "port scanner". Port scanners are freely available software utilities that attempt to establish a connection to every possible port within a specified range. If we can find just one port that responds when probed, we know that the IP is alive. Unfortunately, the amount of work required to probe all 65,000 plus ports is often prohibitive. Such an exercise can take hours per single IP address and days or even weeks for an entire range. So we're forced to make yet another assumption: If an IP address is active on the Internet, then it's probably there for a reason. And there are only so many reasons to connect a machine to the Net:
- The machine is a Web server (and thus uses port 80 or 443)
- The machine is a mail server (and thus uses port 25)
- The machine is a DNS server (and thus uses port 53)
- The machine is another common server (FTP, database, time, news, etc.)
- The machine is a client. In this case it is probably a Microsoft machine and uses port 139.
Thus, we can now modify our scan to search for only a small number of commonly used ports. This approach is called a "host scan". It is by no means perfect, but it generally delivers accurate results and is efficient enough to be used with large organizations. The common ports we scan for can be adjusted to better suit the nature of the organization being analyzed, if required. The nmap network utility (available from www.insecure.org) is a powerful tool that serves equally well as a ping scanner and a port scanner.
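To make the idea of a host scan concrete, here is a minimal Python sketch that tries a plain TCP connection to the handful of ports named in the list above. It is not how nmap works internally, and the names `open_ports` and `is_alive`, the port tuple and the one-second timeout are assumptions chosen for the example. The sketch counts only accepted connections as a response.

```python
import socket

COMMON_PORTS = (25, 53, 80, 139, 443)  # the ports named in the article


def open_ports(host, ports=COMMON_PORTS, timeout=1.0):
    """Return the ports on host that accept a TCP connection."""
    found = []
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                found.append(port)
        except OSError:  # refused, timed out or unreachable
            continue
    return found


def is_alive(host, ports=COMMON_PORTS, timeout=1.0):
    """Treat a host as alive if at least one of the probed ports accepts a connection."""
    return bool(open_ports(host, ports, timeout))
```

Probing five ports instead of 65536 is what makes the approach practical across whole subnets, at the cost of missing machines that only listen on ports outside the list.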
Thus, by the end of this phase we have fine-tuned our list of IP subnets and generated a list of actual IP addresses that we know to be "alive" and that therefore qualify as targets for the remainder of the process. At this point, our findings are usually presented to the customer to ensure that we're still on the right track.
Conclusion
That concludes the first part of our discussion of Internet assessment methodology. In the next installment in this five-part series on Internet Risk Assessments, we will continue to discuss methodology, including: visibility, vulnerability scanning, and analyses of Web applications.