Active Topics Memberlist Calendar Search Help | |
Register Login |
One Stop Testing Forum : Types Of Software Testing @ OneStopTesting : Manual Testing @ OneStopTesting |
Topic: Technologies for Black Box Security Testing |
|
Author | Message |
Harini
Newbie Joined: 15Feb2007 Online Status: Offline Posts: 1 |
Topic: Technologies for Black Box Security Testing Posted: 15Feb2007 at 5:58pm |
Technologies for Black Box Security Testing Not surprisingly, black box testing for security has a different technological focus than traditional black box testing. [Fink 04] defines positive requirements as those requirements that state what a software system should do, while negative requirements state what it should not do. Although security testing deals with positive requirements as well as negative ones, the emphasis is on negative requirements. In contrast, traditional software testing focuses on positive requirements. This difference in emphasis is reflected in the test tools that support black box test activities. The technology incorporated in such tools can be classified as follows, according to its functionality: * fuzzing: the injection of random or systematically-generated data at various interfaces, with various levels of human intervention to specify the format of the data * syntax testing: generating a wide range of legal and illegal input values, usually with some knowledge of the protocols and data formats used by the software * exploratory testing: testing without specific expectation about test outcomes, and generally without a precise test plan * data analysis: testing the data created by an application, especially in the context of cryptography * test scaffolding: providing testers with support tools they need in order to carry out their own black box tests. For example, if the tester wants to inject a certain error code when an application tries to open a pipe, support technology is needed to actually carry out this test. * monitoring program behavior: When a large number of tests are automatically applied, it is useful to also have automatic techniques for monitoring how the program responds. This saves testers from having to check for anomalous behavior manually. Of course, a human is better at seeing anomalous behavior, but the anomalies that signal the presence of a security vulnerability are often quite obvious. In this section, we do not discuss test automation technology, which is a standard technology used to automate the execution of tests once they have been defined. It is technology for traditional testing, and this fact makes it too broad of a subject to cover within the intended scope of this document. However, any extensive treatment of software testing also covers test automation, and the reader may consult standard references on software testing such as [Beizer 95], [Black 02], and [Kaner 93]. Fuzzing The term fuzzing is derived from the fuzz utility (ftp://grilled.cs.wisc.edu/fuzz), which is a random character generator for testing applications by injecting random data at their interfaces [Miller 90]. In this narrow sense, fuzzing means injecting noise at program interfaces. For example, one might intercept system calls made by the application while reading a file and make it appear as though the file contained random bytes. The idea is to look for interesting program behavior that results from noise injection and may indicate the presence of a vulnerability or other software fault. Since the idea was originally introduced, the informal definition of fuzzing has expanded considerably, and it can also encompass domain testing, syntax testing, exploratory testing, and fault injection. This has the unfortunate consequence that when one author denigrates fuzzing (as in [McGraw 04]) while another extols it (as in [Faust 04]), the two authors might not be talking about the same technology1. The current section is partly meant to emphasize that in this document, “fuzzing” is used in the narrow sense implied by [Miller 90]. Fuzzing, according to the first, narrower definition, might be characterized as a blind fishing expedition that hopes to uncover completely unsuspected problems in the software. For example, suppose the tester intercepts the data that an application reads from a file and replaces that data with random bytes. If the application crashes as a result, it may indicate that the application does not perform needed checks on the data from that file but instead assumes that the file is in the right format. The missing checks may (or may not) be exploitable by an attacker who exploits a race condition by substituting his or her own file in place of the one being read, or an attacker who has already subverted the application that creates this file. For many interfaces, the idea of simply injecting random bits works poorly. For example, imagine presenting a web interface with the randomly generated URL “Ax@#1ZWtB.” Since this URL is invalid, it will be rejected more or less immediately, perhaps by a parsing algorithm relatively near to the interface. Fuzzing with random URLs would test that parser extensively, but since random strings are rarely valid URLs, this approach would rarely test anything else about the application. The parser acts as a sort of artificial layer of protection that prevents random strings from reaching other interesting parts of the software. For this and other reasons, completely random fuzzing is a comparatively ineffective way to uncover problems in an application. Fuzzing technology (along with the definition of fuzzing) has evolved to include more intelligent techniques. For example, fuzzing tools are aware of commonly used Internet protocols, so that testers can selectively choose which parts of the data will be fuzzed. These tools also generally let testers specify the format of test data, which is useful for applications that do not use one of the standard protocols. These features overcome the limitation discussed in the previous paragraph. In addition, fuzzing tools often let the tester systematically explore the input space; for example, the tester might be able to specify a range of input values instead of having to rely on randomly generated noise. As a result, there is a considerable overlap between fuzzing and syntax testing, which is the topic of the next section. Syntax Testing Syntax testing [Beizer 90] refers to testing that is based on the syntactic specification of an application’s input values. The idea is to determine what happens when inputs deviate from this syntax. For example, the application might be tested with inputs that contain garbage, misplaced or missing elements, illegal delimiters, and so on. In security testing, one might present a web-based application with an HTTP query containing metacharacters or JavaScript, which in many cases should be filtered out and not interpreted. Another obvious syntax test is to check for buffer overflows by using long input strings. Syntax testing helps the tester confirm that input values are being checked correctly, which is important when developing secure software. On the other hand, syntactically correct inputs are also necessary for getting at the interesting parts of the application under test, as opposed to having the test inputs rejected right away, like the random-character URL in the section on fuzzing. Typically, it is possible to automate the task of getting inputs into the right form (or into almost the right form, as the case may be). This lets the tester focus on the work of creating test cases instead of entering them in the right format. However, the degree of automation varies. It is common for testers to write customized drivers for syntax testing, which is necessary when the inputs have to be in an application-specific format. Test tools can provide support for this by letting the tester supply a syntax specification and automating the creation of a test harness based on that syntax. On the other hand, the tool may also come with a prepackaged awareness of some common input formats. In security test tools, there is a certain emphasis on prepackaged formats because many applications communicate across the network using standard protocols and data formats. It makes sense for a security test tool to be aware of widely used protocols, including HTTP, FTP, SMTP, SQL, LDAP, and SOAP, in addition to supporting XML and simplifying the creation of malicious JavaScript for testing purposes. This allows the tool to generate test input that almost makes sense but contains random values in selected sections. Creating useful syntax tests can be a complex task because the information presented at the application interface might be mixed, perhaps containing SQL or JavaScript embedded in an HTTP query. Many attacks are injection attacks, where a datastream that is in one format according to the specification actually contains data in another format. Specifically, most data formats allow for user-supplied data in certain locations, such as SQL queries in a URI. The embedded data may be interpreted by the application, leading to vulnerabilities when an attacker customizes that data. Such vulnerabilities may or may not be visible in the design; a classic example of where they are not visible is when a reused code module for interpreting HTML also executes JavaScript. One important variant of syntax testing is the detection of cross-site scripting vulnerabilities. Here, the actual interpreter is in the client application and not the server, but the server is responsible for not allowing itself to be used as a conduit for such attacks. Specifically, the server has to strip JavaScript content from user-supplied data that will be echoed to clients, and the same goes for other data that might lead to undesired behavior in a client application. Testing for cross-site scripting vulnerabilities (see [Hoglund 02]) amounts to ensuring that dangerous content really is being stripped before data is sent to a client application, and this, too, involves specially formatted data. Automated support for syntax testing may or may not provide a good return on investment. Good security testing requires a certain level of expertise, and a security tester will probably be able to write the necessary support tools manually. Custom data formats make it necessary to write some customized test harnesses in any event. It may also be cost effective to write in-house test harnesses for standard protocols, since those harnesses can be reused later on, just as third-party test harnesses can. Although in-house test drivers do not usually come into the world with the same capabilities as a third-party test application, they tend to evolve over time. Therefore, the amount of effort that goes into the development and maintenance of in-house test drivers diminishes over time for commonly used data formats. In spite of these factors, third-party tools often can have usability advantages, especially compared to in-house tools being used by someone who did not develop them originally. Exploratory Testing and Fault Injection In security testing, it may sometimes be useful to perform tests without having specific expectations about the test outcome. The idea is that the tester will spot anomalies—perhaps subtle ones—that eventually lead to the discovery of software problems or at least refocus some of the remaining test effort. This contrasts with most other testing activities because usually the test plan contains information about what kind of outcomes to look for. Exploratory testing is discussed in depth in [Whittaker 02] and [Whittaker 03]. There is no technical reason why a test plan cannot map out these tests in advance, but in practice many testers find it useful to let the outcome of one test guide the selection of the next test. In a sense, the tester is exploring the software’s behavior patterns. This makes sense because a subtle anomaly may create the need to collect further information about what caused it. In a black box test setting, getting more information implies doing more tests. This leads to the concept of exploratory testing. Most test technologies that support exploratory testing can also be used for other test activities, but some techniques are associated more closely with exploratory testing than with other types of testing. For example, fuzzing (in the narrow sense of the word described earlier) falls into this category, because usually testers don’t have any exact idea of what to expect. More generally, certain test techniques make it hard to say exactly what anomalous behavior might occur even though there is interest in seeing how an application will respond. Some of the other techniques that fall into this category are: Security stress testing, which creates extreme environmental conditions such as those associated with resource exhaustion or hardware failures. During traditional stress testing, the idea is to make sure that the application can continue to provide a certain quality of service under extreme conditions. In contrast, during security testing it may be a foregone conclusion that the application will provide poor service—perhaps good performance under stress is not a requirement—and the tester might be looking for other anomalies. For example, extreme conditions might trigger an error-handling routine, but error handlers are notorious for being under-tested and vulnerable. As a second example, slow program execution due to resource exhaustion might make race conditions easier to exploit. Needless to say, an attacker might be able to create whatever extreme conditions are needed for the attack to succeed. Fault injection, which directly modifies the application’s internal state [Voas 97]. Fault injection is often associated with white box testing, since it references the program’s internal state, but in practice certain types of test modify external data so close to the program’s inner workings that they can also be regarded as fault injection. For example, the tester might intercept calls to the operating system and interfere with the data being passed there. Interfering in communication between executable components might also be regarded as a black box technique. Fault injection can clearly be used for stress testing, but it can also be used to help a tester create conditions with relative ease that an attacker might create with greater effort. For example, if the tester interferes with interprocess communication, it might approximate a situation where one of the communicating processes has been subverted by an attacker. Likewise, intercepting calls to the operating system can be used to simulate the effects of an attacker getting control of external resources, since system calls are used to access those resources. It is not always clear how an attacker might exploit problems found using fault injection, but it can still be useful to know that those problems are there. Of course, some of the resulting tests might also be unfair—for example, an attacker intercepting system calls could manipulate all of the application’s memory—and in the end the tester has to ensure that the test results are meaningful in an environment where the operating system protects the application from such attacks. Data Analysis Capabilities By data analysis, we mean the process of trying to understand a program’s internals by examining the data it generates. This might be followed by an attempt to go beyond mere observation and influence the program’s behavior as well. One of the concerns of black box security testing is to try performing this type of analysis in order to determine whether an attacker could do the same thing. Two particularly salient issues are * Stateless protocols use external mechanisms to keep track of the state of a transaction (HTTP uses cookies, for example). It is not always desirable to expose this state information to a potential attacker, but data analysis can be used to deduce state information at the black box level. * It is sometimes necessary to use random numbers to generate cryptographically secure keys or hashes on the fly. If an attacker can collect outputs from a weak random number source and analyze those outputs sufficiently well to predict future random bits, even a strong cryptographic algorithm can be compromised. A related issue that will not be discussed at great length is that random numbers are used in computerized casino gaming, and an attacker who can predict these numbers—even partially—may be able to cheat. In each of these cases, the security issue is the ability to generate random numbers that prevent the attacker from seeing patterns or predict future values. As a rule, this issue should be addressed in the design phase—using weak random number generation is a design flaw—but testing still plays its usual roles. For example, it can be used to probe whether the design is implemented correctly or to examine third-party components whose source code is unavailable. In one case, a municipality needed secure random numbers to secure a specific aspect of law-enforcement-related communications, but problems were encountered in obtaining the necessary source code from a third-party vendor. Black box testing was used to achieve a minimal level of due diligence in the question of whether the random numbers were unpredictable. Cookie analysis deserves its own discussion. It consists of deducing how a web application uses cookies in order to examine the application’s inner workings, or even to hijack sessions by predicting other users’ cookie values. In a well-designed system this should not lead to an immediate compromise—after all, truly sensitive information should be protected by SSL or a similar mechanism—but cookie analysis can provide the toehold that an attacker needs in order to launch a more damaging attack. Of course, not all software systems are well designed, and some are vulnerable to direct compromises using cookie analysis, or even simple replay attacks involving cookies. These issues lead to the idea of randomness testing, which is within the scope of black box testing. Some black box testing tools provide simple statistical tests and visualization tools to support cookie analysis. Furthermore, the analysis and detection of cryptographically weak random-number schemes is not purely the domain of software security, and this works to the advantage of the black box tester because it makes more technology available for that task. For example, weak random number generation is often used to generate events in software-based simulations, an application in which speed is more important than security. This creates a need to know exactly what the weaknesses of the random number generator are so that they do not bias the simulation. There are some standard software packages for evaluating randomness empirically: the NIST battery, the Diehard battery, and ent. As a final note, testers should be aware that even if a random number source passes these test batteries, this does not imply that the source is cryptographically secure. As in many other areas, testing can only demonstrate the presence of problems, not their absence. Monitoring Program Behavior Monitoring program behavior is an important part of any testing process because there must be a way to determine the test outcome. This is often referred to as observability. Usually it means examining the behavior of the program under test and asking whether this observed behavior is symptomatic of a vulnerability in the software. This examination can be harder in security testing than it is in traditional testing, because the tester is not necessarily comparing actual program behavior to expectations derived from specifications. Rather, the tester is often looking for unspecified symptoms that indicate the presence of unsuspected vulnerabilities. Nonetheless, there are cases in which the unusual behavior sought by a security tester can be specified cleanly enough to test for it automatically. For example, if a web application is being tested for cross-site scripting vulnerabilities, an attacker’s ability to make the application echo externally supplied JavaScript is enough to indicate a possible problem. Likewise, a series of tests meant to detect potential buffer overflows may just require the application to be monitored for crashes. There are many test automation tools with the ability to monitor program outputs and behavior. In selecting a black box testing tool, it may be useful to consider whether a tool either provides its own monitoring capabilities or integrates with other existing test automation frameworks. Another aspect of behavior monitoring is that for security testing, one may have to observe black box behavior that is not normally visible to the user. This can include an application’s communication with network ports or its use of memory, for example. This functionality is discussed in the next section, which deals with test support tools. A fault injection tool may also support this type of monitoring because the underlying technologies are similar. A final issue that also applies to traditional testing is that automation is quite useful for spotting anomalous test outcomes. This is especially true during high-volume test activities like fuzzing. In security testing, a great deal of reliance is placed on the tester’s ability to see subtle anomalies, but the anomalies are not always too subtle for automated detection. Thus, some test tools automate monitoring by letting the tester specify in advance what constitutes anomalous behavior. Test Scaffolding By test scaffolding we mean tools that support the tester’s activities, as opposed to actually generating data. This primarily includes test management technology, but test management is in the domain of traditional test automation and we do not cover it here. Instead, we focus on technology for observing and/or influencing application behavior in ways that would not normally be possible for an ordinary user or tester. Technologies for observing program behavior are quite common, since they are needed for numerous other purposes as well, such as debugging and performance monitoring. Of course, their utility in test automation depends somewhat on how easily they can be integrated with other test tools, especially those for monitoring program behavior. Thus debuggers, which are usually interactive, can provide testers with valuable information but might be a bottleneck during automated testing. On the other hand, text-based tools can have their outputs postprocessed even if they are not explicitly supported by a testing tool, while some graphical tools might allow a tester to observe anomalies even with a rapid-fire series of automated tests. here are some testing tools, notably the Holodeck system [Whittaker 02,Whittaker 03], that already include test scaffolding of this kind. Post Resume: Click here to Upload your Resume & Apply for Jobs |
|
IP Logged | |
Forum Jump |
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot delete your posts in this forum You cannot edit your posts in this forum You cannot create polls in this forum You cannot vote in polls in this forum |
© Vyom Technosoft Pvt. Ltd. All Rights Reserved.