Monte Carlo experiments are a powerful statistical technique used to provide approximate answers to questions about complex problems that may include a stochastic component, mainly when analytic and numerical techniques fail to supply those answers, with an acceptable amount of effort, in an exact and/or complete manner. These simulation techniques are essentially based on controlled statistical sampling, and they have a wide range of applications including, among others, statistical mechanics, biology, games, combinatorial optimization, and engineering.
There are several ways to summarize the behaviour of a sample of uni- or multi-dimensional data, whether real or simulated. Each technique is well suited to highlighting certain structures in the data (means, variances, quantiles, etc.) and to testing different hypotheses about the underlying population.
It is commonplace, when simulation is used, to have several samples to analyze. Each sample could have been obtained, for example, from similar simulations of the same system run with different parameter values. What we want to know is, for example, which parameter values yield the most adequate model for representing a real system, or perhaps how the system behaves under different parameter values.
A simulation study must be carefully planned in order to obtain meaningful and useful results. It should always be remembered that this kind of study (nothing but an experiment with numbers) is very much like experiments with animals, crops, etc. From this viewpoint, Monte Carlo experiments have the advantage of being wholly controllable, which is usually not the case in other laboratories of the Applied Sciences.
Therefore, a Monte Carlo experiment should be planned according to the rules of Experimental Design. Many good ideas can be borrowed from this area of Statistics (see, for example, the book by Box et al. (1978) for a comprehensive introduction). We should identify the critical hypotheses in the model under consideration, and we should also obtain every output needed to isolate the effects of each factor, as well as of relevant groups of factors.
In principle, a simulation model has two components: one is given by the parameters and the interaction structure among the random variables (the input); the other is the response or output. There are several elements to be considered when planning the experiment, for example:
The first three questions are of general interest and applicable to every simulation or data analysis problem. A detailed discussion of these issues, specifically applied to simulation, can be found in the work by Kleijnen (1975); Gruber and Freimann (1986) also treat this problem, in the context of the comparison of estimators.
The two most relevant factors in stochastic simulation (whenever it is used as a study methodology in Statistics) are the sample size and the distributions from which the samples are drawn. Most works that use stochastic simulation in Statistics aim at comparing different techniques under different data distributions. Among these works we consider those devoted to the determination (or approximation) of the exact distribution of certain statistics.
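As a concrete illustration of that last use, the sketch below approximates, by simulation, the sampling distribution of a statistic; the choices made here (the sample median under an exponential parent distribution, the sample size, and the number of replications) are arbitrary and serve only as an example.

import numpy as np

# Approximate, by simulation, the sampling distribution of a statistic
# whose exact distribution is hard to obtain analytically.  The choice of
# statistic (the sample median) and of parent distribution (exponential)
# is only illustrative.
rng = np.random.default_rng(seed=12345)

M = 10_000        # number of replications
n = 25            # sample size

medians = np.empty(M)
for i in range(M):
    sample = rng.exponential(scale=1.0, size=n)
    medians[i] = np.median(sample)

# Summaries of the approximated sampling distribution.
print("mean    :", medians.mean())
print("std     :", medians.std(ddof=1))
print("5%, 95% :", np.quantile(medians, [0.05, 0.95]))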
Some common situations, when the interest lies in comparing the performance of different statistics, are:
The following questions could be answered in order to assess performance:
A quite general setup for a Monte Carlo experiment, suited to answering these questions, could be:
In order to make the study complete, the setup above could be modified or repeated for:
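Whatever the precise factors and design points chosen, a minimal sketch of such a setup could look as follows; it assumes, purely for illustration, that the goal is to compare two location estimators (the mean and the median, an arbitrary choice not taken from the text) across a few distributions and sample sizes, with M replications per combination.

import numpy as np

rng = np.random.default_rng(seed=2024)
M = 5_000                      # replications per design point

# Illustrative factors: the distributions and sample sizes are arbitrary choices.
distributions = {
    "normal": lambda n: rng.normal(loc=0.0, scale=1.0, size=n),
    "laplace": lambda n: rng.laplace(loc=0.0, scale=1.0, size=n),
    "t3 (heavy-tailed)": lambda n: rng.standard_t(df=3, size=n),
}
sample_sizes = [10, 30, 100]

# Estimators under comparison (both estimate the center of symmetry, 0).
estimators = {"mean": np.mean, "median": np.median}

for dist_name, draw in distributions.items():
    for n in sample_sizes:
        results = {name: np.empty(M) for name in estimators}
        for i in range(M):
            sample = draw(n)
            for name, stat in estimators.items():
                results[name][i] = stat(sample)
        # Mean squared error around the true center (0) as a performance summary.
        summary = {name: float(np.mean(vals**2)) for name, vals in results.items()}
        print(dist_name, n, summary)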
A first step towards increasing the accuracy of a given estimator could be to search for efficient techniques for computing the quantities involved, or to increase the number of replications (M) without excessive computational cost by using, for instance, faster generation techniques. For example, if the programs were developed in a high-level programming language, such as FORTRAN, the experimenter could try to rewrite them, provided this does not demand too much effort, in a lower-level language, such as C or even ASSEMBLER. This could yield faster generation and calculation... though formidable programming problems might appear, increasing the risk of bugs, mistakes, etc. The only rule we know about this is: be sensible!
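In the same spirit, the trade-off between implementation effort and speed can be sketched with present-day tools rather than the languages mentioned above: the example below compares a replication-by-replication loop with a vectorized computation of the same quantity; the functions and sizes are arbitrary and only meant to illustrate the point about faster generation and calculation.

import time
import numpy as np

rng = np.random.default_rng(seed=7)
M, n = 2_000, 1_000

def naive_means():
    # One replication at a time, accumulating results in a Python loop.
    out = np.empty(M)
    for i in range(M):
        sample = rng.normal(size=n)
        out[i] = sample.mean()
    return out

def vectorized_means():
    # All replications generated and reduced at once.
    samples = rng.normal(size=(M, n))
    return samples.mean(axis=1)

for fn in (naive_means, vectorized_means):
    t0 = time.perf_counter()
    fn()
    print(fn.__name__, f"{time.perf_counter() - t0:.3f} s")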
The tools devoted to improving the accuracy of simulation results are generically known as Variance Reduction Techniques. This name can be justified by considering what is usually done in a simulation experiment: let $F$ be a cumulative distribution function, let $X$ be a random variable with distribution given by $F$, and let $g$ be a measurable function.

Problem: estimate $\theta = E\left[g(X)\right]$ on the basis of $X_1, \dots, X_M$, independent identically distributed random variables with common distribution $F$.

The raw Monte Carlo estimator is
$$\hat{\theta} = \frac{1}{M} \sum_{i=1}^{M} g(X_i).$$

Variance reduction techniques consist of modifying this setup in order to reduce the variance of the estimator of $\theta$: for instance, by modifying the way in which the random variables are generated, or by incorporating analytical knowledge about the distribution $F$. In most problems, $\theta$ may be a vector, and $g$ and $F$ may have quite complicated forms; in such cases, only the use of some variance reduction technique would ensure dependable results.
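As a concrete illustration of this setup, the sketch below computes the raw Monte Carlo estimator and, for comparison, an estimator based on antithetic variates, one common way of modifying how the random variables are generated; the particular choices of $g$ and $F$ (here $g(x) = e^x$ and $F$ the uniform distribution on $[0,1]$, so that $\theta = e - 1$) are arbitrary and only meant as an example.

import numpy as np

# Illustrative choices (not from the text): F is Uniform(0, 1), g(x) = exp(x),
# so the true value is theta = E[g(X)] = e - 1.
rng = np.random.default_rng(seed=1)
M = 100_000

def g(x):
    return np.exp(x)

# Raw Monte Carlo estimator: average of g over M i.i.d. draws from F.
u = rng.uniform(size=M)
raw_estimate = g(u).mean()

# Antithetic variates: pair each U with 1 - U.  Since g is monotone,
# g(U) and g(1 - U) are negatively correlated, so their average has
# smaller variance than the average of two independent evaluations.
u_half = rng.uniform(size=M // 2)
antithetic_pairs = 0.5 * (g(u_half) + g(1.0 - u_half))
antithetic_estimate = antithetic_pairs.mean()

print("true value          :", np.e - 1)
print("raw Monte Carlo     :", raw_estimate)
print("antithetic variates :", antithetic_estimate)
print("var per draw, raw   :", g(u).var(ddof=1))
print("var per pair, anti. :", antithetic_pairs.var(ddof=1))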