Validity, Accuracy and Reliability Explained with Examples

This is part of the NSW HSC science curriculum part of the Working Scientifically skills.

Part 1 – Validity

Part 2 – Accuracy

Part 3 – Reliability

Science experiments are an essential part of high school education, helping students understand key concepts and develop critical thinking skills. However, the value of an experiment lies in its validity, accuracy, and reliability. Let's break down these terms and explore how they can be improved and reduced, using simple experiments as examples.

Target Analogy to Understand Accuracy and Reliability

The target analogy is a classic way to understand the concepts of accuracy and reliability in scientific measurements and experiments.

Accuracy refers to how close a measurement is to the true or accepted value. In the analogy, it's how close the arrows come to hitting the bullseye (represents the true or accepted value).
Reliability refers to the consistency of a set of measurements. Reliable data can be reproduced under the same conditions. In the analogy, it's represented by how tightly the arrows are grouped together, regardless of whether they hit the bullseye. Therefore, we can have scientific results that are reliable but inaccurate.
Validity refers to how well an experiment investigates the aim or tests the underlying hypothesis. While validity is not represented in this target analogy, the validity of an experiment can sometimes be assessed by using the accuracy of results as a proxy. Experiments that produce accurate results are likely to be valid as invalid experiments usually do not yield accurate result.

Validity

Validity refers to how well an experiment measures what it is supposed to measure and investigates the aim.

Ask yourself the questions:

"Is my experimental method and design suitable?"

"Is my experiment testing or investigating what it's suppose to?"

For example, if you're investigating the effect of the volume of water (independent variable) on plant growth, your experiment would be valid if you measure growth factors like height or leaf size (these would be your dependent variables).

However, validity entails more than just what's being measured. When assessing validity, you should also examine how well the experimental methodology investigates the aim of the experiment.

Assessing Validity

An experiment’s procedure, the subsequent methods of analysis of the data, the data itself, and the conclusion you draw from the data, all have their own associated validities. It is important to understand this division because there are different factors to consider when assessing the validity of any single one of them. The validity of an experiment as a whole, depends on the individual validities of these components.

When assessing the validity of the procedure, consider the following:

Does the procedure control all necessary variables except for the dependent and independent variables? That is, have you isolated the effect of the independent variable on the dependent variable?

Does this effect you have isolated actually address the aim and/or hypothesis?

Does your method include enough repetitions for a reliable result? (Read more about reliability below)

When assessing the validity of the method of analysis of the data, consider the following:

Does the analysis extrapolate or interpolate the experimental data? Generally, interpolation is valid, but extrapolation is invalid. This because by extrapolating, you are ‘peering out into the darkness’ – just because your data showed a certain trend for a certain range it does not mean that this trend will hold for all.

Does the analysis use accepted laws and mathematical relationships? That is, do the equations used for analysis have scientific or mathematical base? For example, `F = ma` is an accepted law in physics, but if in the analysis you made up a relationship like `F = ma^2` that has no scientific or mathematical backing, the method of analysis is invalid.

Is the most appropriate method of analysis used? Consider the differences between using a table and a graph. In a graph, you can use the gradient to minimise the effects of systematic errors and can also reduce the effect of random errors. The visual nature of a graph also allows you to easily identify outliers and potentially exclude them from analysis. This is why graphical analysis is generally more valid than using values from tables.

When assessing the validity of your results, consider the following:

Is your primary data (data you collected from your own experiment) BOTH accurate and reliable? If not, it is invalid.

Are the secondary sources you may have used BOTH reliable and accurate?

When assessing the validity of your conclusion, consider the following:

Does your conclusion relate directly to the aim or the hypothesis?

How to Improve Validity

Ways of improving validity will differ across experiments. You must first identify what area(s) of the experiment’s validity is lacking (is it the procedure, analysis, results, or conclusion?). Then, you must come up with ways of overcoming the particular weakness.

Below are some examples of this.

Example – Validity in Chemistry Experiment

Let's say we want to measure the mass of carbon dioxide in a can of soft drink.

Heating a can of soft drink

The following steps are followed:

Weigh an unopened can of soft drink on an electronic balance.
Open the can.
Place the can on a hot plate until it begins to boil.
When cool, re-weigh the can to determine the mass loss.

To ensure this experiment is valid, we must establish controlled variables:

type of soft drink used

temperature at which this experiment is conducted

period of time before soft drink is re-weighed

Despite these controlled variables, this experiment is invalid because it actually doesn't help us measure the mass of carbon dioxide in the soft drink. This is because by heating the soft drink until it boils, we are also losing water due to evaporation. As a result, the mass loss measured is not only due to the loss of carbon dioxide, but also water. A simple way to improve the validity of this experiment is to not heat it; by simply opening the can of soft drink, carbon dioxide in the can will escape without loss of water.

Example – Validity in Physics Experiment

Let's say we want to measure the value of gravitational acceleration `g` using a simple pendulum system, and the following equation:

$$T = 2\pi \sqrt{\frac{l}{g}}$$

where:

`T` is the period of oscillation
`l` is the length of string attached to the mass
`g` is the acceleration due to gravity

Pendulum practical

The following steps are followed:

Cut a piece of a string or dental floss so that it is 1.0 m long.
Attach a 500.0 g mass of high density to the end of the string.
Attach the other end of the string to the retort stand using a clamp.
Starting at an angle of less than 10º, allow the pendulum to swing and measure the pendulum’s period for 10 oscillations using a stopwatch.
Repeat the experiment with 1.2 m, 1.5 m and 1.8 m strings.

The controlled variables we must established in this experiment include:

mass used in the pendulum
location at which the experiment is conducted

The validity of this experiment depends on the starting angle of oscillation. The above equation (method of analysis) is only true for small angles (`\theta < 15^{\circ}`) such that `\sin \theta = \theta`. We also want to make sure the pendulum system has a small enough surface area to minimise the effect of air resistance on its oscillation.

In this instance, it would be invalid to use a pair of values (length and period) to calculate the value of gravitational acceleration. A more appropriate method of analysis would be to plot the length and period squared to obtain a linear relationship, then use the value of the gradient of the line of best fit to determine the value of `g`.

Accuracy

Accuracy refers to how close the experimental measurements are to the true value.

Accuracy depends on

the validity of the experiment

the degree of error:

systematic errors are those that are systemic in your experiment. That is, they effect every single one of your data points consistently, meaning that the cause of the error is always present. For example, it could be a badly calibrated temperature gauge that reports every reading 5 °C above the true value.
random errors are errors that occur inconsistently. For example, the temperature gauge readings might be affected by random fluctuations in room temperature. Some readings might be above the true value, some might then be below the true value.

sensitivity of equipment used.

Assessing Accuracy

The effect of errors and insensitive equipment can both be captured by calculating the percentage error:

$$\text{% error} = \frac{\text{|experimental value – true value|}}{\text{true value}} \times 100%$$

Generally, measurements are considered accurate when the percentage error is less than 5%. You should always take the context of the experimental into account when assessing accuracy.

While accuracy and validity have different definitions, the two are closely related. Accurate results often suggest that the underlying experiment is valid, as invalid experiments are unlikely to produce accurate results.

In a simple pendulum experiment, if your measurements of the pendulum's period are close to the calculated value, your experiment is accurate. A table showing sample experimental measurements vs accepted values from using the equation above is shown below.

All experimental values in the table above are within 5% of accepted (theoretical) values, they are therefore considered as accurate.

How to Improve Accuracy

Remove systematic errors: for example, if the experiment’s measuring instruments are poorly calibrated, then you should correctly calibrate it before doing the experiment again.

Reduce the influence of random errors: this can be done by having more repetitions in the experiment and reporting the average values. This is because if you have enough of these random errors – some above the true value and some below the true value – then averaging them will make them cancel each other out This brings your average value closer and closer to the true value.

Use More Sensitive Equipments: For example, use a recording to measure time by analysing motion of an object frame by frame, instead of using a stopwatch. The sensitivity of an equipment can be measured by the limit of reading. For example, stopwatches may only measure to the nearest millisecond – that is their limit of reading. But recordings can be analysed to the frame. And, depending on the frame rate of the camera, this could mean measuring to the nearest microsecond.

Obtain More Measurements and Over a Wider Range: In some cases, the relationship between two variables can be more accurately determined by testing over a wider range. For example, in the pendulum experiment, periods when strings of various lengths are used can be measured. In this instance, repeating the experiment does not relate to reliability because we have changed the value of the independent variable tested.

Reliability

Reliability involves the consistency of your results over multiple trials.

Assessing Reliability

The reliability of an experiment can be broken down into the reliability of the procedure and the reliability of the final results.

The reliability of the procedure refers to how consistently the steps of your experiment produce similar results. For example, if an experiment produces the same values every time it is repeated, then it is highly reliable. This can be assessed quantitatively by looking at the spread of measurements, using statistical tests such as greatest deviation from the mean, standard deviations, or z-scores.

Ask yourself: "Is my result reproducible?"

The reliability of results cannot be assessed if there is only one data point or measurement obtained in the experiment. There must be at least 3. When you're repeating the experiment to assess the reliability of its results, you must follow the same steps, use the same value for the independent variable. Results obtained from methods with different steps cannot be assessed for their reliability.

Obtaining only one measurement in an experiment is not enough because it could be affected by errors and have been produced due to pure chance. Repeating the experiment and obtaining the same or similar results will increase your confidence that the results are reproducible (therefore reliable).

In the soft drink experiment, reliability can be assessed by repeating the steps at least three times:

The mass loss measured in all three trials are fairly consistent, suggesting that the reliability of the underly method is high.

The reliability of the final results refers to how consistently your final data points (e.g. average value of repeated trials) point towards the same trend. That is, how close are they all to the trend line? This can be assessed quantitatively using the `R^2` value. `R^2` value ranges between 0 and 1, a value of 0 suggests there is no correlation between data points, and a value of 1 suggests a perfect correlation with no variance from trend line.

In the pendulum experiment, we can calculate the `R^2` value (done in Excel) by using the final average period values measured for each pendulum length.

Here, a `R^2` value of 0.9758 suggests the four average values are fairly close to the overall linear trend line (low variance from trend line). Thus, the results are fairly reliable.

How to Improve Reliability

A common misconception is that increasing the number of trials increases the reliability of the procedure. This is not true. The only way to increase the reliability of the procedure is to revise it. This could mean using instruments that are less susceptible to random errors, which cause measurements to be more variable.

Increasing the number of trials actually increases the reliability of the final results. This is because having more repetitions reduces the influence of random errors and brings the average values closer to the true values. Generally, the closer experimental values are to true values, the closer they are to the true trend. That is, accurate data points are generally reliable and all point towards the same trend.

Reliable but Inaccurate / Invalid

It is important to understand that results from an experiment can be reliable (consistent), but inaccurate (deviate greatly from theoretical values) and/or invalid. In this case, your procedure is reliable, but your final results likely are not.

Examples of Reliability

Using the soft drink example again, if the mass losses measured for three soft drinks (same brand and type of drink) are consistent, then it's reliable.

Using the pendulum example again, if you get similar period measurements every time you repeat the experiment, it’s reliable.

However, in both cases, if the underlying methods are invalid, the consistent results would be invalid and inaccurate (despite being reliable).

Do you have trouble understanding validity, accuracy or reliability in your science experiment or depth study?

Consider getting personalised help from our 1-on-1 mentoring program!