FREQUENTLY ASKED QUESTIONS

This information is provided in response to questions about whole effluent toxicity (WET). It was generated by the WET Expert Advisory Panels Steering Committee, whose members are all volunteers and all members of the Society of Environmental Toxicology and Chemistry (SETAC). Each person is considered an expert in some aspect of WET, and the information provided here represents the consensus of the Committee’s collective expertise at the time this document was written (February 1999).

This information is intended to stimulate further discussion about WET, WET-related research, and the science underlying WET.  The information is not to be construed as representing an official position of SETAC, the SETAC Foundation for Environmental Education, or the U.S. Environmental Protection Agency. [This information was produced under the WET Cooperative Agreement No. CX 824845-01-0.]

Is it possible that pathogenic organisms can influence the results of whole effluent toxicity tests?

If wastewater treatment plant effluent is discharged to a saltwater or brackish water body, should the species tested be a saltwater species (indicative of the receiving stream) or freshwater (indicative of effluent)?

How do you deal with discharges that are intermittent and/or the exposure duration to toxicants (e.g. biocides) is shorter than that used by standard WET test methods?

What do episodic pulses of toxicants to which Ceriodaphnia respond mean to an aquatic resource?

How will changes in calculation of the chronic growth endpoint affect testing results and why was this modification made?

Will changes in test organism age requirements affect testing results?

Whole effluent toxicity tests rely on the assumption that test organisms used are representative of a normal and healthy population. What indicators of test organism health are utilized in testing programs?

How can dilution water with significantly different characteristics than an effluent be used to characterize the effects of that wastewater?

What are the definitions of acceptability criteria for reference toxicant tests?

With regard to the quality assurance parameters of the testing methods, how should food suitability of the test be determined using a short-term chronic test?  What should the results of this test be?

How does increasing the difference in test concentration dilutions affect the prediction of response?

What options are available if synthetic water prepared does not meet the ranges for pH presented by the US EPA methods documents?

Why does US EPA use the biomass calculation in the current USEPA test procedure for the fathead minnow survival and growth test?

Can biologically significant levels of effect be selected for toxicity tests in order to reduce the reliance upon statistical significance in WET data interpretation?

How should WET tests be used to estimate effects on effluent dominated, ephemeral, and intermittent streams?

How should differences in exposure time and frequency between standard WET tests and actual receiving stream conditions be reconciled?

What are the influences of bacterial and/or fungal agents that are ubiquitous to soils and water as interferences in the chronic toxicity tests?

What are the different types of variability in whole effluent toxicity tests?

What specific factors influence WET test variability?

How can WET variability be quantified?

How can WET variability be reduced?

How are regulatory agencies currently addressing WET variability?

What are the basic concepts of WET test data analysis methods?

What are the advantages and disadvantages of hypothesis testing and point-estimates?

What biological conclusions can be made from the statistical analysis of toxicity tests?

Why are only certain species and methods used in the WET program?

The test species are not found in my receiving water; why should I use them?

What is the purpose of multiple species testing?

My effluent tests indicate there may be a problem but I can see fish in the area of my discharge – is there really a problem?

Is the use of alternate test species or methods allowable?

Can I use test organisms gathered from the wild (feral)?

How does the regulatory authority determine what type of test (acute or chronic) should be performed?

I discharge a freshwater waste stream into an estuary/marine environment. Should I test with marine or freshwater test organisms?

How should the effects of genetic variability be considered in culturing test organisms and designing WET tests? 

What are the issues regarding "false positive" and "false negative" results?

In the USEPA Macrocystis pyrifera bioassay (West Coast Marine Species WET Test Methods: Short Term Methods for Estimating the Chronic Toxicity of Effluents and Receiving Waters to West Coast Marine and Estuarine Organisms. USEPA 600/R-95-136, USEPA, 1995), the germ tube length endpoint is obtained by converting ocular micrometer units to the nearest micron using a calibration factor. Typically, this factor is greater than 2.0. This results in an uncertainty of greater than +/- 2 microns for each measurement. Once the mean germ-tube length of 10 germinated spores has been determined for each test chamber, should this value then be rounded to the nearest even-numbered whole micron (assuming a conversion factor of 2.0) prior to statistical analysis to accurately reflect the resolution of the measurement?


Stormwater related WET FAQs

What is meant by the term “first flush” when referring to collection of stormwater samples?

Are States using toxicity testing as a stormwater monitoring tool?  

a)      Have agricultural, urban, and industrial runoff toxicity been assessed with WET tests?  If so, what toxicant(s) have been identified?  

b)   Are acute and/or chronic test method(s) being used to assess stormwaters?

c)   Are there differences to consider when assessing continuous effluent discharges (e.g. POTW) vs. storm drain discharges that occur only when sufficient runoff is generated?

d)   Is capturing the first flush important?

How are stormwaters with extreme ranges of pH, dissolved oxygen (DO), conductivity and/or hardness evaluated for WET when these parameters are not within the ranges needed for conducting the standard toxicity test method?

Are single concentrations (100% stormwater, for example) compared to a control in WET stormwater tests or are multiple dilutions of the stormwater tested?

When would a multiple dilution test be performed if a single concentration test is initially conducted?

Is adherence to the 36 hours holding requirement necessary?

How can the standard test renewal practices specified in the testing manuals be followed given that storm events may be of short duration? 

Is timing of sample collection to a flow measurement important?



Please submit descriptions of WET issues that you would like to see evaluated by the SETAC expert advisory panels.

Questions, comments, and requests should be sent to:
Society of Environmental Toxicology and Chemistry (SETAC)
1010 North 12th Avenue
Pensacola, FL 32501-3367 U.S.A.
T 850-469-1500
F 850-469-9778
e-mail setac@setac.org



All materials copyright Society of Environmental Toxicology and Chemistry (SETAC), 1999, and may not be used without written permission.

AND THE ANSWER IS.........

If wastewater treatment plant effluent is discharged to a saltwater or brackish water body, should the species tested be a saltwater species (indicative of the receiving stream) or freshwater (indicative of effluent)?

US EPA has provided guidance on this issue, which can be found in the Technical Support Document, page 61, section 3.3.6 (US EPA, 1991). If the regulatory authority regulates at the end of pipe (i.e., allows no mixing zone), then the choice of test species is limited to freshwater organisms. If a mixing zone is permitted, then other options may be considered, including freshwater species as well as saltwater species that can be tested in a salted effluent. In the case of saltwater discharges into freshwater receiving systems where no mixing zone is allowed, the choice of test species is limited to saltwater species. If a mixing zone is allowed, then the toxicity tests need to determine whether the salts in the effluent contribute to toxicity in the receiving system. For this reason, freshwater organisms are recommended.

US EPA. 1991. Technical support document for water quality-based toxics control.  EPA/505/2-90-001.  U.S. Environmental Protection Agency, Office of Water, Washington, DC.


How do you deal with discharges that are intermittent and/or the exposure duration to toxicants (e.g. biocides) is shorter than that used by standard WET test methods?

Toxicity is a function of exposure, which may be defined as the magnitude, duration, and frequency with which organisms interact with biologically available toxicants. To achieve a good understanding of that exposure, an adequate WET sampling program that is characteristic of the discharge should be implemented. WET is meant as a screening-level tool to identify potentially toxic effluents. WET tests were developed to be generic to all situations, and it is in the toxicity reduction evaluation (TRE) phase that specific conditions such as pulse toxicity over short durations are identified. In our opinion, when the duration component of exposure is well defined, it should be simulated in tests.


What do episodic pulses of toxicants to which Ceriodaphnia respond mean to an aquatic resource?

It is the Steering Committee’s opinion (and that of the Pellston workshop proceedings) that, if exposure is appropriate, C. dubia is a good surrogate of potential instream toxicity. However, on a scientific basis, C. dubia, like any single species, does not model all systems at all times. Using WET C. dubia testing as the sole criterion for judging adequate protection of the aquatic resource is not appropriate. From a scientific standpoint, toxicity should be judged on its impact or potential impact on the aquatic resource being protected. A carefully designed bioassessment with adequate statistical power may be a more representative tool for evaluating impact from storm water events.


Earlier editions of the US EPA short-term chronic toxicity manuals (US EPA, 1988; US EPA, 1989) calculated organism growth as the total weight of surviving organisms per replicate divided by the number of surviving organisms in that replicate. The newly promulgated methods (US EPA, 1995) modified this specification so that the total weight per replicate is divided by the initial number of organisms exposed.

How will changes in calculation of the chronic growth endpoint affect testing results and why was this modification made?

The Data Interpretation Panel is requesting an official statement from US EPA regarding the decision to switch from growth to biomass as an endpoint. This panel has begun deliberations specifically on these issues and will attempt to summarize the basis for current practice and to evaluate and discuss the technical options. This panel will also participate in the development of a course on statistical issues in toxicity testing that will further cover this topic. This course was presented at the SETAC annual meeting in Charlotte, NC, in November 1998. Further information may appear in a special WET issue of Environmental Toxicology and Chemistry.

US EPA. 1988. Short-term Methods for Estimating the Chronic Toxicity of Effluents and Receiving Waters to Marine and Estuarine Organisms. Weber, CI, et al. (eds.). Office of Research and Development, Cincinnati, OH. EPA/600/4-87/028.

US EPA. 1989. Short-term Methods for Estimating the Chronic Toxicity of Effluents and Receiving Waters to Freshwater Organisms. Second Edition. Weber, CI, et al. (eds.). Office of Research and Development, Cincinnati, OH. EPA/600/4-89/001.

US EPA. 1995. Short-term Methods for Estimating the Chronic Toxicity of Effluents and Receiving Waters to West Coast Marine and Estuarine Organisms. Chapman, GA, et al. (eds.). National Exposure Research Laboratory, Cincinnati, OH. EPA/600/R-95-136.


Earlier editions of the US EPA acute toxicity manuals (US EPA, 1985) defined the allowable age range for fathead minnows as less than 90 days. The newly promulgated methods have changed this specification to younger test organisms, typically less than 14 days old. Will changes in test organism age requirements affect testing results?

It is a logical assumption that narrowing allowable age ranges will reduce variability in results and will likely result in test organisms that are more sensitive to toxicants, thus indicating toxicity where none was previously observed, though some exceptions may exist. The Data Interpretation Panel is requesting an official statement from US EPA regarding the decision to switch to the younger age class and how this may affect interpretation of test results. [As with the issue above, it is anticipated that this information will be discussed in the short course at the SETAC annual meeting in Charlotte, NC, in November 1998 and that further information may appear in a special WET issue of Environmental Toxicology and Chemistry.]

US EPA. 1985. Methods for Measuring the Acute Toxicity of Effluents to Aquatic Organisms. Third Edition. Peltier, W, et al. Office of Research and Development, Cincinnati, OH. EPA/600/4-85/013.


Whole effluent toxicity tests rely on the assumption that test organisms used are representative of a normal and healthy population. What indicators of test organism health are utilized in testing programs?

Both subjective and objective (e.g., test acceptability criteria) indicators of organism health are available, some of which are described within the methods manuals. Some national indicators allow comparison of analytical results between laboratories (i.e., the DMRQA program for major NPDES facilities), as do regional activities such as State WET certification programs, which provide round-robin validation of test practice including organism health (e.g., North Carolina’s Biological Laboratory Certification program). Other national programs, such as the National Environmental Laboratory Accreditation Program (NELAP), are being followed by the WET EAP Steering Committee. Commonly used indicators of organism health are the required reference toxicant analyses and individual test acceptability criteria. Tests that properly use randomization procedures along with required and suggested quality control standards retain many built-in checks of typical organism response.


In freshwater analyses, the moderately hard reconstituted water or 20% dilute mineral water required by testing methods may have total hardness, pH, or dissolved carbon characteristics different from those of effluents. How can dilution water with significantly different characteristics than an effluent be used to characterize the effects of that wastewater?

It is not reasonable to assume any inherent link between the characteristics of the dilution water and those of the effluent, since the goal of the analysis is to measure the effects of the effluent and its divergence from natural characteristics. There should, however, be a reasonable relationship between the dilution water and natural surface water, to the extent practical in developing a standard capable of approximating natural conditions within the bounds established by the method.


What are the definitions of acceptability criteria for reference toxicant tests?

Reference toxicant tests should meet the same test acceptability criteria as compliance tests. With regard to assessment of organism health and overall test practice, US EPA has recommended that routine reference toxicant tests be performed to establish a CUSUM (cumulative summation) chart of testing results. Normal results should lie within plus or minus two standard deviations of the cumulative mean value of point estimate endpoints. Values falling outside that range should prompt careful scrutiny of the data and testing systems. Data produced during these “out of control” conditions should be considered suspect.
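The plus-or-minus two standard deviation check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the LC50 values are hypothetical, and the sample standard deviation is assumed as the measure of spread. It is not taken from the US EPA manuals.

```python
from statistics import mean, stdev

def control_limits(endpoints):
    """Return (lower, center, upper) as the mean +/- 2 sample standard deviations."""
    center = mean(endpoints)
    spread = stdev(endpoints)  # sample standard deviation (n - 1 denominator)
    return center - 2 * spread, center, center + 2 * spread

def in_control(new_value, endpoints):
    """True if a new point estimate falls inside the control limits."""
    lower, _, upper = control_limits(endpoints)
    return lower <= new_value <= upper

# Hypothetical reference toxicant LC50 values from past tests (illustrative only).
history = [1.8, 2.1, 2.0, 2.4, 1.9, 2.2, 2.0, 2.3]

lower, center, upper = control_limits(history)
print(f"control limits: {lower:.2f} to {upper:.2f} (mean {center:.2f})")
print("new result 2.2 in control:", in_control(2.2, history))
print("new result 3.0 in control:", in_control(3.0, history))
```

In practice the limits would be recalculated as each new reference toxicant result is added to the laboratory's record.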


With regard to the quality assurance parameters of the testing methods, how should food suitability of the test be determined using a short-term chronic test?  What should the results of this test be?

For the chronic methods, when foods other than the recommended food are used, the foods must be evaluated in the same way as the entire test system. Side-by-side tests with the required food and the alternative food are required, as are reference toxicant analyses; together these cover all aspects and variables of the testing system, including food quality. The results of these tests must meet the test acceptability requirements and should fall within control bounds for each species.


How does increasing the difference in test concentration dilutions affect the prediction of response?

Better resolution around threshold effect concentrations provides better input to the mathematical models used for point estimates of effect and reduces uncertainty in hypothesis tests of effect. Reducing the distance between effluent dilutions should be encouraged. There may be some confusion about US EPA's specification of dilution series in these cases. The methods specify a minimum set of dilutions, i.e., spacing no wider than a 0.5 dilution factor between adjacent concentrations. No limitations exist on adding concentrations within that range. Experimental design should account for concentrations of concern and should attempt to maximize resolution in that range. Test design should place test concentrations around the effect concentration of concern, i.e., the instream waste concentration or limited concentration of a discharging facility, in order to minimize the need for interpolation of effects between tested concentrations.
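As a rough illustration of the design idea above, the sketch below builds a standard 0.5 dilution series and then adds concentrations around a concentration of concern. The function names, the 40% effluent target, and the 25% bracketing spread are all hypothetical choices made for the example; they are not specified by the US EPA methods.

```python
def dilution_series(top=100.0, factor=0.5, n=5):
    """Geometric series of test concentrations (% effluent), highest first."""
    return [top * factor ** i for i in range(n)]

def add_concentrations_near(series, target, spread=0.25):
    """Add the target concentration plus one concentration on either side of it."""
    extra = [target * (1 - spread), target, target * (1 + spread)]
    kept = [c for c in extra if 0 < c <= max(series)]
    return sorted(set(series) | set(kept), reverse=True)

def spacing_ok(series, factor=0.5):
    """True if no adjacent pair is spaced wider than the given dilution factor."""
    ordered = sorted(series, reverse=True)
    return all(low / high >= factor for high, low in zip(ordered, ordered[1:]))

base = dilution_series()                              # 100, 50, 25, 12.5, 6.25
refined = add_concentrations_near(base, target=40.0)  # hypothetical concentration of concern

print(base)
print(refined)
print(spacing_ok(refined))
```

The refined series still satisfies the minimum spacing while reducing the distance over which effects near the target would have to be interpolated.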


What options are available if synthetic water prepared does not meet the ranges for pH presented by the US EPA methods documents?

Though this is a rare occurrence, it apparently does happen. We would encourage you first to identify which characteristics of the base water and constituents force the pH beyond the listed range (7.4-7.8 for moderately hard reconstituted water using reagent grade chemicals and 7.9-8.3 for moderately hard water prepared using mineral water). We emphasize that prepared water needs to be prepared according to the directions provided by the methods and to receive the suggested 24 hours of vigorous aeration in order to stabilize. Assuming all else has failed, we are aware that some regulatory authorities have allowed pH modification of dilution water according to the methods prescribed for altering effluent pH (e.g., section 8.8.8 of the freshwater chronic manual, 3rd ed.) using 1N NaOH or HCl dropwise. You should validate this interpretation with your specific state or regional regulatory agency.


Why does US EPA use the biomass calculation in the current USEPA test procedure for the fathead minnow survival and growth test?

The most recent short-term chronic test method for the fathead minnow (US EPA, 1994) requires that biomass (obtained by dividing final dry weight by the number of original larvae in the test replicate) be used for the growth endpoint, rather than the average dry weight per surviving larva used in previous procedures (e.g., US EPA, 1989). The main concern of the regulated community is not so much the biomass concept as the change from growth to biomass as an endpoint, since it can increase the perceived toxicity of the sample. The biomass endpoint may or may not result in toxicity estimates that are more sensitive than the previously used growth endpoint. The WET Expert Advisory Panel’s Data Interpretation Panel (SETAC, 1998) hopes to obtain a formal response from US EPA as to why the change was made and to provide data which demonstrate the problem. As an advisory panel, we do not intend to ask US EPA to change policy.
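The difference between the two calculations can be seen in a small worked example. The sketch below is illustrative only: the function names and the replicate values (10 larvae at start, 6 survivors, 3.0 mg total dry weight) are hypothetical and are not drawn from any test data.

```python
def weight_per_survivor(total_dry_weight_mg, n_surviving):
    """Earlier growth endpoint: total dry weight divided by surviving organisms."""
    if n_surviving == 0:
        return None  # no survivors, so a mean weight per survivor is undefined
    return total_dry_weight_mg / n_surviving

def biomass_per_original(total_dry_weight_mg, n_initial):
    """Biomass endpoint: total dry weight divided by organisms initially exposed."""
    return total_dry_weight_mg / n_initial

# Hypothetical replicate: 10 larvae at test start, 6 alive at the end,
# combined dry weight of the survivors 3.0 mg.
growth = weight_per_survivor(3.0, 6)     # 0.5 mg per surviving larva
biomass = biomass_per_original(3.0, 10)  # 0.3 mg per original larva

print(growth, biomass)
```

With no mortality the two values are identical; they diverge only when organisms die, because the biomass calculation folds mortality into the growth endpoint.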

SETAC. 1998.  SETAC Whole Effluent Toxicity Expert Advisory Panel on Performance Evaluation and Interpretation of WET Data.  Society of Environmental Toxicology and Chemistry (SETAC) Foundation for Environmental Education, Pensacola, FL.

US EPA. 1989. Short-term methods for estimating the chronic toxicity of effluents and receiving waters to freshwater organisms. Environmental Monitoring Systems Laboratory, Cincinnati, OH. EPA/600/4-89/001.

US EPA. 1994. Short-term methods for estimating the chronic toxicity of effluents and receiving waters to freshwater organisms. Environmental Monitoring Systems Laboratory, Cincinnati, OH. EPA/600/4-91/002.


Can biologically significant levels of effect be selected for toxicity tests in order to reduce the reliance upon statistical significance in WET data interpretation?

This question has been discussed since the early days of environmental toxicology. The existence of a standard, or even a method- or endpoint-specific, effect level that can be deemed biologically significant in all cases is doubtful. Much of the concern is due to the use of hypothesis tests to determine the NOEC (no-observed-effect concentration). Grothe et al. (1996, sections 3.3 - 3.8) discuss the advantages and disadvantages of NOECs and point estimates in the analysis of toxicity test data and make specific recommendations for future needs. Among the recommended research topics are the use of MSD (minimum significant difference) criteria and the evaluation of appropriate effect levels for point estimators, but such work is beyond the scope of the Expert Advisory Panels.

Grothe, D. R., K. L. Dickson, and D. K. Reed-Judkins, eds. 1996. Whole Effluent Toxicity Testing: An Evaluation of Methods and Prediction of Receiving System Impacts. SETAC Press, Pensacola, FL, USA. 340 p.


How should WET tests be used to estimate effects on effluent dominated, ephemeral, and intermittent streams?

Assessment of wastewater effects on effluent dominated, ephemeral, or intermittent streams requires some consideration of the beneficial uses of the water body when developing WET permits. If resident fish and invertebrate populations are consistent with the intended use of the stream, then limits and/or test parameters should be developed which protect the health of this ecosystem. However, if the intended use of the water body is not being realized, then WET tests designed to achieve this goal are appropriate. Next to clearly defined beneficial use designations, open-minded and informed communication between the dischargers, regulators, and testing facilities is probably the most vital component in the development of WET monitoring for such streams.


How should discrepancies between WET and total chemical results be interpreted, particularly when the bioavailability of the chemical is reduced due to effluent or receiving water matrix effects?

The difficulty of determining the bioavailability of potentially toxic compounds is one of the fundamental reasons for the use of toxicity tests (which inherently measure biological availability) to regulate discharges and receiving water. Total chemical and WET analyses both provide vital information that should be considered in developing permit limits for a facility. Communication between the permit holder and the regulating authority during permit development is vital in such situations.


How should differences in exposure time and frequency between standard WET tests and actual receiving stream conditions be reconciled?

WET tests should be conducted to be predictive estimators of in-stream conditions. For effective control of toxicity to occur, it is important that appropriate exposure conditions be considered. Some investigators have used in situ bioassays as a way to obtain realistic exposure conditions, but few, if any, NPDES permits utilize such methods. One should bear in mind, though, that WET permit limits are intended to be predictive and protective of a design condition. WET tests may therefore predict a response that is not being realized instream during less-than-worst-case conditions.


What are the influences of bacterial and/or fungal agents that are ubiquitous to soils and water as interferences in the chronic toxicity tests?

Biological interference in this context means organisms pathogenic to the test organisms that obscure the interpretation of toxicity effects. The Steering Committee has received several letters related to the problem of pathogenic interference in toxicity tests, and the WET EAP on Performance Evaluation and Interpretation of WET Data has drafted a document which describes the common characteristics associated with biological interference in WET tests and some suggested solutions. The paper is being reviewed at this time and will be posted on the SETAC website when finalized.


What are the different types of variability in whole effluent toxicity tests?

Variability is inherent in any analytical procedure. The precision of a method describes the closeness of agreement between test results obtained from repeated testing of a prescribed method. WET test precision can be categorized by: 1) intratest (within-test) variability, 2) intralaboratory (within-laboratory) variability, and 3) interlaboratory (between-laboratory) variability.

Intratest variability can be attributed to variables such as the number of treatment replicates, the number of test organisms exposed per replicate, and the sensitivity differences between individual organisms (i.e., genetic variability).

Intralaboratory variability is that which is measured when tests are conducted under reasonably constant conditions in the same laboratory (e.g., reference toxicant or effluent sample tested over time). Sources of intralaboratory variability include those factors described for intratest variability, as well as differences: 1) in test conditions (e.g., seasonal differences in dilution water quality, differences in environmental conditions), 2) from test to test in organism condition/health, and 3) in analyst performance from test to test.

Interlaboratory variability reflects the degree of precision that is measured when the same sample or reference toxicant is analyzed by multiple laboratories using the same methods. Variability measured between laboratories is a consequence of variability associated with both intratest and intralaboratory variability factors, as well as differences allowed within the test methods themselves (e.g., source of dilution water), technician training programs, sample and organism culturing/shipping effects, testing protocols, food quality, and testing facilities.

Two general categories of variability are of greatest concern: 1) analyst experience, and 2) test organism condition/health. The experience and qualifications of the analyst who actually performs the toxicity test in the laboratory will dictate how well the culture and test methods are followed and the extent to which good judgment is exercised when difficulties/issues arise in the process of conducting the test, analyzing the data, and interpreting the results. Improper utilization of WET methods can have a substantial impact on test result variability. Guidance for specific test conditions and standard methods to control many causes of variability are found in the USEPA (U.S. Environmental Protection Agency) methods manuals (USEPA 1993, USEPA 1994a, USEPA 1994b, USEPA 1995). Strict adherence to these methods can greatly reduce variability.

USEPA. 1993. Methods for measuring the acute toxicity of effluents and receiving waters to freshwater and marine organisms. 4th ed. Weber C.I., editor. Cincinnati: U.S. Environmental Protection Agency (USEPA) Office of Research and Development. EPA/600/4-90/027F. 293 p.

USEPA. 1994a. Short-term methods for estimating the chronic toxicity of effluents and receiving waters to marine and estuarine organisms. 2nd ed. Klemm, D.J., Morrison, G.E., Norberg-King, T.J., Peltier, W.H. and Heber, M.A., editors. Cincinnati: U.S. Environmental Protection Agency (USEPA) Office of Research and Development. EPA/600/4-91/003. 341 p.

USEPA. 1994b. Short-term methods for estimating the chronic toxicity of effluents and receiving waters to freshwater organisms. 3rd ed. Lewis, P.A., Klemm, D.J., Lazorchak, J.M., Norberg-King, T.J., Peltier, W.H. and Heber, M.A., editors. Cincinnati: U.S. Environmental Protection Agency (USEPA) Office of Research and Development. EPA/600/4-91/002. 341 p.

USEPA. 1995. Short-term methods for estimating the chronic toxicity of effluents and receiving waters to west coast marine and estuarine organisms. Chapman, G.A., Denton, D.I., Lazorchak, J.M., editors. Cincinnati: U.S. Environmental Protection Agency (USEPA) Office of Research and Development. EPA/600/R-95-136. 661 p.


What specific factors influence WET test variability?

There are a number of factors that can meaningfully influence the variability of test results. These factors include, but are not limited to, those listed below.

Sample Characteristics

The nature of the sample collected can have a significant influence on the outcome of a WET test. Care must be exercised to collect the most representative sample possible during the time frame of interest. Sample volume can influence the outcome of a toxicity test. For example, if the sample-to-container-wall ratio is small, or if the sample-container contact time is especially long before the sample is refrigerated, certain particulate-active constituents such as zinc (Chapter 5 in Grothe et al. 1996), polymeric substances, charged materials, or hydrophobic chemicals in a sample can interact with the container. Samples too small in volume may also increase the potential of collecting a non-representative fraction of a non-homogenous sample stream.

The type of sample (i.e., grab or composite) may influence the outcome of a WET test and contribute to variability. Grab samples may hit or miss toxicity spikes, thus possibly increasing the variability between samples taken at different times at the same outfall. Composite samples will average concentrations over the entire collection period, possibly smoothing peaks and valleys of toxicity in variable water media. The various USEPA method manuals review the importance of using appropriate sample types for different types of effluents.

Storage and handling can affect the toxicity and variability of samples. The general assumption is that the toxicity of a sample is most likely to decrease with holding time due to factors such as biodegradation, hydrolysis, and adsorption. These factors are minimized by "cold" storage and shipment on ice as well as test initiation within the specified USEPA guidelines.

Water samples for WET testing may be manipulated in a variety of ways to comply with special requirements or circumstances. This applies, for example, when freshwater effluents are discharged to a saline receiving stream and marine or estuarine organisms are used for testing. Care must be taken, in this case, that ionic strength and composition are within levels tolerated by the specific test organisms, or results may not be representative of actual toxicity or comparable between labs.
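The contrast between grab and composite samples described above can be illustrated with a toy calculation. Everything in this sketch is an assumption made for illustration: the hourly toxicant concentrations are invented, the function names are hypothetical, and the composite is assumed to be built from equal-volume hourly aliquots so that it equals the simple mean.

```python
from statistics import mean

def composite_concentration(hourly):
    """Composite of equal-volume hourly aliquots: the simple mean concentration."""
    return mean(hourly)

def grab_concentration(hourly, hour):
    """Grab sample: the concentration at the single moment of collection."""
    return hourly[hour]

# Hypothetical 24-hour record (arbitrary units) with a two-hour toxicity spike.
hourly = [1.0] * 10 + [8.0, 9.0] + [1.0] * 12

print("composite:", composite_concentration(hourly))
print("grab at hour 3:", grab_concentration(hourly, 3))
print("grab at hour 11:", grab_concentration(hourly, 11))
```

The two grab samples differ ninefold depending on whether they catch the spike, while the composite reports a value well below the peak.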

Abiotic conditions

Abiotic conditions can strongly influence the variability of WET test results. For that reason, most of the abiotic conditions that should be standardized during WET testing (DO, light, hardness, alkalinity, etc.) are specified in protocols contained in the USEPA methods manuals. While these factors may not be problematic sources of variability within tests, they may be of major concern across tests (both within and among laboratories). Very small ranges of temperatures are specified for WET testing. Test solution pH can influence the bioavailability and toxicity of chemical constituents, such as some metals (e.g., Cu, Zn) and ammonia. Careful use of dilution waters, salinity adjustments, aeration, feeding, and other factors causing shifts in pH will help to reduce variability.

Exposure In WET testing, we seek a balance between realistically mimicking exposure scenarios and evaluating effluents with sufficient testing while controlling testing costs. Variability in test results can be greatly influenced by the method of exposure chosen (i.e., static, static renewal, and flow-through). For example, tests of samples with nonpersistent toxicants or with chambers with high loading rates will be influenced to a greater degree using a static design rather than a flow-through design. As the number of variables which influence test results increases, overall test variability increases unless those variables are controlled. However, flow-through tests are much more costly than static tests. The number of concentrations and dilution series may influence variability of the test results. Point estimate models will more precisely estimate the statistical endpoint if the test concentrations are near the actual LCx (concentration that is lethal to x% of organisms), ECx (concentration that affects x% of organisms), or ICx (concentration that inhibits response by x%). In contrast, as the NOEC approaches the concentration at which effects begin to be observed (i.e., LOEC), estimates may show greater variation. Many NPDES permits include a test dilution that is consistent with the Instream Waste Concentration (IWC) based upon dilution in the receiving system. The minimum number of tested dilutions recommended can be increased, particularly in the range of expected effects (if known), in order to improve resolution of the acute or chronic endpoint. Costs of increased dilutions testing are incremental to the cost of a typical test, but such testing is cost effective in cases where small changes in organism responses may affect compliance.

The WET endpoint is a function of test duration, in most cases (percent mortality after a period of time, for example). Test duration can be a function of the endpoint that is to be assessed. In at least one situation, the C. dubia survival and reproduction test, exposure duration is governed by the amount of time needed for 60% of the control organisms to produce a third brood (up to 8 days), at which time the test is repeated if the control performance is not acceptable (USEPA 1994b). The timing for test termination can therefore vary between 6 and 8 days. This introduces the possibility of intertest variability in terms of both number of young produced and test sensitivity due to exposure duration. The cost of reducing test duration variability is small; the corresponding reduction in test results variability could, however, be significant.

Sample Toxicity The exposure-response relationship can be affected by the sensitivity of the test species to the individual and combined chemicals of a sample as well as the concentrations of those chemicals in that sample. Testing of samples which exhibit high slopes in their concentration-response curves at the test statistical endpoint (LCx, ECx, and ICx) tends to provide less variable (intratest and intertest) results than tests of samples exhibiting low slopes in their concentration-response curves. The sensitivity of different species to any single chemical or mixture of chemicals can also be quite different, even when all variables are held constant. For example, rainbow trout are approximately an order of magnitude more acutely sensitive to cadmium than daphnids (USEPA 1985a), while daphnids are approximately 2.5 times more acutely sensitive to chlorine than rainbow trout (USEPA 1985b). Herbicides (e.g., atrazine) are more acutely toxic to plants than to fish (Solomon et al. 1996). This is why vertebrates, invertebrates, and plants are recommended for testing effluents in the NPDES program.

Food Food quality can vary in a number of ways. Organisms whose diets vary in nutritional quality and size, before and during testing, may respond differently to the same sample under identical test conditions. For example, brine shrimp nauplii that are less than 24 hours old are required in all tests using these organisms as food, both to maintain the nutritional quality of the nauplii and to keep their size at the optimum for consumption by test organisms. The YCT and algal diet for Ceriodaphnia dubia should contain specific concentrations of solids and algal cells as outlined in the manual. The quantity of food available can affect dissolved oxygen and pH levels within a test chamber and can act as a substrate for the absorption and adsorption of toxic chemicals from the tested sample, thus reducing bioavailability.

Dilution water Optimally, the dilution water should replicate the quality of the receiving water. However, if the objective of the test is to estimate the absolute toxicity of the sample (effluent), which is the primary objective of NPDES permit-related toxicity testing, then a synthetic (standard) dilution water is used (USEPA 1993, USEPA 1994a, USEPA 1994b). If the objective is to estimate the toxicity of the sample in uncontaminated receiving water, then the test may be conducted using non-toxic receiving water. Dilution water quality can affect the toxicity of effluent, surface water, and stormwater dilutions by modifying the bioavailability of toxic chemicals in the sample. In addition, parameters such as TDS (hardness, salinity, conductivity), turbidity, DO, pH, micronutrients, and bacteria counts can impact test organism physiology, sensitivity, and biological response. Therefore, test variability at all levels can be affected by variability in dilution water quality. Synthetic dilution water quality can also vary with the age of the prepared water in relation to the exposure of test organisms and with the source and quality of the base water.

Organism history and handling Perhaps one of the most important considerations in controlling WET variability is an organism's pretest history of health and maintenance, which consists of four factors: collection, culture, acclimation, and handling specific to the test. Organism history can be evaluated through charting performance of laboratory controls with a reference toxicant over time. All practical attempts should be made to avoid use of field-collected animals for WET testing. The most common sources of test organisms for WET tests are in-house cultures and/or organism suppliers. Organisms to be tested, whether field-collected or cultured, may require acclimation to test conditions. Variation in acclimation practices between tests can result in the use of organisms of varying sensitivity between tests. The importance of analyst technique is most pronounced when the analyst handles organisms before and during the test.

Randomization Results will be variable in all analytical techniques, not just WET, despite all efforts to eliminate and reduce sources of variability. The randomization approach used to assign test replicates within an incubator or water bath and the approach used to assign test organisms to test replicates are attempts to evenly distribute this variability within the testing environment and between organisms. All test methods include procedures for randomization which must be followed.

Organism numbers The number of organisms exposed in a toxicity test has a direct and calculable bearing on the ability of that test to detect and estimate effects resulting from that exposure. Generally, as the total number of organisms increases in a test, the ability to detect effects (i.e., statistical power in a hypothesis test) and the certainty in point estimates increases. Differences in number of organisms per replicate and treatment can be due to the loss of individuals or replicates through analyst errors or to the death or lack of response of all organisms in one or more replicates. The former reduces power or effect-estimate certainty (point estimate confidence intervals) by reducing sample size. The latter may reduce power or effect-estimate certainty by increasing variation in response relative to other replicates and treatments. Intra- and interlaboratory variability can include the factors discussed above, as well as possible differences in study design (total number of organisms and total number of replicates).

Organism age and quality The recommended ages of test organisms for established protocols reflect two general considerations: (1) the relative physical sensitivity of different life stages to the test conditions, independent of the challenges of a toxicant, and (2) the relative sensitivity of different life stages to toxic constituents. Young organisms are often considered more sensitive to toxic and physical stressors than their older counterparts. For this reason, the use of early life stages, such as first instars of daphnids and juvenile mysids and fish, is recommended for all tests.

The effects of organism age on WET variability are potentially greatest between tests and between laboratories, where age differences may be greater. As examples, all C. dubia used in a reproduction test must be within 8 hours of each other in age but can be up to 24 h old, and fathead minnow larvae used in the growth test must be within 24 hours of each other in age in a single test but could range from 1 to 2 days old depending on whether the organisms are cultured in-house or shipped from an off-site culture facility. In the acute tests with fathead and sheepshead minnows, the age difference between tests can range from <24 h to 14 d.

Grothe, D. R., K. L. Dickson, and D. K. Reed-Judkins, eds. 1996. Whole Effluent Toxicity Testing: An Evaluation of Methods and Prediction of Receiving System Impacts, SETAC Press, Pensacola, FL, USA. 340 p.

Solomon, K.R., D.B. Baker, R.P. Richards, K.R. Dixon, S.J. Klaine, T.W. LaPoint, R.J. Kendall, J.M. Giddings, J.P. Giesy, L.W. Hall, Jr. and W.M. Williams. 1996. Ecological risk assessment of atrazine in North American surface waters. Environ. Toxicol. Chem. 15:31-76.

USEPA. 1985a. Ambient water quality criteria for cadmium - 1984. EPA 440/5-84-032. Office of Regulations and Standards, Washington, DC.

USEPA. 1985b. Ambient water quality criteria for chlorine - 1984. EPA 440/5-84-030. Office of Regulations and Standards, Washington, DC.

USEPA. 1993. Methods for measuring the acute toxicity of effluents and receiving waters to freshwater and marine organisms. 4th ed. Weber C.I., editor. Cincinnati: U.S. Environmental Protection Agency (USEPA) Office of Research and Development. EPA/600/4-90/027F. 293 p.

USEPA. 1994a. Short-term methods for estimating the chronic toxicity of effluents and receiving waters to marine and estuarine organisms. 2nd ed. Klemm, D.J., Morrison, G.E., Norberg-King, T.J., Peltier, W.H. and Heber, M.A., editors. Cincinnati: U.S. Environmental Protection Agency (USEPA) Office of Research and Development. EPA/600/4-91/003. 341 p.

USEPA. 1994b. Short-term methods for estimating the chronic toxicity of effluents and receiving waters to freshwater organisms. 3rd ed. Lewis, P.A., Klemm, D.J., Lazorchak, J.M., Norberg-King, T.J., Peltier, W.H. and Heber, M.A., editors. Cincinnati: U.S. Environmental Protection Agency (USEPA) Office of Research and Development. EPA/600/4-91/002. 341 p.


How can WET variability be quantified?

Intratest variability Intratest variability is the variability of the responses (survival, growth, or reproduction), both within and among concentrations of the test material, for a given test. For the hypothesis test approach, intratest variability is derived for an individual test by pooling the variability at each concentration, including the control, to obtain an estimate of the random error for the test. The intratest variability is used to determine the amount of difference from the control that can be detected statistically, the minimum significant difference (MSD). When adjusted for the control mean, the MSD is expressed as a percentage of the control response (MSD%). Intratest variability for the point estimate approach is also represented by an estimate of the random error for the test, the mean square error (MSE). The MSE is one component in the calculation of confidence intervals for a point estimate; thus, the width of a 95% confidence interval provides an indication of the magnitude of the intratest variability.
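
As an illustration of the pooling step and the MSD% described above, the following is a minimal Python sketch, not a replacement for the calculations in the USEPA manuals. It assumes equal replication in every treatment and the common form MSD = d x s_w x sqrt(2/n), where s_w is the square root of the pooled error mean square. The critical value d_crit is an input that the analyst must look up (for example, in a Dunnett table); the function names, the replicate data, and the d_crit value of 2.0 are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_mse(groups):
    """Pool the within-treatment variances (control included) into one error mean square."""
    sum_sq = sum((len(g) - 1) * variance(g) for g in groups)
    deg_free = sum(len(g) - 1 for g in groups)
    return sum_sq / deg_free


def msd(groups, control_index, d_crit):
    """Minimum significant difference, assuming equal replication per treatment.

    d_crit is supplied by the analyst (looked up, not computed here).
    """
    n = len(groups[control_index])
    return d_crit * math.sqrt(pooled_mse(groups)) * math.sqrt(2.0 / n)


def msd_percent(groups, control_index, d_crit):
    """MSD expressed as a percentage of the control mean response."""
    return 100.0 * msd(groups, control_index, d_crit) / mean(groups[control_index])


# Hypothetical young-per-female counts: control first, then two effluent dilutions.
replicates = [
    [20, 22, 24, 22],
    [18, 20, 22, 20],
    [10, 12, 14, 12],
]
ILLUSTRATIVE_D_CRIT = 2.0  # placeholder only, not taken from a statistical table

print(round(msd_percent(replicates, 0, ILLUSTRATIVE_D_CRIT), 1))
```

With these made-up numbers the test could detect a reduction of roughly 10% relative to the control; noisier replicates or fewer replicates would raise that figure.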

The intratest variability is the foremost single measure used to indicate the statistical sensitivity of a WET test analyzed with the hypothesis test approach. Statistical sensitivity, in this case, equates to a test's ability to distinguish a difference between an exposure concentration and the control. Controlling or reducing the amount of variability within a single test will increase the power of the test and therefore the ability of the test to detect responses that differ from the control response (decrease MSD). Increased power will also increase certainty in the determination of a difference from controls, which is important to regulators and the regulated community. However, minimal variability in all treatments of a test may lead to such high statistical power that detected differences may not be biologically significant. Such tests should be interpreted with caution. Although there is no specific guidance from the USEPA on statistical versus biological significance, various States and USEPA Regions have developed some guidelines (e.g., see FAQ on addressing variability). Close attention to the factors described under the FAQ on factors affecting variability will tend to decrease heterogeneity among replicates and decrease intratest variability. In addition, increasing the number of replicates will also lead to an increase in the sensitivity of the test by decreasing the MSD.

Intratest variability is also important in representing the uncertainty associated with point estimates of toxicity. As the width of the 95% confidence interval of the point estimate increases, the uncertainty in that estimate of the statistical endpoint increases. The confidence intervals for chronic endpoints are directly influenced by the variability of response between replicates in each treatment and by the model used to interpolate the point estimate. The confidence intervals for acute test results using a point estimate approach, however, are not influenced by variability between replicates but by the characteristics of the dose-response relationship. As discussed before, the certainty in point estimates is also a function of the dilutions tested and their proximity to the actual statistical endpoint being calculated. One will get a better estimate of the LC50 (tighter confidence intervals) if dilutions are tested near the concentration which actually results in 50% mortality.

Evaluation of a number of existing data sets by members of the Pellston workgroup (Sessions 3 and 4; Grothe et al. 1996) seemed to indicate that, for most WET test methods, MSDs of <40% were achievable. MSDs for most methods examined ranged from 18% to 40%. The consensus of the workgroup was that an additional study is necessary to determine the acceptable level of intratest variability for each USEPA recommended toxicity method, although some participants proposed that sufficient data exist to select MSD criteria. In the proposed study, data would be used to establish variability limits from laboratories that document data quality and adhere to USEPA method guidelines. Study data from each assay evaluation would include expected CVs, MSD, MSD%, MSE, and American Society for Testing and Materials (ASTM 1992) "h" and "k" statistics. The "h" statistic represents a measure of the reproducibility between laboratories, while the "k" statistic represents the repeatability within laboratories. Distributions of these values would be examined to determine criterion levels for intratest variability, and probabilities of laboratories exceeding the criterion levels would be calculated. The direct advantages of an acceptability criterion for intratest variability are (1) establishing a minimum protection level, (2) setting the power of a test to detect a toxic sample for each method, and (3) decreasing intra- and interlaboratory variability. Acceptability criteria will also allow users of WET data to better evaluate test acceptability, laboratory performance, and program effectiveness.
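
The "h" and "k" statistics mentioned above can be sketched as follows. This is a simplified reading of the ASTM practice for a single test material, offered as an assumption rather than a restatement of the standard: h compares each laboratory's mean with the mean of all laboratory means, scaled by the standard deviation of those means, and k compares each laboratory's own standard deviation with the pooled within-laboratory standard deviation. The function name and the laboratory results are hypothetical.

```python
import math
from statistics import mean, stdev, variance


def h_k_statistics(results_by_lab):
    """Return ({lab: h}, {lab: k}) for one material.

    results_by_lab maps a laboratory name to its list of repeated test results.
    """
    lab_means = {lab: mean(vals) for lab, vals in results_by_lab.items()}
    lab_vars = {lab: variance(vals) for lab, vals in results_by_lab.items()}

    grand_mean = mean(list(lab_means.values()))
    sd_of_means = stdev(list(lab_means.values()))            # between-laboratory spread
    pooled_within_sd = math.sqrt(mean(list(lab_vars.values())))  # repeatability SD

    h = {lab: (m - grand_mean) / sd_of_means for lab, m in lab_means.items()}
    k = {lab: math.sqrt(v) / pooled_within_sd for lab, v in lab_vars.items()}
    return h, k


# Hypothetical point estimates (e.g., IC25 as % effluent) from three laboratories.
example = {"Lab A": [10, 12], "Lab B": [14, 16], "Lab C": [18, 20]}
h_values, k_values = h_k_statistics(example)
```

A laboratory with a large absolute h sits far from the other laboratories' averages, while a k well above 1 flags a laboratory whose repeated results scatter more than the group's.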

Intertest and interlaboratory variability The scientific community familiar with analytical procedures, not just WET, recognizes that tests performed on presumably identical materials in presumably identical circumstances do not typically yield identical results. An indication of a test method's consistency is its repeatability and its reproducibility. Repeatability is defined as the variability between independent test results obtained from the same laboratory in a short period of time, and reproducibility is defined as the variability between test results obtained from different laboratories.

Several measures of repeatability and reproducibility have been proposed. The simplest of these is the intra- and interlaboratory CV: the standard deviation (s) of repeated test results, divided by the mean (m) of the repeated test results, multiplied by 100 (CV = (s/m) x 100). The intralaboratory CV is generated by test results from repeated tests performed in the same laboratory, while the interlaboratory CV is obtained from test results from several different laboratories. The use of the CV removes from consideration the units of the measurement and allows the analyst to compare variability of different types of test methods (e.g., WET tests with analytical chemistry tests). It also allows analysts to compare tests that use different scales of measurement.

However, CVs alone cannot be used as diagnostic tools to help identify unusual test values or outliers. Since the CV is a function of the standard deviation of a set of test results, the measure suffers from the same problems associated with standard deviations, and there is no common agreement on what is an acceptable standard deviation. For instance, the range of test values is an easier descriptive statistic to understand. In addition, the value of the standard deviation is affected by extreme values in the data set; single large or small test values inflate the standard deviation. The CV also ignores the 95% confidence intervals (uncertainty) associated with each point estimate and can only be calculated for point estimates. CVs are not appropriate for hypothesis test endpoint comparisons since the effect levels are fixed by the choice of test concentrations.
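
The CV calculation, and its sensitivity to a single extreme value, can be shown with a short Python sketch. The function name and the LC50 values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from statistics import mean, stdev


def cv_percent(results):
    """Coefficient of variation: (standard deviation / mean) x 100."""
    return 100.0 * stdev(results) / mean(results)


# Hypothetical LC50 values (% effluent) from repeated tests in one laboratory.
routine_results = [40, 50, 60]
with_one_extreme = [40, 50, 60, 150]

print(cv_percent(routine_results))            # 20.0
print(round(cv_percent(with_one_extreme), 1))
```

Adding one unusually high result more than triples the CV in this example, which is why the CV by itself does not identify which result is the outlier.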

Quality management considerations Reference toxicant tests are typically used to monitor a laboratory's performance. Charting the performance of a laboratory's controls relative to its reference toxicant test results is a good way to track the laboratory's performance and to identify when the laboratory's performance is not acceptable. The width of a control chart's limits is an indication of a laboratory's capability to reproduce the desired endpoints of a reference toxicant test. However, control chart limits are a function of the reference toxicant, test species, test type (acute or chronic) and biological endpoint (survival, growth, etc.). These factors must be considered before drawing conclusions regarding laboratory performance. Performance on reference toxicant tests as recorded by control charts should be a criterion that is used by permittees in selecting which laboratories to use for WET tests.

Laboratories with very wide control limits, and/or many points outside of the control limits, should investigate problems related to the quality of the data being produced. Laboratories should monitor at a minimum, using control charts, the calculated endpoints for each test type/species combination. Laboratories can also monitor the control treatment mean response for survival, growth, and reproduction. In addition, laboratories can chart the control treatment replicate variance, or standard deviation. Reference toxicant tests are very important to track analyst technique and the health and condition of the test organisms. It is particularly important when performing these tests (as with all compliance toxicity tests) that the analysts precisely follow the published test methods, without deviation between tests.
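
A reference toxicant control chart of the kind described above can be sketched as follows. The text does not state how the limits are set; this sketch assumes the common convention of the mean plus or minus two standard deviations of previous endpoints, and the function names and data are hypothetical.

```python
from statistics import mean, stdev


def control_limits(previous_endpoints, n_sd=2.0):
    """Lower and upper control limits from earlier reference toxicant endpoints.

    n_sd=2.0 (mean +/- 2 SD) is an assumed convention, not taken from the text.
    """
    centre = mean(previous_endpoints)
    spread = stdev(previous_endpoints)
    return centre - n_sd * spread, centre + n_sd * spread


def outside_limits(previous_endpoints, new_endpoint, n_sd=2.0):
    """True when a new endpoint falls outside the chart's control limits."""
    lower, upper = control_limits(previous_endpoints, n_sd)
    return not (lower <= new_endpoint <= upper)


# Hypothetical LC50 values from earlier reference toxicant tests.
history = [40, 50, 60]
print(control_limits(history))       # (30.0, 70.0)
print(outside_limits(history, 75))   # True
```

In practice a laboratory would keep one chart per reference toxicant, species, test type, and endpoint, since the limits are not comparable across those combinations.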

ASTM-American Society for Testing and Materials. 1992. Standard practice for conducting an interlaboratory study to determine precision of a test method, E691-92. In: Annual Book of ASTM Standards, Vol. 14.02. Philadelphia, PA.

Grothe, D. R., K. L. Dickson, and D. K. Reed-Judkins, eds. 1996. Whole Effluent Toxicity Testing: An Evaluation of Methods and Prediction of Receiving System Impacts, SETAC Press, Pensacola, FL, USA. 340 p.


How can WET variability be reduced?

Grothe et al. (1996) concluded that a number of factors are important in reducing variability. Some of the most important factors include establishing performance criteria associated with acceptable levels of test sensitivity and variability, careful adherence to test guidelines, adequate analyst expertise, and conscientious selection of contract laboratories.

Establishing performance criteria One way to control for excessive intratest variability is to establish performance criteria associated with acceptable levels of test sensitivity and variability. At present there are no codified performance criteria for acceptable levels of test sensitivity in the three promulgated USEPA WET manuals for acute and chronic toxicity tests. The only requirements for determining test acceptability are minimum levels for survival, growth, and reproduction. Maximum MSDs are required in the west coast marine manual (USEPA 1995) as an additional test acceptability criterion, but these methods have not gone through the final promulgation process. Because the magnitude of the intratest variance may also differ between test species, performance criteria need to be established for each toxicity test method and endpoint. To establish performance criteria for intratest variability, multiple factors need to be considered: (1) the balance between statistical sensitivity and statistical power, while at the same time controlling the costs of conducting WET tests; (2) the level of intratest variability that can be achieved routinely by experienced laboratories; and (3) evaluation of existing databases to establish species-specific performance criteria and to determine the most appropriate measure of intratest variability (e.g., MSD, MSD%, control CV, treatment CV).

Following testing guidelines Testing guidelines, for WET or any other analytical procedure, are required both to instruct individuals in how to conduct the tests and to minimize the variability in how any one test is conducted within and between laboratories. When analysts deviate from the standard WET test methods through changes in the stated requirements the level of variability may be increased to a point that may adversely affect the test results. Failure to follow test methods could lead to unnecessary additional testing and expenditures in reducing toxicity to comply with a limit. Failure to follow test methods could also result in false positive or false negative indications of toxicity. The former would trigger additional testing even though toxicity is not present. The latter may postpone resolution of an actual toxicity problem resulting in potential impact to receiving water biota and its designated uses. One must be aware, however, that random and/or infrequent minor deviations from test guidelines may occur with the best analysts. These deviations should be reviewed on a case-by-case basis to determine if the outcome of the test was influenced by the deviation(s). Variation in test methods between States and USEPA Regions is also an issue because different conclusions regarding toxicity may be reached depending on the geographical location of the test. This reinforces the need for national consistency in WET testing methods. Also, State and Regional modifications to test procedures may not have undergone the same rigorous review process as the standards established by the national USEPA program. In the final analysis, to deviate from the established test method is to ensure a degree of increased variability that could be costly to the permittee or environmentally damaging to aquatic life.

Increasing analyst expertise Analyst experience, as well as organism health and performance, are probably the two most important aspects of any WET test. The ability to successfully complete toxicity tests is a direct function of the training and expertise that the analyst has accumulated to date. Over the last three years, the USEPA and state regulatory agencies have conducted Performance Audit Inspections on permittees' laboratories and their contract laboratories as part of the NPDES Compliance Inspection Program. It is not uncommon for laboratories to have staff who are responsible for the toxicity testing program but have no training in the biological sciences or little practical experience in WET testing. This lack of experience can be a major source of variability. As stated in USEPA (1993): "These methods are restricted to use by or under the supervision of analysts experienced in the use or conduct of aquatic toxicity testing and the interpretation of data from aquatic toxicity testing. Each analyst must demonstrate the ability to generate acceptable test results with these methods using the procedures described in this methods manual".

Experienced analysts are limited in quantity and always in demand. However, until it is recognized that increased analyst knowledge (or "training") and experience leads to decreased variability, the problem of analyst-induced variability will continue to influence WET testing. One would not consider chemical analysis of water samples, including effluents, without adequate analyst expertise and proper QA/QC. Since WET test results are as important as chemical analyses in most regulatory decisions, it seems only appropriate to have the same high standards for these test methods.

One of the criticisms of WET testing implementation is that permittees and regulators do not understand the science and regulations due to the lack of available training. There will always be a need for periodic training and routine evaluation of regulators working in WET implementation. The USEPA has responded to this need by collaborating with SETAC in developing and implementing a course in WET testing and the WET program which will be provided across the nation. This effort is led by SETAC’s Expert Advisory Panel on WET Training, which has just recently begun to offer the course. The National Environmental Laboratory Accreditation Conference (NELAC) is also planning to establish national standards for labs conducting WET testing. NELAC’s schedule to address standards for WET testing is unknown at this time.

Selecting contract laboratories Along with organism health (which is also linked to the quality of the laboratory, especially if the laboratory cultures its own test organisms), laboratory quality was considered by the Pellston workshop participants to be one of the most important factors affecting test variability. Because laboratory selection can be such an important factor in test results, it is important that the experience of the analysts be carefully considered. The educational qualifications and experience of the laboratory individuals who will actually perform the tests, as well as the qualifications of the supervisory staff, should be reviewed prior to laboratory selection. The toxicity testing laboratory should demonstrate a serious commitment to a quality assurance/control program that extends beyond analyst experience. Most qualified laboratories will have QA/QC manuals for review. Considerations such as an ongoing reference toxicant program, a two-tiered review process for all toxicity test data and summary reports, a sample custody tracking system that is always used, proper equipment maintenance, dilution water quality monitoring, facility maintenance, and attention to test organism health are all characteristics of a laboratory that is committed to generating quality data.

The costs associated with more experienced and better qualified laboratories can sometimes be higher than those of less qualified laboratories, depending on region and volumes of samples tested. Many permittees are constrained by existing procurement regulations that require the selection of the least expensive (and potentially least qualified) bidder. Perhaps one way to improve this situation is to convince the individuals responsible for making procurement decisions that WET testing is a professional service (much like engineering and chemical analyses services), which may give more latitude in selecting better qualified laboratories, rather than simply those that charge the least. A more direct approach is to better define laboratory acceptance criteria in the specifications released for bid. Only those labs which can meet the specifications would be considered responsive, and only the lowest bid among those responsive would be considered for contract.

Probably the best laboratory-selecting tool is obtaining recommendations from other individuals who have the expertise to critique lab performance. Since WET testing is required for many reasons, one can find several individuals or firms who have been required to perform compliance toxicity tests, and it is very easy and straightforward to obtain information from them on how well their toxicity testing laboratory met their needs. The regulated community has every reason to be honest in their assessments (in fact, there is a real incentive not to be dishonest if they value their relationship with the person asking the question), and as a consequence, the workgroup felt that this is the best source of information currently available for making a decision regarding selection of a toxicity testing laboratory.

Grothe, D. R., K. L. Dickson, and D. K. Reed-Judkins, eds. 1996. Whole Effluent Toxicity Testing: An Evaluation of Methods and Prediction of Receiving System Impacts, SETAC Press, Pensacola, FL, USA. 340 p.

USEPA. 1993. Methods for measuring the acute toxicity of effluents and receiving waters to freshwater and marine organisms. 4th ed. Weber C.I., editor. Cincinnati: U.S. Environmental Protection Agency (USEPA) Office of Research and Development. EPA/600/4-90/027F. 293 p.

USEPA. 1995. Short-term methods for estimating the chronic toxicity of effluents and receiving waters to west coast marine and estuarine organisms. Chapman, G.A., Denton, D.I., Lazorchak, J.M., editors. Cincinnati: U.S. Environmental Protection Agency (USEPA) Office of Research and Development. EPA/600/R-95-136. 661 p.


How are regulatory agencies currently addressing WET variability?

The WET Expert Advisory Panel (EAP) on Performance Evaluation and Interpretation of Data contacted over 20 different regulatory agencies, consisting of States and USEPA Regions, to collect information regarding WET variability and how it is addressed within the context of the NPDES program. These agencies were chosen based on information provided by various EAP members as well as the response of agencies to a recent survey on WET conducted by the Water Environment Research Foundation. The discussion presented here is not intended to represent a complete survey of agencies on this topic. It is likely that there are agencies which were not contacted but which have implemented controls for WET variability. Each of the selected agencies was asked to describe how it characterizes WET variability and the type of control measures that they have in place to address WET variability.

The majority of the agencies contacted indicated that various forms of WET variability were either not being addressed on a routine basis or being addressed using best professional judgment without any specific (Federal Register, etc.) guidance from their State or Federal agencies. Many agencies did indicate that they recognize the impact of WET variability on test and lab performance, in addition to data interpretation; and that professional judgment was the only resource available to them when interpreting WET data. However, there were agencies who did not acknowledge that variability is an issue in WET testing. Approaches used by some of these agencies to qualify WET test results include review of dose response, minimum significant differences (MSDs), point estimate confidence interval width, and reference toxicant control chart variability. Several agencies voiced concern that national guidance on how to deal with WET variability was not available although it is very much needed. Four of the agencies contacted have adopted and implemented control measures and/or interpretation guidelines designed to address WET variability, most of which apply to chronic tests. Each of these agencies and their approaches are outlined below.

North Carolina

- uses hypothesis testing to calculate chronic endpoints

- Practical Sensitivity Criteria (PSC) of 20% adopted for C. dubia reproduction. Decisions on the use of the PSC were based on the results of over 4,000 C. dubia tests and actual laboratory performance in those tests.

- if the treatment response is statistically less than the control response, but the difference between the treatment and control is less than 20% (the PSC), then toxicity has been detected below the PSC and the treatment is in compliance with permit decision criteria

- limits the variability allowed in the C. dubia control response for reproduction to a maximum coefficient of variation (CV) of 40%
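The North Carolina rules above can be sketched as a small decision function. This is an illustrative sketch only: the function names and return labels are hypothetical, the outcome of the hypothesis test is assumed to be computed elsewhere and passed in as a flag, and treating a reduction of exactly 20% as not meeting the PSC exemption is an assumption the text does not settle.

```python
from statistics import mean, stdev

PSC = 0.20      # 20% reduction relative to control, from the text
MAX_CV = 0.40   # maximum control CV for C. dubia reproduction, from the text


def control_cv(control):
    """Coefficient of variation (sample SD / mean) of the control response."""
    return stdev(control) / mean(control)


def nc_decision(control, treatment, statistically_lower):
    """Classify one treatment under the rules paraphrased above.

    statistically_lower is the result of the hypothesis test, computed
    elsewhere. The return labels are hypothetical, not regulatory terms.
    """
    if control_cv(control) > MAX_CV:
        return "control CV exceeded"
    reduction = 1.0 - mean(treatment) / mean(control)
    # Assumption: a reduction of exactly 20% is treated as exceeding the PSC.
    if statistically_lower and reduction >= PSC:
        return "not in compliance"
    return "in compliance"


control = [20, 22, 18, 24, 21, 19, 23, 20, 22, 21]       # mean 21 neonates
small_effect = [19, 18, 20, 19, 18, 19, 20, 19, 18, 19]  # about 10% lower
print(nc_decision(control, small_effect, statistically_lower=True))
```

In this example the treatment is statistically lower than the control but the reduction is below the PSC, so the sketch reports compliance.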

USEPA Region VI

- uses hypothesis testing to calculate chronic endpoints

- a CV limit of 40% on controls and instream waste concentrations (IWCs) has been adopted to qualify valid tests

- increases the number of replicates for all acute tests and the chronic P. promelas test beyond that required in 40 CFR Part 136 methods to reduce the frequency of false positive and false negative indications of toxicity

- a test meets permit decision criteria if there is 80% or greater survival in the IWC treatment and all lower concentrations, regardless of whether a statistical difference is found at these same concentrations

- averages chronic test results for each species and endpoint per monitoring period to determine permit compliance, rather than using each individual test result to determine compliance
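The Region VI survival criterion above lends itself to a short check. The following is a minimal sketch under stated assumptions: the function name and the data layout (a mapping from percent effluent to proportion surviving) are hypothetical, and the control (0% effluent) is excluded from the check.

```python
def meets_survival_criterion(survival_by_conc, iwc, threshold=0.80):
    """True if survival is at least `threshold` at the IWC and at every
    lower effluent concentration (control, 0% effluent, excluded).

    survival_by_conc maps percent effluent to proportion surviving.
    """
    relevant = [s for conc, s in survival_by_conc.items() if 0 < conc <= iwc]
    return all(s >= threshold for s in relevant)


survival = {6.25: 0.95, 12.5: 0.90, 25: 0.85, 50: 0.60, 100: 0.10}
print(meets_survival_criterion(survival, iwc=25))
print(meets_survival_criterion(survival, iwc=50))
```

With an IWC of 25% effluent the criterion is met in this example; with an IWC of 50% it is not, because survival at 50% falls below 80%.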

USEPA Region IX

- upper limits on %MSD have been adopted for marine chronic tests, and tests exhibiting MSDs above the limit are determined not to meet test acceptability criteria (TAC) (USEPA 1995)

- the %MSD limits range from 20% to 50%, depending on test and endpoint
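%MSD is commonly obtained by expressing the minimum significant difference as a percentage of the control mean. The sketch below assumes a comparison of one treatment with the control using a pooled standard deviation, with the critical value (for example from a t or Dunnett table) looked up elsewhere and passed in. All names are hypothetical, and the numbers in the example are made up for illustration.

```python
import math


def percent_msd(critical_value, pooled_sd, n_control, n_treatment, control_mean):
    """Minimum significant difference as a percentage of the control mean.

    critical_value: tabulated critical value for the test used (assumed input).
    pooled_sd: pooled within-treatment standard deviation.
    """
    msd = critical_value * pooled_sd * math.sqrt(1 / n_control + 1 / n_treatment)
    return 100.0 * msd / control_mean


def exceeds_limit(pmsd, limit):
    """True if the test's %MSD is above the applicable upper limit."""
    return pmsd > limit


pmsd = percent_msd(critical_value=2.0, pooled_sd=5.0,
                   n_control=10, n_treatment=10, control_mean=20.0)
print(round(pmsd, 2))
```

A test with a %MSD above the limit for its method and endpoint would, under the approach described above, not meet the test acceptability criteria.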

State of Washington

- acute or chronic tests must be able to detect a 30% or 40% difference, respectively, from controls

- to address false positive indications of toxicity, alpha is switched from 0.05 to 0.01 when differences in acute or chronic tests are less than 10% or 20%, respectively
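The alpha-switching rule above can be sketched in a few lines. This is an illustration, not the State's procedure: the function name is hypothetical, and "difference" is assumed to mean the relative difference from the control response expressed as a proportion.

```python
# Cutoffs from the text: below these differences alpha drops to 0.01.
SWITCH_CUTOFF = {"acute": 0.10, "chronic": 0.20}


def washington_alpha(relative_difference, test_type):
    """Significance level to use for a given observed difference.

    relative_difference: (control - treatment) / control, as a proportion
    (an assumed reading of "difference" in the text).
    test_type: "acute" or "chronic".
    """
    if relative_difference < SWITCH_CUTOFF[test_type]:
        return 0.01
    return 0.05


print(washington_alpha(0.15, "acute"))    # above the acute cutoff
print(washington_alpha(0.15, "chronic"))  # below the chronic cutoff
```

The same 15% difference is tested at alpha = 0.05 in an acute test but at alpha = 0.01 in a chronic test, which makes a small chronic difference harder to declare significant.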

The following agencies were contacted directly by the WET EAP on Performance Evaluation and Interpretation of Data to collect the information included in the text above: CA Regional Water Control Board (RWCB) #2, CA RWCB #3, DE, FL, GA, IL, MD, MI, MN, NC, NJ, NV, OH, PA, RI, SC, TN, VA, VT, WA, WV, and USEPA Regions VI and IX. Contacts for these agencies can be provided upon request or may be available on the SETAC WET EAP contacts page.

USEPA. 1995. Short-term methods for estimating the chronic toxicity of effluents and receiving waters to west coast marine and estuarine organisms. Chapman, G.A., Denton, D.L., Lazorchak, J.M., editors. Cincinnati: U.S. Environmental Protection Agency (USEPA) Office of Research and Development. EPA/600/R-95-136. 661 p.


What are the basic concepts of WET test data analysis methods?

The purpose of a Whole Effluent Toxicity (WET) test is to assess the toxicity of an effluent, surface water, or other water sample(s) in a relatively brief time period. The toxicity evaluation usually occurs in one of two ways: an evaluation based on an observed result (i.e., hypothesis test) or an evaluation based on a standard level of effect (e.g., point estimate). The intended use of the toxicity evaluation and the design of the WET test should be considered in the decision to follow the concept of standard level of effect versus the concept of the observed result.

When a statistical hypothesis testing approach is used in WET testing, there is an underlying assumption that statistical significance equals toxicity and that statistically insignificant effects are non-toxic. However, while these effects are considered non-toxic, failing to reject the null hypothesis does not mean that the effects were biologically insignificant. For example, a sample may have elicited a 30% effect, but the hypothesis test may be unable to detect a statistical difference at this level of effect. Similarly, when a statistical difference is detected and the sample is considered toxic, the result does not necessarily reflect biologically significant effects. For example, a sample that displayed a 3% effect relative to the control response can be statistically significant, but this level of effect may not be biologically significant.
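The two examples above can be reproduced with made-up numbers. The sketch below is illustrative only: it uses a pooled-variance two-sample t statistic with four replicates per group and a one-tailed critical value of 1.943 (alpha = 0.05, 6 degrees of freedom). The data, the function names, and the choice of a t-test are all assumptions made for the illustration, not the analysis prescribed for any WET method.

```python
import math
from statistics import mean, variance

T_CRIT_DF6 = 1.943  # one-tailed Student's t, alpha = 0.05, 6 degrees of freedom


def pooled_t(control, treatment):
    """Pooled-variance t statistic for control minus treatment."""
    n1, n2 = len(control), len(treatment)
    sp2 = ((n1 - 1) * variance(control) + (n2 - 1) * variance(treatment)) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(control) - mean(treatment)) / se


def summarize(control, treatment, t_crit):
    """Return (relative effect, whether the test flags a significant reduction)."""
    effect = 1 - mean(treatment) / mean(control)
    return effect, pooled_t(control, treatment) > t_crit


# Highly variable replicates: a 30% reduction that the test cannot detect.
noisy = summarize([30, 10, 25, 15], [21, 7, 17.5, 10.5], T_CRIT_DF6)
# Very precise replicates: a 3% reduction that the test flags as significant.
precise = summarize([20.0, 20.1, 19.9, 20.0], [19.4, 19.5, 19.3, 19.4], T_CRIT_DF6)
print(noisy, precise)
```

The outcome is driven by replicate variability rather than by the size of the effect: the large effect goes undetected and the small one is declared significant.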

In WET testing, the simplest hypothesis testing situation is comparing a single concentration of interest, such as a surface water sample or the in-stream waste concentration (IWC), with a control condition. If there is only one concentration of interest, then it is logical to ask the question: Is there any difference between the sample response and the control response? In this situation, it is intuitively appealing to do a hypothesis test on the equality of the two conditions. A natural extension is the case where several serial dilutions of a sample are compared to the control condition to find the lowest concentration where differences exist. In the multi-concentration case, the no observable effect concentration (NOEC) is defined as the highest concentration that is not statistically different from the control condition. The ability to detect differences, in either the multi-concentration or single-concentration case, is dependent upon the statistical design used for the experiment, the variability of the biological responses, and the subsequent statistical analysis.
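The NOEC definition above can be sketched as follows, assuming the hypothesis test for each concentration has already been run and its outcome recorded as a flag. The function name and data layout are hypothetical, and the sketch takes the definition literally, which presumes a monotonic concentration-response; interrupted responses would need additional judgment.

```python
def noec_loec(significant_by_conc):
    """Return (NOEC, LOEC) from a mapping of concentration to a flag that is
    True when the response differs statistically from the control.

    NOEC: highest concentration not statistically different from control.
    LOEC: lowest concentration that is statistically different.
    Either value is None when no concentration qualifies.
    """
    not_significant = [c for c, sig in significant_by_conc.items() if not sig]
    significant = [c for c, sig in significant_by_conc.items() if sig]
    noec = max(not_significant) if not_significant else None
    loec = min(significant) if significant else None
    return noec, loec


# Percent effluent for a serial dilution series (illustrative values).
flags = {6.25: False, 12.5: False, 25: False, 50: True, 100: True}
print(noec_loec(flags))
```

Because the NOEC can only take one of the tested concentrations, its value depends on the dilution series chosen as well as on the power of the test.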

The use of point estimates to assess toxicity requires the selection of an effect level above which a sample is deemed toxic. Conversely, if the effect is less than the regulatory effect level (e.g., 25%), then the sample is considered non-toxic. The difficulty in selecting and justifying an appropriate effect level has likely hindered the widespread use of this EPA-approved method for chronic toxicity data analysis in WET monitoring programs. When used, significant levels of effect have been based upon a statistical model (LC50 for probit analysis) or the performance characteristics (IC25 ≈ NOEC) of the toxicity test. Selection of effect levels based upon coefficients of variation (CVs) from multiple reference toxicant test results has also been proposed (Grothe et al., 1996). While none of these approaches is based upon ecological significance, they are comparable to hypothesis tests, where the effect level considered statistically significant (toxic) is, to a large extent, determined by the test design, conduct, and replicate variability. Therefore, point estimates may be as reliable in representing the toxicity of a sample as hypothesis tests, with the added benefit of a concentration-response curve and confidence intervals of the estimate for compliance determination.
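As an illustration of a point estimate such as the IC25, the sketch below uses linear interpolation between tested concentrations after smoothing the treatment means so that they do not increase with concentration. This is one common way to compute an inhibition concentration and is shown here as an assumption, not as the specific procedure the text refers to. The function names and data are hypothetical, equal replication per treatment is assumed, and no confidence interval is computed.

```python
def smooth_monotone(means):
    """Pool adjacent means so the sequence never increases with concentration.

    Assumes equal replication, so pooling is a simple average.
    """
    blocks = []  # each block is [sum of means, number of means pooled]
    for m in means:
        blocks.append([m, 1])
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    smoothed = []
    for s, c in blocks:
        smoothed.extend([s / c] * c)
    return smoothed


def icp(concs, means, p=0.25):
    """Concentration giving a proportion p reduction from the control mean.

    concs[0] must be the control (0% effluent). Returns None when the
    effect level is not reached within the tested range.
    """
    m = smooth_monotone(means)
    target = m[0] * (1 - p)
    for j in range(len(m) - 1):
        if m[j] > m[j + 1] and m[j] >= target >= m[j + 1]:
            return concs[j] + (m[j] - target) * (concs[j + 1] - concs[j]) / (m[j] - m[j + 1])
    return None


concs = [0, 6.25, 12.5, 25, 50, 100]  # percent effluent
means = [20, 21, 19, 16, 10, 2]       # mean response per treatment
ic25 = icp(concs, means)
print(round(ic25, 2))
```

Unlike the NOEC, the estimate is not restricted to a tested concentration; in this example it falls between the 25% and 50% treatments.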

Grothe, D. R., K. L. Dickson, and D. K. Reed-Judkins, eds. 1996. Whole Effluent Toxicity Testing: An Evaluation of Methods and Prediction of Receiving System Impacts, SETAC Press, Pensacola, FL, USA. 340 p.


What are the advantages and disadvantages of hypothesis testing and point-estimates?

WET compliance decisions are based upon statistical methods that are influenced by the experimental design and have inherent strengths and weaknesses. A comprehensive discussion of the advantages and disadvantages of hypothesis testing and point estimates as they relate to WET testing can be found in the SETAC publication on Whole Effluent Toxicity Testing (Chapman et al., 1996); the main points are summarized below.

Advantages of Hypothesis Testing

1. Well suited for comparing a treatment with the control.

2. Relatively simple to calculate.

Disadvantages of Hypothesis Testing

1. Dependent on concentrations tested.

2. Statistical power is influenced by variability.

3. Inability to calculate confidence intervals.

4. Confounded by hormesis or poorly behaved data.

5. Frequently need to use non-parametric statistical methods.

Advantages of Point Estimates

1. Establishes a concentration response relationship using data from all treatments.

2. Point estimates do not have to be a tested concentration.

3. Precision estimates and confidence intervals are provided.

4. Many model choices.

5. Can be used for all types of data.

6. Capability for advanced applications.

Disadvantages of Point Estimates

1. Level of effect must be selected.

2. Accuracy compromised with lack of partial responses.

3. Is model-fit dependent.

4. Construction and behavior of confidence intervals for low effect levels.

5. More sophisticated computations required.

6. Requires greater knowledge of statistical tools.

7. Is not widely used for chronic toxicity tests in the WET program.

The SETAC publication also discusses ways to address or investigate these disadvantages. However, it must be recognized that the connection between biological and statistical significance is based more on how the tests are performed (consideration of exposure variables) than on the statistical analysis employed.

Chapman GA, Anderson BS, Bailer AJ, Baird RB, Berger R, Burton DT, Denton DL, Goodfellow WL, Heber MA, McDonald LL, Norberg-King TJ, Ruffier PJ. 1996. Discussion Synopsis. In: Grothe, D. R., K. L. Dickson, and D. K. Reed-Judkins, eds. 1996. Whole Effluent Toxicity Testing: An Evaluation of Methods and Prediction of Receiving System Impacts, SETAC Press, Pensacola, FL, USA. p. 51-78.


What biological conclusions can be made from the statistical analysis of toxicity tests?

A significant conclusion of the Pellston Conference on WET was that these tests are effective tools for predicting environmental impacts. However, further field bioassessment studies are needed to examine the relationship between WET tests and ecosystems other than small rivers and lakes, such as wetlands, estuaries, and large rivers, and in sediments (Black et al., 1996). This research is necessary because the relationship between toxicity in an effluent toxicity test and the biological or ecological impact in the receiving stream is not direct. When the results of an effluent toxicity test are used to predict biological or ecological impact, such studies need to compare the sensitivity of the organisms as well as the exposure conditions (magnitude, duration, frequency, and water chemistry) between the effluent test and the receiving water. Further, WET tests were not designed to detect, and cannot detect, other biological effects such as bioaccumulation, genotoxicity (mutagenicity, teratogenicity, carcinogenicity), hormone disruption, indirect biotic effects (competition), and eutrophication. It is also important to recognize that the only toxicological or biological conclusions of which we are reasonably certain based upon a single toxicity test result are limited to that laboratory test and may vary with test design and conduct. A weight-of-evidence approach using sufficient chemical, bioassessment, and toxicity test data is an effective way to address the uncertainty of a response predicted by the results of a single toxicity test.

Black JA, Burton DT, DeGraeve GM, Heber MA, LeBlanc NE, Lewis MC. 1996. Workshop summary and conclusions. In: Grothe, D. R., K. L. Dickson, and D. K. Reed-Judkins, eds. 1996. Whole Effluent Toxicity Testing: An Evaluation of Methods and Prediction of Receiving System Impacts, SETAC Press, Pensacola, FL, USA. p. 331-340.


Why are only certain species and methods used in the WET program?

The WET test methods and species were designated to provide a standardized and equitable approach to WET testing, including the laboratory methods and species used.   The approved species were chosen based on a number of criteria, including:

a.   Sensitivity – Although no species has been shown to be the most sensitive to all toxicants, the species selected were determined to be representative of a range of sensitivity with respect to most known pollutants common at that time. The species selected do not represent the most sensitive species tested or available; however, the selected invertebrates have generally been found to be more sensitive to most toxicants than the selected vertebrates.

b.   Ecological Importance - The species selected are considered to be representative of important constituents of the aquatic community, including the basic elements of an aquatic food chain (algae, invertebrates, and small fish), all of which contribute to a balanced ecosystem. The commercial and recreational value of the species was also considered.

c.   Availability - The species selected are commonly available as stock cultures and are reasonably amenable to lab culture for testing labs. This allows production of organisms year-round to meet testing needs. The selected species and life stages are also tolerant of handling during testing.

d.   Precision - The methods and species were chosen based on reproducibility of responses, given standardized wastewater exposures, within and between labs performing toxicity tests.


The test species are not found in my receiving water, why should I use them?

The WET test species were selected as being representative of species that should be present in a wide variety of receiving streams under varying conditions, or of species that occupy similar niches in aquatic ecosystems. The Environmental Protection Agency’s (EPA’s) Technical Support Document for Water Quality-based Toxics Control (EPA/505/2-90-001) states that the currently approved standard test species represent the sensitive range of all ecosystems analyzed.

Test species response is an important factor, particularly when an effluent demonstrates toxicity. Use of standardized species and methods enhances the potential for successful identification of toxicants in toxic effluents since toxicity data for these organisms are already available for a variety of pollutants.

Kimerle, R.A., A.F. Werner, and W.J. Adams. 1984. Aquatic Hazard Evaluation Principles Applied to the Development of Water Quality Criteria. In Aquatic Toxicology and Hazard Assessment, Seventh Symposium. ASTM STP 854. R.D. Cardwell, R. Purdy, and R.C. Bahner, eds. American Society for Testing and Materials, Philadelphia, PA.


What is the purpose of multiple species testing?

Multiple species testing is a means to measure potential environmental impacts across a range of toxicants and test organism sensitivity levels. For example, while the daphnids are generally more sensitive to pesticides and metals than the minnows, the minnows are generally more sensitive to ammonia than the daphnids.


My effluent tests indicate there may be a problem but I can see fish in the area of my discharge – is there really a problem?

WET testing is typically designed to ensure that discharges do not cause toxicity to relatively sensitive test species during critical life stages in the receiving stream under low dilution scenarios. Adequate growth and reproduction are important to ensure that organisms instream can maintain balanced populations. Observations of organisms in the area of the outfall do not mean that more subtle impacts are not occurring or that the organisms that are present are sensitive enough to represent most organisms instream. Death or impairment of growth or reproduction of instream organisms, represented by test organisms, reduces the availability of food sources for larger organisms and can eventually translate into adult population declines and an unbalanced ecosystem. Additionally, the presence of pollution-tolerant organisms does not prove that more sensitive organisms have not already been impacted, either in a subtle or a very profound manner. As a predictive tool, WET testing can be used to assess the probability that an effluent will cause ecosystem impairment under stressful conditions (e.g., flow and temperature extremes) that may increase toxicity. Also, when prevailing discharge conditions are more favorable than the permit conditions (i.e., the discharge flow is less than the permitted maximum, and/or the receiving stream flow is greater than the critical low flow), it may be difficult to establish agreement between WET test results and instream biological conditions.


Is the use of alternate test species or methods allowable?

Federal regulations at 40 CFR Part 136.3 allow for the development of alternative test methods, subject to approval by the appropriate regulatory authority and provided that the regulatory authority does not object to the proposed test method or species. The requirements for alternate test species are:

a.   The alternate species/life stage must represent the same environmental niche and phylogenetic grouping as the species it replaces, and should represent the sensitivities of species that would normally be found in the receiving water.

b.   The alternate species/life stage must be of equivalent or greater sensitivity when compared to the approved species. Sensitivity comparisons should be based on toxicants and exposure conditions similar to those used to establish sensitivity for the standard test organism. For example, it would be inappropriate to use an alternate test organism that shows no chronic response to copper after seven days of exposure when sensitivity of the approved species is based on 28 days of exposure.

c.   A standardized, peer-reviewed protocol must be developed for the particular test and species.

d.   Testing with the alternate species must provide reproducible responses to toxicants, whether found in effluents or used in reference toxicant testing programs, based on intra- and inter-lab variability comparisons with current methods and species.

e.   QA/QC activities and requirements applied to conventional methods and species will apply to the methods for testing an alternate species.

f.    Early life stages of the alternate species must be readily available for testing.  Generally this would mean that the alternate species must be amenable to lab culture in order to ensure adequate availability year-round.

g.   As noted previously, the use of alternate species may also create additional difficulties when toxicity is demonstrated. Lack of toxicological response information for an alternate species may make toxicant identification much more difficult.


Can I use test organisms gathered from the wild (feral)?

There are a number of concerns regarding the use of feral organisms:

a.   Supply - Is there an adequate supply of the test organisms to perform the standard testing and any required additional testing on a year-round basis?

b.   Handling - Is the test organism able to withstand the rigors of capture, shipping, acclimation and taxonomic verification prior to testing? The organisms should be in peak condition at test initiation.

c.   Life Stage - Can the age of the test organisms be verified, bearing in mind the constraints of current methods (e.g., all organisms hatched within an eight hour period)?

d.   Identification - Taxonomic verification of all test organisms is required. This process, particularly for invertebrates, may be extremely difficult and stressful to the organisms and may impact their sensitivity during testing.

e.   Health - The use of feral test organisms introduces a significant unknown into the test, including a predisposition to contaminants, potential for disease, and/or increased weakness or tolerance due to exposure in the water from which they are collected. This can affect the test and may be a source of contamination for other tests and lab cultures.

f.   Sensitivity - Feral test organisms are subject to the same requirements for reference toxicant testing, as are the approved test species. The nature of wild collection will require that a reference toxicant test be performed concurrent with every test performed on the alternate species. Additionally, feral organisms may be more or less tolerant to lab conditions and effluent exposure based on local conditions where the organisms were obtained.

g.   Test Protocol – The appropriate test procedure for the feral species would need approval through the alternate test method protocol with demonstrated success.

Due to these factors, the use of feral organisms may result in higher test variability, independent of lab performance.


How does the regulatory authority determine what type of test (acute or chronic) should be performed?

Testing requirement decisions (acute or chronic, freshwater or marine) are usually based on criteria established in the regulatory authority’s water quality standards implementation procedures or on the goals of the test. EPA’s Technical Support Document for Water Quality-based Toxics Control (TSD) provides guidance on choosing tests and test organisms. This guidance generally attempts to mimic instream exposure with a similar type of test exposure, i.e., a short-duration exposure with an acute test and a longer-term exposure with a chronic test.

The decision regarding acute or chronic testing may be either a policy decision (if involving a regulatory agency) or a function of available dilution, depending on the goals of the test. This decision also depends on whether acute or chronic data characterizing a particular discharge are already available. As an example of a policy decision where historical data are not available, the regulatory authority may elect to disallow dilution and require end-of-pipe testing. In this case the instream waste concentration is considered to be 100% effluent regardless of the ratio of effluent flow to receiving stream flow.

Where the decision allows consideration of available dilution, chronic testing may be required where the instream waste concentration exceeds a specific benchmark. Most regulatory agencies require acute testing at some point in a testing program, and dilution may or may not be allowed.
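The role of dilution in this decision can be illustrated with a simple calculation of the instream waste concentration (IWC). The sketch below assumes complete mixing, so that the IWC is the effluent flow divided by the combined effluent and receiving stream flow. The function names are hypothetical, and the benchmark is left as a required argument because its value is set by the regulatory authority and is not given here.

```python
def instream_waste_concentration(effluent_flow, stream_flow, allow_dilution=True):
    """IWC as percent effluent, assuming complete mixing.

    Flows must be in the same units. When dilution is disallowed
    (end-of-pipe testing), the IWC is 100% regardless of the flows.
    """
    if not allow_dilution:
        return 100.0
    return 100.0 * effluent_flow / (effluent_flow + stream_flow)


def chronic_testing_indicated(iwc_percent, benchmark_percent):
    """True if the IWC exceeds the authority's benchmark (value not specified here)."""
    return iwc_percent > benchmark_percent


iwc = instream_waste_concentration(effluent_flow=1.0, stream_flow=9.0)
end_of_pipe = instream_waste_concentration(1.0, 9.0, allow_dilution=False)
print(iwc, end_of_pipe)
```

In this example the same discharge has an IWC of 10% effluent when dilution is considered and 100% effluent when it is not.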

 


I discharge a freshwater waste stream into an estuary/marine environment. Should I test with marine or freshwater test organisms?

The choice of marine or freshwater test organisms is generally dependent on the salinity of the receiving stream. The term ‘salinity’ as used here relates to the ions and ratios of ions normally found in estuaries, bays or the open ocean. Inland saline waters usually contain ions and ratios of ions that differ significantly from natural ocean waters. For this reason, the marine test species are usually not adaptable for testing freshwater modified by other salt mixes. Other issues to consider are the buoyancy and mixing of freshwater streams discharged to the marine environment and the use of diffusers to enhance mixing. These factors influence instream dilution and the concentrations of wastewater that should be considered.

EPA’s marine toxicity test methods allow for testing of freshwater effluent discharges to marine or estuarine receiving waters by requiring that the salinity of the effluent be adjusted to approximate the salinity of the receiving water. Salinity adjustment is necessary when effluent concentrations to be tested are high enough to reduce test dilution salinity below the acceptable range. To maintain acceptable salinity, these higher test concentrations of effluent