Title: | SEAWAVE-Q Model |
---|---|
Description: | A model and utilities for analyzing trends in chemical concentrations in streams with a seasonal wave (seawave) and adjustment for streamflow (Q) and other ancillary variables. See Ryberg and York, 2020, <doi:10.3133/ofr20201082>. |
Authors: | Karen R. Ryberg [aut, cre], Aldo V. Vecchia [aut], Benjamin York [ctb] |
Maintainer: | Karen R. Ryberg <[email protected]> |
License: | Unlimited | file LICENSE |
Version: | 2.0.2 |
Built: | 2024-11-15 04:52:39 UTC |
Source: | https://github.com/cran/seawaveQ |
An R package for the U.S. Geological Survey seawaveQ model, a parametric regression model specifically designed for analyzing seasonal- and flow-related variability and trends in pesticide concentrations. See Vecchia and others (2008) for the original description of the model and see Sullivan and others (2009), Vecchia and others (2009), Ryberg and others (2010), Ryberg and others (2014), Ryberg and Gilliom (2015), and Oelsner and others (2017) and for applications of the model.
Package: | seawaveQ |
Type: | Package |
Version: | 2.0.2 |
Date: | 2020--12--14 |
License: | Unlimited | file LICENSE |
Karen R. Ryberg [email protected] and Aldo V. Vecchia [email protected]
Oelsner, G.P., Sprague, L.A., Murphy, J.C., Zuellig, R.E., Johnson, H.M., Ryberg, K.R., Falcone, J.A., Stets, E.G., Vecchia, A.V., Riskin, M.L., De Cicco, L.A., Mills, T.J., and Farmer, W.H., 2017, Water-quality trends in the nation's rivers and streams, 1972–2012—Data preparation, statistical methods, and trend results: U. S. Geological Survey Scientific Investigations Report 2017–5006, 136 p., https://doi.org/10.3133/sir20175006.
Ryberg, K.R. and Gilliom, R.J., 2015, Trends in pesticide concentrations and use for major rivers of the United States: Science of The Total Environment, v. 538, p. 431–444, https://doi.org/10.1016/j.scitotenv.2015.06.095.
Ryberg, K.R. and Vecchia, A.V., 2013, seawaveQ—An R package providing a model and utilities for analyzing trends in chemical concentrations in streams with a seasonal wave (seawave) and adjustment for streamflow (Q) and other ancillary variables: U.S. Geological Survey Open-File Report 2013–1255, 13 p., with 3 appendixes, https://dx.doi.org/10.3133/ofr20131255.
Ryberg, K.R. and York, B.C., 2020, seawaveQ—An R package providing a model and utilities for analyzing trends in chemical concentrations in streams with a seasonal wave (seawave) and adjustment for streamflow (Q) and other ancillary variables, Version 2.0.0: U.S. Geological Survey Open-File Report 2020–1082, 25 p., with 4 appendixes.
Ryberg, K.R., Vecchia, A.V., Gilliom, R.J., and Martin, J.D., 2014, Pesticide trends in major rivers of the United States, 1992–2010: U.S. Geological Survey Scientific Investigations Report 2014–5135, 74 p., https://pubs.er.usgs.gov/publication/sir20145135.
Ryberg, K.R., Vecchia, A.V., Martin, J.D., Gilliom, R.J., 2010, Trends in pesticide concentrations in urban streams in the United States, 1992–2008: U.S. Geological Survey Scientific Investigations Report 2010–5139, 101 p., https://pubs.usgs.gov/sir/2010/5139/.
Sullivan, D.J., Vecchia, A.V., Lorenz, D.L., Gilliom, R.J., Martin, J.D., 2009, Trends in pesticide concentrations in corn-belt streams, 1996–2006: U.S. Geological Survey Scientific Investigations Report 2009–5132, 75 p., https://pubs.usgs.gov/sir/2009/5132/.
Vecchia, A.V., Gilliom, R.J., Sullivan, D.J., Lorenz, D.L., and Martin, J.D., 2009, Trends in concentrations and use of agricultural herbicides for Corn Belt rivers, 1996–2006: Environmental Science and Technology, v. 43, no. 24, p. 9096–9102.
Vecchia, A.V., Martin, J.D., and Gilliom, R.J., 2008, Modeling variability and trends in pesticide concentrations in streams: Journal of the American Water Resources Association, v. 44, no. 5, p. 1308–1324, https://dx.doi.org/10.1111/j.1752-1688.2008.00225.x.
Function to generate a scatter plot that indicates censored and estimated water-quality concentrations.
cenScatPlot( data, datescol = "dates", pname, qwcols = c("R", "P"), site = "", xlabel = "", ylabel = "Concentration", legpos = "topright", legcex = 1, ... )
cenScatPlot( data, datescol = "dates", pname, qwcols = c("R", "P"), site = "", xlabel = "", ylabel = "Concentration", legpos = "topright", legcex = 1, ... )
data |
is the dataset with columns that begin with P followed
by alphanumeric characters indicating concentration data and columns
that begin with R followed by alphanumeric characters that match those
of the concentration data indicating qualification codes. See example
datasets for more information about the data format, see
|
datescol |
is the column label for the dates column. |
pname |
is the column heading (parameter name) for the particular water-quality constituent to be plotted (omit the starting character, for example for sulfate data indicated by P00945, enter "00945"). |
qwcols |
is a character vector with the beginning of the column headers for remarks code (default is R), and beginning of column headers for concentration data (default is P for parameter). |
site |
is a label for the plot title indicating the site where the water-quality samples were collected. |
xlabel |
is the label for the x-axis, defaults to no label. |
ylabel |
is the label for the y-axis. |
legpos |
is the position of the legend, see legend. |
legcex |
is a numerical value giving the amount by which the legend text and symbols should be magnified relative to the default, 1. |
... |
arguments to be passed to plot method. |
This function uses the qualification, or remark, column associated with water-quality concentration values to indicate which samples are unqualified, which are estimated, and which are censored. A blank remark field or an "_" indicates that the concentration value is not qualified; an "E" indicates the value has been estimated; and a less than symbol, "<", indicates the value has been censored as less than a minimum reporting level. See Oblinger Childress and others (1999) for information on the minimum reporting level and the definition of "E" for U.S. Geological Survey data. Other users may have a different definition of the minimum reporting level, but censored values need to be qualified with a "<". Using the "E" code is optional.
A scatter plot
Karen R. Ryberg
Oblinger Childress, C.J., Foreman, W.T., Connor, B.F., and Maloney, T.J., 1999, New reporting procedures based on long-term method detection levels and some considerations for interpretations of water-quality data provided by the U.S. Geological Survey: U.S. Geological Survey Open-File Report 99–193, 19 p., https://water.usgs.gov/owq/OFR_99-193/index.html.
data(swData) # scatter plot of Simazine concentrations cenScatPlot(IllRivValleyCty, pname = "04035") # scatter plot with many additional plotting arguments par(las = 1, tcl = 0.5) cenScatPlot(IllRivValleyCty, pname = "04035", site = "05586100 Illinois River at Valley City, IL", ylabel = "Simazine concentration, in micrograms per liter", legcex = 0.7, ylim = c(0, 0.4), yaxs = "i", cex.lab = 0.9, cex.axis = 0.9, xlim = c(as.Date("1996-01-01", "%Y-%m-%d"), as.Date("2012-01-01", "%Y-%m-%d")), xaxs = "i", xaxt = "n") axdates <- c("1996-01-01", "2000-01-01", "2004-01-01", "2008-01-01", "2012-01-01") axis(1, as.Date(axdates, "%Y-%m-%d"), labels = c("1996", "2000", "2004", "2008", "2012"), cex.axis = 0.9)
data(swData) # scatter plot of Simazine concentrations cenScatPlot(IllRivValleyCty, pname = "04035") # scatter plot with many additional plotting arguments par(las = 1, tcl = 0.5) cenScatPlot(IllRivValleyCty, pname = "04035", site = "05586100 Illinois River at Valley City, IL", ylabel = "Simazine concentration, in micrograms per liter", legcex = 0.7, ylim = c(0, 0.4), yaxs = "i", cex.lab = 0.9, cex.axis = 0.9, xlim = c(as.Date("1996-01-01", "%Y-%m-%d"), as.Date("2012-01-01", "%Y-%m-%d")), xaxs = "i", xaxt = "n") axdates <- c("1996-01-01", "2000-01-01", "2004-01-01", "2008-01-01", "2012-01-01") axis(1, as.Date(axdates, "%Y-%m-%d"), labels = c("1996", "2000", "2004", "2008", "2012"), cex.axis = 0.9)
Function to combine water-quality sample data and continuous (daily) ancillary variables and drop unnecessary columns.
combineData(qwdat, cqwdat, qwcols = c("staid", "dates", "R", "P"))
combineData(qwdat, cqwdat, qwcols = c("staid", "dates", "R", "P"))
qwdat |
is the dataset containing water-quality sample data with columns that begin with a P (or other user-defined indicator) followed by alphanumeric characters. These columns are concentration data. In addition there need to be columns that begin with an R (or other user- defined indicator) followed by alphanumeric characters that match those of the associated concentration data. The R columns contain data qualification codes. See example datasets for more information about the data format, IllRivValleyCty and qwMoRivOmaha. This package assumes that the data are in micrograms per liter. |
cqwdat |
is the dataset containing variables that can be used as explanatory variables for the seawaveQ model. See example dataset for more information about the data format cqwMoRivOmaha. These are daily values with no missing values allowed between the first and the last date in the dataset. |
qwcols |
is a character vector with column headings for a station (location) identifier, a dates column identifier, beginning of column headers for remarks code (default is R), and beginning of column headers for concentration data (default is P for parameter). |
A data frame with the number of rows equal to the number of rows in the data frame indicated by qwdat. The number of columns depend on the two input data frames. Minimally there will be a station identification column, a dates column, a column of qualification codes, and a column of water-quality data.
A data frame
The columns indicated by qwcols[1:2] are used to combine the datasets. The first column is the station identifier and the second column is the date column. These two column headings must be the same in the two datasets being combined and the dates in the datasets being combined must be of class Date and use the same format.
Karen R. Ryberg and Aldo V. Vecchia
data(swData) MoRivOmaha <- combineData(qwdat = qwMoRivOmaha, cqwdat = cqwMoRivOmaha, qwcols = c("staid", "dates", "R", "P"))
data(swData) MoRivOmaha <- combineData(qwdat = qwMoRivOmaha, cqwdat = cqwMoRivOmaha, qwcols = c("staid", "dates", "R", "P"))
Function to compute seasonal wave.
compwaveconv(cmaxt, jmod, hlife)
compwaveconv(cmaxt, jmod, hlife)
cmaxt |
is the time of the maximum chemical concentration, decimal time in years. |
jmod |
is the choice of model or pulse input function, an integer 1 through 14. |
hlife |
is the model half-life in months, 1 to 4 months |
A numeric vector of size 361 with discrete values of the
seasonal wave for decimal season seq(0,1,1/360)
.
The seasonal wave is a dimensionless, periodic function of time with an annual cycle, similar to a mixture of sine and cosine functions often used to model seasonality in concentration data. However, the seasonal wave is better suited for modeling seasonal behavior of pesticide data than a mixture of sines and cosines. The pulse input function, represented by jmod, has either one or two distinct application seasons (when pesticides may be transported to the stream) of lengths from 1 to 6 months. Therefore, 56 (14x4) choices for the wave function are available. The numeric vector is a discrete approximation of the continuous wave function defined on the interval 0 to 1.
Aldo V. Vecchia
Vecchia, A.V., Martin, J.D., and Gilliom, R.J., 2008, Modeling variability and trends in pesticide concentrations in streams: JAWRA Journal of the American Water Resources Association, v. 44, p. 1308–1324, https://doi.org/10.1111/j.1752-1688.2008.00225.x.
# evaluate seasonal wave for specified decimal seasons # these example decimal dates represent days at points 0.25, 0.5, and # 0.75 percent of the way through the year and the end of the year dseas <- c(0.25, 0.5, 0.75, 1) swave <- compwaveconv(cmaxt = 0.483, jmod = 2, hlife = 4) swave[floor(360 * dseas)] plot(seq(0, 1, 1/360), swave, typ = "l")
# evaluate seasonal wave for specified decimal seasons # these example decimal dates represent days at points 0.25, 0.5, and # 0.75 percent of the way through the year and the end of the year dseas <- c(0.25, 0.5, 0.75, 1) swave <- compwaveconv(cmaxt = 0.483, jmod = 2, hlife = 4) swave[floor(360 * dseas)] plot(seq(0, 1, 1/360), swave, typ = "l")
Continuously monitored (daily) streamflow and sediment data for U.S. Geological Survey streamgage 06610000 Missouri River at Omaha, Neb., and streamflow and sediment anomalies.
cqwMoRivOmaha
cqwMoRivOmaha
A data frame containing 2,922 daily observations of two hydrologic variables, streamflow and sediment, and streamflow and sediment anomalies. There are eight variables.
staid | character | USGS Station identification number |
dates | date | date water-quality sample collected |
dflow | numeric | daily mean streamflow, cubic feet per second |
flowa30 | numeric | 30-day streamflow anomaly |
flowa1 | numeric | 1-day streamflow anomaly |
dsed | numeric | daily mean sediment concentration, milligrams per liter |
seda30 | numeric | 30-day sediment anomaly |
seda1 | numeric | 1-day sediment anomaly |
The streamflow and sediment anomalies were generated using the R package waterData (Ryberg and Vecchia, 2012).
See Ryberg and Vecchia (2012) for more information on calculating the anomalies and for additional references documenting the use of streamflow anomalies in water-quality trend analysis studies.
Data provided by Patrick Phillips, U.S. Geological Survey, New York Water Science Center.
Ryberg, K.R., and Vecchia, A.V., 2012, waterData—An R package for retrieval, analysis, and anomaly calculation of daily hydrologic time series data, version 1.0.0: U.S. Geological Survey Open-File Report 2012–1168, 8 p. [(]Also available at http://pubs.usgs.gov/of/2012/1168/.]
data(swData) # summary of water-quality concentrations apply(cqwMoRivOmaha[,3:8], 2, summary)
data(swData) # summary of water-quality concentrations apply(cqwMoRivOmaha[,3:8], 2, summary)
This is an example of the continuous ancillary data that is passed
internally to subfunctions of fitswavecav
. It is provided
here for use with examples of internal functions.
examplecavdat
examplecavdat
A data frame containing 2,893 data variables and 30-day and 1-day streamflow anomalies (Ryberg and Vecchia, 2012).
yrx | numeric | Year |
mox | numeric | Month |
dax | numeric | Day |
jdayx | numeric | Julian day from first day water year for start year in fitswavecav |
flowa30 | numeric | 30-day streamflow anomaly |
flowa1 | numeric | 1-day streamflow anomaly |
Internal data captured from the following function call:
fitswavecav(cdat=modMoRivOmaha, cavdat=cqwMoRivOmaha, tanm="myexample", pnames=c("04041"), yrstart=1995, yrend=2003, tndbeg=1995, tndend=2003, iwcav=c("flowa30","flowa1"), dcol="dates", qwcols=c("R","P"))
Ryberg, K.R., and Vecchia, A.V., 2012, waterData–An R package for retrieval, analysis, and anomaly calculation of daily hydrologic time series data, version 1.0: U.S. Geological Survey Open-File Report 2012–1168, 8 p. [Also available at http://pubs.usgs.gov/of/2012/1168/.]
data(swData) head(examplecavdat)
data(swData) head(examplecavdat)
This is an example of the continuous ancillary matrix that is passed
internally to subfunctions of fitswavecav
. It is provided
here for use with examples of internal functions.
examplecavmat
examplecavmat
A matrix containing 115 30-day and 1-day streamflow anomalies (Ryberg and Vecchia, 2012).
flowa30 | numeric | 30-day streamflow anomaly |
flowa1 | numeric | 1-day streamflow anomaly |
Internal data captured from the following function call:
fitswavecav(cdat=modMoRivOmaha, cavdat=cqwMoRivOmaha, tanm="myexample", pnames=c("04041"), yrstart=1995, yrend=2003, tndbeg=1995, tndend=2003, iwcav=c("flowa30","flowa1"), dcol="dates", qwcols=c("R","P"))
data(swData) head(examplecavmat)
data(swData) head(examplecavmat)
This is an example of the water-quality data that is passed
internally to subfunctions of fitswavecav
. It is provided
here for use with examples of internal functions.
examplecdatsub
examplecdatsub
A data frame containing 115 observations of 10 variables. The date variables
were internally calculated. The columns R04041 and P04041 are a subset of
qwMoRivOmaha
and the 30-day and 1-day streamflow and sediment anomalies
are a subset of cqwMoRivOmaha
.
yrc | numeric | Year |
moc | numeric | Month |
dac | numeric | Day |
jdayc | numeric | Julian day from first day of start year in fitswavecav |
flowa30 | numeric | 30-day streamflow anomaly |
flowa1 | numeric | 1-day streamflow anomaly |
seda30 | numeric | 30-day sediment anomaly |
seda1 | numeric | 1-day sediment anomaly |
Internal data captured from the following function call:
fitswavecav(cdat=modMoRivOmaha, cavdat=cqwMoRivOmaha, tanm="myexample", pnames=c("04041"), yrstart=1995, yrend=2003, tndbeg=1995, tndend=2003, iwcav=c("flowa30","flowa1"), dcol="dates", qwcols=c("R","P"))
data(swData) head(examplecdatsub)
data(swData) head(examplecdatsub)
This is an example of data that is passed
internally to subfunctions of fitswavecav
. This logical
vector indicates which water-quality values are censored. It is provided
here for use with examples of the internal functions.
examplecentmp
examplecentmp
A logical vector of 115 observations.
Internal data captured from the following function call:
fitswavecav(cdat=modMoRivOmaha, cavdat=cqwMoRivOmaha, tanm="myexample", pnames=c("04041"), yrstart=1995, yrend=2003, tndbeg=1995, tndend=2003, iwcav=c("flowa30","flowa1"), dcol="dates", qwcols=c("R","P"))
data(swData) examplecentmp
data(swData) examplecentmp
This is an example of data that is used internally by fitMod and
passed to its subfunction seawaveQPlots
. This numeric
vector represents the base-10 logarithm of the water-quality concentrations.
It is provided here for use with examples of the internal functions.
exampleclog
exampleclog
A numeric vector of 115 observations.
Internal data captured from the following function call:
fitswavecav(cdat=modMoRivOmaha, cavdat=cqwMoRivOmaha, tanm="myexample", pnames=c("04041"), yrstart=1995, yrend=2003, tndbeg=1995, tndend=2003, iwcav=c("flowa30","flowa1"), dcol="dates", qwcols=c("R","P"))
data(swData) exampleclog
data(swData) exampleclog
This is an example of the character vector used to indicate which columns represent qualification codes and which represent water-quality concentration data. It is provided here for use with examples of the internal functions.
exampleqwcols
exampleqwcols
A numeric vector of 115 observations.
Internal data captured from the following function call:
fitswavecav(cdat=modMoRivOmaha, cavdat=cqwMoRivOmaha, tanm="myexample", pnames=c("04041"), yrstart=1995, yrend=2003, tndbeg=1995, tndend=2003, iwcav=c("flowa30","flowa1"), dcol="dates", qwcols=c("R","P"))
data(swData) exampleqwcols
data(swData) exampleqwcols
This is an example of data that is passed
internally to subfunctions of fitswavecav
. It is provided
here for use with examples of the internal functions.
examplestpars
examplestpars
A numeric matrix of two rows and 15 columns.
column description
1 mclass, model class, 1 for linear models, 2 for those with restricted cubic splines
2 model chosen (a number 1-56), this number represents both the pulse input function and the half-life
3 is the scale factor from the survreg.object
4 is the likelihood for the model chosen
5 is the coefficient for the model intercept
6 is the coefficient for the seasonal wave component of the model
7 is the coefficient for the trend component of the model
8 is the coefficient for the 30-day flow anomaly
9 is the coefficient for the 1-day flow anomaly
10 is the standard error for the intercept term
11 is the standard error for the seasonal wave term
12 is the standard error for the trend term
13 is the standard error for the 30-day flow anomaly term
14 is the standard error for the 1-day flow anomaly term
15 is cmaxt, the decimal season of maximum concentration
Internal data captured from the following function call:
fitswavecav(cdat=modMoRivOmaha, cavdat=cqwMoRivOmaha, tanm="myexample", pnames=c("04041"), yrstart=1995, yrend=2003, tndbeg=1995, tndend=2003, iwcav=c("flowa30","flowa1"), dcol="dates", qwcols=c("R","P"))
data(swData) examplestpars
data(swData) examplestpars
This is an example of data that is passed
internally to seawaveQPlots
. This numeric
vector contains trend coefficients for the water-quality samples.
It is provided here for use with examples of the internal functions.
exampletndlin
exampletndlin
A numeric vector of 115 observations.
Internal data captured from the following function call:
fitswavecav(cdat=modMoRivOmaha, cavdat=cqwMoRivOmaha, tanm="myexample", pnames=c("04041"), yrstart=1995, yrend=2003, tndbeg=1995, tndend=2003, iwcav=c("flowa30","flowa1"), dcol="dates", qwcols=c("R","P"))
data(swData) head(exampletndlin)
data(swData) head(exampletndlin)
This is an example of data that is passed
internally to seawaveQPlots
. This numeric
vector contains trend coefficients for a continuous water-quality prediction
based on the continuous ancillary variables.
It is provided here for use with examples of the internal functions.
exampletndlinpr
exampletndlinpr
A numeric vector of 2,893 observations.
Internal data captured from the following function call:
fitswavecav(cdat=modMoRivOmaha, cavdat=cqwMoRivOmaha, tanm="myexample", pnames=c("04041"), yrstart=1995, yrend=2003, tndbeg=1995, tndend=2003, iwcav=c("flowa30","flowa1"), dcol="dates", qwcols=c("R","P"))
data(swData) head(exampletndlinpr)
data(swData) head(exampletndlinpr)
This is an example of data that is passed
internally to seawaveQPlots
. This numeric
vector contains decimal seasonal (0-1) values for the water-quality
samples. It is provided here for use with examples of the internal
functions.
exampletseas
exampletseas
A numeric vector of 115 observations.
Internal data captured from the following function call:
fitswavecav(cdat=modMoRivOmaha, cavdat=cqwMoRivOmaha, tanm="myexample", pnames=c("04041"), yrstart=1995, yrend=2003, tndbeg=1995, tndend=2003, iwcav=c("flowa30","flowa1"), dcol="dates", qwcols=c("R","P"))
data(swData) head(exampletseas)
data(swData) head(exampletseas)
This is an example of data that is passed
internally to seawaveQPlots
. This numeric
vector contains decimal seasonal (0-1) values for the continuous
ancillary data. It is provided here for use with examples of the internal
functions.
exampletseaspr
exampletseaspr
A numeric vector of 2,893 observations.
Internal data captured from the following function call:
fitswavecav(cdat=modMoRivOmaha, cavdat=cqwMoRivOmaha, tanm="myexample", pnames=c("04041"), yrstart=1995, yrend=2003, tndbeg=1995, tndend=2003, iwcav=c("flowa30","flowa1"), dcol="dates", qwcols=c("R","P"))
data(swData) head(exampletseaspr)
data(swData) head(exampletseaspr)
This is an example of data that is passed
internally to seawaveQPlots
. This numeric
vector contains decimal dates for the water-quality samples.
It is provided here for use with examples of the internal functions.
exampletyr
exampletyr
A numeric vector of 115 observations.
Internal data captured from the following function call:
fitswavecav(cdat=modMoRivOmaha, cavdat=cqwMoRivOmaha, tanm="myexample", pnames=c("04041"), yrstart=1995, yrend=2003, tndbeg=1995, tndend=2003, iwcav=c("flowa30","flowa1"), dcol="dates", qwcols=c("R","P"))
data(swData) head(exampletyr)
data(swData) head(exampletyr)
This is an example of data that is passed
internally to seawaveQPlots
. This numeric
vector contains decimal dates for continuous ancillary variables.
It is provided here for use with examples of the internal functions.
exampletyrpr
exampletyrpr
A numeric vector of 2,893 observations.
Internal data captured from the following function call:
fitswavecav(cdat=modMoRivOmaha, cavdat=cqwMoRivOmaha, tanm="myexample", pnames=c("04041"), yrstart=1995, yrend=2003, tndbeg=1995, tndend=2003, iwcav=c("flowa30","flowa1"), dcol="dates", qwcols=c("R","P"))
data(swData) head(exampletyrpr)
data(swData) head(exampletyrpr)
fitMod is called from within fitswavecav but can be invoked directly. It fits the seawaveQ model and returns the results.
fitMod( cdatsub, cavdat, yrstart, yrend, tndbeg, tndend, tanm, pnames, qwcols, mclass = 1, numk = 4, plotfile = FALSE, textfile = FALSE )
fitMod( cdatsub, cavdat, yrstart, yrend, tndbeg, tndend, tanm, pnames, qwcols, mclass = 1, numk = 4, plotfile = FALSE, textfile = FALSE )
cdatsub |
is the concentration data. |
cavdat |
is the continuous (daily) ancillary data. |
yrstart |
is the starting year of the analysis (treated as January 1 of that year). |
yrend |
is the ending year of the analysis (treated as December 31 of that year). |
tndbeg |
is the beginning (in whole or decimal years) of the trend period. |
tndend |
is the end (in whole or decimal years) of the trend period. |
tanm |
is a character identifier that names the trend analysis run. It is used to label output files. |
pnames |
is the parameter (water-quality constituents) to analyze (if using USGS parameters, omit the starting 'P', such as "00945" for sulfate). |
qwcols |
is a character vector with the beginning of the column headers for remarks code (default is R), and beginning of column headers for concentration data (default is P for parameter). |
mclass |
indicates the class of model to use. A class 1 model is the the traditional SEAWAVE-Q model that has a linear time trend. A class 2 model is a newer option for longer trend periods that uses a set of restricted cubic splines on the time variable to provide a more flexible model. The default is 1. (Harrell, 2010, 2018). |
numk |
is the number of knots in the restricted cubic spline model (mclass = 2). The default is 4, and the recommended number is 3–7. |
plotfile |
is by default FALSE. True will write pdf files of plots to the user's file system. |
textfile |
is by default FALSE. True will write text output files to the user's file system. These files are useful for detailed model comparisons, documenting session information, and for model archives. |
A PDF file (if plotfile is TRUE) containing plots (see
seawaveQPlots
), a text file showing the best model survival
regression call and results, and a list. The first element of the list
contains information about the data and the model(s) selected (see
examplestpars
). The second element of the list contains
the summary of the survival regression call. The third element of the
list is itself a list containing the observed concentrations (censored
and uncensored) and the predicted concentrations used by
seawaveQPlots
or seawaveQPlots2
to generate the
plots.
Aldo V. Vecchia and Karen R. Ryberg
Allison, P.D., 1995, Survival analysis using the SAS system—A practical guide: Cary, N.C., SAS Institute, Inc., 304 p.
Harrell, F.E., Jr., 2010, Regression modeling strategies—With applications to linear models, logistic regression, and survival analysis: New York, Springer-Verlag, 568 p.
Harrell, F.E., Jr., 2018, rms—Regression modeling strategies: R package version 5.1-2, https://CRAN.R-project.org/package=rms.
data(swData) myRes <- fitMod(cdatsub = examplecdatsub, cavdat = examplecavdat, yrstart = 1995, yrend = 2003, tndbeg = 1995, tndend = 2003, tanm = "myfit3", pnames = c("04041"), qwcols = c("R", "P"), plotfile = FALSE, textfile = FALSE)
data(swData) myRes <- fitMod(cdatsub = examplecdatsub, cavdat = examplecavdat, yrstart = 1995, yrend = 2003, tndbeg = 1995, tndend = 2003, tanm = "myfit3", pnames = c("04041"), qwcols = c("R", "P"), plotfile = FALSE, textfile = FALSE)
Function to prepare data and fit the seawaveQ model.
fitswavecav( cdat, cavdat, tanm = "trend1", pnames, yrstart = 0, yrend = 0, tndbeg = 0, tndend = 0, iwcav = c("none"), dcol = "dates", qwcols = c("R", "P"), mclass = 1, numk = 4, alpha = 0.1, bootRCS = FALSE, nboot = 1000, plotfile = FALSE, textfile = FALSE )
fitswavecav( cdat, cavdat, tanm = "trend1", pnames, yrstart = 0, yrend = 0, tndbeg = 0, tndend = 0, iwcav = c("none"), dcol = "dates", qwcols = c("R", "P"), mclass = 1, numk = 4, alpha = 0.1, bootRCS = FALSE, nboot = 1000, plotfile = FALSE, textfile = FALSE )
cdat |
is the concentration data |
cavdat |
is the continuous (daily) ancillary data |
tanm |
is a character identifier that names the trend analysis run. It is used to label output files. |
pnames |
are the parameters (water-quality constituents) to analyze (omit the starting character, for example for sulfate data indicated by P00945, enter "00945"). |
yrstart |
is the starting year of the analysis (treated as January 1 of that year). Zero means the start date will be determined by the start date of cavdat, the continuous ancillary data. |
yrend |
is the ending year of the analysis (treated as December 31 of that year). Zero means the end date will be determined by the end date of cavdat, the continuous ancillary data. |
tndbeg |
is the beginning (in whole or decimal years) of the trend period. Zero means the begin date will be the beginning of the concentration data, cdat. |
tndend |
is the end of the trend (treated as December 31 of that year). Zero means the end date will be the end of the concentration data, cdat. |
iwcav |
is a character vector indicating which continuous ancillary variables to include, if none are used for analysis, use iwcav=c("none"). |
dcol |
is the column name for the dates, should be the same for both cdat and cavdat |
qwcols |
is a character vector with the beginning of the column headers for remarks code (default is R), and beginning of column headers for concentration data (default is P for parameter). |
mclass |
indicates the class of model to use. A class 1 model is the traditional SEAWAVE-Q model that has a linear time trend. A class 2 model is a newer option for longer trend periods that uses a set of restricted cubic splines on the time variable to provide a more flexible model. |
numk |
is the number of knots in the restricted cubic spline model. The default is 4, and the recommended number is 3–7. |
alpha |
is the significance level or alpha values for statistical significance and confidence intervals. |
bootRCS |
is a logical value indicating whether or not to perform block bootstrapping for an attained significance level for the trend with restricted cubic splines. No bootstrapping is performed for the linear trend model. |
nboot |
is the number of bootstrap replicates. A large number, 10,000, is recommended, but this takes a long time. |
plotfile |
is by default FALSE. True will write pdf files of plots to the user's file system. |
textfile |
is by default FALSE. True will write text output files to the user's file system. These files are useful for detailed model comparisons, documenting session information, and for model archives. |
The data frame returned as the first element of the output list has
one row for each chemical analyzed and the number of columns depends on the
number of continuous ancillary variables used. The general format is as
follows:
pname | character | parameter analyzed |
mclass | numeric | a value of 1 or 2 |
jmod | numeric | the choice of pulse input function, an integer 1--14 |
hlife | numeric | the model half-life in months, an integer, 1 to 4 months |
cmaxt | numeric | the decimal season of maximum concentration |
scl | numeric | the scale factor from the
survreg.object |
loglik | numeric | the log-likelihood for the model |
cint | numeric | coefficient for model intercept |
cwave | numeric | coefficient for the seasonal wave |
ctnd[alpahnumeric] | numeric | coefficient(s) for the trend component(s) of model |
c[alphanumeric] | numeric | 0 or more coefficients for the continuous ancillary variables |
seint | numeric | standard error for the intercept |
sewave | numeric | standard error for the seasonal wave |
setnd[alphanumeric] | numeric | standard error for the trend component(s) |
se[alphanumeric] | numeric | 0 or more standard errors for the continuous ancillary variables |
pvaltnd[alphanumeric] | numeric | the p-value for the trend component(s) |
The data frame returned as the sixth element of the output list has
one row for each chemical analyzed. The general format for linear models is
described in pesticideTrendCalcs
. The format for restricted
cubic spline models is as described as follows:
pname | character | parameter analyzed |
mclass | numeric | A value of 1 or 2 |
baseConc | numeric | the concentration at the beginning of the trend period |
endCon | numeric | the concentration at the end of the trend period |
rcsctndPpor | numeric | the concentration trend in percent over the trend period |
rcsctndOrigPORPercentBase | numeric | the conc. trend in original units over period of record (calc. based on percent per year and base conc.) |
pvalrcstnd | numeric | p-value, attained significance level, based on bootstrapping |
ctndlklhd | numeric | trend likelihood |
Fits the seawaveQ model (Vecchia and others, 2008) using a seasonal wave and continuous ancillary variables (streamflow anomalies and other continuous variables such as conductivity or sediment) to model water quality. The version in the 2.0.0 update to the R package has an option to use restricted cubic splines as a more flexible definition of the temporal trend.
A PDF file containing plots of the data and modeled
concentration, a text file containing a summary of the survival
regression call for each model selected, and a list. The first element
of the list is a data frame described under format. The second element
of the list is the summary of the survival regression call. The third
element is the observed concentration data (censored and uncensored).
The fourth element is the concentration data predicted by the model.
The fifth element provides summary statistics for the predicted
concentrations. The sixth element is a data frame that provides a summary of
the trends with the columns pname (parameter name), mclass (a value of 1,
indicating a linear trend model, a value of 2 for models using restricted
cubic splines), and the columns describing the trends. For linear models
see pesticideTrendCalcs
. For models with restricted cubic
splines, see the format section. See Ryberg and York (2020) for additional
details.
The assumed data format is one with columns for water-quality
concentration values and a related column for qualification of
those values, such as in the case of left-censored values less
than a particular value. For example, a water-quality sample
was collected and the laboratory analysis indicated that the
concentration was less than 0.01 micrograms per liter. The
USGS parameter code for simazine is 04035 (U.S. Geological Survey,
2018b). When the data are retrieved through the National Water
Information System: Web Interface
(https://waterdata.usgs.gov/nwis; U.S. Geological Survey, 2018a),
the concentration values are in a column labeled P04035 and the
qualification information, or remark codes, are in a column labeled
R04035. To use this function, the argument pnames would be the unique
identifier for simazine values and qualifications, 04035, and the
qwcols argument would be c("R", "P") to indicate that the
qualification column starts with an R and the values column starts with
a P.
Other users may have data in different formats that can be
modified to use with this function. For example, a user may have
concentration values and qualification codes in one column, such
as a column labeled simazine with the values 0.05, 0.10, <0.01,
<0.01, and 0.90. In this case, the less thans and any other
qualification codes should be placed in a separate column. The
column names for the qualification codes and the concentration values
should be the same with the exception of different beginning
letters to indicate which column is which. The columns could be
named Rsimazine and Psimazine. Then the argument pnames = "simazine"
and the argument qwcols = c("R", "P").
Users should exercise caution when their water-quality data have multiple censoring limits and may want to recensor the data to a single censoring level. Censoring and recensoring issues are discussed in the text and Appendix 1 of Ryberg and others (2010).
None of the variations of SEAWAVE is a simple model. The model complexity increases with flow anomalies and with the addition of restricted cubic splines. As the number of parameters in the model increases or the degree of censoring increases, the sample size must also increase. See Ryberg and others (2020) for more details on sample size.
Aldo V. Vecchia and Karen R. Ryberg
Ryberg, K.R. and York, B.C., 2020, seawaveQ—An R package providing a model and utilities for analyzing trends in chemical concentrations in streams with a seasonal wave (seawave) and adjustment for streamflow (Q) and other ancillary variables: U.S. Geological Survey Open-File Report 2020–1082, 25 p., with 4 appendixes.
Ryberg, K.R., Vecchia, A.V., Martin, J.D., and Gilliom, R.J., 2010, Trends in pesticide concentrations in urban streams in the United States, 1992–2008: U.S. Geological Survey Scientific Investigations Report 2010–5139, 101 p., https://pubs.usgs.gov/sir/2010/5139/.
U.S. Geological Survey, 2018a, National Water Information System: Web Interface, accessed July 7, 2018, at https://waterdata.usgs.gov/nwis.
U.S. Geological Survey, 2018b, Parameter code definition: National Water Information System: Web Interface, accessed July 18, 2018, at https://nwis.waterdata.usgs.gov/usa/nwis/pmcodes.
Vecchia, A.V., Martin, J.D., and Gilliom, R.J., 2008, Modeling variability and trends in pesticide concentrations in streams: Journal of the American Water Resources Association, v. 44, no. 5, p. 1308–1324, https://dx.doi.org/10.1111/j.1752-1688.2008.00225.x.
The functions that fitswavecav
calls internally: prepData
and fitMod
.
data(swData) modMoRivOmaha <- combineData(qwdat = qwMoRivOmaha, cqwdat = cqwMoRivOmaha) myfitLinearTrend <- fitswavecav(cdat = modMoRivOmaha, cavdat = cqwMoRivOmaha, tanm = "myfitLinearTrend", pnames = c("04035", "04037", "04041"), yrstart = 1995, yrend = 2003, tndbeg = 1995, tndend = 2003, iwcav = c("flowa30", "flowa1"), dcol = "dates", qwcols = c("R", "P")) # trend model results myfitLinearTrend[[1]] # example regression call myfitLinearTrend[[2]][[1]] # first few lines of observed concentrations head(myfitLinearTrend[[3]]) # first few lines of predicted concentrations head(myfitLinearTrend[[4]]) # summary statistics for predicted concentrations myfitLinearTrend[[5]] # summary of trends myfitLinearTrend[[6]] myfitRCSTrend <- fitswavecav(cdat = modMoRivOmaha, cavdat = cqwMoRivOmaha, tanm = "myfitRCSTrend", pnames = c("04035", "04037", "04041"), yrstart = 1995, yrend = 2003, tndbeg = 1995, tndend = 2003, iwcav = c("flowa30", "flowa1"), dcol = "dates", qwcols = c("R", "P"), mclass = 2, numk = 4, bootRCS = FALSE, plotfile = FALSE, textfile = FALSE)
data(swData) modMoRivOmaha <- combineData(qwdat = qwMoRivOmaha, cqwdat = cqwMoRivOmaha) myfitLinearTrend <- fitswavecav(cdat = modMoRivOmaha, cavdat = cqwMoRivOmaha, tanm = "myfitLinearTrend", pnames = c("04035", "04037", "04041"), yrstart = 1995, yrend = 2003, tndbeg = 1995, tndend = 2003, iwcav = c("flowa30", "flowa1"), dcol = "dates", qwcols = c("R", "P")) # trend model results myfitLinearTrend[[1]] # example regression call myfitLinearTrend[[2]][[1]] # first few lines of observed concentrations head(myfitLinearTrend[[3]]) # first few lines of predicted concentrations head(myfitLinearTrend[[4]]) # summary statistics for predicted concentrations myfitLinearTrend[[5]] # summary of trends myfitLinearTrend[[6]] myfitRCSTrend <- fitswavecav(cdat = modMoRivOmaha, cavdat = cqwMoRivOmaha, tanm = "myfitRCSTrend", pnames = c("04035", "04037", "04041"), yrstart = 1995, yrend = 2003, tndbeg = 1995, tndend = 2003, iwcav = c("flowa30", "flowa1"), dcol = "dates", qwcols = c("R", "P"), mclass = 2, numk = 4, bootRCS = FALSE, plotfile = FALSE, textfile = FALSE)
Scatterplots of water-quality data for 05586100 Illinois River at
Valley City, Ill.
IllRivValleyCty
IllRivValleyCty
A data frame containing 168 water-quality samples for 4 constituents. There are 20 variables.
site | character | Site abbreviation for study |
staid | character | USGS Station identification number |
dates | date | Date water-quality sample collected |
yrc | numeric | Year |
moc | numeric | Month |
dac | numeric | Day |
jdayc | numeric | Julian day from first day of associated streamflow data used |
R04035 | character | Remark code (blank, _, <, or E) |
P04035 | numeric | Simazine, water, filtered, recoverable, micrograms per liter |
R04037 | character | Remark code (blank, _, <, or E) |
P04037 | numeric | Prometon, water, filtered, recoverable, micrograms per liter |
R82630 | character | Remark code (blank, _, <, or E) |
P82630 | numeric | Metribuzin, water, filtered, recoverable, micrograms per liter |
R82668 | character | Remark code (blank, _, <, or E) |
P82668 | numeric | EPTC, water, filtered (0.7 micron glass fiber filter), recoverable, micrograms per liter |
dflow | numeric | Streamflow, cubic feet per second |
flowa30 | numeric | 30-day streamflow anomaly |
flowa1 | numeric | 1-day streamflow anomaly |
dsed | numeric | Sediment concentration, milligrams per liter |
seda30 | numeric | 30-day sediment anomaly |
seda1 | numeric | 1-day sediment anomaly |
Chemical concentration data are in the columns that start with a P and are followed by a number. Qualification codes for the concentration data are in the columns that start with an R followed by the same numbers as the associated concentration data. For example, column P04035 indicates simazine data, 04035, being the U.S. Geological Survey parameter code for simazine. The qualification codes for the simazine concentrations are found in the column R04035, indicating a U.S. Geological Survey remark code. Remark codes include _ or nothing, indicating no qualification of the value in the associated concentration field; <, indicating a censored value that is less than the number reported in the associated concentration field; and E, indicating that the value has been estimated. See Oblinger Childress and others (1999) for information on the remark codes used by the U.S. Geological Survey. The streamflow and sediment anomalies were generated using the R package waterData (Ryberg and Vecchia, 2012).
Data provided by Patrick Phillips, U.S. Geological Survey, New York Water Science Center.
Oblinger Childress, C.J., Foreman, W.T., Connor, B.F., and Maloney, T.J., 1999, New reporting procedures based on long-term method detection levels and some considerations for interpretations of water-quality data provided by the U.S. Geological Survey Open-File Report 99–193, 19 p. [Also available at https://water.usgs.gov/owq/OFR_99-193/index.html.]
data(swData) # summary of water-quality concentrations apply(IllRivValleyCty[, grep("P[[:digit:]]", dimnames(IllRivValleyCty)[[2]])], 2, summary) # scatter plot of simazine concentrations cenScatPlot(IllRivValleyCty, pname = "04035") # Simazine scatter plot with many additional plotting arguments par(las = 1, tcl = 0.5) cenScatPlot(IllRivValleyCty, pname = "04035", site = "05586100 Illinois River at Valley City, Ill.", ylabel = "Simazine concentration, in micrograms per liter", legcex = 0.7, ylim = c(0, 0.4), yaxs = "i", xlim = c(as.Date("1996-01-01"), as.Date("2012-01-01")), xaxs = "i", xaxt = "n") axdates <- c("1996-01-01", "2000-01-01", "2004-01-01", "2008-01-01", "2012-01-01") axis(1, as.Date(axdates), labels = c("1996", "2000", "2004", "2008", "2012")) # Prometon scatter plot cenScatPlot(IllRivValleyCty, pname = "04037", site = "05586100 Illinois River at Valley City, Ill.", ylabel = "Prometon concentration, in micrograms per liter", legcex = 0.7, ylim = c(0, 0.06), yaxs = "i", xlim=c(as.Date("1996-01-01"), as.Date("2012-01-01")), xaxs = "i", xaxt = "n") axdates<-c("1996-01-01", "2000-01-01", "2004-01-01", "2008-01-01", "2012-01-01") axis(1, as.Date(axdates), labels = c("1996", "2000", "2004", "2008","2012")) # Metribuzin scatter plot cenScatPlot(IllRivValleyCty, pname="82630", site="05586100 Illinois River at Valley City, Ill.", ylabel="Metribuzin concentration, in micrograms per liter", legcex=0.7, ylim=c(0,0.3), yaxs="i", cex.lab=0.9, cex.axis=0.9, xlim=c(as.Date("1996-01-01"), as.Date("2012-01-01")), xaxs="i", xaxt="n") axdates<-c("1996-01-01", "2000-01-01", "2004-01-01", "2008-01-01", "2012-01-01") axis(1, as.Date(axdates), labels=c("1996", "2000", "2004", "2008", "2012"), cex.axis=0.9) # EPTC scatter plot cenScatPlot(IllRivValleyCty, pname = "82668", site = "05586100 Illinois River at Valley City, Ill.", ylabel = "EPTC concentration, in micrograms per liter", legcex = 0.7, ylim = c(0, 0.08), yaxs = "i", xlim=c(as.Date("1996-01-01"), as.Date("2012-01-01")), xaxs = "i", xaxt = "n") axdates <- c("1996-01-01", "2000-01-01", "2004-01-01", "2008-01-01", "2012-01-01") axis(1, as.Date(axdates), labels = c("1996", "2000", "2004", "2008","2012"))
data(swData) # summary of water-quality concentrations apply(IllRivValleyCty[, grep("P[[:digit:]]", dimnames(IllRivValleyCty)[[2]])], 2, summary) # scatter plot of simazine concentrations cenScatPlot(IllRivValleyCty, pname = "04035") # Simazine scatter plot with many additional plotting arguments par(las = 1, tcl = 0.5) cenScatPlot(IllRivValleyCty, pname = "04035", site = "05586100 Illinois River at Valley City, Ill.", ylabel = "Simazine concentration, in micrograms per liter", legcex = 0.7, ylim = c(0, 0.4), yaxs = "i", xlim = c(as.Date("1996-01-01"), as.Date("2012-01-01")), xaxs = "i", xaxt = "n") axdates <- c("1996-01-01", "2000-01-01", "2004-01-01", "2008-01-01", "2012-01-01") axis(1, as.Date(axdates), labels = c("1996", "2000", "2004", "2008", "2012")) # Prometon scatter plot cenScatPlot(IllRivValleyCty, pname = "04037", site = "05586100 Illinois River at Valley City, Ill.", ylabel = "Prometon concentration, in micrograms per liter", legcex = 0.7, ylim = c(0, 0.06), yaxs = "i", xlim=c(as.Date("1996-01-01"), as.Date("2012-01-01")), xaxs = "i", xaxt = "n") axdates<-c("1996-01-01", "2000-01-01", "2004-01-01", "2008-01-01", "2012-01-01") axis(1, as.Date(axdates), labels = c("1996", "2000", "2004", "2008","2012")) # Metribuzin scatter plot cenScatPlot(IllRivValleyCty, pname="82630", site="05586100 Illinois River at Valley City, Ill.", ylabel="Metribuzin concentration, in micrograms per liter", legcex=0.7, ylim=c(0,0.3), yaxs="i", cex.lab=0.9, cex.axis=0.9, xlim=c(as.Date("1996-01-01"), as.Date("2012-01-01")), xaxs="i", xaxt="n") axdates<-c("1996-01-01", "2000-01-01", "2004-01-01", "2008-01-01", "2012-01-01") axis(1, as.Date(axdates), labels=c("1996", "2000", "2004", "2008", "2012"), cex.axis=0.9) # EPTC scatter plot cenScatPlot(IllRivValleyCty, pname = "82668", site = "05586100 Illinois River at Valley City, Ill.", ylabel = "EPTC concentration, in micrograms per liter", legcex = 0.7, ylim = c(0, 0.08), yaxs = "i", xlim=c(as.Date("1996-01-01"), as.Date("2012-01-01")), xaxs = "i", xaxt = "n") axdates <- c("1996-01-01", "2000-01-01", "2004-01-01", "2008-01-01", "2012-01-01") axis(1, as.Date(axdates), labels = c("1996", "2000", "2004", "2008","2012"))
Function to calculate pesticide loads in kilograms per year and summarize trends.
loadCalculations( dailyDat, pestPredict, modRes, concTrends, yrtype = 1, alpha = 0.1 )
loadCalculations( dailyDat, pestPredict, modRes, concTrends, yrtype = 1, alpha = 0.1 )
dailyDat |
is the daily streamflow data in the form of a data frame with three columns representing a station ID, date, and streamflow. |
pestPredict |
is the continuous (daily) estimation of pesticide
concentrations for one or more pesticides at a single site. This should be
in the form of the fourth element of the list returned by |
modRes |
is the first element of the list returned by |
concTrends |
the SEAWAVE-Q trend in flow-normalized annual load. Cannot be different (computationally) from the trend in flow-normalized annual concentration when there is no trend in flow (Oelsner and others, 2017). |
yrtype |
allows one to calculate annual loads based on a calendar year or a water year, where a water year is the 12-month period October 1 through September 30 designated by the calendar year in which it ends. A yrtype of 1 represents a calendar year and is the default because that is the way the original model was developed. A yrtype of 2 represents a water year. |
alpha |
is the significance level or alpha value for statistical significance and confidence intervals. |
The first data frame returned has one row for each pesticide-year at
a particular site and four columns. The general format is as follows:
pstaid | character | the station identification number |
pcode | character | the parameter code for which a load was calculated |
year or wyear | numeric | the year or water year for which a load was calculated |
load | numeric | the load in kilograms per year |
The second data frame returned has one row for each pesticide at
a particular site and 11 columns. The general format is as follows:
pcode | character | the parameter code for which a load trend was calculated |
mclass | numeric | a value of 1 or 2 |
mclass | numeric | a value of 1 or 2 |
alpha | numeric | a significance level |
ltndPpor | numeric | the load trend in percent over the period of record |
luciPpor | numeric | the load upper confidence interval for the trend in |
percent over the period of record | ||
llciPpor | numeric | the load lower confidence interval for the trend in |
percent over the period of record | ||
baseLoad | numeric | the base load, the load for the first year of trend period |
ltndOrigPORPercentBase | numeric | the load trend in original units over |
the period of record | ||
(calculation based on percent per year and base load) | ||
luciOrigPORPercentBase | numeric | the load trend upper confidence interval |
for the trend in original units over the period of record | ||
(calculation based on percent per year and base load) | ||
llciOrigPORPercentBase | numeric | the load trend lower confidence interval |
for the trend in original units over the period of record | ||
(calculation based on percent per year and base load) | ||
ltndlklhd | numeric | is the load trend likelihood |
Parameter load (mass) is the product of water-quality concentration (a mass per volume) and an associated streamflow rate (volume per time). This function generates an annual time series of pesticide loads on either a calendar year basis or a water year basis and summarizes load trends.
Two data frames, the first contains the annual loads, the second contains the trend summary.
In this load calculation function, daily pesticide concentration
estimates provided by the fitswavecav
function are corrected for
retransformation bias (the concentration model is built on the base-10
logarithm of concentration; therefore, a bias correction is required when
transforming back to the original units) and then used to calculate daily
loads. The bias correction is based on the quasi-maximum likelihood estimator
(Cohn and others, 1989) that was developed for natural logarithms, with an
adjustment for the base-10 logarithm of the concentration. To calculate
loads, the bias-corrected concentration estimates (assumed to be in
micrograms per liter) are multiplied by daily streamflow and a constant,
0.892998605, which converts the load units
(micrograms per liter * cubic feet per second) to kilograms per year.
Daily loads are summed to annual values. See page 70 and equation 26 of
Oelsner and others (2017) for further details regarding the load
calculation and bias correction.
Users may modify this function to convert to units other than kilograms
per year.
Karen R. Ryberg
Cohn, T.A., DeLong, L.L., Gilroy, E.J., Hirsch, R.M., and Wells, D.K., 1989, Estimating constituent loads: Water Resources Research, v. 25, no. 5, p. 937–942.
Oelsner, G.P., Sprague, L.A., Murphy, J.C., Zuellig, R.E., Johnson, H.M., Ryberg, K.R., Falcone, J.A., Stets, E.G., Vecchia, A.V., Riskin, M.L., De Cicco, L.A., Mills, T.J., and Farmer, W.H., 2017, Water-quality trends in the Nation's rivers and streams, 1972–2012—Data preparation, statistical methods, and trend results (ver. 2.0, October 2017): U.S. Geological Survey Scientific Investigations Report 2017–5006, 136 p., https://doi.org/10.3133/sir20175006.
data(swData) modMoRivOmaha <- combineData(qwdat = qwMoRivOmaha, cqwdat = cqwMoRivOmaha) myfit1 <- fitswavecav(cdat = modMoRivOmaha, cavdat = cqwMoRivOmaha, tanm = "myfit1", pnames = c("04035", "04037", "04041"), yrstart = 1995, yrend = 2003, tndbeg = 1995, tndend = 2003, iwcav = c("flowa30", "flowa1"), dcol = "dates", qwcols = c("R", "P")) MoRivOmahaLoadsYr <- loadCalculations(cqwMoRivOmaha[, 1:3], myfit1[[4]], myfit1[[1]], myfit1[[6]]) MoRivOmahaLoadsYr
data(swData) modMoRivOmaha <- combineData(qwdat = qwMoRivOmaha, cqwdat = cqwMoRivOmaha) myfit1 <- fitswavecav(cdat = modMoRivOmaha, cavdat = cqwMoRivOmaha, tanm = "myfit1", pnames = c("04035", "04037", "04041"), yrstart = 1995, yrend = 2003, tndbeg = 1995, tndend = 2003, iwcav = c("flowa30", "flowa1"), dcol = "dates", qwcols = c("R", "P")) MoRivOmahaLoadsYr <- loadCalculations(cqwMoRivOmaha[, 1:3], myfit1[[4]], myfit1[[1]], myfit1[[6]]) MoRivOmahaLoadsYr
Internal function to summarize the trend results.
pesticideTrendCalcs( tndbeg, tndend, ctnd, pval, alpha, setnd, scl, baseConc, mclass )
pesticideTrendCalcs( tndbeg, tndend, ctnd, pval, alpha, setnd, scl, baseConc, mclass )
tndbeg |
is the beginning (in whole or decimal years) of the trend period. Zero means the begin date will be the beginning of the concentration data, cdat. |
tndend |
is the end of the trend (treated as December 31 of that year). Zero means the end date will be the end of the concentration data, cdat. |
ctnd |
is the concentration trend, the coefficient on the time variable. |
pval |
is the p-value for the linear trend component. |
alpha |
is the significance level or alpha value for statistical significance and confidence intervals. |
setnd |
is the standard error for the linear trend component. |
scl |
is the scale factor from the |
baseConc |
is the base concentration, the median concentration (midpoint of the trend line) for the first year of the trend analysis . |
mclass |
indicates the class of model to use. A class 1 model is the the traditional SEAWAVE-Q model that has a linear time trend. A class 2 model is a newer option for longer trend periods that uses a set of restricted cubic splines on the time variable to provide a more flexible model. |
The data frame returned has one row for each chemical analyzed
and the number of columns are defined as follows:
pname | character | parameter analyzed |
mclass | numeric | a value of 1 or 2 |
alpha | numeric | a significance level |
ctndPpor | numeric | the concentration trend in percent over the period of record |
cuciPpor | numeric | the concentration upper confidence interval for the trend in |
percent over the period of record | ||
clciPpor | numeric | the concentration lower confidence interval for the trend in |
percent over the period of record | ||
baseConc | numeric | the base concentration, median concentration or midpoint of |
trend line, for first year of trend period | ||
ctndOrigPORPercentBase | numeric | the concentration trend in original units over |
the period of record | ||
(calculation based on percent per year and base concentration) | ||
cuciOrigPORPercentBase | numeric | the concentration trend upper confidence interval |
for the trend in original units over the period of record | ||
(calculation based on percent per year and base concentration) | ||
clciOrigPORPercentBase | numeric | the concentration trend lower confidence interval |
for the trend in original units over the period of record | ||
(calculation based on percent per year and base concentration) | ||
ctndlklhd | numeric | is the concentration trend likelihood |
pesticideTrendCalcs is called from within fitswavecav
The data frame returned has one row for each chemical analyzed and summaries of the trend.
Based on trend calculations used to display and summarize pesticide trends here https://nawqatrends.wim.usgs.gov/swtrends/. A likelihood value that is the functional equivalent of the two-sided p-value associated with the significance level of the trend was determined as follows: Likelihood = (1 - (p-value / 2)), where p-value is the p-value for the trend coefficient (Oelsner and others, 2017).
Karen R. Ryberg
Oelsner, G.P., Sprague, L.A., Murphy, J.C., Zuellig, R.E., Johnson, H.M., Ryberg, K.R., Falcone, J.A., Stets, E.G., Vecchia, A.V., Riskin, M.L., De Cicco, L.A., Mills, T.J., and Farmer, W.H., 2017, Water-quality trends in the Nation's rivers and streams, 1972–2012—Data preparation, statistical methods, and trend results (ver. 2.0, October 2017): U.S. Geological Survey Scientific Investigations Report 2017–5006, 136 p., https://doi.org/10.3133/sir20175006.
prepData is usually called from within fitswavecav but can be invoked directly. It performs some date calculations, removes rows with missing values for concentration or continuous variables, and returns the concentration and continuous ancillary data to be used by fitswavecav and its other internal functions.
prepData(cdat, cavdat, yrstart, yrend, dcol, pnames, iwcav, qwcols)
prepData(cdat, cavdat, yrstart, yrend, dcol, pnames, iwcav, qwcols)
cdat |
is the concentration data. |
cavdat |
is the continuous (daily) ancillary data. |
yrstart |
is the starting year of the analysis (treated as January 1 of that year). Zero means the start date will be determined by the start date of cavdat, the continuous ancillary data. |
yrend |
is the ending year of the analysis (treated as December 31 of that year). Zero means the end date will be determined by the end date of cavdat, the continuous ancillary data. |
dcol |
is the column name for the dates, should be the same for both cdat and cavdat. |
pnames |
are the parameters (water-quality constituents) to analyze (if using USGS parameters, omit the starting 'P', such as "00945" for sulfate). |
iwcav |
is a character variable indicating which continuous ancillary variables to include, if none use iwcav=c("none"). |
qwcols |
is a character vector with the beginning of the column headers for remarks code (default is R), and beginning of column headers for concentration data (default is P for parameter). |
A list. The first element is the concentration data with additional date information, missing values removed, and extra columns removed. The second element is the continuous ancillary data with additional date information, missing values removed, and extra columns removed.
Aldo V. Vecchia and Karen R. Ryberg
data(swData) modMoRivOmaha<-combineData(qwdat=qwMoRivOmaha, cqwdat=cqwMoRivOmaha) preppedDat <- prepData(modMoRivOmaha, cqwMoRivOmaha, yrstart=1995, yrend=2003, dcol="dates", pnames=c("04035", "04037", "04041"), iwcav=c("flowa30","flowa1"), qwcols=c("R","P"))
data(swData) modMoRivOmaha<-combineData(qwdat=qwMoRivOmaha, cqwdat=cqwMoRivOmaha) preppedDat <- prepData(modMoRivOmaha, cqwMoRivOmaha, yrstart=1995, yrend=2003, dcol="dates", pnames=c("04035", "04037", "04041"), iwcav=c("flowa30","flowa1"), qwcols=c("R","P"))
Scatterplots of water-quality data for 06610000 Missouri River at Omaha, Nebr.
qwMoRivOmaha
qwMoRivOmaha
A data frame containing 115 water-quality samples for eight chemical constituents. There are 20 variables.
staid | character | USGS Station identification number |
dates | date | Date water-quality sample collected |
times | numeric | Time sample was collected |
R04035 | character | Remark code (blank, _, <, or E) |
P04035 | numeric | Simazine, water, filtered, recoverable, micrograms per liter |
R04037 | character | Remark code (blank, _, <, or E) |
P04037 | numeric | Prometon, water, filtered, recoverable, micrograms per liter |
R04041 | character | Remark code (blank, _, <, or E) |
P04041 | numeric | Cyanazine, water, filtered, recoverable, micrograms per liter |
R39415 | character | Remark code (blank, _, <, or E) |
P39415 | numeric | Metolachlor, water, filtered, recoverable, micrograms per liter |
R46342 | character | Remark code (blank, _, <, or E) |
P46342 | numeric | Alachlor, water, filtered, recoverable, micrograms per liter |
R82630 | character | Remark code (blank, _, <, or E) |
P82630 | numeric | Metribuzin, water, filtered, recoverable, micrograms per liter |
R82661 | character | Remark code (blank, _, <, or E) |
P82661 | numeric | Trifluralin, water, filtered (0.7 micron glass fiber filter), recoverable, micrograms per liter |
R82668 | character | Remark code (blank, _, <, or E) |
P82668 | numeric | EPTC, water, filtered (0.7 micron glass fiber filter), recoverable, micrograms per liter |
Chemical concentration data are in the columns that start with a P and are followed by a number. Qualification codes for the concentration data are in the columns that start with an R followed by the same numbers as the associated concentration data. For example, column P04035 indicates simazine data, 04035, being the U.S. Geological Survey parameter code for simazine. The qualification codes for the simazine concentrations are found in the column R04035, indicating a U.S. Geological Survey remark code. Remark codes include _ or nothing, indicating no qualification of the value in the associated concentration field; <, indicating a censored value that is less than the number reported in the associated concentration field; and E, indicating that the value has been estimated. See Oblinger Childress and others (1999) for information on the remark codes used by the U.S. Geological Survey.
Data provided by Patrick Phillips, U.S. Geological Survey, New York Water Science Center.
Oblinger Childress, C.J., Foreman, W.T., Connor, B.F., and Maloney, T.J., 1999, New reporting procedures based on long-term method detection levels and some considerations for interpretations of water-quality data provided by the U.S. Geological Survey Open-File Report 99–193, 19 p. [Also available at http://water.usgs.gov/owq/OFR_99-193/index.html.]
data(swData) # summary of water-quality concentrations apply(qwMoRivOmaha[, grep("P[[:digit:]]", dimnames(qwMoRivOmaha)[[2]])], 2, summary) # scatter plot of Simazine concentrations cenScatPlot(qwMoRivOmaha, pname = "04035") # scatter plot with many additional plotting arguments par(las = 1, tcl = 0.5) cenScatPlot(qwMoRivOmaha, pname = "04035", site = "06610000 Missouri River at Omaha, Nebr.", ylabel = "Simazine concentration, in micrograms per liter", legcex = 0.7, qwcols = c("R", "P"), ylim = c(0, 0.1), yaxs = "i", xlim = c(as.Date("1996-01-01"), as.Date("2004-01-01")), xaxs = "i", xaxt = "n") axdates <- c("1996-01-01", "1998-01-01", "2000-01-01", "2002-01-01", "2004-01-01") axis(1, as.Date(axdates), labels = c("1996", "1998", "2000", "2002", "2004")) # Prometon scatter plot cenScatPlot(qwMoRivOmaha, pname = "04037", site = "06610000 Missouri River at Omaha, Nebr.", ylabel = "Prometon concentration, in micrograms per liter", legcex = 0.7, qwcols = c("R", "P"), ylim = c(0, 0.1), yaxs = "i", xlim = c(as.Date("1996-01-01"), as.Date("2004-01-01")), xaxs = "i", xaxt = "n") axdates <- c("1996-01-01", "1998-01-01", "2000-01-01", "2002-01-01", "2004-01-01") axis(1, as.Date(axdates), labels = c("1996", "1998", "2000", "2002", "2004")) # Cyanazine scatter plot cenScatPlot(qwMoRivOmaha, pname = "04041", site = "06610000 Missouri River at Omaha, Nebr.", ylabel = "Cyanazine concentration, in micrograms per liter", legcex = 0.7, qwcols = c("R", "P"), ylim = c(0.001, 5), yaxs = "i", xlim = c(as.Date("1996-01-01"), as.Date("2004-01-01")), xaxs = "i", xaxt = "n", log = "y") axdates <- c("1996-01-01", "1998-01-01", "2000-01-01", "2002-01-01", "2004-01-01") axis(1, as.Date(axdates), labels = c("1996", "1998", "2000", "2002", "2004")) # Metolachlor scatter plot cenScatPlot(qwMoRivOmaha, pname = "39415", site = "06610000 Missouri River at Omaha, Nebr.", ylabel = "Metolachlor concentration, in micrograms per liter", legcex = 0.7, qwcols = c("R", "P"), ylim = c(0.001,5), yaxs = "i", xlim = c(as.Date("1996-01-01"), as.Date("2004-01-01")), xaxs = "i", xaxt = "n", log = "y", legpos = "bottomleft") axdates <- c("1996-01-01", "1998-01-01", "2000-01-01", "2002-01-01", "2004-01-01") axis(1, as.Date(axdates), labels = c("1996", "1998", "2000", "2002", "2004")) # Alachlor scatter plot cenScatPlot(qwMoRivOmaha, pname = "46342", site = "06610000 Missouri River at Omaha, Nebr.", ylabel = "Alachlor concentration, in micrograms per liter", legcex = 0.7, qwcols = c("R", "P"), ylim = c(0, 0.1), yaxs = "i", xlim = c(as.Date("1996-01-01"), as.Date("2004-01-01")), xaxs = "i", xaxt = "n") axdates <- c("1996-01-01", "1998-01-01", "2000-01-01", "2002-01-01", "2004-01-01") axis(1, as.Date(axdates), labels = c("1996", "1998", "2000", "2002", "2004")) # Metribuzin scatter plot cenScatPlot(qwMoRivOmaha, pname = "82630", site = "06610000 Missouri River at Omaha, Nebr.", ylabel = "Metribuzin concentration, in micrograms per liter", legcex = 0.7, qwcols = c("R", "P"), ylim = c(0, 0.1), yaxs = "i", xlim = c(as.Date("1996-01-01"), as.Date("2004-01-01")), xaxs = "i", xaxt = "n") axdates <- c("1996-01-01", "1998-01-01", "2000-01-01", "2002-01-01", "2004-01-01") axis(1, as.Date(axdates), labels = c("1996", "1998", "2000", "2002", "2004")) # Trifluralin scatter plot cenScatPlot(qwMoRivOmaha, pname = "82661", site = "06610000 Missouri River at Omaha, Nebr.", ylabel = "Trifluralin concentration, in micrograms per liter", legcex = 0.7, qwcols = c("R", "P"), ylim = c(0, 0.03), yaxs = "i", xlim = c(as.Date("1996-01-01"), as.Date("2004-01-01")), xaxs = "i", xaxt = "n") axdates<-c("1996-01-01", "1998-01-01", "2000-01-01", "2002-01-01", "2004-01-01") axis(1, as.Date(axdates), labels = c("1996", "1998", "2000", "2002", "2004")) # EPTC scatter plot cenScatPlot(qwMoRivOmaha, pname = "82668", site = "06610000 Missouri River at Omaha, Nebr.", ylabel = "EPTC concentration, in micrograms per liter", legcex = 0.7, qwcols = c("R", "P"), ylim = c(0.001, 1), yaxs = "i", xlim = c(as.Date("1996-01-01"), as.Date("2004-01-01")), xaxs = "i", xaxt = "n", log = "y") axdates <- c("1996-01-01", "1998-01-01", "2000-01-01", "2002-01-01", "2004-01-01") axis(1, as.Date(axdates), labels =c ("1996", "1998", "2000", "2002", "2004"))
data(swData) # summary of water-quality concentrations apply(qwMoRivOmaha[, grep("P[[:digit:]]", dimnames(qwMoRivOmaha)[[2]])], 2, summary) # scatter plot of Simazine concentrations cenScatPlot(qwMoRivOmaha, pname = "04035") # scatter plot with many additional plotting arguments par(las = 1, tcl = 0.5) cenScatPlot(qwMoRivOmaha, pname = "04035", site = "06610000 Missouri River at Omaha, Nebr.", ylabel = "Simazine concentration, in micrograms per liter", legcex = 0.7, qwcols = c("R", "P"), ylim = c(0, 0.1), yaxs = "i", xlim = c(as.Date("1996-01-01"), as.Date("2004-01-01")), xaxs = "i", xaxt = "n") axdates <- c("1996-01-01", "1998-01-01", "2000-01-01", "2002-01-01", "2004-01-01") axis(1, as.Date(axdates), labels = c("1996", "1998", "2000", "2002", "2004")) # Prometon scatter plot cenScatPlot(qwMoRivOmaha, pname = "04037", site = "06610000 Missouri River at Omaha, Nebr.", ylabel = "Prometon concentration, in micrograms per liter", legcex = 0.7, qwcols = c("R", "P"), ylim = c(0, 0.1), yaxs = "i", xlim = c(as.Date("1996-01-01"), as.Date("2004-01-01")), xaxs = "i", xaxt = "n") axdates <- c("1996-01-01", "1998-01-01", "2000-01-01", "2002-01-01", "2004-01-01") axis(1, as.Date(axdates), labels = c("1996", "1998", "2000", "2002", "2004")) # Cyanazine scatter plot cenScatPlot(qwMoRivOmaha, pname = "04041", site = "06610000 Missouri River at Omaha, Nebr.", ylabel = "Cyanazine concentration, in micrograms per liter", legcex = 0.7, qwcols = c("R", "P"), ylim = c(0.001, 5), yaxs = "i", xlim = c(as.Date("1996-01-01"), as.Date("2004-01-01")), xaxs = "i", xaxt = "n", log = "y") axdates <- c("1996-01-01", "1998-01-01", "2000-01-01", "2002-01-01", "2004-01-01") axis(1, as.Date(axdates), labels = c("1996", "1998", "2000", "2002", "2004")) # Metolachlor scatter plot cenScatPlot(qwMoRivOmaha, pname = "39415", site = "06610000 Missouri River at Omaha, Nebr.", ylabel = "Metolachlor concentration, in micrograms per liter", legcex = 0.7, qwcols = c("R", "P"), ylim = c(0.001,5), yaxs = "i", xlim = c(as.Date("1996-01-01"), as.Date("2004-01-01")), xaxs = "i", xaxt = "n", log = "y", legpos = "bottomleft") axdates <- c("1996-01-01", "1998-01-01", "2000-01-01", "2002-01-01", "2004-01-01") axis(1, as.Date(axdates), labels = c("1996", "1998", "2000", "2002", "2004")) # Alachlor scatter plot cenScatPlot(qwMoRivOmaha, pname = "46342", site = "06610000 Missouri River at Omaha, Nebr.", ylabel = "Alachlor concentration, in micrograms per liter", legcex = 0.7, qwcols = c("R", "P"), ylim = c(0, 0.1), yaxs = "i", xlim = c(as.Date("1996-01-01"), as.Date("2004-01-01")), xaxs = "i", xaxt = "n") axdates <- c("1996-01-01", "1998-01-01", "2000-01-01", "2002-01-01", "2004-01-01") axis(1, as.Date(axdates), labels = c("1996", "1998", "2000", "2002", "2004")) # Metribuzin scatter plot cenScatPlot(qwMoRivOmaha, pname = "82630", site = "06610000 Missouri River at Omaha, Nebr.", ylabel = "Metribuzin concentration, in micrograms per liter", legcex = 0.7, qwcols = c("R", "P"), ylim = c(0, 0.1), yaxs = "i", xlim = c(as.Date("1996-01-01"), as.Date("2004-01-01")), xaxs = "i", xaxt = "n") axdates <- c("1996-01-01", "1998-01-01", "2000-01-01", "2002-01-01", "2004-01-01") axis(1, as.Date(axdates), labels = c("1996", "1998", "2000", "2002", "2004")) # Trifluralin scatter plot cenScatPlot(qwMoRivOmaha, pname = "82661", site = "06610000 Missouri River at Omaha, Nebr.", ylabel = "Trifluralin concentration, in micrograms per liter", legcex = 0.7, qwcols = c("R", "P"), ylim = c(0, 0.03), yaxs = "i", xlim = c(as.Date("1996-01-01"), as.Date("2004-01-01")), xaxs = "i", xaxt = "n") axdates<-c("1996-01-01", "1998-01-01", "2000-01-01", "2002-01-01", "2004-01-01") axis(1, as.Date(axdates), labels = c("1996", "1998", "2000", "2002", "2004")) # EPTC scatter plot cenScatPlot(qwMoRivOmaha, pname = "82668", site = "06610000 Missouri River at Omaha, Nebr.", ylabel = "EPTC concentration, in micrograms per liter", legcex = 0.7, qwcols = c("R", "P"), ylim = c(0.001, 1), yaxs = "i", xlim = c(as.Date("1996-01-01"), as.Date("2004-01-01")), xaxs = "i", xaxt = "n", log = "y") axdates <- c("1996-01-01", "1998-01-01", "2000-01-01", "2002-01-01", "2004-01-01") axis(1, as.Date(axdates), labels =c ("1996", "1998", "2000", "2002", "2004"))
seawaveQPlots is usually called from within fitMod
but
can be invoked directly. It generates plots of data and model results,
as well as diagnostic plots, and returns the observed and predicted
concentrations so that users may plot the concentrations using
their own functions.
seawaveQPlots( stpars, cmaxt, tseas, tseaspr, tndlin, tndlinpr, cdatsub, cavdat, cavmat, clog, centmp, yrstart, yrend, tyr, tyrpr, pnames, tanm, mclass = 1, plotfile = FALSE )
seawaveQPlots( stpars, cmaxt, tseas, tseaspr, tndlin, tndlinpr, cdatsub, cavdat, cavmat, clog, centmp, yrstart, yrend, tyr, tyrpr, pnames, tanm, mclass = 1, plotfile = FALSE )
stpars |
is a matrix of information about the best seawaveQ model
for the concentration data, see |
cmaxt |
is the decimal season of maximum chemical concentration. |
tseas |
is the decimal season of each concentration value in cdatsub. |
tseaspr |
is the decimal season date used to model concentration using the continuous data set cavdat. |
tndlin |
is the decimal time centered on the midpoint of the trend for the sample data, cdatasub. |
tndlinpr |
is the decimal time centered on the midpoint of the trend for the continuous data, cavdat. |
cdatsub |
is the concentration data. |
cavdat |
is the continuous (daily) ancillary data. |
cavmat |
is a matrix containing the continuous ancillary variables. |
clog |
is a vector of the base-10 logarithms of the concentration data. |
centmp |
is a logical vector indicating which concentration values are censored. |
yrstart |
is the starting year of the analysis (treated as January 1 of that year). |
yrend |
is the ending year of the analysis (treated as December 31 of that year). |
tyr |
is a vector of decimal dates for the concentration data. |
tyrpr |
is a vector of decimal dates for the continuous ancillary variables. |
pnames |
is the parameter (water-quality constituents) to analyze (if using USGS parameters, omit the starting 'P', such as "00945" for sulfate). |
tanm |
is a character identifier that names the trend analysis run. It is used to label output files. |
mclass |
has not been implemented yet, but will provide additional model options. |
plotfile |
is by default FALSE. True will write pdf files of plots to the user's file system. |
A PDF file containing plots of the data and modeled concentrations and regression diagnostic plots and a list containing the observed concentrations (censored and uncensored) and the predicted concentrations used for the plot.
The plotting position used for representing censored values in
the plots produced by seawaveQPlots
is an important
consideration for interpreting model fit. Plotting values obtained by
using the censoring limit, or something smaller such as one-half of the
censoring limit, produce plots that are difficult to interpret if there
are a large number of censored values. Therefore, to make the plots
more representative of diagnostic plots used for standard
(non-censored) regression, a method for substituting randomized
residuals in place of censored residuals was used. If a
log-transformed concentration is censored at a particular limit,
logC < L
, then the residual for that concentration is censored
as well, logC - fitted(logC) < L - fitted(logC) = rescen
. In
that case, a randomized residual was generated from a conditional
normal distribution resran <- scl * qnorm(runif(1) * pnorm(rescen / scl))
,
where scl
is the scale parameter from the survival regression model,
pnorm
is the R function for computing cumulative normal
probabilities, runif
is the R function for generating a
random variable from the uniform distribution, and qnorm
is the R function for computing quantiles of the normal distribution.
Under the assumption that the model residuals are uncorrelated,
normally distributed random variables with mean zero and standard
deviation scl
, the randomized residuals generated in this manner are an
unbiased sample of the true (but unknown) residuals for the censored
data. This is an application of the probability integral transform
(Mood and others, 1974) to generate random variables from continuous
distributions. The plotting position used a censored concentration is
fitted(logC) + resran
. Note that each time a new model fit is
performed, a new set of randomized residuals is generated and thus the
plotting positions for censored values can change.
Aldo V. Vecchia and Karen R. Ryberg
Mood, A.M., Graybill, F.A., and Boes, D.C., 1974, Introduction to the theory of statistics (3rd ed.): New York, McGraw-Hill, Inc., 564 p.
data(swData) myPlots <- seawaveQPlots(stpars=examplestpars, cmaxt=0.4808743, tseas=exampletseas, tseaspr=exampletseaspr, tndlin=exampletndlin, tndlinpr=exampletndlinpr, cdatsub=examplecdatsub, cavdat=examplecavdat, cavmat=examplecavmat, clog=exampleclog, centmp=examplecentmp, yrstart=1995, yrend=2003, tyr=exampletyr, tyrpr=exampletyrpr, pnames=c("04041"), tanm="examplePlots04041", plotfile = FALSE)
data(swData) myPlots <- seawaveQPlots(stpars=examplestpars, cmaxt=0.4808743, tseas=exampletseas, tseaspr=exampletseaspr, tndlin=exampletndlin, tndlinpr=exampletndlinpr, cdatsub=examplecdatsub, cavdat=examplecavdat, cavmat=examplecavmat, clog=exampleclog, centmp=examplecentmp, yrstart=1995, yrend=2003, tyr=exampletyr, tyrpr=exampletyrpr, pnames=c("04041"), tanm="examplePlots04041", plotfile = FALSE)
seawaveQPlots2 is usually called from within fitMod
but
can be invoked directly. It generates plots of data and model results,
as well as diagnostic plots, and returns the observed and predicted
concentrations so that users may plot the concentrations using
their own functions. This is the version for models that use
restricted cubic splines.
seawaveQPlots2( stpars, cmaxt, tseas, tseaspr, tndrcs, tndrcspr, cdatsub, cavdat, cavmat, clog, centmp, yrstart, yrend, tyr, tyrpr, pnames, tanm, mclass = 2, numk, plotfile = FALSE )
seawaveQPlots2( stpars, cmaxt, tseas, tseaspr, tndrcs, tndrcspr, cdatsub, cavdat, cavmat, clog, centmp, yrstart, yrend, tyr, tyrpr, pnames, tanm, mclass = 2, numk, plotfile = FALSE )
stpars |
is a matrix of information about the best seawaveQ model
for the concentration data, see |
cmaxt |
is the decimal season of maximum chemical concentration. |
tseas |
is the decimal season of each concentration value in cdatsub. |
tseaspr |
is the decimal season date used to model concentration using the continuous data set cavdat. |
tndrcs |
is the decimal time centered on the midpoint of the trend for the sample data, cdatasub, then converted to a linear tail-restricted cubic spline with a particular number of knots (Harrell, 2010, 2018). |
tndrcspr |
is the decimal time centered on the midpoint of the trend for the continuous data, cavdat, then converted to a linear tail-restricted cubic spline using the knots from tndrcs. |
cdatsub |
is the concentration data. |
cavdat |
is the continuous (daily) ancillary data. |
cavmat |
is a matrix containing the continuous ancillary variables. |
clog |
is a vector of the base-10 logarithms of the concentration data. |
centmp |
is a logical vector indicating which concentration values are censored. |
yrstart |
is the starting year of the analysis (treated as January 1 of that year). |
yrend |
is the ending year of the analysis (treated as December 31 of that year). |
tyr |
is a vector of decimal dates for the concentration data. |
tyrpr |
is a vector of decimal dates for the continuous ancillary variables. |
pnames |
is the parameter (water-quality constituents) to analyze (if using USGS parameters, omit the starting 'P', such as "00945" for sulfate). |
tanm |
is a character identifier that names the trend analysis run. It is used to label output files. |
mclass |
indicates the class of model to use. A class 1 model is the traditional SEAWAVE-Q model that has a linear time trend. A class 2 model is a newer option for longer trend periods that uses a set of restricted cubic splines on the time variable to provide a more flexible model. |
numk |
is the number of knots in the restricted cubic spline model. The default is 4, and the recommended number is 3–7. |
plotfile |
is by default FALSE. True will write pdf files of plots to the user's file system. |
A PDF file (if plotfile is TRUE) containing plots of the data and modeled concentrations and regression diagnostic plots and a list containing the observed concentrations (censored and uncensored) and the predicted concentrations used for the plot.
The plotting position used for representing censored values in
the plots produced by seawaveQPlots2
is an important
consideration for interpreting model fit. Plotting values obtained by
using the censoring limit, or something smaller such as one-half of the
censoring limit, produce plots that are difficult to interpret if there
are a large number of censored values. Therefore, to make the plots
more representative of diagnostic plots used for standard
(non-censored) regression, a method for substituting randomized
residuals in place of censored residuals was used. If a
log-transformed concentration is censored at a particular limit,
logC < L
, then the residual for that concentration is censored
as well, logC - fitted(logC) < L - fitted(logC) = rescen
. In
that case, a randomized residual was generated from a conditional
normal distribution resran <- scl * qnorm(runif(1) * pnorm(rescen / scl))
,
where scl
is the scale parameter from the survival regression model,
pnorm
is the R function for computing cumulative normal
probabilities, runif
is the R function for generating a
random variable from the uniform distribution, and qnorm
is the R function for computing quantiles of the normal distribution.
Under the assumption that the model residuals are uncorrelated,
normally distributed random variables with mean zero and standard
deviation scl
, the randomized residuals generated in this manner are an
unbiased sample of the true (but unknown) residuals for the censored
data. This is an application of the probability integral transform
(Mood and others, 1974) to generate random variables from continuous
distributions. The plotting position used for a censored concentration is
fitted(logC) + resran
. Note that each time a new model fit is
performed, a new set of randomized residuals is generated and thus the
plotting positions for censored values can change.
Aldo V. Vecchia and Karen R. Ryberg
Harrell, F.E., Jr., 2010, Regression modeling strategies—With applications to linear models, logistic regression, and survival analysis: New York, Springer-Verlag, 568 p.
Harrell, F.E., Jr., 2018, rms—Regression modeling strategies: R package version 5.1-2, https://CRAN.R-project.org/package=rms.
Mood, A.M., Graybill, F.A., and Boes, D.C., 1974, Introduction to the theory of statistics (3d ed.): New York, McGraw-Hill, Inc., 564 p.